Using a structured design approach to reduce risks in end user spreadsheet development

Information & Management 37 (2000) 1-12

Research

Diane Janvrin a,*, Joline Morrison b

a Department of Accounting, University of Iowa, Iowa City, IA 52242, USA
b Department of MIS, University of Wisconsin-Eau Claire, Eau Claire, WI 54702, USA

Received 31 July 1998; accepted 24 May 1999

Abstract

Computations performed using end-user developed spreadsheets have resulted in serious errors and represent a major control risk to organizations. Literature suggests that factors contributing to spreadsheet errors include developer inexperience, poor design approaches, application types, problem complexity, time pressure, and the presence or absence of review procedures. We explore the impact of using a structured design approach for spreadsheet development. In two field experiments, subjects using the design approach showed a significant reduction in the number of `linking errors', i.e., mistakes in creating links between values that must connect one area of the spreadsheet to another, or one worksheet to another in a common workbook. Our results provide evidence that design approaches that explicitly identify potential error factors may improve end-user application reliability. We also observed that factors such as gender, application expertise, and workgroup configuration influenced spreadsheet error rates. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Risks in end user development; Spreadsheet errors; Structured design

1. Introduction

During the past decade, the use of end-user developed applications, especially spreadsheets, has grown exponentially. However, when identifying and evaluating risks within organizations, end-user developed computer applications are often overlooked. Studies have revealed that as many as 44% of all end-user spreadsheets contain at least one error [5]. Such errors can have catastrophic results. For example, a Florida-based construction company lost US$254,000 by relying on an incorrect spreadsheet analysis for project costing [39]. A large mutual fund incorrectly reported a US$0.1 billion loss as a US$2.3 billion gain by placing the wrong sign on a US$1.2 billion spreadsheet input entry [38]. Previous work suggests that errors in end-user developed applications may be influenced by several factors [28,30]. Here we develop a framework to examine end-user risks and, specifically, to investigate the impact of one risk-mitigating factor, a structured design approach, on spreadsheet error rates and development times.

2. Background and model development

* Corresponding author. E-mail addresses: [email protected] (D. Janvrin), [email protected] (J. Morrison).

0378-7206/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-7206(99)00029-4

Cotterman and Kumar [9] classify end-user computing risks into three major categories: end-user development, end-user operation, and end-user control of information systems. We focus on the domain of end-user development. To our knowledge, no work has specifically identified and then empirically validated factors impacting outcomes in end-user developed applications, such as spreadsheets. Literature suggests that potential factors include developer attributes, development approach, developer configuration, problem/process characteristics, and application domain.

2.1. Developer attributes

Our work focuses on end-users who are both producers and consumers of information. Several studies have indicated that end-users are diverse, and that this diversity leads to a need for differentiated education, training, support, software tools, and risk assessment [3,12,14,19,34]. Harrison and Rainer [17] assert that specific end-user attributes include age, gender, computer experience, computer confidence, math anxiety, and cognitive style. Curtis et al.'s field studies on software development [10] repeatedly confirm the importance of software developers understanding their application domain. Adelson and Soloway [1] emphasize the importance of expertise in software design. Frank, Shamir and Briggs [15] find that PC user knowledge, i.e., application expertise, positively impacted security-related behavior. Further, several researchers indicate that user training and relevant experience are important factors, e.g., [2,37].

2.2. Development approach

Design activities for end-user developed applications tend to be unstructured and are often non-existent. Salchenberger notes that end-users tend to develop and implement models without performing requirements analysis or developing a system design. Brown and Gould observed that their participants spent little time planning before launching into spreadsheet coding.
Ronen, Palley and Lucas [35] suggest that several attributes of spreadsheets (i.e., non-professional developers, shorter development life cycles, ease of modification) tend to preclude a formal spreadsheet analysis and design process. Almost all published spreadsheet design approaches agree on one principle: the importance of keeping work areas small in order to make errors easier to find, limit the damaging effects of errors, and make spreadsheets easier to understand and maintain [33]. Markus and Keil [24] note that structured design methodologies increase the accuracy and reliability of information systems. Salchenberger proposes the use of structured design methodologies in end-user development and discusses their effectiveness in different situations. Cheney, Mann and Amoroso [7] also support the use of a structured design methodology; they found that the more structured the tasks of the end-user, the more likely the success of the application. Ronen, Palley and Lucas describe a structured spreadsheet design approach based on modifying conventional data flow diagram symbols to create spreadsheet flow diagrams. Several practitioner-directed articles encourage the use of structured design approaches and planning for spreadsheet development.

2.3. Developer configuration

Panko and Halverson [29,31] and Nardi and Miller [26] note that the accuracy of end-user developed applications might be increased by encouraging developers to work in teams. However, the former also suggest that the increased communication and coordination costs of working in groups may outweigh the benefits of reduced error rates.

2.4. Problem/process characteristics

Several studies identify the difficulties associated with complex versus non-complex programming problems, e.g., [13,36]. Studies on the effect of time pressure on programming activities indicate that, under time pressure, end-user developers may not spend sufficient time on problem definition and diagnosis, e.g., [2,6]. They also note an absence of formal review and validation procedures within end-user developed applications.

2.5. Application domain

While this research covers a wide range of application domains, spreadsheet development should be a central issue because of the widespread use of end-user developed spreadsheets in the workplace. Panko


and Halverson appropriately summarize the state of spreadsheet research as follows: "In the past, spreadsheeting has been the Rodney Dangerfield [i.e., a not well respected subject] of computer applications. Although everyone agrees that it is extremely important, few researchers have studied it in any depth. Given its potential for harm as well as for good, this situation needs to change." Psychologists Hendry and Green [20] assert that spreadsheets may simplify programming by suppressing the complexities associated with conceptually simple operations like adding a list of numbers or handling input and output. As a result, users tend to ignore their weaknesses, such as their potential for errors or the difficulty of debugging or reusing spreadsheets.

2.6. Outcomes

Salchenberger suggests that factors to use in judging the quality of end-user developed applications include reliability, ease of use, maintainability, and auditability. Reliability can be defined as the absence of logical and mathematical errors. One of the earliest studies on spreadsheet outcomes (conducted by Brown and Gould) found that of 27 spreadsheets created by experienced spreadsheet users, 44% contained user-generated errors. In another study, Davies and Ikin [11] audited a sample of 19 spreadsheets in an organization and discovered that four worksheets contained major errors and 13 had inadequate documentation. Panko and Halverson [29] asked students to develop a spreadsheet model working individually, in groups of two, and in groups of four, and observed total error rates ranging from 53 to 80%. They note that very little time (only a few minutes) was spent in design activities prior to implementation, and that their participants made several different types of errors. At the highest level, errors could be divided into two basic types: oversight and conceptual. They defined oversight errors as `dumb' mistakes such as typographical errors and misreading numbers on the problem statement. In contrast, conceptual errors involve mistakes in domain knowledge or modeling logic. Panko and Halverson use Lorge and Solomon's [23] terminology to further divide conceptual errors into Eureka and non-Eureka errors. Once a Eureka error is identified, everyone can quickly agree that it exists; non-Eureka errors are those where the mistake cannot be easily demonstrated. Generally, researchers evaluate the usability of end-user developed applications with satisfaction survey tools [14,22]. Salchenberger defines maintainability as the ability to change or update the system as needs change, and auditability as the ability to determine the source of data values in the model. Few researchers have evaluated end-user developed applications for either maintainability or auditability [32]. We also propose the inclusion of cost (usually measured in terms of development time) as another important outcome.

Based on our review, we developed an initial research framework (Fig. 1) that summarizes risk factors and their relationships to potential outcomes. Our focus is to determine the impacts of development approach on spreadsheet reliability and cost.

Fig. 1. Initial research framework.

3. Structured spreadsheet design

Modern spreadsheet applications enable users to modularize large spreadsheets by creating `worksheets' (individual spreadsheet work areas) within a larger `workbook' (a set of related worksheets). Users can link values from one worksheet to another within the same workbook, or link sheets among multiple workbooks. This technique readily supports scenario analysis. For example, all of the schedules leading to a master budget can be prepared on separate worksheets within a workbook and then linked together, so that the impact of a change in one parameter can be quickly evaluated. However, this feature also leads to the likelihood of `linking errors': data on a source worksheet may be linked to the wrong cell on the target worksheet, or a value on a target worksheet that should have been linked from a previous worksheet is `hard-coded', i.e., an actual value from a previous scenario is typed in. In such cases, when the value should change for subsequent scenarios, it does not, and errors result. Also, values that are unknowingly linked to subsequent sheets might be deleted, destroying the integrity of the subsequent worksheets.

To support the task of designing linked worksheets, we build upon the work of Ronen, Palley and Lucas to develop a design methodology based on data flow diagram symbols. Data flow diagrams (DFDs) are documentation tools used by systems analysts to show organizational data sources, destinations, processes, and data inputs and outputs. Our basic modeling symbols (Fig. 2) are similar to typical DFD symbols. We assumed that explicitly identifying links in the design stage should reduce linking errors made during the spreadsheet coding phase.

Fig. 2. Basic modeling symbols.

Fig. 3 shows an example design model. Worksheet pages are created to represent each of the modules shown on the design model (e.g., Input Page, Sales Budget, Production Budget, Pro Forma Income Statement). Inputs from external sources are entered on a centralized Input Page and then flow into the Sales Budget, where they are transformed into the data items needed for calculations on the other schedules. Using named variables, the specified links are created. Inputs are easily changed on the Input Page for scenario analysis.

4. Evaluation studies

Based on the research framework, we designed a field experiment to explore the impacts of the structured design approach on spreadsheet errors. The reduced research model guiding the study is shown in Fig. 4. Harrison and Rainer note a positive relationship between developer confidence and improved program outcomes. Other researchers identify a positive relationship between domain and application expertise and improved outcomes. Therefore, we hypothesized that both developer attributes and development approach would influence error rate and development time. Structured systems design approaches, e.g., [8,16], have successfully been used in systems development

Fig. 3. Example design model: master budget for small home fixture manufacturer.
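To illustrate the linked-worksheet structure of Fig. 3 and how a hard-coded cell produces a linking error, consider the following minimal Python sketch. The sheet and cell names are hypothetical stand-ins for the Fig. 3 modules, and this is not the authors' instrument; a real workbook would use spreadsheet formulas rather than Python:

```python
# Toy model of a workbook: each sheet maps cell names to either a literal
# value or a link string "Sheet!Cell" referring to another sheet's cell.
# All sheet/cell names here are hypothetical, echoing the Fig. 3 modules.
workbook = {
    "Input Page":   {"units_sold": 1000, "price": 25.0},
    "Sales Budget": {"units": "Input Page!units_sold",
                     "price": "Input Page!price"},
    "Income Stmt":  {"units": "Sales Budget!units",
                     # Linking error: hard-coded from scenario 1 instead of
                     # being linked as "Sales Budget!price".
                     "price": 25.0},
}

def resolve(wb, sheet, cell):
    """Follow links until a literal value is reached."""
    value = wb[sheet][cell]
    while isinstance(value, str):
        sheet, cell = value.split("!")
        value = wb[sheet][cell]
    return value

def revenue(wb):
    return resolve(wb, "Income Stmt", "units") * resolve(wb, "Income Stmt", "price")

print(revenue(workbook))                # 25000.0 -- correct for scenario 1
workbook["Input Page"]["price"] = 30.0  # scenario analysis: change one input
print(revenue(workbook))                # still 25000.0; a correctly linked
                                        # price cell would yield 30000.0
```

The first scenario matches the check figures, so the hard-coded cell goes unnoticed; only the second scenario exposes it. Enumerating every link at design time, as the DFD-based methodology requires, is what makes such cells easy to spot before scenario analysis is run.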


Fig. 4. Reduced research model for experimental studies.

for over 25 years. They also encourage the creation of reusable components. This approach can be applied to spreadsheet design by identifying the different worksheets and their corresponding data inflows and outflows. Thus, each individual worksheet effectively becomes a structured module that takes a defined set of inputs and translates them into a prescribed set of outputs. Since the worksheet links are explicitly defined in the design, we hypothesized that the result would be a decreased rate of linking errors.

4.1. Study #1

4.1.1. Subjects, treatment, and measures

Sixty-one upper- and masters-level accounting and business administration majors in an accounting information systems course at a large public university were randomly assigned to two groups: treatment (structured design) and control (ad hoc design). All subjects were given 1 hour of formal training in spreadsheet development and told to develop a single spreadsheet workbook containing a series of linked worksheets using Microsoft Excel. The assigned problem contained 10 separate worksheets with a total of 51 linked data values among the worksheets. A paper template of the worksheets with one complete data set (from Moriarity and Allen [25]) was given to all subjects to provide check figures and mitigate non-linking errors. (Only numerical values were specified; the subjects still had to develop and correctly implement the underlying design in order to successfully perform scenario analysis.) All subjects were required to maintain logs recording time spent on design and development activities. The treatment


group was required to submit their spreadsheet design before being assigned a password for computer data entry. A pre-test questionnaire was administered to identify gender, age, and class level, as well as self-reported pre-test application expertise (in terms of both spreadsheets in general and the specific spreadsheet application, Microsoft Excel), DFD development expertise, and design confidence. Development time was determined by combining two measures: (1) self-reported design logs covering two- or three-day periods during the experiment; and (2) system-recorded measures of actual implementation times. The subjects were given 16 days to complete the assignment. At the end of that period, questionnaires were administered to ascertain post-test spreadsheet and Excel expertise and design and coding confidence. Error rates were determined by examining the spreadsheet data files.

4.2. Analysis and results

Due to an unexpected shortage of software and hardware to complete the assignment, we were forced to allow subjects in Study #1 to work either individually or in self-selected pairs within treatment groups. (In Study #2 all subjects worked alone.) Means and standard deviations for the independent and dependent variables for all subjects, as well as results sorted by configuration and treatment group, are shown in Table 1. Analysis was performed on a per-subject basis, with scores for time and error rate replicated for each subject in a given dyad. A selection internal validity threat exists, since individuals were assigned to the treatment group based on their class section. Pre-test questionnaire data were used to determine the impact of the selection problem. The treatment group was significantly older than the control group (p = 0.03), but no significant differences were noted for pre-test spreadsheet expertise, Excel expertise, or DFD expertise. We used least squares multiple regression to test the research model components and proposed linkages.
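As a rough illustration of this kind of analysis, a least squares regression reported as standardized (beta) coefficients, preceded by a pairwise-correlation screen for multicollinearity, might be sketched as follows. The data and predictor names below are synthetic placeholders, not the study's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 61                                   # sample size, matching Study #1
X = rng.normal(size=(n, 3))              # placeholder predictors, e.g.
                                         # configuration, treatment, expertise
y = 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)   # arbitrary outcome

# Multicollinearity screen: flag any predictor pair with |r| >= 0.8.
r = np.corrcoef(X, rowvar=False)
flagged = [(i, j) for i in range(3) for j in range(i + 1, 3)
           if abs(r[i, j]) >= 0.8]

# Standardize all variables so the least squares slopes are standardized
# (beta) coefficients, comparable across predictors.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
zy = (y - y.mean()) / y.std()
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), zy, rcond=None)
intercept, betas = coef[0], coef[1:]     # intercept is ~0 after standardizing
```

With standardized variables, each beta estimates the change in the outcome, in standard deviations, per standard-deviation change in a predictor, which is how the coefficients in Tables 2, 3, 5 and 6 should be read.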
Examination of the correlations among the independent variables revealed no evidence of multicollinearity; all pairwise correlations were below 0.8 [4]. Table 2 shows that gender, age group, class level, and pre-test Excel expertise had no impact on any of the outcomes, while configuration

Table 1
Means and standard deviations (Study #1), reported as Mean (SD)

Variable                                      Overall       Singles       Pairs         Ad hoc        Struct.
                                              (n = 61)      (n = 17)      (n = 44)      (n = 31)      (n = 30)
Gender (1 = Male, 2 = Female)                 1.44 (0.50)   1.50 (0.51)   1.43 (0.50)   1.42 (0.50)   1.47 (0.51)
Age group (1 = 20-25, 2 = 26-30,
  3 = 31-40, 4 = 40-50)                       1.38 (0.78)   1.56 (0.98)   1.30 (0.67)   1.19 (0.54)   1.57 (0.94)
Class level (1 = Jr., 2 = Sr., 3 = Grad.)     1.90 (0.68)   2.11 (0.58)   1.82 (0.69)   1.97 (0.60)   1.83 (0.75)
Pre-test SS expertise (a)                     2.49 (1.07)   2.61 (1.20)   2.43 (1.02)   2.68 (1.05)   2.30 (1.09)
Pre-test Excel expertise (a)                  2.34 (1.14)   2.22 (1.00)   2.41 (1.19)   2.42 (1.23)   2.27 (1.05)
Pre-test DFD expertise (a)                    2.16 (0.99)   1.89 (0.83)   2.30 (1.02)   2.26 (1.00)   2.07 (0.98)
Pre-test design confidence (b)                3.93 (0.73)   3.72 (0.89)   4.02 (0.63)   3.84 (0.78)   4.03 (0.67)
Post-test SS expertise (a)                    3.20 (0.94)   2.87 (0.99)   3.32 (0.88)   3.27 (0.92)   3.12 (0.97)
Post-test Excel expertise (a)                 3.12 (0.89)   2.73 (0.88)   3.30 (0.85)   3.19 (0.85)   3.04 (0.93)
Post-test design confidence (b)               2.55 (0.90)   2.93 (1.03)   2.38 (0.79)   2.31 (0.84)   2.80 (0.91)
Post-test coding confidence (b)               2.48 (0.89)   2.40 (0.91)   2.50 (0.88)   2.48 (0.96)   2.48 (0.82)
Mean number of linking errors/workbook (c)    4.16 (0.08)   4.85 (0.12)   3.89 (0.05)   4.91 (0.09)   3.37 (0.05)
Error%                                        8.15% (0.08)  9.51% (0.12)  7.62% (0.05)  9.63% (0.09)  6.61% (0.05)
Time (h)                                      8.83 (4.12)   8.05 (2.89)   9.12 (4.51)   9.98 (6.66)   8.33 (2.37)

(a) 1 = Novice, 5 = Expert.
(b) 1 = Very unconfident, 5 = Very confident.
(c) Total possible links = 51.

Table 2
Summary of multiple regression results (Study #1), N = 66
Standardized regression coefficients

Independent variable                        Post SS    Post Excel  Post des.  Post coding  Error%     Time
                                            expertise  expertise   conf.      conf.
Gender (1 = Male, 2 = Female)               -0.04      -0.21        0.00      -0.44         0.01      122.99
Age group (1 = 20-25, 2 = 26-30,
  3 = 31-40, 4 = 41-50)                     -0.02       0.10       -0.26      -0.20         0.01      -18.79
Class level (1 = Jr., 2 = Sr., 3 = Grad.)   -0.04      -0.05        0.20      -0.13         0.00        9.21
Configuration (1 = Ind., 2 = Pair)           0.47(a)    0.41        0.64(b)   -0.05        -0.03       -4.22
Treatment group (1 = Control, 2 = Treat.)    0.00      -0.05        0.48(a)    0.02        -0.04(b)  -115.70
Pre-test SS expertise                        0.35(b)    0.24(a)    -0.08      -0.06        -0.02     -137.17(a)
Pre-test Excel expertise                    -0.02       0.27        0.04       0.01         0.00        8.97
Pre-test DFD expertise                       0.09       0.14       -0.09      -0.16         0.01       38.83
Pre-test design confidence                  -0.22       0.18        0.22       0.38         0.01       15.68
R2                                           0.40       0.41        0.27       0.25         0.16        0.30

(a) p ≤ 0.10. (b) p ≤ 0.05.

(singles versus pairs), treatment group (structured versus ad hoc design), and pre-test spreadsheet expertise all displayed significant correlations with one or more of the dependent variables. Following Heise [18], the independent variables that did not explain a significant variation in the outcomes were excluded from further analysis. Table 3 shows that the variations in the dependent variables are explained, to some extent, by the independent variables in the reduced model. MANOVA was used to detect potential interactions between the configuration and treatment variables on error rate.


Table 3
Reduced model regression results (Study #1), N = 66
Standardized regression coefficients

Independent variable                     Post SS    Post Excel  Post des.  Post coding  Error%     Time
                                         expertise  expertise   conf.      conf.
Configuration (1 = Ind., 2 = Pair)        0.47(a)    0.65(b)    -0.61(b)    0.06        -0.02       59.09
Treatment (1 = Control, 2 = Design)      -0.03      -0.06        0.45(a)   -0.06        -0.03(a)  -233.13(b)
Pre-test SS exp. (1 = Low, 5 = High)      0.46(c)    0.31(b)    -0.13      -0.25(b)     -0.01(b)   -76.28(a)
R2                                        0.35(b)    0.27(b)     0.19(b)    0.10         0.12(a)     0.27(a)

(a) p ≤ 0.10. (b) p ≤ 0.05. (c) p ≤ 0.001.

This revealed a weakly significant difference (p ≤ 0.10) between the control and treatment groups, as well as a weakly significant interaction (p ≤ 0.10) between configuration and treatment.

4.3. Summary and discussion

Overall, subjects using the structured design technique exhibited significantly lower error rates: of the 51 possible links in the spreadsheet, the structured design subjects had an average of 3.57 (7%) linking errors, while the ad hoc design subjects averaged 4.1 (10%) linking errors. We believe this was because the design methodology forced the developers to identify the links between worksheets prior to implementation; this explicit identification was then carried over into the implementation. Virtually all of the linking errors were omitted links (where a value had been hard-coded instead of linked) rather than incorrect links. Possibly subjects were relying heavily on the check figures provided for the first scenario instead of examining the reasonableness of their answers for the subsequent cases. The improvement in linking errors was mitigated by subject configuration: subjects working alone within the ad hoc design group averaged 7.44 linking errors per spreadsheet (14%), while subjects working alone in the design group and both sets of paired subjects all averaged approximately 3.8 linking errors per spreadsheet (7%). For subjects working alone, the MANOVA results imply that the use of the design methodology had a significant impact. Unfortunately, the advantage of using the structured design methodology was weakened when the subjects were working in pairs. Two potential explanations occurred to us: (1) subjects working in pairs gained other advantages that offset the impact of the design approach; or (2) another external factor induced some subjects to elect to work in pairs and also allowed them to create spreadsheets with fewer linking errors. Our results also indicated that subjects working in pairs reported higher post-test spreadsheet expertise but lower design confidence; the latter result seemed inconsistent with the prior literature. The significantly lower overall time investment for the structured design subjects was unexpected but encouraging. (This result does not consider the time spent initially learning the design methodology, however.) Finally, as expected, pre-test spreadsheet expertise was a significant predictor of post-test spreadsheet expertise, reduced error rates, and reduced time investments. The results of this study suggested that developer attributes, configurations, and development approaches are all potential factors impacting end-user development outcomes. However, the unexpected results for time and post-test confidence, as well as the unexpected configuration variable, induced us to perform a second study with revised procedures and measures.

4.4. Study #2

To confirm the results of Study #1 and remove the possible confound involved in allowing subjects to work in self-selected pairs, a second study was performed. The problem used was slightly more complex than the one in Study #1, with 10 different worksheets and a total of 66 linked cells. All subjects were


required to work individually, so as to increase sample size and allow a more isolated investigation of the impacts of the design methodology and developer attributes. Error types were expanded to include conceptual and oversight errors as well as linking errors. A blank template (from [21]) with only a single check figure was given. Explicit post-test measures were introduced to assess design confidence, coding confidence, and confidence that the spreadsheet was error-free. Additionally, a measure was included to explore the impact of potential domain expertise: subjects who were accounting majors had more experience and coursework in preparing financial reports and other master budgeting topics than the non-accounting majors. Subjects in the treatment group were also asked to rate their post-treatment DFD development confidence and their satisfaction with the development approach. Specifically, they were asked the degree to which they agreed or disagreed with statements regarding: (1) whether the DFD design methodology helped them develop a better spreadsheet; (2) whether using the methodology made the development effort take longer; and (3) whether they would use the methodology the next time they developed a complex spreadsheet.

4.5. Analysis and results

Table 4 shows overall means and standard deviations for all measures. No significant differences in pre-test questionnaire information were noted between the treatment and control groups, thus reducing the selection internal validity concern. Correlation analysis indicated a high degree of correlation between pre-test Excel expertise and overall spreadsheet expertise. Since the Excel expertise measure had

Table 4
Means and standard deviations (Study #2), reported as Mean (SD)

Variable                                      Overall        Ad hoc design   Struct. design
                                              (n = 88)       (n = 30)        (n = 58)
Gender (1 = Male, 2 = Female)                 1.38 (0.49)    1.37 (0.49)     1.38 (0.49)
Age group (1 = 20-25, 2 = 26-30,
  3 = 31-40, 4 = 40-50)                       1.18 (0.54)    1.19 (0.56)     1.19 (0.54)
Class level (1 = Jr., 2 = Sr., 3 = Grad.)     1.33 (0.54)    1.33 (0.62)     1.31 (0.50)
Domain expertise (1 = Other, 2 = Accounting)  1.93 (0.25)    1.93 (0.27)     1.90 (0.31)
Treatment group (1 = Control, 2 = Treat.)     1.66 (0.48)    1 (0)           2 (0)
Pre-test SS expertise (a)                     2.86 (0.75)    2.91 (0.48)     2.79 (0.83)
Pre-test Excel expertise (a)                  3.02 (0.89)    3.00 (0.68)     2.99 (0.98)
Pre-test DFD expertise (a)                    2.35 (0.79)    2.41 (0.69)     2.36 (0.83)
Pre-test design confidence (b)                3.11 (0.86)    3.04 (0.75)     3.14 (0.93)
Post-test SS expertise (a)                    3.56 (0.78)    3.67 (0.62)     3.53 (0.86)
Post-test Excel expertise (a)                 3.74 (0.72)    3.89 (0.51)     3.68 (0.81)
Post-test design confidence (b)               3.25 (1.09)    3.11 (0.93)     3.36 (1.15)
Post-test coding confidence (b)               3.08 (1.13)    2.69 (0.93)     3.28 (1.18)
Post-test error-free confidence (b)           3.29 (1.09)    3.44 (1.09)     3.26 (1.09)
Mean number of linking errors/workbook        7.63 (9.52)    11.56 (11.76)   5.78 (7.74)
Linking error%                                11.05% (0.14)  16.75% (0.17)   8.37% (0.11)
Mean number of conceptual errors/workbook     0.38 (0.59)    0.37 (0.56)     0.38 (0.62)
Mean number of oversight errors/workbook      1.10 (0.84)    1.15 (0.66)     1.10 (0.91)
Mean number of total errors/workbook          9.07 (9.66)    12.96 (11.97)   7.26 (7.87)
Time (h)                                      19.00 (6.62)   18.13 (7.35)    19.38 (6.36)
Post-test DFD confidence (b)                  n/a            n/a             2.98 (0.91)
Post-test `DFD helped' (c)                    n/a            n/a             2.84 (0.99)
Post-test `DFD took longer' (c)               n/a            n/a             2.83 (0.99)
Post-test `Will use DFD again' (c)            n/a            n/a             2.88 (1.04)

(a) 1 = Novice, 5 = Expert.
(b) 1 = Very unconfident, 5 = Very confident.
(c) 1 = Strongly disagree, 5 = Strongly agree.

Table 5
Regression analysis results (Study #2), N = 88
Standardized regression coefficients

Independent variable          Post SS  Post Excel  Post des.  Post coding  Post error-  # Missing  # Conc.   # Oversight  # Total   Total
                              exp.     exp.        conf.      conf.        free conf.   links      errors    errors       errors    time
Gender (1 = M, 2 = F)         -0.33(a) -0.21       -0.30      -0.17        -0.53(a)      1.04       0.14      0.42(a)      1.32     -0.87
Age group                      0.21     0.21        0.42       0.32         0.04         0.01       0.36     -0.07         0.15      0.82
Class level
  (1 = Jr., 2 = Sr., 3 = Grad.) -0.13  -0.07       -0.34      -0.31        -0.49(a)      1.60      -0.18      0.24         1.89     -2.60
Domain experience
  (1 = Other, 2 = Acctg.)      0.57(a)  0.38        1.02(b)    0.24         0.68         2.72       0.17      0.08         2.52     -3.66
Treatment group
  (1 = Control, 2 = Treat.)   -0.12    -0.19        0.29       0.58(a)     -0.15        -5.75(b)    0.06      0.00        -5.63(b)   1.03
Pre-test Excel expertise       0.30(b)  0.29(b)     0.30(a)    0.43(b)      0.06        -3.18      -0.07(b)  -0.15        -3.22(a)  -1.11
Pre-test DFD expertise         0.14     0.16        0.27(a)    0.16         0.04        -0.34       0.47      0.02        -0.23     -0.64
Pre-test design confidence     0.08     0.06        0.00      -0.09        -0.02         1.35       0.07      0.03         1.41     -0.28
R2                             0.32(c)  0.27(b)     0.31(c)    0.22(a)      0.20(a)      0.17(a)    0.21(a)   0.17(a)      0.09      0.11

(a) p ≤ 0.05. (b) p ≤ 0.01. (c) p ≤ 0.001.


Table 6
Regression analysis results for treatment attitudinal measures (Study #2), N = 58
Standardized regression coefficients (post-test, treatment group only)

Independent variable           DFD conf.  DFD helped (d)  DFD took longer (d)  Will use again (d)
Gender (1 = M, 2 = F)          -0.58(b)   -0.35           -0.06                -0.41
Age group                      -0.04      -0.37           -0.45                 0.18
Class level (e)                -0.16       0.45           -0.09                 0.17
Domain experience (f)           0.64      -0.24           -0.33                -0.61
Pre-test SS/Excel expertise    -0.19      -0.11           -0.46(b)             -0.13
Pre-test DFD expertise          0.41(b)    0.00           -0.02                 0.05
Pre-test design confidence      0.26       0.41(a)         0.26                 0.35(a)
R2                              0.41(c)    0.17            0.15                 0.19

(a) p < 0.05. (b) p < 0.01. (c) p < 0.001.
(d) 1 = Strongly disagree, 5 = Strongly agree.
(e) 1 = Jr., 2 = Sr., 3 = Grad.
(f) 1 = Other, 2 = Acctg.

the highest correlation with the dependent variables, it was included in the regression analysis, and the pre-test spreadsheet expertise measure was omitted. Table 5 shows the initial regression analysis for all variables. Significant results were attained for gender, pre-test Excel expertise, DFD expertise, and design confidence. A reduced model analysis performed by omitting the non-significant independent variables yielded essentially the same results. Table 6 shows the regression analysis results for the design methodology attitudinal measures obtained from the treatment group subjects.

4.6. Summary and discussion

Our independent variables accounted for significant differences in all of the dependent measures except the number of conceptual errors and the overall time spent on design and coding. An interesting gender difference surfaced: females reported significantly (p ≤ 0.05) less perceived post-test spreadsheet expertise and less confidence that their spreadsheets were error-free. This concurs with Harrison and Rainer's finding that males tend to have more experience with computers and higher confidence levels. Interestingly, the females also had significantly (p ≤ 0.05) fewer oversight errors. We postulated that this may have been a result of their lower confidence levels: perhaps the females reviewed their spreadsheets more and caught their oversight errors. Subjects in higher class levels (i.e., seniors and graduate students) also displayed less confidence that their spreadsheets were error-free. There were no notable correlations between class level and any of the other independent variables; in fact, subjects in the higher levels seemed, in general, to have less pre-test application expertise. We attributed this result to the inexperience and overconfidence of youth. Domain experience was a positive predictor of post-test spreadsheet expertise (p ≤ 0.05) and post-test design confidence (p ≤ 0.01); neither of these findings was particularly surprising. Domain experience was somewhat correlated with pre-test spreadsheet expertise (r = 0.19) as well, so apparently the accounting majors had more prior expertise with spreadsheets than the non-accounting majors. The problem was fairly complex, with ten different worksheets and a total of 66 links, so it is reasonable to expect that domain experience would contribute to design confidence. Using the structured design methodology reduced the number of linking errors significantly (p ≤ 0.01) but had no impact on the number of oversight or conceptual errors. Interestingly, linking errors were by far the most common: each spreadsheet (for the combined control and treatment groups) had an average of 9.07 total errors, of which 84% were linking errors. It is important to note that using the design methodology had no significant impact on the total time spent on the project (aside from the time taken to learn the methodology). Pre-test Excel expertise was the most significant predictor, positively impacting post-test spreadsheet expertise (p ≤ 0.01), post-test Excel expertise (p < 0.01), post-test design confidence (p ≤ 0.05), and post-test coding confidence (p ≤ 0.01). Subjects with higher pre-test Excel expertise also made fewer conceptual errors (p ≤ 0.01) and fewer overall errors (p < 0.01). These findings concur with the results of many studies emphasizing the importance of application experience and the resulting expertise, e.g., [1,27]. The attitudinal responses indicate that females were significantly less confident of their post-test DFD development ability. Pre-test design confidence was positively correlated with the beliefs that creating the DFD was helpful and that the methodology would be used in the future. Subjects with a higher degree of pre-test spreadsheet expertise did not think that developing the DFD caused the development effort to take longer. Overall, from an attitudinal standpoint, the subjects were ambivalent about the costs and benefits of the structured design approach.

5. Conclusions, limitations, and future research

Our experimental studies confirm that developer attributes, specifically gender and domain and application expertise, significantly impacted error rates. Additionally, we show that using the structured design methodology significantly improved reliability for two different linked spreadsheet applications. Study #1 also inadvertently suggests that developer configuration is an important factor with potential interactions with the development approach. Our initial study was confounded because the pairs were self-selected; our second study removed this possible bias. Additionally, our research uses absolute error rates as a proxy for the theoretical construct of reliability.
It is not clear how error rates in spreadsheets correlate with poor decisions made from spreadsheet models, so a construct validity concern exists. The potential interaction between problem complexity and use of the structured development approach also needs to be explored further.
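Link errors dominated the error counts reported above (84% of an average 9.07 errors per spreadsheet), and the structured design approach reduced them by enumerating the expected links before coding began. As a purely illustrative sketch (not part of the original study), the same idea can be mechanized: extract the cross-worksheet references actually present in a workbook's formulas and diff them against the links specified at design time. The simplified formula representation, regular expression, and function names below are all assumptions for illustration.

```python
import re

# Matches A1-style cross-sheet references such as Income!B4 or 'Cash Budget'!C10
LINK_RE = re.compile(r"(?:'([^']+)'|([A-Za-z_]\w*))!(\$?[A-Z]+\$?\d+)")

def extract_links(formulas):
    """Return the set of (sheet, cell) pairs referenced across worksheets.

    `formulas` maps (sheet, cell) -> formula string, a simplified stand-in
    for a real workbook model.
    """
    links = set()
    for formula in formulas.values():
        for quoted, bare, cell in LINK_RE.findall(formula):
            links.add((quoted or bare, cell.replace("$", "")))
    return links

def audit_links(formulas, designed_links):
    """Compare implemented links against the design-stage specification."""
    actual = extract_links(formulas)
    return {
        "missing": designed_links - actual,    # specified but never implemented
        "unplanned": actual - designed_links,  # implemented but never specified
    }

# Toy example: the design calls for two links; the workbook implements one
# of them plus one link nobody specified (a candidate linking error).
designed = {("Sales", "B4"), ("Payroll", "C2")}
workbook = {("Summary", "A1"): "=Sales!B4 + 'Cash Budget'!C10"}
report = audit_links(workbook, designed)
```

A check of this kind only surfaces structural discrepancies between design and implementation; whether a flagged link is truly an error still requires the developer's judgment.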


Our theoretical framework identifies several outcomes associated with the use of end-user developed spreadsheets. This research concentrates on outcome reliability; the positive impact of the development approach cannot be generalized to usability, maintainability, and/or auditability outcomes without more research. Finally, the structured design approach used here explicitly identified links during the design stage. Other methodologies that explicitly specify potential error factors prior to implementation may also improve end-user application reliability.

As end-user development increases in organizations, the identification of strategies to minimize risks and improve outcomes is becoming increasingly important. This research has made a contribution both by identifying end-user development risk factors and outcomes, and by developing and empirically evaluating the impacts of a structured design approach for spreadsheet development.

References

[1] B. Adelson, E. Soloway, The role of domain experience in software design, IEEE Transactions on Software Engineering 11(11) (1985), pp. 1351–1360.
[2] M. Alavi, I. Weiss, Managing the risks associated with end-user computing, Journal of Management Information Systems 2(3) (1986), pp. 413–426.
[3] D. Batra, A framework for studying human error behavior in conceptual database modeling, Information and Management 25(3) (1993), pp. 121–131.
[4] R.S. Billings, S.P. Wroten, Use of path analysis in industrial/organizational psychology: criticisms and suggestions, Journal of Applied Psychology 63 (1978), pp. 677–688.
[5] P.S. Brown, J.D. Gould, An experimental study of people creating spreadsheets, ACM Transactions on Office Information Systems 5 (1987), pp. 258–272.
[6] E. Cale, Quality issues for end-user developed software, Journal of Systems Management 45 (1994), pp. 36–39.
[7] P. Cheney, R. Mann, D. Amoroso, Organizational factors affecting the success of end-user computing, Journal of Management Information Systems 3(1) (1986), pp. 65–80.
[8] L. Constantine, E. Yourdon, Structured Design, Prentice-Hall, Englewood Cliffs, NJ, 1979.
[9] W. Cotterman, K. Kumar, User cube: a taxonomy of end users, Communications of the ACM 32(11) (1989), pp. 1313–1320.
[10] B. Curtis, H. Krasner, N. Iscoe, A field study of the software design process for large systems, Communications of the ACM 31(11) (1988), pp. 1268–1287.
[11] N. Davies, C. Ikin, Auditing spreadsheets, Australian Accountant, 1987, pp. 54–56.
[12] F. Davis, Perceived usefulness, perceived ease of use, and user acceptance of information technology, MIS Quarterly 13(3) (1989), pp. 318–339.
[13] F. DeRemer, H. Kron, Programming-in-the-large versus programming-in-the-small, IEEE Transactions on Software Engineering 2(1) (1976), pp. 80–86.
[14] W. Doll, W. Torkzadeh, A discrepancy model of end-user computing involvement, Management Science 35(10) (1989), pp. 1151–1171.
[15] J. Frank, B. Shamir, W. Briggs, Security-related behavior of PC users in organizations, Information and Management 21(4) (1991), pp. 127–135.
[16] C. Gane, T. Sarson, Structured Systems Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1979.
[17] A. Harrison, R.K. Rainer, The influence of individual differences on skill in end-user computing, Journal of Management Information Systems 9(1) (1992), pp. 93–111.
[18] D.R. Heise, Problems in path analysis and causal inference, in: E.F. Borgatta (Ed.), Sociological Methodology, Jossey-Bass, San Francisco, CA, 1969, pp. 38–73.
[19] J. Henderson, M. Treacy, Managing end-user computing for competitive advantage, Sloan Management Review 28 (1986), pp. 3–14.
[20] D.G. Hendry, T.R.G. Green, Creating, comprehending and explaining spreadsheets: a cognitive interpretation of what discretionary users think of the spreadsheet model, International Journal of Human-Computer Studies 40 (1994), pp. 1033–1065.
[21] R. Hilton, Managerial Accounting, 2nd ed., McGraw-Hill, New York, NY, 1994, pp. 439–440.
[22] M. Igbaria, S. Parasuraman, A path analytic study of individual characteristics, computer anxiety and attitudes toward microcomputers, Journal of Management 15(3) (1989), pp. 373–387.
[23] I. Lorge, H. Solomon, Two models of group behavior in the solution of Eureka-type problems, Psychometrika 20(2) (1955) (cited in [29]).
[24] L. Markus, M. Keil, If we build it, they will come: designing information systems that people want to use, Sloan Management Review 36 (1994), pp. 11–25.
[25] S. Moriarity, C. Allen, Cost Accounting, 3rd ed., Wiley, New York, NY, 1991, pp. 285–291.
[26] B. Nardi, J. Miller, Twinkling lights and nested loops: distributed problem solving and spreadsheet development, International Journal of Man-Machine Studies 24 (1991), pp. 161–184.
[27] O. Ngwenyama, Developing end-users' systems development competence, Information and Management 25(6) (1993), pp. 291–302.
[28] R.R. Panko, What we know about spreadsheet errors, Journal of End User Computing 10 (1998), pp. 15–21.
[29] R.R. Panko, R.P. Halverson, Individual and group spreadsheet design: patterns of errors, in: Proceedings of the 27th Hawaii International Conference on System Sciences, Vol. IV, IEEE Computer Society Press, 1994, pp. 4–10.

[30] R.R. Panko, R.P. Halverson, Spreadsheets on trial: a survey of research on spreadsheet risks, in: Proceedings of the 29th Hawaii International Conference on System Sciences, Vol. II, IEEE Computer Society Press, 1996, pp. 326–335.
[31] R.R. Panko, R.P. Halverson, Are two heads better than one (at reducing errors in spreadsheet modeling)?, Office Systems Research Journal 15 (1997), pp. 21–32.
[32] R.R. Panko, R.H. Sprague, Hitting the wall: errors in developing and code inspecting a `simple' spreadsheet model, Decision Support Systems 22 (1998), pp. 337–353.
[33] K. Rajalingham, D. Chadwick, Integrity control of spreadsheets: organization & tools, in: S. Jajodia, W. List, G. McGregor, L. Strous (Eds.), Integrity and Internal Control in Information Systems, Kluwer Academic Publishers, Boston, MA, 1998, pp. 147–168.
[34] J. Rockart, L. Flannery, The management of end user computing, Communications of the ACM 26 (1983), pp. 776–784.
[35] B. Ronen, M. Palley, H. Lucas, Spreadsheet analysis and design, Communications of the ACM 32 (1989), pp. 84–93.
[36] D. Ross, K. Schoman, Structured analysis for requirements definition, IEEE Transactions on Software Engineering 3(1) (1977), pp. 6–15.
[37] L. Salchenberger, Structured development techniques for user-developed systems, Information and Management 24(1) (1993), pp. 41–50.
[38] E.J. Savitz, Magellan loses its compass, Barron's 84(12) (1994), p. 50.
[39] M.G. Simkin, How to validate spreadsheets, Journal of Accountancy 164(5) (1987), pp. 130–138.

Diane Janvrin

Joline Morrison