Tools for spreadsheet auditing

Tools for spreadsheet auditing

Int. J. Human – Computer Studies (1996) 45 , 429 – 442 Tools for spreadsheet auditing J. STEVE DAVIS Department of Management , Clemson Uniy ersity ,...

1MB Sizes 7 Downloads 109 Views

Int. J. Human – Computer Studies (1996) 45 , 429 – 442

Tools for spreadsheet auditing J. STEVE DAVIS Department of Management , Clemson Uniy ersity , Clemson , SC 29634-1305 , USA. email: day isêclemson .edu (Receiy ed 4 August 1995 and accepted in rey ised form 29 May 1996) Few tools are available for understanding and debugging spreadsheets, but they are needed because spreadsheets are being used for large, important business applications. The key to understanding spreadsheets is to clarify the data dependencies among cells. We developed and evaluated two interactive tools which aid in investigating data dependencies, an on-line flowchart-like diagram and a tool which represents dependencies by drawing arrows among cells on the display of the spreadsheet. Users found both tools helpful, but preferred the arrow tool. ÷ 1996 Academic Press Limited

1. Introduction In terms of numbers of users, spreadsheets are outnumbered only by word processing packages (Smith, 1990) and are the favorite computer application of managers (Lee, 1986). Spreadsheets were initially employed for quick, infrequently used applications but are now being chosen for all sorts of applications which are important to the mission of a company. Today many applications are so large that researchers have begun to consider parallel processing schemes for efficient implementation (Zhou & Richter, 1994). Already a mainstay for business applications, the spreadsheet may become a programming paradigm with broad application including science and engineering (Du & Wadge, 1990). Spreadsheet applications are perhaps more vulnerable to poor design and to errors than conventional programs. Many spreadsheet users have not been trained in systems analysis and software engineering. Well-trained users may adopt a spreadsheet as a short cut, for situations where a formal approach to software development seems inappropriate. As a result, spreadsheet applications often have problems, which are documented in the practitioner literature (Gruschow 1985; Bryan, 1986; McGrath, 1986). One experimental study found almost half of the simple spreadsheet applications created by experienced users contained errors (Brown & Gould, 1987). For conventional software the need is widely recognized for sound design techniques as well as aids to programmer understanding and debugging, but few such techniques and tools are available for users of spreadsheets, even though the need may be even greater. Frequently, people must work with a spreadsheet that they did not write. In business spreadsheets are often co-developed and used as a means of communication. For example, executives analyse and modify spreadsheets prepared by their staff members (Nardi & Miller, 1991). To improve 429 1071-5819 / 96 / 100429 1 14$18.00 / 0

÷ 1996 Academic Press Limited

430

J. S. DAVIS

understandability of (and reduce defects in) spreadsheet applications, researchers could focus on design techniques or on auditing tools. Design techniques are applicable to the process of creating a spreadsheet and are intended to improve comprehensibility, maintainability and to help prevent errors. We believe there is greater potential benefit from auditing tools, which are intended to aid in understanding and debugging a completed spreadsheet. Ronen, Palley and Lucas (1989) proposed structured design techniques to make spreadsheets easy to understand and to modify without introducing errors, but we wonder whether many people would actually use them. Many people turn to spreadsheets to avoid analysis and design formalities so as to produce an application as quickly as possible. It is unlikely that many spreadsheet users would follow any design procedure that would slow them down. Since people will continue to build complex spreadsheet applications without any particular design discipline, preventive techniques will never eliminate the need for aids in understanding and troubleshooting. Rather than addressing design techniques, this paper focuses on auditing tools which are applicable to any spreadsheet application, no matter how it is designed and documented. These tools are intended for any user, not just for professional auditors. The next section highlights the importance of data dependencies in spreadsheet auditing and describes two tools developed for that purpose. The next section discusses evaluations of the tools and suggests implications of this research.

2. Data dependencies and auditing tools

2.1. DATA DEPENDENCIES

A spreadsheet is essentially a matrix where each entry can contain a number or a formula which references other cells. Saariluoma and Sajaniemi (1991) pointed out that the spreadsheet has two levels, one concrete and visible, and the other more abstract and hidden beneath the surface. On the surface level we see a set of cells displaying values. At the ‘‘deep level’’ cells are connected to each other in a network defined by data dependencies among them. We refer to any cell x mentioned in a formula in cell y as a direct precedent of cell y , and say that y is a direct dependent of x. For example, if the formula in cell H2 is A4 p B4, then A4 and B4 are direct precedents of H2, and H2 is a direct dependent of A4 and of B4. Thus, precedence and dependence are inverse relationships. They are also transitive. Continuing the example, if B4 contains the formula C6 / 100, then C6 is an indirect precedent of cell H2. Data dependencies among cells may be represented as a directed graph in which arcs represent direct dependencies. If a path exists from cell p to cell q , it means that cell q is dependent on p (alternatively, the path means that cell p is a precedent of cell q ). When we speak of ‘‘data dependencies’’, it will be clear from the context whether we refer to precedence as well as dependence. In the remainder of the paper, for brevity in discussing capabilities of tools, we sometimes will mention just

431

TOOLS FOR SPREADSHEET AUDITING

(Tax amount) H2

A4*B4

(Pre-tax income) A4

100.00

B4

C6/100

(Tax rate as decimal) C6

0.33

FIGURE 1. Precedent cells are examined during debugging.

features and procedures for investigating precedents; similar features are available for investigating dependents. Spreadsheet users often need to investigate data dependencies among cells in order to debug the spreadsheet, interpret its results, or change its features. For example, during maintenance activity before modifying a particular cell, a person may wish to determine the impact of that cell on the rest of the spreadsheet. A typical question is ‘‘if the value of this cell is changed, what other cells could be affected?’’ In other words, ‘‘what are the dependent cells of this cell?’’ when trying to understand how a spreadsheet application works, a person may need to trace the steps used to calculate outputs. Each output is produced by a cell containing a formula. When attempting to understand or check the accuracy of a particular formula, a user may need to focus on just those other cells which could influence the calculation. Consider the following example. One might suspect that the value calculated by the formula in cell H2 is wrong (Figure 1). To determine whether there is an error, the user would first check the formula in H2. If it appears correct, the user would want to check the precedent cells. Eventually the user may notice that one of those cells, C6, is labeled ‘‘tax rate as decimal’’ and that its contents, 0.33, must represent a tax rate expressed as a decimal value rather than a percentage. Returning to cell B4, the user notices that C6 is divided by 100, apparently to convert a percentage to a decimal value. Now the error is evident. The problem could be resolved by changing the formula in H2 to read ‘‘A4 p C6’’ or by changing the label of cell C6 to ‘‘tax rate in percent’’ and changing its contents to 33. The process of investigating precedent or dependent cells can be tedious because many spreadsheets contain widely separated cells which are dependent on one another. Chains of dependent cells can be quite long. For example, one study of spreadsheets used in business and government found paths of up to 1092 arcs in the

432

J. S. DAVIS

data dependency graph (Sajaniemi & Pekkanen, 1988). Investigating dependencies by manually finding cell after cell may be frustrating, time-consuming, and error prone. Olson and Nilsen (1988) concluded from experiments that people tend to spend a long time scanning the screen for the appropriate cell coordinates to enter in a formula. Nardi and Miller (1990) found that ‘‘it is difficult to get a global sense of the structure of the spreadsheet, which requires tracing the dependencies among the cells. Many users in our study described awkward pencil and paper procedures for tracing cell dependencies in debugging spreadsheets.’’ In fact users cited the necessity of that tedious process as their main complaint about spreadsheets (Nardi & Miller, 1990). People need tools to aid the analysis of cell dependencies. When we started this research work the auditing tools of available spreadsheet packages appeared unsatisfactory in one way or another. The auditing tools provided by Microsoft Excel 3.0 are representative of those then available. That package provides three interfaces to its auditing facility; each reveals all precedents or allows focusing on just direct precedents. The ‘‘InfoWindow’’ displays a window listing the precedents. The ‘‘Select Special’’ option highlights precedents and allows the user to browse them by pressing the tab key. The ‘‘Worksheet Auditor’’ macro provides capabilities similar to ‘‘Select Special’’. Although these tools identify data dependencies they have shortcomings. There is no graphic display of data dependencies. After navigating among precedents with these tools, there is no way to automatically return to the original cell. There is no easy way to move back and forth between the original cell and a particular precedent cell, or to browse at will the implicit data dependency graph; in some cases multiple keystrokes are needed to traverse a single arc of the graph. We were motivated to develop an improved auditing tool by the difficulty our design team had in understanding and debugging a large spreadsheet application being developed for distribution to managers in the apparel industry. [Our decision was reinforced later when we learned that Saariluoma and Sajaniemi (1991) found that people generally have trouble understanding the ‘‘deep structure’’ (of data dependencies) and would benefit by tools which made it more obvious.] The primary motive was to develop a tool for our own use as developers, but we also wanted to provide assistance to users, who would likely have an even more difficult time understanding a large application developed by someone else. We wanted a visual, on-screen indication of data dependencies and a convenient way to browse the data dependencies. We produced two different interactive tools, one based on the data dependency diagram proposed by Ronen, Palley and Lucas (1989), and another called the ‘‘arrow tool’’ because it represents data dependencies by arrows superimposed on the spreadsheet display. 2.2. THE ON-LINE DATA DEPENDENCY DIAGRAM

Ronen, Palley and Lucas (1989) proposed a graphical, flowchart-like documentation which provides distinctive symbols to represent cells according to whether they function as inputs, outputs, decision variables, parameters, or formulae. Data dependencies among cells are shown by arrows connecting the symbols. Although this type of documentation could be helpful in auditing, it is doubtful that many people will go to the trouble of writing it to accompany their spreadsheet

TOOLS FOR SPREADSHEET AUDITING

433

applications. However, if this documentation could be produced automatically, it could serve as a practical auditing tool, because one could easily produce it when needed. At present, automatic generation of a diagram would be difficult, because finding a reasonably good scheme for the graphical layout is still a research issue. However we expect in the near future effective methods could be developed to automatically generate such diagrams. It would be straightforward for the generator to provide cell references in the symbols, and it could easily display mnemonics in symbols which represent formula if cells are named in the spreadsheet application. Even if automatically generated, just providing the diagram on paper or displaying the diagram on the computer screen would probably not be satisfactory. Whether a diagram is on paper or is displayed on a computer screen, it is separate from the software object being represented. A user may find it tedious and confusing to determine the correspondence between symbols on the diagram and cells in the actual spreadsheet program. An on-line form of diagram appeared to have significant potential if its correspondence with the spreadsheet could be made more obvious. To make the on-line diagram more useful, we developed automated support for identifying correspondence of symbols and cells. Our tool displays the spreadsheet in a window on one side of the screen and the diagram in a window on the other side (Figure 2). To identify the symbol associated with a particular cell, the user merely selects that cell and presses a function key. The selected cell and its associated symbol will be marked with the same color, and the display of the diagram will be centered on that symbol. (In Figure 2 note the association between the highlighted symbol and cell F16). Likewise, one can select a symbol in the data dependency diagram and have its associated cell identified in the other window. An experimental version of this interactive diagram was practical to develop because we could build it for the sample spreadsheet application, rather than having to build a generator that could produce a diagram for any application. 2.3. THE ARROW TOOL

We wanted to develop an auditing tool that resolved a major shortcoming of the on-line data dependency diagram, that it is an auxiliary documentation requiring mental effort to determine how it corresponds to the spreadsheet. Also, we wanted to make it easy to browse cell dependencies, because even if dependent cells are noted it could be very tedious to find them by scrolling. Suppose you wish to investigate precedents of a cell containing a formula, and the spreadsheet has a built-in tool which colors the precedent cells and displays a list of them. If they are within the current display screen, the user can access them quickly by scrolling; or, if they are outside the current display screen but the user knows all their cell names, the ‘‘go to’’ command will lead directly to the them. However, the user is unlikely to have memorized all the cell names in a given formula, in which case the only way to check all the precedent cells is to scroll back and forth between the original cell and the precedent cells. Once the frame of reference is moved to one particular precedent cell, there is no quick and easy way to go back to the original cell or to go to the other cells which are precedents of the original cell. The user may have to remember the location of the original cell in order to return to it.

FIGURE 2. On-line data dependency diagram.

434 J. S. DAVIS

TOOLS FOR SPREADSHEET AUDITING

435

The arrow tool addresses the aforementioned problems by drawing arrows on the actual spreadsheet to depict data dependencies and by providing a means of navigating the arrows. Since it shows relationships directly on the spreadsheet, intuitively this technique should be more useful than auxiliary documentation. Given a selected cell, the tool can identify precedent cells by coloring them and by displaying arrows from precedent cells to the selected cell. It can identify dependent cells by coloring them and by displaying arrows from the selected cell to the dependent cells. Once the dependent or precedent cells have been identified, the tool facilitates browsing of those cells in two ways. First, it provides automatic movement to the cell located at the other end of an arrow drawn on the display, by simply clicking on the arrow with the mouse. Secondly, the tool facilitates cyclic examination (one cell at a time) of a collection of dependent cells or precedent cells. Thus, this technique allows investigating the dependent or precedent cells without having to manually search for those cells (i.e. without traversing arrows and without having to find cells through scrolling). The tool also allows erasing its display of information, including the erasing of arrows and removing the coloring of cells, simply by pressing the appropriate key. Since it depicts cell dependencies directly on the spreadsheet, it likely is easier to use than the on-line data dependency diagram, which requires looking back and forth at the spreadsheet and diagram.

3. Evaluation of tools 3.1. THE EMPIRICAL APPROACH TO EVALUATION

To evaluate the aforementioned tools, we decided that the most realistic tasks were tracing of dependent cells (which is generally involved in maintenance activity) and debugging. We employed both tasks for the arrow tool, but for practical reasons we only used a cell-tracing task for the on-line data dependency diagram. The programming to set up a diagram large enough for a realistic debugging task would have been too tedious. Traditionally tools of this nature have been evaluated in terms of task performance and the opinions of users. In our research, performance was not a good basis for evaluating auditing tools because we had to select relatively simple tasks to insure they could be completed in a reasonably short time (we could only get student subjects for one class period), although performance differences would likely become evident for more difficult, more time-consuming tasks. We measured performance in our experiments, but directly asking the user about the helpfulness of a tool seemed to be the best way to determine its benefits. Our main hypotheses were as follows. H1a: The on-line data dependency diagram is helpful in investigating cell dependencies. H1b: The arrow tool is helpful in investigating cell dependencies. H1c: The arrow tool is more helpful than the on-line data dependency diagram in investigating cell dependencies. H2a: The arrow tool is helpful in debugging. H2b: The arrow tool is more helpful in debugging than Excel 3.0 built-in tools.

436

J. S. DAVIS

3.2. EXPERIMENT 1

3.2 .1 . Objectiy e The objective was to evaluate hypotheses H1a – c by determining usefulness of the on-line data dependency diagram and the arrow tool in investigating cell dependencies. 3.2 .2 . Sample spreadsheet We selected a capital investment advisor in Excel (Davis & Kanet, 1994), because it was a typical business application. In a space of 764 by 37 cells, it includes more than a dozen coordinated worksheets for estimating cash flows, and each cash flow is summarized on a master worksheet which calculates return on investment, payback period, and net present value. Task We asked subjects to imagine that the value in a particular cell was changed to a certain new value. They were to list the names and new values of all cells which would be affected by this change. This type of task was selected because exploring cell dependencies is typically involved in understanding or maintaining a spreadsheet application. Tasks in each of the three exercises were similar in scope. For exercises with the arrow tool, the on-line data dependency diagram, and no tool, there were 7, 6 and 7 dependent cells, respectively. 3.2 .3 . Subjects Participants were 27 graduate students enrolled in an information systems class, who were given course credit for their participation. All had some experience using spreadsheets. They averaged 14.2 months experience as a student and 12.4 months as an employee. 3.2 .4 . Procedure We administered the tests during regularly scheduled classes in a PC laboratory. We provided subjects with written instructions and a diskette containing the exercise files. The instructions included guidelines on how to use the arrow tool and the on-line data dependency diagram. We reviewed basic spreadsheet operations, such as opening a file, entering a formula or value in a cell, and scrolling. We explained how the on-line data dependency diagram and arrow tool could be used to investigate relationships among precedent and dependent cells. With each tool they solved a practice problem in locating dependent cells. When no questions remained about how to use the tools, subjects worked three exercises at their own pace. Because we did not have full control over the availability of subjects, we were not able to fully counterbalance the order of the first two exercises. Eighteen used the arrow tool first, and nine used the on-line flow diagram first. (It turned out that there was no learning effect anyway.) The third exercise was the same for all, and it provided no special tool. Subjects were allowed 25 min for each exercise. After all the exercises, they completed a questionnaire about their attitudes toward the exercises.

437

TOOLS FOR SPREADSHEET AUDITING

TABLE 1 Results of experiment with on -line data dependency diagram and arrow tools (n 5 27)

Dependent cells identified (%) Helpfulness of tool (where 0 5 not helpful, 7 5 very helpful)

Arrow tool

On-line diagram

No tool

47.5 (26.2) 5.8 (1.1)

25.1 (32.9) 4.6 (1.8)

34.4 (32.4) n/a

S.D. in parentheses.

3.2 .5 . Results and discussion Results are summarized in Table 1. Subjects found both tools helpful, supporting hypotheses H1a and H1b. Their ratings of the tools on a scale anchored by ‘‘not helpful’’ (0) and ‘‘very helpful’’ (7) averaged significantly greater than the midpoint, 3.5. For the arrow tool, t 5 10.8 and p , 0.0001 , and for the on-line data dependency diagram, t 5 3.1 and p , 0 .01. Asked for a direct comparison, on a scale anchored by ‘‘on-line diagram more helpful’’ (0) and ‘‘arrow tool more helpful’’ (7), subjects responded 5.0 on average, significantly favoring the arrow tool (t 5 3 .7 , p , 0.001) and supporting hypothesis H1c. Although subjects rated the arrow tool more helpful, we wondered whether it was more useful than the built-in tools of Excel 3.0, and decided to conduct another experiment to find out. We thought it would be a good idea in the next experiment to ask subjects how much they used the tools as well as how helpful they were, to make sure their ratings were based on substantial use of the tools. Although our main hypotheses addressed subjects’ ratings of tool helpfulness, we also scored their performance, the percentage of dependent cells identified. (For the data dependency diagram exercise three of the ‘‘dependent cells’’ were actually vectors of cells, but full credit was given for identifying just one of the cells in each vector.) There was no significant difference in performance with the arrow tool and the on-line data dependency diagram, 47.5% and 34.4%, respectively, but the average performance score with no tool, 25.1%, was significantly less than with the arrow tool. We had not expected use of the arrow tool to significantly improve performance in this experiment or in the next one (with a debugging task) because the tasks are relatively simple. Indeed, we shall see that use of the arrow tool was not associated with better performance in debugging, but it did improve performance in identifying dependent cells in this experiment. On reflection, one realizes that it is likely that a tool could show more obvious performance benefits in investigating dependent cells than in accomplishing debugging, even in simple tasks. When a person begins to debug a formula, the precedent cells mentioned in the formula provide clues on what is worthy of initial investigation. But when a person begins to search for dependent cells, there may be no hint on where to start looking. Thus, a tool could help avoid a random, exhaustive search. In this experiment, at least, the arrow tool improved performance. The rather low average score of 47.5% with the tool may be explained by the nature of the task. Although investigating dependencies is a realistic chore in the life of a spreadsheet user, users seldom need

438

J. S. DAVIS

to locate and list all dependencies associated with a cell, as we asked them to do. More often, users are accustomed to investigating dependencies until they find an explanation for what puzzled them in the first place. They check various dependencies until they have an ‘‘a ha’’ experience and then they are finished. Thus many of the subjects may have finished this exercise prematurely. Compared to a debugging task, it was much harder in this task to know when the task is completed (when all the dependencies have been found). 3.3. EXPERIMENT 2

3.3 .1 . Objectiy e The purpose was to evaluate hypothesis H2a and H2b, by comparing the arrow tool with current auditing tools (represented by those of Excel 3.0) in terms of helpfulness in debugging. (Excel 3.0 lists the precedent or dependent cells, but does not provide a visual display of relationships, and does not provide a convenient way to move among the selected cell and its precedent / dependent cells.) 3.3 .2 . Sample spreadsheet application Students worked with the same spreadsheet used in Experiment 1. There were two exercises, one to be done with the arrow tool and one with the built-in tools. Each exercise required subjects to correct a problem in the calculated value of a specified cell. They were to describe any changes they made in solving the problem by specifying the cell name, the new cell contents, and the new cell value. A different bug was inserted into the application for each exercise. Both bugs were wrong cell references, which were found by Brown and Gould (1987) and by Olson and Nilsen (1988) to be a common type of error. One bug was created by changing a cell reference in a formula to an empty cell adjacent to the correct cell which contained a tax rate. This change eliminated the tax adjustment from the formula in a cell which was to calculate an after-tax cash flow. The other bug was in a vector of cash flows for years 0 through 6. The formulas for years 0, 1 and 2 incorrectly referenced years 1, 2, and 3, respectively of another vector of cash flows (they should have referenced years 0, 1 and 2, respectively, of the other flow). Figure 3 shows how the arrow tool could be used to help find the problem. The user can see that the two highlighted cells improperly refer to cash flows from different years. 3.3 .3 . Subjects Twenty two students in an Information Systems Design course participated as a part of class work. The students averaged 15.9 months experience with spreadsheets as a student and 3.9 months as an employee. 3.3 .4 . Procedure In a PC laboratory students were provided with written instructions and a diskette. They received an explanation of the role of precedent cells in debugging. They were told to use textual cell descriptions as well as available tools to help determine the intended function of the spreadsheet application. They were told to assume that cell labels were correct. For example, if a cell was labeled ‘‘time in seconds’’ then the value in that cell should be considered to be in seconds in any formula which

FIGURE 3. Use of the arrow tool in a debugging task.

TOOLS FOR SPREADSHEET AUDITING

439

440

J. S. DAVIS

TABLE 2 Results of experiment comparing arrow tool with built -in tools (n 5 22)

Mean score (1 5 correct, 0 5 incorrect) Extent tool was, used (where 0 5 did not use, 7 5 used a lot) Helpfulness of tool (where 0 5 not helpful, 7 5 very helpful)

Arrow tool

Built-in tools

0.7 (0.4) 6.9 (1.5)

0.6 (0.5) 5.9 (2.5)

6.7 (1.9)

4.7 (3.3)

S.D. in parentheses.

references that cell; if a cell containing a formula is labeled ‘‘sales in 1993’’, then that is what it should calculate (e.g. not sales in 1990). Subjects were introduced to the tools built-in to Excel for tracing cell dependencies and to the arrow tool. They worked a sample debugging exercise with each. When no questions remained about how to use the tools, students began the exercises, working at their own pace. They were allowed 30 min for each exercise. We employed a within-subjects design. Each student performed an exercise with each of the two tools. The order in which they used tools, as well as the assignment of bugs to exercises, was counterbalanced. 3.3 .5 Results and discussion Results are summarized in Table 2. Ratings of the arrow tool on a scale anchored by ‘‘not helpful’’ (0) and ‘‘very helpful’’ (7) averaged significantly greater than the mid point of 3.5 (t 5 8.1 , p , 0 .0001) , thus supporting hypothesis H2a that the tool is helpful in debugging. Also, helpfulness of the arrow tool was rated higher than helpfulness of the built-in tools (t 5 3 .7 , p , 0 .005) based on questions which asked for a separate rating for each. Subjects were directly asked which tool was more helpful. On a scale anchored by 0 for ‘‘built-in tools more helpful’’ and 7 for ‘‘arrow tool more helpful’’ the mean response was 6.8, significantly in favor of the arrow tool (t 5 8 .6 , p , 0.0001). Therefore, hypothesis H2b was supported. The previously mentioned ratings would not be important if the subjects did not use the tools very much in the debugging exercises, because they would not have a sound basis for their ratings. Subjects were asked how much they used each tool. On a scale anchored by 0 for ‘‘did not use’’ and 7 for ‘‘used a lot’’ the mean response was 6.9 for the arrow tool and 5.9 for built-in tools, showing substantial use of both tools, and significantly more extensive use of the arrow tool (t 5 3 .2 , p , 0 .005). Apparently subjects’ ratings of tool helpfulness were based on actual use of the tools, and their more extensive use of the arrow tool may be additional evidence that they found it worthwhile. Responses on the debugging exercise were scored as 1 if correct and 0 otherwise. The mean score was about the same for the arrow tool as for the built-in tools, 0.7 and 0.6 respectively. The exercises may not have been lengthy enough to distinguish performance levels for the different tools. In some earlier pilot experiments the tasks were so easy that most subjects readily solved debugging problems without bothering to use the available tools. We learned the hard way that little can be learned about helpfulness of a tool if subjects do not use it. Therefore, for this experiment we selected a large sample spreadsheet which

TOOLS FOR SPREADSHEET AUDITING

441

was unfamiliar to the subjects. We intentionally designed the tasks to be difficult enough to encourage subjects to use available tools. Finally, we allowed a limited amount of time. These reasons may help explain why the scores were not higher than 70%, even with the arrow tool.

4. Concluding remarks The experiments are not a sufficient basis for strong claims of a general nature. Further experiments employing more difficult, time-consuming debugging tasks could help provide stronger evidence of the usefulness of tools. Also, it would be interesting to conduct tests with subjects who have had more thorough training with the tools involved. In the experiments reported here, user preferences could have been influenced more by ease of learning than by the inherent usefulness of a tool. Still, it is noteworthy that users at various levels of experience found useful the auditing tools to which they were exposed, and that they favored the arrow tool. Our results suggest that those providing instruction in spreadsheets should acquaint new users with any auditing tools available, and should emphasize use of arrow-tool-type features if available. The arrow tool could be developed for any modern spreadsheet package using the macro language (as we did). After we conducted this research at least one spreadsheet package, Microsoft Excel for Windows 95, Version 7, was released with a built-in feature similar to the arrow tool. Although the tool we developed may have some advantages over the latest ones commercially available, such as the option with a keystroke to browse among the target cells of the arrows, our findings do not support bragging about its superiority. Instead, the findings recommend using whatever arrow-type tools are available (whether built-in or separately developed). Teachers of spreadsheets should consider devoting a portion of the course to spreadsheet auditing techniques and tools, and should emphasize arrow-type tools. The arrow tool (ours or the one in Excel) could be enhanced in several ways. In the current rendition the arrows are not labeled, so the user does not know which cell appears at the other end. This information could be provided by labeling each arrow with the reference and the mnemonic pertaining to the cell at the other end of the arrow. For spreadsheets which allow assigning names to cells, the tool could automatically provide the proper mnemonic label for each arrow associated with a named cell. For example, imagine that the cell K26 represents DISTANCE and contains the formula A2 p C5, where cell A2 represents RATE and cell C5 represents TIME. Assume the user of our system selects cell K26 with the mouse and then requests the identification of precedent cells by pressing the appropriate key. The system would display an arrow from cell A2 to cell K26 labeled ‘‘RATE (A2)’’ and an arrow from cell C5 to cell K26 labeled ‘‘TIME (C5)’’. Even though the precedent cells may be outside the display screen, this labeling scheme makes it obvious which arrow should be followed to investigate a particular precedent cell. An arrow tool could be enhanced by commands to identify (1) all precedents in the chains which begin at a selected cell and end at input cells, (2) all dependents in the chains which begin at a selected cell and end at output cells, or (3) the entire data dependency graph (for the whole spreadsheet). The arrow tool technique could be applied to conventional computer programs

442

J. S. DAVIS

where variable names would play the role of spreadsheet cells. Data dependencies among variables would be indicated by arrows drawn directly on a display of the program text. Data dependency graphs have been found worthwhile for conventional programs (Wilde & Huitt, 1991), but research would be necessary to determine whether this particular technique, drawing arrows on the program text, is practical in that environment. Chris Jarvis and Jack Peck provided guidance and support of this project. Programmers included Sarat Vemuri (whose original ideas helped produce the first prototype of the arrow tool), Venkat Buduma (who improved the arrow tool), and Kalyan Sundhar (who produced the on-line flow diagram). Shankar Sengupta and Vivek Naiknavare helped conduct experiments. This work was partially supported by National Science Foundation Grant IRI-8900354.

References DAVIS, J. S. & KANET, J. J. (1994). Developing an application-specific interactive system based on spreadsheet package. The Journal of Systems and Software , 25 , 201 – 209. BROWN, P. S. & GOULD, J. D. (1987). An experimental study of people creating spreadsheets. ACM Transactions on Office Information Systems , 5 , 258 – 272. BRYAN, M. (1986). Are your spreadsheets lying to you? Business Software , October, 59 – 64. DU, W. & WADGE, W. W. (1990). The eductive implementation of a three-dimensional spreadsheet. Software Practice and Experience , 20 , 1097 – 1114. GRUSHCHOW, J. (1985). Avoid these common spreadsheet errors. Lotus , July , 59 – 62. LEE, D. M. S. (1986). Usage pattern and sources of assistance for personal computer users. MIS Quarterly , 10 , 313 – 325. MCGRATH, J. (1986). PC’s can’t add—but you can teach them to. Business Software , September, 58 – 59. NARDI, B. A. & MILLER, J. R. (1990). The spreadsheet interface: a basis for end user computing. In D. DIAPER et al. , Eds. Human – Computer Interaction —INTERACT ’90 , pp. 977 – 983. Amsterdam: Elsevier. NARDI, B. A. & MILLER, J. R. (1991). Twinkling lights and nested loops: distributed problem solving and spreadsheet development. International Journal of Man – Machine Studies , 34 , 161 – 184. OLSON, J. R. & NILSEN, E. (1987 – 1988). Analysis of the cognition involved in spreadsheet software interaction. Human – Computer Interaction , 3 , 309 – 349. RONEN, B., PALLEY, M. A. & LUCAS, H. C. JR. (1989). Spreadsheet analysis and design. Communications of the ACM , 32 , 84 – 93. SAARILUOMA, P. & SAJANIEMI, J. (1991). Extracting implicit tree structures in spreadsheet calculation. Ergonomicas , 34 , 1027 – 1046. SAJANIEMI, J. & PEKKANEN, J. (1988). An empirical analysis of spreadsheet calculation. Software Practice and Experience , 18 , 583 – 596. SMITH, J. (1990). Spreadsheet sales to double by 1993. Lotus , 6 , 12. WILDE, N. & HUITT, R. (1991). A reusable toolset for software dependency analysis. Journal of Systems and Software , 14 , 97 – 102. ZHOU, H. B. & RICHTER, L. (1994). Very fast distributed spreadsheet computing. Journal of Systems and Software , 25 , 185 – 192. Paper accepted for publication by Editor, B. R. Gaines.