An Approach to Operating System Testing

R. N. Sum, Jr., R. H. Campbell, and W. J. Kubitz
Department of Computer Science, University of Illinois at Urbana-Champaign

The Journal of Systems and Software 6, 273-284 (1986). © 1986 Elsevier Science Publishing Co., Inc.

To ensure the reliability and performance of a new system, it must be verified or validated in some manner. Currently, testing is the only reasonable technique available for doing this. Part of this testing process is the high-level system test. This paper considers system testing with respect to operating systems and, in particular, UNIX. This consideration results in the development and presentation of a good approach to the development and performance of the system test. The approach includes derivations from the system specifications and ideas for management of the system testing project. Results of applying the approach to the IBM System/9000 XENIX operating system test and the development of a UNIX test suite are presented.

Address correspondence to Dr. Robert N. Sum, Jr., Dept. of Computer Science, University of Illinois, 1304 W. Springfield Ave., Urbana, IL 61801.

1. IBM and System 9000 are trademarks of International Business Machines, Inc.
2. XENIX and Microsoft are trademarks of Microsoft, Inc.
3. UNIX is a trademark of AT&T Bell Laboratories, Inc.

1. INTRODUCTION

Every new system must be evaluated before delivery to ensure proper functioning and reliability. One part of this evaluation is the system test. A system test validates the high-level functionality of a system. In the context of a general purpose computer operating system, a system test verifies that the user interfaces conform to the system's specifications. Verification, in the form of program proofs, would eliminate the need for system testing. However, the current proof techniques are not yet adequate. Therefore, a systematic approach to system testing is needed. The approach to be presented in this paper is based on the system test's position in the software life cycle and derives its tests from the system specifications.

We have developed a heuristic approach to the software system test that results in a test suite in which the individual tests are embedded in a comprehensive testing framework. This testing framework covers the user interfaces and uses many testing strategies. This approach is described via application to the system test for the IBM Instruments, Inc. System 9000 computer. As with any approach, the proof of its usefulness lies in its application. Therefore, the results of using the approach on the IBM Instruments, Inc. System 9000 computer are presented. (This computer is a small work station that runs XENIX, the Microsoft, Inc. derivative of the UNIX Version 7 operating system.) During the system test, various statistics were collected on the errors found in the system and on the general reliability of the system. These statistics were then used to aid in the evaluation of both the computer system and the test approach. During the system test's design and development, several management problems arose. Some of these problems resulted from the system test being a software development project. Others resulted from the geographical separation between the test team and the developers. The management tools used to solve some of these problems are discussed briefly. Finally, some conclusions about the approach, in particular, and systems, in general, are highlighted. For the approach, these conclusions include its strengths and its applicability to other systems. For systems, the test of the System 9000 provided some conclusions about portable software.

2. SYSTEM TEST OVERVIEW

Before a system test approach can be developed, a strong characterization of the system test must exist. Our characterization is based on the software life cycle and includes a system test objective, the system specifications, and the test types of a system test. The first part of the system test characterization is determining the system test's purpose within the software life cycle so that the system test objective and test types may be developed. We have used a software life cycle composed of requirements, specification, design, development, testing, and maintenance phases, which is similar to one in [5]. In practice, various types of testing are used during each phase of the life cycle. Therefore, we decompose testing so that module and function testing are associated with development, unit testing with design, system testing with specifications, acceptance testing with requirements, and regression testing with maintenance. In this decomposition of testing, the system test is the first test of the completely integrated system and the last test before system delivery. Therefore, the purpose of a system test within the software life cycle is to verify the integrated software system from the user's viewpoint before the user receives the system.

The system test compares a system with its specifications. Therefore, the objective of the system test is to demonstrate that the system does not meet its specifications [10]. Furthermore, Myers [10] points out that system testing is impossible if there is not a written set of measurable specifications for the product. These specifications often consist of two classes of documents. The first is a "formal" system specification and the second is the system's user's manuals. The "formal" specification is an overview of the functions and interfaces provided by the system. (To date, "formal" has most often referred to a carefully prepared document, written in English, and not a mathematically derived structure.) The user's manuals supply details about each interface's individual functions and their usage. In particular, the formal specification of system functions and interfaces lists the items to be tested, whereas the user's manuals support the development of individual tests.

From the system specifications, especially the formal specifications, many different types of testing are implied. Although the specific test types depend on the type of system and function being tested, many test types are common to many systems. Reference 10 includes facility, volume and stress, usability, security, performance, storage, configuration, compatibility and conversion, installability, reliability, recovery, serviceability, documentation, and procedure tests among the most common test types found in system tests.

3. THE SYSTEM TEST APPROACH

We now present the approach as used to develop the System 9000 system test and the UNIX test suite. First, some characteristics of the system specifications are presented. Second, a hierarchical framework is derived from the formal specifications [1]. Third, tests are developed by using the System 9000 user's manuals [6-8] to refine the hierarchical framework's lower levels. Finally, the relationship of the tests developed to the system test types of [10] is presented.

3.1. System Specification

The development of a system test begins with the examination of the two classes of system specifications. Each class of specifications provides a particular view of the system interfaces. These views are then used as the framework for the various parts of the system test. Before presenting the specifics of the UNIX test suite and the System 9000, this section characterizes each class of specifications.

3.1.1. Class 1: formal specifications. The formal specifications [1] list the overall contents of the system. This list includes the documentation available, the hardware supported, an installation procedure, the performance claims, and the user interfaces. From the formal specifications, a system test framework is developed which contains one component for each user interface. For example, a typical operating system which contains a command interpreter, some programming libraries, and low-level system requests would create a hierarchical test suite with three components. After the formal specifications determine the system test components, further information for the development of the framework is supplied by the second class of specifications.

3.1.2. Class 2: user's manuals. The user's manuals for a system may encompass many documents. These documents describe in detail the commands and functionality provided by each interface. The documents for each interface may even differ in style. The following describe some of the more common manuals that a system contains.

The User's Manual: The user's manual provides information about the command level usage of the system. It describes the command level user interface in general and includes specific usage instructions for each command.

The Programmer's Manual: The programmer's manual describes the facilities provided for the various types of programming supported by the system. These facilities include programming libraries and direct system services.

The Hardware Manuals: The hardware manuals describe the interface that the system has with its various physical devices. This description includes how the software drivers interface with the system and how they may be used by system programmers.

3.2. System Test Framework

The formal specifications of the System 9000 [1] described four major user interfaces and they supplied the major components for the test suite framework. These interfaces are:

Commands: the high-level commands available to every user,

Subroutines: the subroutine libraries designed for use in application programs,

System Calls: the subroutines designed for use in systems programs that directly invoke operating system functions,

Device Drivers: the interface designed for use in systems programs that request access to hardware devices attached to the System 9000.

The four interfaces provided a useful global organization for the test suite by allowing some freedom in the development of lower levels of the framework. This freedom was required by the different natures of the interfaces and the demands that those differences placed upon test development. Essentially, these components allowed the development of four subsuites while keeping the test suite unified. The user's manuals [6-8] supply the information for the derivation of each interface's subsuite structure. For each major interface in the System 9000, a brief overview of its subsuite structure is presented.

Commands: The command interface was the largest area tested in terms of the number of facilities provided. To facilitate the management of the large number of commands, they were grouped by the functions that they provided. For example, there were groups for strictly interactive commands, file system commands, and text processing commands.

Subroutines: The subroutines were also grouped according to function. They included general system functions, file system aids, and various library groups.

System Calls: The system calls are direct programming interfaces to operating system services and required a group of tests for each system call.

Device Drivers: The device drivers were grouped by physical device. These devices included memory, disk drives, serial ports, printers, and function keypads.

3.3. Test Development

Once the main components of the system test framework have been discerned, the individual tests within each component are developed. The major development concerns for individual tests are test coverage, test style, and test derivation. We also note that each component may use different types of programs, test plans, and documentation. Many of these types are discussed below.

3.3.1. Test coverage. Test coverage considers the development of enough tests to ensure that every function provided by an interface has been thoroughly exercised. The development of the test suite framework provides the upper (global) level of test coverage for each interface. The lower level framework supplied by the user's manuals ensures the coverage and use of every function within an interface. The development of individual tests is the lowest level of test coverage. For example, coverage of a math subroutine library begins with its inclusion in the formal specifications, continues with the manuals that specify the functions that it supplies, and finally ends with the individual tests of its usage. Notice that coverage for a system test differs from coverage for a function test. (The function test is performed on the isolated function before system integration.) The system test concentrates on ensuring the usage and interaction of the function rather than highly detailed accounts of the internal workings of the function. Consequently, the system test is generally not as exhaustive in terms of raw data so that it avoids duplicating the work of the function test.

3.3.2. Test style. Test style refers to how a test is performed. In general, a system test should tailor the test style to the properties of the interface being tested. For example, interactive commands use an interactive test procedure that ensures reproducible results. In the UNIX test suite, three test styles were used: interactive procedures, automated programs, and guided programs. A description of each test style, including its characteristic documentation, follows.

Interactive Procedures. Interactive procedures are used when a function is highly interactive, thereby requiring human intervention in the determination of the correct results. Activity of this nature appears in items such as text editors. To provide reproducible results and to document interactive tests, the Test Definition Form was developed (see Appendix A). The Test Definition Form contains a standardized header for test development information and a procedures section for the description of the tester's actions. The standardized header includes the author of the test, the date the test was written, a synopsis of the test, and any dependencies that the test may have on other items in the tester's or the system's environment. The procedures section contains a step-by-step description of the commands and actions to be performed by the tester. This description also includes the expected system responses to each of the tester's actions. Although the Test Definition Form is not as exact as a program, it successfully provided a means of defining reproducible tests that could be executed by any member of the test team.

Automated Programs. Automated programs are used when the correctness of a function's results can be computed and checked by a program without human intervention. Activity of this nature is exhibited by most low-level programming interfaces, including the system calls. Documentation for automated programs always includes a standardized header similar to that found in the Test Definition Form as well as comments within the program. For individual automated programs, there is an accompanying Test Definition Form that describes how to compile and run the test. For large classes of automated programs, a common system for logging results is developed that includes its own documentation. This common logging system provides a uniform interface across different tests and test programmers. For example, the system calls in the UNIX test suite used a common logging system across its more than 150 programs and half dozen programmers.

Guided Programs. Guided programs are a hybrid of both interactive procedures and automated programs. Guided programs are most commonly used when a function with a programming interface requires human intervention to determine the correctness of its results. Activity of this nature occurs most commonly in subroutine libraries designed to do terminal input/output such as the "termlib" library in the UNIX system. Documentation for guided programs includes a Test Definition Form, a program header, and the program's comments.
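As an illustration of the common logging scheme described above for automated programs, the following is a minimal sketch of what a shared result-logging helper for automated system-call tests might look like. The helper names (tlog_start, tlog_check, tlog_finish), the log format, the test case id, and the sample checks are hypothetical; they are not taken from the actual UNIX test suite.

```c
/* Minimal sketch of a shared logging interface for automated tests.
 * All names and the output format are illustrative only. */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

static const char *test_id;   /* identifier of the running test, e.g. "UXSYS042" */
static int failures;          /* number of failed checks so far */

void tlog_start(const char *id)
{
    test_id = id;
    failures = 0;
    printf("%s: START\n", test_id);
}

/* Record one pass/fail result in a format shared by every test program. */
void tlog_check(const char *what, int ok)
{
    printf("%s: %s: %s (errno=%d %s)\n",
           test_id, ok ? "PASS" : "FAIL", what, errno, strerror(errno));
    if (!ok)
        failures++;
}

/* Exit status 0 means every check passed; nonzero means at least one failed. */
int tlog_finish(void)
{
    printf("%s: END, %d failure(s)\n", test_id, failures);
    return failures ? 1 : 0;
}

/* Example automated test built on the harness: exercise two trivial calls. */
int main(void)
{
    tlog_start("UXSYS042");   /* hypothetical test case id */
    tlog_check("getpid returns a positive id", getpid() > 0);
    tlog_check("close of an unopened fd fails with EBADF",
               close(99) == -1 && errno == EBADF);
    return tlog_finish();
}
```

A uniform helper of this kind is what lets one driver collect and compare the results of more than 150 separate test programs written by different programmers.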

3.3.3. Test derivation. When suitable test coverage and test styles have been chosen, individual tests are derived from the manuals comprising the second class of specifications. Our discussion includes a basic philosophy for test derivation and some methods for test data selection.

3.3.3.1. Derivation philosophy. The test derivation philosophy for system testing is similar to the philosophy used for other stages of testing. The basic test data selection philosophy used in the UNIX test suite and the System 9000 system test is like that presented in [11]:

1. Exercise each major feature of a function,
2. Find boundary and special values that may cause difficulty,
3. Find unusual values and bad values including those that exercise system extremes and error handling, and
4. Choose additional test cases by intuition or randomly until the testing budget is used up.

3.3.3.2. Derivation methods. Individual tests in the test suite were designed by studying the manuals relevant to each function. The number of tests and the test data for each function depend on the size and complexity of the function. Unfortunately, because the manuals are written in imprecise English, the mechanical derivation of test data for a system test is impossible in most cases. This placed most of the burden of test data selection for the test suite upon the individual test developers. A test data design methodology was developed that involved examining the input and output specifications found in the manuals and applying a set of simple test data derivation techniques including: exhaustive, explicit, special, random, exception, and stress testing. Basic descriptions of each scheme follow.

Exhaustive. Exhaustive testing is the use of every possible input data value, thereby allowing the correctness of every output value to be checked. In general, exhaustive testing can be used only for those functions with an extremely small input domain (e.g., <10 values). A simple example of exhaustive testing would be the UNIX "abort" subroutine. It has only one interface without any parameters.

Explicit. Explicit case testing is the use of values explicitly used or suggested in the manuals. These values are usually the most typical or most critical uses of the function and therefore they should be tested. They also act as a check for correctness in the manuals. For example, the XENIX manual specifies a use of the "quot" command to produce a list of all files in a file system and their owners. One test of the "quot" command exercises the specific example found in the manual.

Special. Special case testing is the use of particular values chosen because they exercise the function at the limits of its range and domain (boundary conditions) or because the values force special output values that may occur infrequently but are critical enough to warrant testing. This type of testing is most obviously used with math subroutines, where input values of zero and infinity are common places for error.

Random. Random testing is the use of values chosen randomly throughout the input domain of the function being tested. In general, random testing is not the method of choice, but it can improve confidence in functions with large input domains. Some experience has also shown random testing to be able to find some subtle errors while expending only a small amount of effort [2]. Several of the mathematical subroutine tests were supplemented with randomly chosen values.

Exception. Exception testing involves the erroneous use of a function and the error handling capability provided by the function and the system. Exception tests included test data for error conditions described in the user manuals for which error handling was defined as well as test data that would obviously correspond to an error but would not correspond to a documented error condition. An example of an exception test is a program that writes to a file that is open for read-only access.

Stress. Stress testing explores whether the system will support the extremes specified in its documentation, including the determination of the system's response when those extremes are reached. In particular, stress tests were very important in exercising device drivers, memory management routines, and file allocation and deallocation. The test data used in stress tests included requests involving maximum program and file sizes. One example of a stress test in the test suite is a program that requests as much memory as the system has available.
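The two concrete examples above, writing to a file opened read-only and requesting as much memory as the system will grant, translate naturally into small automated programs. The sketch below is a hypothetical reconstruction of such tests, not code from the actual suite; the file name, the allocation step size, and the pass/fail criteria are assumptions made for illustration.

```c
/* Hypothetical exception and stress tests of the kind described in the text. */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Exception test: a write to a file opened for read-only access must fail
 * with EBADF rather than silently succeeding or crashing the system. */
static int exception_write_readonly(void)
{
    int fd, rc, ok;

    fd = open("/etc/passwd", O_RDONLY);
    if (fd < 0)
        return -1;                     /* could not set up the test */
    rc = write(fd, "x", 1);
    ok = (rc == -1 && errno == EBADF);
    close(fd);
    printf("exception: write on read-only fd: %s\n", ok ? "PASS" : "FAIL");
    return ok ? 0 : 1;
}

/* Stress test: keep asking for memory until the system refuses, then report
 * roughly how much was granted.  The 64 Kbyte step is an arbitrary choice. */
static int stress_exhaust_memory(void)
{
    size_t step = 64 * 1024;
    size_t total = 0;

    while (malloc(step) != NULL)       /* blocks are deliberately never freed */
        total += step;
    printf("stress: about %lu Kbytes granted before malloc failed\n",
           (unsigned long)(total / 1024));
    return 0;                          /* reaching the limit cleanly is the result */
}

int main(void)
{
    int failures = 0;

    if (exception_write_readonly() != 0)
        failures++;
    stress_exhaust_memory();
    return failures ? 1 : 0;
}
```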

3.4. Test Types

We conclude the presentation of the approach with a description of how the test types of [10] are included in the system test. Different test types were included by the development structure, specific testing, or the specifications. Because some of the test types are not directly related to the system software, some of the test types were only indirectly or implicitly used. Below we present a brief description of where most of the test types are used in the system test.

Facility: Facility testing is covered by the derivation of the tests from the system specifications.

Volume/Stress: Volume and stress testing are included by selecting test data that exercise system extremes.

Usability: Usability of the System 9000 XENIX is derived from general UNIX usability. This was tested by using the System 9000 XENIX in the same manner as other UNIX systems available to the test team. There were no specifications for usability other than implications that the operating system be like other UNIXes.

Security: System security was tested by exercising those operating system functions that concern user passwords, file ownership, and process identification. Many of the security tests were exception tests.

Performance: The System 9000 specifications explicitly stated that there were no particular performance measures to be met other than "reasonable" user satisfaction. To supplement this, some benchmarks were run to give a basis for "reasonableness."

Storage: Storage tests were part of the device driver interface. These device driver tests included memory as well as disks.

Configuration: Changes in the configurations of the machines on which test development was done were made periodically according to a schedule. This resulted in most tests being executed on several different configurations.

Compatibility/Conversion: Compatibility and conversion tests were done primarily by using various different systems. The System 9000 "uucp" command was capable of communicating with both itself and DEC VAX computers. Some test development work was done on the VAX and was ported to the System 9000 without modification. Many other programs, including a LISP interpreter, were also ported without difficulty.

Installability: Installability tests were implicit in our periodic receipt of new machines and new versions of the operating system (with bugs fixed).

Reliability/Recovery: Recovery and reliability tests were covered by testing the system's file system dump, restore, and consistency checker.

Serviceability: Service is generally provided by the manufacturer although the user may be required to do simple tasks. These were handled implicitly.

Documentation: Documentation was tested throughout the system test as all tests were derived from using the documentation. Both the system and its documentation were tested simultaneously.

Procedure: Most of these tests are part of the installability and serviceability tests. Most of the errors expected here are documentation problems.

The System 9000 test did not require any highly specialized techniques for the above test types. The demands of other systems, e.g., a real-time embedded system, may require highly specialized techniques. The system specifications combined with the test types did, however, suggest to the test developers the most effective techniques to use for a particular test.

4. DEC and VAX are trademarks of Digital Equipment Corporation.

4. TEST RESULTS

We now present some of the results of applying the approach and using the test suite developed from it. After a terse description of the System 9000, some general test results are presented. These results are followed by some interesting and unexpected details found during further analysis of the system test. Each bug (error) found was documented using a Problem Tracking Memorandum (PTM), which will be more fully described in the following section on test management. It should be noted that the system test was done on a prerelease version of the operating system and that most of the bugs that were discovered were fixed or at least documented in the manuals prior to release of the system.

4.1. System 9000 XENIX

The System 9000 is a small MC68000-based computer system designed for use as a work station. The System 9000 XENIX operating system is a port of Microsoft's Version 7 UNIX-based XENIX operating system that supports multiple users and processes. The entire operating system occupies approximately 7 Mbytes of hard disk storage including all binaries and system data files. The memory-resident part of the system occupies approximately 144 Kbytes of memory. The source code of XENIX is proprietary and was not available to the test team. This made it difficult even to estimate the number of faults found relative to the number of lines of code tested.

4.2. General Results

This section describes some of the general results and trends found in the errors discovered by the system test. These results consider the distribution of bugs within the system test framework, across some of the test types, by the severity of the error, and by passage of testing time.

Table 1. PTMs by Test Area

Test area          Number    Percentage
Commands               81        51.92
Drivers                 5         3.21
System Calls           24        15.38
Subroutines            15         9.62
Specifications         29        18.59
Incomplete Data         2         1.28

Table 1 shows how the software system faults are distributed within the interfaces and specifications (documentation). Without any detailed knowledge of the implementation of XENIX on the System 9000, one might expect the greatest number of errors to be present in the commands because these represent the largest amount of code. However, the System 9000 has a ported operating system and, based on this knowledge, one might expect that a greater number of software faults would be found in the device drivers and machine-specific parts of the operating system. Most faults were found within the commands. The proportion of the bugs found in the system calls is more serious than other bugs because these reflect failures in direct requests for operating system services. These bugs and hardware bugs occasionally interrupted the system test schedule while they were fixed. Documentation errors were expected since both the documents and the system were developed simultaneously. (The Incomplete Data category is the result of some incomplete PTM forms.)

Table 2 is a summary of where the bugs were found based on whether the test tested normal usage or error handling (exception testing). While exception testing in general did not reveal an excessive number of bugs, it did discover significant system bugs, particularly in the system calls and subroutines.

Table 2. PTMs by Test Type

Test type          Number    Percentage
Exception              17        10.90
Normal                137        87.82
Incomplete Data         2         1.28

Table 3 compares the number of bugs against their impact on the system. Severity level 1 bugs are the most severe and cause the system to crash no matter how the command is used. Level 2 bugs cause a function not to work or a part of a function to crash the system. Level 3 bugs cause part of a function not to work. Level 4 bugs are documentation errors or cause an "annoyance" form of error. Based on the description of the severity levels, it is not surprising that level 3 has the biggest percentage of the total. The distribution of bugs appears to be what one would expect in a software system test.

Table 3. Severity Level Summary

Level    Number    Percentage
1            10         6.41
2            22        14.10
3            90        57.69
4            34        21.79

Table 4 shows a desirable distribution of the bugs over the time of the system test. This distribution is the discovery of the most severe errors at an early stage in the system test. This activity resulted from the stress put on the system by the system test and from the execution of the most critical tests first. Also, this result is related to measurements of the system's mean time to failure (MTTF), which were approximately 10 hr during the first month of system test and >336 hr (2 weeks) during the last month of testing. (This MTTF measurement included only system crashes.)

Table 4. University of Illinois Severity vs. Time (PTMs per month from December 1983 through May 1984, broken down by severity level)

Table 5 corresponds to a typical curve for project labor usage. The early months' productivity reflects the design of the test suite. (Some delay in November was incurred due to a problem with shipping the systems.) In January 1984, the bulk of the tests were coded and the number of bugs discovered peaked. Finally, as the number of test cases increased, the bugs found decreased, and the labor devoted to testing was reduced. It was found that several of the test cases were difficult to formalize and code. Because of the desirability of generating results, the difficult tests were often deferred until later in the testing period. This also contributed to the decline in reported bugs since the difficult tests took longer to design and code. (One RA-month is approximately one-half of a man-month.)

Table 5. University of Illinois Labor Summary

Month      RA-months    Number of PTMs
Oct. 83         1.5                 0
Nov.            3.0                 0
Dec.            3.0                14
Jan. 84         6.0                23
Feb.            6.0                19
Mar.            6.0                19
Apr.            3.0                 6
May             2.0                 2

4.3. Further Analysis

The test results also revealed several interesting faults related to the hardware, the C compiler, the file system, and the commands. Nine bugs were found in the hardware during the software system test. Many of these were discovered as a result of the stress put on the hardware system by the software system testing activity. It was somewhat unexpected to have the software test discover some timing errors in the hardware.

The C compiler was found to have at least three bugs directly traceable to the origins of an early portable C compiler that was several years old. This was discovered through previous testing of other C compilers at the University. At first, a complete test of the C compiler could not be done because the compiler would not compile the C test suite programs. Based on the earlier testing of other C compilers, a list of suspected as well as confirmed bugs was dispatched to the developers. Because the operating system is written in C, the defects in the C compiler are potentially very serious. However, at least initially, a cross compiler was used for the port, not the system's C compiler. This cross compiler may also have been responsible for some of the system's bugs, but it was not available for testing.

Table 6 displays the distribution of faults over the various subsystems within XENIX. File system bugs, when collected across the various interfaces, amounted to 15% of all bugs found. This could mean that many of the file system bugs were dependent on a few of the device driver bugs or that there may have been some latent design flaws remaining in the file system. Unfortunately, without access to source code and system designs, too little information is available to draw a conclusion.

Table 6. Faults Distributed by Function

Function          Number    Percentage
File system           23        14.74
Hardware               9         5.76
C compiler             5         3.21
Memory mngmt.          2         1.28
Other kernel           6         3.84
Other software       111        71.15

Twenty-three of the bugs discovered (the C compiler's bugs were counted as one bug for this purpose) appeared to be attributable to the XENIX system rather than to the port. These bugs included a read and write system call that failed to provide appropriate error handling when invoked with a null buffer pointer parameter. A stress test also revealed that the file system would allow more links to be made to a file than the documented limit. In this case, the documentation was correct and the error checking within XENIX was inadequate.

Finally, a couple of command bugs proved rather disquieting. The first occurred in the system shutdown command and caused the system to hang rather than to cease operation gracefully. The second, which caused much amusement, occurred in the XENIX "remove user" command and always caused the removal of every user in the system as well as the requested user. This obviously rendered the system unusable after everyone logged out.
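For illustration, checks of the kind that exposed the null-buffer and link-limit problems described above can be written as short programs. The sketch below is hypothetical: the file names, the helper names, and the 1000-link figure are placeholders rather than values from the original test suite. It shows the style of check involved, which is to call the service with a bad or extreme argument and verify that the system reports an error instead of misbehaving.

```c
/* Hypothetical sketches of the checks described above. */
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* A read() given a NULL buffer pointer should fail cleanly (EFAULT on most
 * UNIX systems) instead of corrupting memory or crashing the kernel. */
static int check_null_buffer_read(void)
{
    int fd, rc, ok;

    fd = open("/etc/passwd", O_RDONLY);
    if (fd < 0)
        return -1;                       /* could not set up the test */
    rc = read(fd, NULL, 16);
    ok = (rc == -1);
    close(fd);
    printf("read with NULL buffer: %s (rc=%d, errno=%d)\n",
           ok ? "PASS" : "FAIL", rc, errno);
    return ok ? 0 : 1;
}

/* Try to create more links to one file than the documented maximum and check
 * that the system eventually refuses with an error. */
static int check_link_limit(int documented_limit)
{
    char name[64];
    int i, refused_at = -1;

    if (creat("target.tmp", 0644) < 0)
        return -1;
    for (i = 0; i <= documented_limit; i++) {
        sprintf(name, "link%05d.tmp", i);
        if (link("target.tmp", name) == -1) {
            refused_at = i;              /* the expected, documented refusal */
            break;
        }
    }
    printf("link limit: %s (first refusal after %d extra links, limit %d)\n",
           refused_at >= 0 ? "PASS" : "FAIL", refused_at, documented_limit);
    /* removal of the temporary files is omitted from this sketch */
    return refused_at >= 0 ? 0 : 1;
}

int main(void)
{
    int failures = 0;

    if (check_null_buffer_read() != 0)
        failures++;
    if (check_link_limit(1000) != 0)     /* 1000 is a placeholder limit */
        failures++;
    return failures ? 1 : 0;
}
```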

5. MANAGEMENT PROBLEMS

This section describes some of the management tools that were used to solve some problems that occurred during the development of the test suite and during the testing of the System 9000. These tools supported test development, bug tracking, labor organization, and maintenance tests.

5.1. Test Development

To guide the system test, a system test plan [9] was drawn up by a small test design team. This system test plan included objectives for the system test, outlines of the testing to be done, naming conventions for the tests, and a rough estimate of the project's size. Essentially, the system test plan was an informal requirements and specification document for the test suite. To ensure system test coverage, a set of matrices was used that cross-listed the proposed tests with the functions to be tested. At least one matrix was used for each interface and some large interfaces required several matrices. Table 7 shows the number of test cases for each major component of the test suite as indicated by the test matrices. It should be noted, however, that most of these test cases included more than one test so that more than 800 individual tests were conducted. (The C compiler testing is counted as a single test here.)

Table 7. Number of Tests by Component

Component        Number
Commands            118
Subroutines          55
System Calls        192
Drivers              28

Even though these matrices tended to be sparse, they provided a convenient way to check coverage of the tests. For the final presentation in the test suite documentation [12], the matrices were compressed into a more compact tabular form. (A sample compressed matrix is included in Appendix B.)

5.2. Reporting and Tracking Bugs

To manage bug reports, a Problem Tracking Memorandum (PTM) was used. The PTM form included information about its originator, place of origin, severity level, date of origin, test number, the operating system release, the hardware configuration, and both a short synopsis and a detailed description of the problem. (A sample PTM is included in Appendix C.) These forms were filled out by a test team member upon the discovery of a bug. They were then sent from the test team to the developers responsible for the problem area. After the bugs were fixed, a response to the PTM was returned and the test repeated. If the retest was successful, then the PTM was closed, that is, the bug was considered fixed.

Control of bug fixes was done by the developers. When a number of bugs had been fixed, they would issue a new version of the operating system, with a new version number, to the test team. The test team would then start a new series of numbers on the PTMs. The sample PTM is for a bug in the second version of the operating system. Therefore, it has a number in the 2000s. The next version used numbers in the 3000s, and so on. This numeric correspondence facilitated the association of bugs with operating system versions. Because the developers controlled the bug fixes, it was difficult to determine exactly where bugs were located. However, based on the system specifications and the test suite, we found that some bugs required changes to the system specifications, particularly the user's manuals, and that none of the bugs required changes to the test suite organization (see also Table 1).

5.3. Labor Organization

The allocation of labor to system test development appears, in retrospect, to have followed a pattern similar to that of software development. This is attributable to the close similarity of the processes involved.

Initially, a few people were assigned to design and develop the test plan. After testing began, people were added to develop and code tests and to execute the tests. At the University of Illinois during the height of the testing, six people were working on the project. Three of them had some prior UNIX experience, but there was no "guru." One member had had some brief testing experience in industry. Based on personal observation, this mixture appears to be similar to those used in many testing situations (both in industry and academia), except that a "guru" is almost always included as well. Finally, as the discovery of new bugs decreased and the test suite neared completion, the number of people was decreased.

5.4. Communications

A problem that was solved during the project was a method for exchanging PTMs (bug reports) between the combined test and development group at IBM Instruments in Danbury, Connecticut and the test team at the University of Illinois. This problem was solved by using a dedicated notesfile [3,4] on one of the University of Illinois computers. A notesfile is a news/bulletin board system that allows notes and responses to be appropriately grouped and managed on-line. The PTM notesfile was checked daily by the team in Danbury by logging in over long distance phone lines. This form of communication proved to be faster and more effective than using the U.S. mail service or reporting the bugs by telephone. The scheme resulted in a rapid exchange of bugs and fixes and permitted quick clarification of ambiguous descriptions in the PTMs.

5.5. Maintenance Provisions

A key reason for the development of the test suite was to make the regression testing of future operating system releases easier. The test suite organization was instrumental in providing this ability. The hierarchical framework used for the test suite, together with the UNIX file system, provided an easy way to store the test suite on line. The UNIX text processing facilities encouraged full documentation.

6. CONCLUSIONS

Having covered the approach from its general origins to the results of its application, we comment on the approach's strengths, its general applicability, bug portability, and the experience in general.

The approach derives many of its strengths from its hierarchical framework. First, the specification-derived framework provides a global mechanism that ensures coverage of the entire system during the system test. Second, the use of user interfaces as a criterion for this structure isolates some of the system properties and allows testing to be tailored to these properties. Third, the results provided by the System 9000 system test displayed desirable trends in the kinds of bugs found and when they were found. Fourth, the framework required few changes during the system test because it fits the nature of the system well. Finally, the framework promoted easy documentation for use of the test suite for regression testing of future system releases.

In general, the experience with the System 9000 has shown that this system test approach can be effective. The system test approach's framework concept was especially easy to use with a UNIX-like operating system. It is believed that this structure will also be applicable to many other systems. Most operating systems contain similar system components that include user commands, subroutine libraries, low-level system requests, and direct hardware access. It may be somewhat more difficult to apply this approach to other systems where the concept of user interfaces may not be as clear. In these cases, another "natural" partitioning should be found in the specifications. The most important concept is the use of the system specifications to actively determine the organization of the system test.

The concept of bug portability arose from the discovery of several bugs that did not result from the port of the XENIX system to the System 9000. As one example, a particular group of bugs allowed the test team to identify the origins of the system's C compiler. This is an obvious example of the dangers of porting software, as these C compiler bugs had survived for several years in commercial software without being fixed. Moreover, one would expect that these bugs would have been fixed because fixes for some derivatives of this portable C compiler have been available in the public domain. In general, these bugs support the need for a good systematic system test, even for portable software. Specifically, they support the approach used for the System 9000 system test as a good approach because it found portable bugs as well as bugs resulting from the port.

Finally, construction of the test suite was, we believe, valuable to both IBM and students at the University. For the students in some software classes, an instructional laboratory of 30 System 9000s running XENIX has proven itself adequately reliable. For the students who participated in the system test, it provided a significant learning experience, even though the project was tedious at times. Many of the students have become knowledgeable UNIX users and all of them are now a bit more skeptical of their own programming.


7. ACKNOWLEDGMENTS

The authors wish to acknowledge the help and cooperation of the entire Professional Workstation Research Group of the University of Illinois at Urbana-Champaign. Also appreciated was the cooperation of John Morris and his staff at IBM Instruments, Inc. in Danbury, CT and the funding from IBM which made the project possible.

APPENDIX A: TEST DEFINITION FORM

Figure A-1. Test Definition Form

Testcase id: UXCMD103
Date Written: 2/8/84
Modified By: Robert Sum
Author: Robert Sum
Date: 2/15/84

Function: Mkuser is the usual way to add users to the XENIX system.

Description: Use mkuser to create a new user for the system.

Dependencies: Tester must be a super-user.

Restrictions:

Comments: This test should be chained with UXCMD105 which tests rmuser.

Copyright (C) 1984 Robert Sum
IBM Workstation Research Project
Department of Computer Science
University of Illinois

Procedure

1. Enter
   a) do: mkuser
   b) when prompted enter 'mktester' as the user name.
   c) when prompted enter 'mkpasswd' as the user passwd.
   d) when prompted enter 'Make User Test' as the user comment.
   e) when asked if everything is ok, check it out and respond accordingly.
   f) when everything is ok, answer affirmatively and wait.
   System Response
   a) The program will pause for you to check once more.
   b) Then it will create the user passwd file entry, home directory, mail file, his introductory mail, and his '.profile' file.

2. Enter
   a) do: more /etc/default/mkuser
   b) Remember the default home directory and shell.
   c) Change directory to the home directory, i.e. 'default home directory'/mktester.
   d) do: l
   e) do: cmp .profile /usr/lib/mkuser.prof
   f) do: more /usr/lib/mkuser.mail
   System Response
   a) Changing directory should act silently.
   b) l should list just the .profile.
   c) cmp should not return anything, i.e., run silently.
   d) Remember what the mail is.

3. Enter
   a) do: logout
   b) Login as mktester.
   c) do: printenv
   d) do: more .profile
   e) do: mail
   f) do: q
   g) do: logout
   System Response
   a) Login should be successful.
   b) Result of printenv should agree with things set in '.profile'. NOTE: This is true only if the default shell is the Bourne shell (sh).
   c) mail should mail the output of the more in part 2 with an added header.
   d) q just exits mail.
   e) Logout should be successful.

APPENDIX B: TEST MATRIX (see Figure B-1)

Figure B-1. UNIX/XENIX System Test Compressed Matrix

APPENDIX C: PROBLEM TRACKING MEMORANDUM

This Problem Tracking Memorandum appears as it would immediately after a bug is discovered.

Figure C-1. Problem Tracking Memorandum SYS-2003

Severity Level: 2
Problem Summary: C compiler error: expression causes compiler loop.
Department: UIPWG    Extension: (217) 333-4741    Originator: R. Sum
Regression Test: Y    Answered: / /    Verified: / /
Opened: 12/1/83    Closed: / /
Test Case Number: UXCMD501
Publication Title: N.A.    Draft Date: / /
Software Level: Driver 2    Hardware Level: N.A.    Application Level: N.A.

Problem Description

This compiler error message is generated by moderately long expressions, particularly when doing some type casting. The following generated the error:

if((int)c != 26 || (int)s != 26 || (int)l != 26 || (int)u != 26 ||
   (int)f != 26 || (int)d != 26) lrc = lrc+4;

where c = char variable, s = short, l = long, u = unsigned, f = float, d = double. lrc is an integer local return code.

REFERENCES

1. C. Beerup, CS 9000 XENIX System Programming Functional Specification, IBM Instruments, Inc., Nov. 1983. (This document is internal to IBM and not available to the general public.)
2. J. W. Duran and S. C. Ntafos, An Evaluation of Random Testing, IEEE Trans. Software Engineering SE-10 (July 1984).
3. R. B. Essick IV and R. Kolstad, Notesfile Reference Manual, Technical Report UIUCDCS-R-82-1081, 1982.
4. R. B. Essick, Notesfiles, M.S. thesis, Technical Report UIUCDCS-R-84-1165, 1984.
5. R. E. Fairley, Software Engineering Concepts, McGraw-Hill, New York, 1985.
6. IBM Instruments, Inc., XENIX System Device Driver Manual, March 1984.
7. IBM Instruments, Inc., XENIX System Operations Manual, March 1984.
8. IBM Instruments, Inc., XENIX System Reference Manual, March 1984.
9. J. D. Morris, CS 9000 XENIX System Test Plan, IBM Instruments, Inc., Nov. 1983. (This document is internal to IBM and not available to the general public.)
10. G. J. Myers, The Art of Software Testing, Wiley, New York, 1979.
11. M. L. Shooman, Software Engineering: Design, Reliability, and Management, McGraw-Hill, New York, 1983.
12. R. N. Sum, Jr., et al., UNIX/XENIX Test Suite - IBM S9000 System Test, Report of the Professional Workstation Research Group, Department of Computer Science, University of Illinois, June 1984.