127 Fault tolerance by distributed software control for high reliability


Abstracts

124 A View on Computer Systems and their Reliability in Japan T. Natsume, Y. Hasegawa, pp 45-49 The tremendous growth of computer system applications in industry is still continuing, and these systems are migrating into other areas as a key function. Despite this trend, which means that system configurations are becoming highly complex, requirements for the safety of computer systems are not emphasised. Social impact, confusion, and loss are easily foreseen in such an environment when systems fail to perform. Here typical systems in Japanese industries are described from the viewpoint of system performance and system failures over the last few years. Issues of system dependability and safety are discussed.

128 The Verification Support Environment VSE Dr. Baur, T. Phuta, P. KeJwal, R. Dexter, D. Hutter, C. Sengier, E. Canver, pp 69-74 The VSE (Verification Support Environment) is a first approach to integrating the methods of formal specification and verification into an existing industrially approved CASE tool. Safety/security-critical software systems (or parts of them) can be developed within the VSE under the guidance of a formal design method, based on the principles of modularization and refinement, and covering the whole development process. The VSE system integrates an appropriate formal specification language (VSE-SL) and industrially approved components for specification and verification into a single, ready-to-use software development environment. The VSE project is a national project, initiated and sponsored by the BSI (Bundesamt für Sicherheit in der Informationstechnik).

125 A Generic Failure Model for Distributed Systems F. Tam, R. Badh, pp 51-56 This paper discusses the need for classifying failure semantics in distributed systems. It argues that a generic description is essential for the understanding and unification of the varying concepts used. A failure model based on the layering structure is proposed. The architectural framework, failure representation and fault classification for this model are explained. The concepts of Virtual and Actual Failures are introduced and the model's properties are highlighted. Three applications are presented to illustrate its usefulness in supporting the structuring of fault-tolerant distributed systems.

126 Recovery In Distributed Systems from Solid Faults H. Aliouat, pp 57-62 This study deals with recovery processes for transient or permanent hardware faults in a distributed environment. The recovery mechanism can be based on one of several strategies. The framework considered is composed of a set of autonomous physical stations (nodes), each having a local system, some of which are able to replace failing ones in the case of a permanent fault. The whole system can remain in service until the end of the application; this gives the system a non-stop operating cycle. Results of a simulation of a fault-tolerant mechanism based upon one of the proposed strategies are given.

127 Fault Tolerance by Distributed Software Control for High Reliability E. Renaux, pp 63-67 This paper describes a fault-tolerant system that provides availability and high reliability by unit replication, fault masking and correction. In this system, the control is distributed, implemented by software, and uses the results of the well-known diagnosable PMC model, proposed by Preparata, Metze and Chien. Because it sustains high exchange throughput on high-speed dedicated links, it allows checking of the consistency not only of results computed by the replicated application processors, but also of the software programs and the parameters kept in application memory. This fault-tolerant system gives a level of reliability equivalent to the soundness of the cache memory.
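As a rough illustration of fault masking by unit replication, the following minimal sketch takes the results computed by replicated processors and masks a minority of disagreeing replicas by strict-majority voting. It is not the control scheme of the paper, which builds on the PMC diagnosis model; the function name `majority_vote` and the plain-list representation of replica results are assumptions made for this example only.

```python
from collections import Counter


def majority_vote(results):
    """Mask faulty replicas by strict-majority voting (illustrative only).

    `results` is a list with one computed value per replica.
    Returns (agreed_value, suspect_indices), where suspect_indices
    lists the replicas whose result differs from the majority.
    """
    if not results:
        raise ValueError("no replica results to vote on")

    # Most frequent value and how many replicas produced it.
    value, count = Counter(results).most_common(1)[0]

    # A strict majority is required; otherwise the fault cannot be masked.
    if count * 2 <= len(results):
        raise RuntimeError("no strict majority among replica results")

    suspects = [i for i, r in enumerate(results) if r != value]
    return value, suspects


if __name__ == "__main__":
    # Three replicas, one of which returns a corrupted result.
    value, suspects = majority_vote([42, 42, 7])
    print(value, suspects)
```

With three replicas, a single faulty result is outvoted and its index is reported, so the replica can be scheduled for correction; with no strict majority the sketch refuses to pick a value rather than guess.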

129 Design and Planning in the Development of Safety-Critical Software with Ada J. Prorok, K. Bührer, U. Ammann, K. Vit, pp 75-80 A study recommends 30 approaches for improving the safety of software to be developed in Ada; recommendations are also made against a further 21 approaches. Of particular significance are an Ada Language coding guideline, the use of object-oriented design approaches, disciplined management practices, and early consideration of testing requirements and testability. Most of the recommended approaches are either entirely or substantially carried out in the early phases of a software development project, to avoid "dangerous" practices at a later stage. A proper appreciation of the planning and design of safety-critical software is thus essential for a satisfactory project outcome.

130 The Mythical Mean Time to Failure A.K. Bisset, pp 81-86 As software engineering has emerged as a discipline in its own right, its practitioners have often borrowed methods and techniques from other engineering fields. These techniques have proved useful in the past, and the expectation has been that they would have a similar utility when applied to software. The peculiar nature of software sometimes invalidates such an approach. This paper develops a critique of the use of the Mean Time to Failure (MTTF) metric with software, and discusses some of the implications for software reliability.
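For orientation, the standard definition of the metric in hardware reliability engineering is recalled below; the abstract itself gives no formula, so the symbols $R(t)$ (probability of surviving to time $t$) and $\lambda$ (a constant failure rate) are assumptions introduced here for illustration.

```latex
\[
  \mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt ,
  \qquad
  R(t) = e^{-\lambda t}
  \;\Longrightarrow\;
  \mathrm{MTTF} = \int_{0}^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}.
\]
```

The closed form $1/\lambda$ holds only under the constant-failure-rate assumption, which is a modelling choice carried over from hardware.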

131 Practical Formal Methods for Process Control Engineering C. Fencott, C. Fleming, C. Gerrard, pp 87-92 This paper documents the application of formal methods to the specification and verification of process control in safety-critical systems. An executable subset of the formal specification language OBJ is used to model and verify systems utilising Programmable Logic Controllers (PLCs), prior to hardware implementation. The method integrates with existing techniques. Examples are presented to demonstrate the practicality and the benefits of the approach, and its role in an industrial safety-critical context is discussed, for both implementation and maintenance. Current work focuses on hand-held tool support, additional formal verification methods, and the application of the technique to systems development within safety standards.