Software reliability and redundancy optimization

World Abstracts on Microelectronics and Reliability

model is based on the decomposition principle, where a hypercube of a higher dimension is recursively decomposed into smaller hypercubes, until the reliability of the smallest cube is modeled exactly. The reliability of the large n-cube is then obtained from this smallest base model using a recursive equation. The reliability model used is task-based: it is assumed that the system is operational if a task can be executed on the system. Analytic results are given for n-dimensional hypercubes with up to 75% system degradation. The model is validated by comparing analytic results with simulation results.

A reliability model for total field incidents. WILLIAM J. KERSCHER III, TONY LIN, HAL STEPHENSON, E. HAROLD VANNOY and JEROME WIOSKOWSKI. Proc. a. Reliab. Maintainab. Symp., 22 (1989). AC Rochester is one of the many component divisions of General Motors and as such is involved with the design and manufacture of several components for GM and other vehicle manufacturers in the world-wide market. The AC Product Assurance organization is charged with ensuring the integrity of the division's products. By reflecting the voice of the customer and focusing the resources and talents of the design organization early in the product development cycle, it is possible to ensure that validated products and processes are developed in a timely, cost-effective manner which meets customer expectations. A powerful tool used by an Assurance organization in this effort is reliability prediction. By modeling preliminary designs and predicting their field reliability, it is possible to drive the development process to higher reliability designs. It has been demonstrated, however, that traditional reliability predictions can differ significantly from actual field incident experience, particularly during the early life of the product.

Cost-effective software safety analysis. LEWIS BASS and DANIEL L. MARTIN. Proc. a. Reliab. Maintainab. Symp., 35 (1989). Software safety analysis is necessary for military, industrial, medical or consumer products containing software which generates safety-critical data or commands safety-critical hardware functions. Inadvertent and intentional errors in software logic, programming, and human/software communications have resulted in theft, loss of valuable data, property damage, deaths, and lawsuits. System safety efforts for designing safety into a computer-controlled system focus upon early recognition of hardware hazards, providing software specifications, and providing test and validation criteria to prevent or reduce safety concerns.
For Department of Defense projects, software system safety hazard analyses of safety-critical software may be required by MIL-STD-882B, Task Section 300. As yet, there is no analogous standard for commercial products. Justification of these analyses for commercial customers must be demonstrated on the basis of cost-effectiveness, timeliness and positive impact on the performance and safety of the system. This paper describes the application of cost-effective software safety to the motion control systems of two computer-controlled robots, one mobile and the other stationary. The paper discusses language-specific issues. The programming language analyzed is "C", but the methodology is equally applicable to other computer languages or automated control systems using programmable controllers. Guidelines for "safe" programming practices and specifications are included.

Motivating reliable switching system performance. L. A. FLETCHER, D. E. BURNS and W. P. COCHRANE. Proc. a. Reliab. Maintainab. Symp., 383 (1989). In this paper, the use of reliability performance criteria to motivate the design of reliable local telephone switching systems is discussed. The process of criteria development and the development of new reliability performance criteria for switching systems are described. New procedures for switching system field performance monitoring and failure analysis are defined. These are designed to obtain more meaningful reliability data on in-service switching systems.

Operability and maintainability hazards in communications systems. MICHAEL W. HULET and KENNETH F. MOREHOUSE. Proc. a. Reliab. Maintainab. Symp., 379 (1989). The rapidly advancing field of communications technology has numerous hazards which are often overlooked. Lessons learned teach us that consideration should be given to improved engineering design and program management to make it possible for instrumentation technicians and engineers to install, operate, maintain, and relocate communications equipment without assuming unacceptable risks. Specific hazard areas revealed by hazards analysis include various electrical hazards, unusual lifting, material-handling requirements, inadequate equipment access, inadequate drawing maintenance, and requirements for employees to work alone in hazardous, remote locations. These hazards can be controlled to an acceptable level of risk if they are approached seriously by program management using accepted hazards analysis and control procedures. Communications systems are often viewed as a harmless, even simple, use of parts and functions which do not appear to involve any serious hazards. This is one work area that is often taken for granted. A thorough examination of the actual work tasks involved, however, reveals the probability of serious hazards in installing, operating, and maintaining a communications system. This paper discusses some industrial applications and lessons learned about hazards in the rapidly advancing communications technology which should be considered to "sharpen the competitive edge".

Performance analysis of fault-tolerant systems in parallel execution of conversations. K. H. KIM, SHIN HEU and SEUNG M. YANG. IEEE Trans. Reliab. 38(1), 96 (1989). This paper analyzes the execution overhead inherent in the conversation scheme, which is a scheme for realizing fault-tolerant co-operating processes free of the domino effect.
Multiprocessor/multicomputer systems capable of parallel execution of conversation components are considered and a queueing network model of such systems is adopted. Based on the queueing model, various performance indicators such as system throughput, average number of processors idling inside a conversation due to the synchronization required, and average time spent in the conversation, have been evaluated numerically for several application environments. The numeric results are discussed and several essential performance characteristics of the conversation scheme are derived. For example, when the number of participant processes is not large, say less than six, the system performance is highly affected by the synchronization required on the processes in a conversation, and not so much by the probability of acceptance-test failure.

Reliability analysis of redundant-path interconnection networks. ANUJAN VARMA and C. S. RAGHAVENDRA. IEEE Trans. Reliab. 38(1), 130 (1989). The reliability of some redundant-path multistage interconnection networks is characterized. The classes of networks are the Generalized Indra Networks, Merged Delta Networks, and Augmented C-Networks. The reliability measures are terminal reliability and broadcast reliability. Symbolic expressions are derived for these reliability measures in terms of component reliabilities. The results are useful in comparing network designs for a given reliability requirement.

Modeling and analysis of systems with multimode components and dependent failures. KIEM V. LE and VICTOR O. K. LI. IEEE Trans. Reliab. 38(1), 68 (1989). This paper presents a model for the reliability and performance analysis of systems where components degrade in a statistically dependent manner. This cause-based multimode model is based on the idea that deviations of components from the up state have underlying physical causes which can be explicitly identified and are statistically independent. The effects of several causes can be combined in a flexible manner. System reliability and performance measures can be computed approximately by considering the most probable states. Such states can be efficiently generated by algorithms developed for the earlier multimode, statistically independent failure model.

Software reliability and redundancy optimization. DONG-HAE CHI, HSIN-HUI LIN and WAY KUO. Proc. a. Reliab. Maintainab. Symp., 41 (1989). A procedure for reliability-related quality programming is developed to fill existing gaps in software design and development so that a quality programming plan can be achieved. This study investigates the trade-off between system reliability improvement and resource consumption through the management phase. A software reliability-to-cost relation is developed from both a software reliability-related cost model and software redundancy models with common-cause failures. The software reliability optimization problem can be formulated into a mixed-integer programming problem and solved by a branch-and-bound technique.

Integrated reliability growth testing. ALAN W. BENTON and LARRY H. CROW. Proc. a. Reliab. Maintainab. Symp., 160 (1989). Testing in developmental programs for complex systems consists of numerous types, from early prototype testing to functional testing, environmental testing, safety testing, reliability demonstrations, test-analyze-and-fix (TAAF), on into production. An integrated reliability growth program which makes use of all of these tests provides an opportunity to find and correct reliability problems throughout development and helps avoid costly get-well programs later for reliability. In this paper we will consider the development of reliability growth under integrated reliability growth and present results and lessons learned derived through experiences on army developmental programs.

Modeling server-unreliability in closed queuing networks. C. S. RAMANJANEYULA and V. V. S. SARMA. IEEE Trans. Reliab. 38(1), 90 (1989). A method is presented to model server unreliability in closed queuing networks. Breakdowns and repairs of servers, assumed time dependent, are modeled using virtual customers and virtual servers in the system.
The problem is thus converted into a closed queue with all reliable servers and preemptive resume priority centers. Several recent preemptive priority approximations and an approximation of ours are used in the analysis. The method has approximately the same computational requirements as that of mean-value analysis for a network of identical dimensions and is therefore very efficient.

A corollary to: Duane's postulate on reliability growth. DANIEL G. FRANK. Proc. a. Reliab. Maintainab. Symp., 167 (1989). The purpose of this paper is to present the results of an investigation into reliability characteristics demonstrated by selected avionic equipments over the major portion of their expected service life. As a result of this study it was found that avionics equipment items of various types demonstrate remarkably similar trends of a gradual decline in reliability during prolonged service. These data provide a basis for modification of Duane's learning curve approach by extending its applicability to project a reliability profile over an equipment's planned service life. The revised equations are then used to predict changes in equipment reliability and availability, thus providing a capability to estimate more accurately life-cycle support resource requirements and costs.
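
For readers unfamiliar with Duane's postulate, the sketch below fits the classic form only, in which cumulative MTBF follows a power law in cumulative operating time, MTBF_c(T) = b * T^alpha. It does not reproduce the revised equations of the abstract above, which are not given here. The function names and the least-squares fitting choice are assumptions made for illustration.

```python
import math

def fit_duane(failure_times):
    """Fit log(cumulative MTBF) = log(b) + alpha * log(T) by least squares.

    failure_times: cumulative operating time at each successive failure,
    in ascending order. Returns (alpha, b). Illustrative helper, not from
    the abstract.
    """
    xs = [math.log(t) for t in failure_times]
    # Cumulative MTBF at the i-th failure is T_i / i.
    ys = [math.log(t / (i + 1)) for i, t in enumerate(failure_times)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    b = math.exp(my - alpha * mx)
    return alpha, b

def cumulative_mtbf(t, alpha, b):
    """Cumulative MTBF predicted by the fitted power law."""
    return b * t ** alpha

def instantaneous_mtbf(t, alpha, b):
    """Instantaneous MTBF implied by the power law (requires alpha < 1)."""
    return cumulative_mtbf(t, alpha, b) / (1.0 - alpha)
```

A positive fitted alpha corresponds to reliability growth; a negative alpha would describe a declining cumulative MTBF, which is the kind of in-service trend the abstract reports.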

Invariant permutations for consecutive k-out-of-n cycles. F. K. HWANG. IEEE Trans. Reliab. 38(1), 65 (1989). Consecutive k-out-of-n cycles have been proposed as topologies for k-loop computer networks and describe a circular system of n components where the system fails if and only if any k consecutive components all fail. Suppose that the components are interchangeable. Then the question arises as to which permutation maximizes the system reliability, assuming that the components have unequal reliabilities. If there exists an optimal permutation which depends on the ordering, but not the values, of the component reliabilities, then we call the system (and the permutation) invariant. The circular system is not invariant except for k = 1, 2, n - 2, n - 1, n.
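
The failure rule stated above can be checked numerically on small cases. The following is a minimal brute-force sketch, assuming independent component failures and 1 <= k <= n; it enumerates all 2^n component states and all arrangements, so it is only practical for small n. The function names are hypothetical and the method is not the one used in the paper.

```python
from itertools import permutations, product

def cycle_reliability(p, k):
    """Reliability of a circular consecutive-k-out-of-n:F system.

    p[i] is the reliability of the component at position i on the cycle.
    The system fails iff some k consecutive positions are all failed.
    """
    n = len(p)
    total = 0.0
    for state in product((True, False), repeat=n):  # True = working
        system_failed = any(
            all(not state[(i + j) % n] for j in range(k)) for i in range(n)
        )
        if system_failed:
            continue
        prob = 1.0
        for works, pi in zip(state, p):
            prob *= pi if works else 1.0 - pi
        total += prob
    return total

def best_arrangement(rels, k):
    """Exhaustively search arrangements of rels around the cycle.

    The first component is pinned in place to skip rotations of the
    same arrangement.
    """
    first, rest = rels[0], rels[1:]
    candidates = ((first,) + perm for perm in permutations(rest))
    best = max(candidates, key=lambda a: cycle_reliability(a, k))
    return best, cycle_reliability(best, k)
```

Calling `best_arrangement((0.9, 0.8, 0.7, 0.6, 0.5), 2)` returns one optimal ordering and its reliability; rerunning with different values but the same ranking is a quick way to probe whether the optimal ordering changes for a given k.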

Reliability testing of a software-driven system. DON N. HAGIST. Proc. a. Reliab. Maintainab. Symp., 347 (1989). This paper examines some of the difficulties in reliability testing of a large, integrated hardware/software system, which is driven by the software. A reliability test technique is presented which was developed for the Enhanced Naval Warfare Gaming System, and which meets the requirements of MIL-STD-781. A brief overview of this warfare simulation system is given, and this system is used to illustrate some of the considerations that arise in test planning.

Approximate availability analysis of VAXcluster systems. OLIVER C. IBE, RICHARD C. HOWE and KISHOR S. TRIVEDI. IEEE Trans. Reliab. 38(1), 146 (1989). We solve for the availability of an n-processor VAXcluster system using a hierarchical approach that allows us to (1) obtain a closed-form answer to an apparently difficult problem and (2) determine the optimal number of processors in the cluster for a given set of cluster parameters. Our novel approach is a two-level hierarchical model in which the lower-level model is a nine-state Markov chain that is solved in a closed form. The nine-state Markov chain is then aggregated into a three-state device analogous to a diode. Subsequently, the system availability is computed by analyzing a simple network.

Reliability database development for use with an object-oriented fault tree evaluation program. A. SHARIF HEGER, F. A. PATTERSON-HINE, ROBERT J. HARRINGTON and BILLY V. KOEN. Proc. a. Reliab. Maintainab. Symp., 283 (1989). The evaluation of the reliability of complex systems requires extensive knowledge about their basic components and their interconnections. One procedure used to determine the reliability of such systems is fault tree analysis. These fault trees require estimates of basic event probabilities based on actual field tests and/or expert opinions.
Several databases that contain reliability information for certain types of systems and components exist, but traditionally there has been little interaction between the two. This paper describes the development of a fault tree analysis method using object-oriented programming. In addition, it discusses the programs that have been developed or are under development to connect a fault tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of the existing nuclear power reliability databases is planned.

Integration of reliability- and tolerance effect analysis. IR. A. C. BROMBACHER, H. A. DE BOER and J. VAN'T LOO. Proc. a. Reliab. Maintainab. Symp., 441 (1989). This paper focuses on the development of systems with so-called on-line optimized reliability. In the case of on-line analysis, reliability analysis should have the same importance as functional analysis now has. Reliability should be integrated in a design. The function of a quality assurance team is only to