Pattern Recognition Letters 2 (1984) 419-425 North-Holland
December 1984
Failure detection processes by an expert system and hybrid pattern recognition
L.F. PAU
Battelle Memorial Institute, 7 route de Drize, CH-1227 Carouge, Switzerland
Received 12 July 1984
Abstract: This paper describes the architecture of a failure diagnosis system, as used in automatic testing, automatic imaging inspection, and specific failure detection tasks in electronics. A new knowledge representation scheme is also given, in relation to hybrid pattern recognition rules. Current work on building a knowledge base of diagnostic metarules is mentioned.
Key words: Failure detection, automatic inspection, automatic testing, expert system, metarules, hybrid pattern recognition, integrated circuits, data communications.
1. Introduction
The basic troubleshooting process includes failure detection, localization, diagnostics, analysis and monitoring (Pau, 1981). The key element is failure diagnosis, which carries out a breakdown of the observations $y \in Y$ (jointly: signals, images, text) into individual failure modes $E_0, E_1, \ldots, E_N$, where $E_0$ is the no-failure operating mode. Each diagnostic strategy $S$ is a sequential search decision process represented by a function
$$S: D \times I \times Y \to (\hat{E}, T(\hat{E})), \qquad \hat{E} \in \{E_0, \ldots, E_N\},$$
where the diagnosed failure mode $\hat{E}$ must minimize the average risk, the error probability, and/or the failure diagnosis delay, and
$D \triangleq$ functional decomposition of the system under test (modules, logic states, etc.),
$I \triangleq$ learning information data base (operational environment, failure events, maintenance actions, etc.),
$Y \triangleq (Y^a, Y^p)$ diagnostic observations derived from passive sensors ($Y^p$) and active sensors ($Y^a$) interacting with the system under test; these observations include signals, images, logic variables, and text,
$T \triangleq$ action required on the system under test, owing to the diagnosis $\hat{E}$ (repair, reconfiguration, test generation).
In this paper, we shall analyze the failure diagnostic processes involved in automatic testing, automatic imaging inspection, and more general technical diagnosis of electronic systems. A specific process is described, consisting of a knowledge representation scheme for electronic systems, an inference procedure, and hybrid pattern recognition rules. This paper indicates some basic design and selection criteria for the simplification of this diagnostic system. Current work on building a knowledge base of failure detection, diagnosis and automatic test generation rules is also mentioned. Three areas of application of this architecture and knowledge base are mentioned.
Advice to the reader: As this paper explicitly uses established concepts from areas as diverse as artificial intelligence, failure diagnosis, and pattern recognition, some of the terminology may not be familiar. Readers not familiar with failure diagnosis terminology are referred to the glossary in Pau (1981) (IMEKO TC-10 standard glossary). Readers not familiar with artificial intelligence are referred to the glossary in, e.g., the 'Handbook for Artificial Intelligence'.
0167-8655/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
Volume 2, Number 6
PATTERN RECOGNITION LETTERS
2. Assumptions
This paper essentially assumes the intrinsic weakness of the approach where each defect (or process) is modelled by various failure modes. It also accepts the inaccuracy of most physical device models in failed conditions, and even in nominal conditions. These notions are also addressed by McDermott (1982) and Davis (1982). The case where failed or nominal conditions are known with good confidence will be accounted for by a simplifying truth maintenance procedure on the limited subset of such conditions.
In this paper, we claim that robustness is to be expected from the combined use of a learning information data base $I$ (domain dependent) and of a set of diagnostic meta-rules $S$ (domain independent). This robustness is in terms of failure diagnosis correctness, and also of efficient automatic test generation ($Y^a$). At the same time, we claim that one cannot separate the diagnostic inference and failure recognition into two separate expert systems, with
one for test generation and the other for failure detection. The diagnostic inference unit described below carries out both in an intertwined fashion.
Finally, this paper underlines the fundamental advantages to be derived from sensor fusion in failure diagnosis tasks. Essentially, electrical signals alone, and images alone, provide neither enough test coverage nor low enough failure diagnosis error rates. These two types of information must be merged into the same diagnostic information set $Y$ for later analysis. Improved diagnostic performance, as well as faster detection times, have already been demonstrated as a result of this approach (Pau, 1983).
3. Diagnostic system architecture
The overall diagnostic and test generation system architecture includes knowledge representation, inference, decision and action steps. This section focuses on the architecture; the status of work on each part is given in Section 6.
[Figure 1: decision table for one item. The partly legible row labels are failure effects E1 (no effect) through E5 (no output); the axes are labelled inputs, outputs and parts, with a link to the next level.]
Fig. 1. Decision table $C_i$ for item $i$. These tables represent jointly the effects of inputs ($T$), outputs ($Y$), and structure ($D$), and are nested together as indicated in Figure 2.
[Figure 2: nested decision tables. Each circuit-level table lists failure effects (E1 no effect, E2 out of spec, E3 no output A, E4 no output B, E5 no output A and B) against inputs, outputs and parts, and feeds the next higher level, whose effects are E1 no effect, E2 out of spec, E3 no SIG output, E4 no TLM-1 output, E5 no TLM-2 output, E6 no SIG and TLM, E7 no outputs.]
Fig. 2. Nesting list structures: $L_T$ = labels of all inputs, $L_Y$ = labels of all outputs, $L_S$ = labels of all parts or effects for all $N$ circuits.
3.1. Knowledge representation F
This includes a list frame data structure $F$, with an associated vector of attributes $A(F)$, together building a script $(F, A(F))$.
3.1.1. List frame data structure F
The system under test is represented by a nested set of decision tables $\{C_i\}$ (Figure 1), constructed starting with the basic modules and linked in a hierarchical tree structure. Each of the decision tables $C_i$ results from basic electronic subsystem design and/or from a CAD representation. The result is a nested list structure $(C_1, \ldots, C_N)$ amenable to efficient list processing and predicate verification, made of all the horizontal and vertical labels in these decision tables, organized into inputs ($T$), outputs ($Y$) and structure ($D$): $L_T$, $L_Y$, $L_S$. As some inputs are test stimuli, while other outputs are only measurements, the diagnostic measurements $Y^a$, $Y^p$ are merged into these lists, regardless of observability. This list frame structure encompasses signals, images, text and logic variables in the diagnostic observations $Y$, thereby implementing the sensor fusion concept.
3.1.2. Attributes A(F)
These attributes (Figure 3) are made of:
(1) Attribute labels in the decision matrix, expressing labelled values for each (row, column) pair element in the list structure ($L_S$); this accounts for structural attributes, e.g. typical defect types in electronics.
(2) Measurement attributes, which are the values of all measurements $y = (y^a, y^p)$, whether they are signals, text, images or logic relations.
3.2. Learning database I
This application dependent database specifies the operating environment, component characteristics, operating modes, normal reference images, and required actions. It is organized in declarative form, and decomposed into open worlds. A natural language interface to outside users is possible. A relational database system such as BASIS-DM, developed by Battelle, serves to create, update and organize these data. The information collection process is described in Pau (1981).
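
As an illustration of the nested decision tables of Section 3.1.1 and the structural attributes of Section 3.1.2, the following is a minimal Python sketch. The class `DecisionTable`, its field names, the helper `collect_labels` and the example board are all assumptions made for this illustration; the paper does not give an implementation, and Python is used here only as a convenient notation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DecisionTable:
    """One decision table C_i; every field name here is a hypothetical choice."""
    name: str
    inputs: List[str]    # column labels contributing to L_T
    outputs: List[str]   # column labels contributing to L_Y
    parts: List[str]     # row labels contributing to L_S
    # Structural attributes: (part, defect type) -> labelled failure effect
    effects: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # Lower-level tables nested under this one
    children: List["DecisionTable"] = field(default_factory=list)


def collect_labels(table: DecisionTable) -> Tuple[List[str], List[str], List[str]]:
    """Flatten a nested table, depth first, into the lists (L_T, L_Y, L_S)."""
    l_t, l_y, l_s = list(table.inputs), list(table.outputs), list(table.parts)
    for child in table.children:
        c_t, c_y, c_s = collect_labels(child)
        l_t.extend(c_t)
        l_y.extend(c_y)
        l_s.extend(c_s)
    return l_t, l_y, l_s


# Illustrative two-level system: a board containing one amplifier circuit.
amplifier = DecisionTable(
    name="amplifier",
    inputs=["vin"],
    outputs=["out_a", "out_b"],
    parts=["R1", "Q1"],
    effects={
        ("R1", "open"): "E3 no output A",
        ("Q1", "short"): "E5 no output A and B",
    },
)
board = DecisionTable(
    name="board",
    inputs=["supply"],
    outputs=["sig"],
    parts=["amplifier"],
    children=[amplifier],
)
L_T, L_Y, L_S = collect_labels(board)
```

In this sketch test stimuli and plain measurements sit in the same label lists, which is one way to read the statement that the measurements are merged into the lists regardless of observability.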
[Figure 3: sketches of structural defect types. The legible labels are open mode, short mode, output open/short mode, intersection open mode, and failure at one or the other intersection but not both at the same time.]
Fig. 3. Structural attributes $A(F)$ in $L_S$.
3.3. Diagnostic metarules S
Early research on failure diagnosis in various fields (digital systems, automatic inspection, analog systems and processes) has identified three basic diagnostic strategies $S$:
(1) Failure mode removal by analysis and inspection: The detection, diagnosis, localization and removal of the failure mode which has occurred are carried out in sequence; the sequential removal is set up to process indifferently faults and fault causes, test stimuli, and repair actions.
(2) Validation: Diagnosis cannot be considered complete until the system under test has been demonstrated to satisfy the requirements that were set out in the system specifications; the validations consist of verifying sequentially that these are met.
(3) Exploring the operational envelope: The external specifications define the operational envelope within which the system under test must perform correctly in the mode $E_0$. These external performance limits, which are representative of the real operations, are not necessarily accurate, and quite different system conditions may occur. These strategies therefore explore the behavior under circumstances not given as performance requirements, including 'severe' operating environments worse than the specified envelope.
As an example, a number of the better known software testing strategies $S$ can be classified into the three classes above:
(1) Failure removal: sensitized path testing, fault seeding, hardware/software test points and monitoring software, code analyzers, dynamic test probes, injection of test patterns of bits.
(2) Validation: proof-of-correctness, program verification by predicate testing, proof-of-loops, validation using a representation in a specification language, validation by simulation.
(3) Exploring the operational envelope: endurance tests, derivation of tests outside the specifications by a specification language representation, automatic test case generation, behavior of specific routines in extreme cases, stress tests/inputs, time/saturation tests.
In the knowledge base (Figure 4), we write application independent diagnostic and test generation meta-rules in predicate form. These meta-rules codify the three basic diagnostic strategies $S$ defined above, specified for a range of different combinations and representations in the product set $D \times I \times Y$. This knowledge base currently consists of more than 400 such meta-rules, and has been built up progressively as practical cases are processed and as the range of sensors is increased.
3.4. Test generation T
The estimated failure mode $\hat{E}$ triggers the specific types of probing energy applied to the system under test (sensitizing probing, deterministic, random). Thus the meta-rules $S$ must include test generation.
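
To make the idea of meta-rules in predicate form more concrete, here is a small Python sketch in which each rule pairs a predicate over the current situation with a suggested failure mode and, optionally, a test stimulus to generate. The `MetaRule` class, the three example rules, the situation keys and the stimulus names are all hypothetical; the actual rules of the knowledge base are not reproduced in the paper, and Python stands in here for the predicate notation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical merged view of the decomposition D, database I and observations Y.
Situation = Dict[str, Any]


@dataclass(frozen=True)
class MetaRule:
    """A diagnostic meta-rule; all field names are illustrative assumptions."""
    name: str
    strategy: str                               # "removal", "validation" or "envelope"
    precondition: Callable[[Situation], bool]   # predicate over the situation
    hypothesis: str                             # failure mode suggested when it holds
    next_test: Optional[str] = None             # test stimulus to generate, if any


RULES: List[MetaRule] = [
    MetaRule(
        name="no_output_when_powered",
        strategy="removal",
        precondition=lambda s: s.get("powered") is True and s.get("output_level") == 0,
        hypothesis="E3",
        next_test="inject_test_pattern",
    ),
    MetaRule(
        name="output_above_spec",
        strategy="validation",
        precondition=lambda s: s.get("output_level", 0) > s.get("spec_max", float("inf")),
        hypothesis="E2",
    ),
    MetaRule(
        name="over_temperature",
        strategy="envelope",
        precondition=lambda s: s.get("temperature", 0) > s.get("rated_temperature", float("inf")),
        hypothesis="E2",
        next_test="stress_test",
    ),
]


def fire(rules: List[MetaRule], situation: Situation) -> List[MetaRule]:
    """Return the rules whose predicate holds in the given situation."""
    return [rule for rule in rules if rule.precondition(situation)]


# Example situation: powered unit with no output, running hotter than rated.
situation = {
    "powered": True,
    "output_level": 0,
    "spec_max": 5,
    "temperature": 90,
    "rated_temperature": 70,
}
fired = fire(RULES, situation)
fired_names = [rule.name for rule in fired]
generated_tests = [rule.next_test for rule in fired if rule.next_test is not None]
```

Keeping the test stimulus inside the rule is one simple way to reflect the point that the meta-rules must include test generation; other encodings are equally possible.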
3.5. Failure diagnostic process
As stated in Section 2, this process is here decomposed into: (1) diagnostic inference, (2) failure mode recognition, as described below and in Figure 4.
4. Diagnostic inference expert system
This domain independent expert system, geared towards analysis, assembles scripts $(F, A(F))$, where $F$ is a frame data structure with all knowledge stored together, and $A(F)$ is the vector of attributes of this frame. From these scripts and attributes it produces attributed features $(X, A(X))$, while excluding contradictory information by proof of hypothesis, and obtaining a prediagnosis by using $S$ and $I$. This expert system must operate efficiently (time, memory size) with domain dependent knowledge in $F$; it is activated by a control structure $D$ based on propagating constraints. Semantic networks and simple frames do not appear to be applicable knowledge representation schemes within this context. When the inference unit questions $I$, $D$ and $Y$, one faces all the problems of a non-monotonic logic, with new axioms added through $Y$ which may be inconsistent with the current theory as inferred. Frame axioms to describe transitions from one situation to another cannot be used, because the failure modes which may occur are unknown.
[Figure 4: block diagram linking the system under test, the diagnostic observations ($Y^a$, $Y^p$), the decomposition, the learning database, the diagnostic metarules $S$, the inference and failure mode recognition units exchanging attributed features $(X, A(X))$, and the resulting actions $T$ (repair, reconfiguration, maintenance, test generation) with alternatives.]
Fig. 4. Diagnostic system architecture. For more details, see the procedure in Section 4.
Therefore, the best approach, which we have chosen, is a truth maintenance system (mode $E_0$), where the truth values are scalar functions of the attribute values $A(F)$, with backtracking to rules in $S$ explaining the contradictions, and inclusion of pointers to these rules among the features $X$. The diagnostic inference thus includes the following steps:
Step 1. Restrict the modules $M \in (I \times D \times S)$ that are candidates for confrontation;
Step 2. Confront the selected modules with the current situation $Y$, by checking preconditions in $I \times D$. This involves checking a predicate formula in first order logic by a saturation procedure; a saturation procedure consists in adding preconditions until the current condition no longer has the logic value True;
Step 3. Select the features $X$ from the modules selected by the diagnostic meta-rules $S$, which further eliminates some modules according to their hierarchy. This hierarchy expresses causality relations, failure mode propagation, and time constants;
Step 4. Identify the test stimuli to be generated as those provided by the meta-rules $S$ remaining at the end of Step 3.
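
The four steps above can be sketched in Python as follows. This is only one possible reading, under stated assumptions: the `Module` class and its fields are hypothetical, the saturation procedure is interpreted as adding preconditions one at a time until the conjunction stops being True, and the hierarchy of Step 3 is reduced to a single integer rank in which the lowest rank (closest to the root cause) wins. Truth maintenance and backtracking are not modelled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Situation = Dict[str, Any]
Predicate = Callable[[Situation], bool]


@dataclass
class Module:
    """A candidate module M; all field names are illustrative assumptions."""
    name: str
    part: str                        # element of the decomposition D it refers to
    level: int                       # rank in the causal hierarchy (0 = root cause)
    preconditions: List[Predicate]
    features: Dict[str, Any] = field(default_factory=dict)
    test_stimulus: Optional[str] = None


def saturate(preconditions: List[Predicate], situation: Situation) -> int:
    """Add preconditions one at a time and stop once the conjunction is no
    longer True. Returns the number of preconditions that held."""
    held = 0
    for precondition in preconditions:
        if not precondition(situation):
            break
        held += 1
    return held


def infer(modules: List[Module], suspect_part: str,
          situation: Situation) -> Tuple[Dict[str, Any], List[str]]:
    # Step 1: restrict the candidate modules to the suspected part.
    candidates = [m for m in modules if m.part == suspect_part]
    # Step 2: confront them with the situation; keep fully satisfied modules.
    confirmed = [m for m in candidates
                 if saturate(m.preconditions, situation) == len(m.preconditions)]
    if not confirmed:
        return {}, []
    # Step 3: eliminate modules by hierarchy, then collect the features X.
    root = min(m.level for m in confirmed)
    remaining = [m for m in confirmed if m.level == root]
    features: Dict[str, Any] = {}
    for m in remaining:
        features.update(m.features)
    # Step 4: the test stimuli are those attached to the remaining modules.
    stimuli = [m.test_stimulus for m in remaining if m.test_stimulus is not None]
    return features, stimuli


# Hypothetical example: an amplifier with a low supply and no output.
modules = [
    Module("supply_fault", "amplifier", 0, [lambda s: s["v_supply"] < 4.5],
           {"supply_low": True}, "measure_supply_ripple"),
    Module("output_stage_fault", "amplifier", 1, [lambda s: s["v_out"] == 0.0],
           {"no_output": True}, "inject_sine"),
    Module("healthy", "amplifier", 0,
           [lambda s: s["v_supply"] >= 4.5, lambda s: s["v_out"] > 0.0]),
    Module("link_fault", "modem", 0, [lambda s: True],
           {"link_down": True}, "loopback"),
]
features, stimuli = infer(modules, "amplifier", {"v_supply": 4.1, "v_out": 0.0})
```

In this example both fault modules are confirmed in Step 2, and the hierarchy of Step 3 retains only the supply fault, since a missing output can be a consequence of it.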
5. Failure mode recognition
At this stage, the diagnosis is viewed as a failure mode recognition task (Pau, 1981), applied to the attributed features $(X, A(X))$. This is the final selection stage, as the diagnostic inference expert system may well propose several alternate failure modes, including false alarms. The failure mode recognition is carried out by a domain dependent combined syntactic-semantic approach (Fu, 1983) based upon an attribute grammar with semantic decision rules. The grammar is $G = (V_N, V_T, P, S)$, where the production rules are of the type
$$N \to a, \qquad N \in V_N, \quad a \in (V_T \cup V_N)^*,$$
with attribute vector $A(a)$. These attributes are those derived in the diagnostic inference from the feature vectors $X$. The feature $X$, represented within the frame $F$, is applied to the semantic discriminant functions $H_i(\cdot)$, and $\hat{E} = E_i$ such that
$$H_i(X) = \max_{0 \le j \le N} H_j(X).$$
Alternative decision rules, and examples of attribute features $A(a)$, are given in Pau (1981). The alternative diagnoses to $\hat{E}$ are those failure modes whose attributes $A(a)$ yield values of $H_j(X)$ close to $H_i(X)$. The attribute grammar $G$ makes it possible to combine multisensor data in the frame $F$ processed by the expert system, coming from signals, images, text, and logical predicates (e.g. lengths, angles, texture features, amplitudes, boundary shapes, autoregressive features). By using attribute features $A(a)$ with continuous values here, we avoid the need to quantify the attributes $A(F)$ in Section 4. The semantic discriminants also assist in the sensor fusion, by allowing for statistical vs. syntactic tradeoffs in combining the attribute values. The list structure of Section 3.1 gives a simple grammar $G$ of regular type, for which grammatical inference should not be a major obstacle for a known list structure $F$.
6. Applications and conclusion
The above framework serves as a unification for a number of past or on-going projects in failure diagnosis and automatic imaging inspection. By unloading the domain dependent aspects into a few units, this approach has helped substantially in reducing the time required for the design of solutions to a number of practical problems. Also, the sensor fusion concept as implemented has so far led to quite impressive improvements in diagnostic performance and speed. As the knowledge base of metarules $S$ grows, more nondetection, false alarm, and false test generation cases will be eliminated. This same system, with changes only to the data base $I$ and the failure mode discriminant functions $H_i$, has been applied to the following practical cases:
(1) Integrated testing of integrated circuits, by combining imaging data and electrical tests $Y$ with diagnostic metarules $S$ and computer-aided layout data in $I$ (Pau, 1983).
(2) Diagnostics in distributed data communications networks: the measurement diversity here comes from differences in node equipment monitors, while the metarules $S$ must help with diagnostic inference over complex virtual network topologies.
(3) Design of a computer assisted maintenance training system for avionic equipment, where the student carries out the functions $T$ and $Y^a$ of Figure 4.
As of December 1983, the detailed status and further research directions were as follows:
- Knowledge representation (Section 3.1.1): finalized and tested in the above examples.
- Attribute selection (Section 3.1.2) and database (Section 3.2): problem dependent, and tested using the BASIS database system in an example.
- Diagnostic meta-rules (Section 3.3): several hundred consistent meta-rules have been written in PROLOG, and this knowledge base is still being extended slowly in relation with sponsored research projects for industry.
- Diagnostic inference (Section 4): extensions of the inference control structure are being considered, using a deductive retriever.
- Failure mode recognition (Section 5): implemented in dedicated examples, from manually constructed grammars $G$.
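
As a complement to the last item, the decision rule of Section 5 (select the failure mode whose discriminant $H_i(X)$ is maximal, and report as alternatives the modes whose discriminant values are close to it) can be sketched in Python. The linear discriminants, the feature vector and the closeness tolerance below are hypothetical placeholders; the paper's semantic discriminants operate on attribute grammar features and are not specified in enough detail to reproduce.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Discriminant = Callable[[Features], float]


def recognize(discriminants: Dict[str, Discriminant], x: Features,
              tolerance: float = 0.05) -> Tuple[str, List[str]]:
    """Return the mode with the largest discriminant value, together with
    the alternative modes whose values lie within `tolerance` of it."""
    scores = {mode: h(x) for mode, h in discriminants.items()}
    best = max(scores, key=lambda mode: scores[mode])
    alternatives = [mode for mode, score in scores.items()
                    if mode != best and scores[best] - score <= tolerance]
    return best, alternatives


# Hypothetical discriminants H_0, H_1, H_2 over a two-component feature vector.
H: Dict[str, Discriminant] = {
    "E0": lambda x: 1.0 - x[0] - x[1],   # no-failure mode
    "E1": lambda x: x[0],
    "E2": lambda x: x[1],
}
diagnosis, alternatives = recognize(H, (0.62, 0.60))
```

Here the closeness criterion is an absolute difference with an arbitrary tolerance; the paper only states that alternatives have discriminant values close to the maximum, so any such threshold is an assumption.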
References
Davis, R. (1982). Diagnosis based on description of structure and function. Proc. National Conf. on AI, Pittsburgh, PA, 137-142.
Fu, K.S. (1983). A step towards unification of syntactic and statistical recognition. IEEE Trans. Pattern Anal. Machine Intell. 5 (2), 200.
McDermott, D. and R. Brooks (1982). Arby: Diagnosis with shallow causal models. Proc. National Conf. on AI, Pittsburgh, PA, 137-142.
Merry, M. (1983). APEX-3: An expert system shell for fault diagnosis. GEC J. of Research 1 (1), 39.
Pau, L.F. (1981). Failure Diagnosis and Performance Monitoring. Marcel Dekker, New York.
Pau, L.F. (1982). Failure diagnosis systems. Proc. IMEKO World Congress, Berlin, Acta IMEKO 1982, 37-50.
Pau, L.F. (1983). Integrated testing, and algorithms for visual inspection of semiconductor IC's. IEEE Trans. Pattern Anal. Machine Intell. 5 (6), November.