Copyright © IFAC Distributed Intelligence Systems, Varna, Bulgaria, 1988
APPLICATION OF AI TECHNIQUES IN CAR ENVIRONMENTS
P. Camurati, M. Mezzalama and P. Prinetto
Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Turin, Italy
Abstract. This paper presents the introduction of Artificial Intelligence (AI) techniques into the field of Computer-Aided Testing (CAT) and Computer-Aided Repairing (CAR). Such techniques are useful since it is increasingly important to improve the diagnostic capabilities of Automatic Test Equipment (ATE), and AI seems a suitable response. The overall goal is to set up Test/Repair environments which guarantee not only high diagnostic capability, but also high productivity. The system presented in this paper is based on a loop Tester → Repair Station → Tester. In a conventional approach, data coming from a Repair Station are not used to improve the Tester's performance. In our solution, data are first validated by the Tester and subsequently processed in order to gather new knowledge about real cases. Such knowledge is learned by the Tester to improve its symptom interpretation capabilities. Two methodological aspects are relevant in this approach: the knowledge the Tester receives from the Repair Station and the knowledge the Tester itself needs to learn from experience. It is consequently necessary to organize knowledge in suitable knowledge bases and to use it with a proper reasoning mechanism. AI techniques are used to achieve both goals: on the one hand they allow us to formalize the problem in a rigorous way, on the other hand they facilitate the generation of an open system, in which the introduction of new diagnostic rules is easy and where real feedback from the industrial testing environment is possible. Thus, AI contributes to the improvement of diagnostic capabilities in a twofold way: it helps the ATEs in overcoming symptom ambiguity and in learning from experience how to modify the fault finding procedures.
1 Introduction
The environment of testing, diagnosis, and repair plays a key role in the optimization of electronic board production. Within this environment the choice of the testing strategy strongly depends on factors such as production volumes, the devices used, and available financial investments [Benn82], [Bren82], [Sche85], [Sche86]. Unfortunately, the advent of Very Large Scale Integration (VLSI) components and the introduction of new technologies for device mounting, e.g., Surface Mounted Technology (SMT), are making testing problems harder and harder. The same considerations apply to the diagnosis and repair phases. All these causes drive the continuous growth of testing costs for devices, boards, and systems.

In realistic models of typical environments, test and repair processes are represented by queues (fig. 1). Boards enter the test and repair center at a given flow and join a queue of boards waiting for test. If a board passes the test, it moves to the output queue and leaves the center. Otherwise it moves to a conveyer that transfers it to a repair station. After repair, the board moves back to the tester queue, where it joins the other boards waiting for test. It is worth noticing that a failing board will cycle between test and repair until either it passes the test or it is scrapped. Different methods exist for estimating center size, throughput, cycle time, number of test and repair stations, and queue lengths. The majority of them are based on stochastic process theory and are aimed at optimizing the production process [Bren82].

Figure 1: A queue-based model (blocks: tester, output queue, repair queue, analysis, repair)

In addition, faults are often repetitive and a skilled operator is able to isolate the fault on the basis of his or her own experience. Generally speaking, diagnosis and repair problems are extremely difficult and time consuming, but they are often effectively solved by highly skilled people. These problems seem to be an ideal application domain for Artificial Intelligence solutions [Mull84], [Robi84], [Wilk84], in particular in the form of Expert Systems. Today's knowledge based systems implement sophisticated reasoning schemes and provide facilities for introducing empirical rules. Furthermore, they address the problem of allowing the user to modify the rules, updating the knowledge source. The domain of test generation, and mainly the diagnosis and repair phases, seems to be an interesting area for the application of knowledge based systems.

In this paper we present an attempt to apply conventional AI concepts to improve productivity in the test/repair loop. The first section of the paper presents the repair environment: how this task is performed and what contributions can come from graphics, networking, and AI. Then the system is presented, with particular emphasis on knowledge bases. The issue of self-learning with both functional and in-circuit testers and its implementation in real cases conclude the article.
2 The repair environment
During the last decade a significant evolution in repair environments occurred, which increased rework efficiency and effectiveness and reduced repair time. Essentially we moved from manual diagnostic/repair functions to computer-based repair stations. Two specific factors drove this innovation: the use of graphics at the repair station and networking.

Traditionally, in the manual approach, the tester's operator attaches a tag, containing the fault list, to the printed circuit boards (PCBs) that do not pass the test. In the case of in-circuit testers, the fault ticket identifies the shorted traces and/or the failed components. With functional testers using guided probe diagnostics, the failure message identifies a node (or a set of nodes) which will be assumed as a starting point to trace back. When a guided diagnostic is not applied, the message indicates one or more components that have been hypothesized to be faulty by means of a diagnostic procedure which uses either a fault dictionary or similar information items.

At the repair station the rework operator reads the tag and tries to identify the faulty component(s) and/or the short(s), with the aid of board documentation. Having determined the fault location, the operator replaces the component(s) and/or eliminates the short(s). Failure messages are often ambiguous and it is the fundamental job of the operator to interpret them. The ability to identify and remove the faults on PCBs is essentially a function of "the resolution and accuracy of the message, the operator's skill and experience with the specific PCB" [Bate85].

Rework efficiency is usually expressed as a percentage of rework effectiveness. As an example, a 90% rework effectiveness means that, given 100 PCBs identified as faulty, 10 of them will still fail when re-tested after repair. Typical values of rework effectiveness are 90% for in-circuit testers and 85% for functional ones.
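As a rough illustration of what these figures imply for the test/repair loop, the following sketch assumes (the paper does not state this) that every repair attempt succeeds independently with probability equal to the rework effectiveness. The function names are hypothetical.

```python
def still_failing(boards: int, effectiveness: float, passes: int = 1) -> float:
    """Expected number of boards still failing after `passes` repair attempts.

    Assumption: each attempt succeeds independently with probability
    `effectiveness` (a simplification, not a model taken from the paper).
    """
    return boards * (1.0 - effectiveness) ** passes


def expected_repair_passes(effectiveness: float) -> float:
    """Mean number of repair attempts per faulty board (geometric distribution)."""
    if not 0.0 < effectiveness <= 1.0:
        raise ValueError("effectiveness must be in (0, 1]")
    return 1.0 / effectiveness


if __name__ == "__main__":
    # 100 faulty boards, 90% effectiveness: about 10 fail again at re-test.
    print(still_failing(100, 0.90))
    # Functional testers at 85%: mean number of repair passes per faulty board.
    print(expected_repair_passes(0.85))
```

Under this simplified model, a drop from 90% to 85% effectiveness raises the mean number of repair passes per faulty board from about 1.11 to about 1.18, which is the kind of recycling the learning mechanism aims to reduce.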
Strategies to improve rework effectiveness include CAR stations with graphic capabilities and, more recently, the adoption of AI techniques. CAR stations consist of computer systems, normally connected to testers via point-to-point connections or Local Area Networks (LANs), with limited mass storage and a colour graphic display monitor. This, among other functions, provides a simpler way to identify fault locations on PCBs. For instance, when the failure message refers to a component, that component is outlined on the screen using special colours. In addition, at the bottom of the screen, a message usually appears to provide detailed information concerning the faulty component. The short-term history (if any) of the board is also displayed.

Even though experience has shown the effectiveness of the graphic support, the role of the human expert remained essential. We decided to move to AI in order to investigate the possibility of transferring the knowledge from skilled operators to computers. It has in fact been noted that the rework effectiveness increases with the specific knowledge the operator has of a given board. After a sufficient amount of time spent repairing the same type of PCB, an operator is able to deduce and learn the rules to interpret the message, avoiding ambiguities.

The use of AI techniques is not limited to diagnosis: they could also be used effectively to improve production planning and real time production monitoring. The latter possibility is of great importance in the loop repair environment, since it permits flexible production scheduling, together with the possibility of detecting any distortion, or error, in both the production line and the test system. AI techniques also allow the ATEs to learn from experience, modifying diagnostic messages as new knowledge is gathered on line from PCB processing.

The necessary condition for providing this kind of facility is to work in a networked environment. The system we have designed can be implemented equally well on either of two distinct networking structures, both based on LANs. The first architecture is shown in fig. 2 and implements a hierarchical structure. A minicomputer (host) works as data concentrator and job dispatcher. ATEs can be connected to the host via both a high speed data link and RS232 low speed connections. CAR stations use another high speed data link to communicate with the supervisor.

Figure 2: A Hierarchical LAN (ATE LAN and CAR LAN)

Both the ATE LAN and the CAR LAN can work independently, thus guaranteeing a higher reliability of the system. The management functions of the test and repair area are concentrated in the host, which provides automatic logging of test data and repair activity, real time monitoring of ATEs and CAR stations and, more generally, system statistics. It also provides automatic downloading of test and repair programs.

The second architecture (fig. 3) is based on a traditional LAN, whose nodes are both ATEs and CAR stations. One of the computers connected to the LAN can also operate as a file server and perform some monitoring functions.
Figure 3: A Flat LAN

3 Knowledge bases
As previously seen, two main problems must be considered in the management of the test and repair environment. The former is mainly related to the need for continuously monitoring the overall process, in order to detect and correct systematic errors at their very early occurrence. The latter arises from the necessity of improving the repair phase, by trying to overcome the intrinsic symptom ambiguity of ATEs. Both problems require "intelligent" decisions, based on a powerful handling of both the "experiences" acquired during the process life and the knowledge of the characteristics of the overall environment.

These considerations naturally lead us to improve the classical test and repair process by the addition of a knowledge based system (fig. 4). The goal of the system is to provide the repair station with the list of repair actions to be taken. The system is rule-based and uses the standard Prolog interpreter to represent knowledge and to perform inference. From the point of view of knowledge representation, the traditional distinction between long-term and short-term knowledge is partially blurred by the presence of three kinds of rules and facts: those related to a specific board, those related to a class of boards, and finally those related to the overall diagnosis process. Rules and facts for classes of boards play the role of long-term knowledge with respect to single boards under test and of short-term modifiable knowledge with respect to self-learning and the overall diagnosis process.
For each PCB under test, the knowledge-based system receives from the ATE:
• an item identifying the board under test
• the symptom list P
• the list of previous repair actions, if any.

The system has sets of rules, some specific to each class of PCBs and some general-purpose. These rules are used to interpret the facts, producing proper diagnostic messages and suggesting repair actions. They are used to build the mapping function F, which yields a set of possibly faulty components C starting from the facts provided by the ATE, in particular the symptom list P.

Figure 4: The System

Repair actions for boards of the same class are used by the system to detect systematic errors in the overall process and to avoid endless loops of a given board in the test/repair environment. In practice, such control is performed by the knowledge based system through a set of rules provided by the system manager. These are stored in the "process control rules" file. Rules to control the test and repair loop of each board and to detect systematic errors both in ATEs and in production lines are provided. An example of a rule is:

    if (this is the 3rd time that this board passes through the repair station) then scrap it.
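The example rule can be sketched in a few lines. The paper's system expresses such rules in Prolog; the Python below is only an illustrative stand-in, and the class name, method name, and return values are hypothetical.

```python
from collections import Counter

MAX_REPAIR_PASSES = 3  # threshold taken from the example rule above


class ProcessControl:
    """Counts repair-station passes per board and applies the scrap rule."""

    def __init__(self, max_passes: int = MAX_REPAIR_PASSES):
        self.max_passes = max_passes
        self.repair_passes = Counter()  # board identifier -> passes so far

    def on_repair_entry(self, board_id: str) -> str:
        """Record one pass through the repair station and return the action."""
        self.repair_passes[board_id] += 1
        if self.repair_passes[board_id] >= self.max_passes:
            return "scrap"   # 3rd pass (by default): stop the endless loop
        return "repair"


if __name__ == "__main__":
    control = ProcessControl()
    for _ in range(3):
        print(control.on_repair_entry("board-0042"))
```

Keeping the threshold as a parameter reflects the idea that such rules are supplied by the system manager rather than fixed in the program.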
3.1 Learning: improving diagnostic capabilities
After repairing the same kind of board, a skilled operator gains a certain expertise and becomes able to interpret ambiguous messages delivered by the ATE. The Expert System could emulate such behaviour, learning from experience how to produce correct diagnostic messages. During operation the tester produces for each board a set P of symptoms corresponding to the diagnostic codes of those test patterns which provide wrong results. These symptoms are interpreted by the tester itself, in order to generate the set C of "faulty components". This second step corresponds to the application of a mapping function F(.), such that:

C = F(P)

The function F(.) strongly depends on the kind of ATE, and the expert system can modify it on the basis of the acquired experience, i.e., on the basis of the results of the repair actions. To do this, the following entities must be known for each board:
• the set P,
• the set C,
• the set R, containing the result of the repair action on a single board.

The knowledge of the set P is essential, since the function F(.) is not one-to-one, in the sense that the same set C of faulty components may be obtained starting from different sets P of wrong test patterns. Thus, the expert system (fig. 5) performs two tasks:
• for each board it accepts as input C and P and provides a new set C* of faulty components ("filtering task");
• it updates, by means of ad hoc rules, the function F(.), on the basis of the acquired experience.

In the real world, the general scheme can be detailed as in fig. 5, in which the internal organization of the ATE from the point of view of symptom interpretation has been outlined. Within the architecture of the global system, the implementation strongly depends on the kind of ATE, being related to the complexity of F(.) and of the generation and updating module. In the real world implementation of the system, two different solutions have been adopted, depending on the actual ATE type and, more precisely, on whether the evaluation rules of the ATE are accessible and modifiable ("dynamic tester") or not ("fixed tester").
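To make the role of P concrete, the sketch below shows one possible filtering task: repair results R are stored per symptom set P, so that two boards with the same C but different P can receive different C*. This is a minimal illustration, not the authors' implementation; ranking C by confirmed repairs is an assumption, and all names are hypothetical.

```python
from collections import defaultdict


class SymptomFilter:
    """Learns, per symptom set P, which replaced components fixed the board."""

    def __init__(self):
        # history[P][component] = number of repairs confirmed by a passed re-test
        self.history = defaultdict(lambda: defaultdict(int))

    def learn(self, P, replaced, passed_retest):
        """Store one repair result R: replaced components and re-test outcome."""
        if passed_retest:
            for component in replaced:
                self.history[frozenset(P)][component] += 1

    def filter(self, P, C):
        """Return C*: C ordered so that components confirmed for this P come first.

        Ties are broken alphabetically to keep the output deterministic.
        """
        seen = self.history.get(frozenset(P), {})
        return sorted(C, key=lambda c: (-seen.get(c, 0), c))


if __name__ == "__main__":
    f = SymptomFilter()
    f.learn({"t1", "t2"}, ["U7"], passed_retest=True)
    f.learn({"t3"}, ["U2"], passed_retest=True)
    # Same C, different P: the suggested order differs.
    print(f.filter({"t1", "t2"}, {"U2", "U7"}))
    print(f.filter({"t3"}, {"U2", "U7"}))
```

Keying the history on P rather than on C is what allows the filter to separate cases that F(.) maps to the same component set.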
Figure 5: A Real-World Model

3.2 Implementation within an in-circuit tester environment

Since the repair operator must be provided with a list of diagnostic messages, i.e., a list of faulty components to be replaced, and the ATE provides a list of symptoms, the mapping function F assumes the form of a "Symptom Probability Matrix" (SPM), characterized by as many columns as the board devices and as many rows as the possible symptoms. SPM_{j,i} is the probability that, having received the symptom S_j, the device D_i is faulty.

If we assume that the i-th test module M_i tests exhaustively the i-th device, the SPM becomes a diagonal matrix. Seen from another point of view, this means that each test module is assumed to have no intersection with the other ones. Unfortunately, this is not the real case, since many factors combine to invalidate this assumption. Among others, we simply recall the possibility that a given physical fault, such as a stuck-at on a wire of a bus, can induce malfunctioning in all the devices connected to the bus itself. Such faults can cause symptoms to be generated for each of those devices. Obviously, replacing them with new devices will not solve the problem, since the fault is elsewhere. Another important factor is the unreliability of the test module M_i: written to test the device D_i, it actually tests it only partially and, in addition, it may test another device D_j.

From a practical point of view, these considerations lead to the conclusion that the SPM is neither diagonal, nor does it contain probabilities all equal to one. In fact, a symptom S_i can identify a fault in the device D_i with a given probability P(S_i, D_i), a fault in the device D_j with a given probability P(S_i, D_j), and so on, in such a way that:

$$\sum_{k=1}^{N} P(S_i, D_k) = 1$$

where N is the number of devices on the board.

This modification of the SPM must be carried out on the basis of the acquired experience. In fact, on the basis of the results obtained by the repair actions performed when, in the past, the same set of symptoms had been provided by the ATE for boards of the same class as the current one, it must be possible to select the new repair actions that have the highest probability of effectiveness. It is then possible, as a consequence, to modify the contents of the SPM.

In the case of fixed ATEs, such as in-circuit ones, since it is not possible to introduce the SPM within the tester, it is necessary to "filter" the symptom list provided by the ATE itself into the proper diagnostic message sent to the repair station. The system operates on the basis of both the contents of the SPM and a set of filtering rules. These rules include the strategy to properly filter the global symptom set provided by the ATE. The knowledge based system takes care of two main tasks: the former is the dynamic modification and updating of the SPM on the basis of acquired experience; the latter is offering the user the possibility of updating the set of rules contained in the filtering rule file. A consistency check is performed any time the rules are modified.
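One way to picture a Symptom Probability Matrix that starts diagonal and is then updated from repair experience is sketched below. The paper does not give the update rule; estimating each row from counts of confirmed repairs is an assumption, and the class and method names are hypothetical.

```python
class SymptomProbabilityMatrix:
    """SPM row per symptom: estimated probability that each device is faulty."""

    def __init__(self, target_of):
        # target_of maps each symptom to the device its test module was
        # written for. One initial count on that device reproduces the
        # ideal diagonal matrix before any experience is acquired.
        self.devices = sorted(set(target_of.values()))
        self.counts = {s: {d: 0 for d in self.devices} for s in target_of}
        for symptom, device in target_of.items():
            self.counts[symptom][device] = 1

    def record_repair(self, symptom, device):
        """Count one confirmed case: replacing `device` cleared `symptom`.

        Raises KeyError for a symptom or device unknown to the matrix.
        """
        self.counts[symptom][device] += 1

    def row(self, symptom):
        """Return the normalized row; its probabilities sum to 1."""
        total = sum(self.counts[symptom].values())
        return {d: n / total for d, n in self.counts[symptom].items()}

    def most_likely(self, symptom):
        """Device with the highest estimated probability for this symptom."""
        probs = self.row(symptom)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    spm = SymptomProbabilityMatrix({"S1": "D1", "S2": "D2"})
    print(spm.row("S1"))              # diagonal start: all weight on D1
    for _ in range(3):
        spm.record_repair("S1", "D2")  # experience: S1 was really caused by D2
    print(spm.row("S1"), spm.most_likely("S1"))
```

Normalizing each row at read time keeps the row-sum constraint satisfied after every update without a separate renormalization step.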
3.3 Implementation within a functional tester environment

In the case of functional testers, the function F(.) is more complex. In fact, the generation of the set C of faulty components is performed starting from the symptom dictionary generated through simulation, on the basis of the results P of wrong test patterns, and resorting to pattern matching functions based on maximum similarity criteria. As a consequence, it is much better to use the knowledge based system to directly modify the symptom evaluation rules used by the symptom evaluator of the ATE. In this case, the organization of the resulting system is shown in fig. 6. In this situation ("intelligent ATE") the filter does not work in real time on the symptoms generated by the ATE. Instead, it stores the facts at the end of each repair process, interprets repair results and, off line, modifies the rules to produce diagnostic messages.
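Matching against a symptom dictionary by maximum similarity can be sketched as follows. The paper does not specify the similarity measure; Jaccard similarity between sets of failing test patterns is assumed here purely for illustration, and the function names and dictionary layout are hypothetical.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_matches(observed, dictionary):
    """Return (best score, sorted components whose entry reaches that score).

    `dictionary` maps each component to the set of test patterns expected to
    fail when that component is faulty, e.g. as produced by fault simulation.
    """
    if not dictionary:
        return 0.0, []
    observed = frozenset(observed)
    scores = {c: jaccard(observed, frozenset(p)) for c, p in dictionary.items()}
    best = max(scores.values())
    return best, sorted(c for c, s in scores.items() if s == best)


if __name__ == "__main__":
    symptom_dictionary = {
        "U1": {"t1", "t2", "t3"},
        "U2": {"t2", "t3"},
        "U3": {"t7"},
    }
    print(best_matches({"t2", "t3"}, symptom_dictionary))
    print(best_matches({"t1", "t2"}, symptom_dictionary))
```

Returning every component that reaches the best score, rather than a single winner, keeps the ambiguity visible so that learned rules or the repair operator can resolve it.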
Figure 6: A Functional Tester

The process of symptom generation may thus be considered as a pattern matching process, aimed at finding the best match between the symptom list of a given board and the patterns stored in memory and created using the fault dictionary.

4 Conclusions

The methodologies presented in this paper have been applied in the management of the error messages generated by both the Wiring Pattern Verifier (WPS) for bare printed circuits and in-circuit testers for complete boards and equipment. In order to improve the productivity of the overall loop Test → Repair → Test, "intelligent" computer-based repair stations have been implemented. The adopted solution has proved to be valid, since valuable benefits have been gained in many areas, and specifically:
• Improvement of the diagnostic capabilities of the testers, on the basis of the global experience progressively acquired through all the technicians engaged in repairing.
• Possibility of transferring the acquired experience to technicians who have just started repair activities, without requiring long training periods.
• Lowering of the mean number of recycles caused by failures in the repair actions.
• Consequent lowering of the mean repair time.
• Introduction of a real time feedback, used to improve the quality of both the global production process and specific test programs.

The system has been implemented resorting to different programming languages, with Prolog used for knowledge representation and inference. Future trends include the introduction of guided fault isolation. Similar systems are the MIND system, developed by Teradyne and devoted to system testing [Wilk84], and the TDMS (Test Data Management System) developed by GenRad [Will83].

References

[Bate85] Bateson, J 'In-circuit testing,' Van Nostrand Reinhold, 1985.
[Benn82] Bennetts, R C 'Introduction to digital board testing,' Crane Russak, 1982.
[Bren82] Brendan, D 'The economics of Automatic Testing,' McGraw-Hill, 1982.
[Mull84] Mullis, R 'An Expert System for VLSI Tester Diagnostics,' IEEE International Test Conference 1984, pp.196-199.
[Robi84] Robinson, G D 'Artificial Intelligence and Testing,' IEEE International Test Conference 1984, pp.31-40.
[Sche85] Scheiber, S F 'Economics of testing: deciding whether to test and how much it will cost,' Test and Measurement World, February 1985, pp.110-130.
[Sche86] Scheiber, S F 'Cost effective PCB testing,' Test and Measurement World, February 1986, pp.31-40.
[Wilk84] Wilkinson, A J 'A method for test system diagnostics based on the principles of artificial intelligence,' IEEE International Test Conference 1984, pp.188-195.
[Will83] Williams, R W 'Considerations While Introducing a Test Data Management System to the Factory Floor,' IEEE International Test Conference 1983, pp.220-225.