Copyright © IFAC Economics and Artificial Intelligence, Aix-en-Provence, France, 1986
USING AUTOMATED TECHNIQUES TO GENERATE AN EXPERT SYSTEM FOR R&D PROJECT MONITORING
S. I. Gallant* and R. Balachandra**
*College of Computer Science, Northeastern University, Boston, MA 02115, USA
**College of Business Administration, Northeastern University, Boston, MA 02115, USA
ABSTRACT: Most expert systems require the actual involvement of an expert for constructing the system, in spite of the fact that a large amount of data may actually exist. The MACIE process, by contrast, is a completely automated method for constructing an expert system. An area where considerable data exists, and which is an appropriate domain for an expert system, is that of monitoring the progress of commercial R&D projects. These projects are scrutinized very carefully for their technical and commercial potential before large amounts of resources are committed for their developmental phase. In spite of such scrutiny many projects are abandoned before they are successfully completed, as they appear to be heading towards failure. If such failing projects are identified early, scarce resources could be diverted to other more viable projects. This paper describes an application of the MACIE process to derive an expert system to help in monitoring on-going R&D projects. The knowledge base for the expert system was developed entirely from empirical data collected from over one hundred actual R&D projects in the developmental stage.
KEYWORDS: Expert System, Decision Support Systems, Machine Learning, Linear Discriminant, MACIE, Monitoring R&D Projects
Acknowledgment: Thanks to Dave Glaubman and Mark Frydenberg for useful comments. Partially supported by a grant from the Northeastern University Research and Scholarship Development Fund.
I. Introduction
There are a number of software environments available for constructing expert systems. Practically all such systems depend upon an expert to specify knowledge in the form of IF-THEN rules, a process which can be very hard, expensive and time consuming. The most difficult task in creating an expert system is constructing such a knowledge base.
An alternative approach is to generate the knowledge base directly from data. The MACIE (MAtrix Controlled Inference Engine) process is ideally suited for such an approach. The MACIE process starts with the definition of the important factors and the dependency relationships for the problem at hand. Then a sufficient number of cases are gathered which will serve as training examples for a program that generates the expert system knowledge base. The cases are then processed by machine learning techniques to yield a knowledge base consisting of two parts: a matrix of integers (called the Learning Matrix) and a collection of variable names and associated questions for eliciting values for the variables. See Figure 1.
Figure 1: MACIE Style Expert System (the end user supplies answers to questions; the Matrix Controlled Inference Engine, MACIE, returns questions, conclusions, and explanations).
This knowledge base structure is different from the IF-THEN rules commonly employed in standard expert systems. It is designed to be automatically generated from training examples, whereas systems with IF-THEN rules must be hand crafted by human experts. Automatic generation of the knowledge base saves time, expense, and human resources, and can be applied where there is data but no available human expert.
Once the knowledge base has been created, it may be used by a general purpose expert system inference engine called MACIE. The result is a true expert system which does inferencing (forward chaining), seeks out important unknown information (backward chaining), and gives IF-THEN rules to justify its conclusions even though there are no IF-THEN rules in the knowledge base.
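The forward and backward chaining behavior described above can be sketched on a single row of the learning matrix. The code below is a minimal illustration under stated assumptions, not MACIE's actual implementation: the function names (`bounds`, `infer`, `next_question`) are hypothetical, and the coefficients are those of the worked example in this section.

```python
from typing import Dict, Optional, Tuple

# One row of a learning matrix: a bias plus one integer coefficient per variable.
# The numbers come from the worked example in this section.
BIAS = -1
COEFS: Dict[str, int] = {"V1": -2, "V2": 4, "V3": -3, "V4": -2, "V5": 1}


def bounds(known: Dict[str, int]) -> Tuple[int, int]:
    """Return the lowest and highest values the discriminant can still reach.

    `known` maps variable names to +1 (True) or -1 (False). Variables that are
    absent are undetermined and may still end up at either value.
    """
    current = BIAS + sum(COEFS[v] * x for v, x in known.items())
    slack = sum(abs(c) for v, c in COEFS.items() if v not in known)
    return current - slack, current + slack


def infer(known: Dict[str, int]) -> Optional[bool]:
    """Forward chaining: conclude only when the sign can no longer change."""
    low, high = bounds(known)
    if low > 0:
        return True
    if high < 0:
        return False
    return None  # still undecided (hypothetical choice: 0 counts as undecided)


def next_question(known: Dict[str, int]) -> Optional[str]:
    """Backward chaining: pick the undetermined variable with the largest |coefficient|."""
    unknown = [v for v in COEFS if v not in known]
    return max(unknown, key=lambda v: abs(COEFS[v])) if unknown else None


if __name__ == "__main__":
    known = {"V2": 1, "V5": -1}          # V2 is True, V5 is False
    print(bounds(known), infer(known))   # (-5, 9) None
    print(next_question(known))          # V3
    known["V3"] = -1                     # the user answers that V3 is False
    print(bounds(known), infer(known))   # (1, 9) True
```

The sketch handles only input variables; in the process described here, a variable that is not an input would be resolved by applying the same steps recursively to its own row.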
The following example demonstrates the basics of MACIE's operation.
Suppose the row in the learning matrix for variable $V$ represents the discriminant:
$$(-1) + (-2)V_1 + (4)V_2 + (-3)V_3 + (-2)V_4 + (1)V_5$$
where $V$ is dependent upon variables $V_1 \ldots V_5$. Suppose also that $V_2$ is known to be True, and $V_5$ is known to be False. All known variables are represented internally by +1 for True and -1 for False, so the current value for the discriminant is
$$-1 + (4)(1) + (1)(-1) = 2$$
After $V_1$, $V_3$, and $V_4$ are determined, the ultimate value for the discriminant may be greater than 0 or less than 0, so we cannot yet infer the value for $V$. Since $V_3$ has the largest coefficient (in absolute value) among unknown variables, we seek its value (backward chaining). If $V_3$ is an input variable, we can simply ask the user for its value. Otherwise we look at the discriminant for $V_3$ and continue the process recursively. Suppose for simplicity that $V_3$ was an input variable and that the user tells us its value is False. This brings the current value of the discriminant to 2 + 3 = 5. We can now conclude that variable $V$ is True, since the discriminant will be greater than 0 regardless of the values of the remaining unknown variables ($V_1$ and $V_4$ can lower it by at most 2 + 2 = 4, leaving at least 1). This is the way forward chaining works in MACIE.
If the user asks for an explanation, we can justify our conclusion with the rule: "If $V_2$ is True and $V_3$ is False Then Conclude $V$ is True." Further details of how MACIE works are given in [Gallant 1985a, 1985b]. This paper describes the application of the MACIE process to derive an expert system for monitoring an ongoing commercial R&D project for its potential for success. The expert system itself is derived from empirical data collected from a large number of actual R&D projects, using an automated technique for generating the coefficients for the expert system.
II. Commercial R&D Projects
It is necessary, first of all, to develop an understanding of how an R&D project moves through its different stages, from its conception until successful commercialization. A typical sequence in a large firm consists of the following stages:
i) The Idea Generation Stage
ii) The Demonstration Stage
iii) The Development Stage
iv) The Test Market Stage
v) Full Scale Market Introduction.
These stages are described briefly in the following paragraphs.
An R&D project usually starts as an idea with one person. If the idea seems to have potential for commercial exploitation, or to provide some internal economies in the manufacture of other items, it generates some interest among the managers. At this stage a small project team may be established to explore the idea further. In the next stage, economic information about the new product (data on its market potential, its price elasticity, its manufacturability, and an estimate of the production costs) will be collected. In accordance with the anticipated market size, estimates will also be made about the capacity required and the investment needed for the plant. A business plan will be prepared using these pieces of information, highlighting the revenues, costs and profits from the new product.
Additional information regarding potential competition, anticipated economic environment, and possible adverse government actions will also be reviewed. If the project meets some minimum requirements it will be approved for development.
The project then moves to the third stage, where a number of technical and engineering problems have to be solved and many details worked out. During this stage there may be changes in the organization and the environment affecting the success of the product. Some changes may make the project unattractive. It is likely that a project will be terminated at this stage even though many resources have already been spent.
In the test market stage the product is introduced into a small region. If it proves successful here, the decision is made to introduce the product in the national market. The promotion and advertising plans are developed for this venture. The required production capacity and distribution channels are established.
There are, however, many individual variations in these stages. Small companies may not go through these stages systematically. In some cases, even large companies may skip a few stages if a real 'winner' of an idea comes around or a top executive with much prestige is strongly behind the project.
III. Evaluation of R&D Projects
There are a number of points at which a project has to qualify before moving to the next stage. The intensity and difficulty of evaluation increase as the project moves through the different stages. In this paper we will be mainly concerned with the monitoring and evaluation during the development stage (stage iii above) of the project.
The development stage of the project is one of the most difficult stages to manage. The project starts with a great deal of optimism and uncertainty. As it progresses, the uncertainty may become larger, and the optimism about successfully completing the project may decrease. At some stage, a decision may have to be made to terminate the project.
In a pioneering study Dean [1968] examined 40 companies and described the procedures and practices used to terminate R&D projects in those firms. Other studies, Rubenstein [1976], Project Sappho [1971], Buell [1967], Holzmann [1972], Murphy and others [1974], and Cooper [1980], identified a large number of factors influencing success at the project level. Balachandra and Raelin [1980] suggested that the termination decision of an R&D project in the developmental phase be based upon an intuitive discriminant function. This idea was explored in a study of over 100 R&D projects. In this paper we describe an expert system for predicting the outcome of an R&D project using the MACIE process.
IV. The RDCHECK Expert System
The MACIE process was used to develop the RDCHECK expert system. We first identified 45 variables which were considered to be important for the decision to terminate or continue an R&D project. The data on these variables were obtained from actual projects from previous research [Balachandra 1986]. At this stage, we chose a simple dependency model, where the final outcome of the project was influenced by all the 45 variables. We also converted the values for the variables into a trinary form: true, false, or unknown. This implied that questions had to be developed which would result in such trinary responses for all the variables. It should be noted, however, that this is not a restriction for the application of the process.
The data in the modified form were then processed by a learning program. The learning program produced a matrix of linear discriminants. Refer to Gallant [1985a, 1985c] for a more detailed description of the process involved. This learning matrix was then aggregated with the names of the variables and the questions to produce the knowledge base. The knowledge base was combined with the general purpose inference engine (MACIE) to form the RDCHECK expert system.
In the RDCHECK system, the user is first prompted for any initial values for any of the variables. If the information supplied is sufficient to deduce whether the project will succeed or fail, the program terminates and informs the user about the likely outcome. If, however, the information supplied at the beginning is not sufficient to form a conclusion, the program finds the unknown variable whose value is most important for reaching a conclusion.
The appropriate question for this variable is then displayed and the system tries to reach a conclusion after the user responds. If the system still cannot reach a conclusion, the process cycles with other variables until a conclusion can be made.
At any point during this process, the user can ask why a particular question was asked or why a particular inference was made. In the former case, the system will explain why, and in the latter case, the system provides its justification in the form of an IF-THEN rule.
An example of the application of the RDCHECK system is shown in the Appendix. It can be noticed that, in general, not all variables need be known to reach a conclusion.
For the set of data used for developing the RDCHECK system, the system accurately predicted 89% of the cases. The accuracy could be increased by generating intermediate variables, but this might be counterproductive by giving undue influence to noisy data points. Noise is likely to be present as many of the responses were subjectively generated.
V. Comparison of the MACIE Process With Other Automated Tools
There are a number of commercially available tools for generating expert systems from examples. Most of them, such as Expert-Ease™ and RuleMaster™, generate decision trees using algorithms similar to Quinlan's ID3 Algorithm [Quinlan 1983]. A different approach is taken by TIMM™, as described later on in this section.
Systems based upon decision trees have a number of disadvantages. They ignore initial information, causing inconvenience to the user. For example with R&D monitoring, one might have initial information on 15 of the 45 variables while information for the others might not be so readily available. The known information could be enough to reach a decision, yet it would be of no use to a decision tree unless the variables at the top of the tree were among these 15 variables. MACIE style systems, on the other hand, take full advantage of initial information. More importantly, decision tree based systems tend to be more brittle than discriminant based systems and, viewed as methods for knowledge representation, they are not as powerful [Gallant 1985d]. Details of these arguments are beyond the scope of this paper. The main advantage of a decision tree based system is that its working is immediately apparent, whereas the working of a MACIE style system can be harder to understand by inspection.
An attempt to generate an expert system for R&D project monitoring using Expert-Ease™ was unsuccessful as it could not handle the number of variables required for the system.
A different approach to expert system generation is taken by TIMM™. This system retains training examples and does run time comparisons to find the training example closest to a given situation. The disadvantage of such a system is the potential for huge databases and slow operation if there are a large number of training examples. By contrast the run time speed and storage requirement of MACIE do not depend upon the number of training examples. It should be noted that TIMM™ has some generalization capability based upon combining similar examples. (We have not attempted to implement the R&D monitoring system in TIMM™.) A disadvantage of MACIE with respect to TIMM™ is
that MACIE generally takes a much longer time to generate a knowledge base for a large number of training examples. However, it should be noted that knowledge base generation is performed once, while the run time evaluation has to be done as many times as there are new cases. Thus MACIE is more efficient for actual operation of an expert system.
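The run time argument in this section can be illustrated with a toy comparison. The sketch below is illustrative only: it is not TIMM's or MACIE's actual code, the Hamming distance used for "closest example" is an assumption, and the cases and weights are hypothetical toy values.

```python
from typing import List, Sequence, Tuple

# A case is a tuple of trinary inputs (+1 True, -1 False, 0 unknown) plus an outcome.
Case = Tuple[Sequence[int], bool]


def nearest_example(cases: List[Case], query: Sequence[int]) -> bool:
    """Example-based lookup: every stored case is scanned for each query,
    so both storage and run time grow with the number of training examples."""
    def distance(a: Sequence[int], b: Sequence[int]) -> int:
        # Assumed similarity measure: number of positions that differ.
        return sum(1 for x, y in zip(a, b) if x != y)

    _, outcome = min(cases, key=lambda case: distance(case[0], query))
    return outcome


def discriminant(weights: Sequence[int], bias: int, query: Sequence[int]) -> bool:
    """Linear discriminant: one multiply-add per variable, regardless of how
    many training examples were used to produce the weights."""
    return bias + sum(w * x for w, x in zip(weights, query)) > 0


if __name__ == "__main__":
    cases: List[Case] = [((1, 1, -1), True), ((-1, -1, 1), False), ((1, -1, -1), True)]
    weights, bias = (2, 1, -1), 0  # hypothetical weights, not learned from data
    for query in [(1, 1, 1), (-1, -1, 1)]:
        print(query, nearest_example(cases, query), discriminant(weights, bias, query))
```

Only `nearest_example` needs the training cases at run time; `discriminant` needs just the weights, which is the point made above about speed and storage.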
VI. Conclusions and Future Research
The RDCHECK expert system for R&D project monitoring, though still under evaluation, appears to be a useful system for an important problem. Such an expert system is of interest in its own right. Of even wider interest is the ability to generate expert systems automatically from data. The MACIE process for expert system generation is applicable to domains such as forecasting, administrative control, decision support systems, and especially diagnostic systems. We are currently evaluating a MACIE style system for diagnosing infantile diarrhea, as well as one for making opening bids at bridge. Data driven problems that are noisy and redundant (i.e. real-world problems) seem most suitable for these techniques.
Future research with RDCHECK will involve incorporating intermediate states into the model and further tests of its usefulness in an industrial setting. Research with the MACIE process currently centers around extending and improving the machine learning algorithms.
Finally, an important caveat is that the projects identified as potential failures using RDCHECK or any other scheme should be scrutinized in great detail before making the decision to terminate.
REFERENCES
Balachandra, R. (1986). Signals for R&D Project Success. To appear in Handbook for Technology Management.
Balachandra, R. & Raelin, A. (1980). How to Decide When to Abandon a Project. Research Management, 1980, 23, 24-29.
Buell, C. D. (1967). When to Terminate a Research and Development Project. Research Management, 1967, 10, 275-281.
Cooper, R. G. (1980). How to Identify Potential New Product Winners. Research Management, 1980, 23, 10-19.
Dean, B. V. (1968). Evaluating, Selecting, and Controlling R&D. (1968) American Management Association, New York, NY.
Duda, R. O. & Hart, P. E. (1973). Pattern Classification and Scene Analysis. (1973) John Wiley & Sons, New York.
Gallant, S. I. (1985a). Automatic Generation of Expert Systems From Examples. Proceedings of Second International Conference on Artificial Intelligence Applications, sponsored by IEEE Computer Society, Miami Beach, Florida, Dec. 11-13, 1985.
Gallant, S. I. (1985b). Matrix Controlled Expert System Producible from Examples. Patent Pending 707,458.
Gallant, S. I. (1985c). Optimal Linear Discriminants. Technical Report SG-85-30, Northeastern University College of Computer Science. (To appear: Eighth Int. Conf. on Pattern Recognition, Paris, France.)
Gallant, S. I. (1985d). Brittleness. Technical Report SG86-33, Northeastern University College of Computer Science.
Holzmann, R. T. (1972). To Stop or Not-The Big Research Decision. Chemical Technology, 1972, 2, 81-89.
Murphy, D. B., Baker, B. N., Fisher, D. (1974). Determinants of Success. (1974) Boston College, Chestnut Hill, Ma.
Rubenstein, A. H., Chakrabarti, A. K., O'Keefe, R. D., Souder, W. E., & Young, H. D. (1976). Factors Influencing Success at the Project Level. Research Management, 1976, 14, 15-20.
Project Sappho (1971). Center for the Study of Industrial Innovation. On the Shelf. London, 1971.
Quinlan, J. R. (1983). Learning Efficient Classification Procedures and their Application to Chess End Games. In Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (Eds.), Machine Learning, (1983) Tioga Pub. Co., Palo Alto, Ca.
APPENDIX: Sample run of RDCHECK Expert System
MACIE Version 2.0 (c) 1986 S. I. Gallant
Enter initial values for T, I, or G variables.
Format: Variable number, value, e.g. 1T 3F
Uninitialized variables are set to 0 (UNKNOWN)
Numbers and names of variables:
1: Respondent is R&D Manager
2: Respondent is Planning Manager
3: Respondent is Engineering Manager
4: Respondent is Marketing Manager
5: Respondent is Advanced Technology Manager
6: Respondent is Assistant to the President
7: Respondent is the Vice President
8: Project ahead of schedule
9: Project staff increased
10: Project Budget increased
11: Product in Infancy Stage of life cycle
12: Product in Growth Stage of life cycle
13: Product in Maturity Stage of life cycle
14: Product in Obsolescence Stage of life cycle
15: Appearance of new products
16: Increased profitability of the product
17: Increased Return on Investment (ROI) of the product
18: Increased Probability of Commercial Success
19: Probability of Technical Success greater than 0.8
20: Availability of Capital
21: Increase in number of End Uses for the product
22: Meeting milestones on time
23: Favorable Disposition of the interest groups
24: Newly enacted favorable Government regulations
25: Newly enacted favorable International regulations
26: Probability of a competing product appearing greater than 0.8
27: Occurrence of a favorable chance event
28: Presence of internal competition for resources
29: Favorable internal competition
30: Alliance with other ongoing projects
31: Alliance with Corporate activities and objectives
32: Association between the commercial and technological aspects
33: Top Management Support
34: R&D Management Support
35: Commitment of project staff
36: Pressure on project leader
37: Availability of expertise in the technology
38: Presence of a project champion
39: Appearance of project champion in the beginning
40: Appearance of project champion in the middle
41: Appearance of project champion towards the end
42: Presence of project champion throughout the project
43: R&D mngt. perception of project mngt. commitment high
44: R&D mngt. perception of project mngt. adaptability high
45: R&D mngt. perception of project mngt. influence high
46: R&D mngt. perception of project mngt. effectiveness high
47: Project will succeed
C
12t .. Ot )
36. Is there a good deal of pressure on the project leader?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
21. Have the number of planned end uses increased since the last review?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
18. Has the probability of Commercial success increased since the last review?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
33. Is there Top Management Support?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
9. Is the number of project staff larger than at the beginning of the project or at the time of last review?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
34. Is there R&D Management Support?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
32. Is there a significant association between the commercial and technological aspects of the project?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
43. Is the perception of project management's commitment by R&D management good?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
20. Is the probability of capital being made available for the project at least 0.8?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
25. Have any helpful international government regulations been enacted?
--> y)es, n)o, u)nknown, ?)explain, i)nformation
CONCLUDE: (47) Project will succeed TRUE.
Would you like to try a new case?
--> y)es, n)o, ?)explain, i)nformation on vars.
't' for explanation of variable number t (E.g. '3')
Project will succeed is TRUE due to the following rule:
IF 36. Pressure on project leader is TRUE
AND 1. Respondent is R&D Manager is TRUE
AND 21. Increase in number of End Uses for the product is FALSE
AND 12. Product in Growth Stage of life cycle is TRUE
AND 18. Increased Probability of Commercial Success is TRUE
AND 41. Appearance of project champion towards the end is FALSE
AND 11. Product in Infancy Stage of life cycle is FALSE
AND 33. Top Management Support is TRUE
AND 32. Association between the commercial and technological aspects is TRUE
AND 43. R&D mngt. perception of project mngt. commitment high is TRUE
AND 38. Presence of a project champion is TRUE
AND 25. Newly enacted favorable International regulations is UNKNOWN
AND 40. Appearance of project champion in the middle is TRUE
AND 7. Respondent is the Vice President is FALSE
AND 39. Appearance of project champion in the beginning is FALSE
AND 4. Respondent is Marketing Manager is FALSE
THEN CONCLUDE Project will succeed is TRUE
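A justification rule of this kind can, in principle, be read off a discriminant after a conclusion has been reached. The sketch below shows one plausible way to do so and is an assumption rather than MACIE's documented procedure (the paper defers details to [Gallant 1985a, 1985b]): it greedily collects the known values that support the conclusion, strongest first, until they force the conclusion on their own. The names `explain` and `format_rule` are hypothetical, the coefficients are those of the small example in Section I, and conditions of the form "is UNKNOWN", which appear in the rule above, are not modeled.

```python
from typing import Dict, List, Tuple

# Discriminant row from the worked example in Section I (bias plus coefficients).
BIAS = -1
COEFS: Dict[str, int] = {"V1": -2, "V2": 4, "V3": -3, "V4": -2, "V5": 1}


def explain(known: Dict[str, int], conclusion: bool) -> List[Tuple[str, bool]]:
    """Return a subset of known values that by itself forces `conclusion`.

    `known` maps names to +1 (True) or -1 (False). Returns an empty list if
    no supporting subset is sufficient.
    """
    sign = 1 if conclusion else -1
    # Known variables whose contribution pushes toward the conclusion, strongest first.
    helpful = sorted(
        (v for v, x in known.items() if sign * COEFS[v] * x > 0),
        key=lambda v: abs(COEFS[v]),
        reverse=True,
    )
    used: Dict[str, int] = {}
    for v in helpful:
        used[v] = known[v]
        value = BIAS + sum(COEFS[u] * x for u, x in used.items())
        # Worst case: every variable outside the rule takes its least helpful value.
        slack = sum(abs(c) for u, c in COEFS.items() if u not in used)
        if sign * value - slack > 0:
            return [(u, x == 1) for u, x in used.items()]
    return []


def format_rule(conditions: List[Tuple[str, bool]], conclusion: bool) -> str:
    """Render the conditions as an IF-THEN sentence for the goal variable V."""
    body = " and ".join(f"{name} is {value}" for name, value in conditions)
    return f"If {body} Then Conclude V is {conclusion}."


if __name__ == "__main__":
    known = {"V2": 1, "V5": -1, "V3": -1}
    print(format_rule(explain(known, True), True))
    # If V2 is True and V3 is False Then Conclude V is True.
```

Note that V5, although known, is left out of the rule because its value works against the conclusion; this matches the two-condition rule quoted in Section I.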