Artificial intelligence in chemistry

Analytica Chimica Acta, 210 (1988) 9-32. Elsevier Science Publishers B.V., Amsterdam. Printed in The Netherlands.



N.A.B. GRAY
Department of Computing Science, University of Wollongong, P.O. Box 1144, Wollongong, NSW 2500 (Australia)
(Received 20th June 1987)

SUMMARY The techniques and tools of artificial intelligence are reviewed. Chemical problems can serve as test domains for the development of new artificial intelligence methods. Artificial intelligence techniques can also be applied to the solution of practical problems in chemistry. Applications in chemistry include problems where it is necessary to encode chemical expertise, and where chemistry is merely an additional domain to which standard artificial intelligence techniques can be applied. Programs exploiting encoded chemical expertise can assist in solving problems of structure elucidation, synthesis planning, and experiment design. Robotics methods based on artificial intelligence and expert systems can enhance the performance of chemical instrumentation. Systems understanding natural language could improve the handling of chemical information, and artificial intelligence techniques may extend the capabilities of computer-assisted instruction.

Winograd [1] has suggested two quite different starting points to define artificial intelligence (AI): the dream and the technology. The dream is represented by the goal of duplicating human intelligence; the technology consists of the techniques, such as heuristic search and use of propositional representations, that distinguish the field from others in computer science. The study of general principles of intelligence (the stuff of Winograd’s dream) is the hardest part of AI and has the least short-term payoff. There are many prerequisites that must be satisfied before significant advances can be made; computers (the “artificial intelligences”) must be better able to perceive and act in the real world, must be capable of learning from what has been perceived and the results of their actions, and must possess some internal representation of their world. Currently, research into intelligence per se is limited. Advances in a number of sub-disciplines of AI (Fig. 1) are helping to establish the groundwork for an eventual study of intelligence. Newell [2] argues that work in robotics has a long-term import for AI that far exceeds current work on expert systems. Work on knowledge representations, learning systems (and intelligent tutoring systems), and cognitive models all helps to establish principles for the development of intelligent mechanisms.

0003-2670/88/$03.50 © 1988 Elsevier Science Publishers B.V.

Fig. 1. The diverse discipline of artificial intelligence.

Current activity, and financial support, is concentrated on the technical side of AI. Early development in AI led to problem-solving techniques such as heuristically guided state-space search, and automated theorem-proving in the predicate calculus (with a form of search procedure providing the computational basis of the theorem-prover). Search techniques are expensive. Heuristics can be used to limit the scope of the search, and to guide the search process; however, heuristics are often of limited efficacy and, sometimes, may exclude valid solutions from the scope of the search. The classic techniques of AI are computationally weak; unlike algorithmic processes, they cannot guarantee to find solutions in a finite time. However, in the last decade, there has been an upsurge of interest in these techniques. This increased interest is a consequence of the revolutionary change in hardware size and costs experienced in computing generally, rather than of any improvement to the techniques. It is now economically feasible to apply heuristic search and related techniques to an enormous range of practical applications.

Successful application of the weak techniques of AI requires that the problem domain be well formalized and limited, i.e., a closed “microworld” that can be analyzed in isolation. Thus, systems for understanding natural language work when restricted to microworlds, e.g., the “blocks” world of the SHRDLU program [3], or the “world” of stereotyped stories analyzed by the Script Applier Mechanism [4]. Programs that plan sequences of actions have been most successful in highly restricted situations, such as planning for a single robot in a fixed environment (as with the STRIPS and ABSTRIPS programs) [5]. Expert systems, the most obvious example of the burgeoning interest in AI technology, again succeed best when applied to well formalized and restricted problems taken from isolated microworlds [6].
MYCIN [7] and XCON (R1) [8] are the prototypical expert systems. MYCIN-like systems deal with diagnostic/classification problems where the system starts with some fixed number of predefined goal states (classifications) and searches for evidence that could identify the classification appropriate for a particular case. The XCON-like systems take data characterizing a particular case and search forward, through chains of rules, to establish inferences, or use the data as constraints when applying rules that perform actions. A great deal of the knowledge currently encoded in maintenance manuals, decision trees and flow charts can be re-expressed through simplistic expert systems. This re-expression may be advantageous if a computer is already a component part of the system being maintained and if necessary data can be acquired automatically.

Another aspect of AI technology is “exploratory programming”. In most areas of science, initial ideas are formalized, tested, developed into theories and only then are programs written to solve computational tasks in the domain of the new theories; such programs are fully specified in advance and developed by using conventional software engineering approaches. In fields related to AI, the initial idea may take the form of a program, with its testing and formalization achieved through cycles of program modification and development. Researchers in AI have developed special programming environments designed to support this style of exploratory programming. These AI programming environments, commonly implemented as very high-level languages embedded in Lisp, offer elaborate interactive front-ends to interpretive-language systems and debuggers. Once again, the change in hardware costs makes it now economically feasible to apply these development environments to many more applications [9]. These development environments are now commercially available, as are numerous, more restricted, expert system building tools and shells.

Chemistry (primarily, organic structural chemistry) and AI can relate in two ways.
First, chemical data can be used as the basis for a “microworld” AI study; for example, chemical data can be processed by an experimental “learning” system. Several such studies have been reported. Thus, the Bacon learning program [10] has “rediscovered” relationships like the ideal gas laws; the Bacon program takes data relating, say, pressure, volume and temperature for a gas and performs a search for some relationship amongst these variables. Carbon-13 n.m.r. data [11] and mass spectral data [12] have served as a test domain for the “Version Space” model of concept formation [13]; again, the learning program performs a search for “concepts” (conjuncts of a set of attribute/value pairs) that are consistent with given data. The use of AI techniques for the solution of practical chemical problems represents the second and, for the chemist, the more interesting link between AI and chemistry. In some cases, it is the chemical aspects of a problem that are at issue. Thus, heuristic state-space search techniques have been utilized in synthesis planning programs such as SYNCHEM [14]. The “Plan-Generate-Test” paradigm of the structure elucidation programs such as CASE [15],


CHEMICS [16], DENDRAL [17], SEAC [18], and STREC [19], represents another variation of standard AI search techniques as applied to a particular chemical problem. The prediction of spectral properties of candidate structures, created by these structure elucidation programs, is achieved by using specialized “situation→action” rule interpreters analogous to forward-chaining expert systems. The chemical expertise of programs such as SECS [20] is again encoded in the form of “situation→action” rules. Most of these chemical applications necessarily involve the representation, analysis and manipulation of stereochemically defined structures; these specialized processing requirements limit the applicability of commercial expert system programs.

Chemistry can also serve as an application domain for more general AI-based techniques. Thus, rule-based systems can be devised for diagnosing faults in chemical analytical instrumentation, and for suggesting optimal run conditions in cases where there are many interacting control parameters; commercial expert system shells can be effective for these problems. Standard robotics techniques can be applied in the (chemical) laboratory to extend the range of automated analytical procedures. Natural language processing can be used in programs that attempt to enhance the performance of (chemical) information retrieval systems. Intelligent tutoring systems could be utilized in chemical education.

TECHNIQUES OF ARTIFICIAL INTELLIGENCE

Search
State-space search procedures are applied in many AI problem-solving and planning programs [21]. A problem is defined in terms of a given state (situation in the AI microworld), a goal state or goal criterion, and a set of operators that define how transitions can be made between states. Operators are described by a set of preconditions (that determine whether they apply in a given state) and by the changes to state-variables that they effect. Classic AI examples of state-space search are problems such as the “Missionaries and Cannibals”. As illustrated in Fig. 2, organic chemical synthesis constitutes a much more interesting example of the use of state-space search techniques. In the synthesis problem, the states correspond to molecular structures; the initial state represents the compound to be synthesized; there is no unique goal state, instead there is a goal criterion (e.g., “this compound is available in the Aldrich catalog”) that can be applied as a test to any newly generated states. The operators that effect state changes are chemical transforms; these are characterized by preconditions (features that must match in the target structure to be synthesized) and bond changes that define the structural differences between target and precursor structures. A state-space can be searched exhaustively by depth-first or breadth-first algorithms [22]. Usually, the state-space is too large for exhaustive search to


Fig. 2. A state-space search. Steps: (1) use a ditoxo group transform; (2) use conjugation transform; (3) use dehydration of the β-hydroxy compound.

be applicable. Instead, heuristics must be used in an attempt to limit the scope of the search. “Best-first” search is the simplest of the heuristically guided search algorithms. A score must be computed for each state as it is generated; this score is an empirical estimate of how close a particular state is to one that satisfies the goal criterion. States are kept ordered according to these heuristic scores, and when the search algorithm next needs to generate new states, it considers only operators that apply to the previously unprocessed state that has the best current score. In a chemical synthesis problem, the heuristic scoring function could use empirical measures of the presumed difficulty of synthesis of a particular structure; these measures might include counts of functionality, estimates of steric congestion, counts of stereocenters, and so forth. The SYNCHEM [14] program uses some rather simple empirical measures of structural complexity to guide its best-first search for a synthesis.

Declarative representations
Artificial intelligence programs manipulate facts and symbolic relationships, not numerical values. If they are to be manipulated by a program, facts and relations need to be represented as some form of data structure. For example, a single fact might be represented by a triple (object, attribute, value). Another representation might utilize property-lists to collect together all the attributes and values of a particular object [23]. Relations are defined in terms of named links between objects; these named links often are held in the property-lists of the related objects. “Frame” data structures may be used to define more complex objects that possess numerous attributes and relation-links; as well as having slots to hold values for attributes, a frame data-structure may also have a description of the range of values permitted to an attribute, or may


identify functions that can be used to compute a value for an attribute should it be required. Triples, property-lists, and frames are all dynamic data-structures that have to be created as necessary and may grow to arbitrary size. Such data-structures are most readily manipulated in languages such as Lisp [23] and Prolog [24] that employ dynamically created list structures as the basis for all data-structures. When development of a program is complete, its requirements for data-structures can be re-analyzed; then, the program can usually be converted to use a conventional procedural language such as C. However, during development, dynamic list manipulations and automatic garbage-collection facilities are of significant value; consequently, Lisp and Prolog are the major development languages for AI programs, for these languages allow the programmer to defer commitment to particular data structures and access methods [9]. Lisp programmers can select triples, property-lists, frames, or application-defined data structures. These can be optimized according to the way that they will be processed by the Lisp program. Lisp-code can also be represented and manipulated as data; this can be useful in a situation such as where it is necessary to define a test-function that is to be associated with a slot in some frame data structure (see Lenat’s AM program [25] for some applications of this technique). Figure 3 shows a chemical structure and a Lisp-list representation. The programmer has chosen that the first element of the list structure is the compound name, the second is the number of atoms, and the third is a sublist that defines the types and connections of these atoms. At some later stage in processing, this list representation of the molecule might be augmented by additional data such as a representation of the symmetry group.
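The three representations named above (triples, property-lists and frames) can be sketched briefly. The following Python fragment is an illustration only, not code from any of the systems cited; every name in it (add_fact, property_list, Slot, Frame, the atom labels) is hypothetical.

```python
# Illustrative sketch; all names below are hypothetical.

# 1. A fact as an (object, attribute, value) triple.
triples = set()

def add_fact(obj, attribute, value):
    triples.add((obj, attribute, value))

add_fact("atom1", "element", "C")
add_fact("atom1", "bonded-to", "atom7")   # a named link expressing a relation

# 2. A property-list collects every attribute and value of one object.
def property_list(obj):
    plist = {}
    for o, attribute, value in triples:
        if o == obj:
            plist.setdefault(attribute, []).append(value)
    return plist

# 3. A frame: each slot holds a value, may restrict the permitted values,
#    and may name a function that computes a missing value on demand.
class Slot:
    def __init__(self, value=None, allowed=None, compute=None):
        self.value = value
        self.allowed = allowed
        self.compute = compute

class Frame:
    def __init__(self, name):
        self.name = name
        self.slots = {}

    def define(self, slot_name, **kwargs):
        self.slots[slot_name] = Slot(**kwargs)

    def put(self, slot_name, value):
        slot = self.slots[slot_name]
        if slot.allowed is not None and value not in slot.allowed:
            raise ValueError(f"{value!r} not permitted in slot {slot_name!r}")
        slot.value = value

    def get(self, slot_name):
        slot = self.slots[slot_name]
        if slot.value is None and slot.compute is not None:
            slot.value = slot.compute(self)   # computed only when required
        return slot.value

atom = Frame("atom1")
atom.define("element", allowed={"C", "N", "O", "H"})
atom.define("neighbors", value=["atom2", "atom6", "atom7"])
atom.define("degree", compute=lambda frame: len(frame.get("neighbors")))
atom.put("element", "C")
```

Here the "degree" slot has no stored value until it is first requested, which mirrors the idea of a function attached to a slot; the "allowed" set plays the role of the permitted range of values.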
The Lisp routines used to process the structures are coded with explicit reference to the lists and sublists of the chosen representation. A procedure to identify doubly-bonded atoms would select the sublist defining the atoms and their connections, and then search for atoms that had duplicate entries in their neighbors list. The Prolog representation, also shown in Fig. 3, emphasizes a uniform declarative style. All factual data are represented in terms of Prolog clauses with a “head” but no “body”; examples are the facts that predicate “c(X)” (i.e., “X is a carbon atom”) is asserted as true for atoms 1-6. Prolog’s uniform representation of facts makes some forms of data processing very simple. Doubly-bonded atoms could be identified by asking the Prolog interpreter to answer the question ?bondorder(X,Y,2); the neighbors of a given atom z would be listed by the Prolog interpreter in response to the queries ?bondorder(z,X,_) and ?bondorder(X,z,_).

Fig. 3. A chemical structure with possible Lisp and Prolog representations. [Structure drawing not reproduced; the legible parts of the two representations follow.]

Lisp (sublist of atom types and connections):
    ((C (2 6 7 7)) (C (1 3 3)) (C (2 2 4)) (C (3 5)) (C (4 6)) (C (1 5)) (O (1 1)))

Prolog:
    c(1). c(2). c(3). c(4). c(5). c(6). o(7).
    bondorder(1,7,2). bondorder(1,2,1). bondorder(2,3,2). bondorder(3,4,1).
    bondorder(4,5,1). bondorder(5,6,1). bondorder(1,6,1).
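The two styles of Fig. 3 can be compared in a short Python sketch. The atom and bond data below are transcribed from the figure as far as it can be read, so they should be treated as an assumption; the function names are hypothetical. The first form mirrors the Lisp neighbor lists (a neighbor listed twice denotes a double bond), the second mirrors the Prolog bondorder facts.

```python
# Neighbor-list form: atom number -> (element, neighbors).
# A neighbor that appears twice is doubly bonded.
ATOMS = {
    1: ("C", [2, 6, 7, 7]),
    2: ("C", [1, 3, 3]),
    3: ("C", [2, 2, 4]),
    4: ("C", [3, 5]),
    5: ("C", [4, 6]),
    6: ("C", [1, 5]),
    7: ("O", [1, 1]),
}

def double_bonds_from_lists(atoms):
    """Search each neighbor list for duplicate entries."""
    pairs = set()
    for atom, (_element, neighbors) in atoms.items():
        for other in set(neighbors):
            if neighbors.count(other) == 2:
                pairs.add((min(atom, other), max(atom, other)))
    return sorted(pairs)

# Fact form: one (atom, atom, order) tuple per bondorder fact.
BOND_ORDER = [
    (1, 7, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1),
    (4, 5, 1), (5, 6, 1), (1, 6, 1),
]

def double_bonds_from_facts(facts):
    """Counterpart of the query ?bondorder(X,Y,2)."""
    return sorted((min(x, y), max(x, y)) for x, y, order in facts if order == 2)

def neighbors(facts, z):
    """Counterpart of the two queries ?bondorder(z,X,_) and ?bondorder(X,z,_)."""
    return sorted([y for x, y, _ in facts if x == z] +
                  [x for x, y, _ in facts if y == z])
```

The list-based routine has to know where the neighbor sublists sit and how double bonds are encoded, whereas the fact-based routines are simple filters over uniform tuples, which is the contrast the text draws between the Lisp and Prolog styles.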

Rules and rule interpreters
Structural organic chemistry can claim some credit for helping establish the current style of rule-based programming. In the early days of the DENDRAL project, it was necessary to encode rules that could be used to predict mass and ¹H-n.m.r. spectra for simple acyclic monofunctional compounds. The first attempts at encoding the necessary chemical knowledge involved Lisp-programmers attempting to encode, as special-purpose routines, their understanding of rules described by chemists; these attempts were unsuccessful. However, it was noted that the basic structure of each of the chemists’ rules consisted of a definition of a substructure together with a description of some associated spectral features. Rather than have special Lisp code for each rule, it was possible to apply a generalized rule-matcher (see Fig. 4) that graph-matched the substructures, defined in the rules, onto a structure; if the substructure matched, then the corresponding spectral feature was predicted [17]. This approach to the organization of chemical knowledge proved successful. The chemical context is complex. The rule-interpreter must match a substructure (representing the rule premise) onto a particular target structure.

Fig. 4. A rule-based prediction system. [Diagram not reproduced: a library of rules relating methyl-containing substructures (CH3-N, CH3-CH2-, CH3-CH<, CH3-C<) to 3-proton singlet, doublet or triplet resonances in stated ppm regions is applied by a rule-matcher (“inference engine”) to a structure (the problem data).]

More usually, data characterizing a problem are represented simply as a set of (object, attribute, value) triples; the premise clauses of rules specify a conjunct of triples that must match data characterizing the example problem. The rule-interpreter required for handling triples is much simpler, and more general, than the specialized graph-matchers used in the DENDRAL systems [17]. The MYCIN project [7], which grew out of the earlier DENDRAL project, sought to exploit more general applications of the (object, attribute, value) model, with rules specifying premise clauses in terms of triples, and a generalized rule-matching interpreter. A rule-based interpreter requires a rule-matching algorithm, and libraries of domain-specific rules. Additionally, the interpreter requires a prescription of how to select those rules to be applied, and a method of soliciting necessary problem-data. Systems devised for different applications differ greatly in these respects. In the DENDRAL spectral predictors [17,26,27], it was appropriate to apply all rules, and their order of application was immaterial. The data consisted of definitions of structures that were to be tested; all data were available before the rule system was run. The problem-data were not changed by execution of the rules; the only effect of rules being executed was the output of predicted spectral properties. The MYCIN system [7] provides a form of consultancy service that can advise physicians on microbial therapy. Data, characterizing a patient, are obtained by the system asking questions of the user. The final rule outputs are recommendations of therapy; intermediate rules add inferred facts to the problem data (for example, successful execution of one rule might add the fact that the particular patient under consideration must have meningitis). In a consultancy system, it is important to keep interactions with the user focused on a particular topic.
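A rule-interpreter for triples, of the simple kind described above, can be sketched in a few lines of Python. This is a hedged illustration rather than the DENDRAL or MYCIN code: the rule format, the "?s" variable convention, the function names and the two toy rules are all assumptions made for the example. As in the apply-all regime described for the spectral predictors, every rule is tried, the problem data are never modified, and the order of the rules therefore does not matter.

```python
def unify(pattern, fact, binding):
    """Return an extended binding if the pattern matches the fact, else None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            if new.setdefault(p, f) != f:   # variable already bound elsewhere
                return None
        elif p != f:
            return None
    return new

def match(premise, facts, binding=None):
    """Yield each variable binding that satisfies the whole conjunct of triples."""
    binding = {} if binding is None else binding
    if not premise:
        yield binding
        return
    for fact in facts:
        extended = unify(premise[0], fact, binding)
        if extended is not None:
            yield from match(premise[1:], facts, extended)

def substitute(clause, binding):
    return tuple(binding.get(t, t) if isinstance(t, str) else t for t in clause)

def apply_all(rules, facts):
    """Apply every rule; the facts are read but never changed."""
    conclusions = set()
    for premise, action in rules:
        for binding in match(premise, facts):
            conclusions.add(substitute(action, binding))
    return conclusions

# Hypothetical problem data and rules, loosely modelled on Fig. 4.
FACTS = {
    ("signal1", "protons", 3), ("signal1", "multiplicity", "triplet"),
    ("signal2", "protons", 3), ("signal2", "multiplicity", "singlet"),
}
RULES = [
    ([("?s", "protons", 3), ("?s", "multiplicity", "triplet")],
     ("?s", "suggests", "CH3-CH2-")),
    ([("?s", "protons", 3), ("?s", "multiplicity", "singlet")],
     ("?s", "suggests", "CH3-C<")),
]
```

A consultancy system in the MYCIN style would differ in two ways that this sketch omits: conclusions would be added back to the problem data, and missing data would be requested from the user.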
MYCIN uses a backward-chaining, goal-directed strategy to determine the order in which rules are applied. The backward-chaining strategy works by: (1) selecting a particular bacterium-disease diagnosis as current goal, (2) identifying the rules that provide evidence for that goal, (3) analyzing the premise clauses of these rules. These premise clauses specify tests on (object, attribute, value) triples; if the value of some attribute is not yet known, the system establishes a recursive sub-goal to determine the value. Values may be determined through chains of inference or may have to be requested from the user. (Code of a simplified MYCIN-like system is provided in the text by Winston and Horn [23].) The forward-chaining procedures used in XCON [8] and similar programs start with data characterizing the problem. In the case of XCON, these data define a set of components for construction of a computer system; the problem involves first checking that the set includes all necessary components, and then grouping components that are to be placed in the same cabinet or are to be attached to a particular bus. Potentially applicable rules are identified by


matching the premise parts of rules to the existing data. The OPS5 language [28], in which XCON is implemented, has elaborate facilities for identifying potentially applicable rules and for choosing which rule to apply when several are applicable. When rules are executed, data elements may be added to the problem data base (as when additional components are specified) or removed from the data base (as when several components are packaged into a larger unit). The changes made by the execution of a rule will enable/disable the execution of other rules. The process is continued until no further changes can be effected. A typical MYCIN rule is

    PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
                   (SAME CNTXT MORPH ROD)
                   (SAME CNTXT AIR ANAEROBIC))
    ACTION:  (CONCLUDE CNTXT IDENTITY BACTEROIDES TALLY .6)

This rule tests the GRAM, MORPH and AIR attributes of the object identified by CNTXT; if all these attributes have appropriate values, the rating of BACTEROIDES (in the IDENTITY attribute of the object) is upgraded. (If the Gram-stain of the object was not known, MYCIN would prompt the user, using an appropriate predefined textual prompt, to enter the data during the evaluation of this rule.) MYCIN rules are simply Lisp data structures organized according to the needs of the rule-interpreter. Details of how the system performs its backward-chaining search are maintained in list structures created by the underlying interpreter (as illustrated in the Winston-Horn text). OPS5 rules are similar in general style. A rule is a list data structure (with a “p” as first element acting as a flag to show that the list represents a production rule) that contains the rule-name, some number of premise clauses, the “-->” symbol separating premise from action clauses, and a number of action clauses that make, modify, or remove elements from the problem data base. Rule syntax is slightly more complex than with MYCIN; an example OPS5 rule is

    (p example2
       (phase layout)
       (sbi ^place incabinet)
       {<thecabinet> (cabinet ^space {<s> > 0})}
       - (powersupply)
       -->
       (make powersupply)
       (modify <thecabinet> ^space (compute <s> - 1)))

This rule is applicable if the system is in the “layout” phase, a component (called an “sbi”) is in a cabinet that still has empty space, and there is no


available power supply. Execution of the rule adds a power supply and changes the amount of space left in the cabinet. A Prolog interpreter is a backward-chaining, goal-directed inference engine. Problems are given to Prolog as queries; for example, “?diagnose(Disease,patient1)” represents a request that Prolog find all disease(s) that satisfy the predicate “diagnose” in relation to patient 1. A Prolog “program” comprises statements of fact [e.g., air(bacteroides, anaerobic)] and rules (e.g., a rule specifying that the diagnosis D for patient P is to be established by first finding the infection I, then the bug B, and then completing the diagnostic process could be written as: diagnose(D,P) :- findinfection(I,P), findbug(B,P), finddiagnose(I,B,P,D); each new subgoal, like findinfection(I,P), would cause automatic invocation of other rules that could identify possible infections that could be matched to variable I and referenced later when identifying the bug). Rules can specify that data are to be requested from the user (e.g., grampos :- promptread([is the stain Gram-positive], [X|_]), X == yes), or may prescribe how a data value can be inferred from other factual information available to the system. Prolog’s underlying goal-directed inference system simplifies the implementation of basic expert systems for classification tasks. Complex problems, involving assessment of strengths of evidence for different hypotheses and so forth, are less well suited to Prolog.
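The goal-directed, backward-chaining strategy described for MYCIN and Prolog can be illustrated with a deliberately small Python sketch. It is not MYCIN's algorithm: certainty factors (the TALLY value) are omitted, there is a single object, and the rule table, the "stain-colour" attribute and the helper names are hypothetical. An attribute that no rule can conclude is treated as a datum to be requested from the user.

```python
# goal attribute -> list of (premise, concluded value); hypothetical rule table
RULES = {
    "identity": [
        ([("gram", "gramneg"), ("morph", "rod"), ("air", "anaerobic")],
         "bacteroides"),
    ],
    "gram": [
        ([("stain-colour", "pink")], "gramneg"),
    ],
}

def find_value(attribute, known, ask):
    """Establish a value for the attribute, setting up sub-goals recursively."""
    if attribute in known:
        return known[attribute]
    for premise, value in RULES.get(attribute, []):
        # Each premise clause is a sub-goal; all() stops at the first failure,
        # so no further questions are asked once a rule cannot succeed.
        if all(find_value(a, known, ask) == v for a, v in premise):
            known[attribute] = value
            return value
    if attribute not in RULES:
        known[attribute] = ask(attribute)   # no rule concludes it: ask the user
    else:
        known[attribute] = None             # rules exist but none succeeded
    return known[attribute]

def scripted_user(answers):
    """Stand-in for an interactive user; records the questions asked."""
    asked = []
    def ask(attribute):
        asked.append(attribute)
        return answers[attribute]
    return ask, asked
```

With the scripted answers {"stain-colour": "pink", "morph": "rod", "air": "anaerobic"}, a call find_value("identity", {}, ask) asks only the questions needed by the rules it is pursuing, in the order in which the sub-goals arise, which is how backward chaining keeps a consultation focused.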

Tools
The basic tools for AI-related work are Lisp [23] and Prolog [24] interpreters. Lisp and Prolog have been available on medium-sized computers (DEC PDP-10/20, VAX 11-780, etc.) for many years. Satisfactory versions are now also available for larger personal computers; for example, ExperTelligence [29] has reasonable implementations of both Lisp and Prolog for 1-Mbyte Macintosh computers. Any large AI system will require some coding of specialized routines in Lisp or Prolog. However, there is no need for workers to spend effort on duplicating user-interface routines, or rule-matching procedures; these parts of an expert system are now standardized. There are several small packages for use with Prolog interpreters; an example is “APES-Augmented Prolog for Expert Systems” [30]. APES provides a set of routines that simplify the problems involved in keeping track of facts already established, handling plausibility estimates for uncertain data, and querying of users for new factual data as needed. The OPS5 interpreter is readily available in the form of an overlay to Lisp [29].

In addition to the basic tools like OPS5, Lisp and Prolog, there are several “shells” and development systems available commercially. “Shells” are expert system construction kits that have been created by adapting code originally written for a specific expert system. The first such shell was EMYCIN [31]; this consisted of the code that implemented MYCIN’s goal-oriented backward-chaining search and user-interaction routines (i.e., it was MYCIN after emptying out the rules concerning bacteremia and meningitis). Shells provide an inference engine with some method for selecting rules (backward-chaining, forward-chaining, etc.), interaction routines, and often elaborate multi-window display management routines. Waterman [6] has listed the various shells and other development tools as available in 1986 from companies such as Teknowledge (M.1 and S.1 systems development tools and T.1 training system), Texas Instruments (the Personal Consultant language for rule-based and frame-based systems on microcomputers), and Intellicorp (KEE, a frame-based system derived from an early expert system for molecular genetics).

There are several more specialized tools. For example, “ES/P Advisor” is a limited expert system shell devised specifically for applications in which information from a technical manual is to be re-coded for use in a simplistic expert system. The ES/P Advisor is one of a number of packages released by the Alvey Directorate in the U.K.; the Alvey/IKBS Expert Systems Starter Pack [32] was assembled to allow companies to assess possible applications of expert system technology. The packages are all of relatively low cost and all use standard IBM PC systems. Programs such as ExpertEase and RuleMaster represent still another type of tool. The ExpertEase program can be used to develop a decision tree for a classification task in some domain where there is no existing theory (ExpertEase is included in the Alvey/IKBS package). The program is given data characterizing examples of known classification; the data are given in the form of tables of attributes and values. An inductive inference mechanism is used to derive a decision tree from these examples. Subsequently, unknown samples can be classified by the system. RuleMaster [33] is a second-generation version of ExpertEase that has provision for interfacing to other modules (e.g., data-acquisition modules).
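The idea of deriving a decision tree from a table of classified examples can be sketched as follows. The text does not describe the inductive mechanism itself, so this Python fragment uses an entropy-based split of the ID3 kind purely as an assumed stand-in; the attribute names and the four training rows are invented toy data, not results from any reported system.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy (bits) of the class labels in a list of (attrs, label) rows."""
    counts = Counter(label for _attrs, label in rows)
    total = len(rows)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def split(rows, attribute):
    groups = {}
    for attrs, label in rows:
        groups.setdefault(attrs[attribute], []).append((attrs, label))
    return groups

def build_tree(rows, attributes):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    labels = {label for _attrs, label in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attributes:   # nothing left to test: fall back to the majority class
        return Counter(label for _a, label in rows).most_common(1)[0][0]

    def remainder(attribute):
        # Expected entropy left after testing this attribute.
        groups = split(rows, attribute)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    return (best, {value: build_tree(subset, rest)
                   for value, subset in split(rows, best).items()})

def classify(tree, sample):
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[sample[attribute]]
    return tree

# Invented toy table of attributes and values with known classifications.
EXAMPLES = [
    ({"carbonyl_band": "yes", "broad_OH_band": "yes"}, "acid"),
    ({"carbonyl_band": "yes", "broad_OH_band": "no"}, "ketone"),
    ({"carbonyl_band": "no", "broad_OH_band": "yes"}, "alcohol"),
    ({"carbonyl_band": "no", "broad_OH_band": "no"}, "hydrocarbon"),
]
TREE = build_tree(EXAMPLES, ["carbonyl_band", "broad_OH_band"])
```

Once built, the tree is just nested data, which suggests why such a tree can be re-expressed as code in a conventional language; classify(TREE, sample) walks it for an unknown sample described by the same attributes.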
RuleMaster can encode its decision trees in the form of code in the C-language; this code can then be linked to other programs. Some chemical applications of the RuleMaster system have been reported [34,35]. Commercial AI shells and AI languages are strongly recommended to those experimenting with possible AI-based approaches to the analysis of data. However, it should be recognized that they may not always prove satisfactory. Problems requiring the identification of relations between classes and attributes may be better solved by conventional statistical techniques. AI tools can help to establish the feasibility of a particular approach to problem-solving; however, it is often appropriate for the final program to be recoded in a more conventional procedural language. Use of restricted rule-interpreters, or even of the underlying languages such as Prolog and Lisp, can make it difficult to encode special user-interfaces or to implement iterative procedures required in particular applications. In a typical recent example of a Prolog-based expert system [36], it was noted that only about 10% of the 4000 lines of Prolog


actually represent knowledge at a level intelligible to experts and useful for explanations.

EXPLOITING CHEMICAL EXPERTISE

Structure elucidation
Work on computer-assisted structure elucidation has been reviewed recently [37], and is the subject of other plenary lectures and focused discussions at this conference. Computer-assisted structure elucidation is a three-phase process. There is a planning phase in which structural constraints are inferred from available chemical and spectral data. An algorithmic generation phase follows in which all stereoisomers compatible with the given constraints are generated. In the third and final test phase, detailed structural/spectral relationships are used to predict properties for each generated candidate; a comparison of predicted properties and observed data is used to derive a plausibility score for ranking candidates. Other possible processing in the test phase includes the simulation of experiments that might be used to differentiate among the remaining candidate structures; through these simulations, it may be possible to determine the most effective sequence of experimental steps. Artificial intelligence techniques are applicable in both the planning and test phases. Their use in the test phase is simple. Spectral properties are predicted by using “substructure→spectral-feature” rules as illustrated in Fig. 4. For prediction systems, substructures are elaborate and, where appropriate, use definitions of configurational stereochemistry. A rule-matcher finds those substructures that can match a given candidate and determines the exact matching environment; the matching environment is then used in the action part of the rule to determine the appropriate spectral feature. In a system for predicting ¹³C-n.m.r. spectra, the action part of the rule merely asserts the expected occurrence of a signal in a particular region of the spectrum [27]. The utility of such a spectrum-predictor is limited by the size and quality of the library of the “substructure→¹³C-signal” rules that it uses.
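The test phase described above (predict signals for each candidate, compare with observed data, rank by a plausibility score) can be sketched in Python under strong simplifying assumptions. Real systems graph-match substructures onto candidate structures; here each candidate is simply given as a list of named carbon environments. The environment names, the shift ranges, the observed shifts and the scoring formula are all rough illustrative inventions, not a vetted rule library or measured data.

```python
# Hypothetical "substructure -> 13C-signal region" rules (ppm ranges).
RULES = {
    "C=O ketone": (195.0, 220.0),
    "CH= vinylic": (115.0, 155.0),
    "CH2 alkyl": (20.0, 45.0),
}

def predict(candidate):
    """Each environment in the candidate asserts a signal in some region."""
    return [RULES[environment] for environment in candidate]

def score(candidate, observed):
    """Fraction of predicted regions matched by a distinct observed shift.

    Greedy matching is a crude simplification chosen to keep the sketch short.
    """
    remaining = sorted(observed)
    hits = 0
    for low, high in predict(candidate):
        for shift in remaining:
            if low <= shift <= high:
                remaining.remove(shift)   # each observed signal is used once
                hits += 1
                break
    return hits / len(candidate)

def rank(candidates, observed):
    """Order candidate names by decreasing plausibility score."""
    return sorted(candidates, key=lambda name: score(candidates[name], observed),
                  reverse=True)

# Two hypothetical six-carbon candidates and an invented observed spectrum.
CANDIDATES = {
    "enone": ["C=O ketone", "CH= vinylic", "CH= vinylic",
              "CH2 alkyl", "CH2 alkyl", "CH2 alkyl"],
    "saturated ketone": ["C=O ketone"] + ["CH2 alkyl"] * 5,
}
OBSERVED = [199.0, 150.0, 129.0, 38.0, 26.0, 23.0]
```

With these inputs every predicted region of the first candidate is matched, while two predicted alkyl signals of the second find no observed counterpart, so the first ranks higher. As the text notes, the value of such a predictor depends entirely on the size and quality of the rule library.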
A predictor for mass spectra is more elaborate; the action part of a mass-spectrum prediction rule specifies a set of bonds cleaved and atoms transferred, and the resulting ion must be identified by a process that applies these cleavage and transfer steps in the particular environment determined by the substructure-matching [26]. Mass-spectrum prediction rules are class-specific; the prediction system must be provided with detailed rules devised for the analysis of compounds from a given class. This requirement for detailed, class-specific prediction rules limits the applicability of the approach. The planning phase, in which structural constraints are inferred from spectral and chemical data, involves much more difficult problems than the test phase. An essential requirement for a structure-elucidation system is that it considers all possible structures; consequently, any constraints applied to the

21

generator must be correct, otherwise some valid candidate structures will be missed. This requirement has two consequences. First, generators should not be constrained by structural inferences obtained by interpretive systems that yield merely relative plausibilities of different functional groups and substructures; use of such plausibility ratings should be confined to the ranking of candidates in the final test phase. Secondly, if the generator works with buildingblocks inferred from spectral features then the inference system must be complete; either the spectral feature has a single unambiguous structural interpretation, or the inference system must create all possible alternative structural interpretations. Some spectral data admit unambiguous interpretation. It is easy to identify the presence of carbonyl groups from infrared absorptions, aromatic nuclei from ultraviolet absorption and fluoresence spectra, and methyl groups from magnetic resonance signals. Other unambiguous data, possibly characterizing quite large structural fragments, can be obtained from n.m.r.-decoupling experiments. New m.s./m.s. experiments may be capable of the identification of multi-atom fragments in a structure [ 381. Quite large portions of a structure may be known from data concerning the origin of a compound and the workup procedures applied during its isolation. Problems of “ambiguity” arise in systems that attempt to infer quite detailed structural information from data such as 13C-n.m.r. shifts [ 39-411. These 13Cinterpretation systems attempt to identify the environment of a resonating atom from its shift (or from a pair of shifts). For example, a triplet at w 40 ppm in the 13Cspectrum can be identified as a methylene in one of quite a number of different environments, with the -CH,- bonded to vinylic carbons, methines, quaternary alkyl carbons and so forth. 
It is easy to obtain these initial alternative structural interpretations; however, it is still necessary to identify those that are mutually consistent [39,42], and that are also compatible with substructures derived from other data. The substructures derived from the spectral interpretation process act as constraints on a generator; the generator attempts to overlap the substructures derived from 13C-n.m.r. analysis and so build up representations of complete molecules. This process can be completed provided that the sets of substructures derived for different signals include some subsets that are mutually consistent. However, it can happen that the library of “spectral-feature→substructure” rules is incomplete and appropriate substructures are not suggested for all resonances. Then the constraints provided by the sets of available substructures cannot be satisfied and the generator fails.

Rather than have an empirical interpretation system that suggests constraints for use in a structure generator, it is better to use a closely coupled generator/predictor system. Such a generator starts with the atomic components (usually, 13C multiplicity data are available so that the atoms can be given as a list of methylenes, methines, etc.) and information on those structural fragments which have been established as present from unambiguous chemical and spectral data. The generator can then, at each bond-creation step, check whether a new bond suggests spectral properties incompatible with observed data. In somewhat simplified terms, this check-process involves recomputation of the substructural environments of those atoms in the region of the new bond, use of these substructures with “substructure→spectral-feature” rules to obtain regions where spectral signals are predicted, and tests that the observed spectrum does contain appropriate signals in these regions that can be assigned to the relevant atoms. This approach avoids the problems that arise when the file of “substructure→spectral-feature” rules is incomplete; if a substructure is absent from the file, the corresponding predicted range is assumed to span the entire spectral region.

The processing required in a structure-elucidation system is complex. Such a system must have a (sub)structure representation that can incorporate stereochemical detail, and is suited to efficient processing. Processing involves frequent graph-matching of substructures, and canonicalization of partial structures; permutation symmetry groups of atoms and edges have to be determined, and changes to these groups must be made as bonds are created. The interpretation of any “spectral-feature→substructure” interpretation rules represents only a minor aspect of the overall processing that must be performed in a structure-elucidation system. Some data (e.g., coupling patterns and details of chemical history) are not really amenable to simple rule-based analysis. Consequently, high-level AI tools that focus on rule-interpretation are of little value. A structure-elucidation system has to be encoded in Lisp or in a procedural language such as C. Although constituting one of the first application areas for AI in chemistry, structure elucidation by combined spectral techniques is not ideally suited to current expert-system technology.
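Returning to the closely coupled generator/predictor scheme described above, the check made at each bond-creation step might be sketched as follows. This is a simplified illustration under stated assumptions: environments are hypothetical string labels rather than matched substructures, the shift ranges and the 0-240 ppm spectral span are invented for the example, and each atom only needs some signal in its predicted region (a fuller check would assign distinct signals to distinct atoms).

```python
FULL_RANGE = (0.0, 240.0)  # assumed extent of the 13C spectrum, in ppm

# Hypothetical "substructure -> spectral-feature" file; deliberately incomplete.
RULE_FILE = {
    "CH2-next-to-C=O": (30.0, 50.0),
    "CH3-on-aromatic": (15.0, 25.0),
}


def predicted_range(environment, rule_file=RULE_FILE):
    """Look up the predicted region; unknown substructures span everything."""
    return rule_file.get(environment, FULL_RANGE)


def bond_is_compatible(affected_environments, observed_ppm, rule_file=RULE_FILE):
    """Accept a new bond only if every affected atom can still be assigned.

    affected_environments lists the recomputed environments of the atoms
    near the new bond. The bond is rejected as soon as one atom has no
    observed signal inside its predicted region.
    """
    for env in affected_environments:
        lo, hi = predicted_range(env, rule_file)
        if not any(lo <= shift <= hi for shift in observed_ppm):
            return False
    return True


observed = [14.1, 22.6, 128.4]
rejected = bond_is_compatible(["CH2-next-to-C=O"], observed)   # no signal at 30-50 ppm
accepted = bond_is_compatible(["CH3-on-aromatic"], observed)   # 22.6 ppm fits
unknown = bond_is_compatible(["not-in-file"], observed)        # falls back to full range
```

The fallback in `predicted_range` is the point of the design: a missing rule can never cause a valid candidate to be pruned, it only makes the check less selective.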
Expert-system technology is appropriate to those problems that take a few hours of the expert’s time, do not require data representations significantly more elaborate than (object, attribute, value) triples, and occur with sufficient frequency to justify the development of specialized tools [6]. Quite apart from the problems relating to special structure representations and elaborate iterative processing, structure-elucidation problems simply do not occur with sufficient frequency. Further, when they do arise, structure-elucidation problems are best solved by techniques such as x-ray crystallography or 2-D n.m.r. experiments.

Expert systems for spectral interpretation are better suited to simpler applications such as identification of the elements present in samples through the use of x-ray fluorescence [43], identification of the presence of infrared-active bonds in a molecule [44], and so forth. These applications require merely “spectral-feature→structural-attribute” rules. The knowledge represented in these rules is shallow; only a very short chain of inference is required to go from spectral data to structural-attribute (often, one-step inference). There are cases where plausibilities of particular structural-attributes are interdependent; in such cases, several rules may need to be applied in sequence until no further changes in plausibilities are obtained. The spectral data required by these systems can all be provided at the outset, so most will work by forward-chaining through sets of inference rules. These problems are therefore ideally suited to OPS5 interpreters; developers of new spectral interpretation systems should use OPS5-like interpreters, at least for feasibility testing.

Most current spectral interpretation systems employ special-purpose rule-interpreters. The PAIRS infrared-interpreter has a table-driven interpretation program using tabular representations of classification rules; the rules entered by the chemist are converted to tabular form from a higher-level representation (in the specially developed CONCISE language) by an auxiliary program. A recently proposed “expert system” for qualitative interpretation of x-ray fluorescence spectra has its rules encoded as functions and procedures in Pascal [43]. This approach is much less satisfactory than the data-structure style of OPS5 rules or the tables used in PAIRS [44]. The great advantage of having rules as data-structures is that they can then be manipulated by other programs. Rules can be checked for mutual consistency; analysis of rules can allow programs to answer questions like “What spectral features suggest group XXX?” or “Why was group YYY not suggested for this spectrum?”; it is sometimes possible for new rules to be generated by programs [11]. All these advantages are lost if the rules are encoded procedurally rather than being represented as data for the rule-base/rule-interpreter model.

Chemical synthesis

Synthesis planning programs have been available for almost 20 years [45], and yet remain surprisingly under-utilized.
This under-utilization is due in part to inadequacies of available libraries of chemical transforms (these libraries represent the encoded expertise of synthetic chemists); this is being slowly addressed through the work of various industrial consortia [46]. The other reason for under-utilization is that these programs are usually regarded as tools solely for helping to plan complete, practical syntheses, which is a task still largely outside of their capabilities. Many more routine applications of these systems have been neglected. The existing programs would be of value in exploring possible biosynthetic degradations of drugs or inter-conversions of agrochemicals (through the application of transforms representing postulated biochemical reactions), in teaching, and simply as aids to synthetic chemists needing to retrieve details of reactions relevant to given structural problems. Possibly, wider use of these programs will be achieved as they become available on large personal computers.

Synthesis planning is a straightforward application of state-space search, as illustrated in Fig. 2. Development of a path through the state-space involves both tactical and strategic considerations. Tactical considerations determine the choice of transforms that should be applied to a given structure to derive new states representing possible synthetic precursors (or metabolites etc.). Strategic considerations direct the development of focused paths through the state-space, avoiding the near-exhaustive search that results from a purely tactical analysis. (Tactical analysis and exhaustive search are all that are required for studies of degradation pathways for drugs.)

Current synthesis-planning programs all have highly specialized mechanisms for indexing, retrieving, and applying chemical transforms. However, processing at the tactical level is really quite similar to the type of processing used in a mass-spectral predictor. Again, a substructure template must be matched onto a structure and a sequence of bond changes conducted in the context of a particular matching environment. Individual transforms are basically “substructure→bond-change” rules. The premise clauses in a transform-rule consist of a definition of a substructure that must be matched onto the current target molecule, and descriptions of supplementary tests on the appropriateness of the transform. (Supplementary tests in the premise part of the rule might check for interfering functionality elsewhere in the structure, or might limit consideration of transforms to those that satisfy some goal set by the strategy module. Goals set by the strategy module can include requirements that transforms break particular bonds, and reduce/increase the density of functionality in the structure.) If a given target satisfies all the premise clauses of a rule, the bond changes in the action part are made and the precursor structure(s) generated. Additional tests are typically applied to generated precursors to confirm that these do not contain any features that might invalidate the forward synthetic reaction.

The processing requirements of the tactical level are now all well established.
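A single “substructure→bond-change” transform of the kind described above can be sketched in a few functions. Everything here is an assumption made for illustration: the toy connection table (dictionaries of atoms and bonds, hydrogens implicit), the choice of an ester disconnection as the example transform, and all function names. None of it is taken from LHASA, SECS, or SYNCHEM, and supplementary premise tests, stereochemistry, and canonicalization are omitted.

```python
def bond_key(a, b):
    """Store each bond once, under an ordered pair of atom indices."""
    return (a, b) if a < b else (b, a)


def neighbours(bonds, atom):
    """Return (neighbour, bond_order) pairs for one atom."""
    found = []
    for (a, b), order in bonds.items():
        if a == atom:
            found.append((b, order))
        elif b == atom:
            found.append((a, order))
    return found


def match_ester(atoms, bonds):
    """Premise: find every C(=O)-O-C environment as (acyl_c, link_o)."""
    environments = []
    for c, element in atoms.items():
        if element != "C":
            continue
        near = neighbours(bonds, c)
        if not any(atoms[n] == "O" and order == 2 for n, order in near):
            continue  # no carbonyl oxygen on this carbon
        for o, order in near:
            if atoms[o] != "O" or order != 1:
                continue
            # The singly bonded oxygen must carry a second carbon.
            if any(atoms[n] == "C" and n != c for n, _ in neighbours(bonds, o)):
                environments.append((c, o))
    return environments


def apply_retro_ester(atoms, bonds, environment):
    """Action: cleave the C-O bond and give the acyl carbon a new OH oxygen."""
    acyl_c, link_o = environment
    new_atoms, new_bonds = dict(atoms), dict(bonds)
    del new_bonds[bond_key(acyl_c, link_o)]
    new_o = max(new_atoms) + 1
    new_atoms[new_o] = "O"
    new_bonds[bond_key(acyl_c, new_o)] = 1
    return new_atoms, new_bonds


def fragments(atoms, bonds):
    """Split the edited graph into precursors (connected components)."""
    unseen, parts = set(atoms), []
    while unseen:
        stack, part = [unseen.pop()], set()
        while stack:
            atom = stack.pop()
            part.add(atom)
            for n, _ in neighbours(bonds, atom):
                if n in unseen:
                    unseen.remove(n)
                    stack.append(n)
        parts.append(sorted(atoms[a] for a in part))
    return sorted(parts)


# Methyl acetate, hydrogens implicit: CH3-C(=O)-O-CH3
atoms = {0: "C", 1: "C", 2: "O", 3: "O", 4: "C"}
bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1}
matches = match_ester(atoms, bonds)
precursors = fragments(*apply_retro_ester(atoms, bonds, matches[0]))
```

In this toy case the single match yields two precursor fragments, corresponding to acetic acid and methanol. A state-space search would repeat the match-and-apply cycle on each precursor.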
Programs for applying transforms require: (1) graph-matchers that match transform-substructures to target molecules and are used to identify other functionality; (2) ring-finders and stereochemistry analysis routines that are used to derive data for analyzing the feasibility of transforms; (3) simple structure-editing routines for performing the changes to bonding; and (4) structure-canonicalization routines that convert generated structures to standard forms, so simplifying the recognition of structural equivalence. These algorithmic processes are best encoded in a procedural language that supports both recursion and iteration and provides for efficient access to large arrays and lists of data.

Prolog-based tactical-level chemical-synthesis planning systems have been reported; one uses a form of Wiswesser Linear Notation for structure representation [47], the other employs n-ary predicates that represent connection table entries [48]. These systems are currently incomplete, lacking most of the necessary canonicalization procedures, stereochemistry analyzers, user interfaces and so forth. The choice of Prolog as a programming language, and the types of structure representation used, are likely to hinder the development of practical tools based on these programs.

The problem of chemical planning remains open. In current programs, either weak strategies are employed by the program (as in SYNCHEM [14]) or the synthetic chemist is involved in the processes for planning (as in the interactive LHASA [49] and SECS [20] programs). Neither of these approaches is satisfactory. The most promising enhancement of current synthesis-planning programs appears to be the Starting Material Selection Strategy “SST” program by Wipke and Rogers [50]. The SST program analyzes the skeleton of the target structure and selects plausible starting materials that exhibit significant skeletal similarities; the state-space search is then directed toward these potential starting materials.

Dolata’s work on applications of the predicate calculus represents another extension to the strategic capabilities of the SECS program [51]. The standard SECS program has a limited strategy module that analyzes structures to identify bonds, retention or cleavage of which should be included in the goals to be achieved by the next transform applied. Dolata’s QED program provides a mechanism for formalizing the considerations of such a strategy module. QED permits strategic rules to be represented in high-level terms rather than be encoded as part of some FORTRAN program. Typical strategic rules used with QED are “not-spiro-bond-break” (if a bond is to a spiro atom, then there is strong evidence that it should not be broken by a transform), and “dont-break-and-isolate” (if two stereo centers are joined by a bond, then there is evidence that the bond should not be broken by a transform). As well as helping to clarify the principles of strategic bond selection, QED has allowed some refinements to the analysis of stereochemical problems.

Experiment planning

While structure elucidation and synthesis planning have been studied for 20 years, it is only recently that attention has turned to the use of AI techniques in the planning of simple experimental procedures, such as the choice of conditions for separating particular mixtures.
However, it is just these frequently occurring, limited-domain problems that are best suited to current expert-system approaches. The SpinPro system for optimizing ultracentrifugation runs represents a fairly successful application of current technology [52]. SpinPro is implemented directly in Lisp and uses a goal-directed, backward-chaining approach. The user’s requirements for purity, type of sample, and so forth are determined in a question and answer consultation session. These data are employed by the rules to select an appropriate combination of run-parameters including rotor, run-time, solvent, and concentration.

A current area of interest is the development of expert systems that can aid in the planning of separations by liquid chromatography [53]. A Prolog-based system has been reported that can assist in the selection of separation procedures for certain classes of steroids [54]. Varian Associates have been developing the ECAT (Expert Chromatographic Assistance Team) program that can assist in several aspects of liquid chromatography [55]. The ECAT program has a forward-chaining module that can help in the selection of column and mobile phases. The ECAT program has rules for different classes of molecules, molecular-weight ranges, detector systems and so forth; future extensions are planned to include a module for analyzing gas chromatographic separations. Another limited-domain expert system that uses knowledge about solubilities of compounds has been developed for determining formulations of agrochemicals as solutions, emulsions, wettable powders and so forth [56].
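Forward chaining over rules held as data, the style used by ECAT's selection module and recommended earlier for spectral interpretation, can be sketched in a few lines. The facts and rules below are invented for illustration and are not taken from ECAT or from any OPS5 program; `forward_chain` and `rules_suggesting` are hypothetical names. The second function shows the benefit of keeping rules as data: another program can inspect them to answer questions such as which premises suggest a given conclusion.

```python
# Each rule is (premises, conclusion); the content is purely illustrative.
RULES = [
    (frozenset({"analyte is nonpolar"}), "use reversed-phase column"),
    (frozenset({"use reversed-phase column"}), "use water/acetonitrile mobile phase"),
    (frozenset({"analyte absorbs UV"}), "use UV detector"),
]


def forward_chain(facts, rules=RULES):
    """Apply rules to the known facts until no rule adds anything new."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def rules_suggesting(conclusion, rules=RULES):
    """Inspect the rule base: which premise sets lead to this conclusion?"""
    return [sorted(premises) for premises, c in rules if c == conclusion]


derived = forward_chain({"analyte is nonpolar"})
why = rules_suggesting("use UV detector")
```

Here the second rule fires only after the first has added its conclusion, which is the chaining step; the loop stops once a full pass over the rules changes nothing.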

INTELLIGENT INSTRUMENTATION AND ROBOTICS

The choice of optimal parameters for an instrument system represents another area requiring expertise on the part of the user. A standard example in computing concerns the choice of parameters for running a large computer system. A computer-operating system has many control parameters, such as the number of concurrent jobs, the size of memory partitions reserved for data-buffering, the time-quantum for time-shared jobs and so forth; these parameters must be adjusted, by the operators at the computer console, so as to optimize service to a varying mix of jobs in the computer system. This particular problem has been addressed in the YES/MVS [57] expert-system project. YES/MVS is an OPS5-based system; when activated, it reads tables defining the state of the operating system and its queues, and these data are matched against the rules, so initiating a forward-chaining inference process. Rules are applied resulting in recommendations of changes to the operating regime; the process continues until no additional rules are activated.

Similar rule-based systems could be devised for g.c./m.s., n.m.r., and other chemical instrumentation. Most instruments are already computerized, allowing control settings and signal values to be read (and possibly changed) by an expert “tuning” program. The processing would typically work through forward-chaining rules from the data to recommendations of changes in operating conditions. Again, OPS5 interpreters are suitable for feasibility studies; practical application may require development of special-purpose rule-interpreters.

There are already a few reports of expert systems developed for control of chemical laboratory instrumentation. TQMS is an expert system for tuning triple-quadrupole mass spectrometers; it is implemented by using the KEE system-building tool and works in a goal-directed, backward-chaining manner [58].
The initial goal “Calibration is Successful” starts the backward-chaining process through the rules “(if (detector is coarse-tuned) (detector is fine-tuned) (then (calibration is successful)))”, and “(if (detector output is maximized after varying lensl-ql voltage) ... (then (detector is coarse-tuned)))”; the premise clauses in these rules can invoke routines that adjust voltages so that the clause can succeed. In addition to modules for planning liquid-chromatographic separations, Varian’s ECAT system has a backward-chaining module used to diagnose column failure conditions [55].

Automatic analyzers have been exploited in the chemical laboratory for two decades. These analyzers have electromechanical devices that perform fixed tasks involving simple manipulations such as filling a syringe from the next sample bottle and injecting samples. Tasks requiring greater dexterity and more complex manipulations can now be performed by using laboratory robots available from commercial manufacturers such as Perkin-Elmer. An early review identified such applications as automated titrations, loading centrifuges, loading samples for a pulsed n.m.r. machine, and the spotting of thin-layer plates (a slow, boring, repetitive task requiring great precision, ideal for a robot and not for a human laboratory worker) [59]. Because laboratory robots form part of a more generally computerized system, action sequences are not totally predefined as in standard automatic analyzers; instead, action sequences can be adapted in response to various feedbacks. For example, (1) a robot may prepare a sample for spectrophotometric analysis, (2) the sample is run and the results analyzed by a program and found to be unsatisfactory, (3) the robot can be instructed to prepare a new sample, at some different concentration, and (4) the sample re-run. All these actions could be performed during some overnight shift so that the analyst could be presented with satisfactory results on the next working day.

INFORMATION RETRIEVAL AND PROCESSING NATURAL LANGUAGE

The sheer size of the chemical literature (~450 000 articles abstracted in Chemical Abstracts in 1985) makes the study of chemistry a somewhat daunting task. Chemists are well served by their secondary literature such as Chemical Abstracts, and the various current-awareness bulletins and indices [60, 61]; on-line searching systems are well developed [62] and now provide for structure-based searches in addition to keyword-orientated searches [63]. Nevertheless, any additional assistance would be desirable.

Natural language (NL) front-ends to systems for business data-base management are now available [64]. These NL front-ends allow users who are unfamiliar with the technical characteristics of the underlying data-base management system to query data bases using typed English input. Many NL systems are limited and do little more than recognize keywords and ignore the other words that are entered as part of search requests. Sophisticated systems must handle quantification, adverbial and adjectival qualifiers, negation, conjunctions and disjunctions, ellipses, anaphora, and ungrammatical inputs.

These NL capabilities are not essential for chemical information systems. Chemistry is a highly structured technical field and the users of a chemical information system are technically skilled. In such situations, it is more effective to use systems that offer elaborate hierarchical-menu selection facilities, or systems that use some form of Query By Example (where the user fills in a table that looks something like the data that the system is to retrieve). In chemical information retrieval, many requests involve specification of substructures, and it is certainly much easier to define substructures by using a menu-driven system than to attempt their precise description in English.

Some interactive information-retrieval systems incorporate mechanisms that help the user refine an initial search request [65]. The system retrieves a few items that satisfy the initial request and presents these to the user who then rates their relevance. Based on the user-ratings and data characterizing the items retrieved, the system refines the initial request. Existing systems of this type employ statistical information characterizing the frequency of occurrence of keywords and other search terms in documents. AI-related work is attempting to improve further on the performance of interactive information retrieval systems by constructing models of the user’s information needs [66].

An interesting inverse to the normal problem of maximizing the retrieval of relevant data has appeared in an expert system designed to minimize the disclosure of confidential information. The EDAAS system [67], developed for the U.S. Environmental Protection Agency, is used to assist information specialists who have to determine whether information concerning the manufacture and distribution of toxic chemicals can be released to the public (as required by the Freedom of Information Act) without compromising data classified as confidential business information.

Schank and Abelson [4] have demonstrated applications of their Script Applier Mechanism (SAM) to the analysis of newspaper stories. SAM can be used to generate a precis or to translate a story. SAM works by matching input stories onto templates (scripts).
A script for the analysis of a report of a traffic accident has “slots” to be filled in for “place of accident”, “number dead/injured”, “cause of accident”, “police action” and so forth. These slots are filled by a procedure that analyzes the text of the story to identify appropriate data elements; the slot-filling process exploits constraints derived as the SAM programs build a very elaborate “Conceptual Dependency” model of the information in the given text. Once the slots in the script have been filled, these data can be passed to the procedures that answer questions interactively, or those used to generate a precis or a translation.

In principle, such a script analysis could be of use in screening of chemical literature so as to extract, automatically, information of interest to a particular researcher. A script-based analysis should be capable of much more detailed processing than a simple keyword retrieval system. Typical “scripts” for chemical papers are: “natural products from X”, “synthesis of Y”, “an application of named reaction ABC”, and “use of the selective reagent Z”. However, problems again arise in relation to the representation of chemical structure. A substantial part of the contents of a chemical paper is conveyed through structural diagrams (hence, the use of “Graphics Abstracts” in Tetrahedron Letters). Any automated procedure for analyzing chemical papers would have to “read” these structural diagrams in addition to the textual information. Some AI-related work on understanding graphics in technical documents has been reported [68], but the diversity of structural representations used in chemical papers would defeat any system likely to be available in the near-term future.

A few journals require the citation of Chemical Abstracts Registry Numbers for all structures; a script-analysis program that had access to the Chemical Abstracts files might be able to process such articles. In the longer term, it might be possible for journals to provide both human- and machine-readable versions of complete articles (machine-readable text is already quite widely available). In the human-readable representation, structures would be presented graphically whereas in the machine-readable form they would be encoded as augmented connection tables with details of stereochemistry (and possibly conformation). Until structures are included in the machine-readable versions of articles, the potential for script analysis must remain limited.

INTELLIGENT TUTORING SYSTEMS

Intelligent computer-assisted instruction (CAI) represents another application area proposed for expert-system approaches. Some years ago, the LHASA chemical synthesis program was adapted for instructional use [69]. Current versions of LHASA and SECS are too demanding of computer resources to be widely used as instructional aids; the general lowering in the relative cost of computing power may make such use more feasible in the future.

None of the structure-elucidation systems has yet been adapted for teaching purposes. Obviously, rule-based systems (such as the PAIRS infrared-interpreter) can be presented to students studying spectral methods of structure elucidation. Simple passive demonstrations of existing systems are likely to be of limited educational value; in general, it would be more effective to demonstrate, briefly, a sophisticated system like PAIRS and then require the students to implement some subset of the system in OPS5 or in some more specialized rule-based language.

Fig. 5. The Integrated Diagnostic Model (from ref. 70).

Rule-based systems can serve as the basis for tutorial systems that focus on the experiential level of knowledge, heuristics, simple spectral/structural correlations and so forth. Students must also learn the underlying physical basis of the phenomena they are studying, as represented by functional models of molecular vibrations, nuclear resonance phenomena, and so forth. Intelligent CAI systems such as IDM [70] (see Fig. 5) incorporate both an experiential knowledge base and a functional knowledge base and permit the student to explore a problem at both experiential and functional levels. In the chemical context, such systems could be devised for infrared spectroscopy (a PAIRS-like system would provide the experiential level, a program for computing vibrational modes would underlie the functional level), or chromatography (experiential rules are used in systems such as ECAT, functional models could be based on the equations that define how changes in chromatographic conditions affect resolution).
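As an illustration of what a functional level for chromatography might compute, the sketch below uses the standard textbook resolution equation, R_s = (sqrt(N)/4) * ((alpha - 1)/alpha) * (k/(1 + k)), where N is the plate number, alpha the selectivity factor, and k the retention factor of the later-eluting peak. The choice of this particular equation is an assumption; the text does not say which equations are meant, and the function name is hypothetical.

```python
import math


def resolution(n_plates, alpha, k):
    """Resolution of two adjacent peaks from the standard three-term equation.

    n_plates: column plate number N (efficiency term)
    alpha:    selectivity factor, > 1 (selectivity term)
    k:        retention factor of the later-eluting peak (retention term)
    """
    if n_plates <= 0 or alpha <= 1 or k <= 0:
        raise ValueError("expects n_plates > 0, alpha > 1, k > 0")
    efficiency = math.sqrt(n_plates) / 4.0
    selectivity = (alpha - 1.0) / alpha
    retention = k / (1.0 + k)
    return efficiency * selectivity * retention


base = resolution(10_000, 1.10, 2.0)
longer_column = resolution(40_000, 1.10, 2.0)    # four times the plates
better_selectivity = resolution(10_000, 1.20, 2.0)
```

A student exploring this model would see that quadrupling the plate number only doubles the resolution, whereas a modest change in selectivity has a much larger effect; an experiential rule base could then be compared against such behaviour.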

REFERENCES

5 6 7 8 9 10

11 12 13

T. Winograd, in D.G. Bobrow and P.J. Hayes (Eds.), Artif. Intell., 25 (1985) 375. A. Newell, in D.G. Bobrow and P.J. Hayes @Is.), Artif. Intell., 25 (1985) 375. T. Winograd, Understanding Natural Language, Academic, New York, 1972. R.C. Schank and R.P. Abelson, Scripts, Plans, Goals and Understanding, Lawrence Erlbaum, Hillsdale, NJ, 1977. P.R. Cohen and E.A. Feigenbaum, Handbook of Artificial Intelligence, Vol. 3, William Kaufmann, Los Altos, CA, 1982, p. 523. D.A. Waterman, A Guide to Expert Systems, Addison-Wesley, Reading, MA, 1986. B.G. Buchanan and E.H. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1984. J. McDermott, in W. R&man (Ed. ), Artificial Intelligence Applications for Business, Ablex, Norwood, NJ, 1984, p. 11. B. Sheil, in W. Reitman (Ed.), Artificial Intelligence Applications for Business, Ablex, Norwood, NJ, 1984, p. 287. P. Langley, G.L. Bradshaw and H.A. Simon, in R.S. Michalski, J.G. Carbonell and T.M. Mitchell (Eds.), Machine Learning! An Artificial Intelligence Approach, Tioga Press, Palo Alto, CA, 1983, p. 307. T.M. Mitchelland G.M. Schwenzer, Org. Magn. Reson., 11 (1978) 378. B.G. Buchanan, D.H. Smith, W.C. White, R. Gritter, E.A. Feigenbaum, 3. Lederberg and C. Djerassi, J. Am. Chem. Sot., 98 (1976) 6168. P.R. Cohen and E.A. Feigenbaum, Handbook of Artificial Intelligence, Vol. 3, William Kaufmann, Los Altos, CA, 1982, p. 385.

31 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

39 40 41 42 43 44

45 46

H. Gelernter, G.A. Miller, D.L. Larsen and D.J. Bemdt, 1st Conf. Artificial Intelligence Applications, IEEE, Silver Springs, MD, 1985, p. 92. M.E. Munk, C.A. Shelley, H.B. Woodruff and M.O. Trulson, Fresenius, Z. Anal. Chem., 313 (1982) 473. S. Sasaki and Y. Kudo, J. Chem. Inf. Comput. Sci., 25 (1985) 252. R.K. Lindsay, B.G. Buchanan, E.A. Feigenbaum and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hi& New York, 1980. B. Debska, J. Duliban, B. Guxowska-Swider and Z. Hippe, Anal. Chim. Acta, 133 (1981) 303. L.A. Gribov, M.E. Elyashberg, V.N. Koldashov and I.V. Plentjov, Anal. Chim. Acta, 148 (1983) 159. W.T. Wipke, G.I. Ouchi and S. Krishnan, Artif. Intell., 11 (1978) 173. A. Barr and E.A. Feigenbaum, Handbook of Artificial Intelligence, Vol. 1, William Kaufmann, Los Altos, CA, 1982, p. 32. P.H. Winston, Artificial Intelligence, 2nd edn., Addison-Wesley, Reading, MA, 1984. P.H. Winston and B.K.P. Horn, Lisp, Addison-Wesley, Reading, MA, 1981. I. Bratko, Prolog Programming for Artificial Intelligence, Addison-Wesley, Wokingham, UK, 1986. R. Davis and D.B. Lenat, Knowledge-Based Systems in Artificial Intelligence, McGraw-Hill, New York, 1982. A. Lavanchy, T. Varkony,D.H. Smith, N.A.B. Gray, W.C. White, R.E. Carhart, B.G. Buchanan and C. Djerassi, Org. Mass. Spectrom., 15 (1980) 355. C.W. Grandell, N.A.B. Gray and D.H. Smith, J. Chem. Inf. Comput. Sci., 22 (1982) 48. L. Brownston, R. Farrell, E. Kant and N. Martin, Programming Expert Systems in OPS.5: An Introduction to Rule-Based Programming, Addison-Wesley, Reading, MA, 1985. ExperTelligence Inc., 559 San Ysidro Road, Santa Barbara, CA 93108. P. Hammond and M.J. Sergot, Apes Reference Manual, Logic Based Systems, 1984. W. Van Melle, A.C. Scott, J.S. Bennett and M. Pearis, Report No. HPP-81-16, Computer Science Department, Stanford University, 1981. Alvey Expert Systems Starter Pack, NCC, Manchester, Gt. Britain, 1985. D. Michie, S. Muggleton, C.E. 
Riese and S.M. Zubrick, Proc. 1st Conf. Artificial Inteliigence Applications, IEEE, Silver Springs, MD, 1984, p. 591. C.E. Riese and J.D. Stuart, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 18. L.H. Keith and J.D. Stuart, in T.H. Pierce and B.A. Hohne (Eds.) Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 31. J.V. Thomson, Proc. 2nd Australian Conf. Applications of Expert Systems, New South Wales Institute of Technology, Sydney, 1985, p. 51. N.A.B. Gray, Computer Assisted Structure Elucidation, Wiley, New York, 1986. K.P. Cross, P.T. Palmer, C.F. Beckner, A.B. Giordani, H.G. Gregg, P.A. Hoffman and C.G. Enke, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 321. N.A.B. Gray, Artif. Intell., 22 (1984) 1. J.E. Dubois, M. Carabedian and I. Dagane, Anal. Chim. Acta, 158 (1984) 217. M.E. Munk, R.J. Lind and M.E. Clay, Anal. Chim. Acta, 184 (1986) 1. A.H. Lipkus and M.E. Munk, J. Chem. Inf. Comput. Sci., 25 (1985) 38. K. Janseens and P. van Espen, Anai. Chim. Acta, 184 (1986) 117. H.B. Woodruff, S.A. Tomellini and G.M. Smith, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 312. E.J. Corey and W.T. Wipke, Science, 166 (1969) 178. W. Sieber, in T. Bemold and G. Albers (Eds.), Artificial Intelligence: Towards Practical Applications, Elsevier, Amsterdam, 1985, p. 167.

47 C.W. Moseley, W.D. LaRoe and Hemphill, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 231.
48 T. Wang, I. Burnstein, M. Corbett, S. Ehrlich, M. Evens, A. Gough and P. Johnson, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 244.
49 E.J. Corey, A.K. Long and S.D. Rubenstein, Science, 228 (1985) 408.
50 W.T. Wipke and D. Rogers, J. Chem. Inf. Comput. Sci., 24 (1984) 71.
51 D.P. Dolata, Ph.D. Thesis, Santa Cruz University, CA, 1984.
52 P.R. Martz, M. Heffron and O.M. Griffith, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 297.
53 Anal. Chem., 58 (1986) 1192A.
54 H. Gunasingham, B. Srinivasan and A.L. Ananda, Anal. Chim. Acta, 182 (1986) 193.
55 R. Bach, J. Karnicky and S. Abbott, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 278.
56 B.A. Hohne and R.D. Houghton, in T.H. Pierce and B.A. Hohne (Eds.), Artificial Intelligence Applications in Chemistry, American Chemical Society, Washington, DC, 1986, p. 87.
57 J.H. Griesmer, S.J. Hong, M. Karnaugh, J.K. Kastner, M.I. Schor, R.L. Ennis, D.A. Klein, K.R. Milliken and H.M. Van Woerkom, Proc. Natl. Conf. AI, AAAI, William Kaufmann, Los Altos, CA, 1984, p. 130.
58 C.M. Wong, R.W. Crawford, J.C. Kunz and T.P. Kehler, IEEE Trans. Nucl. Sci., 31 (1984) 805.
59 R.E. Dessy, Anal. Chem., 55 (1983) 1232A.
60 C.L. Bernier, J. Chem. Inf. Comput. Sci., 25 (1985) 164.
61 E. Garfield, J. Chem. Inf. Comput. Sci., 25 (1985) 170.
62 P.F. Rusch, J. Chem. Inf. Comput. Sci., 25 (1985) 192.
63 R.E. Stobaugh, J. Chem. Inf. Comput. Sci., 25 (1985) 271.
64 M. Bates and R.J. Bobrow, in W. Reitman (Ed.), Artificial Intelligence Applications for Business, Ablex, Norwood, NJ, 1984, p. 179.
65 G. Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983.
66 R.H. Thompson and W.B. Croft, in K.N. Karna (Ed.), Expert Systems in Government Symposium, IEEE, Washington, DC, 1985, p. 448.
67 W. Bercaw, J.L. Feinstein and F. Sims, in K.N. Karna (Ed.), Expert Systems in Government Symposium, IEEE, Washington, DC, 1985, p. 212.
68 R.P. Futrelle, in K.N. Karna (Ed.), Expert Systems in Government Symposium, IEEE, Washington, DC, 1985, p. 386.
69 R.D. Stolow and L.J. Joncas, J. Chem. Educ., 57 (1980) 868.
70 H.R. Smith, P.K. Fink and J.C. Lusth, in K.N. Karna (Ed.), Expert Systems in Government Symposium, IEEE, Washington, DC, 1985, p. 128.