Trends in Analytical Chemistry, vol. 4, no. 10, 1985
Artificial intelligence used for the interpretation of combined spectral data
H. J. Luinge and H. A. van 't Klooster
Utrecht, The Netherlands
For many years our laboratory has studied the computer-aided interpretation of spectra, aiming at the identification, classification or structure elucidation of organic compounds by means of spectroscopic concepts and rules, information theory, retrieval methods, pattern recognition and decision trees. One of our present research projects aims to develop an integrated system for the interpretation of spectral data (ISIS), based on artificial intelligence. Ultimately, the system should be able to handle infrared, mass, nuclear magnetic resonance and ultraviolet spectral data, or any combination of these. The combinations in particular should offer many advantages, as these spectral data are, at least partially, complementary.

Existing interpretation systems differ from our design in using only one kind of spectral data (DENDRAL) [1] or in using several kinds of spectra, but separated in time (STREC) [2]. Our project aims to use any spectral information that is available throughout the whole process of structure elucidation. In building ISIS, a growing amount of expertise from various sources in the field of analytical molecular spectroscopy will be implemented. In the starting phase of the project, we have restricted ourselves to infrared and mass spectra, and to compounds containing only carbon, hydrogen and oxygen. The need to expand to other spectral data and to compounds containing other elements implies that new interpretation rules must be easy to add.

For the development of ISIS we use PROLOG, an effective computer language for programming artificial intelligence. As described elsewhere [3,4], this language is extremely suitable for representing relations between spectral and structural data. Moreover, the list structures of PROLOG can be used to represent and manipulate chemical structures.
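As a rough illustration of what is meant by relations between spectral and structural data, the sketch below stores a few such relations as plain tuples and queries them in both directions, much as a PROLOG relation can be queried. It is written in Python rather than micro-PROLOG. The table layout and the names `RELATED`, `features_of`, `classes_with` and `classes_for_band` are assumptions made for this example; only the numerical values are taken from the knowledge base in Fig. 2, and the table says nothing about how ISIS combines several features into one rule.

```python
# Hypothetical relation table: (structural class, spectral feature).
# The numbers are a few of the values from the knowledge base in Fig. 2;
# the tuple layout itself is an assumption made for this sketch.
RELATED = [
    ("alcohol", ("ir_band", 3575, 3170)),    # IR region in cm-1 (high, low)
    ("alcohol", ("mass_loss", 18)),          # loss of a fragment of mass 18
    ("aromatic", ("ir_band", 1630, 1525)),
    ("aromatic", ("ir_band", 1515, 1450)),
    ("aromatic", ("mass_peak", 77)),         # peak at m/e 77
]


def features_of(structural_class):
    """Spectral features related to a structural class (class -> spectrum)."""
    return [feature for cls, feature in RELATED if cls == structural_class]


def classes_with(feature):
    """Structural classes related to a spectral feature (spectrum -> class)."""
    return [cls for cls, related in RELATED if related == feature]


def classes_for_band(wavenumber):
    """Classes whose listed IR regions contain the given wavenumber."""
    return sorted({
        cls
        for cls, feature in RELATED
        if feature[0] == "ir_band" and feature[2] <= wavenumber <= feature[1]
    })


print(features_of("alcohol"))
print(classes_with(("mass_peak", 77)))
print(classes_for_band(1600))
```

The same table answers both "which features belong to this class?" and "which classes could explain this feature?", which is the property that makes a relational language attractive for this kind of knowledge.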
At present we use micro-PROLOG, a microcomputer version of PROLOG, running on an Apple IIe microcomputer [5]. The system currently consists of a knowledge base containing some interpretation rules, an interpreter that applies these rules, and an interface between user and knowledge base that allows new rules to be added in a straightforward manner without any knowledge of PROLOG as a programming language.

The rules fall into two classes: rules giving evidence that some substructural unit is present, and rules giving evidence that some unit is absent. Both classes are necessary because the absence of positive evidence for a certain unit does not always mean that the unit is absent. It is, for example, quite possible that a compound contains a certain functional group although, owing to molecular symmetry, the infrared spectrum does not show the corresponding absorption.

The system can be used in two modes. In the first, the interactive mode, the user is asked for any information needed to determine the structure of the unknown. In the second mode, spectral data are entered in advance so that the system can operate as a turnkey system.

The spectrum of an unknown compound is checked for the presence or absence of spectral features representing a number of predefined classes, such as alcohol, aromatic and carbonyl, so that the search can be narrowed by leaving aside the classes that are evidently absent. Then, for every class that could be present, a further specification is made, so that one can distinguish a primary from a secondary alcohol, or a monosubstituted from a para-substituted aromatic compound. An example of a dialogue with the system in the interactive mode is given in Fig. 1. As an illustration of the way ISIS uses the knowledge in the knowledge base, a simplified part of the PROLOG program is given in Fig. 2.
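The two rule classes and the narrowing from class to subclass can be sketched as follows. This is a minimal Python illustration, not the ISIS implementation. The data layout (`IRBand`, `Spectra`), the helper names, the ranking of intensity labels, the reading of the mass rule as "a peak at the molecular mass minus 18", and the choice to test the negative rule before the positive one (the order in which the rules are listed in Fig. 2) are all assumptions. Only the alcohol rule values and the subclass names come from Fig. 2. Spectra are supplied in advance, as in the second mode described above, and the example inputs are illustrative values rather than measured spectra.

```python
from dataclasses import dataclass

# Assumed ordering of intensity labels; the source only names "moderate".
INTENSITY_RANK = {"weak": 0, "moderate": 1, "strong": 2}


@dataclass(frozen=True)
class IRBand:
    position: float   # band maximum in cm-1
    intensity: str    # "weak", "moderate" or "strong"
    shape: str        # "sharp" or "broad"


@dataclass(frozen=True)
class Spectra:
    ir_bands: tuple        # IRBand instances
    mass_peaks: tuple      # m/e values of the mass spectrum
    molecular_mass: int    # assumed to be known for the loss-of-18 test


def has_band(spectra, high, low, min_intensity, shape):
    """True if an IR band lies in [low, high] with the required intensity and shape."""
    return any(
        low <= band.position <= high
        and INTENSITY_RANK[band.intensity] >= INTENSITY_RANK[min_intensity]
        and band.shape == shape
        for band in spectra.ir_bands
    )


def has_loss(spectra, mass):
    """True if the mass spectrum has a peak at the molecular mass minus `mass`."""
    return (spectra.molecular_mass - mass) in spectra.mass_peaks


# One rule per evidence class, here for the alcohol function only.
RULES = {
    "alcohol": {
        "neg": lambda s: not has_band(s, 3575, 3170, "moderate", "broad"),
        "pos": lambda s: has_loss(s, 18),
    },
}

SUBCLASSES = {
    "alcohol": ["primary-alcohol", "secondary-alcohol",
                "tertiary-alcohol", "phenol"],
}


def evidence(spectra, cls):
    """Return "neg", "pos" or "no" (neither rule fired) for one class."""
    for kind in ("neg", "pos"):      # assumed order, as listed in Fig. 2
        if RULES[cls][kind](spectra):
            return kind
    return "no"


def candidates(spectra):
    """Subclasses still worth examining: every class not shown to be absent."""
    return {
        cls: SUBCLASSES[cls]
        for cls in RULES
        if evidence(spectra, cls) != "neg"
    }


# Illustrative inputs, not measured spectra.
with_oh_and_loss = Spectra((IRBand(3350, "strong", "broad"),), (74, 56, 31), 74)
no_oh_band = Spectra((IRBand(2960, "strong", "sharp"),), (72, 57, 43), 72)
oh_without_loss = Spectra((IRBand(3400, "moderate", "broad"),), (94, 66, 65), 94)

for example in (with_oh_and_loss, no_oh_band, oh_without_loss):
    print(evidence(example, "alcohol"), candidates(example))
```

The third example shows why a separate "no evidence" outcome is useful: neither rule fires, so the class is neither confirmed nor discarded and its subclasses remain candidates for a closer look.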
The first statement in the control program causes the knowledge base to be searched for all predefined classes, each of which is then checked for presence or absence. The rule for checking a class x consists of three conditions: the

0165-9936/85/$02.00 © Elsevier Science Publishers B.V.
IS THERE ANY ABSORPTION BETWEEN 3575 AND 3170 CM-1 ?
IS THIS A BROAD ABSORPTION OF AT LEAST MODERATE INTENSITY ? Y
DOES THE MASS SPECTRUM CONTAIN A PEAK CORRESPONDING WITH THE LOSS OF A FRAGMENT WITH M/E = 18 ?
THE COMPOUND CONTAINS SOME ALCOHOL FUNCTION;
LET'S TRY TO FIND OUT SOME MORE . . .
IS THERE ANY ABSORPTION BETWEEN 1095 AND 1005 CM-1 ? N
IS THERE ANY ABSORPTION BETWEEN 1135 AND 1055 CM-1 ?
. . .
IS THERE ANY ABSORPTION BETWEEN 1210 AND 1145 CM-1 ?
IS THERE ANY ABSORPTION BETWEEN 1615 AND 1590 CM-1 ? Y
IS THIS A SHARP ABSORPTION OF AT LEAST MODERATE INTENSITY ?
THE COMPOUND IS A PHENOL.
Fig. 1. Example of a dialogue with ISIS in the interactive mode.
first searches for evidence of any kind (positive, negative or no evidence), the second prints out the evidence found, and the third will further evaluate it, i.e. it will look for any subclasses when the absence
CONTROL PROGRAM

start if (forall class(x) then check(x))
check(x) if y evidence x & y print x & y evaluate x
x evidence y if x presence y
x evidence y if x evid-for y & (x presence y) add
pos print x if PP(THE COMPOUND CONTAINS SOME x FUNCTION;) & PP(LET'S TRY TO FIND OUT SOME MORE...)
neg print x
no print x if PP(SOME x FUNCTION COULD BE PRESENT;) & PP(LET'S TAKE A CLOSER LOOK...)
neg evaluate x
x evaluate y if (forall y true-of (z) then x sub-evidence z & x sub-print z)

KNOWLEDGE BASE

class(aromatic)
class(alcohol)
neg evid-for aromatic if not abs ((1630 1525 any any)(1515 1450 moderate any))
pos evid-for aromatic if me ((77)(65))
neg evid-for alcohol if not abs ((3575 3170 moderate broad))
pos evid-for alcohol if M-me ((18))
aromatic(mono-subst-phenyl)
aromatic(di-12-subst-phenyl)
aromatic(di-13-subst-phenyl)
aromatic(di-14-subst-phenyl)
alcohol(primary-alcohol)
alcohol(secondary-alcohol)
alcohol(tertiary-alcohol)
alcohol(phenol)
Fig. 2. Simplified part of the ISIS program written in micro-PROLOG, consisting of a control program and a knowledge base.

of the parent class has not been confirmed. In this example only the first condition will be explained further. Once the presence of a class has been established, it is not necessary (in this search routine) to look for more evidence of its presence or absence. Therefore, the first rule that matches the evidence condition checks whether class x is already known to be present. If this is not the case, the match fails and the program looks for another way to satisfy the evidence condition. This is provided by the fourth rule, which actually searches the knowledge base for positive or negative evidence. Any evidence found is then added as a fact to the working space of the computer. As can be seen in the knowledge base, negative evidence for the presence of an alcohol function is found when the infrared spectrum lacks a broad absorption of at least moderate intensity between 3575 and 3170 cm-1. Positive evidence is found when the mass spectrum contains a peak corresponding to the loss of a fragment with mass 18. By searching and checking all classes given in the knowledge base, and evaluating those for which positive or no evidence is found, detailed information can be provided about the structure of the compound under study.

So far, ISIS is capable of detecting aromatic and alcohol functionalities only. Using a test set containing 494 infrared and mass spectra of organic compounds with molecular weights up to about 300 a.m.u., the correct classes and subclasses were found for all aromatic compounds (n = 114) and alcohols (n = 120). Of the non-aromatic compounds, 89% were correctly classified (11% wrongly as aromatic), and of the non-alcohols, 96% were correctly classified (4% wrongly as an alcohol). Our attention will now be focused on the further development of ISIS as an expert system, by expanding the knowledge base with more interpretation rules and improving the interpretation efficiency.

References
[1] R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum and J. Lederberg, Applications of Artificial Intelligence for Organic Chemistry - the Dendral project, McGraw-Hill, New York, 1980.
[2] L. A. Gribov, M. E. Elyashberg and V. V. Serov, Anal. Chim. Acta, 95 (1977) 75.
[3] D. L. Massart and J. Smeyers-Verbeke, Trends Anal. Chem., 4 (2) (1985) 50.
[4] W. F. Clocksin and C. S. Mellish, Programming in Prolog, Springer-Verlag, Berlin, 1984.
[5] K. L. Clark and F. G. McCabe, Micro-PROLOG: Programming in Logic, Prentice-Hall International, London, 1984.

H. J. Luinge and H. A. van 't Klooster are at the Analytisch Chemisch Laboratorium of the Rijksuniversiteit Utrecht, Croesestraat 77A, 3522 AD Utrecht, The Netherlands.