Journal of Molecular Structure, 267 (1992) 261-268 Elsevier Science Publishers B.V., Amsterdam
Integrated Expert System SCANNET for Storage and Retrieval of Spectral Data in Analytical Chemistry
B. Debska and Z.S. Hippe
Department of Computer Chemistry, Technical University, 35-041 Rzeszów, Poland
Abstract
The computer program system SCANNET, now in its final shape, may be widely applied in the qualitative analysis of organic substances, i.e. in the identification of the structure of a given (unknown) compound. The main feature of the system is simultaneous access to more than one database, together with the simultaneous display (on one screen) of up to six different spectral curves (13C-NMR, 1H-NMR, IR, MS, RA and UV) for in-depth interpretation.
1. INTRODUCTION
One of the commonly accepted methods of structural identification of organic compounds (i.e. elucidation of the structure of an unknown, pure substance) is the application of the so-called library search (LS) algorithm [1]. The algorithm relies on the sequential comparison of the molecular spectrum of an unknown compound with each successive spectrum from a library of standard reference spectra. Identity of the compared spectra proves the identity of the structures, whereas similarity of the spectra may suggest a similarity, to a given extent, of the two compared structures (unknown and reference).
Previously, collections of standard reference spectra were available in the form of books or journals [2-4], microfiches [5], or microfilms [6]. Searching through such collections is very awkward and time-consuming. These drawbacks have been removed by the introduction of modern spectrometers equipped with a mini- or microcomputer for spectra acquisition [7]. Additionally, instruments of this type may be fitted with firmware programs for fast and convenient access to, and manipulation of, the information gathered in automatically created spectral data banks. Usually, these spectral banks contain data for one selected molecular spectroscopy [8-11]. There exist, however, systems that can store 2-5 different types of molecular spectra [12]. Such multimethodical spectral data banks seem to be more usable and convenient for chemists, because the combined information from various molecular spectroscopies is mutually supporting, leading to more reliable results in solving the problems of structural analysis.
Following the multimethodical approach briefly outlined here, the integrated computer program system SCANNET has been developed at the Department of Computer Chemistry, TU Rzeszów¹.
0022-2860/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved
2. THE ARCHITECTURE OF THE INTEGRATED SYSTEM SCANNET
The general architecture of the system SCANNET follows some fundamental rules which determine the required features of the system and ensure that it performs its tasks effectively as an expert system for structural analysis [13]. The main modules of the system are the knowledgebase (SCANNET Multispectral Knowledgebase, SMK) and a package of utility programs, which plays the role of a specific inference engine combined with the user interface (SCANNET Inference Engine & User Interface, EUI). Some more details about the main parts of SCANNET are given below.
2.1. SCANNET Multispectral Knowledgebase, SMK
The system enables the acquisition and retrieval of spectral data for six different analytical methods, namely: nuclear magnetic resonance spectrometry (13C-NMR, 1H-NMR), infrared spectrometry (IR), mass spectrometry (MS), Raman spectrometry (RA) and ultraviolet spectrometry (UV). Besides spectral data, other important information (for example structural, factographical, analytical, etc.) is also stored; thus the entire spectral bank is upgraded to the level of a comprehensive knowledgebase. The overall structure of the spectral and supplementary information of the system is shown in Figure 1.
[Figure 1 diagram: the Database of General Information (DGI) linked to the Multimethod Spectral Database (MSD), which comprises the files 13C_EXP, 13C_SPE, 1H_EXP, 1H_SPE, IR_EXP, IR_SPE, MS_EXP, MS_SPE, RA_EXP, RA_SPE, UV_EXP and UV_SPE.]
Figure 1. Structure of the knowledgebase of the system SCANNET.
2.1.1. Database of General Information, DGI
The Database of General Information is kept in a separate file. Each record of the file contains alphanumeric information about the compound concerned, such as: structure, name, chemical formula, CAS number, selected physico-chemical parameters, and pointers to records in the Multimethod Spectral Database for fast access to spectra. The general information within the system has a format common to all types of spectral data. The content of a selected record is exemplified in Figure 2 (the screen for bromobenzene).
¹ The grant CPBP 01.17 for research and development of the system (1986-1990) is kindly acknowledged.
[Figure 2 screen: general information record for bromobenzene, with fields for name, chemical formula, molecular weight, CAS number, state, density, refractive index, dipole moment, dielectric constant, melting point, boiling point and solubility.]
Figure 2. General information screen for bromobenzene.
2.1.2. Multimethod Spectral Database, MSD
The Multimethod Spectral Database, being a part of the system knowledgebase (see Figure 1), consists of twelve loosely connected files, two for each spectral method used. In each case, information is stored separately about the conditions under which a spectrum was recorded (files named *_EXP) and about the spectrum itself (files named *_SPE)². The structure of records in the *_EXP files depends on the spectral method; details are given in Table 1. As for the *_SPE files (with the various molecular spectra), each record holds the following information: (i) the definition of the registration coordinates, (ii) the discrete spectrum (only for 1H-NMR, IR, RA and UV), and (iii) the parameters of absorption bands (peaks) for all spectral methods used (computed from the discrete spectra for 1H-NMR, IR, RA and UV, or entered directly for 13C-NMR and MS).
2.2. SCANNET Inference Engine & User Interface, EUI
The main and really unique feature of EUI is the ability to access all six spectral databases simultaneously, enabling the inspection, at the same moment, of all molecular spectra of a given compound on one screen (Figure 3). EUI also offers a convenient function to switch to any of the six spectra presented on the screen for closer inspection.
² Here the asterisk * stands interchangeably for a global name related to the type of the spectral method (hence 13C, 1H, IR, MS, RA or UV).
Table 1. Content of records in spectral databases
Database: Record content
13C_EXP: instrument, temperature, sweep width, pulse width, pulse repetition time, number of scans, transform size, observe frequency 13C, irradiation frequency 1H, lock frequency 2H, kind of spectrum location, reference, sample tube, purity, solvent, solute, solution volume
1H_EXP: instrument, temperature, sweep width, sweep time, sweep offset, spectrum amplitude, reference, solvent, concentration, purity
IR_EXP: instrument, temperature, sample preparation, solvent, frequency region, concentration, cell thickness, optical material of cell, purity, vertical axis, horizontal axis
MS_EXP: instrument, ionizing method, electron beam energy, sensitivity, sensitivity against n-butane, inlet temperature, ionization chamber temperature, ionization current, purity
RA_EXP: instrument, temperature, excitation source, detecting system, exciting radiation, slit width, resolution, horizontal axis, sensitivity, purity
UV_EXP: instrument, solvent, concentration, cell thickness, purity, vertical axis, horizontal axis
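The record layout described in sections 2.1.1 and 2.1.2 and in Table 1 can be pictured with a small sketch. The Python fragment below is an illustration only: the class and field names (GeneralInfoRecord, SpectrumRecord, spectra_pointers) are hypothetical, and list indices stand in for the file pointers, since the actual SCANNET file formats are not given here. It shows one DGI record holding pointers into the per-method spectral files.

```python
from dataclasses import dataclass, field

# Method prefixes named in the paper; every other name below is hypothetical.
METHODS = ("13C", "1H", "IR", "MS", "RA", "UV")


def msd_file_names():
    """Return the twelve MSD file names, two (*_EXP, *_SPE) per method."""
    return [f"{m}_{kind}" for m in METHODS for kind in ("EXP", "SPE")]


@dataclass
class SpectrumRecord:
    """One spectrum: an *_EXP part (conditions) and an *_SPE part (data)."""
    method: str
    conditions: dict          # e.g. {"instrument": ..., "purity": ...}
    peaks: list               # parameters of bands (peaks)
    discrete: list = field(default_factory=list)  # left empty for 13C and MS


@dataclass
class GeneralInfoRecord:
    """One DGI record with pointers (here: list indices) into the MSD."""
    name: str
    formula: str
    spectra_pointers: dict = field(default_factory=dict)  # method -> [index]


def spectra_of(record, msd):
    """Follow the pointers of a DGI record and collect its spectra."""
    return {
        method: [msd[method][i] for i in indices]
        for method, indices in record.spectra_pointers.items()
    }
```

With `msd` taken as a dictionary mapping each method to a list of `SpectrumRecord` objects, `spectra_of(record, msd)` returns everything needed to draw all spectra of one compound on a single screen.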
Figure 3. Molecular spectra of bromobenzene, presented on one screen
3. MAIN FEATURES OF SCANNET INFERENCE ENGINE & USER INTERFACE
Some of the very powerful features of the SCANNET utility programs (inference engine & user interface) are discussed using the examples of: (1) input of a structure, (2) input of a spectrum, (3) spectrum pre-processing and (4) searching through a spectral database.
3.1. Input of a structure
The structural information is entered in the form of a truncated list of atoms and a list of bonds. An example of the molecular editor screen for bromobenzene is shown in Figure 4.
Figure 4. List of atoms and list of bonds for bromobenzene (atoms are numbered arbitrarily).
This form of input is then canonized to obtain a unique and unambiguous structural representation of a given molecule. The canonization process is necessary because there are no constraints on the way numerical identifiers are assigned to particular atoms of the molecule; in other words, the user may code his/her structure fully arbitrarily. This may lead to millions of different codes for the same structure. However, owing to the very fast and powerful canonization algorithm applied by us, the system writes the canonical form of the structural code onto the disk in less than 0.3 seconds, and then automatically displays the structure on the screen (Figure 5). The correction of possible errors, say a double bond instead of a single bond, is still possible. Using selected function keys (working as "buttons"), the user may conveniently change the shape and content of the display.
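The canonization algorithm itself is not described in this paper. To make the idea concrete, the sketch below uses a deliberately naive substitute: it tries every renumbering of the atoms and keeps the lexicographically smallest code, so that two arbitrary numberings of the same molecule yield the same result. The function name and the input layout (a list of element symbols plus a list of (i, j, order) bond tuples) are assumptions made for this illustration, and the factorial cost means it is suitable only for very small molecules.

```python
from itertools import permutations


def canonical_code(atoms, bonds):
    """Return a numbering-independent code for a small molecule.

    atoms: list of element symbols; the list index is the (arbitrary)
           atom number, counted from 0
    bonds: list of (i, j, order) tuples using those atom numbers

    Brute force: every renumbering is tried and the lexicographically
    smallest (atom list, bond list) pair is kept. This is NOT the fast
    algorithm used in SCANNET, only an illustration of the goal.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):      # perm[old] = new atom number
        new_atoms = [None] * n
        for old, new in enumerate(perm):
            new_atoms[new] = atoms[old]
        # Store each bond with the smaller atom number first, then sort.
        new_bonds = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        code = (tuple(new_atoms), tuple(new_bonds))
        if best is None or code < best:
            best = code
    return best
```

For the seven non-hydrogen atoms of bromobenzene this loop visits 7! = 5040 renumberings, which is still quick; practical canonization algorithms avoid the exhaustive enumeration.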
Figure 5. Example of the structure (bromobenzene) displayed by the system.
3.2. Input of a spectrum
The initial information required to begin the creation of any spectral database in SCANNET is the structure of the compound concerned. Applying the procedure briefly described in section 3.1, the system checks whether or not the structure entered is already stored in the Database of General Information. The presence or absence of the structure leads to a fully automatic selection of the option: updating or loading of the data, respectively. Both procedures are executed in a very user-friendly form: the technique of a movable bar is used throughout the system to select the required field and to fill it in properly (see Figure 6).
[Figure 6 screen: input form for the running conditions of an MS spectrum, with fields for instrument, ionizing method, electron beam energy, sensitivity, sensitivity against n-butane, inlet temperature, ionization chamber temperature, ionization current and purity.]
Figure 6. Screen for input/updating of MS-spectrum running data
One step further, the same technique is used to enter the actual spectral data (Figure 7, an example for an MS spectrum).
[Figure 7 screen: MS-spectrum input form; the legible menu entries are "edit band parameters" and "import spectral data".]
Figure 7. Input of spectral data for the MS-spectrum
3.3. Spectrum pre-processing
This procedure is performed on 1H-NMR, IR, RA and UV spectra only. Using some special coefficients and parameters that characterize the instrumentation used by us to record the spectra and to convert them into digital form, the system automatically recalculates the data into so-called "real" spectra. This means that the real values of the two-dimensional coordinates of the spectrum concerned are computed and written onto the disk. During spectrum pre-processing, the user is instantly informed about each step of the entire operation. In particular, information is automatically displayed about: reading/writing the spectrum from/onto the disk, accumulation of spectra, smoothing of the spectrum, identification of isolated and/or overlapped peaks, optimum resolution of the overlapped peaks, and calculation of the spectral parameters of successive bands (peaks).
3.4. Searching through a spectral database
There are two main options of the searching process: (i) searching for a given structure or (ii) searching for a given spectrum (spectra). In the first case, the user learns whether or not the compound being investigated is recorded in the database. If it is, the system automatically displays all of its recorded spectra on one screen (Figure 3). From this level of the hierarchy, SCANNET provides information about the structure of the compound, its properties and other general data. Also, any selected spectrum may be zoomed and inspected in detail. In the second case, SCANNET allows the user to find a spectrum (spectra) identical and/or similar to the spectrum entered. Here, the spectrum of an unknown compound may be entered as a list of spectral parameters of all meaningful bands (peaks) or may be read directly from the disk in discrete form.
As a result of the searching procedure, a list of chemical names of compounds having spectra identical (fitting factor = 1.0) or similar (fitting factor < 1.0) to the spectrum of the unknown compound is displayed. Some other, more sophisticated searching procedures are also available, for example searching through stepwise narrowed subsets of similar spectra [14].
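The sequential library search and the ranking by fitting factor can be sketched as follows. The paper does not define how the fitting factor is computed, so the measure used here (the fraction of peaks that can be paired within a tolerance) is purely an assumption chosen so that identical peak lists score exactly 1.0; the function names, the tolerance and the threshold are likewise hypothetical.

```python
def fitting_factor(unknown, reference, tol=1.0):
    """Hypothetical similarity in [0, 1] between two peak-position lists.

    Peaks are paired greedily: each unknown peak takes the closest
    still-unused reference peak if it lies within `tol`. The number of
    pairs is divided by the size of the larger list, so identical lists
    score exactly 1.0.
    """
    if not unknown or not reference:
        return 0.0
    free = sorted(reference)
    matched = 0
    for p in sorted(unknown):
        best = min(free, key=lambda r: abs(r - p), default=None)
        if best is not None and abs(best - p) <= tol:
            free.remove(best)
            matched += 1
    return matched / max(len(unknown), len(reference))


def library_search(unknown, library, threshold=0.5, tol=1.0):
    """Compare `unknown` sequentially with every spectrum in the library.

    library: dict mapping compound name -> list of peak positions
    Returns (name, factor) pairs with factor >= threshold, best first.
    """
    hits = []
    for name, reference in library.items():
        factor = fitting_factor(unknown, reference, tol)
        if factor >= threshold:
            hits.append((name, factor))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

A hit with factor 1.0 corresponds to an identical spectrum in the sense used above, while lower values mark merely similar spectra. A stepwise narrowed search could be imitated by calling `library_search` again on the hits with a tighter tolerance, although this is not necessarily how the procedure of [14] works.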
4. SUMMARY
The computer program system SCANNET may be regarded as a very powerful and convenient research tool for the identification of chemical structures, based on a search through a specifically organized multispectral knowledgebase. This knowledgebase contains, among other things, spectral databases for six different analytical methods: 13C-NMR, 1H-NMR, IR, MS, RA and UV. In each case, the system enables the storage of up to 5 spectra for any compound (these spectra may be run under different conditions, using various instruments, in various coordinates, etc.). The information contained in the knowledgebase may also be used purely for documentation purposes. It seems that SCANNET may be implemented in any research laboratory dealing with molecular spectroscopy; it has already found successful application in the teaching of graduate students at university level.
5. REFERENCES
1 Z.S. Hippe, Artificial Intelligence in Chemistry: Structure Elucidation and Simulation of Organic Reactions, Elsevier (Amsterdam)/PWN (Warsaw) (1991).
2 Chemical Abstracts, Chemical Abstracts Service, Columbus, Ohio, USA.
3 Beilstein-Handbuch der organischen Chemie, Beilstein Institut, 6000 Frankfurt 90, FRG.
4 CRC Atlas of Spectral Data and Physical Constants for Organic Compounds, 2nd Ed., The Chemical Rubber Company, Cleveland, Ohio, USA (1975).
5 W. Bremser, L. Ernst, B. Franke, R. Gerhards and A. Hardt, Carbon-13 NMR Spectral Data, Verlag Chemie, Weinheim (1981).
6 The Sadtler Handbook of Infrared Spectra, Heyden, London (1978).
7 J. Zupan (editor), Computer-Supported Spectroscopic Databases, Ellis Horwood Series in Analytical Chemistry, New York (1986).
8 The NIH-EPA Chemical Information System Status Report No. 6 (1977), National Institutes of Health, Bethesda, Maryland, USA.
9 F. Erni and J.T. Clerc, Helv. Chim. Acta 55 (1972) 489.
10 V.A. Koptjug, Z. Chem. 15 (1975) 41.
11 J. Zupan, M. Penca, D. Hadzi and J. Marcel, Analytic. Chem. 49 (1977) 2141.
12 E.J. Karjalainen (editor), Scientific Computing and Automation (Europe), Elsevier Science Publishers B.V., Amsterdam (1990).
13 R.I. Levine, D.E. Drang and B. Edelson, A Comprehensive Guide to AI and Expert Systems, McGraw-Hill, New York (1986).
14 TU-Rzeszów Res. Report No. U-1573, Rzeszów (1990).