Automated interpretation of vibrational spectra


Vibrational Spectroscopy, 1 (1990) 3-18 Elsevier Science Publishers B.V., Amsterdam

Review


H.J. Luinge

Department of Analytical Chemistry, University of Utrecht, Croesestraat 77a, 3522 AD Utrecht (The Netherlands)

(Received 16th May 1990)

Abstract

Automated interpretation of spectra has been a subject of research for many years. This review focuses on computerized systems capable of deducing structural information from vibrational spectra. The different techniques used are compared and some future trends are addressed.

Keywords: Infrared spectrometry; Review; Automated spectral interpretation; Structure determination; Computer-assisted interpretation of spectra; Knowledge-based system

0924-2031/90/$03.50 © 1990 Elsevier Science Publishers B.V.

Structure elucidation is an important area of research in chemistry. Chemical and physical properties and biological and pharmaceutical activity are known to be strongly correlated with structure. Many techniques have been developed to obtain information on the composition and structure of a compound, and vibrational spectrometry holds an important place among them. Several surveys have found infrared spectrometry to be the first or second most rapidly growing instrumental technique. In particular, the introduction of Fourier transform spectrometers and the development of new sampling techniques (ATR, DRIFT, PAS) have led to increasing interest. Because infrared spectra are generally complex, their interpretation requires much experience. Hence, as with other spectrometric techniques, attempts have been made to automate the interpretation process. Recently, automated processing has become even more important because combinations of chromatographic methods with infrared spectrometry (GC-IR, LC-IR) have enormously increased the number of spectra produced. A review of these “hyphenated” techniques has been published [1].

Automated methods for structure elucidation can roughly be divided into three categories: library search techniques, pattern recognition methods and knowledge-based approaches [2]. In a library search, the spectrum of the unknown is compared with a large collection of reference data. The result is generally a list of the spectra most similar to the unknown. Important aspects of library searching are the representation of the data, the similarity measures used and the search algorithm applied. All these factors influence the speed of the retrieval and comparison process, and the data representation also determines how much storage the reference collection needs. Most instrument manufacturers currently sell library search systems, and spectral libraries containing several hundreds to thousands of spectra of widely varying compound types are available. Compared with the number of currently known compounds, however, this is still a relatively small number. Hence, the correct answer to a particular problem is not always found, and in those cases other methods might be more appropriate.

Pattern recognition approaches assume that similar compounds will have similar spectra. A division can be made into non-supervised and supervised techniques. Non-supervised methods are used to find classes of similar spectra.

In many cases a mathematical function is applied to define an abstract model for each class. These models are used during supervised pattern recognition in order to classify the spectrum of an unknown compound. An advantage of pattern recognition approaches is that no a priori assumptions have to be made regarding the spectral information used to discriminate between classes. Knowledge-based methods attempt to encode the logic of the visual spectral interpretation as carried out by humans into an automated computer-based procedure [3]. Whereas the success of a library search depends on the size and quality of a data base of spectra, the success of a knowledge-based approach depends on the quality of a data base of interpretation rules. As such a database contains knowledge about the interpretation process, it is called a knowledge base, and the implemented procedure a knowledge-based system (or expert system). The acquisition of knowledge is one of the major issues in the development of such systems.

LIBRARY SEARCH SYSTEMS

The first attempts to automate the retrieval of reference spectra date from 1951, when Kuentzel [4] described the application of an IBM card sorter to punched cards containing infrared absorption maxima. Since then, several reports have appeared on the use of computer techniques to search spectral data bases stored on magnetic tape or disk. To minimize the amount of computer storage needed and to reduce search times, spectral representations have been proposed that limit the amount of data while retaining maximum information. Initially, use was made primarily of binary data [5-14]. A spectrum is divided into specific resolution elements, and each element is represented by a binary digit. The value of a bit indicates the presence or absence of a peak in the corresponding resolution increment. Peak location search systems stored spectral data efficiently, but the incorporation of intensities [15,16] and band widths [17,18] greatly extended the ability of systems to discern between similar spectra. As computer power and memory capacity increased, libraries with complete absorbance spectra were compiled. Searches have been performed with entire absorbance spectra, with limited regions of the spectra, and with the entire spectrum after applying a weighting function to (de)emphasize selected areas [19].

Improving search performance by using alternative similarity measures

Several mathematical functions for calculating the similarity of two spectra are currently in use. The most common are the sum of squared absorbance differences, the sum of absolute absorbance differences, and the same two sums calculated for the first-derivative spectra [19]. Other similarity measures are the correlation coefficient [21] and the dot product of two (normalized) spectra [20], which yields a value of 1 for two equal spectra and 0 for orthogonal ones. Powell and Hieftje [22] used the cross-correlation function to determine the similarity of unknown and reference spectra. They applied a weighting procedure: the mean cross-correlation was calculated over the range −10 < τ < 10 and subtracted from each point in that range. The resulting value at τ = 0 was used as an indicator of similarity. The correlation technique is slower than conventional file-searching methods. It will therefore probably find greatest use in searching small collections of similar spectra, or in ranking matches after a faster method has made a preliminary selection. Yu and Friedrich [23] applied the odd moments of the cross-correlation function to library searching and obtained results comparable with those of searches based on correlation coefficients. Delaney et al. [24] introduced a composite Grotch metric for the comparison of highly compressed binary vapour-phase data.
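The common similarity measures listed above can be sketched in a few lines. The following is a minimal illustration, assuming spectra are NumPy arrays of absorbances sampled on the same wavenumber grid. The first derivative is approximated by successive differences, and the function names (`hit_list` and the measure functions) are hypothetical, not taken from any cited search system.

```python
import numpy as np


def sum_sq_diff(a, b):
    """Sum of squared absorbance differences (0 for identical spectra)."""
    return float(np.sum((a - b) ** 2))


def sum_abs_diff(a, b):
    """Sum of absolute absorbance differences (0 for identical spectra)."""
    return float(np.sum(np.abs(a - b)))


def deriv_sum_sq_diff(a, b):
    """Squared differences of first-derivative spectra (finite differences)."""
    return sum_sq_diff(np.diff(a), np.diff(b))


def deriv_sum_abs_diff(a, b):
    """Absolute differences of first-derivative spectra (finite differences)."""
    return sum_abs_diff(np.diff(a), np.diff(b))


def dot_product(a, b):
    """Dot product of length-normalized spectra: 1 if equal, 0 if orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def correlation(a, b):
    """Pearson correlation coefficient between two spectra."""
    return float(np.corrcoef(a, b)[0, 1])


# For these measures smaller is better; for dot_product/correlation larger is better.
DISTANCES = (sum_sq_diff, sum_abs_diff, deriv_sum_sq_diff, deriv_sum_abs_diff)


def hit_list(unknown, library, measure):
    """Library indices ordered from best to worst match under `measure`."""
    scores = np.array([measure(unknown, ref) for ref in library])
    order = np.argsort(scores, kind="stable")
    if measure not in DISTANCES:
        order = order[::-1]  # similarities: highest score first
    return [int(i) for i in order]
```

Every measure produces a ranked hit list in the same way. They differ only in which spectral differences they emphasize: the derivative-based sums are less sensitive to baseline offsets, and the normalized dot product ignores overall intensity scaling.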
In the composite Grotch metric, a match based on the absence of peaks in both reference and sample spectrum can be weighted differently from a match based on their presence. The weight parameter was adjusted to maximize the separation between the correct best-matching spectrum and the rest of the library.

Blaffert [25] applied fuzzy set theory to spectral library searching. Peak tables of spectra can be compared by counting the peaks that occur in both reference and sample spectra. Because exact peak positions vary randomly, fixed peak windows can be used, and all peaks within a window are then said to belong to the spectrum. Blaffert introduced a continuous grade of membership between 0 and 1. A peak in the middle of the window has a membership value of 1, and the membership falls to zero at a very large distance from the window centre. The membership function might be Gaussian or Lorentzian in shape, but non-statistical functions are also possible. Membership functions can be applied to both wavenumber and intensity data.

Compression by Fourier transformation

Efficient search-identification methods in infrared spectrometry have become increasingly important. This is primarily the result of the growth of spectral libraries and the rapid maturation of gas chromatography-infrared spectrometry (GC-IR), which produces large amounts of spectral data. Much effort has gone into data compression techniques that keep storage requirements to a minimum and give fast searches while retaining as much information as possible. One effective way to compress a spectrum is to take its Fourier transform (FT). In the interferogram, a large part of the signal information is concentrated in a relatively small region very near the centreburst [20]. The noise, by contrast, is randomly distributed throughout the interferogram, so using only this small region increases the effective signal-to-noise ratio.
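The centreburst argument can be illustrated with a short sketch. It assumes that the real part of NumPy's real FFT of an evenly sampled absorbance spectrum can stand in for an idealized, symmetric interferogram, with index 0 playing the role of the centreburst. The names `cosine_transform`, `compress` and `energy_fraction` are hypothetical. The point indices of this toy transform are not the sampling positions of a real instrument's interferogram.

```python
import numpy as np


def cosine_transform(spectrum):
    """Real part of the DFT of a real, evenly sampled spectrum.

    For real input this is a cosine transform: an idealized symmetric
    'interferogram' in which index 0 plays the role of the centreburst.
    """
    return np.fft.rfft(np.asarray(spectrum, dtype=float)).real


def compress(spectrum, start=0, n_points=100):
    """Keep n_points time-domain values beginning `start` points from index 0."""
    return cosine_transform(spectrum)[start:start + n_points]


def energy_fraction(spectrum, n_points):
    """Share of the squared time-domain values lying in the first n_points."""
    t = cosine_transform(spectrum)
    return float(np.sum(t[:n_points] ** 2) / np.sum(t ** 2))


x = np.arange(1024)
band = np.exp(-0.5 * ((x - 512) / 20.0) ** 2)         # one broad absorption band
noise = np.random.default_rng(0).normal(size=x.size)  # white noise

print(energy_fraction(band, 64))   # close to 1: band information sits near index 0
print(energy_fraction(noise, 64))  # roughly 64/513: noise is spread over all points
```

A broad band transforms into a narrow feature near zero retardation, while white noise keeps its energy spread evenly. A short window of the transform therefore keeps most of the band information and discards most of the noise. Calling `compress(spectrum, start=60, n_points=100)` mimics the displaced 100-point window discussed next, only on this toy grid.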
The search reliability of an interferogram-based system has been shown to be extremely high [26]. This holds even when only 100 data points are used, taken from a region displaced approximately 60 points from the centreburst of the Fourier representation [27]. GC-FT-IR manufacturers have provided users with Gram-Schmidt reconstruction software that uses just this region of the interferogram. White et al. [28] have shown, however, that the optimum portion of the interferogram for Gram-Schmidt orthogonalization reconstructions depends heavily on the identity of the mixture components and on the stability of the GC-FT-IR instrument. According to their results, the 60-point displaced region is not the optimum choice. Superior results can be obtained by nulling parts of the spectrum that carry little information for discriminating similar compounds, and then transforming to the Fourier domain [20]. The method can distinguish compounds that are difficult to discriminate by other infrared search methods. Spectral nulling may yield an improvement of two orders of magnitude over full-bandpass spectra.

Owens and Isenhour [29] described a further compression of the time-domain representation. Data points in the time-domain representation that are greater than 0 are assigned a bit value of 1, and the remaining points a value of 0, which gives a clipped representation. At least a 256-bit search vector is needed to maintain compound specificity. The compression allows a 100-fold decrease in storage requirements. Kawata et al. [30] proposed a library search method that uses only the phase components of the Fourier transforms of the sample and reference spectra for identification. Tests have shown that the phase-correlation algorithm has better discriminatory power than ordinary correlation methods. It also identifies samples correctly against spectra recorded under different instrumental conditions. It appears that the Fourier phase essentially carries the information on peak positions and peak-height ratios. The amplitude, in contrast, carries the peak broadening and the system function convolved with the spectra.

Compression by principal component analysis

Factor or principal component analysis is another data compression technique used to reduce storage space (Fig. 1).
This technique can analyse large data bases and remove redundant properties, leaving only significant information. A detailed description of factor analysis can be found in [31]. Hangac et al. [32] obtained a 5-fold reduction in search file size and a proportional reduction in search time, with no deterioration in search performance. The method was extended to combined spectral data by applying factor analysis to a library of concatenated infrared-mass spectra [33]. A study of different types of data pretreatment before compression found that the best results were obtained with no pretreatment at all (covariance about the origin) [34]. This preserves information about the mean and the relative errors between resolution elements best. In some cases dividing each absorbance by its standard deviation (correlation about the origin) performed better, but this type of pretreatment is more sensitive to noise. Harrington and Isenhour [35] recently described a method for compressing spectral data using robust eigenvectors, to cope with outliers and non-normal distributions in the data. Libraries compressed with robust eigenvectors gave better searches than libraries compressed with conventional eigenvectors, and performed similarly to a non-compressed library. Spectral entries with a low retained variance pointed to poor-quality data in a library.
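The eigenvector compression of Fig. 1 and the retained-variance check can be sketched with a singular value decomposition. This is a minimal illustration, assuming a library stored as an (n spectra × v features) NumPy array. Following the pretreatment result quoted above, no mean-centring or scaling is applied. Conventional (not robust) eigenvectors are used, and the helper names are hypothetical.

```python
import numpy as np


def eigenvector_compress(library, c):
    """Compress an (n spectra x v features) library to c scores per spectrum.

    No mean-centring or scaling is applied ('covariance about the origin').
    Returns (scores, loadings) with shapes (n, c) and (c, v).
    """
    _, _, vt = np.linalg.svd(library, full_matrices=False)
    loadings = vt[:c]  # first c eigenvectors of library.T @ library
    return library @ loadings.T, loadings


def search(unknown, scores, loadings):
    """Rank library entries by Euclidean distance in the compressed space."""
    q = loadings @ unknown
    return np.argsort(np.sum((scores - q) ** 2, axis=1), kind="stable")


def retained_variance(spectra, loadings):
    """Fraction of each spectrum's sum of squares kept by the c eigenvectors."""
    proj = spectra @ loadings.T
    return np.sum(proj ** 2, axis=1) / np.sum(spectra ** 2, axis=1)


rng = np.random.default_rng(1)
lib = rng.random((50, 3)) @ rng.random((3, 200))  # 50 synthetic spectra of rank 3
scores, loadings = eigenvector_compress(lib, c=3)
print(scores.shape)                         # (50, 3)
print(search(lib[7], scores, loadings)[0])  # 7
```

Each stored spectrum shrinks from 200 to 3 numbers, so storage and search time fall in proportion. A library entry whose retained variance is unusually low is poorly described by the common eigenvectors. That is the property used above to flag suspect data.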

Improving search speed by using subsets

Another way to speed up the search process is to select a subset of the spectral library that is likely to contain the spectra most similar to the unknown. Combining some form of dimensional reduction with prefiltering should multiply the savings in search time. Relatively little work has been reported on prefilter strategies for infrared library searches. Wang and Isenhour [36] applied principal component analysis to Fourier-transformed absorption spectra. They developed a search prefilter based on factor analysis of the time-domain representation of a spectral library. The values of the loadings were stored in an index file and used to access a primary file of compressed data. Anderegg and Pyo [37] developed a method that compares patterns of absorbances to select the library subset most likely to contain a particular functionality. From a database of vapour-phase spectra, they selected the spectra of compounds containing a particular functional group. From these data they calculated an average spectrum within a spectral window (e.g., 3800-3400 cm⁻¹ for OH-containing compounds), which was taken to represent the functional group. Reference spectra were compared with this average spectrum using a difference-square metric, giving a score that reflects their similarity to the average. References with similarities above a preset threshold were analysed further. Bjerga and Small [38] recently developed a method for automatically selecting library subsets for infrared spectral searching, based on principal components calculated from a spectral data base. The projected spectra are used to find regions of spectral similarity. The spectra lying in the region closest to the unknown are then compared with the sample spectrum. In this way the advantages of a full-spectrum comparison are retained while the overall search speed increases. Cooper and Wilkins [39] recently described a peak search used as a prefilter for a full spectral search in an IR detector for gas chromatography (GC).
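The functional-group prefilter of Anderegg and Pyo can be sketched as follows. This is an illustration under stated assumptions, not their implementation. Spectra are assumed to share one wavenumber axis. The "difference-square metric" is taken to be the sum of squared differences inside the window, so a similarity above the threshold becomes a distance below it. The helper names and the threshold value are hypothetical.

```python
import numpy as np


def functional_group_template(group_spectra, wavenumbers, window):
    """Average spectrum of compounds sharing a functional group, within a window."""
    lo, hi = window
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return mask, group_spectra[:, mask].mean(axis=0)


def prefilter(library, mask, template, max_distance):
    """Indices of library spectra close to the template in the window."""
    d = np.sum((library[:, mask] - template) ** 2, axis=1)  # difference-square score
    return np.flatnonzero(d <= max_distance)


wn = np.linspace(4000, 400, 901)                # 4 cm-1 spacing
oh = np.exp(-0.5 * ((wn - 3600) / 50.0) ** 2)   # idealized O-H stretching band
group = np.array([a * oh for a in (0.9, 1.0, 1.1)])
mask, template = functional_group_template(group, wn, (3400, 3800))

library = np.array([
    oh,                                         # contains the O-H band
    np.zeros_like(wn),                          # no absorption at all
    0.95 * oh,                                  # weaker O-H band
    np.exp(-0.5 * ((wn - 1700) / 20.0) ** 2),   # carbonyl-like band only
])
print(prefilter(library, mask, template, max_distance=1.0))  # [0 2]
```

Only the surviving indices would then go on to a full-spectrum comparison. The speed gain comes from the cheap windowed test, which touches only a few dozen points per reference.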

Improving search speed by using hierarchical trees

Zupan and Munk [40] introduced the use of hierarchical trees for the storage and retrieval of infrared spectra. Retrieving any spectrum in the library takes a number of comparisons proportional to log N, where N is the number of spectra in the tree. The hierarchical tree is generated by incorporating spectra into an already existing tree. Where a spectrum is incorporated depends on its Euclidean distance to the spectra in the tree at a particular node. Building a hierarchical tree can take a long time, but a search through the tree is extremely fast. The algorithm has been adapted [41] so that a search can be restarted at nodes of the tree that were rejected during the first pass. A search then takes more time, but yields more reliable results.

Fig. 1. Schematic representation of the eigenvector compression of a spectral library matrix containing n spectra of u features using c principal components.

Evaluation of search performance

Methods for evaluating library search performance can be roughly divided into three classes. The first is subjective inspection of the “hit list” obtained from a search. Although this method is not statistically well founded, it is the most common approach. The second class counts the percentage of correctly identified spectra and uses it as a quality measure for a search. These methods are biased, because the results clearly depend on the quality of the test spectra. The last group of methods tries to assess the degree to which similar compounds have similar spectra. The most important problem here is the difficulty of quantifying structural similarity. One method of this last class, based on propagation trees, has been described [42]. The n spectra best matching an unknown are each used as starting points for new searches. Each closest match is represented by a node in a propagation tree built to an arbitrary depth. The number of different spectra in a tree of given size indicates how well the search system retrieves similar compounds. Delaney et al. [43] described a method for evaluating library search performance that avoids the need to quantify structural similarity, but implicitly uses the chemical information embodied in the compounds on the hit list. In their QELS method (Quantitative Evaluation of Library Searching Performance), each spectrum in a test set is searched for in a reference library and the resulting lists of matches are stored. The method under investigation (e.g., a spectrum compression scheme or a similarity measure) is then used to search again, and the index positions of each of the previously found matches are recorded. A weighted sum is obtained from these index positions, and a figure of merit is calculated by normalizing this sum with respect to the best and worst possible scores. The QELS method was used to study the effect of noise on library search performance [44]. A relationship was established between the signal-to-noise ratio (S/N) of vapour-phase FT-IR spectra and their usefulness for library searching. For S/N values from 2 to 5, fair to good results could be obtained, whereas values above 5 gave good to excellent results. The QELS method had several disadvantages, e.g., its results depended on the size of the library and it required a standard library. Harrington and Isenhour [45] therefore proposed an improved quantitative reliability measure (QRM) of library searching. The fundamental difference between QRM and QELS is that QRM uses the intra-library closest matches for each library representation, instead of those of a chosen standard library representation. Closure effects on library search performance have been described [46]. Two libraries were prepared, one from spectra normalized to unit length and one from spectra normalized to unit maximum absorbance. The former library was searched with a dot-product metric and the latter with a Euclidean similarity metric. A quantitative reliability measure was used to compare the two approaches. There were more cases in which the dot-product search performed much better than the Euclidean search than the reverse. Most of the time, however, the two searches performed similarly. Similar conclusions could be drawn when noise was added to the data.
For both high- and low-frequency noise the dot-product search performed substantially better, whereas for medium-frequency noise the Euclidean similarity search more often performed better. Rosenthal and Lowry [47] investigated the effect of different sampling methodologies on library search performance. They concluded that sophisticated algorithms such as Kubelka-Munk and Kramers-Kronig transformations are required to obtain data that are comparable in terms of position, profile and intensity.
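The Kubelka-Munk transformation mentioned above converts diffuse reflectance data into units roughly proportional to concentration, which makes them more comparable with absorbance-based libraries. The Kubelka-Munk function is F(R) = (1 − R)² / (2R), where R is the reflectance of an optically thick sample relative to a non-absorbing standard, expressed as a fraction rather than a percentage. The sketch below covers only this function. The Kramers-Kronig correction used for specular reflectance is not shown, and the function name is hypothetical.

```python
import numpy as np


def kubelka_munk(reflectance):
    """Kubelka-Munk function F(R) = (1 - R)**2 / (2 R).

    `reflectance` is the fractional diffuse reflectance (0 < R <= 1) of an
    optically thick sample relative to a non-absorbing reference.
    """
    r = np.asarray(reflectance, dtype=float)
    if np.any((r <= 0.0) | (r > 1.0)):
        raise ValueError("reflectance must lie in (0, 1]")
    return (1.0 - r) ** 2 / (2.0 * r)


print(kubelka_munk([1.0, 0.5, 0.1]))  # [0.   0.25 4.05]
```

A non-absorbing point (R = 1) maps to zero and stronger absorption gives larger values. This ordering is what allows reflectance spectra to be searched against absorbance data at all.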

PATTERN RECOGNITION TECHNIQUES

Pattern recognition techniques represent spectral data as points in a multi-dimensional space, where each dimension corresponds to one spectral feature. Unsupervised techniques can be applied to find clusters of points in this pattern space, which presumably correspond to spectra of compounds with similar structural features. If a significant and reliable relationship can be established between a few spectral features and classes of compounds, its physical basis is probably known and interpretation rules will probably exist already. However, when a correlation requires many more than a few spectral features, it may be difficult to translate the relationship into an understandable interpretation rule. One then has to resort to rather abstract mathematical decision functions.

Unsupervised techniques

The simplest approach is to create mathematical projections of pattern space onto two or three dimensions. Clustering of points may then be apparent by visual inspection. The most informative dimensions can be selected by a principal component or eigen analysis. In many instances such an analysis yields a small set of orthogonal dimensions, linear combinations of the original ones, that represent most of the variance in the data points. Pattern space can then be projected onto these dimensions while retaining most of the information present in the original space. Another group of methods, summarized as cluster analysis, tries to reveal clusters mathematically. These methods calculate distances between data points and look for groups whose “intra-group” distances are significantly smaller than the “inter-group” distances (Fig. 2). Frankel [48] applied cluster analysis and eigen analysis to the EPA library of FT-IR spectra in order to identify classes of compounds in complex mixtures. The emerging patterns show the influence of molecular structure on the spectra in a way familiar to chemical spectroscopists. The results are also useful for evaluating a library, which is generally not error-free, and for assessing the difficulties to be expected when using FT-IR spectra for complex mixture analysis. Jalsovszky and Holly [49] applied cluster analysis to the infrared absorption maxima of a set of alcohols, measured in the vapour phase and in carbon tetrachloride solution, in order to distinguish different structural classes.

Fig. 2. Dendrogram representing the clustering of data in pattern space.
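Cluster analysis of the kind shown in the dendrogram of Fig. 2 can be sketched with single-linkage clustering. This is one of several possible linkage rules, chosen here for simplicity, and it is not necessarily the one used in the studies cited. Cutting a single-linkage dendrogram at height `cut` gives the same groups as the connected components of the graph that joins every pair of points closer than `cut`, and the sketch uses that equivalence. The function name is hypothetical. In practice `scipy.cluster.hierarchy.linkage` with `fcluster(..., criterion="distance")` performs the same operation.

```python
import numpy as np


def single_linkage_clusters(points, cut):
    """Cluster labels from cutting a single-linkage dendrogram at height `cut`.

    Two points end up in the same cluster if they are connected by a chain of
    points in which every step has Euclidean distance <= cut.
    """
    x = np.asarray(points, dtype=float)
    n = len(x)
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= cut:
                parent[find(i)] = find(j)  # merge the two groups

    roots = [find(i) for i in range(n)]
    relabel = {}
    return [relabel.setdefault(r, len(relabel)) for r in roots]


# Hypothetical 2-D pattern space: two tight groups and one isolated point.
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
print(single_linkage_clusters(pts, cut=1.0))  # [0, 0, 1, 1, 2]
```

Raising `cut` moves up the dendrogram and merges groups. A cut height at which the intra-group distances are clearly smaller than the inter-group distances gives the natural clusters.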

Supervised techniques

Most applications of pattern recognition methods to structure elucidation are supervised. Instead of looking for clusters of data points that correspond to specific structural features, these methods start from predefined classes. Each spectrum in pattern space can be tagged with its class, and these classified reference data can be used directly when analysing unknowns. The k-nearest-neighbour method, for instance, calculates the distance of an unknown to all references and determines to which class the closest k references belong. Suppose the distance measure really captures some structurally significant aspect of spectral similarity. An unknown should then be structurally similar to its nearest reference point and can be assigned to the same class. This approach is in fact just a conservative form of file search. The decision on class membership can be based on more than one neighbour, and various voting schemes can be devised. Under such a scheme, the unknown is assigned to the class that includes the majority of its three, five or more nearest neighbours. A disadvantage of the k-nearest-neighbour method is its computational cost: all distances between the unknown and the references have to be calculated, and no use is made of any clustering of the reference points.

The learning machine approach is computationally less costly. Mathematical techniques can be applied to find decision planes that optimally separate references belonging to different classes. Such a plane can be described by a decision function, which is adjusted iteratively until the classes are optimally separated. Clearly such a learning process will not always yield a perfect separation between classes. Applying the decision function to an unknown gives a value that shows on which side of the plane the unknown falls, and hence to which class it probably belongs. The decision functions can be assessed by their success rate at classifying additional spectra of known test compounds. This prediction rate can sometimes be improved by suitable preprocessing steps, such as normalization or scaling, or even by taking Fourier, Hadamard or Walsh transforms of the original data.

Despite the relatively large effort expended over almost 20 years, only a few examples of practical, routine uses of supervised pattern recognition in structure elucidation exist. Attention was initially focused on linear learning machines capable of classifying spectra into one of several structural classes [50-57]. Tsao and Switzer [58,59] proposed a branching tree approach to the classification of monosubstituted phenyl rings.
Tsao and Switzer selected bands from the infrared and Raman spectra that correspond to phenyl ring vibrations, and were able to determine the atoms attached to the ring with high accuracy. Ratios of band heights proved to be the best scaling technique. Bink and van 't Klooster [60] combined principal component analysis with information theory to derive classification rules for organic compounds from their infrared spectra. Domokos et al. [61] applied various supervised and unsupervised pattern recognition methods to a collection of 385 vapour-phase infrared spectra. Their aim was to study how data reduction and preprocessing scale transformations affect recognition ability.
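The k-nearest-neighbour vote and the linear learning machine described above can both be sketched on feature vectors. This is a minimal illustration, assuming each spectrum has already been reduced to a numeric pattern vector. The linear machine uses the classic error-correction (perceptron) rule with a bias term. The function names, feature values and class labels are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_classify(unknown, references, labels, k=3):
    """Assign the class held by the majority of the k nearest references.

    Ties are resolved in favour of the class of the nearer reference.
    """
    refs = np.asarray(references, dtype=float)
    dist = np.linalg.norm(refs - np.asarray(unknown, dtype=float), axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(labels[i] for i in nearest)
    best = max(votes.values())
    return next(labels[i] for i in nearest if votes[labels[i]] == best)


def train_linear_machine(X, y, max_epochs=100):
    """Error-correction training of a linear decision function w . [x, 1].

    y holds +1/-1 class labels. Training stops after an epoch without errors,
    or after max_epochs if the classes are not linearly separable.
    """
    Xa = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:  # wrong side of (or on) the plane
                w += yi * xi        # move the plane towards the correct side
                errors += 1
        if errors == 0:
            break
    return w


def linear_classify(w, x):
    """The sign of the decision function tells on which side of the plane x lies."""
    return 1 if w @ np.append(np.asarray(x, dtype=float), 1.0) > 0 else -1


refs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11]]
labels = ["alcohol", "alcohol", "alcohol", "ketone", "ketone"]
print(knn_classify([0.2, 0.2], refs, labels, k=3))  # alcohol

X = [[2, 2], [3, 3], [-2, -2], [-3, -1]]
y = [1, 1, -1, -1]
w = train_linear_machine(X, y)
print(linear_classify(w, [2.5, 2.0]))  # 1
```

The contrast in cost is visible in the code. `knn_classify` computes a distance to every reference for each unknown, whereas the trained linear machine needs a single dot product. The `max_epochs` cap reflects the point above that the learning process need not reach a perfect separation.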

KNOWLEDGE-BASED SYSTEMS

Knowledge-based systems are computer programs that apply knowledge of spectrum-structure relationships to spectral data in order to obtain information on the molecular structure of an unknown compound. In these systems, spectra are regarded as consisting of distinct spectral features, each related to a particular structural fragment. An interpretation generally proceeds by assigning structural features to the bands in a spectrum, which results in a set of substructures likely to be present in the unknown. This approach is clearly similar to the way humans interpret spectra. Unlike library search and pattern recognition systems, knowledge-based systems do not require large libraries of reference spectra. Instead, they contain a data base of interpretation knowledge, the so-called knowledge base, which is used during the interpretation of a spectrum. The knowledge base is clearly separated from the so-called inference engine, i.e., the part of the system that applies the rules to the data. This separation makes it easy to modify, test and examine encoded spectral knowledge. Further, because rules are represented as data structures, knowledge-based systems can give a comprehensible explanation of their results (Fig. 3). The structure elucidation process can be simplified into three stages. The first stage interprets the spectral data, giving a set of structural fragments likely to be present. The second stage combines the fragments into all possible structures consistent with a molecular weight or formula. The last stage predicts the theoretical spectrum of each structure constructed and compares it with the original data. Automated structure elucidation systems generally contain modules that can perform one or more of these tasks.

Fig. 3. General overview of a knowledge-based system for spectral analysis (components shown: knowledge base, inference engine, results).

The knowledge acquisition process plays an important role in the construction of a knowledge-based interpretation system. (Partial) automation of this process makes it possible to create small systems covering a particular problem domain in a relatively short time. A scientist working with steroids has little use for an algorithm that identifies the presence of a steroid backbone, whereas differentiating the substitution pattern of the steroid could be critical. Hence, the ability to develop a tailor-made knowledge-based identification system for a particular class of compounds (semi-)automatically can be very useful.

Spectrum interpretation of pure compounds

The approach most often used in knowledge-based systems is to hypothesize a particular structural fragment and then search the knowledge base for correlated spectral features. Alternatively, a spectral feature can be selected and the knowledge base searched for correlated fragments. The latter method is currently used in systems designed for the interpretation of NMR spectra.
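The separation between knowledge base and inference engine, and the hypothesis-driven strategy described above, can be sketched in a few lines. This is a toy illustration, not the design of any system discussed in this review. The knowledge base is plain data: each hypothesized fragment lists the spectral features it requires. The inference engine is a separate function that tests every hypothesis against a peak table and records why it succeeded or failed. The fragment names, band ranges and intensity thresholds are simplified, hypothetical rules chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    lo: float             # lower wavenumber limit, cm-1
    hi: float             # upper wavenumber limit, cm-1
    min_intensity: float  # minimum relative intensity (0..1)


# Knowledge base: data only, so rules can be edited without touching the engine.
# Simplified, illustrative ranges; not a validated correlation table.
KNOWLEDGE_BASE = {
    "carbonyl (C=O)": [Feature(1650, 1850, 0.5)],
    "hydroxyl (O-H)": [Feature(3200, 3650, 0.2)],
    "nitrile (C#N)": [Feature(2210, 2260, 0.1)],
}


def find_band(peaks, f):
    """First peak (wavenumber, intensity) satisfying feature f, or None."""
    for wn, inten in peaks:
        if f.lo <= wn <= f.hi and inten >= f.min_intensity:
            return wn, inten
    return None


def infer(peaks, kb=KNOWLEDGE_BASE):
    """Inference engine: test each fragment hypothesis and explain the verdict."""
    results = {}
    for fragment, features in kb.items():
        present, explanation = True, []
        for f in features:
            band = find_band(peaks, f)
            if band is None:
                present = False
                explanation.append(f"no band >= {f.min_intensity} in {f.lo}-{f.hi} cm-1")
            else:
                explanation.append(f"band at {band[0]} cm-1 (I={band[1]}) in {f.lo}-{f.hi} cm-1")
        results[fragment] = (present, explanation)
    return results


peaks = [(1715, 0.9), (2950, 0.6), (1460, 0.3)]  # hypothetical peak table
for fragment, (present, why) in infer(peaks).items():
    print(fragment, present, why)
```

Because each verdict carries the evidence that produced it, the output doubles as the kind of explanation mentioned above. Adding a fragment only means adding an entry to `KNOWLEDGE_BASE`.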

Initial work on the CHEMICS system of Sasaki and co-workers dates from the early 1970s [62]. The system uses a list of 189 small standard fragments, each typically containing a single carbon or heteroatom with a specified number of hydrogens, for which spectrum-structure correlations have been stored. The interpreter applies these correlations to infrared, NMR and mass spectral data and excludes the fragments for which no evidence is found. The result is a set of structural units from which all possible complete structures are constructed. The interpretation of infrared data [63] starts by calculating three parameters, IS1, IS2 and IS3, from the strongest bands in the regions 3700-3200 cm⁻¹ (IS1), 1899-1700 cm⁻¹ (IS2) and 1699-1500 cm⁻¹ (IS3). They are calculated by the empirical equation

$$\mathrm{IS}_i = 512\left[1 - \frac{T_i}{100}\right]$$

where T_i is the percentage transmittance of the strongest band in the corresponding spectral region. Depending on the values of IS1, IS2 and IS3, carbonyl and hydroxyl groups are judged to be present, uncertain or absent in the unknown compound. Several other parameters are calculated in a similar way, compared with stored threshold values for each structural fragment, and used to decide whether the fragment is present. After the interpretation of NMR and mass spectral data, the surviving substructures are combined into all possible sets that exactly satisfy the molecular formula. All possible structures are then constructed by combining the fragments of each set in all possible ways. In later papers the system was improved by letting the user introduce or reject substructures (macrocomponents) known to be present or absent in the unknown compound [64,65]. This information further constrains structure generation and yields fewer candidate structures.
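The IS parameter calculation can be written out directly from the equation above. In this sketch, the strongest band in a region is taken to be the one with the lowest percentage transmittance. A region without any band is given IS = 0, which corresponds to 100 % transmittance; that choice is an assumption. The stored CHEMICS thresholds that turn these values into presence or absence decisions are not given here, so the sketch stops at the parameters. The function names are hypothetical.

```python
# Spectral regions (high, low) in cm-1, as given for IS1, IS2 and IS3.
REGIONS = {"IS1": (3700, 3200), "IS2": (1899, 1700), "IS3": (1699, 1500)}


def is_parameter(transmittance_percent):
    """IS_i = 512 [1 - T_i/100], with T_i the % transmittance of the band."""
    return 512.0 * (1.0 - transmittance_percent / 100.0)


def chemics_is_values(peaks):
    """IS1..IS3 from a list of (wavenumber, %T) peaks.

    The strongest band in a region is the one with the lowest %T.
    A region without bands is assumed to give IS = 0 (i.e. T = 100 %).
    """
    values = {}
    for name, (hi, lo) in REGIONS.items():
        in_region = [t for wn, t in peaks if lo <= wn <= hi]
        values[name] = is_parameter(min(in_region)) if in_region else 0.0
    return values


peaks = [(3350, 40.0), (1715, 10.0), (1710, 50.0), (1600, 80.0)]  # hypothetical
print(chemics_is_values(peaks))  # approximately IS1=307.2, IS2=460.8, IS3=102.4
```

The scale runs from 0 for a fully transmitting region to 512 for a band that absorbs completely. A strong band, such as the carbonyl-region band at 1715 cm⁻¹ here, therefore gives a large IS2.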
Other extensions have been the introduction of the elements nitrogen, sulphur and the halogens, yielding a total of 630 standard fragments [66], and the adaptation of the structure generator to cope with the new fragments [67]. The results of the latter system, however, are subject to doubt with respect to both exhaustiveness and non-redundancy [68]. Very recently an infrared analysis system based
on symbolic logic (IRASSL) has been described [69]. This program was designed to make better use of infrared data in the CHEMICS system (symbolic logic applied to structure elucidation from infrared spectra was first described by Gribov and co-workers [70,71]). After the infrared spectral data and the molecular formula of the unknown compound have been entered, the program focuses on selected absorption bands, and the corresponding substructures are picked from a correlation table of 258 substructures. Next, the relationship between each of the bands and the substructures found is expressed as a logical equation. After this equation has been solved, all possible combinations of the selected substructures are constructed and stored if they do not exceed the molecular formula. Substructures present in all of the resulting sets are considered to be present in the unknown compound and are labelled A. Substructures that belong to the same functional group and are mutually exclusive are labelled B. From these results the user may construct a large structural fragment that is likely to be present and enter it into the CHEMICS system as a constraint during structure generation.

In 1974 Visser and van der Maas [72] started their work on the systematic computer-aided interpretation of vibrational spectra. They enabled a computer to interpret Raman and infrared spectra by using hard-coded FORTRAN subroutines as interpretation rules. In subsequent years a program (CRISE) was developed that was capable of determining characteristic spectral intervals for different functional groups from example data [73,74]. The main disadvantage of the interpretation programs developed up to then was that they were difficult to modify, as the rules were completely interwoven with the program. In 1984 work was initiated to develop a knowledge-based system consisting of an inference engine, a knowledge base and a sound user interface, which ultimately resulted in the EXSPEC system [75].
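The difference between hard-coded rules and rules held as data can be illustrated with a short sketch. Each rule below is a plain record, and a single generic routine applies whatever rules it is given, so the knowledge can be edited without touching the code that uses it. The record layout (a spectral interval plus the probability of observing a band there with and without the fragment) follows the description of EXSPEC's correlations given below, but the field names, the example values, the 0.5 prior and the naive-Bayes combination (treating rules as independent) are assumptions for illustration; this is not EXSPEC's PROLOG code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    fragment: str
    low: float           # lower bound of the spectral interval, cm^-1
    high: float          # upper bound of the spectral interval, cm^-1
    p_if_present: float  # P(band in interval | fragment present)
    p_if_absent: float   # P(band in interval | any other compound)


# Hypothetical knowledge base. Editing this list changes the interpretation
# without touching the inference routine below.
RULES = [
    Rule("C=O", 1650.0, 1850.0, 0.95, 0.10),
    Rule("O-H", 3200.0, 3650.0, 0.90, 0.15),
]


def posterior(fragment, peaks, rules, prior=0.5):
    """Probability that `fragment` is present, combining every rule for it
    with Bayes' rule in odds form (rules treated as independent)."""
    odds = prior / (1.0 - prior)
    for rule in rules:
        if rule.fragment != fragment:
            continue
        seen = any(rule.low <= nu <= rule.high for nu in peaks)
        if seen:
            odds *= rule.p_if_present / rule.p_if_absent
        else:
            odds *= (1.0 - rule.p_if_present) / (1.0 - rule.p_if_absent)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    peaks = [2950.0, 1720.0, 1200.0]  # invented band positions, cm^-1
    for frag in ("C=O", "O-H"):
        print(frag, round(posterior(frag, peaks, RULES), 3))
```

Adding knowledge, for instance a second rule for the carbonyl fragment, means appending one `Rule` record; `posterior` itself stays unchanged, which is the property the hard-coded FORTRAN approach lacked.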
EXSPEC, written in PROLOG [76,77], regards interpretation rules as data, which makes it easy to manipulate or modify the coded knowledge. The overall system consists of an interpretation module [78], an automated rule generation program [79] and a structure generator [68,80]. It was designed for the interpretation of both infrared and mass spectra. Spectra are interpreted by traversing a network of interconnected structural fragments and determining the probability of the presence of each fragment by checking the data for characteristic spectral features. A fragment is selected from the network only if all smaller fragments contained within it have a probability above 0.5. The knowledge base contains spectrum-structure correlations, to each of which two probabilities have been assigned: one for the likelihood that a compound with the particular structural feature shows the correlated absorption and one for the probability that any other compound shows a similar absorption. The former value represents the selectivity of a spectral region, whereas the latter is related to its (pseudo)specificity. The probabilities obtained from the individual rules are then combined by means of Bayesian statistics. In addition to spectrum interpretation, knowledge acquisition has also been automated in EXSPEC. Characteristic spectral regions can be found for any fragment of interest, and their usefulness in interpretation rules can be judged by calculating the information content of each region. Finally, the results of an interpretation, i.e., the presence or absence of structural fragments, can be used to constrain the structure generator, which constructs all possible topological isomers for a given molecular formula.

Another expert system combining infrared and mass spectra has been developed by Curry [81]. For the interpretation of low-resolution mass spectra the STIRS program [82] is used. The infrared rule base consists of over 1000 correlations between observed infrared bands and vibrational modes of specific substructures. Substructures are stored hierarchically in a data base. The system contains a module that controls the progress of the interpretation, beginning with the most general fragment.
For each substructure, evidence is sought using the two spectral knowledge sources. The reasoning module combines the evidence found and makes deductions resulting in a confidence level (-100 to +100%) for each substructure. Tests performed on 1807 infrared spectra of
compounds containing on average 8.1 of the 500 substructures in the data base revealed that, at a confidence level of > 45%, on average only 1.4 incorrect and 4.6 correct substructures were reported. Szalontai and co-workers [83,84] described the ASSIGNER system for functional group analysis based on ¹³C NMR and infrared data. A set of 260 infrared correlations has been defined, and the program combines interpretation results into complete structures. Hippe and co-workers [85,86] use a matrix algorithm for the identification of substructures from infrared data. For each of 270 different substructures, empirical decision functions have been stored that are used to determine the composition of an unknown sample. A dynamic approach to the interpretation of infrared spectra is used in the IRSCAN-D algorithm [87,88]. A list of 103 different structural units is scanned, and each substructure is assigned an identification factor (1-100) by checking the spectrum for its characteristic features. In the second, dynamic stage of the interpretation, bands that are highly characteristic of the most likely fragments are blocked and the interpretation is repeated using the remaining data. The underlying assumption is that, in this way, each band is assigned only once, to the most likely fragment. Tests on approximately 200 compounds resulted in an average detection of 97% of the substructures present in the unknown compound. However, the number of false positives amounted to 60% of the number of correctly identified substructures. Moldoveanu and Rapson [89] proposed an expert system for the combined interpretation of IR, ¹³C NMR and mass spectra. Rough matches from the different spectral methods are combined in order to select organic groups that may be present in the molecule. The interpretation of infrared spectra is based on a search for matches of the peaks in the experimental spectrum with specific peak intervals stored in a data base.
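A search of the kind just described, in which each organic group is represented by a few peak intervals and neither intensities nor a hierarchy of groups are used, reduces to a flat matching loop. The sketch below is a hypothetical illustration: the group names, the intervals and the "fraction of intervals matched" score are invented for the example and are not Moldoveanu and Rapson's data base or scoring.

```python
# Hypothetical interval table: group -> peak intervals (cm^-1).
INTERVALS = {
    "ketone": [(1705.0, 1725.0), (1100.0, 1300.0)],
    "alcohol": [(3200.0, 3550.0), (1000.0, 1260.0)],
    "nitrile": [(2210.0, 2260.0)],
}


def rough_matches(peaks, table=INTERVALS):
    """Fraction of each group's intervals that contain at least one peak.

    Intensities are deliberately ignored and the groups are scanned in a
    flat loop, with no hierarchy of general and specific groups."""
    scores = {}
    for group, intervals in table.items():
        hits = sum(1 for low, high in intervals
                   if any(low <= nu <= high for nu in peaks))
        scores[group] = hits / len(intervals)
    return scores


if __name__ == "__main__":
    print(rough_matches([2950.0, 1715.0, 1220.0]))
```

With these invented intervals the alcohol also scores 0.5 for a ketone-like peak list, because one of its intervals overlaps a ketone band. In the system described, such false positives are tolerated because the infrared result is combined with the NMR and mass spectral evidence.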
Rules are currently available for 89 organic groups, with an average of five rules per group. In contrast to other programs, the search does not follow a hierarchical tree pattern. Furthermore, peak intensities are generally not
considered in the search. Because the interpreter is very simple, it has the advantage of being very fast. The occurrence of false positives is not regarded as a problem, because the results are combined with those of the other techniques. A methodology for evaluating and optimizing infrared interpretations was described by Saperstein [90]. The procedure involves numerically estimating the spectral difference between the interpreted infrared spectrum and a synthetic spectrum composed of the characteristic subspectra of the structural fragments obtained during an interpretation. The probabilities assigned to the fragments can then be fine-tuned iteratively by minimizing this difference. Although improvements are possible, the approach shows that optimization schemes for spectrum interpretation are feasible and can be a fruitful area of study. The IRIS interpretation system [91] contains a data base of 200 spectral rules for 110 common structural units. Performance tests on this poorly described system yielded extremely high values for both recall (86%) and reliability (85%). Seil et al. [92] presented a procedure for deriving classification rules automatically from a given set of infrared spectra. From the spectra of 1600 solid organic compounds, sets of compounds with similar skeletons are formed. For each set, the distribution of all bands in the corresponding spectra is determined. Close-lying bands are grouped into bundles if their number exceeds that expected for a uniform distribution. A bundle is called a cluster if the number of bands within it is > 30% of the number of compounds in the corresponding set and > 1.5 times the value expected for a uniform distribution. Each cluster is assigned a weight that depends on the number and the density of the bands comprising it. During the classification of an unknown compound, a score is calculated for each cluster.
This score consists of the percentage of bands lying within a cluster multiplied by the respective cluster weights. The unknown is then likely to belong to the set with the highest score. Blaffert [93] developed EXPERTISE as an expert system for the interpretation of infrared spectral data. The system is capable of deducing spectrum-structure correlations from a library of spectra. A structure generator combines interpretation results into a set of candidate structures without recourse to a molecular weight or formula. Edwards and Ayscough performed a feasibility study on the development of a blackboard problem-solving system based on spectroscopic data [94]. Their cooperative structure elucidation problem solver (COSEPS) aims to use multiple sources of spectroscopic knowledge, interacting in an intelligent manner, to determine the structural characteristics of an unknown molecule. Instead of production rules, so-called frames [95] are used to represent knowledge. The system consists of knowledge sources, a blackboard and a scheduler. The knowledge sources contain the domain knowledge for each spectroscopic technique and information on substructural relations. The blackboard is a database containing the intermediate results generated during the interpretation process; data on the blackboard are hierarchically organized into levels of analysis. The scheduler records changes on the blackboard and triggers those knowledge sources that can contribute to the new solution state, i.e., it decides which part of the available knowledge is to be used in the interpretation process. The blackboard architecture might alleviate many of the problems of ambiguity that arise when a spectral feature points to a number of plausible substructures. Munk and co-workers have been working on the development of the CASE system, a set of programs for the interpretation of multispectral data. They described a table-driven procedure for the automated generation of spectral interpretation rules from a library of reference spectra [96]. The original structure generator ASSEMBLE [97] was replaced with COCOA [98], which approaches the generation process in an unusual way.
Instead of assembling structural units into a complete structure, bonds are removed from a hyperstructure encompassing all possible structures until the imposed constraints are fulfilled. Recently a program has been described for the automated classification of candidate structures obtained from structure generators [99].

In 1980 Woodruff and Smith [100] published a paper on a program for the analysis of IR spectra (PAIRS). Since then a number of improvements to
the original system have been described, and offspring capable of analysing mixtures have been constructed. The program is a fairly conventional table-driven system. Structural fragments are selected from a list, and spectral evidence is sought in the data using interpretation rules stored in a rule base. The rating assigned to each fragment depends on the number of matches found and on the importance assigned to each spectrum-structure relationship. PAIRS also allows for interdependences between the probabilities of the classes defined; e.g., the rating for ALDEHYDE depends partly on the system's assessed probability of ACID groups. Tomellini and co-workers developed a minicomputer version of the program [101] and applied it to the interpretation of vapour-phase spectra [102]. The addition of a rule generator to the PAIRS system [103] made it possible to develop interpretation rules from example data in a more mathematical and objective way than before. The rule generation process consists of the following steps: entry of peak tables; determination of the occurrence vs. position distribution; selection of regions of interest; assignment of maximum expectation values to each region based on maximum occurrence values; division of regions into position subdivisions; determination of band width distributions; calculation of partial expectation values for each position subdivision; calculation of an intensity factor for each intensity interval of each position subdivision; calculation of a final expectation value for each position subdivision with a given intensity interval and a given width interval; and generation of rules using the calculated final expectation values. To test the performance of the rule generator, a set of 51 vapour-phase spectra of non-c&unsaturated alcohols was used to generate rules.
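Interpretation results in this section are reported as recall and reliability at a given threshold on the expectation value. The sketch below shows how such figures can be computed. It assumes that recall is the fraction of the substructures actually present that are reported at or above the threshold, and reliability the fraction of reported substructures that are actually present; the class names and expectation values are invented for illustration.

```python
def recall_reliability(predictions, truth, threshold=0.50):
    """predictions: one dict per spectrum mapping class name -> expectation value.
    truth: one set per spectrum with the classes actually present.
    A class is reported when its expectation value is >= threshold."""
    if len(predictions) != len(truth):
        raise ValueError("one truth set is needed per prediction")
    true_pos = reported = present = 0
    for expect, actual in zip(predictions, truth):
        hits = {c for c, v in expect.items() if v >= threshold}
        true_pos += len(hits & actual)
        reported += len(hits)
        present += len(actual)
    recall = true_pos / present if present else 0.0
    reliability = true_pos / reported if reported else 0.0
    return recall, reliability


# Invented expectation values for three spectra and the classes really present.
PREDICTIONS = [
    {"OH": 0.9, "C=O": 0.2},
    {"OH": 0.4, "C=O": 0.8},
    {"OH": 0.7, "C=O": 0.3},
]
TRUTH = [{"OH"}, {"OH", "C=O"}, {"OH"}]

if __name__ == "__main__":
    for t in (0.50, 0.35):
        print(t, recall_reliability(PREDICTIONS, TRUTH, threshold=t))
```

In this invented example, lowering the threshold from 0.50 to 0.35 recovers the missed OH group in the second spectrum; with real data a lower threshold usually also admits more false positives, so recall and reliability have to be weighed against each other.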
Subsequent interpretation of these same 51 spectra yielded extremely good results: with the threshold of the expectation value set to 0.50, a recall of 82% was obtained with a reliability of 100%. As the original PAIRS program only yields a numerical indication of the presence of a specific structural group, an improved version was developed that allows the user to trace the decision-making process [104]. It provides a way to see the rules used by the interpreter to arrive at the expectation values. Although this version of PAIRS was a major improvement over earlier versions, it still did not convey the knowledge behind the rules. Recently, a more sophisticated user interface has been developed that raises the interpreter to the level of a smart assistant [105]. By allowing the user to overrule decisions during the interpretation process, it adds a new dimension to the system.

Mixture analysis

The systems described so far were generally developed for the identification of pure compounds. Recently, a number of papers have appeared in which attention was focused on the analysis of spectra of mixtures. The PAIRS program, for instance, has been adapted to identify compounds in environmental mixtures [106-108]. This program for automated waste mixture interpretation (PAWMI) was developed to identify pure compounds in mixtures obtained from waste sites. Rules were generated for each of the pure compounds. The program has been provided with an advanced peak picker (PUSHSUB) that relieves the operator from setting a threshold for including or excluding peaks. Based on the first 256 data points of the interferogram of a compound, a threshold curve is calculated, which is subtracted from the original spectrum. The threshold is then set to the mean absorbance value of the resulting spectrum. Goodness values are assigned to the ten largest peaks with an intensity of at least 15% of that of the largest peak. A goodness of 0.99 is set as the maximum value, reached when all peaks are matched correctly. For compounds that do not have ten peaks of the required intensity, fewer rules with increased goodness values are written. To take account of peak shifting, each goodness value is split into three values (20, 30 and 50% of the overall goodness) corresponding to peak windows of ±10, ±5 and ±3 cm⁻¹.
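One reading of the goodness scheme just described is sketched below. Two points are assumptions not stated in the text: the 0.99 is divided equally among the selected peaks, and the three windows are nested, so a peak found within ±3 cm⁻¹ also lies within ±5 and ±10 cm⁻¹ and earns the full share. The function names are hypothetical.

```python
MAX_GOODNESS = 0.99
# (half-width in cm^-1, fraction of a peak's goodness), from the text.
WINDOWS = [(10.0, 0.20), (5.0, 0.30), (3.0, 0.50)]


def build_rules(reference_peaks, max_peaks=10, min_rel_intensity=0.15):
    """reference_peaks: list of (wavenumber, intensity) for a pure compound.

    Keeps up to `max_peaks` of the largest peaks having at least 15% of the
    intensity of the largest one, and shares MAX_GOODNESS equally among them
    (equal sharing is an assumption). Returns a list of (wavenumber, share)."""
    if not reference_peaks:
        return []
    top = max(i for _, i in reference_peaks)
    strong = [p for p in reference_peaks if p[1] >= min_rel_intensity * top]
    strong.sort(key=lambda p: p[1], reverse=True)
    chosen = strong[:max_peaks]
    share = MAX_GOODNESS / len(chosen)
    return [(nu, share) for nu, _ in chosen]


def goodness(rules, observed):
    """Sum the window fractions whose window contains at least one observed
    peak; the windows are treated as nested (an assumption)."""
    total = 0.0
    for nu, share in rules:
        for half_width, fraction in WINDOWS:
            if any(abs(x - nu) <= half_width for x in observed):
                total += share * fraction
    return total


if __name__ == "__main__":
    rules = build_rules([(1000.0, 1.0), (1500.0, 0.5), (2000.0, 0.1)])
    print(rules)                              # the 2000 cm^-1 peak is below 15%
    print(goodness(rules, [1002.0, 1508.0]))  # one close match, one shifted peak
```

The split into nested windows lets a peak shifted by mixture effects still contribute part of its goodness, while an exact match contributes all of it.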
To reduce the number of false positive identifications of components, a subroutine was added performing a correction for spectral similarity on the results obtained (PAIRSPLUS) [107]. The program subtracts a percentage related to the spectral similarity between
the compound with the highest goodness value and each of the other compounds from their respective goodness values. Next, the mean and standard deviation of the remaining goodness values are calculated, and compounds whose values exceed the mean by more than twice the standard deviation are considered to be present. The subtraction process is then repeated until no additional components are found. In this way the ability of PAIRS to identify components in mixtures has been significantly improved. A further improvement to PAIRS was made with the IntIRpret program [108], which includes an automated knowledge acquisition subroutine and makes use of peak intensity information. Based on the number of peaks in a peak window and their respective intensities, three factors are derived that represent the relative importance of each of the peaks for a component. For a training set of 62 pure compounds and a data set of 67 four-component mixtures, a recall of 81% with a reliability of 82% was obtained. Unfortunately, two almost identical papers on this work have been published in different journals [108,109]. Recently, two knowledge-based systems for mixture analysis have been described as further improvements of the PAWMI program. The IRBASE program [110] has been developed for the rapid creation of compound-specific infrared spectral descriptions. These descriptions form the knowledge base of the second system, MIXIR [111], which is used to identify the components of unknown mixtures. In IRBASE, knowledge is stored as facts instead of being locked into static, predetermined rules as in the PAIRS system. This separation of logic from data simplifies revision of both the control and the information modules. Further, it allows unlimited flexibility in the use of information during a spectral interpretation.
IRBASE consists of two main modules: a database program which generates an initial spectral description of a pure compound and a processing program which evaluates the quality of each component in the description to arrive at a reduced description. A correlation table is used to generate a spectral description for a particular compound. Bands in the spectrum are assigned to functional groups and characteristic shifts likely to be observed in mixtures are retrieved from this table. The result is
a set of two descriptions for each compound, one corresponding to the spectral features to be expected for the compound as a major component in a mixture and one for the compound as a minor component (< 30% by volume). The processing program determines the significance of each of the spectral regions for the compound at hand. For this purpose three factors are calculated: a factor describing the amount of overlap between the region containing the band of interest and other regions, a factor describing the intensity of the band with respect to other bands in the spectrum, and a factor representing the intensity of the band relative to others found in the same region for the compounds in the data set. Empirical weighting values are used to combine the factors into an overall significance value for each region, and these values are used to select the fifteen most significant features for a compound. The MIXIR system is a user-interactive knowledge-based system that assists chemists in determining the likely components of complex mixtures. It uses the knowledge base developed by the IRBASE program to infer conclusions concerning the presence of pure components. The user can provide additional information, which is stored in a static data base. During the interpretation process a dynamic data base is maintained that indicates the current state of inquiry for each component. For each feature in the spectrum the uniqueness is calculated, i.e., the number of compounds in the data base that are potential explanations for the feature. The uniqueness is recalculated over several interpretation cycles using the information in the dynamic data base, which is revised after each cycle using the interpretation results. Ideally, this feedback process makes the uniqueness of a given spectral feature converge to one. Interpretation of a set of 20 two- and three-component mixtures yielded excellent results.
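The way IRBASE condenses a spectral description, by combining three factors with empirical weights and keeping the fifteen most significant features, can be illustrated with a weighted sum followed by a top-k selection. The factor scales (0 to 1, with overlap counted as a penalty through its complement), the weights and the example data below are hypothetical; the text only states that empirical weights are used and that fifteen features are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    region: tuple            # (low, high) in cm^-1
    overlap: float           # 0 = isolated region, 1 = fully overlapped
    rel_to_spectrum: float   # band intensity relative to other bands in the spectrum
    rel_to_region: float     # intensity relative to other compounds' bands in the region


# Hypothetical empirical weights.
WEIGHTS = {"overlap": 0.4, "rel_to_spectrum": 0.3, "rel_to_region": 0.3}


def significance(f, w=WEIGHTS):
    """Weighted sum of the three factors; overlap lowers significance."""
    return (w["overlap"] * (1.0 - f.overlap)
            + w["rel_to_spectrum"] * f.rel_to_spectrum
            + w["rel_to_region"] * f.rel_to_region)


def reduced_description(features, k=15):
    """Keep the k most significant features (k = 15 in IRBASE)."""
    return sorted(features, key=significance, reverse=True)[:k]


if __name__ == "__main__":
    feats = [
        Feature((2900.0, 3000.0), 1.0, 0.8, 0.2),  # strong but heavily overlapped
        Feature((1000.0, 1100.0), 0.5, 0.5, 0.5),
        Feature((1700.0, 1750.0), 0.0, 1.0, 1.0),  # strong and isolated
    ]
    for f in reduced_description(feats, k=2):
        print(f.region, round(significance(f), 2))
```

Penalizing overlap makes an isolated band rank above a stronger band in a crowded region, which matters for mixtures, where crowded regions are exactly where other components interfere.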
A major problem with systems for mixture analysis as described above is that they rely heavily on the correct detection of peaks in the mixture spectrum. Especially when the concentrations of components are very low, overlapping bands play an important role. A way to overcome this problem was described by Hongkui et al. [112]. In their study, least-squares fitting was applied to the qualitative analysis of vapour-phase infrared spectra by comparing standard reference spectra with the spectrum of the sample mixture. In an iterative procedure, the compounds with the worst fit levels are removed from consideration in order to find the set of reference spectra that gives an optimum fit of the sample spectrum. The results are further used for quantitative analysis of the mixture.

FUTURE TRENDS

The large number of papers on automated structure elucidation that have appeared over the last 5 years demonstrates that this area is still an important field of interest in computer-assisted chemistry. Owing to the development of new sampling techniques (e.g., DRIFT, ATR) and hyphenation with chromatographic separations (GC-IR, LC-IR), which make it possible to analyse samples of widely differing natures, the amount of data produced by infrared spectrometers is increasing rapidly. Further, because these techniques yield slightly different spectra, direct automated comparison becomes problematic. The development of algorithms to cope with these difficulties is therefore important. As the storage space and processing speed available will increase rapidly in the forthcoming years, attention will probably be focused on the development of similarity measures and alternative data representations in order to improve search performance; the need to compress data in order to save space will become less important. With respect to knowledge-based systems, much work still has to be done on the extraction of all relevant information from the spectra. Currently, use is generally made of peak tables supplemented with band-width information. Characteristic band shapes or global patterns occurring in spectra are not used, mainly because such features are difficult to describe in computer format. Using the human user interactively as a pattern recognizer (e.g., by showing characteristic patterns graphically for comparison with the measured data) might solve part of the problem.

A topic that is currently attracting considerable research interest is the application of neural networks to spectrum interpretation. A neural net is a simplified model of the human brain consisting of several layers of neurons that pass signals to each other depending on the input signals they receive (Fig. 4) [113]. The net can be trained to give an output in terms of structural information from spectra used as input. The representation of spectra and structural data is of primary importance. An application of a neural net to the recognition of ¹H NMR spectra of alditols has been described recently [114]. Judging from the number of announcements at major conferences, the appearance of papers on the use of a neural approach to the interpretation of infrared spectra is only a matter of months. This area of research will yield new insights into the nature of the structure elucidation process.

Fig. 4. Schematic depiction of a neural net (input pattern, input layer, hidden layer, output layer, output pattern).
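To make the layered structure of Fig. 4 concrete, here is a minimal sketch of a network with one hidden layer of sigmoid units, trained by gradient descent with back-propagation (the learning procedure discussed in [113]). The "spectra" are invented eight-point absorbance vectors and the single output is read as "carbonyl present"; the layer sizes, learning rate, number of epochs and data are assumptions for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Input layer -> one hidden layer -> output layer of sigmoid units (cf. Fig. 4)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        # zip() stops before the bias entry, which is added separately.
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h)) + row[-1]) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One back-propagation step on E = 0.5 * sum((y - t)**2); returns E before the update."""
        h, y = self.forward(x)
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas use the output weights as they were before this update.
        d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j, hj in enumerate(h):
                row[j] -= lr * d_out[k] * hj
            row[-1] -= lr * d_out[k]
        for j, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * d_hid[j] * xi
            row[-1] -= lr * d_hid[j]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def train(net, data, epochs=2000, lr=0.5):
    """Online training; returns the summed error of every epoch."""
    return [sum(net.train_step(x, t, lr) for x, t in data) for _ in range(epochs)]


# Invented 8-point absorbance "spectra"; point 5 stands in for a carbonyl band.
DATA = [
    ([0.1, 0.2, 0.1, 0.0, 0.1, 0.9, 0.1, 0.2], [1.0]),
    ([0.2, 0.1, 0.3, 0.1, 0.0, 0.8, 0.2, 0.1], [1.0]),
    ([0.1, 0.3, 0.1, 0.2, 0.1, 0.1, 0.2, 0.1], [0.0]),
    ([0.3, 0.1, 0.2, 0.1, 0.2, 0.0, 0.1, 0.3], [0.0]),
]

if __name__ == "__main__":
    net = TinyNet(n_in=8, n_hidden=4, n_out=1)
    errors = train(net, DATA)
    print(f"error: first epoch {errors[0]:.3f}, last epoch {errors[-1]:.3f}")
    for x, t in DATA:
        print(t[0], round(net.forward(x)[1][0], 3))
```

The sketch also shows why the representation question raised above matters: the net only ever sees the numbers it is given, so the choice of how a spectrum is sampled or encoded into the input layer determines what it can learn.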

REFERENCES

1 P.R. Griffiths and J.A. de Haseth, Fourier Transform Infrared Spectrometry, Wiley, New York, 1986.
2 N.A.B. Gray, Computer-Assisted Structure Elucidation, Wiley, New York, 1986.
3 G.W. Small, Anal. Chem., 59 (1987) 535A.
4 L.E. Kuentzel, Anal. Chem., 23 (1951) 1413.
5 A.W. Baker, N. Wright and A. Opler, Anal. Chem., 25 (1953) 1457.
6 D.H. Anderson and G.L. Covert, Anal. Chem., 39 (1967) 1288.
7 D.S. Early, Anal. Chem., 40 (1968) 894.
8 G.A. Massios, Am. Lab., 3 (1971) 55.
9 K. Tanabe and S. Saeki, Anal. Chem., 47 (1975) 118.
10 J. Zupan, D. Hadzi and M. Penca, Comput. Chem., 1 (1976) 77.
11 F.H. Heite, P.F. Dupuis, H.A. van 't Klooster and A. Dijkstra, Anal. Chim. Acta, 103 (1978) 313.
12 P.F. Dupuis and A. Dijkstra, Fresenius' Z. Anal. Chem., 290 (1978) 357.
13 P.F. Dupuis, A. Dijkstra and J.H. van der Maas, Fresenius' Z. Anal. Chem., 291 (1978) 27.
14 P.F. Dupuis, P. Cleij, H.A. van 't Klooster and A. Dijkstra, Anal. Chim. Acta, 112 (1979) 83.
15 R.C. Fox, Anal. Chem., 48 (1976) 717.
16 G.T. Rasmussen and T.L. Isenhour, Appl. Spectrosc., 33 (1979) 371.
17 E.C. Penski, D.A. Padowski and J.B. Bouck, Anal. Chem., 46 (1974) 955.
18 F.V. Warren, Jr. and M.F. Delaney, Appl. Spectrosc., 37 (1983) 172.
19 S.R. Lowry, D.A. Huppler and C.R. Anderson, J. Chem. Inf. Comput. Sci., 25 (1985) 235.
20 J.W. Sherman, J.A. de Haseth and D.G. Cameron, Appl. Spectrosc., 43 (1989) 1311.
21 S. Sa&i and K. Tanabe, Appl. Spectrosc., 38 (1984) 693.
22 L.A. Powell and G.M. Hieftje, Anal. Chim. Acta, 100 (1978) 313.
23 J.P. Yu and H.B. Friedrich, Appl. Spectrosc., 41 (1987) 869.
24 M.F. Delaney, J.R. Hallowell, Jr. and F.V. Warren, Jr., J. Chem. Inf. Comput. Sci., 25 (1985) 27.
25 T. Blaffert, Anal. Chim. Acta, 161 (1984) 135.
26 J.A. de Haseth and L.V. Azarraga, Anal. Chem., 53 (1981) 2292.
27 J.A. de Haseth and T.L. Isenhour, Anal. Chem., 49 (1977) 1977.
28 R.L. White, G.N. Giss, G.M. Brissey and C.L. Wilkins, Anal. Chem., 55 (1983) 998.
29 P.M. Owens and T.L. Isenhour, Anal. Chem., 55 (1983) 1548.
30 S. Kawata, T. Noda and S. Minami, Appl. Spectrosc., 41 (1987) 1176.
31 E.R. Malinowski and D.G. Howery, Factor Analysis in Chemistry, Wiley Interscience, New York, 1980.
32 G. Hangac, R.C. Wieboldt, R.B. Lam and T.L. Isenhour, Appl. Spectrosc., 36 (1982) 40.
33 S.S. Williams, R.B. Lam and T.L. Isenhour, Anal. Chem., 55 (1983) 1117.
34 P.B. Harrington and T.L. Isenhour, Appl. Spectrosc., 41 (1987) 449.
35 P.B. Harrington and T.L. Isenhour, Anal. Chem., 60 (1988) 2687.
36 C.P. Wang and T.L. Isenhour, Appl. Spectrosc., 41 (1987) 185.
37 R.J. Anderegg and D. Pyo, Anal. Chem., 59 (1987) 1914.
38 J.M. Bjerga and G.W. Small, Anal. Chem., 62 (1990) 226.
39 J.R. Cooper and C.L. Wilkins, Anal. Chem., 61 (1989) 1571.
40 J. Zupan and M.E. Munk, Anal. Chem., 57 (1985) 1609.
41 J. Zupan and M.E. Munk, Anal. Chem., 58 (1986) 3219.
42 G.T. Rasmussen and T.L. Isenhour, J. Chem. Inf. Comput. Sci., 43 (1979) 1382.
43 M.F. Delaney, F.V. Warren, Jr. and J.R. Hallowell, Jr., Anal. Chem., 55 (1983) 1925.
44 J.R. Hallowell, Jr. and M.F. Delaney, Anal. Chem., 59 (1987) 1544.
45 P.B. Harrington and T.L. Isenhour, Anal. Chim. Acta, 197 (1987) 105.
46 P.B. Harrington and T.L. Isenhour, Appl. Spectrosc., 41 (1987) 1298.
47 R.J. Rosenthal and S.R. Lowry, Mikrochim. Acta, Part II, (1986) 291.
48 D.S. Frankel, Anal. Chem., 56 (1984) 1011.
49 G. Jalsovszky and S. Holly, J. Mol. Struct., 175 (1988) 263.
50 B.R. Kowalski, P.C. Jurs, T.L. Isenhour and C.N. Reilly, Anal. Chem., 41 (1969) 1945.
51 H.B. Woodruff, S.R. Lowry and T.L. Isenhour, Anal. Chem., 46 (1974) 2150.
52 D.R. Preuss and P.C. Jurs, Anal. Chem., 46 (1974) 520.
53 R.W. Liddell, III and P.C. Jurs, Anal. Chem., 46 (1974) 2126.
54 S.R. Lowry, H.B. Woodruff, G.L. Ritter and T.L. Isenhour, Anal. Chem., 47 (1975) 1126.
55 H.B. Woodruff, S.R. Lowry, G.L. Ritter and T.L. Isenhour, Anal. Chem., 47 (1975) 2027.
56 H.B. Woodruff, S.R. Lowry and T.L. Isenhour, Appl. Spectrosc., 29 (1975) 226.
57 H.B. Woodruff, G.L. Ritter, S.R. Lowry and T.L. Isenhour, Appl. Spectrosc., 30 (1976) 213.
58 R. Tsao and W.L. Switzer, Anal. Chim. Acta, 134 (1981) 111.
59 R. Tsao and W.L. Switzer, Anal. Chim. Acta, 136 (1982) 3
60 J.C.W.G. Bink and H.A. van 't Klooster, Anal. Chim. Acta, 150 (1983) 53.
61 L. Domokos, I. Frank, G. Matolcsy and G. Jalsovszky, Anal. Chim. Acta, 154 (1983) 181.
62 S. Sasaki, Y. Kudo, S. Ochiai and H. Abe, Mikrochim. Acta, (1971) 726.
63 S. Sasaki, H. Abe, Y. Hirota, Y. Ishida, Y. Kudo, S. Ochiai, K. Sato and T. Yamasaki, J. Chem. Inf. Comput. Sci., 18 (1978) 211.
64 S. Sasaki, I. Fujiwara, H. Abe and T. Yamasaki, Anal. Chim. Acta, 122 (1980) 87.
65 T. Oshima, Y. Ishida, K. Saito and S. Sasaki, Anal. Chim. Acta, 122 (1980) 95.
66 K. Funatsu, C.A. Del Carpio and S. Sasaki, Fresenius' Z. Anal. Chem., 324 (1986) 750.
67 K. Funatsu, N. Miyabayashi and S. Sasaki, J. Chem. Inf. Comput. Sci., 28 (1988) 18.
68 H.J. Luinge and J.H. van der Maas, Chemometr. Intell. Lab. Syst., 8 (1990) 157.
69 K. Funatsu, Y. Susuta and S. Sasaki, Anal. Chim. Acta, 220 (1989) 155.
70 L.A. Gribov and M.E. Elyashberg, J. Mol. Struct., 5 (1970) 179.
71 L.A. Gribov, M.E. Elyashberg and L.A. Moscovkina, J. Mol. Struct., 9 (1971) 357.
72 T. Visser and J.H. van der Maas, J. Raman Spectrosc., 2 (1974) 563.
73 T. Visser and J.H. van der Maas, Anal. Chim. Acta, 122 (1980) 363.
74 T. Visser and J.H. van der Maas, Anal. Chim. Acta, 133 (1981) 451.
75 H.J. Luinge, Trends Anal. Chem., 9 (1990) 66.
76 G.J. Kleywegt, H.J. Luinge and B.J.P. Schuman, Chemometr. Intell. Lab. Syst., 4 (1988) 273.
77 G.J. Kleywegt, H.J. Luinge and B.J.P. Schuman, Chemometr. Intell. Lab. Syst., 5 (1989) 117.
78 H.J. Luinge and J.H. van der Maas, Anal. Chim. Acta, 223 (1989) 135.
79 H.J. Luinge, G.J. Kleywegt, H.A. van 't Klooster and J.H. van der Maas, J. Chem. Inf. Comput. Sci., 27 (1987) 95.
80 G.J. Kleywegt, H.J. Luinge and H.A. van 't Klooster, Chemometr. Intell. Lab. Syst., 2 (1987) 291.
81 B. Curry, ACS Symp. Ser., 306 (1986) 350.
82 K.S. Haraki, R. Venkataraghavan and F.W. McLafferty, Anal. Chem., 53 (1981) 386.
83 G. Szalontai, Z. Simon, Z. Csapo, M. Farkas and G. Pfeifer, Anal. Chim. Acta, 133 (1981) 31.
84 M. Farkas, J. Markos, P. Szepesvary, I. Bartha, G. Szalontai and Z. Simon, Anal. Chim. Acta, 133 (1981) 19.
85 B. Debska, J. Duliban, B. Guzowska-Swider and Z. Hippe, Anal. Chim. Acta, 133 (1981) 303.
86 Z. Hippe, Trends Anal. Chem., 2 (1983) 240.
87 M. Jamróz and Z. Latek, J. Mol. Struct., 115 (1984) 277.
88 M. Jamróz, Z. Latek and Z. Hippe, Anal. Chim. Acta, 181 (1986) 65.
89 S. Moldoveanu and C.A. Rapson, Anal. Chem., 59 (1987) 1207.
90 D.D. Saperstein, Appl. Spectrosc., 40 (1986) 344.
91 S. Guonan, C. Xun, H. Weidong, S. Faxiao and W. Chuan, Kexue Tongbao, 32 (1987) 960.
92 J. Seil, I. Kohler, C.W. v.d. Lieth and H.J. Opferkuch, Anal. Chim. Acta, 188 (1986) 219.
93 T. Blaffert, Anal. Chim. Acta, 191 (1986) 161.
94 P. Edwards and P.B. Ayscough, Chemometr. Intell. Lab. Syst., 5 (1988) 81.
95 M. Minsky, The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
96 M.O. Trulson and M.E. Munk, Anal. Chem., 55 (1983) 2137.
97 C.A. Shelley, M.E. Munk and R.V. Roman, Anal. Chim. Acta, 103 (1978) 121.
98 B.D. Christie and M.E. Munk, J. Chem. Inf. Comput. Sci., 28 (1988) 87.
99 A.H. Lipkus and M.E. Munk, J. Chem. Inf. Comput. Sci., 28 (1988) 9.
100 H.B. Woodruff and G.M. Smith, Anal. Chem., 52 (1980) 2321.
101 S.A. Tomellini, D.D. Saperstein, J.M. Stevenson, G.M. Smith, H.B. Woodruff and P.F. Seelig, Anal. Chem., 53 (1981) 2367.
102 S.A. Tomellini, J.M. Stevenson and H.B. Woodruff, Anal. Chem., 56 (1984) 67.
103 S.A. Tomellini, R.A. Hartwick, J.M. Stevenson and H.B. Woodruff, Anal. Chim. Acta, 162 (1984) 227.
104 S.A. Tomellini, R.A. Hartwick and H.B. Woodruff, Appl. Spectrosc., 39 (1985) 331.
105 B.J. Wythoff, C.F. Buck and S.A. Tomellini, Anal. Chim. Acta, 217 (1989) 203.
106 M.A. Puskar, S.P. Levine and S.R. Lowry, Anal. Chem., 58 (1986) 1156.
107 M.A. Puskar, S.P. Levine and S.R. Lowry, Anal. Chem., 58 (1986) 1981.
108 L. Ying, S.P. Levine, S.A. Tomellini and S.R. Lowry, Anal. Chem., 59 (1987) 2197.
109 L. Ying, S.P. Levine, S.A. Tomellini and S.R. Lowry, Anal. Chim. Acta, 210 (1988) 51.
110 B.J. Wythoff and S.A. Tomellini, Anal. Chim. Acta, 227 (1989) 343.
111 B.J. Wythoff and S.A. Tomellini, Anal. Chim. Acta, 227 (1989) 359.
112 X. Hong-kui, S.P. Levine and J.B. D'Arcy, Anal. Chem., 61 (1989) 2709.
113 D.E. Rumelhart, G.E. Hinton and R.J. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 1986.
114 J.U. Thomsen and B. Meyer, J. Magn. Reson., 84 (1989) 212.