8
Applications in other spectroscopies
The OSCAR algorithm for calculating the solution to the deconvolution problem is a general-purpose tool that is easily modified to handle problems in other areas of chemistry. The program sometimes needs additional constraints to handle the data in a given application area. In most cases there is no need to change the program, only the way we look at the data. We start by examining how OSCAR is used to handle single-dimensional chromatographies. After that particular application we shall see how the method works for different hyphenated instruments and methods. Then we apply it to signals where there is no physical separation (chromatography) in the second dimension. A typical example of this kind of signal is the data produced in a titration experiment. Finally we apply OSCAR to situations where there is no continuity between the spectra in the data matrix. This last case is valid for discrete samples (Fig. 8.1).
[Figure 8.1 plot: "IR spectra of kidney stones"; axes: sample number and spectra.]
Figure 8.1 The component spectra can be found with AutoAR for collections of discrete samples. In this case the "chromatographic" axis is discontinuous.
8.1
Single-dimensional signals and AR
It is possible to use OSCAR for calculations that are based on single-dimensional chromatographies [Karjalainen and Karjalainen 1992]. In everyday work a common instrument of this kind is a high-performance liquid chromatograph using UV detection. This type of instrument is used in clinical laboratories for the determination of drugs. Industrial laboratories use it for routine quality control checks. Automatic sample changers keep the machines operating around the clock.
Normally the signals from these instruments are handled by integrators. These devices do not have any internal model of the shape of a chromatographic peak. They use an approximated baseline curve and isolate the single peaks from each other by drawing separation lines between them. In reality the peaks overlap and are not isolated from each other by separation lines. The separating lines defined by the integrator can be drawn in many different ways. The results from all these separating lines have an error, but we do not know how large it is. Different integrators use different algorithms and get different results from the same data [Papas and Tougas 1990]. The integration and peak isolation process is repeated with different triggering conditions and settings until the operator using the integrator is satisfied with the result. There remains a large subjective element in the results, because the judgment about the quality of the peak isolation depends primarily on the experience of the operator and on what he considers to be an esthetic solution.
OSCAR gives us the possibility of handling the single-dimensional signals better than the integrators. The calculations are made with the AutoAR program, and no change in the program itself is necessary. The basic idea is to regard the individual samples as "mass numbers" in one chromatogram. We mentally combine all samples and standards in one analytical batch into a single "virtual GC-MS run" (Fig. 8.2).
[Figure 8.2 diagram. Panel "20 observations, one sample": elution profile for one compound; mass spectrum. Panel "10 observations, many samples": elution profiles of the compounds; concentrations for a sample; a single run.]
Figure 8.2 Samples from a batch of a single-dimensional chromatographic run can be put together to form a two-dimensional matrix, which can then be analyzed by AutoAR. Before the analysis can be performed the time scales of the chromatograms have to be synchronized. (Reprinted with permission of Elsevier Science Publishers B.V. from Analytica Chimica Acta, from Karjalainen and Karjalainen 1991.)
In this imaginary run the individual samples correspond to single m/z lines. For this data matrix we perform the same analysis as for a GC-MS run. The result is also similar. We obtain the shapes of the elution curves for each component. This is a new kind of result that is not obtained when using the hardware or software of the integrators. The second kind of output we obtain is the intensities of the single peaks. These intensities are "the mass spectra" of the solution. In the case of single samples the mass spectra are concentration lists for each kind of compound that is found in the samples. The "mass spectrum lines" are the amounts corresponding to each sample. The spectral lines with the "mass" 1 are the amounts of different compounds in the first sample. The spectral lines with the "mass" 2 are the amounts for the second sample in the batch of samples.
The difficulty in applying AutoAR to single-dimensional data is the preprocessing that is necessary before this step. We must initially synchronize all single chromatograms in such a fashion that the peaks in all of them have identical retention times. We must deform the chromatograms in such a way that the tops of corresponding peaks have the same positions on the time axis. At the same time, if we deform the signal shapes along the longitudinal axis we must compensate for
any elongation or shortening by making compensatory changes in the intensities of the signal. We must provide for input from the human operator, because the best judge of the overall shape of the chromatogram is an experienced operator. The operator shows the synchronizing software which peaks should be considered identical. The final adjustments that make the peaks fully synchronized in time are then better left to the software. After this synchronization has been performed, the analysis with AutoAR can be made.
There are some general precautions that must be observed when handling single-dimensional samples. There should be some variation in the relative amounts of the peak areas. If the ratio between two chromatographic peaks is constant in all samples, the problem is unsuitable for this kind of analysis. This holds for the pure standards as well. The standards should not be prepared by the simple dilution of one stock solution because this produces "mass traces" that are fully correlated. It requires a bit more effort to make "orthogonal" standards when preparing the standard mixtures, but it is an effort that is well spent. Similarly, if we know that the amounts of any two components are constant in all samples, or that the ratio between the two components remains constant, we cannot use this method for calculating the concentrations. There should be some variation in the relative proportions to make the method possible. If we know before the analysis that the amounts remain fully constant we should add known "spiking" to some of the samples to perturb this ratio. Mathematically, the best results are achieved in those cases where the variations in the concentrations are large, because it means that the resulting "spectra" are maximally dissimilar. The more dissimilar, i.e. orthogonal, the spectra are, the better is the precision that we can expect.
The analysis of whole batches of samples as an entity is possible using OSCAR.
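The warning about standards prepared by dilution can be checked numerically before any analysis is run. The following is a minimal sketch, not part of AutoAR: the function name `standards_rank` and the amount tables are hypothetical, and the numerical rank of the table is used here as a simple stand-in for "fully correlated" versus "varying" proportions.

```python
import numpy as np

def standards_rank(conc):
    """Numerical rank of a (standards x compounds) table of amounts."""
    return np.linalg.matrix_rank(np.asarray(conc, dtype=float))

# Hypothetical standards made by diluting one stock solution of two
# compounds: every row is a multiple of the stock, so the ratio is constant.
stock = np.array([4.0, 1.0])
diluted = np.outer([1.0, 0.5, 0.25], stock)

# Hypothetical standards mixed individually so that the ratio varies.
varied = np.array([[4.0, 1.0],
                   [1.0, 3.0],
                   [2.0, 2.0]])

print(standards_rank(diluted))  # 1
print(standards_rank(varied))   # 2
```

A rank lower than the number of compounds means that the "mass traces" of at least two compounds are proportional to each other, which is the situation the text advises against.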
The instrument that is best suited for batch analysis is the gas chromatograph, because the retention times are very reproducible. Less preprocessing is needed to synchronize the analytical traces. The liquid chromatographs are more difficult. The retention times are not as reproducible as with the gas chromatograph. Additionally, there is interaction between successive peaks and sometimes even between successive samples. It is a common observation that the impurities in some samples contaminate even the runs following them.
The analysis of whole batches of samples as an entity is the logical next step in the evolution of the integration methods. The first generation of the integrators reacted to the signal on the fly. The amount of internal memory in the integrators was so limited that the data for a single sample could not be stored in RAM to permit reintegration of the signal. The current generation of integrators is better because the data can be kept in RAM and are automatically stored on disk. The main deficiency in the current integrators is the primitive peak model or the lack of a peak model. The "integration" of the signal using OSCAR is the next step. The memory of the computers has now reached the point where a whole batch of samples can be analyzed as an entity. The benefit of this is improved precision in the final results. Other research groups have applied the same idea to analyzing concentrations in industrial processes [Tauler et al. 1993].
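The synchronization step described in this section, followed by the assembly of the batch matrix, can be sketched as follows. This is an illustration under stated assumptions, not the synchronizing software referred to in the text: the function `synchronize`, the piecewise-linear time map between operator-marked peaks, and the Gaussian test peaks are all hypothetical choices. The intensity is multiplied by the local stretch factor so that peak areas survive the deformation.

```python
import numpy as np

def synchronize(t, y, peaks_found, peaks_target):
    """Warp y(t) piecewise-linearly so peaks_found land on peaks_target.

    The end points of the time axis stay fixed. Intensities are multiplied
    by the local stretch factor so that peak areas are preserved.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    src = np.concatenate(([t[0]], peaks_found, [t[-1]]))
    dst = np.concatenate(([t[0]], peaks_target, [t[-1]]))
    # For each output time, the input time the signal is read from.
    t_in = np.interp(t, dst, src)
    # Local compression or elongation factor d(t_in)/d(t_out).
    stretch = np.gradient(t_in, t)
    return np.interp(t_in, t, y) * stretch

def gaussian(t, center, width=0.2):
    return np.exp(-0.5 * ((t - center) / width) ** 2)

t = np.linspace(0.0, 10.0, 1001)
reference = gaussian(t, 5.0)
drifted = gaussian(t, 4.0)            # same peak, eluting earlier
aligned = synchronize(t, drifted, [4.0], [5.0])

# One column per sample: the samples play the role of "mass numbers".
batch = np.column_stack([reference, aligned])
```

A piecewise-linear map leaves a kink in the stretch factor at each marked peak, so the warped peak is slightly asymmetric. A smoother interpolation of the time map would avoid this at the cost of more code.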
8.2
Other hyphenated instruments
The AutoAR is used to analyze data from many sources other than GC-MS, for which the method was initially developed. The precision of the results depends on the overall statistical properties of the observation matrix. If we have highly overlapping peaks and the spectra of the overlapping components are almost similar, we cannot expect to get highly precise estimates of the concentrations.
HPLC instruments are often equipped with diode array detectors that cover the optical spectrum in the ultraviolet and visible range. The usual spectral range is between 200 and 800 nanometers. The number of photodiodes varies between 100 and 1000 elements. The spectral features present in the spectra of most samples are so broad that the sampling frequency along the spectral axis is quite sufficient in these instruments. The AutoAR handles this kind of measurement. The background values for the spectra are much higher than those for the mass spectra, but proper values are easily found after some initial experiments. The standard errors that are found for the resulting spectra are higher than for GC-MS data. The standard deviations for the elution curves are also higher than the corresponding standard deviations for GC-MS data. This is to be expected due to the higher correlations between different optical spectra. The mass spectra are inherently more dissimilar to each other than the optical spectra. The reason that some users of AR have had difficulties with HPLC-UV-Vis data is the background values, which are higher in optical spectra than in mass spectra. OSCAR searches for the best values of the backgrounds, thereby facilitating the analysis of HPLC-UV-Vis data.
If the information from one chromatographic run is not sufficient, experimental tricks should be used to increase the available information in HPLC. The same sample could be analyzed on two different analytical systems run in parallel.
With this arrangement it is possible to obtain more information about the molecules present in the sample. Because this is an experiment with three-dimensional data, the AutoAR program is not equipped to handle it. The modifications to the AutoAR needed to handle three-dimensional data are not impossible to implement and should serve as a stimulus to the programming reader.
The noise in the diode array detectors grows with the aging of the instrument. There is some direct damage in the diodes due to the UV photons. In practice the detector element should be changed after some years of use, otherwise the signal-to-noise ratio suffers, making the data analysis more difficult.
The list of hyphenated instruments is extremely long if we count all examples that have been mentioned in the literature. Infrared spectra can be combined with gas and liquid chromatographic instruments. The quality of the IR signal is generally much worse than that of the UV-Vis spectra. The GC-IR signal is best analyzed as part of one extended data matrix. In this extended GC-MS matrix the IR signal has been appended to the mass spectroscopic data as extra "masses". This is not possible with all combined GC-IR-MS instruments because the sample does not flow in synchrony through both instruments. Efforts should be made to convert the IR data into a synchronized format by proper preprocessing, because in this extended format the AutoAR can analyze the data.
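The extended matrix described above can be sketched as follows. The sketch assumes, hypothetically, that the only mismatch between the two detectors is a constant lag and a different scan rate, and that linear interpolation is an adequate way to resample the IR channels onto the MS scan times. Real GC-IR-MS preprocessing may need more than this. The function name `extend_with_ir` and its parameters are illustrative and not part of AutoAR.

```python
import numpy as np

def extend_with_ir(t_ms, ms, t_ir, ir, delay=0.0):
    """Append IR channels to a GC-MS matrix as extra "mass" columns.

    ms    : (len(t_ms), n_masses) array, one row per MS scan
    ir    : (len(t_ir), n_channels) array, one row per IR scan
    delay : assumed constant time by which the IR detector lags the MS
    """
    t_ms = np.asarray(t_ms, dtype=float)
    t_ir = np.asarray(t_ir, dtype=float) - delay
    ir = np.asarray(ir, dtype=float)
    # Resample every IR channel onto the MS scan times.
    ir_on_ms = np.column_stack(
        [np.interp(t_ms, t_ir, ir[:, k]) for k in range(ir.shape[1])]
    )
    return np.hstack([np.asarray(ms, dtype=float), ir_on_ms])

# Toy data: 10 MS scans with 3 masses, 19 IR scans with 2 channels.
t_ms = np.arange(0.0, 10.0, 1.0)
ms = np.ones((10, 3))
t_ir = np.linspace(0.0, 9.0, 19)
ir = np.outer(t_ir, [1.0, 2.0])

extended = extend_with_ir(t_ms, ms, t_ir, ir)
print(extended.shape)  # (10, 5)
```

Because the IR columns usually have a different scale and noise level than the mass columns, some rescaling of the appended block may be needed before analysis. The text does not specify how this is done.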
8.3
Two-dimensional data with internal continuity
There are several combined methods in chemistry that do not possess the unimodal chromatographic peaks in the second dimension. Still, the readings along the second dimension are continuous. A typical example of this kind of information is kinetic experiments. These experiments are often measured by continuously scanning optical spectra with a diode array instrument. The changes in the spectra are smooth as a function of time; there are no arbitrary jumps in the data. A second familiar experiment is titration, or experiments where the temperature is gradually changed. The spectral dimension can be visible spectra, ultraviolet spectra or infrared spectra. In some cases even mass spectra are registered. An example is the combination of pyrolysis with GC-MS [Windig and Meuzelaar 1984].
With these kinds of data the non-spectral dimension follows a different curve than in chromatography. The curve changes smoothly but there are no clear unimodal maxima as in chromatography. The AutoAR can serve in these situations if the program is slightly modified. We omit the sorting operation in the "core AR" routine that makes sure that the peaks stay unimodal. With this simple modification, two-dimensional data matrices can be handled without difficulties. Because the requirement to handle non-chromatographic data is frequently encountered, we have put the sorting operation into the AutoAR program as a selectable option. A check box is marked by the user to force unimodality.
We could use other types of constraints here than just the adjustment of baselines in both dimensions. We could gradually "tighten" the smoothness of spectra and concentration curves by some smoothing function that can be adjusted in small steps. This is not necessary in practice because the simpler method of baseline adjustment works well. If other types of constraints are needed as guiding constraints, the AutoAR program should be modified to use these constraint types.
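One way a sorting operation can keep a profile unimodal, and how it can be made optional, is sketched below. The text states only that a sort in "core AR" enforces unimodality. The particular scheme here (ascending sort before the maximum, descending sort after it) and the names `force_unimodal` and `constrain` are assumptions made for illustration.

```python
import numpy as np

def force_unimodal(profile):
    """Make a profile rise to its maximum and fall after it by sorting."""
    p = np.asarray(profile, dtype=float)
    top = int(np.argmax(p))
    rising = np.sort(p[:top])               # ascending up to the maximum
    falling = np.sort(p[top + 1:])[::-1]    # descending after the maximum
    return np.concatenate([rising, [p[top]], falling])

def constrain(profiles, unimodal=True):
    """Apply the optional unimodality step to each column of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    if not unimodal:
        return profiles.copy()
    return np.column_stack([force_unimodal(col) for col in profiles.T])

bumpy = [0.0, 2.0, 1.0, 5.0, 3.0, 4.0, 0.0]
print(force_unimodal(bumpy))  # [0. 1. 2. 5. 4. 3. 0.]
```

Calling `constrain(profiles, unimodal=False)` leaves the profiles untouched, which corresponds to clearing the check box for titration, kinetic or temperature-scan data.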
[Figure 8.3 plot: "IR spectrum of component #8"; x-axis: position in the data vector (50 to 500); y-axis: scaled absorbance (up to 100).]
Figure 8.3 The IR absorbance spectrum of the largest component reconstructed by AutoAR from a set of 300 kidney stones. The numbers on the x-axis are simply the positions in the vector holding the values. The IR data have been collected between wavenumbers of 4000 and 400 cm⁻¹. The spectrum is displayed so that the smaller wavenumbers are located on the left-hand side of the figure. The y-axis is scaled here to have the value 100 for the highest peak.
For this kind of data the background values can be very high along the nonspectral axis. Here it is important to provide sufficient dynamic range for AutoAR to find the proper backgrounds.
8.4
Using OSCAR for spectra of discrete samples
The last area where the OSCAR approach has been used is isolating the constituent spectra in batches of distinct samples with no internal continuity in the second dimension. This situation arises when the only possible manipulation that can be made on a sample is taking the spectrum. The samples can represent totally different specimens that have no relation to each other. The only link between the individual samples is that they contain similar components. The proportions of the components vary from sample to sample. The samples can come from individual patients or the environment; there is no need for continuity between them.
[Figure 8.4 plot: "IR spectrum of component #4"; x-axis: position in the data vector (50 to 500); y-axis: scaled absorbance (up to 100).]
Figure 8.4 The IR absorbance spectrum of component number four found by AutoAR from a set of 300 kidney stones. The numbers on the x-axis are the positions in the vector holding the values. The data have been collected between wavenumbers of 4000 and 400 cm⁻¹. The wavenumbers grow from left to right. The y-axis is scaled to have the value 100 for the highest peak in the spectrum.
The second situation where we have two-dimensional data matrices without continuity in the second, non-spectral dimension is process control data or quality control data. Samples are taken from a reaction vessel after some manipulation or addition. There is no continuous scanning of spectra that would form a continuity between the successive spectra. The time intervals between discrete samples are typically irregular. The analysis with AutoAR is possible in these cases as well. The unimodality requirement in "core AR" is turned off and the data are analyzed in the conventional way. The result is a set of spectra and concentrations that are maximally reproducible. They may not always correspond to the physical spectra but they represent the solution that is most reproducible, based on the current data set. If two components are present in a constant concentration ratio in all samples, their spectra are fused by the method. There is no way to take them apart because the raw data contain no information about them separately. Still, the analysis of the sum spectra can be rewarding because it gives us ideas about the mechanisms involved. Even in those cases where we notice the fusion of more than one component because of equilibria in the mixtures, the knowledge obtained about the equilibria is very useful. We might notice that experiments should be made with shorter intervals to catch transients. There are limits to how much we can observe in a given situation. These limitations should not discourage us from fully analyzing all data that we obtain. The purpose of the computation is to gain more insight into the nature of what is happening behind our observations.
[Figure 8.5 plots: "Species #4". Upper panel: IR spectrum against position in the data vector. Lower panel: amount of the component against sample number.]
Figure 8.5 The location of component number four in the data set. The upper figure shows the IR spectrum. The vertical lines in the lower figure show the amounts of the component in the set of 300 samples. This component is much rarer in the data set than the component in Fig. 8.3, which is almost evenly spaced through the set.
In our example, IR spectra from kidney stones were subjected to processing with AutoAR. The analysis of their composition is useful in the diagnosis and treatment of kidney stones. The unimodality was turned off, so no sort was made. The analysis was done using eight components. The AutoAR "discovered" the eight component spectra that could be used to explain the 300 sample spectra. The component spectra found with OSCAR could be identified in the spectral search of the kidney stone library (Mattson KS Library, ATI Mattson). The kidney stone library contains systematic mixtures of the inorganic compounds found in kidney stones. Three forms of oxalate were found, the major form having a correlation coefficient of 0.98 with the library compound (calcium oxalate, 10 percent dihydrate). This component was responsible for 26 percent of the total absorbance
present in the set of the 300 spectra. The two other forms had slightly lower correlations with their library counterparts. They represented 20 and 22 percent of the total amount. Component five had a correlation coefficient of 0.97 with a library spectrum obtained from a mixture of magnesium ammonium phosphate (95 percent) and calcium phosphate (5 percent). It accounted for 15 percent of the total in the data set. Component four correlated at 0.96 with a mixture of uric acid (90 percent) and monosodium urate (10 percent). Its amount in the data set was 8 percent. Component three correlated at 0.97 with a mixture of hydroxylapatite (65 percent) and apatite (35 percent). Its amount represented 6 percent of the total. The smaller components, whose amounts were just above one percent, had lower but significant correlations with the library spectra.
Because the unimodality requirement had been turned off in the program, we could use the program on a transposed observation matrix. The transpose was used to facilitate the comparison of individual spectra with the sum of all spectra.
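The two numbers quoted for each component, a correlation coefficient against a library spectrum and a share of the total absorbance, can be computed along the following lines. This is a sketch under assumptions: the text does not state how the library search or the percentages were calculated, so the Pearson correlation and the "amount times integrated spectrum" share used here are illustrative choices, and the toy spectra and the names `ref_a` and `ref_b` are invented placeholders, not entries of the kidney stone library.

```python
import numpy as np

def best_library_match(component, library):
    """Return (name, r) of the library spectrum with the highest Pearson r."""
    scores = {name: float(np.corrcoef(component, ref)[0, 1])
              for name, ref in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

def absorbance_shares(spectra, amounts):
    """Percent of the total absorbance attributed to each component.

    spectra : (n_components, n_channels) component spectra
    amounts : (n_samples, n_components) amounts per sample
    """
    spectra = np.asarray(spectra, dtype=float)
    amounts = np.asarray(amounts, dtype=float)
    per_component = amounts.sum(axis=0) * spectra.sum(axis=1)
    return 100.0 * per_component / per_component.sum()

# Toy library of two synthetic bands (placeholders, not real compounds).
x = np.linspace(0.0, 1.0, 50)
library = {"ref_a": np.exp(-((x - 0.3) / 0.05) ** 2),
           "ref_b": np.exp(-((x - 0.7) / 0.05) ** 2)}
component = 2.0 * library["ref_a"] + 0.1   # scaled and offset copy of ref_a

name, r = best_library_match(component, library)
shares = absorbance_shares([[1.0, 1.0], [2.0, 0.0]],
                           [[1.0, 0.0], [2.0, 1.0]])
print(name, shares)  # ref_a [75. 25.]
```

The Pearson coefficient ignores scale and offset, which is why a component spectrum that has been rescaled by the analysis can still match its library counterpart closely.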
References
Karjalainen EJ. Mathematical isolation of component spectra in UV/VIS and GC/MS. How unique are the resolved spectra? Journal of Pharmaceutical & Biomedical Analysis 1991; 9: 637-641.
Karjalainen EJ, Karjalainen UP. Component reconstruction in the primary space of spectra and concentrations. Alternating regression and related direct methods. Analytica Chimica Acta 1991; 250: 169-179.
Karjalainen EJ, Karjalainen UP. Simultaneous analysis of multiple chromatographic runs and samples with alternating regression. Chemometrics and Intelligent Laboratory Systems 1992; 14: 423-427.
Papas AN, Tougas TP. Accuracy of peak deconvolution algorithms within chromatographic integrators. Anal Chem 1990; 62: 234-239.
Tauler R, Kowalski B, Fleming S. Multivariate curve resolution applied to spectral data from multiple runs of an industrial process. Anal Chem 1993; 65: 2040-2047.
Windig W, Meuzelaar HL. Nonsupervised numerical component extraction from pyrolysis mass spectra of complex mixtures. Anal Chem 1984; 56: 2297-2303.