$N_Y$, and $N = N_Y$, then there is an excess of non-aligned peaks in X, and accordingly $s(X, Y) = N/N_X < 1$. Clearly the measure depends on how a peak, and its start and end, are defined. We use the algorithm of [5] for this purpose: peaks are set by a threshold z, and a lower threshold z’ is used to set their start and end
times. CHAS computes s(X, Y) for any number of z values at a fixed z’. An average over z is also computed as follows:

(3)

the summations being over values of z. CHAS can generate a matrix of similarity or distance measures between a collection of chromatograms, and this matrix can be used as input to further analysis. For example, a cluster analysis could be performed to obtain natural groupings of chromatographic patterns [4].

Chemometrics and Intelligent Laboratory Systems

Fig. 3. When a peak of chromatogram X ... a peak of chromatogram Y, X and Y are considered to be aligned. (Axes: X(t) and Y(t) against time t.)

Fig. 4. The computer system used to collect data and feed them to a mainframe AMDAHL computer for processing by CHAS. (Diagram: HPLC and FPLC chromatography systems; output to printer and plotter.)

AN APPLICATION

We developed CHAS to analyse data generated by a high-performance liquid chromatography system (HPLC, Applied Chromatography Systems, Macclesfield, U.K.) and a fast protein liquid chromatography system (FPLC, Pharmacia Biotechnology, Uppsala, Sweden), although the program is flexible enough to handle data generated by any means. Fig. 4 illustrates the interface between our chromatography systems and eventual
analysis by CHAS. A British Broadcasting Corporation (BBC) microcomputer was used to log data to floppy disks, and the data files were subsequently uploaded to Leeds University’s AMDAHL computer for analysis by CHAS. Logging was done from the input terminals of a chart recorder, and the data were fed to an analog-to-digital (AD) channel of the BBC micro via a 741 operational amplifier. The elution gradient was recorded on a second AD channel, and a third analog channel was used to detect the start and end of each run, allowing multiple runs to be recorded automatically. Software for the BBC micro was written in BBC …

Fig. 5. Chromatograms illustrating separation by Sephacryl 300. (a) Gel filtration chromatogram on Sephacryl 300, from which six fractions, A-F, were extracted for subsequent analysis. (b) Computer display of the FPLC chromatograms of fractions A-F. (c) A computer rearrangement of Fig. 5b after rescaling the absorption values and removing 90% of the baseline drift of A. A = SACRY1R; B = SACRY2R; C = SACRY3C; D = SACRY4C; E = SACRY5C; F = SACRY6C.
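The logging set-up above uses a third analog channel to mark the start and end of each run so that several runs can be recorded automatically. The Python sketch below shows one way such a marker channel could split a continuous log into runs. It assumes, hypothetically, that the marker sits above a fixed threshold for the whole of a run and below it between runs; `split_runs` and the threshold value are illustrative names, not part of the original BBC-micro software.

```python
from typing import List, Optional, Sequence, Tuple


def split_runs(marker: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices, end exclusive, for each run.

    Hypothetical convention: the marker channel is above `threshold`
    for the whole duration of a run and below it between runs.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, level in enumerate(marker):
        if level > threshold and start is None:
            start = i  # rising edge: a run begins
        elif level <= threshold and start is not None:
            runs.append((start, i))  # falling edge: the run ends
            start = None
    if start is not None:
        runs.append((start, len(marker)))  # logging stopped mid-run
    return runs


if __name__ == "__main__":
    absorbance = [0.1, 0.1, 0.4, 0.9, 0.3, 0.1, 0.2, 0.7, 0.2, 0.1]
    marker = [0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0]
    for start, end in split_runs(marker, threshold=2.5):
        print(start, end, absorbance[start:end])
```

Running it prints `2 5 [0.4, 0.9, 0.3]` and `7 9 [0.7, 0.2]`. A single level test suffices for a clean, logic-style marker; a noisy marker would call for two thresholds (hysteresis), much as peak detection uses z and z’.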
Example 1: Monitoring serum separation using Sephacryl 300
As a first step in a series of experiments, a Sephacryl gel filtration medium was used to separate serum that had been depleted of albumin by absorption on Blue Sepharose. The separation was evaluated by taking six fractions, A-F, of the gel filtration chromatogram (Fig. 5a) and analysing each of them on an FPLC system with a Superose 6 gel filtration column, which separates by molecular weight. The FPLC chromatograms were captured and reduced, as described above, and appended to a master file using CHAS. A computer-generated plot of the FPLC chromatograms of fractions A-F is shown in Fig. 5b. A more informative layout is shown in Fig. 5c, where the chromatograms have been scaled to adjust for differing sample concentrations and 90% of the baseline drift of A has been removed. Each successive fraction elutes later and, since the Superose 6 medium separates by molecular weight, the Sephacryl 300 fractions A-F contain substances of successively decreasing molecular weight.
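Fig. 5c combines two simple corrections: rescaling for differing sample concentrations and removing 90% of the baseline drift of A. A minimal Python sketch of both follows. It assumes the baseline is a straight line joining the first and last samples and that concentration acts as a pure gain; CHAS's own baseline procedure is not described in this section, so `remove_linear_drift` and `rescale` are hypothetical helpers.

```python
from typing import List, Sequence


def remove_linear_drift(y: Sequence[float], fraction: float = 1.0) -> List[float]:
    """Subtract `fraction` of a straight-line baseline through the first and
    last samples (assumed drift model; fraction=0.9 removes 90% of it)."""
    n = len(y)
    if n < 2:
        return list(y)
    slope = (y[-1] - y[0]) / (n - 1)
    return [v - fraction * (y[0] + slope * i) for i, v in enumerate(y)]


def rescale(y: Sequence[float], height: float = 1.0) -> List[float]:
    """Scale so the largest absolute value equals `height`, compensating for
    differing sample concentrations (assumed to act as a simple gain)."""
    top = max((abs(v) for v in y), default=0.0)
    if top == 0.0:
        return list(y)  # flat signal: nothing to scale
    return [height * v / top for v in y]


if __name__ == "__main__":
    print(rescale(remove_linear_drift([1.0, 4.0, 2.0])))  # [0.0, 1.0, 0.0]
```

With `fraction=0.9`, 10% of the straight-line baseline is left in place, analogous to the partial correction applied to chromatogram A. Here "baseline" includes the starting offset as well as the drift, which is a simplification.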
Example 2: Binding of Immunoglobulin G on a Protein A column

A further fraction, G, of the Sephacryl 300 chromatogram in Fig. 5a was subsequently analysed by anion exchange on a Mono Q HR 5/5 column. This gave the solid-line chromatogram in Fig. 6a (identifier SEPK1C). The immunoglobulin G (IgG) in the sample was then partially removed from this fraction by passing it through a Protein A affinity column. The Mono Q chromatogram with the IgG removed is also shown in Fig. 6a (SEPN1C), as is the chromatogram of the fraction bound to the affinity column (SEPN2C). The first peak of SEPK1C represents the IgG in the sample; it is clear that this has been effectively removed by the affinity column, whilst the material bound to the column is almost pure IgG. A clearer layout of these chromatograms, with peak areas marked, is shown in Fig. 6b, where the high baselines of chromatograms SEPK1C and SEPN2C have been removed and the chromatograms rescaled. Table 1 gives the similarity matrices between these three chromatograms for the correlation measure r(X, Y) and the averaged peak alignment measure, eq. 3. The values agree with a subjective assessment of similarity on a zero-to-one scale. Another computer-generated display is shown in Fig. 6c; although it is no more informative in this example, it demonstrates a feature of CHAS that may be of use. We have used this type of display to illustrate evolving urinary chromatogram patterns following burn injury [13], and the facility could be of value for chromatograms derived from multi-channel detectors.

TABLE 1
Similarity measures of the chromatograms in Fig. 6b, evaluated over the time interval 2-23 minutes

Chromatogram   Alignment *                   Correlation **
identifier     SEPK1C   SEPN1C   SEPN2C      SEPK1C   SEPN1C   SEPN2C
SEPK1C         1                             1
SEPN1C         0.862    1                    0.601    1
SEPN2C         0.231    0.179    1           0.342    -0.097   1

* Averaged, using eq. 3, over thresholds z = 0.005-0.03 in steps of 0.005, taking z’ = 0.001.
** Using eq. 2.

CONCLUSION

There are a number of aspects of the CHAS system which are useful for processing chromatographic data, or any short-duration electrophysical waveform, and which are not available together on other systems. First, it provides data management. The amount of data needed to adequately represent a “library” of waveforms may be quite formidable, and a reliable and efficient system to add, delete, access and display entries in the library is essential. CHAS is able to fulfil this role. In addition, it possesses various procedures which are useful in chromatography and which are not found together in other systems. These include the capacity to detect, integrate and identify peaks, to remove baseline drift, to compute similarities between chromatograms, and to adjust chromatograms to correct for retention time perturbations so that peaks align with pre-set reference retention times. CHAS can also be used to obtain a segmented chromatogram, a facility which, besides giving a compact representation, can be used as a smoothing filter. The program does not, however, possess a feature for dealing explicitly with noise; for instance, there is
no capability for signal-to-noise evaluation. It is assumed that signals to be analysed by CHAS will have been pre-processed to some extent to filter noise, for instance by analogue filtering at data capture. The program accordingly analyses signals as they are presented to it; it makes no allowance for imprecision, inaccuracy or error in the signal. However, its facilities can be used selectively to correct for errors, for instance by correcting for baseline drift. In addition, the peak detection algorithm, an important feature of the system, is insensitive to signal noise and baseline drift and can be used to distinguish between “real” and “noisy” peaks. The program’s graphical facilities are also useful. The ability to obtain an instant terminal display of a chromatogram and to superimpose another on it is extremely useful for chromatogram comparisons. More elaborate displays of chromatograms can be produced, as the examples given above indicate.

Fig. 6. The binding of IgG by a Protein A column. (a) Chromatograms on a Mono Q column of the low molecular weight fraction G in Fig. 5a: SEPK1C = whole fraction (solid line); SEPN1C = after passing through a Protein A column (- - -); SEPN2C = the sample bound to the Protein A column (------). (b) Another display of Fig. 6a after rescaling and baseline drift removal, with peaks detected and areas marked. A = SEPK1C; B = SEPN1C; C = SEPN2C. (c) A three-dimensional display of Fig. 6b.
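Table 1 compares chromatograms with two measures: the correlation r(X, Y) of eq. 2 and the peak alignment measure averaged over thresholds by eq. 3. The exact alignment rule and the formula of eq. 3 are not reproduced in this section, so the Python sketch below rests on stated assumptions. Peaks come from a simple two-threshold (hysteresis) detector standing in for the algorithm of [5]. A peak counts as aligned when its apex falls inside the start-end span of a peak in the other chromatogram, with N taken as the smaller of the two directional counts. The measure is s(X, Y) = N / max(N_X, N_Y), which agrees with the case N = N_Y, s = N/N_X described at the start of this section. The average over z is taken as a ratio of summed counts, and eq. 2 is taken to be the ordinary product-moment correlation. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Peak = Tuple[int, int, int]  # (start, apex, end) sample indices, end inclusive


def detect_peaks(y: Sequence[float], z: float, z_low: float) -> List[Peak]:
    """Hysteresis stand-in for the peak algorithm of [5]: a contiguous
    stretch with y >= z_low is a peak if its maximum reaches z."""
    peaks: List[Peak] = []
    i, n = 0, len(y)
    while i < n:
        if y[i] >= z_low:
            j = i
            while j + 1 < n and y[j + 1] >= z_low:
                j += 1  # extend the stretch to its last sample above z_low
            apex = max(range(i, j + 1), key=lambda k: y[k])
            if y[apex] >= z:
                peaks.append((i, apex, j))
            i = j + 1
        else:
            i += 1
    return peaks


def _apexes_inside(a: List[Peak], b: List[Peak]) -> int:
    """Number of peaks in `a` whose apex lies within the span of a peak in `b`."""
    return sum(any(s <= apex <= e for s, _, e in b) for _, apex, _ in a)


def alignment_counts(x: Sequence[float], y: Sequence[float],
                     z: float, z_low: float) -> Tuple[int, int]:
    """(N aligned, max(N_X, N_Y)) at one threshold z, under the assumed rule."""
    px, py = detect_peaks(x, z, z_low), detect_peaks(y, z, z_low)
    aligned = min(_apexes_inside(px, py), _apexes_inside(py, px))
    return aligned, max(len(px), len(py))


def averaged_alignment(x: Sequence[float], y: Sequence[float],
                       zs: Sequence[float], z_low: float) -> float:
    """Ratio of sums over the thresholds zs (an assumed form of eq. 3)."""
    num = den = 0
    for z in zs:
        a, m = alignment_counts(x, y, z, z_low)
        num += a
        den += m
    return num / den if den else math.nan  # undefined if no peaks at any z


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation of two equally sampled chromatograms
    (raises ZeroDivisionError if either one is perfectly flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(chroms: Sequence[Sequence[float]],
                      measure: Callable[[Sequence[float], Sequence[float]], float]
                      ) -> List[List[float]]:
    """Full matrix of measure(X, Y) over a collection of chromatograms."""
    return [[measure(a, b) for b in chroms] for a in chroms]
```

With thresholds matching the Table 1 footnote, `zs = [0.005 * k for k in range(1, 7)]` and `z_low = 0.001`, the call `similarity_matrix(chroms, lambda a, b: averaged_alignment(a, b, zs, 0.001))` gives a matrix in the same layout as Table 1. Because the detector is only a stand-in, its values should not be expected to reproduce the published ones. Either matrix, or one minus it as a distance, could then feed a cluster analysis.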
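The conclusion mentions adjusting chromatograms so that peaks align with pre-set reference retention times. One common way to do this is piecewise-linear time warping, sketched here in Python as an assumption rather than as CHAS's documented method. Each observed peak time is mapped onto its reference time, the start and end of the run are held fixed, and the signal is resampled on the original time grid by linear interpolation. The helpers `interp` and `align_to_reference` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation; xs strictly increasing, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x)  # xs[k - 1] <= x < xs[k]
    x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def align_to_reference(y: Sequence[float], dt: float,
                       observed: Sequence[float],
                       reference: Sequence[float]) -> List[float]:
    """Warp y (sampled every dt) so peaks at the `observed` times move to the
    matching `reference` times. Both lists must be strictly increasing and lie
    strictly inside the run; the start and end of the run stay fixed."""
    times = [i * dt for i in range(len(y))]
    t_end = times[-1]
    ref_knots = [0.0, *reference, t_end]
    obs_knots = [0.0, *observed, t_end]
    # For each output time, find the original time it maps from, then read y there.
    return [interp(interp(t, ref_knots, obs_knots), times, y) for t in times]


if __name__ == "__main__":
    trace = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(align_to_reference(trace, 1.0, [2.0], [4.0]))
    # [0.0, 0.0, 0.0, 0.5, 1.0, 0.0, 0.0]
```

The example moves a peak from t = 2 to t = 4. Its leading edge is stretched and its trailing edge compressed, which is the price of a piecewise-linear warp.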
ACKNOWLEDGEMENTS
RJM and EHC were supported by a grant from the Yorkshire Cancer Research Campaign. We are grateful to Professor M. Wells and other members of the Leeds University Computer Service for their advice and encouragement.
REFERENCES
1 P. Tarroux and T. Rabilloud, Complete computer system for processing chromatographic data, Journal of Chromatography, 248 (1982) 249-262.
2 D.L. Gustine and J. McCulloch, Versatile microcomputer-controlled, automated gradient analytical high-performance liquid chromatography system, Journal of Chromatography, 316 (1984) 407-414.
3 P.W. Banda, M.S. Tuttle, L.E. Selmer, Y.T. Thatachari, A.E. Sherry and M.S. Blois, Data processing of urine chromatograms for clinical management of melanoma, Computers and Biomedical Research, 13 (1980) 549-566.
4 R.J. Marshall, R. Turner, H. Yu and E.H. Cooper, Cluster analysis of chromatographic profiles of urine proteins, Journal of Chromatography, 297 (1984) 235-244.
5 R.J. Marshall, The determination of peaks in biological waveforms, Computers and Biomedical Research, 19 (1986) 319-329.
6 M.L. McConnell, G. Rhodes, U. Watson and M. Novotny, Application of pattern recognition and feature extraction techniques to volatile constituent metabolic profiles obtained by capillary gas chromatography, Journal of Chromatography, 162 (1979) 495-506.
7 H.A. Scoble, J.L. Fasching and P.R. Brown, Chemometrics and liquid chromatography in the study of acute lymphocytic leukemia, Analytica Chimica Acta, 150 (1983) 171-181.
8 H.A. Scoble, M. Zakaria, P.R. Brown and H.F. Martin, Liquid chromatographic profile classification of acute and chronic leukemias, Computers and Biomedical Research, 16 (1983) 300-315.
9 M.E. Parrish, B.E. Good, M.A. Jeltema and F.S. Hsu, Pattern recognition and capillary gas chromatography in the analysis of the organic gas phase of cigarette smoke, Analytica Chimica Acta, 150 (1983) 163-170.
10 K.G. Beauchamp, Signal Processing using Analog and Digital Techniques, Allen & Unwin, London, 1973, p. 41.
11 R.J. Marshall, A manual for CHAS, Internal report of the Cancer Research Unit, University of Leeds, 1985.
12 J.H. Van Bemmel, Biological signal processing, in D. Ingram and R. Bloch (Editors), Mathematical Methods in Medicine, Part 1, Wiley, New York, 1984, pp. 225-272.
13 H. Yu, R.J. Marshall, E.H. Cooper and J. Settle, Tubular proteinuria after burn injury, in G. Lubec and V. Campese (Editors), Advances in Non-invasive Nephrology, John Libbey, London, Paris, 1985, pp. 187-190.