ANALYTICAL BIOCHEMISTRY 51, 180-192 (1973)

A Fortran IV Computer Program for Automatic Calculation of Amino Acid Composition¹

R. TAYLOR AND M. G. DAVIES

The Lord Rank Research Centre, Lincoln Road, High Wycombe, England

Received May 19, 1972; accepted June 23, 1972
In food research there is an important requirement for the rapid determination of amino acids. Large numbers of samples for amino acid analysis are produced by studies on the optimization of animal feed constituents and the development of novel sources of protein. Several methods of automating the data processing of amino acid analysis have been published. Graham and Sheldrick (1) described a program in KDF 9 Algol which processes the height and width of each peak, as measured manually from the chart. More recently, Starbuck et al. (2), Ozawa and Tanaka (3), and Gerding (4) developed programs using data obtained by manual calculation or from an integrator. Data processing was further automated by the use of logging equipment. Porter and Talley (5), Yonda et al. (6), and Krichevsky et al. (7) used methods in which the voltage from the photometer circuit was digitized and a punched paper tape was produced, which could be processed by computer to give peak areas. Cavins and Friedman (8) recorded the photometer output on magnetic tape, which was played into an electronic integrator, producing punched paper tape for processing on an IBM 1130 computer. A relatively simple system was described by Robins et al. (9): a data logger samples the voltage from a retransmitting slide wire, and the output tape is processed using a program written in Elliott 803 Algol. This follows exactly the manual calculation according to the trapezium rule and gives a printout of areas in the same units. Exss et al. (10) attempted both to identify and to determine quantitatively amino acids from the data provided by an integrator attached to a standard Technicon analyzer, using a 22 hr run.

The program presented here is written in Fortran IV for use with an IBM 1130 computer. The output gives the peak areas, identifies up to thirty amino acids, and prints out the per cent composition of the sample. Intervention by the operator during the calculation, and the punching of cards before computer processing, have been reduced to a minimum.

¹ A complete listing of the program in Fortran IV can be made available on request.

Copyright © 1973 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Chromatography System
Amino acids are separated on a single column (40 × 0.6 cm) of Zeo-Karb 225 spherical resin, 8% cross-linked. The resin is fractionated to 10-12 µm using an elutriation device. The column is thermostated at 60°C and the buffer flow rate is 0.4 ml/min, giving a pressure of 350 psi. Gradient elution is controlled by a programmer described by Thomas (11). The programmer converts an analog record, in the form of a black area on a rotating drum, into the mixing of two sodium citrate buffer solutions, one of pH 2.20 and 0.2 M Na+, the other of pH 12.25 and 0.4 M Na+. The drum is scanned by a photoconductive cell on a moving head. Other photoconductive cells scan light tracks which control the reversal and resetting of the scanner, the addition of methanol to the buffers, the operation of an automatic sample loader, and the replacement of ninhydrin with a wash of 50% methyl Cellosolve to clean the analytical system between analyses. Two columns are operated simultaneously, and the effluent is analyzed with a ninhydrin/hydrazine sulfate reagent using a Technicon AutoAnalyzer. Samples are loaded in 0.1 N hydrochloric acid on the low-pressure side of the pump. This gives a tight application band, and better resolution of aspartic acid, threonine, and serine is obtained than when pH 2.20 sodium citrate buffer is used. The stock acidic buffer solutions are passed through a column of Zeo-Karb 225 ion-exchange resin before use.

Data Acquisition System
The voltage from a chart recorder modified as described by Davies and Watts (12) is sampled by a digital voltmeter (Solartron Electronic Group type LM 1420.2). Pulses are generated at a rate proportional to the input voltage, and the number of pulses generated during a fixed period is registered on a counter. This count is converted into binary coded decimal form and punched out on paper tape. Readings are taken alternately from the two traces on the recorder, corresponding to the two analyses being performed; one trace generates positive integers and the other negative integers. Successive readings are separated by an "end-of-record" symbol. Since one analysis cycle takes 3.5 hr, ten chromatograms are produced overnight. At the completion of a chromatogram, the wash period causes a drop in the baselines on the recorder, resulting in values of ±1000 being punched on the data tape. This is a signal to the computer program that data input should be halted and the stored data processed; thus the overnight production can be processed in one run on the computer.
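To make the tape format concrete, the following Python sketch shows how the interleaved readings could be separated into the two traces. It assumes the paper tape has already been decoded into a list of signed integers, one per end-of-record-delimited reading, and that a tolerance of ±6 around the wash value of 1000 identifies the wash readings left at the start of a section of tape. The function name `split_channels` and its parameters are hypothetical, not taken from the Fortran program.

```python
def split_channels(readings, wash_value=1000, tolerance=6):
    """Separate alternate signed readings into the two recorder traces.

    Returns (positive_trace, negative_trace, stop_index). Negative readings
    are stored as magnitudes. Reading stops at the first adjacent pair
    +wash_value, -wash_value (in either order), taken as the wash signal.
    """
    positive, negative = [], []
    i = 0
    # Skip wash readings (|value| within wash_value +/- tolerance) that
    # precede the data at the start of a section of tape.
    while i < len(readings) and abs(abs(readings[i]) - wash_value) <= tolerance:
        i += 1
    while i < len(readings):
        value = readings[i]
        following = readings[i + 1] if i + 1 < len(readings) else None
        if abs(value) == wash_value and following == -value:
            return positive, negative, i  # end of this chromatogram
        if value > 0:
            positive.append(value)
        elif value < 0:
            negative.append(-value)
        # zero readings carry no sign and are ignored
        i += 1
    return positive, negative, i


tape = [1000, -1000, 12, -30, 15, -31, 14, -29, 1000, -1000, 1000]
print(split_channels(tape))  # ([12, 15, 14], [30, 31, 29], 8)
```

Sorting by sign is what allows a single paper tape to carry both analyses: no channel identifier needs to be punched, because the sign of each reading already identifies its trace.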
Sample Preparation

For proteins, 100 mg of sample ground to 70 mesh is refluxed under nitrogen with 100 ml of redistilled 6 N hydrochloric acid. The hydrolyzate is filtered, 10 µg of norleucine per milligram of sample is added, and the hydrochloric acid is removed using a rotary evaporator. The residue is dissolved in 0.1 N hydrochloric acid and analyzed. For the determination of free amino acids in cell extracts and fermenter filtrates, interfering materials are removed before analysis by precipitation, gel filtration, ultrafiltration, or ion exchange. A preliminary analysis without norleucine is performed to establish whether an interfering peak occurs at that position on the chromatogram.

DESCRIPTION OF THE PROGRAM
The object of the program is to recognize the occurrence of peaks, calculate each peak area, and identify each peak as far as possible. The amount of internal standard (norleucine) used in an analysis is related to the amount of sample analyzed in such a way that the program, having identified a peak, can go on to calculate the percentage, residue percentage, and percentage nitrogen for each amino acid.

DATA INPUT
Predata

The predata are punched on 9 cards. Card 1 contains a dilution factor. Cards 2-5 contain the peak identifiers for each of the thirty amino acids in each analysis. These identifiers are obtained from the chromatogram, counting from the end of the wash period, so that they are related to elution times. Cards 6-9 contain the standard color factors for each amino acid. These are the ratios of the peak areas to the area of the internal standard (the norleucine peak) for a known mixture containing 10 µg/ml of each amino acid. All predata are checked before each group of chromatograms is processed and amended as necessary.
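Because 10 µg of norleucine is added per milligram of sample, norleucine always makes up 1% of the sample weight, so dividing a peak's norleucine ratio by its color factor gives the per cent of that amino acid directly (consistent with the NLE per cent of 1.0000 in Fig. 3). The sketch below illustrates this arithmetic under several assumptions: the color-factor mixture also contains norleucine at 10 µg/ml, the dilution factor simply multiplies the result, and residue per cent and per cent nitrogen use the conventional factors (M - 18.02)/M and 14.007·n/M, where M is the molecular weight and n the number of nitrogen atoms. The molecular weights and the `composition` helper are illustrative, not the program's predata.

```python
# Illustrative molecular weights (g/mol) and nitrogen atoms per molecule.
# These are assumptions for the sketch, not values taken from the paper.
MOLECULAR_DATA = {
    "ASP": (133.10, 1),
    "LYS": (146.19, 2),
    "NLE": (131.17, 1),
}
WATER = 18.02      # lost when a residue is incorporated into a peptide
NITROGEN = 14.007  # atomic weight of nitrogen


def composition(ratio, color_factor, name, dilution=1.0):
    """Return (per cent, residue per cent, per cent nitrogen) for one peak.

    ratio        -- peak area divided by the norleucine peak area
    color_factor -- the same ratio measured on the standard mixture
    """
    mw, n_atoms = MOLECULAR_DATA[name]
    # Norleucine is 1% of the sample weight, so ratio / color_factor is
    # the amino acid content in per cent (scaled by the dilution factor).
    pct = dilution * ratio / color_factor
    residue_pct = pct * (mw - WATER) / mw
    nitrogen_pct = pct * NITROGEN * n_atoms / mw
    return pct, residue_pct, nitrogen_pct


print(composition(1.0, 1.0, "NLE")[0])  # 1.0, the internal standard itself
```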
Main Data

As the punched tape is read in, the values are tested and stored only if their absolute values lie outside the range 1000 ± 6. The data to be stored are sorted into negative, zero, and positive values. Zeros are ignored; otherwise each value is transformed by taking logarithms and multiplying by 10000.0. Finally, each value is converted to an integer to save storage space and stored in one of two arrays, depending upon its algebraic sign. If two or more successive numbers have the same sign, all but the first are ignored. If this situation occurs more than ten times before data reading ceases, an error message is output and further calculation is aborted. Data reading is stopped in any one of three ways:

(a) When at least one of the arrays is full.
(b) When successive positive and negative values of 1000 occur and more than 100 points have been stored.
(c) When successive positive and negative values of 1000 occur and fewer than 100 points have been stored, but only provided that more than ten instances have occurred of adjacent numbers having the same sign.

After the results have been output, the next section of tape is read in. If the first value outside the range 1000 ± 6 is 9999, this indicates that no more chromatograms are to be read, and the program returns control to the monitor.

DATA MANIPULATION
See the flow diagram below. Each array is taken in turn and processed to the output stage. If the array contains seven or fewer points, the following calculations are bypassed.

1. Mispunch routine. Mispunches caused by equipment malfunction occur very infrequently, but when they do occur they are immediately obvious. Such values are replaced by the average of the values on either side.

2. Smoothing routine. The points are taken seven at a time, moving along one point each time. A curve of the form y = ax² + bx + c is fitted to the seven points, where x is chosen to vary from -3 to +3 in steps of 1. An estimate of y (y0) is made at x = 0, and it can be shown that this is given by the equation:
$$
y_0 = \frac{y_4}{3} + \frac{2}{7}\,(y_3 + y_5) + \frac{1}{7}\,(y_2 + y_6) - \frac{2}{21}\,(y_1 + y_7)
$$
The first of the seven points (y1) is replaced by y0, and the procedure is repeated with the next set of seven points.

3. Differencing routine. Differences are taken between adjacent points and stored in the difference array. The differences are converted back to floating-point numbers by dividing by 10000.0.

4. Peak start test. Successive differences are tested, and a peak is assumed to start when a zero or positive difference is followed four places later by a difference equal to or greater than 0.0009. That is,

$$
\mathrm{DF}(I) \geq 0.0 \quad \text{and} \quad \mathrm{DF}(I+4) \geq 0.0009
$$

FIG. 1. Area calculation of a single peak (peak I runs from NPS(I) to NPF(I); PEAK AREA = TOTAL AREA - AREA UNDER BASELINE).
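The transformation, smoothing, differencing, and peak start steps can be sketched in Python as follows. The smoothing weights are derived from the least-squares normal equations rather than typed in, which confirms the coefficients 1/3, 2/7, 1/7, and -2/21 in the smoothing equation. The sketch assumes base-10 logarithms (the paper says only "logarithms"), keeps smoothed values as floats rather than re-rounding them to integers, and omits the mispunch routine, because the paper does not state how a mispunch is recognized. All function names are hypothetical.

```python
import math
from fractions import Fraction

SCALE = 10000.0           # multiplier applied after taking logarithms
START_THRESHOLD = 0.0009  # DF(I + 4) must reach this for a peak start


def quadratic_centre_weights(half_width=3):
    """Weights w such that sum(w * y) is the value at x = 0 of the
    least-squares fit y = a*x**2 + b*x + c over x = -half_width..half_width."""
    xs = range(-half_width, half_width + 1)
    n = len(xs)
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    # With symmetric x the odd sums vanish, and the normal equations give
    # c = (s4 * sum(y) - s2 * sum(x**2 * y)) / (n * s4 - s2**2).
    denom = n * s4 - s2 ** 2
    return [Fraction(s4 - s2 * x ** 2, denom) for x in xs]


WEIGHTS = [float(w) for w in quadratic_centre_weights()]


def transform(reading):
    """Scaled integer logarithm of a stored reading (base 10 assumed)."""
    return int(math.log10(abs(reading)) * SCALE)


def smooth(values):
    """Window i (points i..i+6) gives the fitted centre value, which
    replaces point i; the last six points start no full window and
    are not returned."""
    return [sum(w * v for w, v in zip(WEIGHTS, values[i:i + 7]))
            for i in range(len(values) - 6)]


def differences(values):
    """Differences of adjacent points, converted back by dividing by SCALE."""
    return [(b - a) / SCALE for a, b in zip(values, values[1:])]


def find_peak_start(df, begin=0):
    """First I >= begin with DF(I) >= 0.0 and DF(I + 4) >= 0.0009, else None."""
    for i in range(begin, len(df) - 4):
        if df[i] >= 0.0 and df[i + 4] >= START_THRESHOLD:
            return i
    return None


print(quadratic_centre_weights())
# [Fraction(-2, 21), Fraction(1, 7), Fraction(2, 7), Fraction(1, 3), Fraction(2, 7), Fraction(1, 7), Fraction(-2, 21)]
```

A quadratic fit reproduces any sequence that is itself quadratic, so a run such as 0, 1, 4, 9, ... passes through the smoother unchanged (apart from the shift of three places). This makes a convenient check on the weights.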
5. Area calculation (Fig. 1). Once a peak start has been found, the peak area calculation commences. This area includes the area below the baseline and is calculated using the trapezium rule:

$$
\text{AREA} = \tfrac{1}{2}\,(\text{FIRST} + \text{LAST}) + \text{INTERMEDIATES}
$$

where INTERMEDIATES is the sum of all ordinates between the first and the last. (N.B. The horizontal increment is taken as 1.0.)
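For a single peak, the area calculation reduces to the trapezium rule with unit spacing, followed by subtraction of the trapezium under the straight baseline joining the start and end points (Fig. 1). The sketch below assumes the ordinates from the peak start NPS(I) to the peak end NPF(I) are already held in a list. The multipeak baseline estimate of Fig. 2 is not described in enough detail to reproduce and is left out. Function names are hypothetical.

```python
def trapezium_area(y):
    """Trapezium rule with a horizontal increment of 1.0:
    1/2 (first + last) + sum of the intermediate ordinates."""
    if len(y) < 2:
        return 0.0
    return 0.5 * (y[0] + y[-1]) + sum(y[1:-1])


def single_peak_area(y):
    """Total area minus the trapezium under the straight baseline that
    joins the peak start and peak end ordinates."""
    total = trapezium_area(y)
    baseline = 0.5 * (y[0] + y[-1]) * (len(y) - 1)
    return total - baseline


# Ordinates from the peak start to the peak end (illustrative values).
print(single_peak_area([1.0, 2.0, 3.0, 2.0, 1.0]))  # 4.0
```

Because the baseline term uses only the two end ordinates, a sloping baseline is handled automatically for an isolated peak. Only the multipeak case needs the extra baseline estimate described in the text.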
FIG. 2. Area calculations of peaks in a multipeak group (start and finish points NPS(I) and NPF(I) are marked for each peak I).
Area calculation continues until a peak end is encountered (see below).

6. Peak end test (Fig. 2). A peak end is distinguished by two conditions, both of which can occur only when a negative difference is followed immediately by a zero or positive difference:

$$
\mathrm{DF}(I) < 0.0 \quad \text{and} \quad \mathrm{DF}(I+1) \geq 0.0
$$

(a) Peak end distinguished by a new peak start:

$$
\mathrm{DF}(I+2) > 0.0 \quad \text{and} \quad \mathrm{DF}(I+3) \geq 0.0005
$$

(b) Peak end distinguished by a return to baseline: either

$$
\mathrm{DF}(I+2) = 0.0 \quad \text{and} \quad \mathrm{DF}(I+3) < 0.0
$$

or

$$
\mathrm{DF}(I+2) > 0.0 \quad \text{and} \quad 0.0 \leq \mathrm{DF}(I+3) < 0.0005
$$
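Conditions (a) and (b) above translate almost line for line into code. The Python sketch below classifies a candidate index I of the difference array. The function name and its return labels are hypothetical, and the bookkeeping of changeover points and baseline estimation is deliberately left out.

```python
def peak_end(df, i):
    """Classify index i of the difference array DF as a possible peak end.

    Returns "new_peak" for condition (a), "baseline" for condition (b),
    or None when neither condition holds.
    """
    if i + 3 >= len(df):
        return None  # not enough differences left to test
    # Both conditions need a negative difference followed by a non-negative one.
    if not (df[i] < 0.0 and df[i + 1] >= 0.0):
        return None
    d2, d3 = df[i + 2], df[i + 3]
    if d2 > 0.0 and d3 >= 0.0005:
        return "new_peak"  # (a): a new peak starts before baseline is reached
    if (d2 == 0.0 and d3 < 0.0) or (d2 > 0.0 and 0.0 <= d3 < 0.0005):
        return "baseline"  # (b): the trace has returned to baseline
    return None


print(peak_end([-0.001, 0.0, 0.0002, 0.0006], 0))  # new_peak
print(peak_end([-0.001, 0.0, 0.0001, 0.0002], 0))  # baseline
```

The 0.0005 threshold on DF(I + 3) is what separates the two outcomes when DF(I + 2) is positive. A small rise is treated as baseline noise, and a larger one as the start of the next peak.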
Condition (a) occurs when a new peak starts before the previous peak has returned to baseline. A note is made of the changeover point, and area summation continues until condition (b) is encountered. When a return to baseline occurs, the actual peak areas are calculated by subtracting the area under the baseline from the total area. For a single peak this merely involves calculating the area of the trapezium formed by the peak start and end points (see Fig. 1). The calculation is more difficult for a multipeak group, however: an estimate must be made of the baseline position within the group, and allowance made for a fluctuating baseline (see Fig. 2). Up to 80 individual peaks can be accommodated.

7. Peak identification. First, the array of peak identifiers is used to identify the norleucine peak; if no suitable peak can be found, the norleucine peak area is assumed to be 1.0 and an error message to this effect is output. Next, each of the peaks is tested in turn. Those whose area is below a fixed minimum are ignored. The areas of the other peaks are divided by the norleucine peak area. Each peak is identified, where possible, using the array of peak identifiers, and its norleucine ratio is divided by the standard color factor to give the percentage of the amino acid in the sample. This value is then multiplied by the appropriate factors to give the residue per cent and the per cent amino nitrogen.

OUTPUT
For each peak the following details are printed out: peak number; peak start; peak end; area; ratio (i.e., the ratio of the peak area to the norleucine peak area); peak name (where possible); per cent (of amino acid in the original sample); residue per cent; and per cent amino nitrogen. The last three parameters are given only for those peaks which have been identified. No output appears if there are seven or fewer points in the array.

In addition to the main output, certain "error" messages may appear. These are as follows. An asterisk over the "peak number" column indicates that data read-in terminated while a peak end search was still in progress. Normally this will be unimportant, but it could signify that the data storage arrays were filled before all the important peak data had been read in. If the norleucine peak cannot be identified, a message to this effect is output above the results.

LOGICAL FLOW DIAGRAM: START; READ PREDATA; INITIALISE VARIABLES; TRANSFORM AND STORE IN ARRAY A OR B; OUTPUT AN ERROR MESSAGE (where required); WORK ON ARRAY A; COUNT AND CORRECT ANY MISPUNCHES; DIFFERENCES OF ADJACENT VALUES; SEARCH FOR A PEAK START; TRY TO IDENTIFY ALL OTHER PEAKS; CALCULATE PARAMETERS FOR IDENTIFIED PEAKS.

DISCUSSION
The system described here offers a relatively cheap solution to the problem of calculating results from large numbers of column chromatographic separations. First, the capital outlay on the data acquisition system for two channels is comparatively low. Second, transferring the data on paper tape to an off-line computer, which processes 10-12 chromatograms in a single run, reduces running costs. By sampling data at a higher rate, the system could be adapted for very high speed liquid chromatography. It could also be applied to other column chromatographic separations, such as the analysis of proteins, peptides, carbohydrates, and nucleotides.

A typical computer printout for two chromatograms is shown in Fig. 3. Norleucine ratios are calculated for all the peaks detected on the chromatograms. If a peak is not one of the thirty compounds commonly encountered in the mixtures analyzed, the space in the subsequent columns is left blank. Since all the stages needed to calculate the results are linked together in one program, peak areas no longer have to be punched out separately for further calculation. In addition, the interpretation of chromatograms is made easier for technicians.

The variation in elution times of amino acids that occurs with large variations in concentration is reduced by running similar samples in succession. In practice, identifier values are checked daily and, if necessary, updated for the particular batch of samples analyzed. This can be done either by visual inspection or by taking the values directly from a computer printout. In the case of protein analysis, the correction for hydrolytic losses of amino acids has to be determined separately for each particular protein. Consequently, correction factors were not incorporated in the computer program.
COLUMN 1   MISPUNCHES 1

PEAK   PEAK   PEAK      AREA    RATIO  PEAK      PCT.  RESIDUE     PCT.
 NO.  START    END                     NAME               PCT. AMINO N2
   1    402    437    8.1641   3.7465  ASP     4.3615   3.7727   0.4579
   2    438    463    4.7805   2.1937  THR     2.3044   1.9564   0.2719
   3    464    502    6.0263   2.7655  SER     2.3436   1.9428   0.3117
   4    503    585   11.9638   5.4902  GLU     6.5830   5.7799   0.6253
   5    646    679    6.4958   2.9809  GLY     1.9899   1.5123   0.3721
   6    680    736   11.4462   5.2526  ALA     3.4924   2.7069   0.5483
   7    737    742    0.1430   0.0656
   8    746    785    0.2437   0.1118  CYS     0.2025   0.1874   0.0237
   9    805    887    4.6912   2.1528  VAL     2.0502   1.7345   0.2460
  10    888    903    0.1729   0.0793
  11    904    914    0.1449   0.0665
  12    942    943    0.1272   0.0583
  13    944    980    2.8519   1.3087  MET     1.6800   1.4767   0.1579
  14    981   1056    6.7254   3.0863  GNH2    3.6567   3.6567   0.2852
  15   1057   1071    0.3377   0.1549
  16   1072   1112    3.7700   1.7301  ILE     1.8116   1.5634   0.1938
  17   1113   1154    6.4363   2.9536  LEU     3.1288   2.7002   0.3347
  18   1155   1189    2.1791   1.0000  NLE     1.0000   1.0000   1.0000
  19   1190   1223    2.4580   1.1280  TYR     1.6183   1.4581   0.1246
  20   1224   1264    2.9899   1.3721  PHE     1.7546   1.5633   0.1491
  21   1294   1335    1.2844   0.5894  ABA     0.9662   0.7971   0.1314
  22   1336   1340    0.1468   0.0673
  23   1511   1553    2.4952   1.1453  HIS     1.4661   1.2960   0.3973
  24   1554   1579    0.2933   0.1345  ORN     0.1583   0.1368   0.0335
  25   1580   1607    6.6411   3.0476  LYS     3.2805   2.8770   0.6296
  26   1608   1659   11.7524   5.3932  NH3     5.9924   5.9924   4.9377
  27   1662   1731    2.4264   1.1134  ARG     1.4966   1.3424   0.4819
                                              -------  -------  -------
                                     TOTALS   51.3306  45.5339  11.7145

FIG. 3. Computer printout for two chromatograms.
SUMMARY

A computer program is described which enables amino acid compositions to be calculated automatically and expressed in a variety of units. Calculations have been performed on protein hydrolyzates and cell extracts. The system is relatively simple and economical and could be applied to other column chromatographic analyses.
ACKNOWLEDGMENT

The authors wish to thank the Director of the Lord Rank Research Centre for permission to publish this paper.
COLUMN 2   MISPUNCHES -NONE-

PEAK   PEAK   PEAK      AREA    RATIO  PEAK      PCT.  RESIDUE     PCT.
 NO.  START    END                     NAME               PCT. AMINO N2
   1    359    370    0.1111   0.0460  HYP     1.0975   0.9471   0.1174
   2    371    386    0.1240   0.0514  MSO2    0.0444   0.0400   0.0034
   3    397    438    7.7086   3.1982  ASP     3.7231   3.2205   0.3909
   4    439    464    4.4067   1.8282  THR     1.9204   1.6304   0.2266
   5    465    503    5.5424   2.2995  SER     1.9487   1.6155   0.2591
   6    504    577   10.8505   4.5017  GLU     5.3977   4.7392   0.5127
   7    649    679    5.8420   2.4237  GLY     1.6180   1.2296   0.3025
   8    680    738   10.9615   4.5477  ALA     3.0237   2.4129   0.4747
   9    807    893    4.2112   1.7471  VAL     1.6639   1.4077   0.1996
  10    948   1070    9.9116   4.1121  GNH2    4.8722   4.8722   0.3800
  11   1071   1113    3.6433   1.5115  ILE     1.5828   1.3659   0.1693
  12   1114   1156    6.2576   2.5962  LEU     2.7502   2.3734   0.2942
  13   1157   1191    2.4102   1.0000  NLE     1.0000   1.0000   1.0000
  14   1192   1225    2.5601   1.0621  TYR     1.5238   1.3730   0.1173
  15   1226   1274    3.2292   1.3397  PHE     1.7132   1.5265   0.1456
  16   1275   1286    0.1892   0.0784
  17   1287   1295    0.1822   0.0756
  18   1296   1343    1.4696   0.6097  ABA     0.9995   0.8246   0.1359
  19   1423   1437    0.1132   0.0469
  20   1497   1508    0.1651   0.0685
  21   1509   1552    2.6525   1.1005  HIS     1.4090   1.2456   0.3818
  22   1553   1578    0.4751   0.1971  ORN     0.2319   0.2003   0.0491
  23   1579   1605    5.9944   2.4870  LYS     2.6770   2.3478   0.5140
  24   1606   1658   14.6912   6.0952  NH3     6.7724   6.7724   5.5805
  25   1659   1707    5.8648   2.4332  ARG     3.2704   2.9336   1.0530
  26   1708   1711    0.1405   0.0583
  27   1732   1761    0.2126   0.0882
                                              -------  -------  -------
                                     TOTALS   49.2410  44.0791  12.3085
FIG. 3 (Continued).

REFERENCES

1. GRAHAM, G. N., AND SHELDRICK, B. (1965) Biochem. J. 96, 517.
2. STARBUCK, W. C., MAURITZEN, C. M., MCCLIMANS, C., AND BUSCH, H. (1967) Anal. Biochem. 20, 439.
3. OZAWA, K., AND TANAKA, S. (1968) Anal. Biochem. 24, 270.
4. GERDING, J. J. T. (1969) Int. J. Protein Res. 1, 169.
5. PORTER, W. L., AND TALLEY, E. A. (1964) Anal. Chem. 36, 1692.
6. YONDA, A., FILMER, D. L., PATE, H., ALONZO, N., AND HIRS, C. H. W. (1965) Anal. Biochem. 10, 53.
7. KRICHEVSKY, M. T., SCHWARTZ, J., AND MACE, M. (1964) Anal. Biochem. 12, 94.
8. CAVINS, J. F., AND FRIEDMAN, M. (1968) Cer. Chem. 45, 172.
9. ROBINS, A. J., EVANS, R. A., SIRIWARDENE, J. A. DE S., AND THOMAS, A. J. (1966) Biochem. J. 99, 46P.
10. EXSS, R. E., HILL, H. D., AND SUMMER, G. K. (1969) J. Chromatogr. 42, 442.
11. THOMAS, A. J. (1970) in "Automation, Mechanization and Data Handling in Microbiology" (Baillie, A., and Gilbert, R. J., eds.) (The Society for Applied Bacteriology Tech. Ser. No. 4), p. 107. Academic Press, New York.
12. DAVIES, M. G., AND WATTS, D. (1971) Lab. Practice 20, 4, 324.