ANALYTICAL BIOCHEMISTRY 51, 180-192 (1973)

A Fortran IV Computer Program for Automatic Calculation of Amino Acid Composition¹

R. TAYLOR AND M. G. DAVIES

The Lord Rank Research Centre, Lincoln Road, High Wycombe, England

Received May 19, 1972; accepted June 23, 1972

In food research there is an important requirement for the rapid determination of amino acids. Large numbers of samples for amino acid analysis are produced by studies on the optimization of animal feed constituents and the development of novel sources of protein.

Several methods of automating the data processing of amino acid analysis have been published. Graham and Sheldrick (1) described a program in KDF 9 Algol which processes the height and width of each peak obtained manually from the chart. More recently, Starbuck et al. (2), Ozawa and Tanaka (3), and Gerding (4) developed programs utilizing the data obtained by manual calculation or by an integrator.

Data processing was further automated by the use of logging equipment. Porter and Talley (5), Yonda et al. (6), and Krichevsky et al. (7) used methods whereby the voltage was digitized from the photometer circuit, and a punched paper tape was produced which could be processed by computer to produce peak areas. Cavins and Friedman (8) recorded the photometer output on magnetic tape, which was played into an electronic integrator, producing punched paper tape for processing on an IBM 1130 computer.

A relatively simple system was described by Robins et al. (9). A data logger is used to sample the voltage from a retransmitting slide wire. The output tape is processed using a program written in Elliott 803 Algol. This follows exactly the manual calculation according to the trapezium rule and gives a printout of areas in the same units.

An attempt has been made by Exss et al. (10) to identify as well as quantitatively determine amino acids from the data provided by an integrator attached to a standard Technicon analyzer using a 22 hr run.

The program presented here is written in Fortran IV for use with an IBM 1130 computer. The output produced gives the peak areas, identifies up to thirty amino acids, and prints out per cent composition of the sample. Intervention by the operator during the calculation and the punching of cards prior to computer processing have been reduced to a minimum.

¹ A complete listing of the program in Fortran IV can be made available on request.

Copyright © 1973 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Chromatography System

Amino acids are separated on a single column, 40 × 0.6 cm, of Zeo-Karb 225 spherical resin, 8% cross-linked. The resin is fractionated to 10-12 µ using an elutriation device. The column is thermostated at 60°C and the buffer flow rate is 0.4 ml/min, giving a pressure of 350 psi.

Gradient elution is controlled by a programmer described by Thomas (11). The programmer converts an analog record in the form of a black area on a rotating drum into the mixing of two sodium citrate buffer solutions, one of pH 2.20 and 0.2 M Na+, the other of pH 12.25 and 0.4 M Na+. The drum is scanned by a photoconductive cell on a moving head. Other photoconductive cells scan light tracks which control the reversal and resetting of the scanner, the addition of methanol to the buffers, the operation of an automatic sample loader, and the replacement of ninhydrin with a wash of 50% methyl Cellosolve to clean the analytical system between each analysis.

Two columns are operated simultaneously and the effluent is analyzed with a ninhydrin/hydrazine sulfate reagent, using a Technicon AutoAnalyzer. Samples are loaded in 0.1 N hydrochloric acid on the low pressure side of the pump. This gives a tight application band, and better resolution of aspartic acid, threonine, and serine is obtained than when pH 2.20 sodium citrate buffer is used. The stock acidic buffer solutions are passed through a column of Zeo-Karb 225 ion-exchange resin prior to use.

Data Acquisition System

The voltage from a chart recorder modified as described by Davies and Watts (12) is sampled by a digital voltmeter (Solartron Electronic Group type LM 1420.2). Pulses are generated at a rate proportional to the input voltage, and the numbers generated during a specific period are registered on a counter. This information is converted into binary coded decimal form and punched out on paper tape. Alternate readings are taken from two traces on the recorder corresponding to the two analyses being performed. One trace generates positive integers and the other trace negative integers. Successive readings are separated by an "end-of-record" symbol.

Since one analysis cycle takes 3.5 hr, ten chromatograms are produced overnight. At the completion of a chromatogram, the wash period causes a drop in the baselines on the recorder, resulting in values of ±1000 being punched on the data tape. This is a signal to the computer program that data input should be halted and the stored data processed; thus the overnight production can be processed in one run on the computer.
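As a rough illustration of the tape layout just described, the following Python sketch separates an interleaved stream of readings into the two traces by sign and stops at the wash signal. The function name `split_traces`, the list-of-integers input, the storing of negative readings as magnitudes, and the simplified stopping rule (two successive readings of magnitude 1000) are all assumptions made for the example; they are not taken from the Fortran program.

```python
WASH_LEVEL = 1000  # magnitude punched during the wash period (from the text)


def split_traces(readings):
    """Separate interleaved readings into two traces by algebraic sign.

    Positive integers belong to one trace and negative integers to the other.
    Negative readings are stored as magnitudes (an assumption).  Reading stops
    when two successive readings both have the wash magnitude, which is a
    simplified stand-in for the end-of-chromatogram signal.
    """
    positive, negative = [], []
    previous = None
    for value in readings:
        if (previous is not None
                and abs(value) == WASH_LEVEL
                and abs(previous) == WASH_LEVEL):
            break  # wash signal seen on both traces
        if abs(value) != WASH_LEVEL:
            if value > 0:
                positive.append(value)
            elif value < 0:
                negative.append(-value)
            # zeros fall through and are not stored
        previous = value
    return positive, negative
```

Called on a short made-up tape such as `[120, -95, 130, -97, 1000, -1000, 55, -60]`, the function returns the two traces read before the wash signal and ignores everything after it.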

Sample Preparation

For proteins, 100 mg of sample ground to 70 mesh is refluxed under nitrogen with 100 ml redistilled 6 N hydrochloric acid. The hydrolyzate is filtered, 10 µg of norleucine per milligram of sample is added, and hydrochloric acid is removed using a rotary evaporator. The residue is dissolved in 0.1 N hydrochloric acid and analyzed.

For the determination of free amino acids in cell extracts and fermenter filtrates, interfering materials are removed prior to analysis by precipitation, gel filtration, ultrafiltration, or ion exchange. A preliminary analysis without norleucine is performed to establish whether an interfering peak occurs at that position on the chromatogram.

DESCRIPTION OF THE PROGRAM

The object of the program is to recognize the occurrence of peaks, calculate each peak area, and identify each peak as far as possible. The concentration of internal standard (norleucine) used in an analysis is related to the amount of sample analyzed in such a way that the program, having identified a peak, can continue to calculate the percentage, residue percentage, and percentage nitrogen for each amino acid.

DATA INPUT

Predata

The predata are punched on 9 cards. Card 1 contains a dilution factor. Cards 2-5 contain peak identifiers for each of the thirty amino acids in each analysis. These identifiers are obtained from the chromatogram, counting from the end of the wash period so that they are related to elution times. Cards 6-9 contain the standard color factors for each amino acid. These are the ratios of the peak areas to the area of the internal standard (the norleucine peak) for a known mixture containing 10 µg/ml of each amino acid. All predata are checked before each group of chromatograms is processed and are amended accordingly.
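The role of the standard color factors can be illustrated with a short Python sketch. It assumes that the standard mixture contains norleucine at the same 10 µg/ml as the other amino acids and that the sample carries 10 µg of norleucine per milligram (1 per cent by weight), in which case the sample ratio divided by the color factor is the weight per cent directly. The function names, the dictionary input, and the two conversion factors (molar mass less one water for the residue, and nitrogen atoms times 14.007 over the molar mass) are assumptions for illustration; the paper does not list the factors it uses, and the dilution factor on Card 1 is ignored here.

```python
WATER = 18.015     # g/mol, lost when an amino acid forms a peptide residue
NITROGEN = 14.007  # g/mol


def color_factors(standard_areas, internal="NLE"):
    """Ratio of each standard peak area to the internal-standard peak area."""
    reference = standard_areas[internal]
    return {name: area / reference for name, area in standard_areas.items()}


def composition(ratio, factor, molar_mass, n_nitrogen=1):
    """Convert a sample norleucine ratio into three per cent figures.

    ratio      -- sample peak area divided by the norleucine peak area
    factor     -- standard color factor for the same amino acid
    molar_mass -- molar mass of the free amino acid (assumed input)
    n_nitrogen -- nitrogen atoms per molecule (assumed input)
    """
    percent = ratio / factor
    residue_percent = percent * (molar_mass - WATER) / molar_mass
    nitrogen_percent = percent * n_nitrogen * NITROGEN / molar_mass
    return percent, residue_percent, nitrogen_percent
```

For example, with made-up standard areas `{"ASP": 4.0, "NLE": 5.0}` the color factor for aspartic acid is 0.8, and a sample ratio of 1.6 then corresponds to 2 per cent aspartic acid by weight.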

Main Data

As the punched tape is read in, the values are tested, and stored only if the absolute values are outside the range 1000 ± 6. The data to be stored are sorted into negative, zero, and positive values. Zeros are ignored; otherwise each value is transformed by taking logarithms and multiplying by 1000.0. Finally, each value is converted to integer to save storage space and stored in one of two arrays depending upon its algebraic sign. If two or more successive numbers have the same sign, all but the first are ignored. If this situation occurs more than ten times before cessation of data reading, an error message is output and further calculation aborted.

Data reading is stopped in any one of three ways:

(a) When at least one of the arrays is full.
(b) When successive positive and negative values of 1000 occur and more than 100 points have been stored.
(c) When successive positive and negative values of 1000 occur and less than 100 points have been stored, but only provided more than ten instances have occurred of adjacent numbers having the same sign.

After the output the next section of tape is read in. If the first value outside the range 1000 ± 6 is 9999, this is the indication that no more chromatograms are to be read. The program will return control to monitor.

DATA MANIPULATION

See the flow diagram below. Each array is taken in turn and processed to the output stage. If the array contains seven or less points the following calculations are by-passed:

1. Mispunch routine. Mispunches caused by equipment malfunction occur very infrequently but when they do occur are immediately obvious. Such values are replaced by the average of the values on either side.

2. Smoothing routine. The points are taken seven at a time, moving along one point each time. A curve of the form y = ax² + bx + c is fitted to the seven points, where x is chosen to vary from -3 to +3 in steps of 1. An estimate of y (y0) is made at x = 0 and it can be shown that this is given by the equation:

$$y_0 = \frac{y_4}{3} + \frac{2}{7}(y_3 + y_5) + \frac{1}{7}(y_2 + y_6) - \frac{2}{21}(y_1 + y_7)$$

The first of the seven points (y1) is replaced by y0 and the procedure repeated with the next set of seven points.

3. Differencing routine. Differences are taken between adjacent points and stored in the difference array. The differences are converted back to floating point numbers by dividing by 1000.0.

[FIG. 1. Area calculation of a single peak. Labels: NPS(1), NPF(1); PEAK AREA = TOTAL PEAK AREA - AREA UNDER BASELINE.]

4. Peak start test. Successive differences are tested and a peak is assumed to start when a zero or positive difference is followed four places later by a difference equal to or greater than 0.0009. That is,

$$\mathrm{DF}(I) \ge 0.0 \quad \text{and} \quad \mathrm{DF}(I + 4) \ge 0.0009$$
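The smoothing, differencing, and peak start steps above can be sketched in Python as follows. This is a minimal reading of the text, not the authors' Fortran: the function names are hypothetical, the smoothed values are kept as floats rather than being stored back as integers, and the mispunch routine is left out because the text does not give its detection rule.

```python
def smooth(points):
    """Seven-point quadratic smoothing as read from the text.

    Each window's estimate at x = 0 overwrites the FIRST point of the
    window, so the last six points are left unsmoothed.
    """
    y = [float(p) for p in points]
    for i in range(len(y) - 6):
        y1, y2, y3, y4, y5, y6, y7 = y[i:i + 7]
        y[i] = y4 / 3 + 2 * (y3 + y5) / 7 + (y2 + y6) / 7 - 2 * (y1 + y7) / 21
    return y


def differences(stored):
    """Adjacent differences of the stored values, rescaled by 1000.0."""
    return [(b - a) / 1000.0 for a, b in zip(stored, stored[1:])]


def find_peak_start(df, begin=0):
    """Return the first I >= begin with DF(I) >= 0.0 and DF(I+4) >= 0.0009."""
    for i in range(begin, len(df) - 4):
        if df[i] >= 0.0 and df[i + 4] >= 0.0009:
            return i
    return None  # no peak start found
```

Because each smoothed value overwrites the first point of its window, the window starting at index i never reads a value that has already been replaced, so the pass can be done in place without a second array. A useful sanity check is that data lying exactly on a parabola pass through `smooth` unchanged apart from the three-point shift.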

5. Area calculation (Fig. 1). Once a peak start has been found the peak area calculation commences. This area includes the area below the baseline and is calculated using the trapezium rule:

$$\mathrm{AREA} = \tfrac{1}{2}\,(\mathrm{FIRST} + \mathrm{LAST}) + \mathrm{INTERMEDIATES}$$

(N.B. The horizontal increment is taken as 1.0.)

Area calculation continues until a peak end is encountered (see below).

[FIG. 2. Area calculations of peaks in a multipeak group. Labels: peak start points NPS and peak finish points NPF for each peak in the group.]

6. Peak end test (Fig. 2). A peak end is distinguished by two conditions, both of which can occur only when a negative difference is followed immediately by a zero or positive difference:

$$\mathrm{DF}(I) < 0.0 \quad \text{and} \quad \mathrm{DF}(I + 1) \ge 0.0$$

(a) Peak end distinguished by new peak start:

$$\mathrm{DF}(I + 2) > 0.0 \quad \text{and} \quad \mathrm{DF}(I + 3) \ge 0.0005$$

(b) Peak end distinguished by return to baseline. Either

$$\mathrm{DF}(I + 2) = 0.0 \quad \text{and} \quad \mathrm{DF}(I + 3) < 0.0$$

or

$$\mathrm{DF}(I + 2) > 0.0 \quad \text{and} \quad 0.0 \le \mathrm{DF}(I + 3) < 0.0005$$
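The trapezium rule and the two peak end conditions stated above can be written out as a small Python sketch. The function names and the string labels returned are hypothetical, and the inequalities are copied as they are printed in the conditions, so any case the printed conditions leave uncovered simply returns `None`.

```python
def trapezium_area(y, start, end):
    """Total area from start to end inclusive, with unit horizontal increment.

    AREA = 1/2 (FIRST + LAST) + INTERMEDIATES.  The area below the baseline
    is still included at this stage.
    """
    if end <= start:
        return 0.0
    return 0.5 * (y[start] + y[end]) + sum(y[start + 1:end])


def classify_peak_end(df, i):
    """Apply the peak end test at index i of the difference array.

    Returns "new_peak" for condition (a), "baseline" for condition (b),
    or None when no peak end is recognised at this index.
    """
    if i + 3 >= len(df):
        return None  # not enough differences left to test
    if not (df[i] < 0.0 and df[i + 1] >= 0.0):
        return None  # precondition: negative then zero-or-positive
    if df[i + 2] > 0.0 and df[i + 3] >= 0.0005:
        return "new_peak"
    if df[i + 2] == 0.0 and df[i + 3] < 0.0:
        return "baseline"
    if df[i + 2] > 0.0 and 0.0 <= df[i + 3] < 0.0005:
        return "baseline"
    return None
```

In a full scan one would keep summing the trapezium area while stepping `i` forward, record the index whenever `"new_peak"` is returned, and stop at the first `"baseline"`.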

Condition (a) occurs when a new peak starts before the previous peak has returned to baseline. A note is made of the changeover point and area summation continues until condition (b) is encountered. When a return to baseline occurs, the actual peak areas are calculated by subtracting the area under the baseline from the total area. In the case of a single peak this merely involves calculating the area of the trapezium formed by the peak start and end points (see Fig. 1). However, the calculation is more difficult for a multipeak group. An estimate must be made of the baseline position within the group and allowance made for a fluctuating baseline (see Fig. 2). Up to 80 individual peaks can be accommodated.

7. Peak identification. First, the array of peak identifiers is used to identify the norleucine peak; if no suitable peak can be found, the norleucine peak area is assumed to be 1.0 and an error message to this effect is output. Next, each of the peaks is tested in turn. Those whose area falls below a preset minimum are ignored. The areas of the other peaks are then divided by the norleucine peak area. Each peak is identified, where possible, using the array of peak identifiers, and the norleucine ratio is divided by the standard color factor to give the percentage of the amino acid in the sample. The value is then multiplied by the appropriate factor to give the residue per cent and per cent amino nitrogen.

OUTPUT

For each peak the following details are printed out: peak number; peak start; peak end; area; ratio (i.e., ratio of peak area to norleucine peak area); peak name (where possible); per cent (of amino acid in original sample); residue per cent; per cent amino nitrogen. The information for the last three parameters is given only for those peaks which have been identified. No output appears if there are seven or less points in the array.

In addition to the main output, certain "error" messages may appear. These are as follows:

An asterisk over the "peak number" column indicates that the data read-in terminated while a peak end search was still in progress. Normally this will be unimportant but could signify that the data storage arrays were filled before all the important peak data were read in.

If the norleucine peak cannot be identified, a message to this effect is output over the result.

[LOGICAL FLOW DIAGRAM. Boxes include: START; READ PREDATA; INITIALISE VARIABLES; TRANSFORM AND STORE IT IN ARRAY A; OUTPUT AN ERROR MESSAGE; WORK ON ARRAY A; COUNT AND CORRECT ANY MISPUNCHES; DIFFERENCES OF ADJACENT VALUES; SEARCH FOR A PEAK START; TRY TO IDENTIFY ALL OTHER PEAKS; CALCULATE OTHER PARAMETERS FOR IDENTIFIED PEAKS.]

DISCUSSION

The system described here offers a relatively cheap solution to the problem of calculation of results from large numbers of column chromatographic separations. First, the capital outlay on the data acquisition system for two channels is comparatively low. Second, the use of paper tape translation to an off-line computer for the processing of 10-12 chromatograms in one period reduces running costs. By sampling data at a higher rate the system could be adapted for very high speed liquid chromatography. Also, it could be applied to other column chromatographic separations such as the analysis of proteins, peptides, carbohydrates, and nucleotides.

A typical computer printout for two chromatograms is shown in Fig. 3. Calculation of norleucine ratios is carried out for all the peaks detected on the chromatograms. If the peak is not one of the thirty compounds commonly encountered in the mixtures analyzed, the space in subsequent columns is left blank.

Since all the stages necessary for the calculation of results are linked together in one program, the necessity for punching out peak areas separately for further calculation is eliminated. In addition, the interpretation of chromatograms is made easier for technicians.

The variation of elution times of amino acids which occurs with large variations in concentration is reduced by running similar samples successively. In practice, identifier values are checked daily and, if necessary, up-dated for the particular batch of samples analyzed. This can be done either by visual inspection or by taking the values directly from a computer printout.

In the case of protein analysis, correction for hydrolytic losses of amino acids has to be determined for each particular protein separately. Consequently, correction factors were not incorporated in the computer program.

COLUMN 1     MISPUNCHES 1

PEAK  PEAK   PEAK                     PEAK            RESIDUE      PCT.
NO.  START    END      AREA    RATIO  NAME     PCT.      PCT.  AMINO N2

  1    402    437    8.1641   3.7465  ASP    4.3615    3.7727    0.4579
  2    438    463    4.7805   2.1937  THR    2.3044    1.9564    0.2719
  3    464    502    6.0263   2.7655  SER    2.3436    1.9428    0.3117
  4    503    585   11.9638   5.4902  GLU    6.5830    5.7799    0.6253
  5    646    679    6.4958   2.9809  GLY    1.9899    1.5123    0.3721
  6    680    736   11.4462   5.2526  ALA    3.4924    2.7069    0.5483
  7    737    742    0.1430   0.0556
  8    746    785    0.2437   0.1118  CYS    0.2025    0.1874    0.0237
  9    805    887    4.6912   2.1528  VAL    2.0502    1.7345    0.2460
 10    888    903    0.1729   0.0793
 11    904    914    0.1449   0.0665
 12    942    943    0.1272   0.0583
 13    944    980    2.8519   1.3087  MET    1.6800    1.4767    0.1579
 14    981   1056    6.7254   3.0863  GNH2   3.6567    3.6567    0.2852
 15   1057   1071    0.3377   0.1549
 16   1072   1112    3.7700   1.7301  ILE    1.8116    1.5634    0.1938
 17   1113   1154    6.4363   2.9536  LEU    3.1288    2.7002    0.3347
 18   1155   1189    2.1791   1.0000  NLE    1.0000    1.0000    1.0000
 19   1190   1223    2.4580   1.1280  TYR    1.6183    1.4581    0.1246
 20   1224   1264    2.9899   1.3721  PHE    1.7546    1.5633    0.1491
 21   1294   1335    1.2844   0.5894  ABA    0.9662    0.7971    0.1314
 22   1336   1340    0.1468   0.0673
 23   1511   1553    2.4952   1.1453  HIS    1.4661    1.2960    0.3973
 24   1554   1579    0.2933   0.1345  ORN    0.1583    0.1368    0.0335
 25   1580   1607    6.6411   3.0476  LYS    3.2805    2.8770    0.6296
 26   1608   1659   11.7524   5.3932  NH3    5.9924    5.9924    4.9377
 27   1662   1731    2.4264   1.1134  ARG    1.4966    1.3424    0.4819
                                           --------  --------  --------
TOTALS                                      51.3306   45.5339   11.7145

FIG. 3. Computer printout for two chromatograms.

SUMMARY

A computer program is described which enables amino acid compositions to be calculated automatically and expressed in a variety of units. Calculations have been performed on protein hydrolyzates and cell extracts. The system is relatively simple and economical and could be applied to other column chromatographic analyses.

ACKNOWLEDGMENT

The authors wish to thank the Director of the Lord Rank Research Centre for permission to publish this paper.

COLUMN 2     MISPUNCHES -NONE-

PEAK  PEAK   PEAK                     PEAK            RESIDUE      PCT.
NO.  START    END      AREA    RATIO  NAME     PCT.      PCT.  AMINO N2

  1    359    370    0.1111   0.0460  HYP    1.0975    0.9471    0.1174
  2    371    386    0.1240   0.0514  MSO2   0.0444    0.0400    0.0034
  3    397    438    7.7086   3.1982  ASP    3.7231    3.2205    0.3909
  4    439    464    4.4067   1.8282  THR    1.9204    1.6304    0.2266
  5    465    503    5.5424   2.2995  SER    1.9487    1.6155    0.2591
  6    504    577   10.8505   4.5017  GLU    5.3977    4.7392    0.5127
  7    649    679    5.8420   2.4237  GLY    1.6180    1.2296    0.3025
  8    680    738   10.9615   4.5477  ALA    3.0237    2.4129    0.4747
  9    807    893    4.2112   1.7471  VAL    1.6639    1.4077    0.1996
 10    948   1070    9.9116   4.1121  GNH2   4.8722    4.8722    0.3800
 11   1071   1113    3.6433   1.5115  ILE    1.5628    1.3659    0.1693
 12   1114   1156    6.2576   2.5962  LEU    2.7502    2.3734    0.2942
 13   1157   1191    2.4102   1.0000  NLE    1.0000    1.0000    1.0000
 14   1192   1225    2.5601   1.0621  TYR    1.5238    1.3730    0.1173
 15   1226   1274    3.2292   1.3397  PHE    1.7132    1.5265    0.1456
 16   1275   1286    0.1892   0.0784
 17   1287   1295    0.1822   0.0756
 18   1296   1343    1.4696   0.6097  ABA    0.9995    0.8246    0.1359
 19   1423   1437    0.1132   0.0469
 20   1497   1508    0.1651   0.0685
 21   1509   1552    2.6525   1.1005  HIS    1.4090    1.2456    0.3818
 22   1553   1578    0.4751   0.1971  ORN    0.2319    0.2003    0.0491
 23   1579   1605    5.9944   2.4870  LYS    2.6770    2.3478    0.5140
 24   1606   1658   14.6912   6.0952  NH3    6.7724    6.7724    5.5805
 25   1659   1707    5.8648   2.4332  ARG    3.2704    2.9336    1.0530
 26   1708   1711    0.1405   0.0583
 27   1732   1761    0.2126   0.0882
                                           --------  --------  --------
TOTALS                                      49.2410   44.0791   12.3085

FIG. 3 (Continued)

REFERENCES

1. GRAHAM, G. N., AND SHELDRICK, B. (1965) Biochem. J. 96, 517.
2. STARBUCK, W. C., MAURITZEN, C. M., MCCLIMANS, C., AND BUSCH, H. (1967) Anal. Biochem. 20, 439.
3. OZAWA, K., AND TANAKA, S. (1968) Anal. Biochem. 24, 270.
4. GERDING, J. J. T. (1969) Int. J. Protein Res. 1, 169.
5. PORTER, W. L., AND TALLEY, E. A. (1964) Anal. Chem. 36, 1692.
6. YONDA, A., FILMER, D. L., PATE, H., ALONZO, N., AND HIRS, C. H. W. (1965) Anal. Biochem. 10, 53.
7. KRICHEVSKY, M. T., SCHWARTZ, J., AND MACE, M. (1964) Anal. Biochem. 12, 94.
8. CAVINS, J. F., AND FRIEDMAN, M. (1968) Cer. Chem. 45, 172.
9. ROBINS, A. J., EVANS, R. A., SIRIWARDENE, J. A. DE S., AND THOMAS, A. J. (1966) Biochem. J. 99, 46P.
10. EXSS, R. E., HILL, H. D., AND SUMMER, G. K. (1969) J. Chromatogr. 42, 442.
11. THOMAS, A. J. (1970) in "Automation, Mechanization and Data Handling in Microbiology" (Baillie, A., and Gilbert, R. J., eds.), (The Society for Applied Bacteriology Tech. Ser. No. 4), p. 107. Academic Press, New York.
12. DAVIES, M. G., AND WATTS, D. (1971) Lab. Practice 20, 4, 324.