Comput. Biol. Med. Vol. 19, No. 6, pp. 453-459, 1989. Printed in Great Britain
0010-4825/89 $3.00 + .00 © 1989 Pergamon Press plc
AAQUANT: A COMPUTER PROGRAM FOR QUANTITATIVE AMINO ACID ANALYSIS OF PROTEINS AND PEPTIDES
RICHARD P. MILLER* and ROBERT A. FARLEY
Department of Physiology and Biophysics, University of Southern California School of Medicine, Los Angeles, CA 90033, U.S.A.
(Received 5 December 1988; in revised form 14 March 1989; received for publication 11 April 1989)
Abstract-Quantitative amino acid analysis is an important tool used in the characterization and structural determination of peptides and proteins. A new computer program, AAQUANT, has been developed specifically to aid researchers in analyzing amino acid composition data. AAQUANT calculates amino acid recoveries, including 95% confidence intervals, following acid hydrolysis of peptides and proteins, and also includes useful routines to locate regions of a specified amino acid composition in known protein sequences, compute amino acid composition reports of known protein sequences, generate proteolytic digestion maps of proteins, and create and edit protein sequence data files. This report describes the AAQUANT routines, and demonstrates the use of the program.
Amino acid analysis    Data analysis    Computer programs    Protein characterization
1. INTRODUCTION
The amino acid composition and the linear sequence of amino acids in a protein molecule are the primary determinants of its physical characteristics and thus its biological function. Consequently, amino acid analysis is an essential analytical tool used in the characterization and structural elucidation of peptides and proteins.
Amino acid analysis is usually performed by first hydrolyzing a protein sample in 6 N HCl for 24-70 h, derivatizing the amino acids to allow for sensitive detection, separating the derivatized hydrolysis mixture by HPLC (high-performance liquid chromatography) or ion exchange chromatography, and, depending on the method of chemical derivatization, detecting the presence of each amino acid by its U.V. absorbance or fluorescence at characteristic elution times in the chromatogram. The amount of each amino acid recovered from the hydrolysate is determined by measuring the U.V. or fluorescence signal of that amino acid and comparing this value to a standard curve of quantity vs peak area for the respective amino acid. Advances in separation methods and instrumentation now permit accurate and reliable quantitation in the low pmol range [1,2].
Amino acid analysis is often used to quantitate the amount or purity of a protein [3,4], to determine the stoichiometry of specific labeling of a protein, and to identify an unknown protein or peptide by its amino acid composition [5,6]. The results of amino acid analyses are routinely reported in the biochemical literature, although estimates of error in measurements are rarely included.
This report describes a computer program, AAQUANT, specifically developed to assist the researcher performing quantitative amino acid analysis of proteins and peptides. AAQUANT includes routines to create and update standards and hydrolysis data files and performs analyses of amino acid recovery data including relative recovery ratios and estimates of error in measurements.
In addition, AAQUANT can create and edit protein sequence data files, generate amino acid composition reports of known protein sequences, search a known protein sequence for a region of a specified amino acid composition, and construct proteolytic digestion maps of protein sequences. AAQUANT should prove to be a useful computer program for protein chemists and other researchers performing amino acid analysis of proteins.

* To whom correspondence should be addressed.

2. PROGRAM DESCRIPTION AND SYSTEM REQUIREMENTS
AAQUANT was written in Turbo Pascal (Borland International) for use on IBM PC compatible machines using DOS 2.1 or greater. AAQUANT requires approximately 250 kilobytes of RAM, and can support various graphics hardware including Hercules, IBM CGA, EGA or 3270, and AT&T 6300. Graphics hardware is required only to display and print standard curve plots. An Epson (RX, MX or FX series) or compatible printer is required to print graphics video screens. From the main menu of the program three pull-down menus may be selected, and individual routines are selected and executed from these menus. A brief description of the available routines follows.
2.1. Standards data, hydrolysis data and data analysis
2.1.1. Standards data. Data to construct standard curves (amino acid quantity vs peak area) for 18 amino acids (those surviving acid hydrolysis) and one internal standard are entered and saved to a file. A linear regression of the data for each amino acid is performed, and if graphics hardware is present, a plot of amino acid quantity vs peak area may be displayed and printed. Standards files may be reviewed and updated as needed.
2.1.2. Hydrolysis data. Data from acid hydrolysis experiments (integrated peak areas for each amino acid) are entered and saved to a file. The data may be reviewed on screen or printed.
2.1.3. Data analysis. A value for recovery of each amino acid and the 95% confidence interval for this value are calculated by inverse prediction from linear regression analysis [7] of the standards and hydrolysis data. Additionally, relative recovery ratios including ratio error ranges for each amino acid can be determined by interactively selecting any amino acid as the reference. Results of this analysis are displayed and may be printed.
2.2. Protein sequence file editor
AAQUANT is supplied with a sequence/text editor, PROEDIT, in order to create and edit protein sequence files. Sequence files may contain an unlimited number of comment lines followed by the protein name and then the sequence of one-letter amino acid abbreviations. The file is in ASCII format similar to the GenBank "tape files" [8] and the Molecular Biology Computer Research Resource (MBCRR) data files [9]. The format of the data files is simple to create and edit with PROEDIT. The comment section is edited with a small but powerful WordStar(R)-like text editor. The protein sequence is edited separately from the comments. The cursor is freely movable throughout the sequence, and the position of the cursor within the sequence and the sequence length are displayed and updated with each keystroke. Function keys control insertion, deletion, and file merge operations.
2.3. Protein sequence analysis
Three useful routines for protein sequence analysis have been included in AAQUANT.
2.3.1. AACTR. The AACTR routine calculates the number of occurrences and percent of occurrences of each amino acid in a protein, and constructs an amino acid usage table for the protein. Relative ratios of each amino acid can be interactively determined. AACTR also calculates the mol. wt of the protein sequence, and will identify potential N-glycosylation sites [10]. Analysis of the protein may be constrained to a certain region if desired. Results of the analysis may be printed, including a matrix list of all amino acid relative ratios.
2.3.2. FINDCOMP. The FINDCOMP routine searches a selected protein sequence for a user-specified amino acid composition. The principle of the algorithm is to search all windows of length = (sum of occurrences of all amino acids in the specified composition) along the length of the protein. A match score for each window is calculated based on the similarity between the specified and the protein window compositions, and a match is declared when this score reaches a user-defined reject value. For example, if the results of amino acid analysis of a peptide from a known protein indicated the presence of 2 Gly, 1 His, 1 Thr, 1 Leu and 1 Lys, these data would be entered in the FINDCOMP routine and used to search all windows of 6 amino acids along the protein sequence. A score for each window is then calculated based on the number of occurrences of amino acids in the specified composition. A window of the protein consisting of the sequence Gly-Leu-His-Thr-Thr-Lys would be assigned a score of 5 points (5 of 6 residues, or 83%), and would be declared a potential match if the user chose to display all matches of 80% or greater. This approach to identifying a specified composition in a protein sequence is only approximate, and the probability of correctly identifying a matching composition depends on the accuracy of the hydrolysate composition determination, and increases with the number of amino acids in the specified composition. Prior to the search, amino acids Asp and Asn are converted to Asx, and Gln and Glu are converted to Glx, since acid hydrolysis does not distinguish amidation states. Identified match regions may be displayed and printed.
2.3.3. PDIGEST. Specific proteolytic sites of a protein may be determined and a map of these sites generated with the PDIGEST routine. A data file of proteolytic enzymes and chemical reagents is supplied with AAQUANT, and this file may be updated or edited to create a personal file of reagents. The user may select up to three enzymes for use in generating one proteolytic digest map of a protein.
The resulting map is displayed on the screen and may be printed or saved as an ASCII disk file for later incorporation into a document.
Although all AAQUANT routines are simple to use and are self-explanatory, help screens are provided with most routines to facilitate use of the program.
3. PROGRAM APPLICATION
To demonstrate the use of AAQUANT, the following experiment was performed.
3.1. Methods
The beta subunit polypeptide of the enzyme (Na+ + K+)-ATPase was isolated, purified and digested with TPCK-treated trypsin as previously described [11]. Approximately 100 µg of the trypsinized material was loaded onto a C18 HPLC column and the beta subunit peptides were separated as previously described [12]. Peptides were identified by U.V. absorbance at 214 nm and 30 s fractions along the chromatogram were collected. One well-separated peak (designated P10) eluting approximately 60 min after the start of the solvent gradient was selected for analysis in this experiment. The peptide, with 2.0 nmol alpha-2-aminobutyric acid included as an internal standard [3], was hydrolyzed in 200 µl of 6 N HCl at 110°C for 24 h in an evacuated hydrolysis tube (Pierce Chemical). The hydrolysate was dried in a SpeedVac centrifuge (Savant Instruments) for 2 h under vacuum. The amino acids in the hydrolysate were reacted with phenylisothiocyanate (Pierce Chemical) to convert the amino acids to their phenylthiocarbamyl (PTC) derivatives using the method described in [2]. After a 15 min reaction, the sample was dried for 2 h in a SpeedVac centrifuge, dissolved in 30 µl of 50% ethanol, and dried again for 30 min. The sample was then dissolved in 100 µl of Buffer A (described in [2]). The final concentration of alpha-2-aminobutyric acid in this sample was approximately 20 pmol/µl. HPLC separation of the PTC-amino acids in the sample was carried out on a 0.45 mm x 10 cm C18 ODS-2 column (Thomson Instruments) according to the method developed by Ebert [2]. Twenty per cent of the sample (20 µl, approximately 400 pmol of internal standard) was injected onto the column. The U.V. absorbance at 254 nm was measured using a Waters Associates U.V. detector and peak areas were integrated on a Shimadzu integrator. Amino acid recoveries were calculated using AAQUANT.
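The internal-standard bookkeeping in the Methods can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and are not part of AAQUANT, and the numbers are the ones quoted in the Methods (2.0 nmol of internal standard, 100 µl final volume, 20 µl injected).

```python
def concentration_pmol_per_ul(amount_nmol, volume_ul):
    """Concentration in pmol/ul of amount_nmol dissolved in volume_ul."""
    return amount_nmol * 1000.0 / volume_ul


def injected_pmol(conc_pmol_per_ul, injection_ul):
    """Amount in pmol loaded onto the column for a given injection volume."""
    return conc_pmol_per_ul * injection_ul


# Values quoted in the Methods (hypothetical variable names).
istd_conc = concentration_pmol_per_ul(2.0, 100.0)  # internal standard, pmol/ul
istd_injected = injected_pmol(istd_conc, 20.0)     # pmol of internal standard injected
fraction_injected = 20.0 / 100.0                   # share of the sample analyzed

print(istd_conc, istd_injected, fraction_injected)
```

Because only one fifth of the sample is injected, any recovery measured on the column has to be multiplied by five to estimate the amount present in the whole hydrolysate.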
4. RESULTS
4.1. Quantitation
The quantitated recoveries of amino acids following acid hydrolysis of peptide P10 are presented in Fig. 1. Recovery of the internal standard was 100% (406.5 ± 17.8 pmol recovered, estimated 400 pmol injected onto HPLC), indicating that none of the sample was lost due to handling. Based on the recovery of amino acids in 20% of the hydrolysis sample, it can be estimated that approximately 600-750 pmol of peptide P10 were recovered from the HPLC separation and were subjected to acid hydrolysis.
4.2. Location of peptide P10 in the beta subunit sequence
Because the amino acid recovery from acid hydrolysis is not 100%, the composition of a peptide determined after amino acid analysis will not usually match exactly with the composition of a peptide deduced from the protein sequence. The FINDCOMP routine searches the sequence for the location of the peptide that most closely matches the experimental data. The amino acid composition of peptide P10 was used in the FINDCOMP routine to search for a region of similar composition in the (Na+ + K+)-ATPase beta subunit. Selecting a minimum match value of 75% resulted in the identification of three matches, all corresponding to the region of the protein sequence from residue 220 to 242 (Fig. 2). A trypsin digestion map of the beta subunit was created with the PDIGEST routine (Fig. 3), and from this map a tryptic peptide with an amino acid composition similar to peptide P10 was identified at residues 223-247. Based on the similarity of

A A Q U A N T (V.1.0)
DATA ANALYSIS
Amino Acid Composition Analysis
==================================================
SAMPLE FILE NAME:            B:PEPTIDE10.DAT
SAMPLE DESCRIPTION:          beta subunit tryptic peptide
Number of repeated measures: 1
STANDARDS FILE NAME:         B:102088.STD
STANDARDS DESCRIPTION:       10/20/88 62.5 to 750 pmol
Ratio based on:              FREQUENCY

AA         Amount   +/- 95% CI   Ratio   95% CI
-----      ------   ----------   -----   ----------
Ala  A        0.0       0.0       0.0    (0.0, 0.0)
Arg  R        0.0       0.0       0.0    (0.0, 0.0)
Asx  B      120.0      21.7       1.0    (1.0, 1.0)
Cmc           0.0       0.0       0.0    (0.0, 0.0)
Cys  C        0.0       0.0       0.0    (0.0, 0.0)
Glx  Z      232.9      17.1       1.9    (1.8, 2.2)
Gly  G      457.6       5.6       3.8    (3.3, 4.6)
His  H        0.0       0.0       0.0    (0.0, 0.0)
Ile  I      218.1      24.3       1.8    (1.7, 2.0)
Leu  L      191.2       9.2       1.6    (1.4, 1.9)
Lys  K      170.4      10.4       1.4    (1.3, 1.6)
Met  M        0.0       0.0       0.0    (0.0, 0.0)
Phe  F      156.4      24.7       1.3    (1.3, 1.3)
Pro  P      222.7      14.3       1.9    (1.7, 2.1)
Ser  S        0.0       0.0       0.0    (0.0, 0.0)
Thr  T        0.0       0.0       0.0    (0.0, 0.0)
Tyr  Y      141.7      18.7       1.2    (1.1, 1.3)
Val  V      142.9      29.7       1.2    (1.2, 1.2)
ISTD        406.5      17.8       3.4    (3.0, 4.0)

Fig. 1. Results of amino acid composition analysis of peptide P10. Printout of Hydrolysis Data Analysis Routine. Amino acid recoveries, in pmol (± 95% confidence), were calculated by inverse prediction from linear regression analysis of hydrolysis and standards data. Ratios of recovered amino acids were calculated by selecting Asx recovery as the reference. Numbers in parentheses are the low and high ranges for ratios based on the 95% confidence interval for each recovery. Because proteins are often carboxymethylated prior to hydrolysis in order to prevent loss of Cys, AAQUANT includes carboxymethylcysteine (Cmc) in the list of recoverable amino acids. (ISTD, internal standard.)
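The recoveries and confidence limits of Fig. 1 come from inverse prediction on a linear standard curve. The following is a minimal sketch of that calculation using the general textbook form of inverse prediction for a single measurement; it is not AAQUANT's Turbo Pascal code. The function names and the standards data are invented for illustration, and the critical t value must be supplied by the caller from a t table (3.182 is the two-tailed 0.05 value for 3 degrees of freedom, i.e. five standards).

```python
import math


def fit_standard_curve(quantity, area):
    """Least-squares line area = intercept + slope * quantity, plus summary terms."""
    n = len(quantity)
    x_mean = sum(quantity) / n
    y_mean = sum(area) / n
    sxx = sum((x - x_mean) ** 2 for x in quantity)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(quantity, area))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(quantity, area))
    return {"n": n, "x_mean": x_mean, "y_mean": y_mean, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s2": rss / (n - 2)}


def inverse_predict(fit, peak_area, t_crit):
    """Estimate quantity from one observed peak area, with confidence limits."""
    b = fit["slope"]
    x_hat = (peak_area - fit["intercept"]) / b
    # k shrinks toward zero as the slope becomes statistically indistinct from 0.
    k = b * b - t_crit ** 2 * fit["s2"] / fit["sxx"]
    if k <= 0:
        raise ValueError("slope is not significant; no finite interval exists")
    dy = peak_area - fit["y_mean"]
    centre = fit["x_mean"] + b * dy / k
    half = (t_crit / k) * math.sqrt(
        fit["s2"] * (dy * dy / fit["sxx"] + k * (1 + 1 / fit["n"])))
    return x_hat, centre - half, centre + half


# Invented standards: pmol injected vs integrated peak area.
standards_pmol = [100, 200, 300, 400, 500]
standards_area = [210, 405, 615, 790, 1010]
curve = fit_standard_curve(standards_pmol, standards_area)
estimate, low, high = inverse_predict(curve, 606, t_crit=3.182)
print(round(estimate, 1), round(low, 1), round(high, 1))
```

Note that the interval is centred on a slightly shifted value rather than on the point estimate itself, so the limits are in general not symmetric about the estimate; they coincide here only because the chosen peak area equals the mean standard area.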
A A Q U A N T (v.1.0)
PROTEIN SEQUENCE ANALYSIS
FINDCOMP - Search for Composition Similarity
==================================================
FILE NAME: C:\DATA\DOGATPB1.PRO
ENTRY:     DOGATPB1

Test Composition:
Ala 0   Arg 0   Asx 1   Cys 0   Glx 2   Gly 4   His 0   Ile 2   Leu 2
Lys 1   Met 0   Phe 1   Pro 2   Ser 0   Thr 0   Trp 0   Tyr 1   Val 1
Reject: 75.0%

Matches Found:
1    220 - 236    76.5%
2    225 - 241    82.4%
3    226 - 242    76.5%
Number of Matches: 3

Fig. 2. Results of the FINDCOMP analysis. The amino acid composition of peptide P10 (Fig. 1, integer values used) was used to search the beta subunit amino acid sequence for a region of similar composition, as described in the text. A reject value of 75% minimum match was selected, and three matches corresponding to the residue 220-242 region of the protein were identified.

A A Q U A N T (v.1.0)
PROTEIN SEQUENCE ANALYSIS
PDIGEST -- Proteolytic Digestion Map
==================================================
FILE NAME: B:DOGATPB1.PRO
ENTRY:     DOGATPB1
Protein Length: 302 amino acids
Proteolytic Enzymes selected: Trypsin

  1  A R*G K*A K*E E G S W K*K*F I W N S E K*K*E F L G R*T G G S   30
 31  W F K*I L L F Y V I F Y G C L A G I F I G T I Q V M L L T I   60
 61  S E F K*P T Y Q D R*V A P P G L T Q I P Q I Q K*T E I S F R*  90
 91  P N D P K*S Y E E Y V R*N I V R*F L E K*Y K*D S A Q K*D E M  120
121  I F E D C G N M P S E I K*E R*G E F N N E R*G E R*K*V C R*F  150
151  K*L E W L G N C S G I N D E T Y G Y R*D G K*P C V L I K*L N  180
181  R*V L G F K*P K*P P K*N E S L E A Y P V M K*Y S P Y V L P V  210
211  Q C T G K*R*D E D K*D R*I G N V E Y F G L G G Y P G F P L Q  240
                             -----------------------------------
241  Y Y P Y Y G K*L L Q P K*Y L Q P L L A V Q F T N L T M D T E  270
     -------------
271  I R*I E C K*A Y G E N I G Y S E K*D R*F Q G R*F D V K*I E V  300
301  K*S                                                         302

Fig. 3. Trypsin digest map of the (Na+ + K+)-ATPase beta subunit. The map was created by the PDIGEST routine of AAQUANT. The underlined region (added for emphasis) denotes residues 223-247, the tryptic peptide corresponding to peptide P10. Asterisks mark trypsin cleavage sites.
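A digest map of the kind shown in Fig. 3 can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the PDIGEST implementation: it cuts after every Lys or Arg, which is how the asterisks in Fig. 3 are placed (Lys-Pro bonds are marked as well), and the function name and the `offset` parameter are hypothetical. The test segment is residues 211-252 as read from the map.

```python
def digest(sequence, residues="KR", offset=1):
    """Split a one-letter sequence after each residue in `residues`.

    Returns (start, end, peptide) tuples; positions are 1-based and shifted
    by `offset` so that a segment of a longer protein keeps its numbering.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append((start + offset, i + offset, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in a cleavage residue.
        fragments.append((start + offset, len(sequence) - 1 + offset,
                          sequence[start:]))
    return fragments


# Residues 211-252 of the beta subunit, as read from Fig. 3.
segment = "QCTGKRDEDKDRIGNVEYFGLGGYPGFPLQYYPYYGKLLQPK"
peptides = digest(segment, offset=211)
for first, last, peptide in peptides:
    print(first, last, peptide)
```

Supporting other enzymes or chemical reagents would amount to changing the `residues` argument or replacing the membership test with a rule that also inspects neighbouring residues.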
compositions, peptide P10 corresponds to the potential tryptic peptide at residues 223-247 in the beta subunit sequence.
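The window search used to locate a peptide by composition can be sketched as follows. The scoring rule here is an assumption: each amino acid contributes the smaller of its specified count and its count in the window, which yields the 5 points quoted in Section 2.3.2 for the window Gly-Leu-His-Thr-Thr-Lys, but AAQUANT's actual scoring is not given in detail and may differ. Function and variable names are hypothetical; compositions are keyed by one-letter codes, with B for Asx and Z for Glx.

```python
from collections import Counter

# Acid hydrolysis cannot distinguish amides from acids: D/N -> B, E/Q -> Z.
AMIDE_MERGE = str.maketrans({"D": "B", "N": "B", "E": "Z", "Q": "Z"})


def find_composition(sequence, composition, reject_percent):
    """Slide a window over `sequence` and report windows resembling `composition`.

    Returns (start, end, score, percent) tuples with 1-based positions.
    """
    target = Counter(composition)
    width = sum(target.values())          # window length = total residues specified
    merged = sequence.translate(AMIDE_MERGE)
    matches = []
    for start in range(len(merged) - width + 1):
        window = Counter(merged[start:start + width])
        # Assumed score: residues shared between specification and window.
        score = sum(min(count, window[aa]) for aa, count in target.items())
        percent = 100.0 * score / width
        if percent >= reject_percent:
            matches.append((start + 1, start + width, score, percent))
    return matches


# Worked example from Section 2.3.2: 2 Gly, 1 His, 1 Thr, 1 Leu, 1 Lys.
example = {"G": 2, "H": 1, "T": 1, "L": 1, "K": 1}
hits = find_composition("AAGLHTTKAA", example, 80.0)
print(hits)
```

Recounting every window keeps the sketch short; for long sequences the counts could instead be updated incrementally as the window advances by one residue.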
5. DISCUSSION
The peptide P10 obtained in this experiment from a trypsin digest of the (Na+ + K+)-ATPase beta subunit was subjected to amino acid composition analysis and was identified in the beta subunit at residues 223-247 by a comparison of its amino acid composition and the amino acid sequence of the beta subunit. In a previous experiment, a beta subunit tryptic peptide eluting at the same HPLC position as peptide P10 was isolated and analyzed for amino acid sequence by gas-phase sequencing, and was found to correspond to residues 223-247 of the protein [11]. This peptide and peptide P10 are probably the same. Figure 4 shows the amino acid composition ratios of residues 223-247 of the beta
A A Q U A N T (v.1.0)
PROTEIN SEQUENCE ANALYSIS
AACTR - Amino Acid Usage Table
==================================================
FILE NAME: C:\DATA\DOGATPB1.PRO
ENTRY:     DOGATPB1

Protein Length:    302 amino acids
Molecular weight:  35112.1
CONSTRAIN:         YES
Start at:          223
End at:            247
Amino acids:       25
Molecular weight:  2863.5

Acidic: 1    Basic: 1    Neutral: 23
Hydropathy (KD): -0.30
Potential N-glycosylation: NONE

Ratios based on: FREQUENCY

AA       Number      %    Ratio
-----    ------   ----    -----
Ala  A      0      0.0     0.00
Arg  R      0      0.0     0.00
Asn  N      1      4.0     1.00
Asp  D      0      0.0     0.00
Asx  B      1      4.0     1.00
Cys  C      0      0.0     0.00
Gln  Q      1      4.0     1.00
Glu  E      1      4.0     1.00
Glx  Z      2      8.0     2.00
Gly  G      6     24.0     6.00
His  H      0      0.0     0.00
Ile  I      1      4.0     1.00
Leu  L      2      8.0     2.00
Lys  K      1      4.0     1.00
Met  M      0      0.0     0.00
Phe  F      2      8.0     2.00
Pro  P      3     12.0     3.00
Ser  S      0      0.0     0.00
Thr  T      0      0.0     0.00
Trp  W      0      0.0     0.00
Tyr  Y      6     24.0     6.00
Val  V      1      4.0     1.00

Fig. 4. Amino acid composition of a selected region of the (Na+ + K+)-ATPase beta subunit. Printout of AACTR routine. The beta subunit amino acid sequence was constrained to residues 223-247 prior to analysis by the AACTR routine. AACTR determines the number of occurrences and per cent of occurrences of each amino acid, and calculates a ratio of occurrences based on a user-selected reference amino acid. Values for Asx and Glx occurrence are the sum of occurrences of Asp and Asn, and Gln and Glu, respectively. AACTR also determines the mol. wt of the protein and the constrained region, identifies potential N-glycosylation sites, and calculates the average hydropathy based on the Kyte and Doolittle [13] hydropathy scale.
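The usage table and the average hydropathy of Fig. 4 can be reproduced with a short sketch. This is an illustration, not the AACTR code: the function names are hypothetical, the reference amino acid (Lys) is chosen arbitrarily, and the hydropathy values are the published Kyte and Doolittle scale entered by hand. The input is the 223-247 region as read from the digest map in Fig. 3.

```python
from collections import Counter

# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def usage_table(sequence, reference):
    """Map each residue code to (count, percent, ratio to the reference)."""
    counts = Counter(sequence)
    counts["B"] = counts["D"] + counts["N"]   # Asx = Asp + Asn
    counts["Z"] = counts["E"] + counts["Q"]   # Glx = Glu + Gln
    total = len(sequence)
    ref_count = counts[reference]
    return {aa: (number,
                 100.0 * number / total,
                 number / ref_count if ref_count else float("nan"))
            for aa, number in counts.items()}


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[aa] for aa in sequence) / len(sequence)


region = "IGNVEYFGLGGYPGFPLQYYPYYGK"   # residues 223-247, read from Fig. 3
table = usage_table(region, reference="K")
hydropathy = mean_hydropathy(region)
print(table["G"], table["Y"], table["Z"], round(hydropathy, 2))
```

Because Asx and Glx are sums of residues already counted, the percentages in such a table add up to more than 100 when those two rows are included.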
subunit. The composition ratios of peptide P10 (Fig. 1) agree reasonably well with the actual ratios of this region except for Tyr, which is much lower than expected in peptide P10. The low recovery of Tyr from acid hydrolysis of proteins has been noted previously, and may be due to destruction during the hydrolysis reaction [4]. In the experiment described above, higher Tyr recovery would have resulted in a higher per cent match between the peptide P10 composition and residues 223-247 of the beta subunit as determined by the FINDCOMP routine. Low Tyr recovery in this experiment, however, did not prevent the FINDCOMP routine from identifying one region with an amino acid composition highly similar (76-82%) to the peptide P10 composition data. The results of the FINDCOMP search and the unequivocal identification of Lys in peptide P10 indicate that this peptide was derived by trypsin cleavage at 222Arg and 247Lys.
6. PROGRAM AVAILABILITY
Copies of AAQUANT may be obtained by contacting the author (R.P.M.).
7. SUMMARY
AAQUANT, a computer program specifically developed for the researcher performing amino acid analysis of proteins and peptides, is described. Quantitation of amino acid recoveries following acid hydrolysis of a purified peptide and identification of the peptide in a protein sequence based on composition similarity using AAQUANT are demonstrated. AAQUANT should prove to be a useful program for amino acid composition data analysis.
Acknowledgements-This work was supported in part by grant GM28673 from the National Institutes of Health, and grant DMB8613999 from the National Science Foundation. R.A.F. is an Established Investigator of the American Heart Association, and R.P.M. was supported, in part, by a University of Southern California PreDoctoral Merit Fellowship. The authors wish to thank David J. Miller for his excellent advice and assistance in computer programming.
REFERENCES
1. R. L. Heinrikson and S. C. Merideth, Analyt. Biochem. 136, 65-74 (1984).
2. R. F. Ebert, Analyt. Biochem. 154, 432-435 (1986).
3. J. Riordan and R. W. Geise, Meth. Enzym. XLVII, 31-40 (1977).
4. S. Moore and W. H. Stein, Meth. Enzym. VI, 819-825 (1963).
5. O. Lockridge, S. Adkins and B. N. La Du, J. biol. Chem. 262, 12945-12952 (1987).
6. K. MacPhee-Quigley, T. S. Vedvick, P. Taylor and S. S. Taylor, J. biol. Chem. 261, 12565-12570 (1986).
7. J. H. Zar, Biostatistical Analysis, 2nd Edn, Prentice-Hall, NJ (1984).
8. H. S. Bilofski and C. Burks, Nucleic Acids Res. 16, 1861-1864 (1988).
9. T. F. Smith, K. Grushkin, S. Tolman and D. Faulkner, Nucleic Acids Res. 14, 25-30 (1986).
10. H. Schacter and S. Roseman, The Biochemistry of Proteins and Proteoglycans, W. Lennarz, Ed. Plenum Press, New York (1980).
11. T. A. Brown, B. Horowitz, R. P. Miller, A. A. McDonough and R. A. Farley, Biochim. Biophys. Acta 912, 224-253 (1987).
12. R. P. Miller and R. A. Farley, Biochim. Biophys. Acta 954, 50-57 (1988).
13. J. Kyte and R. F. Doolittle, J. molec. Biol. 157, 105-132 (1982).

About the Author-RICHARD P. MILLER received the B.S. degree in Biological Science from California State University, Hayward in 1982, and the Ph.D. degree in Physiology and Biophysics from the University of Southern California in 1989. He is currently a postdoctoral fellow in the Department of Physiology and Biophysics at the University of Southern California. His research interests include the structure of Na,K-ATPase and other membrane proteins, and the analysis of protein sequence data.

About the Author-ROBERT A. FARLEY received the Ph.D. degree in Biophysics from the University of Rochester in 1975. After postdoctoral work at Yale University and Harvard University he joined the faculty of the Department of Physiology and Biophysics at the University of Southern California, where he is currently an Associate Professor. Dr Farley's research interests include the investigation of the structures of membrane transport proteins, and the analysis of mechanisms of ion transport in animal cells.