COMPUTER PROGRAMS IN BIOMEDICINE 3 (1973) 191-198. NORTH-HOLLAND PUBLISHING COMPANY
COMPUTER APPROACHES TO PROTEIN STRUCTURE. VI. A MULTIPLE OPTION MODEL BUILDING PROGRAM*
Andrew M. TOMETSKO
Department of Biochemistry, University of Rochester Medical Center, Rochester, New York 14642, USA
Conformational analysis of proteins is complicated by the presence of thousands of atoms, many of which are involved in multiple interactions. As a result, complete molecular models of proteins have a relatively high information density. Computer graphics provides a means of controlling the flow of information by permitting the construction of large numbers of models containing different amounts of structural detail. The multiple option FORTRAN IV program described in this report allows the observer to focus on specific interactions and/or regions of a protein model, with all or only specified amino acid side chains present on the backbone. The models can be readily translated, rotated and resized, and can be constructed with specified visibility limits. Minor adjustments to the basic program provide complete families of stereoscopic pairs for three-dimensional analysis of the models.
Keywords: Peptides; Protein structure; Graphic display
1. Introduction
Molecular models play an important role in the conformational analysis of macromolecules. Conventional space-filling and framework models have found extensive use for representing the structures of polypeptides and proteins. As new crystallographic data accumulate for additional protein molecules, model building techniques that facilitate the conversion of the x, y and z coordinates of the atoms into the corresponding molecular models become increasingly important. Consequently, methods have been developed in this laboratory [1-3] and elsewhere [4-6] to provide framework-type models of protein structures through computer graphics. In addition to significantly reducing construction time, the graphic method facilitates control of structural detail. The latter consideration is particularly important, since the complete model of even a relatively small protein (resulting from a rigorous crystallographic analysis) will contain thousands of bonds (vectors). Because interest at any time usually centers on a relatively small segment of the total structure (e.g. the peptide backbone, the active site, or the location of particular amino acids), the superfluous detail in the complete model is often distracting and at times interferes with the analysis of the structure. The problem of constructing models with high information density (e.g. 1000 atoms) is largely circumvented by generating a large number of graphic models (in minutes), each highlighting different structural features. The overlapping vectors characteristic of two-dimensional graphic displays can be resolved by constructing stereoscopic pairs or by animating the models.* Thus, computer graphics provides a useful complement or alternative to physical framework models.
* This research was supported by the Public Health Service (GRSG) and by the Atomic Energy Commission (B.N.L.).
* The stereoscopic models were viewed with a Three Dimensional Viewer obtained from Stereo-magniscope, Inc., 40, 81 St., Elmhurst, New York, New York, USA.
2. Program description
The multiple option program described here was designed to generate graphic framework models of proteins. It provides options for controlling model size and position (relative to the origin), for constructing all or only specified amino acid side chains, for isolating segments of a protein (e.g. an active site), for rotating the model in the X-Y or Y-Z planes to provide different views, for defining visibility limits in the model, and for automatically generating stereoscopic pairs of each structure.
2.1. Method of model construction
Framework molecular models (e.g. Brumlik or Dreiding type) emphasize the relative positions of nuclei in the molecular structure. The rods connecting two nuclei represent covalent bonds and are essentially vectors in three-dimensional space. Computer graphics [1,4-6] provides an alternative means of constructing bond vectors. Since the computer is programmed to scan a data set and construct the appropriate vectors, a single program is generally applicable to the construction of many protein models from atomic coordinates obtained by X-ray diffraction analysis or computer simulation.
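The scan-and-connect idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each card is a whitespace-separated line holding the fields listed in Section 2.3 (sequence position, residue, atom symbol, x, y, z); the original card columns are not given here. Only backbone bonds (N-CA, CA-C, C-O and the peptide bond from C of one residue to N of the next) are connected; side-chain and ring connectivity, which the program derives from the atom order of ref. [3], is omitted. The names `parse_card` and `backbone_vectors` are illustrative, not the program's own.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    seq: int
    residue: str
    name: str
    x: float
    y: float
    z: float


def parse_card(card: str) -> Atom:
    """Parse one atom card; assumes whitespace-separated fields (hypothetical layout)."""
    seq, residue, name, x, y, z = card.split()
    return Atom(int(seq), residue, name, float(x), float(y), float(z))


# Backbone bonds within one residue (atoms arrive in N, CA, ..., C, O order).
INTRA_RESIDUE = [("N", "CA"), ("CA", "C"), ("C", "O")]


def backbone_vectors(atoms):
    """Return bond vectors as pairs of (x, y, z) end points, scanning cards in order."""
    by_residue = {}
    order = []
    for atom in atoms:
        if atom.seq not in by_residue:
            by_residue[atom.seq] = {}
            order.append(atom.seq)
        by_residue[atom.seq][atom.name] = (atom.x, atom.y, atom.z)

    vectors = []
    for i, seq in enumerate(order):
        res = by_residue[seq]
        for a, b in INTRA_RESIDUE:
            if a in res and b in res:
                vectors.append((res[a], res[b]))
        # Peptide bond from this residue's C to the next residue's N.
        if i + 1 < len(order):
            nxt = by_residue[order[i + 1]]
            if "C" in res and "N" in nxt:
                vectors.append((res["C"], nxt["N"]))
    return vectors
```

For a two-residue fragment (GLY followed by ALA with a CB card), the sketch yields three vectors per residue plus one peptide bond; the ALA side chain atom is read but left unconnected.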
2.2. Hardware and software requirements
The computer program was written in FORTRAN IV and was run on an IBM 360/65 computer equipped with a CALCOMP 560R plotting system. The program is readily adapted to other plotting systems by changing the line-drawing statements. With the CALCOMP 560R system, lines are drawn with a "CALL PLOT" statement. When a CDC 6600 computer (Brookhaven National Laboratory) equipped with a CALCOMP 835 plotting system (CRT output) was used, the program was adapted to the modified FORTRAN IV used at that installation [7], and the line-drawing statements were changed from "CALL PLOT" to "CALL CALCMP". With the 835 system, a further adjustment was introduced to take advantage of the variable line intensity, which could be set from 0 (off) to 32 depending on the Y coordinate. The statements IZ=2 (i.e. pen down) in the program were changed to read IZ=WO-AAY and IZ=WO-AZY, where AAY and AZY are the Y coordinate values of each atom during rotation of the model in the X-Y and Y-Z planes, respectively. The upper limit of beam intensity, WO, will usually depend on the characteristics of the particular CRT system.
2.3. The data set
The data set for a protein consists of one card per atom, and the cards are read from the N-terminus to the C-terminus. Each card contains the sequence position, the three-letter amino acid abbreviation (e.g. GLY, ALA, VAL), the atom symbol (e.g. N, CA, CB) and the x, y and z coordinates. The atoms of each amino acid are read in the order N, CA, CB (side chain), C, O. The atoms of amino acids containing ring systems are read in the order described previously [3]. For amino acids containing branch atoms (i.e. LEU, ILE, VAL, GLU, GLN, ASP, ASN, THR and ARG), the atom carrying the designation "1" is read before the atom carrying the designation "2". For example, OE1 precedes OE2 in GLU, CD1 precedes CD2 in LEU, and CG1 precedes CG2 in ILE.
3. The flowchart
The flow diagrams in figs. 1-4 indicate consecutive sections of the program.
[Figs. 1-4. Flow diagrams for consecutive sections of the program (diagrams not reproduced). Legible step labels include: read amino acids into the AMINO array; read atom symbols; adjust model size; compute coordinates of the reference point; calculate the change in the reference point following rotation; specify direction of rotation; calculate and normalize coordinate positions; compare the Y coordinate value with the visibility limits; evaluate the size parameter (RA); execute the specified operation; store coordinates; evaluate next atom in the sequence; evaluate options parameter (BT); rotate model (AR=AR+ABC); increment size parameter; generate stereoscopic pair; advance to next frame.]
4. A description of program options
The computer program is designed to facilitate the structural analysis of proteins based on the x, y and z coordinates of the atoms. As structure-function questions arise from model building or from analysis of the local environment of the atoms, the options in the program provide the means for focusing on specific interactions and regions of the model. It is advisable to begin the study of a new protein by generating models of the peptide backbone, in order to gain insight into the relative positions of the corresponding atoms and chain segments, and into the general meandering of the peptide backbone in space. The backbone then serves as an internal reference as amino acid side chains are added, the model is rotated, and parts of the model are phased out.
4.1. Model size
Size variations are obtained by specifying appropriate values of JB, BT, RA and ROZ, where JB is the number of models to be constructed, BT is an option parameter that instructs the computer to vary the model size, RA is the initial size factor, and ROZ is the change in size factor between successive frames. Setting BT=1.00 specifies that the model size is to be varied. The coordinates initially read into the computer are then multiplied by RA, which reduces (or enlarges) the model accordingly. For example, if RA=0.1, the first model constructed will be one-tenth the size given by the original coordinates. In each subsequent model, RA is increased by ROZ (i.e. RA=RA+ROZ), so the size of the final model is determined by the value of JB. Once a suitable working size has been found (i.e. the model in frame j is suitable), the value of G should be set equal to RAj, the value of RA in frame j, and RA should then be reset (RA=1.00). All models will then automatically be generated with the specified size factor.
4.2. Rotation of the models
It is often of interest to view a molecular model of a protein from different vantage points. The program rotates the model in the X-Y or Y-Z plane about an internal reference point (X=Y=Z=5.00). To initiate the rotation sequence, the values of the following variables must be adjusted: BT, AR, ABC and PL, where BT is the general option parameter, AR is the angle of rotation, ABC is the change in the angle of rotation between successive models, and PL indicates the plane of rotation. If AR=0.00, the model is constructed in its original orientation. During a complete rotation of the molecule through 360°, the angle of rotation AR (in radians) is increased in each frame by the specified value of ABC (i.e. AR=AR+ABC). Thus, if ABC=-0.174, each model is rotated by 10° with respect to the preceding structure. Single images can be generated by setting JB=1 and specifying the desired angle of rotation, AR.
4.3. Specifying amino acid side chains
The number of different side chains to be placed on the backbone is given by the value of IB. If IB=0, the side chains of all amino acids are constructed. If, for example, five kinds of amino acid side chains are to be constructed on the peptide backbone, their abbreviated names must be supplied as data and IB set accordingly (i.e. IB=5). The amino acid data cards must precede all other data, and the names are read into the AMINO array. During execution, the computer checks each amino acid against the AMINO array to determine whether its side chain should be constructed. When only the backbone atoms N, CA, C and O are required, IB=1 should be used together with a single blank data card.
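The interplay of the options in Sections 4.1-4.3 can be sketched as a per-frame transform in Python. This is an illustration under stated assumptions rather than the original FORTRAN: scaling is taken to act about the coordinate origin and to precede rotation, rotation in the X-Y plane is taken as rotation about the Z axis through the reference point (5.00, 5.00, 5.00) and rotation in the Y-Z plane as rotation about the X axis, atoms are plain (residue, atom, x, y, z) tuples, and the helper names `frame_parameters`, `select_atoms` and `transform` are hypothetical.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}
REFERENCE = (5.0, 5.0, 5.0)  # internal reference point X=Y=Z=5.00


def frame_parameters(jb, ra, roz, ar, abc):
    """(size factor, rotation angle in radians) for frames 1..JB.

    Frame j uses RA + (j-1)*ROZ and AR + (j-1)*ABC; set ROZ=0 or ABC=0
    to vary only one of the two.
    """
    return [(ra + j * roz, ar + j * abc) for j in range(jb)]


def select_atoms(atoms, amino, ib):
    """Keep backbone atoms plus side chains of residues named in AMINO.

    atoms are (residue, atom_name, x, y, z) tuples. IB=0 keeps every side
    chain; IB=1 with a single blank AMINO card leaves only the backbone.
    """
    if ib == 0:
        return list(atoms)
    wanted = {name.strip().upper() for name in amino[:ib] if name.strip()}
    return [a for a in atoms if a[1] in BACKBONE or a[0] in wanted]


def transform(point, size, angle, plane):
    """Scale about the origin, then rotate about REFERENCE in the 'XY' or 'YZ' plane."""
    x, y, z = (v * size for v in point)
    cx, cy, cz = REFERENCE
    c, s = math.cos(angle), math.sin(angle)
    if plane == "XY":  # rotation about the Z axis through the reference point
        dx, dy = x - cx, y - cy
        return (cx + c * dx - s * dy, cy + s * dx + c * dy, z)
    if plane == "YZ":  # rotation about the X axis through the reference point
        dy, dz = y - cy, z - cz
        return (x, cy + c * dy - s * dz, cz + s * dy + c * dz)
    raise ValueError("plane must be 'XY' or 'YZ'")
```

With JB=36 and ABC of about 0.174 rad, `frame_parameters` steps through a full turn in roughly 10° increments, matching the rotation sequence described in Section 4.2.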
4.4. Setting visibility limits
In many cases it is of interest to study specific parts of a model, such as the active site region of an enzyme. The program provides options to study the complete model or to "phase out" sections. The values of TO and TP set the near and distant visibility limits, respectively; vectors with a Y coordinate value greater than TP or less than TO are not plotted. When this option is not in use, TO should be set to 0 and TP should exceed the largest Y coordinate value. Increasing TO effectively places the camera inside the model, whereas decreasing TP can be used to isolate the section of a model (e.g. the active site of an enzyme) immediately in front of the camera. Note also that rotating the model with the visibility options in effect generates models in which hidden sections come into view while some currently visible vectors fade out.
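The visibility test, combined with the depth-dependent beam intensity of Section 2.2 (IZ=WO-Y), might be sketched in Python as follows. Assumptions: a bond vector is drawn only if both end points satisfy TO <= Y <= TP (the text does not say how a vector straddling a limit is treated), the intensity follows the Y value of the atom the beam is drawn to, intensities are rounded and clamped to the range 0 (off) to WO, and the function names are hypothetical.

```python
def visible(y, to, tp):
    """True if a Y coordinate lies inside the visibility limits TO..TP."""
    return to <= y <= tp


def beam_intensity(y, wo=32):
    """Depth cue IZ = WO - Y, rounded and clamped to 0 (beam off) .. WO."""
    return max(0, min(wo, round(wo - y)))


def plot_commands(vectors, to, tp, wo=32):
    """Return (start_xz, end_xz, intensity) for each vector inside the limits.

    vectors are pairs of (x, y, z) end points. Y is the depth axis, so each
    drawn vector is projected onto the X-Z plane of the screen.
    """
    commands = []
    for (x1, y1, z1), (x2, y2, z2) in vectors:
        if visible(y1, to, tp) and visible(y2, to, tp):
            # As with IZ=WO-AAY, intensity is taken from the atom drawn to.
            commands.append(((x1, z1), (x2, z2), beam_intensity(y2, wo)))
    return commands
```

Setting TO=0 and TP above the largest Y coordinate reproduces the "option not in use" case: every vector passes the test and only the intensity varies with depth.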
Fig. 5. Stereoscopic models of the A chain of insulin. Four amino acids, HIS, PHE, TYR and PRO, were specified as data, which resulted in the construction of the two tyrosine side chains on the backbone.*
Fig. 6. Stereoscopic models of the B chain of insulin. Four amino acids, HIS, PHE, TYR and PRO, were specified, and their side chains are constructed on the backbone.*
Fig. 7. Stereoscopic models of insulin. Five amino acids, HIS, PHE, TYR, PRO and CYS, were specified, and their side chains are constructed on the backbone.*
4.5. Generating three dimensional models
Since three-dimensional displays of protein models are far superior to two-dimensional ones, the program can be adjusted to provide stereoscopic pairs of each model. This option is selected by setting BT=2.00. With this specification, the computer generates the initial model, increments the angle of rotation (i.e. AR=AR+ABC), and constructs the new image on the subsequent frame. The resulting stereoscopic images can be viewed directly or transferred to film (double exposure using colored filters) [3]. Three-dimensional models are particularly important in structure-function studies, where the relative positions of the atoms in space are often critical for the observed biological or catalytic function. Alternatively, the method of Graver et al. [8] might be employed to provide animated three-dimensional models.
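The BT=2.00 option amounts to drawing the same model twice, the second time rotated by one further increment ABC. A minimal Python sketch, assuming rotation in the X-Y plane about the reference point (5.00, 5.00, 5.00) and projection onto the X-Z plane (Y being the depth axis used for the visibility limits); `stereo_pair` and its helpers are hypothetical names, and the default increment of 0.1 rad (about 6°) is an illustrative choice, not a value given in the text.

```python
import math

REFERENCE = (5.0, 5.0, 5.0)


def rotate_xy(point, angle, ref=REFERENCE):
    """Rotate a point in the X-Y plane (about the Z axis through ref)."""
    x, y, z = point
    cx, cy, _ = ref
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = x - cx, y - cy
    return (cx + c * dx - s * dy, cy + s * dx + c * dy, z)


def project(point):
    """Drop the depth (Y) coordinate: X is horizontal and Z vertical on screen."""
    x, _, z = point
    return (x, z)


def stereo_pair(points, ar=0.0, abc=0.1):
    """Return two 2-D images of the same points: one at AR, the next at AR + ABC."""
    first = [project(rotate_xy(p, ar)) for p in points]
    second = [project(rotate_xy(p, ar + abc)) for p in points]
    return first, second
```

Because the rotation axis is vertical on the screen, the two images differ only by horizontal shifts that grow with an atom's depth, which is the parallax a stereoscopic viewer needs.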
5. Sample run
X-ray crystallographic analysis has yielded a detailed model of the insulin molecule [9-11], and the program was run using the insulin coordinates [12] as data. Using the stereoscopic view option, models of the A chain, the B chain and insulin were generated, as shown in figs. 5-7, respectively. The vectors protruding from the backbone represent the carbonyl oxygen bonds. The computer was instructed to construct the side chains of four amino acids, HIS, PRO, TYR and PHE, on the A and B chain backbones. In addition, the insulin model (fig. 7) contains the disulfide bonds, since CYS was also specified as a data card for the AMINO array. The models of the A and B chains are readily compared with the insulin structure, and they illustrate the usefulness of controlling the information content and complexity of each model. Analyzing the individual chains before studying the insulin model makes it easier to delineate the corresponding structural features. Plotting time ranges from one second per image with CRT output to one minute per image with a paper plotter, and depends on the number and size of the vectors in the structure. The program is capable of generating models containing in excess of 2000 atoms (vectors). When the number of atoms prevents facile analysis of the structure, it is advisable to use the options available in the program to simplify the model.
6. Availability of the program
A complete listing of the program, together with a list of variable names and their corresponding ranges and descriptions, will be provided by the author upon request.
Acknowledgement
I would like to thank Dr. D.C. Hodgkin for supplying the X-ray coordinates of insulin.
References
[1] A.M. Tometsko, Computers Biomed. Res. 3 (1970) 690.
[2] A.M. Tometsko, Computers Biomed. Res. 4 (1971) 407.
[3] A.M. Tometsko, Computers Biomed. Res. 5 (1972) 150.
[4] C. Levinthal, Sci. Am. 214 (1966) 42.
[5] E.F. Meyers, Jr., Nature 232 (1971) 255.
[6] P.K. Warme, R.W. Tuttle and H.A. Scheraga, Computer Prog. Biomed. 2 (1972) 248.
[7] K. Fuchell and S. Heller, Fortran for Brookhaven's CDC 6600 Computer (BNL, 1967).
[8] N. Graver, W.H. de Veer and A.M. Tometsko, J. Biol. Photogr. Assoc. 40 (1972) 171.
[9] M.J. Adams, T.L. Blundell, E.J. Dodson, G.G. Dodson, M. Vijayan, E.N. Baker, M.M. Harding, D.C. Hodgkin, B. Rimmer and S. Shear, Nature 224 (1969) 491.
[10] T.L. Blundell, G.G. Dodson, E.J. Dodson, D.C. Hodgkin and M. Vijayan, Recent Prog. Hormone Res. 27 (1971) 1.
[11] T.L. Blundell, J.F. Cutfield, S.M. Cutfield, E.G. Dodson, G.G. Dodson, D.C. Hodgkin, D.A. Mercola and M. Vijayan, Nature 231 (1971) 505.
[12] D.C. Hodgkin, private communication of X-ray coordinates of insulin.