COMPUTER CORNER
TIBS 20 - NOVEMBER 1995
Dali: a network tool for protein structure comparison

If you are studying cytochrome c oxidase in Paracoccus because it is more convenient than doing experiments with the homologous enzyme from human mitochondria, you are exploiting the conservation of the basic biochemical machinery in living organisms through billions of years of evolution. Such evolutionary connections are evident in the results of sequence database searches using programs such as BLAST [1] or FASTA [2]. The power of these database searches derives from a simple rule of thumb: sequence identity above 25% (higher for shorter sequences) implies a conserved three-dimensional fold and, very likely, conserved biological function [3]. However, the reverse is not true: protein families diverge with few specific constraints on the amino acid sequence [4]. For example, comparison of ancient branches of globins (myoglobin, haemoglobin, leghaemoglobin) reveals that only one position in the chain requires an invariant amino acid, yet the same haem-binding function and a conserved three-dimensional fold are retained [5].

Increasingly, ancient evolutionary relationships, which are no longer evident from staring at sequences alone, are being revealed by structure-structure comparisons [6]. When successful, structure comparison searches in databases can lead to a considerable information gain through precise prediction of protein function (e.g. identification of the active site in barley endochitinase via reference to lysozymes [7]) and unification of several protein families into a functional superfamily (e.g. DNA polymerase β with kanamycin nucleotidyltransferase and others [8]). The recent surge in newly solved three-dimensional protein structures has whetted the appetite for systematic structure-structure comparison with the potential for fascinating evolutionary discoveries. To make tools for such discoveries generally available, the European Molecular Biology Laboratory is now providing Internet access to the Dali method for protein structure comparison [9]. The services include a database of precalculated structural neighbours for all public
Table I. Digest of new protein structures in 1995

PDB code  Protein                                            Fold class or protein family
1aorA     Aldehyde ferredoxin oxidoreductase                 Unique fold except for β-grasp motif in one domain
1ash      Ascaris hemoglobin domain I                        Belongs to globin family
1asyA     Aspartate tRNA synthetase                          Belongs to class II of tRNA synthetases (e.g., seryl tRNA synthetase)
1bvp1     Bluetongue virus                                   Jellyroll domain at amino terminus, carboxy-terminal domain is a helical bundle
1clc      Endoglucanase CelD                                 Carboxy-terminal domain similar to glucoamylase and δ-endotoxin; amino-terminal domain immunoglobulin-like
1ctl      LIM domain                                         Contains zinc finger similar to GATA-1 transcription factor
1ctt      Cytidine deaminase                                 Unique fold
1exg      Cellulose-binding domain of exo-1,4-β-glycanase    Jellyroll fold
1fnc      Ferredoxin reductase                               Belongs to a family with phthalate dioxygenase reductase and cytochrome b5 reductase
1irk      Insulin receptor                                   Belongs to tyrosine kinase family
1knb      Knob domain of adenovirus type 5 fiber protein     Similar to exo-1,4-β-glycanase
1mseC     c-Myb DNA-binding domain                           Tandem homeodomains
1nfp      LuxF protein                                       Partial TIM barrel fold
1plq      Proliferating-cell nuclear antigen                 Belongs with DNA polymerase III β subunit
1pyiA     Pyrimidine pathway regulator 1                     Unique fold
1qorA     Quinone oxidoreductase                             Belongs to long-chain alcohol dehydrogenases
1rsy      Synaptotagmin                                      Similar to fibronectin type III domains
1sso      Histone-like thermophilic protein                  Similar to interleukin-8
1wapA     Trp RNA-binding attenuation protein                β-sandwich, contained, for example, in phaseolin
1wfbA     Antifreeze protein                                 A single long helix
structures and an e-mail server for searches with newly solved ones.

Comparing single, new structures
At the last stages of solving a new protein structure, crystallographers and nuclear magnetic resonance (NMR) spectroscopists are keen to know if their structure presents a unique protein fold or if it has an unexpected structural similarity to a known protein fold. To answer these questions, the Dali server performs a database search with a new structure against all structures in the Protein Data Bank. Coordinates of a new structure can be sent using either World Wide Web (WWW) software or electronic mail to [email protected]. A list of all structural neighbours of the query structure in protein fold space and an optimal structural alignment with each neighbour is returned. The searches are fully automated, so an important technical issue is how to decide by objective, quantitative criteria which structural similarities to report. The Dali method measures structural relatedness in terms of similarities of intramolecular distance matrices, that is, the alignments are determined only by comparing the three-dimensional coordinates. The statistical significance of the strength of structural similarity as a function of chain length has been determined empirically. Hits in the database search are ranked in order of statistical significance, with a cutoff set to a Z-score equal to 2 (two standard deviations above expected) for any pair of structural domains. This type of ranking facilitates the interpretation of database search results, as the biologically most interesting matches make it to the top of the list (an empirical observation).
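As a rough illustration of the two ideas in this paragraph, the Python sketch below builds an intramolecular distance matrix from a list of coordinates and then ranks database hits by Z-score with the cutoff of 2. It is not the Dali implementation: the function names, the toy coordinates, the hit names and the scores are all hypothetical, and in practice the expected mean and standard deviation would have to come from the empirically determined dependence on chain length.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def z_score(raw_score, expected_mean, expected_std):
    """Number of standard deviations by which a score exceeds expectation."""
    return (raw_score - expected_mean) / expected_std


def rank_hits(hits, cutoff=2.0):
    """Keep (name, z) pairs with z >= cutoff, most significant first."""
    kept = [(name, z) for name, z in hits if z >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical C-alpha coordinates (in angstroms) for a three-residue toy chain.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dm = distance_matrix(ca_coords)

# Hypothetical hits with made-up Z-scores.
hits = [("hitA", 1.2), ("hitB", 8.5), ("hitC", 2.0), ("hitD", 4.1)]
ranked = rank_hits(hits)

for name, z in ranked:
    print(f"{name}\tZ = {z:.1f}")
```

Because only internal distances enter the matrix, it does not change when the molecule is rotated or translated, which is why two structures can be compared without superimposing them first.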
In addition to searches with single structures, the Dali search engine is used to maintain an effective all-against-all structure comparison of all protein families in the Protein Data Bank. These precalculated results are publicly available in the FSSP database, which is continuously updated and is conveniently accessible on the WWW. Let us look at an example of the benefits from large-scale automated comparative structure analysis as a means of keeping track of the flow of new structures. At the end of June 1995, the Protein Data Bank had released 125 new

© 1995, Elsevier Science Ltd 0968-0004/95/$09.50
[Figure 1, panels (a)-(d)]
(a) Which structural elements in adenosine deaminase correspond to those in urease? Structure-structure alignment of urease (2kau-C) and adenosine deaminase (1add) with secondary structure strings. In total, the common core has 8 strands and 11 helices.
(b) In the light of the structural alignment, is there any common pattern of sequence conservation between the urease and adenosine deaminase? Four metal ligands (*) are conserved in both families. Sequences aligned to 2kau-C: ure1_kleae, ure1_promi, ure1_promv, ure1_bacsb, ure2_helfe, ure1_bacpa, ure2_helpy, urea_canen, ure1_lacfe, ure1_ureur. Sequences aligned to 1add: ada_mouse, ada_human, add_ecoli.
(c) How close are urease and adenosine deaminase in three dimensions? Here, the structures are translated apart horizontally after least-squares three-dimensional superimposition. Metal ions and ligands are highlighted. Panels: urease (TIM barrel domain); adenosine deaminase.
(d) What is known about the function of urease and adenosine deaminase? All sequences are hypertext-linked to SWISS-PROT, where, for example, the following information can be viewed.
ID   URE1_KLEAE   STANDARD; PRT; 567 AA.
CC   -!- CATALYTIC ACTIVITY: UREA + H(2)O = CO(2) + 2 NH(3).
ID   ADA_MOUSE    STANDARD; PRT; 352 AA.
CC   -!- CATALYTIC ACTIVITY: ADENOSINE + H(2)O = INOSINE + NH(3) (ALSO ACTS ON DEOXYADENOSINE).
Figure 1. Fold classification and multiple structure alignments in the FSSP database. A World Wide Web session is illustrated using urease [12] as an example. After first finding the correct Protein Data Bank identifier by typing 'urease' into the search option box (in this example, it is '2kau-C', the representative structure for the urease α subunit), one moves on to the next screen by a hypertext link. This display indicates the position of urease in the protein fold classification tree and makes a convenient starting point to select different alignment views and learn more about the relationship between structure, sequence and function. Here, we use adenosine deaminase for further investigation of its structural similarity with urease. (a) The structural elements in adenosine deaminase corresponding to those in urease. (b) Similarities can also be viewed in terms of sequence. (c) The three-dimensional structures of urease and adenosine deaminase can be directly compared. (d) Further information on the proteins under investigation can be obtained by direct links to SWISS-PROT.
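The question posed in panel (b) can be sketched in a few lines of Python: given equal-length rows of an alignment, report the columns in which every sequence carries the same residue, and the percent identity of any pair over the positions aligned in both. The three short fragments are toy strings made up for illustration, not the alignment shown in Figure 1, and the function names are assumptions rather than part of any FSSP tool.

```python
def conserved_columns(alignment, gap_chars="-."):
    """Return 0-based column indices where all rows share one non-gap residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for i in range(length):
        column = {row[i].upper() for row in alignment}
        if len(column) == 1 and column.isdisjoint(gap_chars):
            conserved.append(i)
    return conserved


def percent_identity(row_a, row_b, gap_chars="-."):
    """Percent of identical residues over columns that are ungapped in both rows."""
    pairs = [
        (x.upper(), y.upper())
        for x, y in zip(row_a, row_b)
        if x not in gap_chars and y not in gap_chars
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy aligned fragments (hypothetical, for illustration only).
toy_alignment = ["GIDTHIHF", "GIDAHVHY", "-VELHVHL"]

print(conserved_columns(toy_alignment))
print(percent_identity(toy_alignment[0], toy_alignment[1]))
print(percent_identity(toy_alignment[0], toy_alignment[2]))
```

In this toy case only the two histidine columns are invariant across all three rows, which mirrors the way a handful of conserved ligand positions can stand out even when overall identity between families is low.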
coordinate entries with a deposition date in the same year. The automatic analysis performed by the FSSP database digested the pool into 105 entries that are structural and sequence homologs (>25% sequence identity) of structures already in the Protein Data Bank and 20 more interesting entries that did not have significant sequence similarity to 'old' structures. Focusing now on these 20 'new' proteins reveals three structurally unique folds, ten recurrent folds without any apparent biological connection, and seven members of functionally and structurally conserved protein families (Table I).

Browsing folds and their neighbours
The user of the FSSP web server can choose from several views (Fig. 1) to
visualize the complicated relationships stored as numbers in the database entries. The all-against-all table of structural similarities opens the door to a classification of folds. For introductory browsing, the fold classification is presented in the form of a tree, which is similar in appearance, though not in meaning, to phylogenetic trees. The fold tree only reflects structural similarity and not all branches correspond to phylogenetic relationships by common descent. Selecting one protein in the classification takes you to the list of structural neighbours from which you may choose to start a walk in fold space, hopping from neighbour to neighbour, or inspect detailed structural alignments between neighbours. The structural neighbours of any one protein of known structure can either be displayed in the form of a multiple alignment or viewed in three-dimensional superimposition using RASMOL [10]. In studying remotely related protein families, it is useful to expand the structural alignment of protein families of known structure by all sequence relatives in the SWISS-PROT databases. Hypertext links are provided to allow browsing of functional annotations and literature references [11].

Access to computational services and biological databases over the Internet, in particular through the WWW, is an increasingly important research tool for the biochemist (Table II). You are invited to use the Web browsers for a closer look at the wonderful world of protein evolution.

Table II. World Wide Web servers related to three-dimensional protein structure

Tool                   Access / URL                                  Description
Dali server            http://www.embl-heidelberg.de/dali/dali.html  Database search by comparison of 3D structures
FSSP database          http://www.sander.embl-heidelberg.de          Structural classification and precalculated structural alignments
SCOP                   http://www.bio.cam.ac.uk/scop/                Structural classification of proteins
CATH                   http://www.biochem.ucl.ac.uk/bsm/cath/        Structural classification of proteins
Protein Data Bank      http://www.pdb.bnl.gov/                       Retrieve 3D coordinates
ExPASy                 http://expasy.hcuge.ch/                       SWISS-PROT sequence database, Swissmodel homology modelling, etc.
RASMOL                 ftp://ftp.dcs.ed.ac.uk/pub/rasmol             Molecular graphics for all platforms starting from PCs
PredictProtein server  http://www.sander.embl-heidelberg.de          Predict secondary structure from sequence
Biotech server         http://www.sander.embl-heidelberg.de          Validate protein structure coordinate sets
SRS server             http://www.embl-heidelberg/srs/srsc           Browse databanks in molecular biology

Acknowledgements
We thank Antoine de Daruvar for software support, and Reinhard Schneider for the HSSP alignment database.

References
1 Altschul, S. F. et al. (1990) J. Mol. Biol. 215, 403-410
2 Pearson, W. R. (1990) Methods Enzymol. 183, 63-98
3 Sander, C. and Schneider, R. (1991) Proteins 9, 56-68
4 Zuckerkandl, E. and Pauling, L. (1965) in Evolving Genes and Proteins (Bryson, V. and Vogel, H. J., eds), pp. 97-166, Academic Press
5 Lesk, A. M. and Chothia, C. (1980) J. Mol. Biol. 136, 225-270
6 Holm, L. and Sander, C. (1994) Proteins 19, 165-173
7 Holm, L. and Sander, C. (1994) FEBS Lett. 340, 129-132
8 Holm, L. and Sander, C. (1995) Trends Biochem. Sci. 20, 345-347
9 Holm, L. and Sander, C. (1993) J. Mol. Biol. 233, 123-138
10 Sayle, R. A. and Milner-White, E. J. (1995) Trends Biochem. Sci. 20, 374-376
11 Etzold, T. and Argos, P. (1993) Comput. Appl. Biosci. 9, 49-57
12 Jabri, E., Carr, M. B., Hausinger, R. P. and Karplus, P. A. (1995) Science 268, 998-1004

LIISA HOLM AND CHRIS SANDER
Protein Design Group, European Molecular Biology Laboratory, D-69012 Heidelberg, Germany.
Proposal for a new regional organization of biochemists - The Federation of African Biochemical Societies (FABS)

Following the formation of the Federation of European Biochemical Societies, FEBS, in 1964, other regional organizations of biochemists emerged. These were the Pan American Association of Biochemical Societies, PAABS, and the Federation of Asian and Oceanian Biochemists, FAOB. A region that has not previously been covered is Africa. A major reason for this was the political troubles in southern Africa but, now that biochemists and others can move freely throughout Africa, it is timely that a regional organization should be set up to enhance the development of biochemistry through the support of local biochemical societies. It is proposed to hold the first Pan African Conference on Biochemistry and Molecular Biology in 'Health, Food Production, Environmental Protection and Industrial Development' in Nairobi, between 2 and 6 September 1996. The formation of FABS will be put forward at this conference and a draft constitution is being prepared. The International Union of Biochemistry and Molecular Biology (IUBMB) has lent its support, in terms of both the conference and the launch of FABS, as has FEBS. The support of other organizations is now being sought. An International Organizing Committee under the Chairmanship of Professor D. W. Makawiti, Chairman of the Biochemical Society of Kenya, is being formed, of which I am a member. Anyone able to assist in the launch of FABS is invited to contact either Professor Makawiti or myself.

University College London, Department of Biochemistry and Molecular Biology, Gower Street, London, UK WC1E 6BT. Tel: +44 171 387 7050 ext. 2169. Email:
[email protected]

Dominic W. Makawiti, Dept. of Biochemistry, Univ. of Nairobi, PO Box 30197, Nairobi, Kenya. Tel: +254 244 2793. Email:
[email protected]