Novel Sequences Propel Familiar Folds




Structure, Vol. 10, 447–454, April, 2002, ©2002 Elsevier Science Ltd. All rights reserved.

Zahra Jawad and Massimo Paoli (1)
Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QW, United Kingdom

Recent structure determinations have made new additions to a set of strikingly different sequences that give rise to the same topology. Proteins with a β propeller fold are characterized by extreme sequence diversity despite the similarity in their three-dimensional structures. Several fold predictions, based in part on sequence repeats thought to match modular β sheets, have been proved correct.

Introduction

The excess of known protein sequences over known structures has significantly enhanced efforts toward the extrapolation of folds from sequence data. Such predictions can often be facilitated by internal sequence repeats, suggesting a repetitive modular structure. Despite this, it can still be very difficult to gain clues about a fold because a great diversity of sequences can be compatible with the same architecture. This is a characteristic of the β propeller fold, a principally all β sheet topology of proteins with extreme sequence diversity, unrelated functions, and widespread phylogenetic occurrence (Table 1).

The increasingly numerous β propellers are constructed from a modular building block (a four-stranded β sheet) repeated four to eight times around a central axis (Figure 1A). The strands of the β sheet modules are twisted so that the fourth outer strand is almost perpendicular to the first inner strand, giving a propeller-like appearance. Each β sheet in the circular array packs onto the adjacent sheets through hydrophobic contacts to form localized hydrophobic cores (Figure 1A). The nonpolar residues at the sheet-to-sheet interfaces are the biggest constraints on this folding topology [1–5]. At present, this modular construction, highlighted in Figure 1A, has been found in over 20 nonhomologous but structurally related domains (Table 1). On the basis of thorough sequence investigations, many other proteins are predicted to adopt this fold [6–10].
Functional Diversity among β Propellers

In spite of their identical topology and similar structure, β propeller proteins have surprisingly varied functions, ranging from enzyme catalysis to scaffold and signaling molecules, from ligand binding and transport to mediators of protein-protein interactions. Roughly half of known propellers are enzymes; their catalytic activities cover a broad range of reactions and substrates (Table 1). Prosthetic groups are often found bound at the central cavity that forms as a result of β sheet packing (schematically represented in Figure 1B). The d1 haem is bound to nitrite reductase’s eight-bladed propeller [11], and the cofactor pyrrolo quinoline quinone (PQQ) binds noncovalently to the six-bladed glucose dehydrogenase [12] and the eight-bladed methanol, alcohol, and ethanol dehydrogenases [13–16]. A comparison of the glucose and methanol dehydrogenase enzymes revealed that while PQQ binds to the same cavity in both proteins, different interactions hold the ligand in place [17]. From the location of catalytic residues and cofactor binding sites, it appears that the modular arrangement of β sheets acts as a scaffold that acquires specific activities once decorated with functional loop regions (Figure 1B).

The structures of other β propeller proteins, namely haemopexin and tachylectin-2, show that ligand binding can occur at external surfaces of the toroidal molecule. In haemopexin, a haem ligand is bound between two four-bladed β propeller domains [18], and tachylectin-2 binds an oligosaccharide ligand in a small pocket at the outer edge of each of its five modular sheets [19]. An auxiliary binding role has been presumed for the C-terminal propeller domain of matrix metalloproteinases (MMPs). In collagenase (MMP-1), it helps substrate recognition [20], while in gelatinase A (MMP-2) it regulates activity by associating with the tissue inhibitors of MMPs [21]. Protein-protein interactions are involved in the functions of the G protein β subunit [22, 23] and the N-terminal head domain of clathrin [2], which further demonstrates the binding versatility of the propeller fold. The propeller domains of TolB [24, 25] and regulator of chromosome condensation (RCC1) [26] are also implicated in the formation of macromolecular complexes.

(1) Correspondence: [email protected]

PII S0969-2126(02)00750-5

Review

Sequence Repeats and Modular Structure

Proteins with the same propeller fold often have different sequences (Table 1). Interestingly, similarities between the β sheets of a given structure in terms of sequence repetition have been defined in most β propeller proteins. These are sometimes relatively well conserved and hence easily identifiable but can also be weak or limited to a short peptide segment, and some have therefore been detected only after structure determination. Rather than starting with the first strand, the sequence repeat starts with either the second, third, or fourth strand of a β sheet (reviewed in [27]). The polypeptide chain then continues through all the successive modules to finish off with the C-terminal end completing the initial sheet. This tethering of the N and C termini into one modular sheet, referred to as “Velcro” [28] or “molecular clasp” [26], is a recurring feature in nearly all propellers, suggesting that ring closure may be important for the stability of the propeller fold [26–29]. Sequence repeats are generally 40–50 residues in length, although extended loop insertions can result in


Table 1. Modular Construction, Functions, and Origins of Known Proteins with β Propeller Domains

β Sheets | Protein | Function | Origin | Pdb | Ref
•, 4 | haemopexin (N-/C-domains) | haem binding and transport | Mammalian rabbit: Oryctolagus cuniculus | 1QHU | [18]
4 | collagenase C-domain | substrate recognition and binding | Mammalian porcine: Sus scrofa | 1FBL | [20]
4 | gelatinase C-domain | substrate recognition and binding | Mammalian human: Homo sapiens | 1RTG | [21]
•, 5 | tachylectin-2 | carbohydrate binding in immune response | Invertebrate Tachypleus tridentatus | 1TL2 | [19]
•, 6 | sialidase | removal of sialic acid residues | Bacterial Salmonella typhimurium | 2SIL | [66]
•, 6 | neuraminidase | removal of sialic acid residues | Viral Influenza | 1NN2 | [67]
•, 6 | glucose dehydrogenase | conversion of pentose and hexose sugars | Bacterial Acinetobacter calcoaceticus | 1C9U | [12]
•, 6 | phytase | phytic acid hydrolysis | Bacterial Bacillus amyloliquefaciens | 1CVM | [5]
•, 6 | TolB | complex component involved in cell invasion | Bacterial Escherichia coli | 1CRZ, 1C5K | [24] [25]
•, 6 | low-density lipoprotein receptor | receptor signalling/associations | Mammalian human: Homo sapiens | 1IJQ | [48]
•, 6 | diisopropyl fluoro phosphatase | organophosphate compounds hydrolysis | Invertebrate squid: Loligo vulgaris | 1E1A | [57]
•, 6 | tricorn protease β6 subunit | cytosolic protein degradation | Archaea Thermoplasma acidophilum | 1K32 | [52]
•, 7 | methylamine dehydrogenase | oxidation of primary amines | Bacterial Thiobacillus versutus | 2BBK | [53]
•, 7 | galactose oxidase | electron transfer in oxidation of alcohols | Fungal Dactylium dendroides | 1GOG | [35]
•, 7 | transducin G β subunit | scaffold in signal transduction complex | Eukaryotic bovine: Bos taurus | 1GOT | [22] [23]
7 | Tup1 | transcriptional repression | Eukaryotic Saccharomyces cerevisiae | 1ERJ | [40]
7 | ARPC1 | initiation of actin polymerization | Eukaryotic bovine: Bos taurus | 1K8K | [41]
•, 7 | regulator chromosome condensation | control of nucleo-cytoplasmic transport | Eukaryotic human: Homo sapiens | 1A12 | [26]
7 | β-lactamase inhibitor protein-II | inhibition of β-lactamase | Prokaryotic Streptomyces exfoliatus | 1JTD | [30]
•, 7 | prolyl oligopeptidase | hormonal and neural peptides metabolism | Mammalian porcine: Sus scrofa | 1QFS | [3]
•, 7 | clathrin head N-domain | vesicle coating for intracellular transport | Mammalian rat: Rattus norvegicus | 1BPO | [2]
•, 7 | nitrous oxide reductase | reduction of nitrous oxide to nitrogen | Bacterial Pseudomonas nautica | 1QNI | [4]
•, 7 | integrin extracellular segment | cell-cell and cell-matrix adhesion | Eukaryotic Homo sapiens | 1JV2 | [50]
•, 7 | tricorn protease β7 subunit | cytosolic protein degradation | Archaea Thermoplasma acidophilum | 1K32 | [52]
•, 7 | quino-haemo amine dehydrogenase | amines oxidation | Bacterial Paracoccus denitrificans; Pseudomonas putida | 1JJU, 1JMX | [68] [69]
•, 8 | methanol dehydrogenase | oxidation of primary alcohols | Bacterial Methylophilus methylotrophus | 4AAH, 1H4I | [13] [14]
8 | ethanol dehydrogenase | oxidation of primary alcohols | Bacterial Pseudomonas aeruginosa | 1FLG | [16]
8 | alcohol dehydrogenase | oxidation of primary alcohols | Bacterial Comamonas testosteroni | 1KB0 | [15]
•, 8 | nitrite reductase | reduction of nitrite and oxygen | Bacterial Thiosphaera pantotropha | 1AOF | [11]

The symbol • denotes proteins with distinct sequences or sequence repeats.

modules of over 80 residues. The most striking example of repeated sequence is found in tachylectin-2, where five nearly identical repeats form a regular and highly symmetrical five-bladed domain [19]. A relatively strong consensus occurs in haemopexin domains, where the sequences of the first and second strands in each β sheet are the most conserved [27]. In RCC1 the repeated motif Val-Tyr-x-Trp-Gly is associated with the clustering of hydrophobic side chains [26]. Strong sequence repeats, relatively similar to RCC1, are found in the β-lactamase inhibitor protein II (BLIP-II) [30]. However, in this propeller domain, the sequence repeats make an


Figure 1. Schematic Diagrams Illustrating the Modularity of Proteins with a β Propeller Fold
(A) The β sheet, or blade, containing the N terminus (and the Velcro in all but the four-bladed haemopexin domains) is shown in dark blue. The order of the colors highlights the increasing number of modules in the structures. The following propeller structures are shown: the four-bladed assembly of the N-terminal domain of haemopexin (1QHU); the five-bladed structure of tachylectin-2 (1TL2); the six-bladed assembly of the YWTD protein low-density lipoprotein receptor (1IJQ); the seven-bladed structure of the β subunit of the G protein, Gβ (1GOT); and finally the eight-bladed propeller of methanol dehydrogenase (4AAH). The first panel in the figure was prepared using Ribbons [59], while the remaining panels were produced with MOLSCRIPT [60] and Raster3D [61]. All the atomic coordinates used were taken from the Protein Data Bank [62].
(B) Simplistic conceptual diagram summarizing the various properties and features of members of the propeller fold. A six-bladed array is shown as a representative of the β propeller scaffold.


unusual modular unit in which the canonical fourth strand is replaced by a partial helical turn. Methanol dehydrogenase provides another example of sequence repeats with the “tryptophan docking motif,” characterized by the consensus Ala-x-Asp/Asn-x-x-Thr-Gly-Asp/Glu-x-x-Trp [14].

In the proteins mentioned above, the relatively high level of sequence similarity between modules identifies a clear sequence consensus or repeated motif. Sequence similarities are much weaker in cases such as the WD [28, 31, 8] and kelch [32, 9] repeats, two distinct and phylogenetically dispersed sequence repeats. WD repeats generally occur four to eight times [31], though as many as 16 are present in the telomerase gene [33]. They are found mainly in eukaryotic proteins involved in cellular processes like vesicular fusion, signaling, chemotaxis, cytoskeletal organization, RNA processing, and transcription. The approximately 40-residue repeat is characterized by a glycine-histidine (GH) doublet and a tryptophan-aspartate (WD) doublet, separated by hydrophobic residues at conserved positions. The structural determination of the G protein transducin β subunit revealed a seven-bladed array built by seven WD repeats [22, 23]. Between five and seven kelch repeats have been identified in numerous intra- and extracellular proteins [34, 32, 9]. Their consensus features a conserved tryptophan, a glycine doublet, and a tyrosine residue with intervening hydrophobic positions. Seven of these 44–56 amino acid repeats are present in galactose oxidase, which has been shown to be a seven-bladed propeller [35].

Predictions of β Propeller Folds

The strong correlation of the WD repeats and modular structure in Gβ suggests that all WD proteins adopt a β propeller fold [36, 37, 8]. The same proposition has been made for proteins with kelch repeats [32]. These predictions led to structural models for the WD protein Sec13 [38] and the kelch protein scruin [39].
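The consensus motifs quoted above can be written as simple text patterns. The following is a minimal sketch, not a method used in the work reviewed here: the tryptophan docking pattern is a direct transcription of the consensus Ala-x-Asp/Asn-x-x-Thr-Gly-Asp/Glu-x-x-Trp, whereas the WD pattern is only a loose illustration in which the 20–40 residue spacing between the GH and WD doublets is an assumed window. The function name `find_motifs` and the toy sequences are hypothetical. Real repeat detection relies on profile methods, since many genuine repeats deviate from any strict consensus.

```python
import re

# Direct transcription of the consensus quoted in the text:
# Ala-x-Asp/Asn-x-x-Thr-Gly-Asp/Glu-x-x-Trp
TRP_DOCKING = re.compile(r"A.[DN]..TG[DE]..W")

# Loose WD-repeat sketch: a GH doublet followed by a WD doublet.
# ASSUMPTION: the 20-40 residue spacing is an illustrative window,
# not a value taken from the article.
WD_REPEAT = re.compile(r"GH.{20,40}?WD")


def find_motifs(sequence, pattern):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    print(find_motifs("MKAQDLLTGDPYWSS", TRP_DOCKING))
    print(find_motifs("AAGH" + "S" * 25 + "WDKK", WD_REPEAT))
```

A strict pattern of this kind finds only well-conserved repeats; weak repeats such as many WD and kelch modules would be missed, which is consistent with the difficulty described in the text.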
Recently, the structures of two seven WD repeat proteins, the yeast transcriptional corepressor Tup1 [40] and the actin polymerization initiator protein ARPC1 [41], were shown to be seven-bladed propellers, reinforcing the expectation that all WD proteins fold into propeller domains. The detection of “tryptophan docking motifs” in alcohol dehydrogenase led to a predictive propeller model [42, 43], which was shown to be correct by the recent structure determination [15]. In addition, haemopexin-like repeats have been identified in the fish warm-acclimation protein WAP65 [44] and the mammalian glycoprotein vitronectin [45], suggesting that propeller domains may exist in these proteins.

In all the cases above, fold predictions were based on the knowledge of a structure containing sequence repeats that were then found in other proteins. Fold recognition is clearly more challenging when novel sequence repeats are found and no structural information is available on related proteins. Because of this, and given the diverse nature of sequences in various propeller domains, it has been argued that identification of this fold from sequence data would be unlikely [46]. However, the propeller folds of several proteins have

been predicted correctly, primarily on the basis of sequence repeats. In the bacterial periplasmic protein TolB, the presence of Ala-x-Ser-Pro-Asp motifs and the high β strand content indicated by sequence-based secondary structure analysis led to the prediction of a six-bladed β propeller domain [47], which was confirmed once the structure was determined [24, 25]. In a second instance, an insightful analysis was carried out into the nature of the Tyr-Trp-Thr-Asp (YWTD) motif [7], discovered in over 60 types of extracellular domains with diverse functions. On the basis of theoretical arguments, sequence and secondary structure analysis, threading methods, and experimental data, these repeats were predicted to assume a compact modular structure with a β propeller fold built by six sheets. Remarkably, the recent structure determination of the low-density lipoprotein receptor, a YWTD protein, indeed showed a six-bladed propeller fold [48]. This outcome thus indicates that all YWTD proteins, which are some of the most abundant modular extracellular proteins, probably contain propeller domains. Similarly, based on a Phe-Gly-(x)n-Gly-Ala-Pro motif associated with hydrophobic residues occurring at conserved positions, the α subunit of integrin was predicted to contain a β propeller domain [49]. As shown by the structure of the extracellular segment of integrin [50], this forecast was correct. Finally, another successful prediction was made about the tricorn protease complex [51], which contains two distinct propeller domains, one six bladed and one seven bladed [52]. These predictive studies indicate that it is possible to use the information from sequence repeats in the recognition of propeller folds, though this may only be possible in cases in which the repeats are defined enough to be detected prior to structure determination. Fold predictions have been made for the members of family 32 of glycosyl-hydrolase enzymes [6], some of which have sequence identities as low as 13%.
Despite the lack of a clear and conserved motif, the combined results from secondary structure predictions and threading methods all led to the prediction of a six-bladed propeller architecture. In another study, weak repeats were detected in the eukaryotic family of UV-damaged DNA binding proteins [10]. The presence of 16–21 repeats has been postulated to correspond to structures with two to three β propeller domains. These predictions await confirmation by structure determination.
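The step from a repeat count to a number of propeller domains is simple arithmetic, sketched below under stated assumptions. The function `possible_domain_counts` is hypothetical and is not taken from the cited study; it only asks how many domains could accommodate a given number of repeats if each domain holds between `min_blades` and `max_blades` sheets. The default range of four to eight blades comes from the known structures described in the text; the narrower six-to-eight range used in the example is an assumption chosen for illustration.

```python
from math import ceil


def possible_domain_counts(n_repeats, min_blades=4, max_blades=8):
    """List the domain counts k for which n_repeats can be split into
    k propellers, each with min_blades..max_blades blades.

    A split into k domains exists exactly when
    k * min_blades <= n_repeats <= k * max_blades.
    """
    fewest = ceil(n_repeats / max_blades)   # every domain as large as allowed
    most = n_repeats // min_blades          # every domain as small as allowed
    return list(range(fewest, most + 1))


if __name__ == "__main__":
    for n in (16, 21):
        # ASSUMPTION: six to eight blades per domain, for illustration only.
        print(n, possible_domain_counts(n, min_blades=6, max_blades=8))
```

With the assumed six-to-eight blade range, 16 repeats fit two domains and 21 repeats fit three, in line with the two to three domains postulated in the text; with the full four-to-eight range the answer is less constrained.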

Extreme Diversity of Sequences Compatible with the Same Folding Topology

Fold recognition is hindered by the high degree of diversity in sequences of proteins with a propeller topology. In addition, irregularities in the fold and loop insertions can make this task even harder. For these reasons, meaningful sequence alignments can be obtained only from the superposition of structures. Modular β sheets from distinct structures superimpose with rms deviations in the 0.9–2.9 Å range [53, 36, 28, 3, 25, 4]. Figure 2A reports a structure-based sequence alignment of the β sheets of some representative seven-bladed propellers. These topologically identical structures have strikingly different sequences. It appears


Figure 2. The Extreme Sequence Diversity between Distinct but Structurally Similar Propeller Domains and the Sequence Relationships between Modular β Sheets within a Given Structure
The structure-based sequence alignment includes all the modular sheets from two nonhomologous seven-bladed (A) and six-bladed (B) propellers and some additional randomly chosen modules from other representative structures. (More comprehensive alignments are provided at www-cryst.bioc.cam.ac.uk/~max/res_propeller.html.) Equivalent atoms and initial superpositions were found using the program O [63]; atom matches were then used as input to MNYFIT and COMPARER [64]. The alignments are annotated, using the program JOY [65], with codes about the structural environment of the residues. Amino acids are shown in their one-letter code, and numbers introduced in the sequence represent residues from loops that cannot be structurally matched. Residues at conserved solvent-excluded positions, predominantly nonpolar, are highlighted by the § symbol, and residues responsible for repeated electrostatic interactions are highlighted by the • symbol.
Upper case, solvent inaccessible; lower case, solvent exposed; bold, hydrogen bond to main chain amide; underlined, hydrogen bond to main chain carbonyl; italics, positive phi; blue, beta conformation; red, alpha conformation; ¶ symbol, start/end of chain.
gp, G β subunit (1GOT); rc, regulator chromosome condensation (1A12); md, methylamine dehydrogenase (2BBK); go, galactose oxidase (1GOG); po, prolyl oligopeptidase (1QFS); cl, clathrin head domain (1BPO); no, nitrous oxide reductase (1QNI); lr, low-density lipoprotein receptor (1IJQ); tb, TolB (1CRZ); ne, neuraminidase (1NN2); si, sialidase (2SIL); gd, glucose dehydrogenase (1C9U); ph, thermophilic phytase (1CVM); fp, diisopropylfluorophosphatase (1E1A).

from Figure 2A that there is no single amino acid conserved throughout these molecules. However, many β sheets have a polar side chain, often an aspartate, at the end of the third strand (highlighted by the symbol • in Figure 2A), involved in electrostatic interactions with main chain atoms. Visual inspection of structural superpositions revealed that these recurring hydrogen bonds probably stabilize the exposed backbone atoms of the turn between the third and fourth strands [27]. Mutagenesis of aspartate residues in two WD proteins resulted in some loss of folding stability [37].

The degree of sequence diversity amongst the six-bladed structures is also fascinating, as seen from the structure-based sequence alignment reported in Figure 2B. Predictions of six-bladed propeller domains in YWTD proteins, glycosyl-hydrolases, as well as in proteins with six WD repeats and six kelch repeats (such as Sec13 and scruin, respectively), add to the increasing number of proteins with a six β sheet propeller architecture.

The structure-based sequence alignments in Figures 2A and 2B show that nonpolar side chains, generally solvent inaccessible, are conserved at positions located in the central part of the strands in modular sheets from different domains (highlighted by the symbol §). This is apparent in the first three strands, while the fourth outer strand is varied and irregular. It is not surprising that hydrophobic groups predominantly occur at these positions since they are responsible for the contacts at the intersheet cores [1, 3–5], thus determining the assembly of the fold. Figures 2A and 2B also effectively show that many diverse sequences are compatible with the same folding architecture, since β sheet modules from distinct propeller structures have no significant sequence identity between them. For instance, comparison of the seven-bladed nitrous oxide reductase sequence with proteins of similar structure such as Gβ or methylamine dehydrogenase gives sequence identities of 7%–9% [4], which fall within the noise level. This extreme sequence diversity means that new structure determinations may show known folds more often than expected. An interesting case is that of the β/α propeller, an unusual propeller topology originally found in L-arg:gly amidino transferase [54] but not immediately recognized in the structure of the ribosome anti-association factor IF6 [55, 56, 27].
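Sequence identities such as the 7%–9% quoted above are computed from a pairwise alignment. The sketch below shows one common convention, assumed here rather than taken from the cited work: identity is the number of identical residue pairs divided by the number of aligned columns in which neither sequence has a gap. The function name `percent_identity` and the toy aligned strings are hypothetical; other conventions divide by the shorter sequence length or by the full alignment length and give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue.

    ASSUMPTION: gap-containing columns are excluded from the denominator;
    this is one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments invented for illustration only.
    print(percent_identity("AC-DE", "ACWDF"))
```

Because two unrelated sequences already share a few percent of positions by chance, values in the single digits carry no evidence of homology, which is the sense in which the text describes them as noise.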
Evolution of β Propeller Folds

The evolutionary origin of the β propeller fold is an intriguing issue given that the sequences of many proteins with this architecture are clearly unrelated, while the similarities of the sequences of modular β sheets within a propeller protein indicate that an evolutionary relationship may exist. It was first suggested that the seven-bladed domain in methylamine dehydrogenase could have been preceded by a propeller with seven identical β sheets [53]. This possibility is made more plausible by the structures of tachylectin-2 [19] and BLIP-II [30], which are the closest examples to arrays with identical β sheets. These repeated sequences suggest that the structure arose from a primordial β sheet gene that underwent multiple events of duplication and fusion. Further supporting evidence comes from the WD proteins, where different numbers of repeats may mean that several copies of an ancestor WD gene have combined.

It has also been argued that the propeller topology does not impose specific constraints on the sequence, so that β sheet modules within an array could evolve divergently, thus losing their sequence homology [1]. This could be the case for some domains in which no obvious consensus repeats are present, such as viral neuraminidase. Most propeller domains, however, have internal repeats that are sometimes identified only after structure determination, as in the case of diisopropylfluorophosphatase [57]. The functional diversity of proteins with a propeller fold also hints that divergence from an overall common precursor architecture is unlikely and evolution may have taken place from distinct ancestral β sheet genes. Perhaps, given the number of unrelated sequences seen to be compatible with its architecture (Figures 2A and 2B), this folding topology benefits from an intrinsic stability gained upon circularization that provides an evolutionary advantage. Further evolution by circular permutation, module insertion, or deletion could be possible because of the low-energy interactions between adjacent sheets [27]. Ultimately, the only major requirement for modular β sheets to fold and pack into a propeller assembly is the presence of hydrophobic residues at conserved positions [1, 3–5], as shown in Figures 2A and 2B.

Conclusions

The β propeller fold is intriguing for the strikingly different sequences that give rise to closely superimposable tertiary structures. Structure-based sequence alignments show that the predominantly common feature is the conservation of residues with hydrophobic character at central positions in the strands. Remarkably, this extreme sequence diversity is accompanied by greatly different functions and phylogenetic origins. In contrast to the α/β barrel fold, or TIM barrel, which has been repeatedly used in nature for purposes of catalysis [58], the β propeller architecture is extremely adaptable to a variety of complex and specialized biological tasks. This is likely to be related to the presumed independent evolution of its members from distinct ancestor proteins.
The marked sequence and functional diversity of these molecules, as well as the presence of irregularities in the fold and inserted loop regions, makes the prediction of β propeller folds from sequence data exceptionally challenging. However, the presence of sequence repeats, together with the modular construction of the fold, made possible predictions that were subsequently proved correct by structure determinations. In the past four years, the number of nonhomologous proteins with a β propeller fold in the PDB has doubled. Given that all proteins with WD, kelch, and YWTD repeats are thought to adopt a propeller fold, it is certain that this folding topology will prove much more common than first anticipated.

Acknowledgments

We thank the BBSRC for financial support and are grateful to F. Hollfelder, M. Sunde, and M. Hyvönen for critical reading of the manuscript.

References

1. Murzin, A.G. (1992). Structural principles for the propeller assembly of β-sheets: the preference for seven-fold symmetry. Proteins 14, 191–201.
2. Haar, E., Musacchio, A., Harrison, S.C., and Kirchhausen, T. (1998). Atomic structure of clathrin: a β propeller terminal domain joins an α zigzag linker. Cell 95, 563–573.
3. Fülöp, V., Böcskei, Z., and Polgár, L. (1998). Prolyl oligopeptidase: an unusual β propeller domain regulates proteolysis. Cell 94, 161–170.
4. Brown, K., Tegoni, M., Prudencio, M., Pereira, A.S., Besson, S., Moura, J.J., Moura, I., and Cambillau, C. (2000). A novel type of catalytic copper cluster in nitrous oxide reductase. Nat. Struct. Biol. 7, 191–195.
5. Ha, N.C., Oh, B.C., Shin, S., Kim, H.J., Oh, T.K., Kim, Y.O., Choi, K.Y., and Oh, B.H. (2000). Crystal structures of a novel, thermostable phytase in partially and fully calcium-loaded states. Nat. Struct. Biol. 7, 147–153.
6. Pons, T., Olmea, O., Chinea, G., Beldarrain, A., Marquez, G., Acosta, N., Rodriguez, L., and Valencia, A. (1998). Structural model for Family 32 of glycosyl-hydrolase enzymes. Proteins 33, 383–395.
7. Springer, T.A. (1998). An extracellular β propeller module predicted in lipoprotein and scavenger receptors, tyrosine kinases, epidermal growth factor precursor, and extracellular matrix components. J. Mol. Biol. 283, 837–862.
8. Smith, T.F., Gaitatzes, C., Saxena, K., and Neer, E.J. (1999). The WD repeat: a common architecture for diverse functions. TIBS 24, 181–185.
9. Adams, J., Kelso, R., and Cooley, L. (2000). The kelch repeat superfamily of proteins: propellers of cell functions. Trends Cell Biol. 10, 17–24.
10. Neuwald, A.F., and Poleksic, A. (2000). PSI-BLAST searches using hidden Markov models of structural repeats: prediction of an unusual sliding DNA clamp and of β propellers in UV-damaged DNA-binding protein. Nucleic Acids Res. 28, 3570–3580.
11. Fülöp, V., Moir, J.W.B., Ferguson, S.J., and Hajdu, J. (1995). The anatomy of a bifunctional enzyme: structural basis for reduction of oxygen to water and synthesis of nitric oxide by cytochrome cd1. Cell 81, 369–377.
12. Oubrie, A., Rozeboom, H.J., Kalk, K.H., Duine, J.A., and Dijkstra, B.W. (1999). The 1.7 Å crystal structure of the apo form of the soluble quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus reveals a novel internal conserved sequence repeat. J. Mol. Biol. 289, 319–333.
13. Xia, Z., Dai, W., Xiong, J., Hao, Z., Davidson, V.L., White, S., and Mathews, S.F. (1992). The three-dimensional structures of methanol dehydrogenase from two methylotrophic bacteria at 2.6 Å resolution. J. Biol. Chem. 267, 22289–22297.
14. Ghosh, M., Anthony, C., Harlos, K., Goodwin, M.G., and Blake, C. (1995). The refined structure of the quinoprotein methanol dehydrogenase from Methylobacterium extorquens at 1.94 Å. Structure 3, 177–187.
15. Oubrie, A., Rozeboom, H.J., Kalk, K.H., Huizinga, E.G., and Dijkstra, B.W. (2002). Crystal structure of quinohemoprotein alcohol dehydrogenase from Comamonas testosteroni: structural basis for substrate oxidation and electron transfer. J. Mol. Biol. 277, 3727–3732.
16. Keitel, T., Diehl, A., Knaute, T., Stezowski, J.J., Hohne, W., and Gorisch, H. (2000). X-ray structure of the quinoprotein ethanol dehydrogenase from Pseudomonas aeruginosa: basis of substrate specificity. J. Mol. Biol. 297, 961–974.
17. Oubrie, A., and Dijkstra, B.W. (2000). Structural requirements of pyrrolo quinoline quinone dependent enzymes. Protein Sci. 9, 1265–1273.
18. Paoli, M., Anderson, B.F., Baker, H.M., Morgan, W.T., Smith, A., and Baker, E.N. (1999). Crystal structure of haemopexin reveals a novel high-affinity haem site formed between two β propeller domains. Nat. Struct. Biol. 6, 926–931.
19. Beisel, H.G., Kawabata, S., Iwanaga, S., Huber, R., and Bode, W. (1999). Tachylectin-2: crystal structure of a specific GlcNAc/GalNAc-binding lectin involved in the innate immunity host defense of the Japanese horseshoe crab Tachypleus tridentatus. EMBO J. 18, 2313–2322.
20. Li, J., O’Hare, M.C., Skarzynski, T., Lloyd, L.F., Curry, V.A., Clark, I.M., Bigg, H.F., Hazleman, B.L., Cawston, T.E., and Blow, D.M. (1995). Structure of full-length porcine synovial collagenase reveals a C-terminal domain containing a calcium-linked, four-bladed β propeller. Structure 3, 541–549.
21. Libson, A.M., Gittis, A.G., Collier, I.E., Marmer, B.L., Goldberg, G.I., and Lattman, E.E. (1995). Crystal structure of the haemopexin-like C-terminal domain of gelatinase A. Nat. Struct. Biol. 2, 938–942.
22. Wall, M.A., Coleman, D.E., Ethan, L., Iñiguez-Lluhi, J.A., Posner, B.A., Gilman, A.G., and Sprang, S.R. (1995). The structure of the G protein heterotrimer Giα1β1γ2. Cell 83, 1047–1058.
23. Sondek, J., Bohm, A., Lambright, D.G., Hamm, H.E., and Sigler, P.B. (1996). Crystal structure of a G protein βγ dimer at 2.1 Å resolution. Nature 379, 369–374.
24. Abergel, C., Bouveret, E., Claverie, J.M., Brown, K., Rigal, A., Lazdunski, C., and Bénédetti, H. (1999). Structure of the Escherichia coli TolB protein determined by MAD methods at 1.95 Å resolution. Structure 7, 1291–1300.
25. Carr, S., Penfold, C.N., Bamford, V., James, R., and Hemmings, A.M. (1999). The structure of TolB, an essential component of the tol-dependent translocation system, and its protein-protein interaction with the translocation domain of colicin E9. Structure 8, 57–66.
26. Renault, L., Nassar, N., Vetter, I., Bekker, J., Klebe, C., Roth, M., and Wittinghofer, A. (1998). The 1.7 Å crystal structure of the regulator of chromosome condensation (RCC1) reveals a seven-bladed propeller. Nature 392, 97–100.
27. Paoli, M. (2001). Protein folds propelled by diversity. Prog. Mol. Biol. Biophys. 76, 103–130.
28. Neer, E.J., and Smith, T.F. (1996). G protein heterodimers: new structures propel new questions. Cell 84, 175–178.
29. Fülöp, V., and Jones, D.T. (1999). β propeller: structural rigidity and functional diversity. Curr. Opin. Struct. Biol. 9, 715–721.
30. Lim, D., Park, H.U., De Castro, L., Kang, S.G., Lee, H.S., Jensen, S., Lee, K.J., and Strynadka, N.C. (2001). Crystal structure and kinetic analysis of β-lactamase inhibitor protein-II in complex with TEM-1 β-lactamase. Nat. Struct. Biol. 8, 848–852.
31. Neer, E.J., Schmidt, C.J., Nambudripad, R., and Smith, T.F. (1994). The ancient regulatory-protein family of WD-repeat proteins. Nature 371, 297–300.
32. Bork, P., and Doolittle, R.F. (1994). Drosophila kelch motif is derived from a common enzyme fold. J. Mol. Biol. 236, 1277–1282.
33. Nakayama, J., Saito, M., Nakamura, H., Matsuura, A., and Ishikawa, F. (1997). TLP1: A gene encoding a protein component of mammalian telomerase is a novel member of WD repeat family. Cell 88, 875–884.
34. Xue, F., and Cooley, L. (1993). Kelch encodes a component of intercellular bridges in Drosophila egg chambers. Cell 72, 681–693.
35. Ito, N., Phillips, S.E., Stevens, C., Ogel, Z.B., McPherson, M.J., Keen, J.N., Yadav, K.D., and Knowles, P.F. (1991). Novel thioester bond revealed by a 1.7 Å crystal structure of galactose oxidase. Nature 350, 87–90.
36. Garcia-Higuera, I., Fenoglio, J., Li, Y., Lewies, C., Panchenko, M.P., Reiner, O., Smith, T.F., and Neer, E.J. (1996). Folding of proteins with WD-repeats: comparison of six members of the WD-repeat superfamily to the G protein β subunit. Biochemistry 35, 13985–13994.
37. Garcia-Higuera, I., Gaitatzes, C., Smith, T.F., and Neer, E.J. (1998). Folding a WD repeat propeller. Role of highly conserved aspartic acid residues in the G protein beta subunit and Sec13. J. Biol. Chem. 273, 9041–9049.
38. Saxena, K., Gaitatzes, C., Walsh, M.T., Eck, M., Neer, E.J., and Smith, T.F. (1996). Analysis of the physical properties and molecular modelling of Sec13: a WD repeat protein involved in vesicular traffic. Biochemistry 35, 15215–15221.
39. Sun, S., Footer, M., and Matsudaira, P. (1997). Modification of cys-837 identifies an actin-binding site in the β propeller protein scruin. Mol. Biol. Cell 8, 421–430.
40. Sprague, E., Redd, M.J., Johnson, A.D., and Wolberger, C. (2000). Structure of the C-terminal domain of Tup1, a corepressor of transcription in yeast. EMBO J. 19, 3016–3027.
41. Robinson, R.C., Turbedsky, K., Kaiser, D.A., Marchand, J.B., Higgs, H.N., Choe, S., and Pollard, T.D.
(2001). Crystal structure of Arp2/3 complex. Science 294, 1679–1684.

42. Cozier, G.E., Giles, I.G., and Anthony, C. (1995). The structure of the quinoprotein alcohol dehydrogenase of Acetobacter aceti modelled on that of methanol dehydrogenase from Methylobacterium extorquens. Biochem. J. 308, 375–379.
43. Jongejan, A., Jongejan, J.A., and Duine, J.A. (1998). Homology model of the quinoprotein alcohol dehydrogenase from Comamonas testosteroni. Protein Eng. 11, 185–198.
44. Kikuchi, K., Yamashita, M., Watabe, S., and Aida, K. (1995). The warm temperature acclimation-related 65-kDa protein, Wap65, in goldfish and its gene expression. J. Biol. Chem. 270, 17087–17092.
45. Hunt, L.T., Barker, W.C., and Chen, H.R. (1987). A domain structure common to hemopexin, vitronectin, interstitial collagenase, and a collagenase homolog. Prot. Seq. Data Anal. 1, 21–26.
46. Baker, S.C., Saunders, N.F.W., Willis, A.C., Ferguson, S.J., Hajdu, J., and Fülöp, V. (1997). Cytochrome cd1 structure: unusual haem environments in a nitrite reductase and analysis of factors contributing to β propeller folds. J. Mol. Biol. 269, 440–455.
47. Ponting, C.P., and Pallen, M.J. (1999). A β propeller domain within TolB. Mol. Microb. 3, 739–740.
48. Jeon, H., Meng, W., Takagi, J., Eck, M.J., Springer, T.A., and Blacklow, S.C. (2001). Implications for familial hypercholesterolemia from the structure of the LDL receptor YWTD-EGF domain pair. Nat. Struct. Biol. 8, 499–504.
49. Springer, T.A. (1997). Folding of the N-terminal, ligand-binding region of the integrin α subunits into a β propeller domain. Proc. Natl. Acad. Sci. USA 94, 65–72.
50. Xiong, J.P., Stehle, T., Diefenbach, B., Zhang, R., Dunker, R., Scott, D.L., Joachimiak, A., Goodman, S.L., and Arnaout, M.A. (2001). Crystal structure of the extracellular segment of integrin αVβ3. Science 294, 339–345.
51. Ponting, C.P., and Pallen, M.J. (1999). β propeller repeats and a PDZ domain in the tricorn protease: predicted self-compartmentalisation and C-terminal polypeptide-binding strategies of substrate selection. FEMS Microbiol. Lett. 179, 447–451.
52. Brandstetter, H., Kim, J.S., Groll, M., and Huber, R. (2001). Crystal structure of the tricorn protease reveals a protein disassembly line. Nature 414, 466–470.
53. Vellieux, F.M.D., Huitema, F., Goendijk, H., Kalk, K.H., Franz, J., Jongejan, J.A., Duine, J.A., Petratos, K., Drenth, J., and Hol, W.G.J. (1989). Structure of quinoprotein methylamine dehydrogenase at 2.25 Å resolution. EMBO J. 8, 2171–2178.
54. Humm, A., Fritsche, E., Steinbacher, S., and Huber, R. (1997). Crystal structure and mechanism of human L-arginine:glycine amidino-transferase: a mitochondrial enzyme involved in creatine biosynthesis. EMBO J. 16, 3373–3385.
55. Groft, C.M., Beckmann, R., Sali, A., and Burley, S.K. (2000). Crystal structure of ribosome anti-association factor IF6. Nat. Struct. Biol. 7, 1156–1164.
56. Paoli, M. (2001). An elusive propeller-like fold. Nat. Struct. Biol. 8, 744.
57. Scharff, E.I., Koepke, J., Fritzsch, G., Lucke, C., and Ruterjans, H. (2001). Crystal structure of diisopropylfluorophosphatase from Loligo vulgaris. Structure 9, 493–502.
58. Brändén, C.-I. (1991). The TIM barrel: the most frequently occurring folding motif in proteins. Curr. Opin. Struct. Biol. 1, 978–983.
59. Carson, M. (1991). Ribbons 2.0. J. Appl. Crystallogr. 24, 958–961.
60. Kraulis, P.J. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946–950.
61. Merritt, E.A., and Bacon, D.J. (1997). Raster3D. Photorealistic molecular graphics. Meth. Enzymol. 277, 505–524.
62. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.E., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977). The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
63. Jones, T.A., Zou, J.-Y., Cowan, S.W., and Kjeldgaard, M. (1991). Improved methods for building protein models into electron density maps and the location of errors in these models. Acta Crystallogr. A47, 110–119.
64. Sali, A., and Blundell, T.L. (1990). The definition of topological equivalence in homologous and analogous structures: a procedure involving comparison of local properties and structural relationships through dynamic programming and simulated annealing. J. Mol. Biol. 212, 403–428.
65. Mizuguchi, K., Deane, C.M., Johnson, M.S., Blundell, T.L., and Overington, J.P. (1998). JOY: protein sequence-structure representation and analysis. Bioinformatics 14, 617–623.
66. Crennell, S.J., Garman, E.F., Laver, W.G., Vimr, E.R., and Taylor, G.L. (1993). Crystal structure of a bacterial sialidase (from Salmonella typhimurium LT2) shows the same fold as an influenza virus neuraminidase. Proc. Natl. Acad. Sci. USA 90, 9852–9856.
67. Varghese, J.N., Laver, W.G., and Colman, P.M. (1983). Structure of the influenza virus glycoprotein antigen neuraminidase at 2.9 Å resolution. Nature 303, 35–40.
68. Datta, S., Mori, Y., Takagi, K., Kawaguchi, K., Chen, Z.W., Okajima, T., Kuroda, S., Ikeda, T., Kano, K., Tanizawa, K., and Mathews, F.S. (2001). Structure of a quinohemoprotein amine dehydrogenase with an uncommon redox cofactor and highly unusual crosslinking. Proc. Natl. Acad. Sci. USA 98, 14268–14273.
69. Satoh, A., Kim, J.K., Miyahara, I., Devreese, B., Vandenberghe, I., Hacisalihoglu, A., Okajima, T., Kuroda, S., Adachi, O., Duine, J.A., Van Beeumen, J., Tanizawa, K., and Hirotsu, K. (2002). Crystal structure of quinohemoprotein amine dehydrogenase from Pseudomonas putida. Identification of a novel quinone cofactor encaged by multiple thioether cross-bridges. J. Biol. Chem. 277, 2830–2834.