MINIREVIEW
IAIN D CAMPBELL AND CLAUS SPITZFADEN
Building proteins with fibronectin type III modules
Fibronectin type III modules are versatile components of many proteins. Recent structures of module pairs show how these modules are joined together.
Structure 15 May 1994, 2:333-337
Many proteins, especially in higher organisms, have an obvious modular structure (some examples are shown in Fig. 1). The constituent modules are of various kinds but one of the most common is the type III module (Fn3), first found in fibronectin. It has now been found in many different proteins, in species ranging from bacteria to man [1]. In many respects it is a 'model' module, being relatively small (about 100 residues), having a recognizable 'consensus' sequence and having boundaries that are related to the exon structure of the gene [2,3]. Its main functions seem to be to mediate protein-protein interactions and to act as a 'spacer' that places the required biological function in the right position. In the last few years, several three-dimensional structures of Fn3 modules have been solved. These include isolated Fn3 modules [4,5] and module pairs [6,7]. These modules were all 'dissected' from the larger proteins illustrated in Fig. 1. This brief review summarizes some of the structural findings about Fn3 modules and how they affect our understanding of their use in biological systems.
Distribution and functional roles of Fn3 modules
A recent database search for Fn3-related sequences discovered over 300 distinct occurrences in 67 proteins, 60 from animals and 7 from bacteria [1,3]. Comparison of similar proteins from different species shows that Fn3 sequences are relatively invariant in these proteins. This contrasts with the considerable sequence variation between the different Fn3 modules from any one species. It appears that the Fn3 module is a stable and convenient fold, used in different ways to provide both an adaptable functional surface and a spacer. Fn3 modules are found in various biological environments and have a variety of functions. In proteins of higher organisms they are found in extracellular matrix proteins like fibronectin and tenascin as well as in cell surface adhesion molecules like neuroglian and in a variety of different cytokine receptors, such as the growth hormone receptor (HGHr). Structures of Fn3 modules from all these classes of protein have recently been solved (Fig. 1) including, very recently, the first pair of Fn3 modules [7].
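The contrast drawn above, between near-invariant Fn3 sequences across species and divergent Fn3 modules within one species, rests on pairwise sequence identity. A minimal sketch of that measure follows, assuming the two sequences have already been aligned and that identity is counted over columns where neither sequence has a gap (other conventions for the denominator exist). The function name and the toy sequences are hypothetical and are not real Fn3 data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment (made-up residues, not taken from any Fn3 module).
toy_a = "ACDEFGHIK-LMNP"
toy_b = "ACDQFGHLKWLMSP"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Applied to real alignments, the same count would be made between orthologous modules (same module, different species) and between different modules of one protein.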
Fig. 1. Schematic representation showing the modular nature of some extracellular proteins. The red lines indicate other parts of the amino-acid sequences (not drawn to scale). The proteins tenascin and fibronectin are connected covalently to other similar chains. Membrane-bound proteins are shown attached to a membrane (brown).
© Current Biology Ltd ISSN 0969-2126
Structure 1994, Vol 2 No 5
Intracellular proteins also contain Fn3 modules; an example is the muscle-associated molecule titin [8], the largest protein known (3000 kDa). Titin contains numerous Fn3 and immunoglobulin (Ig) modules that are arranged in arrays of 11 single modules. This 11-module repeat seems to have a role both in the control of muscle filament assembly and as a structural component of the thick filament. The individual domains may have distinct binding properties and the repeat distance could give a precise spatial relationship between those binding sites [8]. This aspect is probably also essential for many large molecules of the kind shown in Fig. 1. For example, neuroglian (Ng), a Drosophila neural recognition molecule consisting of six Ig modules and five Fn3 modules [7], has 28% identity with the human neural protein L1. It seems unlikely that all of the modules are involved in specific protein interactions and some are probably required only to arrange binding surfaces in space. Fn3 modules have also been found in bacterial extracellular enzymes involved in the decomposition of polymers such as cellulose [1], and in tapeworm antigens [9]. A study of evolutionary relationships unexpectedly revealed that the bacteria had acquired the Fn3 module from an animal source [1,3]. It appears that the bacterial cellulase enzymes employ the versatile binding functions of the Fn3 module to facilitate binding to substrate. A particularly well-characterized example of a functional role for Fn3 modules in eukaryotes involves specific interactions between the cell and the extracellular matrix. The RGD sequence in FnFn3(10) [this notation means the tenth Fn3 module from fibronectin] is an essential determinant for binding of fibronectin to transmembrane receptors of the integrin family.
Short RGD peptides [10], the isolated FnFn3(10) fragment of fibronectin [5] and certain small proteins [11,12] all bind to integrins and efficiently inhibit cell adhesion to a fibronectin matrix. This binding does not account for the receptor specificity of native fibronectin and other adhesion molecules. In native fibronectin, not only the RGD sequence but also two synergistic adhesion sites, mapped to the eighth and the ninth Fn3 modules, are necessary for full biological activity [13]. These results indicate that receptor binding of fibronectin will depend not only on the conformation and accessibility of the RGD segment, but also on the relative orientation of secondary sites on other Fn3 modules. The recent structure determination of an Fn3 module pair [7] is thus of considerable interest and could aid the interpretation of biochemical studies on fibronectin, provided that the relative Fn3 module orientation is similar in fibronectin.
The structure of the Fn3 module
NMR spectroscopy [5] and X-ray crystallography [4,6,7] have revealed a common molecular topology: seven antiparallel β-strands are arranged in two sheets, A-B-E and C'-C-F-G, enclosing a core of highly conserved hydrophobic residues (Figs 2 and 3). The extent and position of the G-strand vary somewhat in the different molecules and the regular β-strand may be partially disrupted, as in HGHr, or replaced by stretches of polyproline helix, as in NgFn3(1,2). The Fn3 fold may be compared with the immunoglobulin superfamily topology; in both folds the B-E and C-F-G strands form an invariant core, but in the Ig fold, strands A and C' are interchanged between the two sheets. The Fn3 module does not usually contain disulphide bridges, although one of the domains in neuroglian has a bridge between strands A and G (the Ig module usually has a disulphide bond, see Fig. 2). The Fn3 modules have an ellipsoidal shape with approximate overall dimensions of 38 × 20 × 25 Å. The amino and carboxyl termini are at opposite ends of the folded module but on the same side of the ellipsoid. The functional RGD sequence of the tenth Fn3 module of fibronectin [FnFn3(10)] is found on an exposed loop between the F and G strands of the seven β-strand sandwich structure [5] (see Figs 2 and 3). The structure of an RGD sequence bound to an integrin is eagerly
Fig. 2. Alignment of the amino acid sequence and secondary structure of some of the Fn3 and Ig modules shown schematically in Fig. 1. For the double module systems, only the carboxy-terminal module is shown. The Fn3 sequences were aligned after a least square fit superposition of the C and C positions with those of FnFn3(10). Sequence alignments for the Ig modules were taken from [20] and [23]. Highly conserved residues are emphasized in red. Residues at the inter-module interface with the amino-terminal module, as characterized by inter-module heavy-atom distances of less than 4.5 Å (Cα-Cα distances < 8 Å for HGHr), are shown in magenta. The β-strand assignments in the upper panels are indicated for NgFn3(10) [7] and CD2(2) [20] but are closely similar for the homologous modules.
Fibronectin type III modules Campbell and Spitzfaden
Fig. 3. MOLSCRIPT diagrams of the structures of the tenth Fn3 module of fibronectin [FnFn3(10)] and the following double modules: neuroglian [NgFn3(1,2)], human growth hormone receptor (HGHr), human T-cell CD4(1,2), rat T-cell CD4(3,4) and rat T-cell CD2. Fn3 and similar modules are shown against a green background. The β-strands of the carboxy-terminal modules of the double module structures were superimposed with the corresponding strands of FnFn3(10). Light blue β-strands correspond to strands A, B and E. Strands C, C', F and G are in dark blue, strands C" and C' of the Ig type V modules are in grey, and the metal ion bound in the module interface of NgFn3(1,2) is yellow. TnFn3(3) is very similar to FnFn3(10). Coordinates for FnFn3(10) and NgFn3(1,2) were provided by Main et al. [5] and Huber et al. [7], respectively. All other coordinates and the secondary structure assignment are from the Protein Data Bank.
awaited but, meanwhile, one must make do with the structures of active RGD-containing proteins. The NMR structures of FnFn3(10) [5] and two other integrin-binding proteins [11,12] indicate that the RGD loops are exposed and disordered in solution. In the X-ray structure of a tenascin Fn3 module (TnFn3), a shorter RGD loop in the same position appears to be ordered [4] but the biological significance of this relatively short RGD loop in intact tenascin is still unclear. The topologies of Fn3 modules are very similar to modules from T-cell surface glycoproteins [the second module of CD2 and the second and fourth module of CD4 (Fig. 3)]. From sequence comparisons, however, these CD modules are clearly members of the Ig superfamily rather than the Fn3 family (Fig. 2). There is a somewhat stronger sequence similarity between Fn3 modules and the cytokine receptor (CR)-type modules, but the cysteine-rich consensus sequence of this module is rather different again and has often been classified as a different module family (for example [14]). Despite the highly similar topologies of some Ig modules to Fn3, CR modules, and bacterial chaperone proteins like PapD [15], the low sequence homology suggests that some of these families might have converged to form similar structures rather than having arisen by divergence from a common ancestor. It appears that the kind of β-sandwich structure exhibited by these proteins represents a particularly stable fold. This is consistent with observations (see, for example [16]) about topological constraints that restrict the alternatives for stable β-sheet structures.
Module assembly
Several recent structures give clues to how modules are combined to form an intact modular protein. The way that Ig modules assemble in antibodies has been known for some time (reviewed in [17]). Some other types of module pairs have been determined recently by NMR and it is interesting that examples of both flexible and rigid connections have been found for fibronectin type 1 [18] and complement [19] module pairs respectively. In this review, however, we concentrate on the ways in which the structurally similar Ig and Fn3 modules are joined. Some of the known structures are presented in Fig. 3. These data are based on X-ray structures of the two extracellular T-cell surface molecules CD2 [20] and CD4 [21-23] and the extracellular domains of the growth hormone receptor [6] and neuroglian [7]. Apart from NgFn3(1,2), which is an Fn3 module pair, these systems contain either an Fn3 module (as in HGHr) or the topologically related Ig module [as in CD4(1,2), CD4(3,4) and CD2] in the carboxy-terminal position. In Fig. 3, the Fn3 and the corresponding Ig modules were superimposed with the single FnFn3(10) module as a frame of reference. It is apparent from Fig. 3 that modules may adopt strikingly different relative orientations in a pair, despite the similarity of the individual modules. In Ig and Fn3 modules there is a tendency for a 180° rotation of the faces on alternate modules. This is probably because the amino and carboxyl termini are on the same side of the module and a rotation leads to a significant protein surface area at the module interface. The relationship between one module and its neighbour can be
described by a linear displacement and three rotation angles [19]. As can be seen in Fig. 3, these parameters, and the absolute size of the interface area, are rather variable for Fn3 and Ig pairs. In NgFn3(1,2) (which is, to date, the only example of an Fn3 double module) the two modules are related by a rotation of 175° and meet at a tilt angle of 120°. In the other structures these values vary between about 140° (in HGHr) and 160° [in CD4(1,2)] for the rotation, and between about 90° (HGHr) and 135° (CD2) for the tilt angle. While the long axis dimension of one module is about 37 Å, the separations between the centres of gravity vary between 25 and 32 Å depending on the tilt angle and the extension of the linker. The relative positions of the individual modules are stabilized by predominantly hydrophobic residues buried in the module-module interface, surrounded by a fringe of hydrophilic residues. The buried surface area varies widely, between 400 Å² in CD2 [20] and 950 Å² in CD4(3,4). A peculiar feature of the NgFn3(1,2) interface is a metal-binding site. Under the conditions of crystallization, the site is occupied by a sodium ion in an approximately square pyramidal co-ordination sphere. Considering the large buried surface (830 Å²) of the module-module interface of NgFn3(1,2), it seems unlikely that this metal binding is essential for structural stability. The residues involved in the binding also do not seem to be generally conserved among pairs of Fn3 modules, although similar residues are present in the neuroglian homologue L1. To compare module-module interfaces, it is of interest to consider residues involved at the interface, other than the direct linker region joining strand G of the amino-terminal module to the A-strand of the carboxy-terminal module. Analysis of interface residues on the carboxy-terminal module (as defined by module-module heavy-atom distances of up to 4.5 Å) reveals them to be restricted to the B-C and F-G loop regions. In contrast, the interface on the amino-terminal module can involve both loops and β-strands and the sequence position of the contact residues is different in different structures. In NgFn3(1,2) the interface on the first module is composed exclusively of loops C-D and E-F, whereas the modules of the CR and Ig family employ residues on one or several β-strands.
In the most extended structure, CD2, only residues of the A-strand contribute to the interface (apart from the linker). In the more compact CD4 structures the pattern of interaction is more complicated. In CD4(1,2) residues along the entire length of the A-strand form close heavy-atom contacts, whereas in CD4(3,4) β-strands A, C and F, as well as the E-F loop, belong to the contact sphere. As the total buried surface areas of CD4(1,2) and CD4(3,4) are comparable (880 Å² and 950 Å², respectively) it appears that the interaction surface is distributed over different parts of the respective modules, resulting in a slightly different relative orientation of the modules in the two related structures.
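The interface definition used above (any heavy atom of one module within 4.5 Å of a heavy atom of the other) can be sketched as a simple distance search. The following is a minimal illustration only, not the procedure actually used for Fig. 2: the function name, the data layout (a dictionary from residue label to heavy-atom coordinates) and the toy coordinates are all assumptions, and a real analysis would read coordinates from a structure file and exclude the linker residues.

```python
import math


def interface_residues(module_a, module_b, cutoff=4.5):
    """Residues of each module with a heavy atom within `cutoff` (in Å) of the other module.

    module_a, module_b: dict mapping a residue label to a list of (x, y, z)
    heavy-atom coordinates in Å (hypothetical layout chosen for this sketch).
    Returns two sorted lists of residue labels, one per module.
    """
    contacts_a, contacts_b = set(), set()
    for res_a, atoms_a in module_a.items():
        for res_b, atoms_b in module_b.items():
            # A residue pair is in contact if any atom pair is within the cutoff.
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return sorted(contacts_a), sorted(contacts_b)


# Toy coordinates, invented purely to exercise the function.
n_module = {"N1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], "N2": [(0.0, 10.0, 0.0)]}
c_module = {"C1": [(4.0, 0.0, 0.0)], "C2": [(20.0, 0.0, 0.0)]}
print(interface_residues(n_module, c_module))
```

The all-against-all loop is adequate for a pair of modules of about 100 residues each; larger systems would normally use a spatial grid or tree search.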
One interesting aspect of the recent structure determinations of Fn3 modules is that they have provided insights into the assembly of entire mosaic proteins containing repeated arrays of modules. A molecular model of the 15 successive Fn3 modules of tenascin, derived by Leahy et al. [4] from a combination of electron microscopy data and the crystal structure of a single Fn3 module, provided a preliminary picture of what a mosaic protein might look like. In this study, successive modules were tilted to reconcile the 37 Å length of a single Fn3 module with the average 32 Å module spacing derived from electron microscopy. Since the termini of Fn3 modules are on the same face of the molecule, a rotation of 180° was assumed to provide for a maximum interface area. Interestingly, the relative orientation of the two modules in the crystal structure of NgFn3(1,2) is remarkably similar to this prediction. It appears from this close agreement of modelling studies and experimental results for two different proteins that some general rules may be derived, at least in the case of Fn3 modules, for the prediction of the orientation of successive module pairs. However, in light of the considerable structural diversity found for the double modules of the Ig family (Fig. 3), it seems that considerable caution should be exercised when making such predictions. This uncertainty should be resolved by the structure determination of a fragment of human fibronectin, encompassing the seventh to the tenth Fn3 modules, that has recently been crystallized. Good quality diffraction data have been obtained from this material, which was produced in Escherichia coli with selenomethionine incorporated for phase determination [24]. Alignments of Fn3 sequences or Ig sequences ([5], Fig. 2) and of the modules in Fig. 3 do not show any phylogenetic conservation of the residues in the module-module interface.
However, the two highly similar crystal structures from CD4 module pairs demonstrate that similar module-module interactions may be achieved without a recognizable consensus of interface residues. The NgFn3(1,2) metal-binding site is conserved in the homologous human protein L1 but not in the other Fn3 modules of Ng, and this suggests that a conservation of interface residues may be related more to a common biological function of particular module pairs than to a structural requirement. It remains to be seen how mutations of interface residues of Fn3 modules will affect the biological activity, for example integrin binding by FnFn3(9,10).
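The relation between module length, tilt angle and module spacing quoted in this section can be checked with a deliberately crude model. The sketch below assumes that each module is a rigid rod of length 37 Å, that consecutive rods are joined end to end with no linker extension, and that the tilt angle is the angle between the two rods at the junction (180° being fully extended). This idealization is ours, not the modelling procedure of Leahy et al. [4], and the function names are hypothetical. Under these assumptions the centre-to-centre distance is d = L sin(θ/2).

```python
import math


def centre_separation(module_length, tilt_deg):
    """Centre-to-centre distance (Å) of two equal rods joined end to end.

    tilt_deg is the angle between the rods at the junction; 180 means a
    straight, fully extended pair. Linker extension is ignored (assumption).
    """
    half = module_length / 2.0
    tilt = math.radians(tilt_deg)
    # Law of cosines for the triangle: junction, centre of rod 1, centre of rod 2.
    return math.sqrt(2.0 * half * half * (1.0 - math.cos(tilt)))


def tilt_for_spacing(module_length, spacing):
    """Inverse relation: tilt angle (degrees) that gives the requested spacing."""
    ratio = spacing / module_length
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("spacing must lie between 0 and the module length")
    return math.degrees(2.0 * math.asin(ratio))


# Values taken from the text: 37 Å module length, 32 Å spacing, 90-135° tilt angles.
for tilt in (90.0, 120.0, 135.0):
    print(f"tilt {tilt:5.1f} deg -> separation {centre_separation(37.0, tilt):.1f} Å")
print(f"tilt needed for 32 Å spacing: {tilt_for_spacing(37.0, 32.0):.1f} deg")
```

In this simple picture a 32 Å spacing of 37 Å rods corresponds to a tilt of roughly 120°, close to the value observed for NgFn3(1,2). The agreement should be read loosely, since the text notes that the real separations also depend on the extension of the linker.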
Conclusion
In conclusion, structural and functional studies of the kind of molecules shown in Fig. 1 have made remarkable progress recently. This is in spite of these molecules being glycosylated, membrane bound and relatively flexible. Many of the functional patches have now been identified by assay of fragments of the intact
molecule and by site-directed mutagenesis. Most of the module structures are now known at atomic resolution. This allows other members of a module family to be predicted quite well. While it will still be necessary to be careful when modelling the structure of the intact molecules because of uncertainties about module connections, recent structural information on module pairs combined with electron microscopy is beginning to give some guidelines about module assembly.
Acknowledgements. This is a contribution from the Oxford Centre for Molecular Sciences which is supported by the SERC and MRC. IDC acknowledges support from the Wellcome Trust and CS is funded by an EEC Fellowship.
References
1. Bork, P. & Doolittle, R.F. (1992). Proposed acquisition of an animal protein domain by bacteria. Proc. Natl. Acad. Sci. USA 89, 8990-8994.
2. Patthy, L. (1991). Modular exchange principles in proteins. Curr. Opin. Struct. Biol. 1, 351-361.
3. Doolittle, R.F. & Bork, P. (1993). Evolutionarily mobile modules in proteins. Sci. Am. 269, 50-56.
4. Leahy, D.J., Hendrickson, W.A., Aukhil, I. & Erickson, H.P. (1992). Structure of a fibronectin type III domain from tenascin phased by MAD analysis of the selenomethionyl protein. Science 258, 987-991.
5. Main, A.L., Harvey, T.S., Baron, M., Boyd, J. & Campbell, I.D. (1992). The three-dimensional structure of the tenth type III module of fibronectin: an insight into RGD-mediated interactions. Cell 71, 671-678.
6. de Vos, A.M., Ultsch, M. & Kossiakoff, A.A. (1992). Human growth hormone and extracellular domain of its receptor: crystal structure of the complex. Science 255, 306-312.
7. Huber, A.H., Wang, Y.E., Bieber, A.J. & Bjorkman, P.J. (1994). Crystal structure of tandem type III fibronectin domains from Drosophila neuroglian at 2.1 Å. Neuron, in press.
8. Labeit, S., Gautel, M., Lakey, A. & Trinick, J. (1992). Towards a molecular understanding of titin. EMBO J. 11, 1711-1716.
9. Bork, P. & Doolittle, R.F. (1993). Fibronectin type III modules in the receptor phosphatase CD45 and tapeworm antigens. Protein Sci. 2, 1185-1187.
10. Yamada, K.M. (1991). Adhesive recognition sequences. J. Biol. Chem. 266, 12809-12812.
11. Saudek, V., Atkinson, R.A. & Pelton, J.T. (1991). The three dimensional structure of echistatin, the smallest active RGD protein. Biochemistry 30, 7369-7372.
12. Adler, M., Lazarus, R.A., Dennis, M.S. & Wagner, G. (1991). Solution structure of kistrin, a potent platelet aggregation inhibitor and GPIIb-IIIa antagonist. Science 253, 445-448.
13. Nagai, T., Yamakawa, N., Aota, S., Yamada, S.S., Akiyama, S.K. & Yamada, K.M. (1991). Monoclonal antibody characterisation of two distant sites required for function of the central cell-binding domain of fibronectin in cell adhesion, cell migration and matrix assembly. J. Cell Biol. 114, 1295-1305.
14. Barclay, A.N., et al., & Williams, A.F. (1992). The Leucocyte Antigen Facts Book. Academic Press, New York.
15. Holmgren, A., Kuehn, M.J., Brändén, C.-I. & Hultgren, S.J. (1992). Conserved immunoglobulin-like features in a family of periplasmic pilus chaperones in bacteria. EMBO J. 11, 1617-1622.
16. Woolfson, D.N., Evans, P.A., Hutchinson, E.G. & Thornton, J.M. (1993). Topological and stereochemical restrictions in β-sandwich protein structures. Protein Eng. 6, 461-470.
17. Jones, E.Y. (1993). The immunoglobulin superfamily. Curr. Opin. Struct. Biol. 3, 846-852.
18. Williams, M.J., Phan, I., Harvey, T.S., Rostagno, A., Gold, L.I. & Campbell, I.D. (1994). Solution structure of a pair of fibronectin type 1 modules with fibrin binding activity. J. Mol. Biol. 235, 1302-1311.
19. Barlow, P.N., et al., & Campbell, I.D. (1993). Solution structure of a pair of complement modules by nuclear magnetic resonance. J. Mol. Biol. 232, 268-284.
20. Jones, E.Y., Davis, S.J., Williams, A.F., Harlos, K. & Stuart, D.I. (1992). Crystal structure at 2.8 Å resolution of a soluble form of the cell adhesion molecule CD2. Nature 360, 232-239.
21. Wang, J., et al., & Harrison, S.C. (1990). Atomic structure of a fragment of human CD4 containing two immunoglobulin-like domains. Nature 348, 411-418.
22. Ryu, S.-E., et al., & Hendrickson, W.A. (1990). Crystal structure of an HIV-binding recombinant fragment of human CD4. Nature 348, 419-426.
23. Brady, R.L., et al., & Barclay, A.N. (1993). Crystal structure of domains 3 and 4 of rat CD4: relation to the NH2-terminal domains. Science 260, 979-983.
24. Leahy, D.J., Erickson, H.P., Ikramuddin, A., Joshi, P. & Hendrickson, W.A. (1994). Crystallization of a fragment of human fibronectin: introduction of methionine by site-directed mutagenesis to allow phasing via selenomethionine. J. Mol. Biol., in press.
Iain D Campbell and Claus Spitzfaden, Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK.