Large conformational fluctuations of the multi-domain xylanase Z of Clostridium thermocellum


Journal of Structural Biology 191 (2015) 68–75 Contents lists available at ScienceDirect Journal of Structural Biology journal homepage: www.elsevie...



Journal of Structural Biology journal homepage: www.elsevier.com/locate/yjsbi

Bartosz Różycki a,⇑, Marek Cieplak a, Mirjam Czjzek b

a Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, 02-668 Warsaw, Poland
b Sorbonne Universités, UPMC, Université Paris 06, and Centre National de la Recherche Scientifique, UMR 8227, Integrative Biology of Marine Models, Station Biologique de Roscoff, CS 90074, F-29688 Roscoff cedex, Bretagne, France

Article info

Article history:
Received 10 March 2015
Received in revised form 15 April 2015
Accepted 22 May 2015
Available online 23 May 2015

Keywords:
Multi-domain proteins
Intrinsically disordered proteins
Cellulosome
Small angle X-ray scattering
Coarse-grained simulations
Conformational ensemble

Abstract

The cellulosome is a multi-enzyme machinery which efficiently degrades plant cell-wall polysaccharides. The multiple domains of the cellulosome complexes are often tethered to one another by intrinsically disordered regions. The properties and functions of these disordered linkers are unknown to a large extent. In this work, we study the conformational variability of one component of the cellulosome – the multi-domain xylanase Z (XynZ) of Clostridium thermocellum. We use a coarse-grained protein model to efficiently simulate conformations of the enzyme. Our simulation results are in excellent agreement with data from small angle X-ray scattering experiments, which validates the simulation outcome. Both in the presence and absence of the cohesin domain, the XynZ enzyme appears to be flexible in the sense that it takes various compact and extended conformations. The physical interactions between the individual domains are rather weak and transient, and the XynZ enzyme is held together mainly by the flexible linkers connecting the domains. The end-to-end distance distributions for the flexible linkers can be rationalized by the excluded volume effect. Taken together, our results provide a detailed picture of the conformational ensemble of the XynZ enzyme in solution.

© 2015 Elsevier Inc. All rights reserved.

⇑ Corresponding author. Tel.: +48 22 116 3265. E-mail address: [email protected] (B. Różycki).
http://dx.doi.org/10.1016/j.jsb.2015.05.004
1047-8477/© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Cellulosome complexes are multi-enzyme machines present on the surface of many cellulolytic microorganisms (Shoham et al., 1999; Bayer et al., 2004). They efficiently degrade plant cell wall polysaccharides – including cellulose – one of the most abundant organic polymers on Earth (Bayer et al., 2004, 2007). A major challenge in current research on cellulosomes is to understand how the cellulosome complexes are constructed and how their molecular architectures and activities are related (Bayer et al., 2007; Vazana et al., 2013). Importantly, the detailed structural characterization of the cellulosome complexes can provide a platform for biotechnological and nanotechnological applications, including prospects for producing biofuels from plant-cell-wall biomass (Bayer et al., 2007).

The multiple subunits of cellulosomes are composed of numerous functional domains which interact with each other and with the cellulose substrate. The scaffoldin subunit selectively integrates the various enzymatic subunits into the complex. This task is accomplished by the tight binding of cohesin domains present in the scaffoldin subunit to dockerin domains present in each of the enzymatic subunits. The functional domains of the cellulosome complexes are often connected together into subunits by flexible linker peptides (Ossowski et al., 2005; Hammel et al., 2005). The functions of the linkers are not quite clear, but it appears that the cellulosome complexes have to be flexible to some degree in order to explore the environment and bind the multiple enzymatic subunits to cellulose fibers efficiently.

There is no single method that could yield atomic structures of the full-length cellulosome complexes: they are not directly accessible to X-ray crystallography due to the presence of the disordered and flexible segments (although their constituent domains can be crystallized separately); they are also not accessible to protein NMR because of their large sizes; and their inherent flexibility and the lack of symmetries make them practically inaccessible to cryo-EM. Therefore, to delineate the representative conformations of such macromolecular assemblies as the cellulosomes, various complementary methods must be combined (Różycki and Boura, 2014). In particular, small angle X-ray scattering (SAXS) in solution is increasingly used to complement protein crystallography in structural studies on multi-protein complexes, including the cellulosome complexes (Ossowski et al., 2005; Hammel et al., 2005;


Currie et al., 2013, 2012; Czjzek et al., 2012). In addition, these structural methods combined with molecular dynamics or other simulation methods can lead to insights into dynamic properties and conformational heterogeneity of the protein complexes under study (Różycki and Boura, 2014).

The pioneering work using SAXS to study the modular architecture of full-length enzymes was done as early as 1988 (Abuja et al., 1988). Several years later, analogous experiments were conducted on multi-modular cellulases (Receveur et al., 2002) and cellulosomal complexes (Hammel et al., 2005), implementing the ‘dissect and build’ approach. These studies have highlighted the importance of the linker structure, which is extended in the modular enzymes and tends to pleat upon protein–protein complex formation of different cellulosomal components. More recently, the crystal structure of the CohI-X-DocII segment from the C-terminus of the C. thermocellum scaffoldin subunit, complexed on either side by the cognate DocI or CohII domains, revealed four distinct conformations of the internal scaffoldin CohI-X linker that were trapped in the crystal structure (Currie et al., 2012). These findings have led the authors to suggest that this linker region is both flexible and unstructured in solution (Currie et al., 2012).

The XynZ protein is a single polypeptide chain that comprises four functional domains (CE1, CBM6, dockerin-I, and GH10) (Grepinet et al., 1988; Fontes et al., 1995), which are connected in series by three linkers, see Fig. 1. The three linkers have been predicted to be unstructured and flexible in solution (Czjzek et al., 2012). The N-terminal CE1 domain facilitates the hydrolysis of the ferulate ester groups which are involved in the cross-linking between hemicelluloses and between hemicellulose and lignin. The CBM6 domain binds carbohydrates and, thus, facilitates the adhesion of the XynZ enzyme to plant cell-wall polysaccharides.
The type I dockerin domain binds the complementary cohesin


domain. The C-terminal GH10 domain is a xylanase catalytic module that degrades the linear polysaccharide beta-1,4-xylan into xylose, thus breaking down hemicellulose. The XynZ–cohesin complex is formed when the type I dockerin domain binds its cohesin.

Atomic structures of the individual domains of the XynZ–cohesin complex are known, but the behavior and roles of the linkers are somewhat unclear. Attempts to crystallize the full-length XynZ enzyme have failed, probably due to its intrinsic flexibility and weak interactions between the constituent domains. However, SAXS data for the full-length enzyme (both with and without the cohesin domain) are available. Based on the SAXS data and the high-resolution structures of the constituent domains, an atomic model of the XynZ–cohesin complex has been derived (Czjzek et al., 2012). This model, however, represents only a single ‘snapshot’ of the flexible protein complex. The question we pose in this study concerns the degree of conformational fluctuations of the multi-domain XynZ enzyme in solution.

To explore physical conformations of XynZ in solution, we use a simulation model that has been developed to study large multi-protein assemblies on the basis of residue-level coarse-graining, a transferable energy function, and enhanced sampling methods (Kim and Hummer, 2008). Recently, this model has been successfully applied to systems ranging from ESCRT protein complexes (Różycki et al., 2011; Boura et al., 2011, 2012) to multi-domain protein kinases (Leonard et al., 2011) and kinases in dynamic complexes with phosphatases (Francis et al., 2011a,b). In these studies, simulations have been combined with SAXS, FRET, and pulsed EPR experiments to identify ensembles of protein conformations. The XynZ–cohesin complex is similar to the aforementioned protein systems in that it comprises both well-folded domains and disordered linker peptides. Also, the size of the XynZ–cohesin complex is comparable to that of the ESCRT complexes (Boura et al., 2011, 2012). Therefore, the methodology we have developed in our earlier studies on multi-domain proteins (Kim and Hummer, 2008; Różycki et al., 2011; Boura et al., 2011, 2012; Leonard et al., 2011; Francis et al., 2011a,b) is directly applicable to the study of the XynZ enzyme.

In this study we first validate our simulations of XynZ using the experimental SAXS data. We next characterize the conformational variability of the XynZ enzyme, both in the presence and absence of the cohesin domain. In particular, we find that the distributions of the end-to-end distance of the disordered linkers are almost unaffected by the presence of the cohesin domain. We also rationalize the range of extensions of the linkers by excluded volume effects. Our results thus shed light on the properties and functions of the disordered linkers present in the XynZ enzyme.

Fig. 1. Molecular architecture of the XynZ–cohesin complex. (A) Schematic of the domain structure of the XynZ–cohesin complex. The locations of the domain boundaries in the sequence are indicated. The three linkers connecting the domains are denoted by L1, L2, and L3. (B) Conformations of the XynZ–cohesin complex that individually fit the experimental SAXS data. The color code is as in panel A: CE1 in blue, CBM6 in gray, dockerin in orange, cohesin in green, GH10 in tan, flexible linkers in red. These conformations correspond to the red points in Fig. 3(B).

2. Methods

2.1. Simulations

To sample physical conformations of the XynZ–cohesin complex in solution, we use a coarse-grained model equipped with a transferable energy function that has been developed to simulate protein binding. In the framework of this model, amino acid residues are represented as spherical beads centered at the Cα atoms. The interactions between the residue beads are described by amino-acid-dependent pair potentials and Debye–Hückel-type electrostatics. Folded protein domains are treated as rigid bodies, whereas flexible linker peptides connecting the domains are represented as polymers of amino acid beads with bending, stretching and torsion potentials. The two free parameters of the interaction energy – which scale the pair potentials relative to the known electrostatic interactions – have been determined by fitting the osmotic second virial coefficient of lysozyme and the binding affinity of the ubiquitin–CUE complex. The basic energy unit is given by kB T0, where kB is the Boltzmann constant and T0 = 300 K is room temperature. For a series of protein complexes, the binding affinities estimated from the ensemble of simulated structures have been found to agree quantitatively with experiment (in most cases within a factor of four, corresponding to an error in the overall free energy of less than 1 kcal/mol). A detailed description of the model can be found in Kim and Hummer (2008).

Here, the rigid domains are CE1 (PDB code 1JJF, residues 1 to 255 in XynZ), CBM6 (PDB code 1GMM, residues 268 to 393 in XynZ), cohesin/dockerin complex type I (PDB code 1OHZ, residues 400 to 461 in XynZ, and residues 1 to 140 in the cohesin), and GH10 (PDB code 1XYZ, residues 486 to 806 in XynZ) (Czjzek et al., 2012). The flexible linkers comprise residues 256 to 267 (linker between CE1 and CBM6), 394 to 399 (linker between CBM6 and the cohesin/dockerin complex type I), and 462 to 485 (linker between the cohesin/dockerin complex and the GH10 domain) in XynZ.

We performed extensive Monte Carlo (MC) simulations of the coarse-grained model. The basic MC steps were rigid-body translational and rotational moves on each domain. For flexible linker peptides, in addition to local MC moves on each residue, crankshaft moves were employed to enhance sampling. The protein conformations were saved every 5000 MC steps, which resulted in an ensemble of N = 10,000 conformations for further analysis. In order to check that the simulation results were not biased by the initial conformation, we performed two independent simulations that were started with different conformations.
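Two numbers quoted in this subsection can be checked with a few lines of arithmetic. The Python sketch below converts a factor-of-four uncertainty in binding affinity into a free-energy error at T0 = 300 K, and counts the residues in each linker from the quoted residue ranges. The function names, the dictionary layout, and the value used for the gas constant are our own illustrative choices; they are not taken from the simulation code of Kim and Hummer (2008).

```python
import math

# Gas constant in kcal/(mol K); a standard value, assumed here for illustration.
R_KCAL = 1.987204e-3
T0 = 300.0  # room temperature used as the energy scale in the text, in K


def free_energy_error(affinity_factor, temperature=T0):
    """Free-energy error (kcal/mol) implied by a multiplicative affinity error."""
    return R_KCAL * temperature * math.log(affinity_factor)


# Linker residue ranges in XynZ as quoted in the text (first, last; inclusive).
LINKERS = {"L1": (256, 267), "L2": (394, 399), "L3": (462, 485)}


def linker_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


if __name__ == "__main__":
    print(f"factor of 4 -> {free_energy_error(4.0):.2f} kcal/mol")
    for name, (first, last) in LINKERS.items():
        print(name, linker_length(first, last), "residues")
```

Under these assumptions a factor of four corresponds to roughly 0.8 kcal/mol, consistent with the statement that the free-energy error is below 1 kcal/mol, and the linker lengths come out as 12, 6 and 24 residues.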

2.2. Characterization of the shapes and sizes of the simulated conformations

The characteristic sizes of the simulated conformations can be described by the radius of gyration

$$R_g = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\vec{r}_i - \vec{r}_{cm}\right)^2} \qquad (1)$$

and the maximum extension

$$D_{max} = \max_{(i,j)}\left|\vec{r}_i - \vec{r}_j\right| \qquad (2)$$

2.3. SAXS intensity calculation for coarse-grained protein models

In general, computing scattering intensity profiles requires all-atom structures as input. The most widely distributed program for computing SAXS intensity profiles from atomic structures of proteins is CRYSOL (Svergun et al., 1995). Its direct application is to verify whether a given protein structure can faithfully represent the protein conformation in solution. We have recently developed a simple method to compute scattering intensity profiles of proteins using only the coordinates of their Cα atoms as input (Różycki et al., 2011). In the scattering calculations we represent single amino acids by spherical beads centered at the Cα atoms, which is a reasonable approximation because the spatial resolution of biomolecular SAXS is much lower than the size of single amino acids. As in the all-atom scattering calculations (Svergun et al., 1995), we take the solvent electron density to be 0.334 e/Å³, where e is the electron charge, and treat the electron density of the hydration shell as a free parameter which can be varied between x = 0 (meaning no hydration shell) and x = 0.03 e/Å³.

In a recent structural study on cellulosome complexes (Czjzek et al., 2012), an all-atom model of the XynZ–cohesin complex has been derived: the atomic structures of the constituent domains (CE1, CBM6, dockerin-I, GH10, and cohesin) have been connected with linker peptides and organized in space in such a way that the resulting model of the XynZ–cohesin complex, M, gives a scattering intensity profile IM(q) that fits the experimental SAXS data Iexp(q). Model M represents a single ‘snapshot’ of the flexible XynZ–cohesin complex. Importantly, it also allows us to further test our algorithm for computing the SAXS intensity profiles of proteins using only the coordinates of the Cα atoms, see Fig. 2.

To select the optimal value of the hydration shell electron density, x, in the scattering calculations, we computed the SAXS intensity profile of model M using CRYSOL (Svergun et al., 1995) (in this calculation, the input was all atoms of model M) and our program (Różycki et al., 2011) (in this case, the input for the calculation was only the coordinates of the Cα atoms of model M). Fig. 2 shows that the two programs give practically identical outputs if we take x = 0.005 e/Å³ in the Cα scattering calculation. We therefore used this particular value of x to compute the SAXS intensity profiles of all the simulated conformations. As a result, we could compute the average intensity profile

$$I_{sim}(q) = \frac{1}{N}\sum_{k=1}^{N} I_k(q) \qquad (4)$$
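The ensemble average of Eq. (4) can be illustrated with a minimal Python sketch. As a stand-in for the per-conformation profile Ik(q) it uses the plain Debye formula over bead positions with identical, q-independent form factors and no hydration shell. This is a deliberate simplification for illustration and is not the Cα-based method of Różycki et al. (2011), which includes a hydration-shell term. The function names `debye_intensity` and `ensemble_average` are hypothetical.

```python
import numpy as np


def debye_intensity(coords, q):
    """Debye-formula intensity for point-like beads with unit form factors.

    coords: (M, 3) bead positions in Angstrom; q: (Q,) values in 1/Angstrom.
    Simplifying assumption: every bead scatters identically and there is no
    solvent or hydration-shell contribution.
    """
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(q, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.linalg.norm(diff, axis=-1)  # (M, M) pair distances
    # np.sinc(x) = sin(pi x)/(pi x), so sin(q r)/(q r) = np.sinc(q r / pi).
    # Looping over q keeps memory at O(M^2) instead of O(Q M^2).
    return np.array([np.sinc(qi * r / np.pi).sum() for qi in q])


def ensemble_average(conformations, q):
    """Eq. (4): arithmetic mean of the per-conformation profiles I_k(q)."""
    profiles = np.array([debye_intensity(c, q) for c in conformations])
    return profiles.mean(axis=0)


if __name__ == "__main__":
    q = np.linspace(0.0, 0.3, 4)
    pair = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
    print(ensemble_average([pair, pair], q))
```

Because Eq. (4) is a plain average of intensities, a profile that fits the data on average does not imply that any single Ik(q) fits it.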

In Eqs. (1) and (2), $\vec{r}_i$ and $\vec{r}_j$ are the coordinate vectors of Cα atoms i and j, respectively, M is the total number of amino acid residues in the XynZ–cohesin complex (M = 946) or in the free XynZ enzyme (M = 806), and $\vec{r}_{cm} = \frac{1}{M}\sum_{j=1}^{M}\vec{r}_j$ is the vector that describes the location of the center of mass. Another useful quantity is a distortion parameter, w, which depends on all three main radii, $R_k = \sqrt{J_k/M}$, that are associated with the eigenvalues $J_k$ of the tensor of inertia of a given conformation (Sikora et al., 2011). These eigenvalues, and thus the corresponding radii, describe the instantaneous shape of the protein. Using the convention that R1 is the smallest radius and R3 is the largest one, the distortion parameter is defined as

$$w = \frac{\Delta R}{\bar{R}} \qquad (3)$$

where $\bar{R} = (R_1 + R_3)/2$ and $\Delta R = R_2 - \bar{R}$. Spherical shapes correspond to w ≈ 0. Elongated cigar-like shapes yield substantial positive values of w. Substantial negative values of w indicate pancake-like shapes.
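The three shape descriptors of Eqs. (1)–(3) can be computed directly from Cα coordinates. The Python sketch below is one possible implementation under the assumption that all beads carry unit mass, which matches the 1/M normalization in the text; the function names are our own.

```python
import numpy as np


def radius_of_gyration(coords):
    """Eq. (1): root-mean-square distance of the beads from their centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())


def max_extension(coords):
    """Eq. (2): largest pairwise distance between beads."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1).max()


def distortion_parameter(coords):
    """Eq. (3): w = (R2 - Rbar) / Rbar with Rbar = (R1 + R3) / 2.

    R_k = sqrt(J_k / M), where J_k are the eigenvalues of the inertia tensor
    computed with unit masses (an assumption). Degenerate inputs with
    Rbar = 0, such as a single bead, are not handled.
    """
    coords = np.asarray(coords, dtype=float)
    c = coords - coords.mean(axis=0)
    # Inertia tensor: sum_i (|r_i|^2 * identity - r_i r_i^T)
    inertia = (c ** 2).sum() * np.eye(3) - c.T @ c
    eigenvalues = np.linalg.eigvalsh(inertia)    # ascending order
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against round-off
    r1, r2, r3 = np.sqrt(eigenvalues / len(c))
    r_bar = 0.5 * (r1 + r3)
    return (r2 - r_bar) / r_bar


if __name__ == "__main__":
    rod = [[0.0, 0.0, float(z)] for z in range(10)]
    print(radius_of_gyration(rod), max_extension(rod), distortion_parameter(rod))
```

With this convention a straight rod gives w = 1, a shape with three equal principal moments gives w = 0, and a flat ring gives a negative w, in line with the cigar-like and pancake-like interpretation given above.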

Fig. 2. Scattering intensity profile of the XynZ–cohesin model (Czjzek et al., 2012) computed using CRYSOL (Svergun et al., 1995) with the all-atom model M and experimental SAXS data as input (solid curve in magenta), and our program with x = 0.005 e/Å³ and only the Cα atoms of model M as input (black dashed curve). Taking values of x that are either smaller or larger than 0.005 e/Å³ leads to worse agreement with the CRYSOL calculation outcome.

The average profile Isim(q) of Eq. (4) characterizes a particular simulation run. Here, Ik(q) denotes the intensity profile that corresponds to the kth conformation generated in the simulation, and N = 10⁴ is the number of conformations obtained in the simulation.

3. Results and discussion

3.1. XynZ–cohesin complex

To sample physical conformations of the XynZ–cohesin complex in solution, we performed simulations of the protein complex as described in the Methods section. For each of the XynZ–cohesin conformations that were generated in the simulations, we computed the scattering intensity profile Ik(q), where the index k = 1, ..., N labels the simulated conformations. We next compared the average intensity profile Isim(q) as given by Eq. (4) with the experimental SAXS intensity profile Iexp(q). Fig. 3(A) shows that Iexp(q) is in excellent agreement with Isim(q). Two remarks are in order here: (i) The experimental SAXS data were not used as input to our simulations. They were used here to validate the simulation outcome. (ii) The computed Isim(q) curves represent the scattering intensity profile averaged over N = 10⁴ conformations of the XynZ–cohesin complex generated in the simulations. The intensity profiles of individual conformations, Ik(q), do not necessarily fit the experimental SAXS data.

To characterize the conformational variability of the XynZ–cohesin complex in solution, we measured its characteristic dimensions as functions of the simulation progression. Specifically, for each of the simulated conformations we computed the radius of gyration, Rg, and the maximum extension, Dmax, as given by Eqs. (1) and (2), respectively. Fig. 3(B) shows Dmax versus Rg as obtained in the simulations. Here, each of the data points represents a simulated conformation. The radius of gyration varies between 31 and 64 Å with an average value of ⟨Rg⟩ ≈ 45 Å, and the maximum extension varies between 95 and 195 Å with an average of ⟨Dmax⟩ ≈ 140 Å. Visual inspection of the conformations reveals that the XynZ–cohesin complex attains various compact and extended conformations.
In the compact conformations with small Rg and Dmax, all the constituent domains make extensive contacts with at least one neighboring domain. In the extended conformations with large Rg and Dmax, both the CE1 and GH10 domains are spatially separated from all other domains, and the protein complex is held together primarily by the flexible linkers. One extremely compact and one very extended conformation are marked in Fig. 3(B) with purple and green dots, respectively. Neither the compact nor the extended conformation fits the experimental SAXS data, as indicated by the purple and green curves in Fig. 3(C). We could identify several conformations that individually fit the experimental SAXS data quite well. They are depicted in Fig. 1(B) and represented by red squares in Fig. 3(B). These optimal conformations are similar in size (Rg ≈ 45 Å and Dmax ≈ 140 Å) but they differ from one another in that the relative orientations of the individual domains are somewhat different. Interestingly, in all of these conformations, the GH10 domain makes no direct contacts with any other domain of the XynZ–cohesin complex. Thus none of these conformations can represent a stable conformation in solution. This observation indicates that the GH10 domain is not held in one particular orientation relative to the protein complex. We obtained various compact and extended conformations that jointly fit the experimental SAXS data very well. The conformations that individually fit the experimental data are neither truly compact nor very extended, see Fig. 1(B). Their size is basically identical to the ensemble-averaged size of the XynZ–cohesin complex (⟨Rg⟩ ≈ 45 Å and ⟨Dmax⟩ ≈ 140 Å). Therefore, these individual conformations are representative of the structural ensemble but they


Fig. 3. Results of simulations of the XynZ–cohesin complex. (A) Comparison of experimental SAXS results Iexp(q) (black dots) with the scattering intensity profile Isim(q) obtained in the simulations (red curve). The experimental SAXS data are taken from Czjzek et al. (2012). The theoretical Isim(q) curve was computed using Eq. (4) with N = 10⁴ conformations of the XynZ–cohesin complex. (B) The maximum extension versus the gyration radius of the simulated conformations. The purple and green dots mark one compact and one extended conformation, respectively. The red squares indicate conformations that individually fit the experimental SAXS data. These conformations are depicted in Fig. 1(B). The corresponding intensity profiles, Ik(q), are shown in panel (C) in red. The scattering intensity profiles that correspond to the compact and extended conformations are shown in panel (C) in purple and green, respectively. For comparison, the experimental SAXS data are shown as black dots in panel (C).

do not reflect the large conformational changes occurring in the system. These large conformational fluctuations, however, could be relevant to the biological function of the cellulosome complexes.

To characterize the possible shapes of the XynZ–cohesin complex, it is convenient to introduce two parameters. The first one is the end-to-end distance, Dee, between the N-terminus of the CE1 domain and the C-terminus of the GH10 domain. It is simply the end-to-end distance of XynZ. The second one is the distortion parameter, w, as given by Eq. (3). Spherical shapes correspond to w ≈ 0. Elongated cigar-like shapes yield substantial positive values of w. Substantial negative values of w indicate pancake-like shapes. Fig. 4 shows that Dee of the XynZ–cohesin complex varies between 10 and 140 Å, and the values of w vary between −0.1 and 0.6. The conformations with large Dee or Dmax are typically elongated and exhibit substantial positive values of w. The conformations with w ≈ −0.1 are somewhat flattened. There is also a

Table 1. Fractions of the numbers of conformations in which two specified domains are in contact. The errors indicate differences between two independent simulation runs.

Fig. 4. The end-to-end distance versus the distortion parameter as defined by Eq. (3) for the XynZ–cohesin complex. Each of the data points corresponds to a simulated conformation. Characteristic conformations are depicted in the same color scheme as in Fig. 1. The elongated conformations are characterized by substantial positive values of w and large values of Dee. The conformations with globular shapes correspond to w ≈ 0. Their end-to-end distance varies between 10 and 120 Å. The conformations with w ≈ −0.1, which are observed rather rarely, appear to be flattened to some degree. As in Fig. 3, the red squares indicate the six conformations shown in Fig. 1 that individually fit the experimental SAXS data. Their shapes are somewhat elongated and correspond to w ≈ 0.3. The most frequent conformations correspond to the data points localized inside the orange contour line. The numbers of data points inside and outside the orange line are approximately equal.

variety of globular shapes with w ≈ 0 that exhibit Dee between 10 and 120 Å.

To characterize the spatial arrangement of the multi-domain XynZ–cohesin complex, we analyzed the inter-domain contacts. We have assumed that two residues are in contact if their Cα atoms are separated by less than 8 Å. The resulting map of transient contacts is shown in Fig. 5(A). Interestingly, the most frequent contacts, which are marked as red dots in Fig. 5, occur between the dockerin and CBM6 domains. The corresponding binding mode

Domains in contact    XynZ with cohesin    XynZ without cohesin
CBM6-dockerin         0.65 ± 0.02          0.59 ± 0.08
CBM6-cohesin          0.31 ± 0.01          –
CE1-CBM6              0.20 ± 0.01          0.23 ± 0.02
GH10-cohesin          0.19 ± 0.03          –
CE1-cohesin           0.11 ± 0.01          –
CE1-dockerin          0.11 ± 0.03          0.37 ± 0.03
GH10-dockerin         0.09 ± 0.02          0.30 ± 0.03
CBM6-GH10             0.08 ± 0.01          0.15 ± 0.02
CE1-GH10              0.07 ± 0.01          0.14 ± 0.02

involves residues G348, C366 and S367 in the CBM6 domain, and L400, Y454 and L456 in the dockerin domain. If there is at least one contact between residues in two separate domains, we take the two domains to be in contact. Table 1 shows the probabilities of contacts between the different domains. For example, CBM6 is in contact with dockerin and cohesin, respectively, in about 65% and 31% of all the simulated conformations. In contrast, the domains that contact each other most rarely are CE1 and GH10.

The XynZ enzyme contains three flexible linkers that connect four domains, see Fig. 1(A). An important quantity that characterizes the extension of the linkers is their end-to-end distance dee. Using the XynZ–cohesin conformations generated in the simulations, we calculated the end-to-end distance distributions, P(dee), for the flexible linkers present in the complex, see Fig. 6. We note that these distributions appear relatively broad, with long ‘tails’, which reflects the dynamical behavior of the XynZ–cohesin complex. The P(dee) distributions can be characterized by their average ⟨dee⟩ and standard deviation σee = ⟨(dee − ⟨dee⟩)²⟩^(1/2). For the shortest linker L2, which comprises residues 394 to 399, i.e. the residues between the CBM6 and dockerin domains, we obtain ⟨dee⟩ = 11.7 Å and σee = 2.3 Å. For the 12-residue linker L1 connecting CE1 and CBM6 (residues 256 to 267) we get ⟨dee⟩ = 21.1 Å and σee = 4.6 Å. For the longest, 24-residue linker L3 between dockerin and GH10 (residues 462 to 485) we obtain ⟨dee⟩ = 34.3 Å and σee = 8.6 Å. The results are summarized in Table 2. Interestingly, the average end-to-end distance per residue, ⟨dee⟩/n, decreases with the number of residues in the linker, n, giving about 2 Å for L2, 1.8 Å for L1, and 1.4 Å for L3. This result implies that shorter linkers are effectively more extended.

Fig. 5. The map of inter-domain contacts. Panels (A) and (B) correspond to the presence and absence of the cohesin domain, respectively. The boundaries of the domains are indicated by the horizontal and vertical lines. The colors correspond to the frequency of residue–residue contacts. The blue, orange and red dots correspond to contact frequencies between 0.1% and 1%, between 1% and 10%, and between 10% and 100%, respectively. The black dots in panel (A) indicate the contacts between the cohesin and dockerin domains, which have a contact frequency of 100% because the cohesin–dockerin complex is treated as one rigid unit in the simulations, see Methods for details.
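The contact analysis behind Table 1 and Fig. 5 follows two rules stated in the text: two residues are in contact if their Cα atoms are closer than 8 Å, and two domains are in contact if at least one such residue pair exists. A minimal Python sketch of these rules is given below. The function names and the index-list representation of domains are hypothetical and serve only to illustrate the criterion, not to reproduce the authors' analysis code.

```python
import numpy as np


def domains_in_contact(coords_a, coords_b, cutoff=8.0):
    """True if any C-alpha pair across the two domains is closer than cutoff (Angstrom)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    distances = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return bool((distances < cutoff).any())


def contact_fraction(conformations, idx_a, idx_b, cutoff=8.0):
    """Fraction of conformations in which domains A and B are in contact.

    conformations: iterable of (M, 3) coordinate arrays;
    idx_a, idx_b: residue indices (0-based) belonging to each domain.
    """
    hits = [
        domains_in_contact(np.asarray(c)[idx_a], np.asarray(c)[idx_b], cutoff)
        for c in conformations
    ]
    return float(np.mean(hits))


if __name__ == "__main__":
    close = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
    far = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
    print(contact_fraction([close, far], [0], [1]))
```

Applied to two independent simulation runs, the difference between the resulting fractions would give error estimates of the kind reported in Table 1.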

Fig. 6. Distributions of the end-to-end distance for the three flexible linkers L1, L2 and L3. The blue and red lines, respectively, correspond to the presence and absence of the cohesin domain in the simulation system. The black dashed lines correspond to free linker peptides.

Table 2. The average end-to-end distance, ⟨dee⟩, and the root-mean-square end-to-end distance, ⟨dee²⟩^(1/2), for linkers L1, L2 and L3. The number of amino acid residues in linkers L1, L2 and L3 is n = 12, n = 6 and n = 24, respectively.

Linker   XynZ with cohesin        XynZ without cohesin     Free peptides            Random coil
         ⟨dee⟩     ⟨dee²⟩^(1/2)   ⟨dee⟩     ⟨dee²⟩^(1/2)   ⟨dee⟩     ⟨dee²⟩^(1/2)   ⟨dee²⟩^(1/2)
L1       21.1 Å    21.6 Å         21.2 Å    21.7 Å         18.3 Å    19.0 Å         13.2 Å
L2       11.7 Å    11.9 Å         12.1 Å    12.3 Å         11.4 Å    11.7 Å          9.3 Å
L3       34.3 Å    35.3 Å         32.0 Å    32.9 Å         30.3 Å    31.8 Å         18.6 Å
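The entries of Table 2 can be cross-checked against the means and widths quoted in the text for the XynZ–cohesin complex, since the root-mean-square distance follows from the mean and the standard deviation of the same distribution. The Python sketch below performs this check and evaluates the extension per residue. The dictionary names are our own. The final comparison of the "Random coil" column with b√n for b ≈ 3.8 Å is our own numerical observation and not a statement made in this part of the text.

```python
import math

# Values quoted in the text and in Table 2 (XynZ with cohesin), in Angstrom.
N_RES = {"L1": 12, "L2": 6, "L3": 24}
MEAN = {"L1": 21.1, "L2": 11.7, "L3": 34.3}
STD = {"L1": 4.6, "L2": 2.3, "L3": 8.6}
RMS_TABLE = {"L1": 21.6, "L2": 11.9, "L3": 35.3}
RANDOM_COIL_RMS = {"L1": 13.2, "L2": 9.3, "L3": 18.6}


def rms_from_mean_and_std(mean, std):
    """<d^2>^(1/2) from <d> and the standard deviation: sqrt(<d>^2 + sigma^2)."""
    return math.sqrt(mean ** 2 + std ** 2)


def extension_per_residue(mean, n_residues):
    """Average end-to-end distance divided by the number of linker residues."""
    return mean / n_residues


if __name__ == "__main__":
    for name in ("L2", "L1", "L3"):
        rms = rms_from_mean_and_std(MEAN[name], STD[name])
        per_res = extension_per_residue(MEAN[name], N_RES[name])
        coil_b = RANDOM_COIL_RMS[name] / math.sqrt(N_RES[name])
        print(f"{name}: rms = {rms:.1f} A (table {RMS_TABLE[name]} A), "
              f"<dee>/n = {per_res:.2f} A, random-coil rms/sqrt(n) = {coil_b:.2f} A")
```

The reconstructed rms values agree with Table 2 to within rounding, and the per-residue extension decreases from L2 to L1 to L3 as stated in the text.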

3.2. XynZ without the cohesin domain

We also performed simulations of the XynZ enzyme without its cohesin domain. In Fig. 7(A) we compare the results of the simulations (red curves) with the experimental SAXS spectra (black data points) taken from Czjzek et al. (2012). The computed, ensemble-averaged scattering profiles, Isim(q), are in good agreement with the experimental SAXS profiles, Iexp(q), although small deviations can be noticed at q ≈ 0.16/Å. The radius of gyration varies between about 28 and 65 Å with ⟨Rg⟩ ≈ 42 Å, and the maximum extension fluctuates between about 85 and 190 Å with ⟨Dmax⟩ ≈ 130 Å, see Fig. 7(B). The multi-domain protein XynZ is observed to be as flexible as the XynZ–cohesin complex.

The distributions of dee for linkers L1 and L2 are almost unaffected by the presence of the cohesin domain, see Fig. 6 and Table 2. However, the distribution of dee for linker L3 is shifted by about 2.5 Å to smaller distances when the cohesin domain is not bound to the dockerin domain of XynZ. The average end-to-end distance per residue, ⟨dee⟩/n, is 2 Å for L2, 1.8 Å for L1, and 1.3 Å for L3. It decreases with the number n of residues forming the given linker. This observation means that the shorter the linker is, the more extended the conformations it adopts.

The absence of the cohesin domain affects the transient contacts between the four domains of XynZ, see Fig. 5(B) and Table 1. In particular, due to the lack of contacts with the cohesin domain, the dockerin domain has a larger area available for interactions with other domains. As a result, dockerin interacts more frequently with CE1 and GH10. Interestingly, GH10 also shows a tendency to interact somewhat more frequently with other domains of XynZ when the cohesin domain is absent. These interactions result in the apparent shortening of the 24-residue linker L3 which connects GH10 with dockerin, see Fig. 6.

3.3. Free linker peptides

We also performed simulations of free linker peptides, i.e., single peptides L1, L2 and L3 without any domains at their ends. The resulting distributions of the end-to-end distance, P(dee), are shown in Fig. 6. The average distance and the root-mean-square distance for these distributions are given in Table 2. The distributions P(dee) for the free peptides L1 and L3 are clearly shifted towards smaller distances dee as compared to the corresponding distributions for the linkers L1 and L3 within XynZ. This shift reflects excluded volume effects. Namely, the bulky domains do not allow the linker ends to come close to one another and, thus, small distances dee cannot be reached by linkers L1 and L3 within XynZ. In the absence of the domains, however, there is more space available for linkers L1 and L3 to sample and, thus, smaller end-to-end distances can be reached.
Interestingly, $P(d_{ee})$ for the shortest linker peptide L2 is practically the same in the presence and absence of the dockerin and CBM6 domains at its ends. In the majority of L2 conformations, $d_{ee}$ takes values between about 10 and 15 Å. Visual inspection of the simulated configurations revealed that peptide L2 usually attains elongated configurations in which the bond angles formed by three consecutive Cα atoms are of the order of 120°. Only very rarely does L2 bend into crescent-shaped conformations with $d_{ee}$ between 5 and 8 Å.

It is instructive to compare the free linker peptides to Gaussian chains, i.e., ideal polymers in which the constituent monomers, or residues, do not interact except through the bonds between the adjacent residues. The probability for a Gaussian chain to have distance $d_{ee}$ between the ends obeys the following distribution

$$P_G(d_{ee}) = 4\pi d_{ee}^2 \left( \frac{3}{2\pi \langle d_{ee}^2 \rangle} \right)^{3/2} \exp\left( -\frac{3 d_{ee}^2}{2 \langle d_{ee}^2 \rangle} \right) \qquad (5)$$
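Eq. (5) is easy to check numerically. The sketch below, written in Python for illustration only, evaluates the Gaussian-chain distribution, verifies that it integrates to one, and computes the root-mean-square end-to-end distance from the relation $\langle d_{ee}^2 \rangle = n\ell^2$ quoted in this section, with $\ell = 3.8$ Å and $n = 12$, 6 and 24 for L1, L2 and L3. The function names and the integration grid are assumptions of this sketch.

```python
import numpy as np

BOND_LENGTH = 3.8  # Å, Cα-Cα pseudo-bond length quoted in the text

def gaussian_p(d_ee, mean_sq):
    """Eq. (5): end-to-end distance distribution of a Gaussian chain."""
    prefactor = 4.0 * np.pi * d_ee**2 * (3.0 / (2.0 * np.pi * mean_sq)) ** 1.5
    return prefactor * np.exp(-3.0 * d_ee**2 / (2.0 * mean_sq))

def gaussian_rms(n_residues, bond_length=BOND_LENGTH):
    """Root-mean-square end-to-end distance, using <d_ee^2> = n * l^2."""
    return np.sqrt(n_residues) * bond_length

# Numbers of residues of the three linkers, as given in the text.
linkers = {"L1": 12, "L2": 6, "L3": 24}

# Integration grid (an assumption): 0 to 200 Å is far beyond the tails.
d = np.linspace(0.0, 200.0, 20001)

rms_values = {}
norms = {}
for name, n in linkers.items():
    rms_values[name] = gaussian_rms(n)
    p = gaussian_p(d, n * BOND_LENGTH**2)
    # Trapezoidal rule written out explicitly.
    norms[name] = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(d))
    print(f"{name}: rms = {rms_values[name]:.1f} A, norm = {norms[name]:.4f}")
```

The root-mean-square values obtained this way can be compared directly with the Gaussian-chain estimates of 13.2 Å, 9.3 Å and 18.6 Å quoted in the text.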

Fig. 7. Results of simulations of the free XynZ enzyme, i.e., without its cohesin domain. (A) Comparison of experimental SAXS data $I_{exp}(q)$ (black dots) with the scattering intensity profile $I_{sim}(q)$ obtained in the simulations (red curve). The experimental SAXS data are taken from Czjzek et al. (2012). The theoretical $I_{sim}(q)$ curve was computed using Eq. (4) with $N = 10^4$ conformations of the XynZ protein. (B) The maximum extension versus the gyration radius of the XynZ protein. Each point here corresponds to a simulated conformation.
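Each point in a plot such as Fig. 7(B) pairs two geometric quantities of one conformation. A minimal Python sketch of how they might be computed from bead coordinates is given below. It assumes equal weights for all beads and takes the maximal extension as the largest bead-to-bead distance; the definitions actually used in this work may differ, and the function names are hypothetical.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid (equal weights)."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered**2).sum(axis=1).mean())

def max_extension(coords):
    """Largest distance between any two beads of the conformation."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff**2).sum(axis=2)).max()

# Toy conformation (an assumption): a straight rod of 5 beads spaced 3.8 Å apart.
rod = np.zeros((5, 3))
rod[:, 0] = 3.8 * np.arange(5)

rg_rod = radius_of_gyration(rod)
dmax_rod = max_extension(rod)
print(f"Rg = {rg_rod:.2f} A, Dmax = {dmax_rod:.2f} A")
```

Evaluating both functions for every conformation of an ensemble gives one (Rg, Dmax) pair per conformation, which is the kind of scatter shown in panel (B).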


The root-mean-square end-to-end distance for the chain, $\langle d_{ee}^2 \rangle^{1/2}$, is the only parameter that characterizes the distribution given by Eq. (5). If the chain consists of $n$ residues, and the separation between the subsequent residues is $\ell$, then $\langle d_{ee}^2 \rangle = n\ell^2$. The linker peptides L1, L2 and L3 comprise 12, 6 and 24 residues, respectively, and the Cα-Cα pseudo-bond has a length $\ell = 3.8$ Å. Therefore, if the linker peptides L1, L2 and L3 were Gaussian chains, their $\langle d_{ee}^2 \rangle^{1/2}$ would be 13.2 Å, 9.3 Å and 18.6 Å, respectively. These values are significantly smaller than the values of $\langle d_{ee}^2 \rangle^{1/2}$ that we obtained in the linker simulations, see Table 2. Moreover, Fig. 8(A) shows that $P_G(d_{ee})$ as given by Eq. (5) with $\ell = 3.8$ Å and $n$ equal to 12, 6, and 24 does not agree with the distributions $P(d_{ee})$ for the linker peptides L1, L2 and L3 that we obtained in the simulations. Therefore, the linker peptides cannot be described as Gaussian chains.

An important factor missing in the Gaussian chain model is the excluded volume effect. To investigate how the excluded volume affects $P(d_{ee})$, we consider a chain of $n$ hard spheres. We assume that the distance between the centers of the subsequent spheres in the chain is $\ell = 3.8$ Å. All spheres are taken to have the same diameter $\sigma$. This means that the centers of two spheres $i$ and $j$ with $|i - j| \geq 2$ cannot come closer than $\sigma$. We performed MC simulations of chains that consist of 12, 6 and 24 hard spheres.

For $n = 12$ and $\sigma = 3.7$ Å we obtained $\langle d_{ee} \rangle = 18.2$ Å and $\langle d_{ee}^2 \rangle^{1/2} = 19.1$ Å, which are very close to the values of $\langle d_{ee} \rangle$ and $\langle d_{ee}^2 \rangle^{1/2}$ that we obtained in the simulations of the free peptide L1, see Table 2. Also the distribution $P(d_{ee})$ for the chain of 12 hard spheres with diameter $\sigma = 3.7$ Å is very similar to the distribution $P(d_{ee})$ that we obtained in the simulations of the free peptide L1, see the top panel of Fig. 8(A). For $n = 6$ and $\sigma = 4.3$ Å we obtained $\langle d_{ee} \rangle = 11.5$ Å and $\langle d_{ee}^2 \rangle^{1/2} = 11.7$ Å, which are very close to the values of $\langle d_{ee} \rangle$ and $\langle d_{ee}^2 \rangle^{1/2}$ for the free peptide L2, see Table 2. Also $P(d_{ee})$ for the chain of 6 hard spheres, each of diameter $\sigma = 4.3$ Å, is very similar to $P(d_{ee})$ for the peptide L2, see the middle panel of Fig. 8(A). Finally, for $n = 24$ and $\sigma = 4$ Å we obtained $\langle d_{ee} \rangle = 30.2$ Å and $\langle d_{ee}^2 \rangle^{1/2} = 32$ Å, which are close to the values of the corresponding quantities that characterize the free linker peptide L3, see Table 2. The resulting distribution $P(d_{ee})$, as shown in the bottom panel of Fig. 8(A), is similar to the distribution $P(d_{ee})$ for the free linker peptide L3.

Our results show that chains of hard spheres can exhibit $P(d_{ee})$ distributions similar to those of the free linker peptides. The sphere diameter is an effective parameter related to the peptide stiffness, which in turn is determined by the peptide sequence, but we find that $\sigma$ is of the order of 4 Å in all three cases. This observation indicates that the volume excluded by the residues has an important effect on the equilibrium configurations of the linker peptides. The apparent stiffness of the hard-sphere chains is reflected in the shift of their $P(d_{ee})$ distributions towards larger distances $d_{ee}$ as compared to $P_G(d_{ee})$ for Gaussian chains with the same numbers of residues.

We also performed simulations of the free peptides L1, L2 and L3 at temperatures $T$ between $0.4\,T_0$ and $5.3\,T_0$, where $T_0$ is the room temperature. Fig. 8(B) shows $\langle d_{ee}^2 \rangle^{1/2}$ as a function of $T$ for peptides L1, L2 and L3. The longest peptide L3 collapses into more compact conformations at low temperatures $T < T_0$, which is reflected in a steady decrease of $\langle d_{ee}^2 \rangle^{1/2}$ with decreasing $T$. At elevated temperatures $T > T_0$, its $\langle d_{ee}^2 \rangle^{1/2}$ attains a practically constant value. In contrast, peptides L1 and L2 do not collapse at low temperatures. Their $\langle d_{ee}^2 \rangle^{1/2}$ depends on $T$ very weakly in the range of temperatures studied.

Fig. 8. (A) The end-to-end distance distributions for the free linker peptides L1, L2 and L3 as obtained in the simulations (black dashed lines); for chains of hard spheres with $n = 12$ and $\sigma = 3.7$ Å, $n = 6$ and $\sigma = 4.3$ Å, and $n = 24$ and $\sigma = 4$ Å (red solid lines); and for Gaussian chains with $n = 12$, $n = 6$, and $n = 24$ (magenta dash-dot lines). (B) The root-mean-square end-to-end distance for the free linker peptides L1, L2 and L3 as a function of temperature $T$, where $T_0$ is the room temperature.

4. Conclusions

Proteins that do not fold into a single stable structure at physiological conditions are termed intrinsically disordered proteins (IDPs). The thermodynamic state of an IDP is an ensemble of rapidly interconverting conformations. Important examples of IDPs are large multi-domain proteins in which several autonomously folded domains are held together by intrinsically disordered regions (Dunker et al., 2008). Their structural analysis is difficult because of their large sizes and dynamic nature (Różycki and Boura, 2014). Notable examples with great potential applications in biofuel production are the cellulosomes. While the individual catalytic domains and modules of these multi-modular carbohydrate-active enzymes have been extensively characterized structurally by crystallography and NMR methods (Davies and Henrissat, 2013), the specific roles of the length and flexibility of the disordered linkers are still not well understood. A number of SAXS studies exploring the solution structures of multi-domain enzymes and their complexes have shown that the linkers play a key role, as they provide the conformational flexibility which gives rise to the spatial liberty of the relative positions of the individual globular domains (Receveur et al., 2002; Hammel et al., 2005; Currie et al., 2012). But the static X-ray diffraction methods give only indirect access to information about conformational flexibility. The use of molecular simulations in combination with experimental validation, as applied here to the full-length enzyme XynZ, is a powerful tool to extract additional, dynamic properties of these proteins. Our results show that the excluded volume effect accounts for the statistical properties of the linkers, correlating the length of a given linker to its extendedness. In the particular example of the full-length enzyme XynZ, the intrinsic linker properties allow more spatial liberty for the two catalytic domains CE1 and GH10 at both ends of the chain.
Assuming that the catalytic modules need to have conformational adaptability to efficiently accommodate their large carbohydrate substrate molecules, the elasticity of the linkers will certainly have a positive effect on the enzymatic activity of these modules. Another interesting prediction of our simulations is that the spatial range of the catalytic domain GH10 is even larger in the presence of a cognate cohesin domain. This result fits nicely into the picture that the efficiency of the enzyme is larger when it is integrated into the cellulosome complex.

Interestingly, based on analyses of SAXS data, several distinct multi-domain proteins and protein complexes belonging to the cellulosome machinery have been found to be highly dynamic and flexible in solution (Hammel et al., 2005; Currie et al., 2013, 2012; Czjzek et al., 2012). Flexibility thus seems to be a common feature among different cellulosome components. It is probably needed for the simultaneous binding of the multiple enzymatic domains to plant cell wall polysaccharides. The role of the disordered linkers is thus to provide the required degree of flexibility while maintaining the integrity of the cellulosome complexes.

Acknowledgments

This research has been supported by ADEME contract No. 1201C0104 in the context of the ERANET-IB Fiberfuel to M. Czjzek, by the ERA-NET grant ERA-IB (EIB.12.022) (FiberFuel) to M. Cieplak, and by the Polish National Science Center Grant No. 2012/05/B/NZ1/0063 to B. Różycki. It was also co-financed by the Polish Ministry of Science and Higher Education from the resources granted for the years 2014–2017 in support of international scientific projects. All authors are financially supported through the EU programme NMP.2013.1.1.162 within the project CellulosomePlus, Grant No. 604530.

References

Abuja, P.M., Pilz, I., Claeyssens, M., Tomme, P., 1988. Domain structure of cellobiohydrolase II as studied by small angle X-ray scattering: Close resemblance to cellobiohydrolase I. Biochem. Biophys. Res. Commun. 156, 180–185.
Bayer, E.A., Belaich, J.P., Shoham, Y., Lamed, R., 2004. The cellulosomes: Multienzyme machines for degradation of plant cell wall polysaccharides. Annu. Rev. Microbiol. 58, 521–554.
Bayer, E.A., Lamed, R., Himmel, M.E., 2007. The potential of cellulases and cellulosomes for cellulosic waste management. Curr. Opin. Biotechnol. 18, 237–245.
Boura, E., Różycki, B., Herrick, D.Z., Chung, H.S., Vecer, J., Eaton, W.A., Cafiso, D.S., Hummer, G., Hurley, J.H., 2011. Solution structure of the ESCRT-I complex by small angle X-ray scattering, EPR, and FRET spectroscopy. Proc. Natl. Acad. Sci. USA 108, 9437–9442.
Boura, E., Różycki, B., Chung, H.S., Herrick, D.Z., Canagarajah, B., Cafiso, D.S., Eaton, W.A., Hummer, G., Hurley, J.H., 2012. Solution structure of the ESCRT-I and -II supercomplex: implications for membrane budding and scission. Structure 20, 874–886.
Currie, M.A., Adams, J.J., Faucher, F., Bayer, E.A., Jia, Z.C., Smith, S.P., 2012. Scaffoldin conformation and dynamics revealed by a ternary complex from the Clostridium thermocellum cellulosome. J. Biol. Chem. 287, 26953–26961.
Currie, M.A., Cameron, K., Dias, F.M.V., Spencer, H.L., Bayer, E.A., Fontes, C.M.G.A., Smith, S.P., Jia, Z., 2013. Small angle X-ray scattering analysis of Clostridium thermocellum cellulosome N-terminal complexes reveals a highly dynamic structure. J. Biol. Chem. 288, 7978–7985.
Czjzek, M., Fierobe, H.P., Receveur-Brechot, V., 2012. Small angle X-ray scattering and crystallography: A winning combination for exploring the multimodular organization of cellulolytic macromolecular complexes. Methods Enzymol. 510, 183–210.
Davies, G.J., Henrissat, B., 2013. Cracking the code, slowly: The state of carbohydrate-active enzymes in 2013. Curr. Opin. Struct. Biol. 23, 649–651.
Dunker, A.K., Silman, I., Uversky, V.N., Sussman, J.L., 2008. Function and structure of inherently disordered proteins. Curr. Opin. Struct. Biol. 18, 756–764.
Fontes, C.M., Hazlewood, G.P., Morag, E., Hall, J., Hirst, B.H., Gilbert, H.J., 1995. Evidence for a general role for non-catalytic thermostabilizing domains in xylanases from thermophilic bacteria. Biochem. J. 307, 151–158.
Francis, D.M., Różycki, B., Koveal, D., Hummer, G., Peti, W., Page, R., 2011a. Structural basis of p38α regulation and specificity by hematopoietic tyrosine phosphatase. Nat. Chem. Biol. 7, 916–924.
Francis, D.M., Różycki, B., Tortajada, A., Hummer, G., Peti, W., Page, R., 2011b. Resting and active states of the ERK2:HePTP complex. J. Am. Chem. Soc. 133, 17138–17141.
Grepinet, O., Chebrou, M.C., Beguin, P., 1988. Purification of Clostridium thermocellum xylanase Z expressed in Escherichia coli and identification of the corresponding product in the culture medium of C. thermocellum. J. Bacteriol. 170, 4576–4581.
Hammel, M., Fierobe, H.P., Czjzek, M., Kurkal, V., Smith, J.C., Bayer, E.A., Finet, S., Receveur-Brechot, V., 2005. Structural basis of cellulosome efficiency explored by small angle X-ray scattering. J. Biol. Chem. 280, 38562–38568.
Kim, Y.C., Hummer, G., 2008. Coarse-grained models for simulations of multiprotein complexes: Application to ubiquitin binding. J. Mol. Biol. 375, 1416–1433.
Leonard, T.A., Różycki, B., Saidi, L.F., Hummer, G., Hurley, J.H., 2011. Crystal structure and allosteric activation of protein kinase C βII. Cell 144, 55–66.
Ossowski, I., Eaton, J.T., Czjzek, M., Perkins, S.J., Frandsen, T.P., Schuelein, M., Panine, P., Henrissat, B., Receveur-Brechot, V., 2005. Protein disorder: Conformational distribution of the flexible linker in a chimeric double cellulase. Biophys. J. 88, 2823–2832.
Receveur, V., Czjzek, M., Schulein, M., Panine, P., Henrissat, B., 2002. Dimension, shape, and conformational flexibility of a two domain fungal cellulase in solution probed by small angle X-ray scattering. J. Biol. Chem. 277, 40887–40892.
Różycki, B., Boura, E., 2014. Large, dynamic, multi-protein complexes: A challenge for structural biology. J. Phys.: Condens. Matter 26, 463103.
Różycki, B., Kim, Y.C., Hummer, G., 2011. SAXS ensemble refinement of ESCRT-III CHMP3 conformational transitions. Structure 19, 109–116.
Shoham, Y., Lamed, R., Bayer, E.A., 1999. The cellulosome concept as an efficient microbial strategy for the degradation of insoluble polysaccharides. Trends Microbiol. 7, 275–281.
Sikora, M., Szymczak, P., Thompson, D., Cieplak, M., 2011. Linker-mediated assembly of gold nanoparticles into multimeric motifs. Nanotechnology 22, 445601.
Svergun, D., Barberato, C., Koch, M.H.J., 1995. CRYSOL – a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773.
Vazana, Y., Barak, Y., Unger, T., Peleg, Y., Shamshoum, M., Ben-Yehezkel, T., Mazor, Y., Shapiro, E., Lamed, R., Bayer, E.A., 2013. A synthetic biology approach for evaluating the functional contribution of designer cellulosome components to deconstruction of cellulosic substrates. Biotechnol. Biofuels 6, 182.