Structural Analysis of the Group II Intron Splicing Factor CRS2 Yields Insights into its Protein and RNA Interaction Surfaces


doi:10.1016/j.jmb.2004.10.032

J. Mol. Biol. (2005) 345, 51–68

Gerard J. Ostheimer1,2, Haralambos Hadjivasiliou1,2,5, Daniel P. Kloer1,6, Alice Barkan1,3* and Brian W. Matthews1,4,7

1 Institute of Molecular Biology, University of Oregon, Eugene, OR 97403, USA

2 Department of Chemistry, University of Oregon, Eugene, OR 97403, USA

3 Department of Biology, University of Oregon, Eugene, OR 97403, USA

4 Department of Physics, University of Oregon, Eugene, OR 97403, USA

5 Department of Cellular and Molecular Pharmacology, University of California San Francisco, 600 16th Street, Genentech Hall S574, San Francisco, CA 94143-2280, USA

Chloroplast RNA splicing 2 (CRS2) is a nuclear-encoded protein required for the splicing of nine group II introns in maize chloroplasts. CRS2 functions in the context of splicing complexes that include one of two CRS2-associated factors (CAF1 and CAF2). The CRS2–CAF1 and CRS2–CAF2 complexes are required for the splicing of different subsets of CRS2-dependent introns, and they bind tightly and specifically to their genetically defined intron targets in vivo. The CRS2 amino acid sequence is closely related to those of bacterial peptidyl-tRNA hydrolases (PTHs). To identify the structural differences between CRS2 and bacterial PTHs responsible for CRS2's gains of CAF binding and intron splicing functions, we determined the structure of CRS2 by X-ray crystallography. The fold of CRS2 is the same as that of Escherichia coli PTH, but CRS2 has two surfaces that differ from the corresponding surfaces in PTH. One of these is more hydrophobic in CRS2 than in PTH. Site-directed mutagenesis of this surface blocked CRS2–CAF complex formation, indicating that it is the CAF binding site. The CRS2 surface corresponding to the putative tRNA binding face of PTH is considerably more basic than in PTH, suggesting that CRS2 interacts with group II intron substrates via this surface. Both the sequence and the structural context of the amino acid residues essential for peptidyl-tRNA hydrolase activity are conserved in CRS2, yet expression of CRS2 is incapable of rescuing a pthts E. coli strain. © 2004 Elsevier Ltd. All rights reserved.

6 Institut für Organische Chemie und Biochemie, Albert-Ludwigs-Universität, Albertstrasse 21, D-79104 Freiburg im Breisgau, Germany

7 Howard Hughes Medical Institute, University of Oregon, Eugene, OR 97403, USA

*Corresponding author

Keywords: peptidyl-tRNA hydrolase; group II intron; chloroplast; protein facilitated RNA catalysis

Abbreviations used: CRS2, chloroplast RNA splicing 2; CAF, CRS2-associated factor; PTH, peptidyl-tRNA hydrolase.

E-mail address of the corresponding author: [email protected]

0022-2836/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

Introduction

Protein co-factors are required in vivo for the splicing of catalytic group I and group II introns.1 These splicing factors are either “maturases”, which are conserved proteins encoded by the introns themselves, or host-encoded proteins that have co-evolved with their intron targets to yield a ribonucleoprotein particle with splicing activity.2 Several group I intron splicing factors and a group II intron maturase have been shown to facilitate splicing by assisting the folding of their intron substrates into their catalytically competent conformation.2,3 However, the roles of host-encoded proteins in facilitating the splicing of group II introns are largely unexplored.

Chloroplast RNA splicing 2 (CRS2) is a host-encoded group II intron splicing factor that is required for the splicing of nine of the 17 group II introns in the chloroplast.4,5 CRS2 forms stable complexes with two CRS2-associated factors, CAF1 and CAF2.6 CAF1 and CAF2 are required for the splicing of distinct subsets of CRS2-dependent introns:6 the CRS2–CAF1 complex is required for the splicing of four introns, the CRS2–CAF2 complex is required for the splicing of three introns, and two introns require both CAF1 and CAF2.

CRS2 is related to bacterial peptidyl-tRNA hydrolases (PTH),7 which cleave the ester bond linking the nascent peptide and tRNA when peptidyl-tRNA is released prematurely from the ribosome.8 Bacteria possess a single PTH gene, which is essential for cellular viability.9,10 The strong similarity between CRS2 and bacterial PTH indicates that CRS2 evolved from a PTH. Thus a comparison of the structures of CRS2 and PTH should reveal features specific to CRS2 that are relevant to its acquisition of a function in splicing.

Escherichia coli PTH has been studied extensively and its structure has been determined by X-ray crystallography.11 An absolutely conserved histidine, H20 (Figure 1), functions as a catalytic base, deprotonating a water molecule that subsequently performs a nucleophilic attack on the ester linkage between the peptide and the tRNA.12 Nearly all of the residues implicated in the substrate recognition and catalytic activity of PTH are conserved in CRS2. Nonetheless, CRS2 is unable to complement an E. coli strain with a temperature-sensitive mutation in its PTH gene (pthts), suggesting that CRS2 does not have PTH activity.7 One possible reason is that CRS2 is inactive in the absence of one of its protein partners, CAF1 or CAF2. Consequently, the failure of CRS2 alone to complement a pthts E. coli strain does not eliminate the possibility that CRS2 possesses PTH activity.
To gain insights into how CRS2 is able to facilitate group II intron splicing and to form stable complexes with the CAFs, we solved the structure of CRS2 by X-ray crystallography. As was expected from the high degree of sequence similarity, the fold of CRS2 is very similar to that of E. coli PTH; however, CRS2 possesses two conserved surfaces that differ from the corresponding surfaces of PTH. One surface of CRS2 is markedly less hydrophilic than the corresponding PTH surface. Site-directed mutagenesis of this surface disrupted CRS2–CAF interactions, demonstrating it to be the CAF binding site. The surface of CRS2 that corresponds to the putative tRNA binding face of E. coli PTH is considerably more basic in CRS2, suggesting a role in recognizing an RNA substrate. As such, the CRS2 crystal structures reported here suggest the regions of the protein responsible for its acquisition of both CAF and group II intron binding activities. In addition, the CRS2 structures show the residues corresponding to the E. coli PTH active site to be structurally indistinguishable from those in PTH, suggesting that CRS2 could have PTH activity.

Structure of a Group II Intron Splicing Factor

However, neither the expression of CRS2 variants modified to more closely resemble monomeric bacterial PTHs, nor the presence of a CRS2/CAF2 co-expression plasmid complemented a pthts strain of E. coli. These results suggest that CRS2 does not function as a PTH in vivo, despite the structural conservation of the active site region.

Results and Discussion

Structure determination of CRS2ΔC

To solve the structure of CRS2 it was necessary to trim both its N and C termini. CRS2 possesses a chloroplast targeting peptide at its N terminus (Figure 1). This targeting peptide is predicted to be cleaved at residue “−13” with respect to the PTH sequence (all numbering is as for PTH to facilitate comparisons). For structure determination, a CRS2 construct was engineered such that an N-terminal hexahistidine tag (MRGSHHHHHHTD) was fused to residue V-2, thereby removing the chloroplast targeting peptide and producing a construct with an N terminus akin to bacterial PTHs (Figure 1). CRS2 differs additionally from PTH by possessing a seven amino acid residue basic and aromatic extension at its C terminus (Figure 1).7 Attempts to crystallize CRS2 constructs possessing this extension failed. However, trimming the C terminus back to Q191 resulted in a protein with improved solubility that yielded diffraction quality crystals. The structure of this N-terminal hexahistidine tagged, C-terminal trimmed construct, CRS2ΔC, was solved by molecular replacement using the structure of E. coli PTH11 as a search model. Crystallographic and refinement statistics are presented in Table 1. The N-terminal hexahistidine tag was not visible in the final model. All residues but Y66 are found within the allowed regions of the Ramachandran plot. Y66 has phi and psi angles of 83.5° and 149.3°, respectively. This residue is found at the junction of a buried loop and helix 2. The corresponding residue in E. coli PTH, F66, has a similar conformation with phi and psi angles of 92.6° and 147.5°, respectively. This indicates that the conformation of this residue is not specific to CRS2 but is intrinsic to the fold of this protein family. As would be expected for two proteins with ~50% sequence identity (Figure 1), the folds of CRS2 and PTH are essentially the same (Figure 2).
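How close the two backbone conformations are can be checked with a short calculation. The sketch below is an illustration only: it takes the phi and psi values quoted above and reports the smallest angular difference for each torsion, allowing for wrap-around at ±180°. The function and variable names are hypothetical and are not part of the original analysis.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


# Backbone torsion angles (degrees) quoted in the text for residue 66.
TORSIONS = {
    "CRS2 Y66": {"phi": 83.5, "psi": 149.3},
    "PTH F66": {"phi": 92.6, "psi": 147.5},
}


def compare(res_a: str, res_b: str) -> dict:
    """Per-torsion angular difference between two residues, rounded to 0.1 degree."""
    return {
        name: round(angular_difference(TORSIONS[res_a][name], TORSIONS[res_b][name]), 1)
        for name in ("phi", "psi")
    }


if __name__ == "__main__":
    print(compare("CRS2 Y66", "PTH F66"))  # {'phi': 9.1, 'psi': 1.8}
```

Both torsions differ by less than 10°, in line with the statement that the unusual conformation is shared by the two proteins.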
However, there was a notable exception: the C-terminal residues of PTH (residues 180–190) form a helix, whereas the corresponding residues of CRS2ΔC are in part disordered and in part form a strand that is sandwiched into a crystal contact (Figures 2 and 3(a)). Electron density for the CRS2ΔC chain is continuous only for residues Y0 through G179. However, additional density was observed adjacent to G179 that contributed to a crystal contact (Figure 3(a)). This electron density could be fit with the sequence ARFALVA. Allowing


Figure 1. Sequences of CRS2-like and PTH-like proteins in maize and Arabidopsis, aligned with representative bacterial PTHs. Shown are the sequences of maize CRS2 (Zm.C_CRS2), the four predicted CRS2-like and PTH-like proteins in Arabidopsis thaliana (At.C_CRS2.A, At.C_CRS2.B, At.M_PTH/CRS2 and At.C_PTH), the available sequence for the predicted maize ortholog of a putative chloroplast PTH (Zm.C_PTH), which is incomplete at its N terminus, and bacterial PTHs from E. coli (Ec_PTH), B. subtilis (Bs_PTH), and Synechocystis 6803 (Syn_PTH). The predicted intracellular locations of the plant proteins28 are indicated by C (chloroplast) or M (mitochondrion). The plant proteins are designated as either CRS2 or PTH according to the presence or absence of CRS2 signature features (see the text). The protein At.M_PTH/CRS2 has a blend of CRS2 and PTH features. Residue classes are shaded as follows: hydrophobic in black, basic in blue, acidic in red, glycine in yellow, large polar in green and small residues unshaded. Numbering according to the E. coli PTH crystal structure11 is underneath the alignments, and is used for reference throughout the text. The predicted cleavage site of the CRS2 chloroplast targeting peptide is indicated with a downward arrow. Asterisks indicate the catalytic residues H20 and D93 in PTH. Δ symbols indicate residues 105 and 133 that constitute a basic clamp in PTH for binding the tRNA 5′-phosphate. Filled black circles indicate residues in PTH that are proposed to bind the peptide moiety of peptidyl-tRNAs. Filled black squares indicate the residues of the CRS2 hydrophobic surface. Filled blue circles


Table 1. X-ray data collection and structure refinement statistics

                                      CRS2ΔC         CRS2(I42N,S44F,I50T)   CRS2(I42E)
Data collection
  Wavelength (Å)                      0.773          1.00                   1.00
  Space group                         P212121        P212121                I222
  Unit cell dimensions (Å)
    a                                 36.3           39.7                   53.1
    b                                 62.6           56.5                   79.6
    c                                 78.2           69.4                   112.4
  Unique reflections                  20,160         14,925                 22,475
  Completeness (%)                    99.6 (99.6)    92.3 (95.5)            99.9 (99.9)
  Multiplicity                        3.6 (3.6)      3.8 (3.7)              6.9 (6.6)
  Rsym (%)                            5.2 (19.4)     9.3 (24.6)             6.9 (25.7)
  I/σ(I)                              8.1 (3.9)      11.8 (3.3)             5.7 (2.5)
  Wilson B (Å²)                       15.5           24.3                   21.1
TNT refinement statistics
  Resolution range (Å)                13.1–1.70      15.0–1.75              15.0–1.80
  Scale factors
    K                                 0.87           0.69                   1.15
    Ksolvent                          0.53           0.85                   0.72
    B                                 0.97           −0.30                  0.15
    Bsolvent                          96             109                    118
  Number of protein atoms             1446           1515                   1413
  Water molecules                     165            143                    124
  Rfactor (%)                         17.6           19.9                   19.5
  Rfree (%)                           24.6           27.8                   25.9
RMS deviations from ideal geometry
  Bond lengths (Å)                    0.021          0.014                  0.025
  Bond angles (°)                     2.2            2.7                    2.8
Ramachandran analysis
  Most favorable                      137 (91%)      139 (87%)              136 (91%)
  Allowed                             12 (8%)        17 (11%)               13 (9%)
  Generously allowed                  0 (0%)         1 (1%)                 1 (1%)
  Disallowed                          1 (1%)         1 (1%)                 0 (0%)

for the likely possibility that side-chain density was weak or absent, this sequence matches the ERFNLVQ sequence of CRS2 residues 185–191 (Figure 3(a)). While some backbone density was observed for residues F180 and T184, these residues and the intervening residues, SGST (Figure 1), were omitted from the CRS2 model.

The CAF binding surface of CRS2

The surface of the crystal mate that contacts the E185-Q191 strand is centered on I42, which in solution would be exposed to solvent (Figure 3(a)). Electrostatic surface representations show I42 to be a component of an uncharged surface on CRS2 that corresponds to a more hydrophilic surface in PTH (Figure 3(b)). The residues of this surface, T41, I42, Q43, S44, L47 and I50 (Figures 1 and 3), are variable among bacterial PTHs but conserved among CRS2s, suggesting that they contribute to a CRS2-specific function. Bacterial PTHs are monomers,11 but CRS2 forms a stable complex with either of two CAFs.6 As such, CRS2 would be expected to have a protein–protein interaction surface not found in bacterial PTHs. The participation of this conserved, hydrophobic CRS2 surface in the intimate crystal contact with the E185-Q191 strand suggested that it could be a protein–protein interaction surface. To test if the conserved surface surrounding I42 mediates CAF binding, the amino acid substitutions I42N, S44F and I50T were introduced so as to make the CRS2 surface similar to that in the PTH of Bacillus subtilis (Figure 1). The choice of B. subtilis PTH rather than E. coli PTH was made because both CRS2 and B. subtilis possess an amino acid deletion in this region as compared to E. coli (Figure 1). A yeast two-hybrid assay was employed to determine if the mutations I42N, S44F, and I50T interfered with CAF binding. CRS2 was fused to the Gal4 DNA binding domain and each CAF was fused to the Gal4 transcriptional activation domain.6 Figure 4 shows that the I42N mutation interferes slightly with CAF1 binding and moderately with CAF2 binding. Both of the double

indicate basic CRS2 residues suggested here to interact with RNA substrates. The CRS2 basic and aromatic tail is indicated with a blue bar. The N and C termini of the CRS2ΔC construct whose structure was solved in this work are indicated with left and right facing arrows, respectively. The accession number for Zm.C_CRS2 is AAF27939. The Plant Genome Database (www.plantgdb.org) accession code for Zm.C_PTH is Zmtuc-03-08-11.23238. The Arabidopsis gene identifier numbers are At5g38290 for At.C_CRS2.A, At5g16140 for At.C_CRS2.B, At5g19830 for At.M_PTH/CRS2 and At1g18440 for At.C_PTH. Accession numbers for the bacterial proteins are BAA05288 for Bs_PTH, NP_287450 for Ec_PTH and NP_442403 for Syn_PTH.


Figure 2. Superimposed structures of CRS2ΔC and PTH. CRS2ΔC is in yellow and PTH is in blue. Residues of the PTH active site cleft and the corresponding residues of CRS2 are shown. The strand composed of CRS2 residues E185 through Q191 is included. Spheres indicate the connection between CRS2 residues K178 and E185 for which electron density is lacking.

mutations that include I42N, i.e. (I42N, S44F) and (I42N, I50T), disrupt CAF binding, as does the triple mutation I42N, S44F, I50T (Figure 4). The single mutations S44F and I50T do not disrupt CRS2–CAF interactions to a degree discernible by the yeast two-hybrid assay. However, the fact that these mutations augment the effect of the I42N mutation suggests that these residues are involved in CAF binding. The I50T mutation only slightly reduces the size of the side-chain and so it is not surprising that the single mutation does not have an appreciable effect. Attempts at inserting a more disruptive residue such as arginine in this position were found to drastically reduce the solubility of the protein, and as such were uninformative. Structure determination of a CRS2 variant possessing the three mutations, CRS2(I42N,S44F,I50T), which is described below, shows that the triple mutant folds into a stable structure that is very similar to that of CRS2ΔC. Therefore, the disruption of CAF binding by mutating these residues is not due to destabilization or disruption of the folded protein, and the CAF binding site of CRS2 includes residues I42, S44 and I50.

Structure determination of CRS2(I42N,S44F,I50T)

To determine if the mutations introduced into CRS2 were deleterious to the folding of the protein, the structure of the variant containing the three substitutions I42N, S44F and I50T was solved by X-ray crystallography. In this variant, the chloroplast targeting peptide was trimmed through residue P-3, and V-2 was replaced with methionine to serve as the start codon. A hexahistidine tail, LEHHHHHH, was appended to K192. Therefore, as was the case for CRS2ΔC, this construct did not

possess CRS2's C-terminal basic, aromatic “extension”. However, it did possess residues 180–191, which correspond to the C-terminal helix of PTH (Figures 1 and 2). CRS2(I42N,S44F,I50T) was substantially more soluble than CRS2ΔC (not shown), consistent with the strategy of engineering this surface to resemble the corresponding surface of a monomeric bacterial PTH. The structure of CRS2(I42N,S44F,I50T) was solved by molecular replacement using the structure of CRS2ΔC residues T1 through K178 with non-buried side-chains trimmed to alanine as the search model. The search model did not include the E185-Q191 strand. Crystallographic and refinement statistics are presented in Table 1. As with the CRS2ΔC structure, all residues but Y66 are found within the allowed regions of the Ramachandran plot and the overall geometry is excellent. The model starts with E-1 and ends with H197, which is the third histidine of the C-terminal hexahistidine tag. A loop consisting of residues K107 through H113 was disordered and G108, G109, H110, and G111 were omitted from the model. In addition, the electron density for the loop above the active site was weak, suggesting that these residues are not well ordered in the crystal. The structure of CRS2(I42N,S44F,I50T) is shown superimposed upon E. coli PTH and upon CRS2ΔC in Figure 5(a) and (b), respectively. With the exception of the loop above the active site, which is poorly defined, the structure of residues T1 through K178 is essentially identical with that in CRS2ΔC. As such, the three substitutions I42N, S44F and I50T do not cause the fold of CRS2(I42N,S44F,I50T) to differ from either CRS2 or PTH. Thus, the amino acid substitutions of CRS2 that blocked CRS2–CAF interactions in the yeast two-hybrid assay (Figure 4) did so because they disrupted the CAF binding site


Figure 3. The conserved, hydrophobic surface of CRS2 contributes to an unusual crystal contact. (a) CRS2ΔC is shown in yellow. The crystal mate is shown in green. The residues of the strand and of the hydrophobic, uncharged patch are shown. The 2Fo−Fc electron density contoured at 1 sigma into which residues E185 through Q191 were fit is shown. (b) Electrostatic surface representations of the hydrophobic patch of CRS2, as observed in CRS2ΔC, and the corresponding view of PTH. Basic surfaces are colored blue, and acidic surfaces are colored red. Uncharged and hydrophobic surfaces are colored white.

of CRS2 and not because they caused misfolding of CRS2. The crystal packing of CRS2(I42N,S44F,I50T) is quite different from that of CRS2ΔC. Unlike in CRS2ΔC, residues S183 through K192 of CRS2(I42N,S44F,I50T) form a helix (Figure 5(a) and (b)). In addition, this helix continues for another three residues, LEH, which are the first three residues of the C-terminal hexahistidine tag (Figure 5(a) and (b)). However, the CRS2(I42N,S44F,I50T) C-terminal helix is not located in exactly the same position as the corresponding PTH helix and the linkages between the penultimate helix and the C-terminal helices are different (Figure 5(a)). In the CRS2(I42N,S44F,I50T) crystal structure the electron density for residues K178 through G182 is weak, so the positions of these residues are ill-defined. A crystal mate abuts the bottom of the C-terminal helix, possibly dislodging it from its native location (not shown). The position of the abutting crystal mate precludes the C-terminal helix of CRS2(I42N,S44F,I50T) from occupying the same location as the C-terminal helix of PTH. In addition, the contacts made by the first three residues of the hexahistidine tag, which contribute to this helix and pack against the protein, could also have stabilized a perturbed


conformation for the C-terminal helix. Nonetheless, the observation of such a helix in the CRS2(I42N,S44F,I50T) crystal structure indicates that CRS2 is likely to possess a helix analogous to the C-terminal helix in PTH. However, the location of this helix may be perturbed in the CRS2(I42N,S44F,I50T) crystal structure due to crystal packing.

Structure determination of CRS2(I42E)

Figure 4. The CRS2 surface centered on I42 contributes to CAF binding. Yeast two-hybrid assays were used to detect interactions between CRS2 fused to the Gal4 DNA binding domain and CAF1 or CAF2 fused to the Gal4 activation domain. The amino acid substitutions in the CRS2 variants are indicated. The presence or absence of histidine in the growth medium is shown below: growth in the absence of histidine indicates an interaction between CRS2 and the CAF.

One of the signature sequence motifs distinguishing CRS2 from PTH is its basic and aromatic C-terminal extension KYKFHRV that starts with K192 (Figure 1).7 The conservation of this motif between dicots (e.g. Arabidopsis) and monocots (e.g. maize) implies that it contributes to the function of CRS2. Its basic and aromatic nature suggests that this sequence could participate in RNA recognition. Two lines of evidence suggest that this extension is unstructured in the absence of an RNA substrate: (1) the extension is susceptible to proteolytic degradation (not shown); and (2) removal of this extension facilitated crystallization of CRS2. The possible contribution of this extension to CRS2's splicing function made the observed conformational flexibility of the residues immediately upstream (G179 through Q191) intriguing. Might CRS2 residues G179 through Q191 have a conformation distinct from the corresponding PTH helix that could contribute to CRS2's role in splicing? In an attempt to determine the conformation of CRS2 residues G179 through Q191, the structure of CRS2(I42E) was solved by X-ray crystallography. CRS2(I42E) is the same as CRS2ΔC except for the single substitution I42E. The intent of the I42E substitution was to modify the hydrophobic surface centered on I42 so as to disrupt the crystal contact that sandwiched residues E185 through Q191 into an extended conformation between two crystal mates. It was hoped that preventing this crystal contact would permit residues G179 through Q191 to adopt their native conformation. CRS2(I42E) crystallized in the space group I222, indicating that the I42E mutation led to crystallization with different crystal packing than either CRS2ΔC or CRS2(I42N,S44F,I50T). The structure of CRS2(I42E) was solved by molecular replacement using the structure of CRS2ΔC residues T1 through K178 with non-buried side-chains trimmed to alanine as the search model. The search model did not include the E185-Q191 strand.
Crystallographic and refinement statistics are presented in Table 1. Unlike the CRS2ΔC and CRS2(I42N,S44F,I50T) structures, all residues including Y66 are found within the allowed regions of the Ramachandran plot and the overall geometry is excellent. The model starts with Y0 and ends with Q191. CRS2(I42E) is shown superimposed upon CRS2ΔC and CRS2(I42N,S44F,I50T) in Figure 5(c) and (d), respectively. As was the case for both CRS2ΔC and CRS2(I42N,S44F,I50T), crystal packing perturbed the structure of CRS2(I42E). There were four significant structural rearrangements as a result of


Figure 5. Superimposed structures of CRS2 variants with PTH and CRS2ΔC. (a) CRS2(I42N,S44F,I50T) (green) superimposed on E. coli PTH (blue). The unstructured loop between residues K107 and H113 of CRS2(I42N,S44F,I50T) is indicated with green spheres. (b) CRS2(I42N,S44F,I50T) (green) superimposed on CRS2ΔC (yellow). The unstructured loops between residues K107 and H113 of CRS2(I42N,S44F,I50T) and between residues K178 and E185 of CRS2ΔC are indicated with green and yellow spheres, respectively. (c) CRS2(I42E) (purple) superimposed on CRS2ΔC (yellow). The unstructured loops between residues G108 and G115 of CRS2(I42E) and between residues K178 and E185 of CRS2ΔC are indicated with purple and yellow spheres, respectively. (d) CRS2(I42E) (purple) superimposed on CRS2(I42N,S44F,I50T) (green). The unstructured loops between residues G108 and G115 of CRS2(I42E) and between residues K107 and H113 of CRS2(I42N,S44F,I50T) are indicated with purple and green spheres, respectively.

crystal packing. First, M67, which is buried under H20 in the crystal structures of E. coli PTH, CRS2ΔC and CRS2(I42N,S44F,I50T), is flipped out and is replaced by Y66, which has flipped in and stacked underneath H20 (Figure 6(a)). This rearrangement permits the backbone of Y66 to adopt an allowed conformation. However, neither E. coli PTH nor the other CRS2 structures presented here have this conformation. In addition, substitution of glutamate for M67 is deleterious to E. coli PTH function.12 As such, this “swapping” of Y66 for M67 is unlikely to be the biologically relevant

conformation. Flipping out of the M67 side-chain appears to have been facilitated by docking of the side-chain into a hydrophobic pocket formed by residues P3, P57, L59, Y81 and V83 (Figure 6(b)). This hydrophobic pocket is adjacent to the conserved hydrophobic surface that includes I42, S44 and I50 (Figures 3(b) and 6(b)). The proximity of this hydrophobic pocket to the surface known to contribute to CAF binding and the intimate nature of this protein–protein interaction in the context of the crystal contact raises the possibility that this pocket contributes to the CAF binding site.


A second structural perturbation involved an invasive crystal contact that dislodges helix 7, which in CRS2 is composed of residues H113 through L123, resulting in residues G108 through G115 being disordered. Residues G109 through N114 were omitted from the model of CRS2(I42E) (Figures 5(c) and (d) and 6(a)). In the model of CRS2(I42N,S44F,I50T), K107 through H113 were also disordered (Figure 5(a) and (b)), suggesting that this glycine-rich loop is flexible. Thirdly, the electron density for the loop above the active site consisting of residues P140 through A147 was weak, suggesting that these residues are not well ordered in the crystal. This was also observed in the crystal structures of PTH, CRS2ΔC and CRS2(I42N,S44F,I50T) and is consistent with this loop being mobile. Lastly, residues F180 through N188 form a helix (Figures 5(c) and (d) and 6(c)). However, the C-terminal helices of two CRS2(I42E) crystal mates interleave such that residues L189 and V190 contact the hydrophobic surface of a crystal mate that has been exposed by dislodging the corresponding helix of the crystal mate (Figure 6(c)). In E. coli PTH, L180 through F190 form the C-terminal helix. With the exception of residues L189, V190 and Q191, which have been unwound as a result of the helix-interleaving crystal contact, these are the same residues that are helical in CRS2(I42E). In contrast, in the CRS2(I42N,S44F,I50T) structure, residues S183 through K192 contribute to the helix, but F180, S181 and G182 are ill-defined. The presence of a helix in two of the three crystal structures of CRS2 suggests that CRS2 has a helix upstream of its CRS2-specific C-terminal extension. The C-terminal helix of PTH is amphipathic, and its hydrophobic residues contribute to the PTH hydrophobic core. The amphipathic pattern of the corresponding amino acid residues is maintained in CRS2. PTH C-terminal helix residues L180, A183, T184, L187, A189 and F190 pack into the hydrophobic core.
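The register comparison can be made concrete with a small sketch. It pairs the core-facing PTH residues listed above with the CRS2 residues found at the same alignment positions (F180, S183, T184, F187, L189 and V190, as given in the text that follows and in Figure 1) and reports the sequence spacing between successive positions. The dictionary and function names are illustrative assumptions, not part of the original work.

```python
# Core-facing residues of the PTH C-terminal helix and the CRS2 residues at
# the same alignment positions, as listed in the text (E. coli PTH numbering).
PTH_CORE = {180: "L", 183: "A", 184: "T", 187: "L", 189: "A", 190: "F"}
CRS2_CORE = {180: "F", 183: "S", 184: "T", 187: "F", 189: "L", 190: "V"}


def same_register(a: dict, b: dict) -> bool:
    """True when both helices place their core-facing residues at identical positions."""
    return sorted(a) == sorted(b)


def spacings(positions) -> list:
    """Sequence separation between successive core-facing positions."""
    ordered = sorted(positions)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def paired_residues(a: dict, b: dict) -> list:
    """(position, residue in a, residue in b) for every position present in both."""
    return [(pos, a[pos], b[pos]) for pos in sorted(a) if pos in b]


if __name__ == "__main__":
    print(same_register(PTH_CORE, CRS2_CORE))
    print(spacings(PTH_CORE))
    for pos, pth_res, crs2_res in paired_residues(PTH_CORE, CRS2_CORE):
        print(f"{pos}: PTH {pth_res} -> CRS2 {crs2_res}")
```

Positions 180, 183, 184 and 187 follow the i, i+3, i+4, i+7 spacing that places side-chains on one face of an α-helix, which is the pattern the register argument relies on.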
The corresponding residues in CRS2 are F180, S183, T184, F187, L189 and V190 (Figure 1), which could also pack against the hydrophobic core. That the register of hydrophobic residues is the same in CRS2 and PTH suggests that residues F180 through Q191 of CRS2 form an amphipathic helix analogous to the helix observed in PTH. Inspection of the surfaces of PTH and CRS2 that would be exposed if the residues after 178 were removed reveals that CRS2 possesses a hydrophobic surface comparable to the surface of PTH that accommodates its C-terminal amphipathic helix (Figure 7). In particular, there is a hydrophobic pocket on the surface of CRS2. In the CRS2ΔC


Figure 7. Electrostatic surface representations of the C-terminal helix docking surface of PTH and the corresponding surface of CRS2. The CRS2ΔC and PTH surfaces that result from the removal of residues after residue 178 are shown. Basic surfaces are blue, and acidic surfaces are red. Arrows indicate significant exposed hydrophobic pockets.

structure this pocket is occupied by L189 (Figure 3(a)), in the CRS2(I42N,S44F,I50T) structure this pocket is occupied by F187 (Figure 5) and in the CRS2(I42E) structure this pocket is occupied by L189 of a crystal mate (Figure 6(c)). Packing of the

Figure 6. Packing and structural perturbations observed in the crystal structure of CRS2(I42E). (a) Stereo view of the CRS2ΔC crystal structure (yellow) superimposed on the CRS2(I42E) crystal structure (purple) illustrating the perturbation of helix 7 in CRS2(I42E) and the concomitant flipping out of M67 and flipping in of Y66. (b) Stereo view of M67 of CRS2(I42E) (purple) shown docking into the hydrophobic pocket of a crystal mate (cyan). (c) Stereo view of the interleaving of the C-terminal helices of two CRS2(I42E) crystal mates shown in purple and cyan.


CRS2 helix against this surface would involve the filling of this pocket with a bulky, hydrophobic side-chain. The most likely candidate is F187. Comparing the loops linking the penultimate and C-terminal helices of CRS2(I42N,S44F,I50T), CRS2(I42E) and PTH shows that the CRS2(I42E) loop is very similar to the corresponding loop in PTH (Figure 5). As such, it is likely that this helix in native CRS2 will be similar to that observed in the CRS2(I42E) structure except that the helix will be rotated such that F187 docks into the hydrophobic pocket of the proposed helix accepting surface. However, the ease with which crystal packing forces modify the conformation of this helix implies that this helix is not tightly associated with the core of the protein. This potential for conformational heterogeneity could contribute to the mechanism by which CRS2 promotes splicing; in particular, this flexibility could potentially facilitate the presentation of CRS2's basic/aromatic C-terminal extension to intron RNA.

Structural comparison of residues implicated in PTH activity

The E. coli PTH active site consists of an essential histidine, H20, and an aspartic acid, D93.11,12 PTH recognizes both the peptide13 and the tRNA moieties of its substrate.14 The peptide moiety is proposed to be bound in a sequence-independent manner in a narrow cleft lined by three conserved

asparagine residues, N10, N68, and N114.11,14 It was proposed that the tRNA moiety is recognized by a basic surface and, in particular, that the 5′-phosphate is bound in a “basic clamp” formed by K105 and R133.14 E. coli PTH distinguishes between peptidyl-tRNA and N-formyl-methionyl-tRNAfMet by detecting the base-pairing between positions 1 and 72, which is absent in N-formyl-methionyl-tRNAfMet.15,16 Fromant et al. suggested that PTH detects the presence of the 1–72 base-pair via binding of the 5′-phosphate with the K105-R133 “basic clamp”.14 The PTH active site residues, H20 and D93, and the residues thought to be involved in peptide binding, N10, N68, and N114, are conserved in CRS2 and its predicted orthologs (Figure 1). However, the phosphate “clamp” of PTH has not been maintained in CRS2. Instead, in maize CRS2 and two Arabidopsis CRS2-like proteins, glutamine and serine have replaced the PTH “basic clamp” residues K105 and R133 (Figure 1). The PTH active site residues N10, Y15, H20, N68, D93, H113 and N114 and the corresponding residues in CRS2 have the same structure (Figure 8). In fact, nearly all of the conserved residues of the active site cleft, including T18, R19, N21 and M67 (Figure 1), are positioned identically in CRS2 and PTH (not shown for the sake of clarity). The structural conservation of these residues between CRS2 and PTH suggests that CRS2 could likewise act as a hydrolase, and the conservation of residues N10, Y15, N68 and N114 in CRS2 suggests that

Figure 8. Stereo view of the active site residues of PTH superimposed on the corresponding residues of CRS2. CRS2 is in yellow and PTH is in blue. The surface-exposed residues identical in CRS2 and PTH are indicated. The notation K105Q and R133S indicates that the PTH basic clamp residues K105 and R133 are replaced with glutamine and serine in CRS2.

CRS2 is a peptidyl-hydrolase. The only obvious difference between CRS2 and PTH is the 5′-phosphate clamp. The clamp region of PTH is shown superimposed on the corresponding region of CRS2 in Figure 8. The overall conformation of this region is conserved; however, the basic clamp is clearly absent in CRS2. The terminal base-pair of chloroplast tRNAfMet is absent, as it is in the eubacteria, so a discrimination mechanism similar to that of bacterial PTHs might be anticipated. As such, if CRS2 functions as a PTH, then presumably it has an alternative method of discriminating between peptidyl-tRNA and formyl-methionyl-tRNAfMet. That Q105 and S133 are conserved among predicted CRS2 orthologs (Figure 1) suggests that these substitutions may be important for CRS2's splicing function.

Monomeric CRS2 is unable to rescue a pth(ts) E. coli strain

The structure of CRS2 suggests no obvious reason why CRS2 should not have peptidyl-tRNA hydrolase activity. Nonetheless, expression of wild-type CRS2 does not complement a pth(ts) mutation in E. coli.7 To more thoroughly explore the possibility that CRS2 could function as a PTH, several CRS2 and PTH variants and a construct co-expressing CRS2 and CAF2 were assayed for their ability to rescue a pth(ts) E. coli strain (Figure 9). pth(ts) E. coli cells were transformed with expression plasmids at the permissive temperature, 30 °C, and then streaked on plates that were grown at either the permissive temperature or the non-permissive temperature, 42 °C. Expression of wild-type PTH rescued growth at the non-permissive temperature, whereas the empty vector did not (Figure 9(a), compare streaks 2 and 1). Mutating the basic clamp residues of PTH to mimic CRS2, i.e. PTH(K105Q,R133S), did not prevent rescue (Figure 9(a), streak 3), indicating that the absence of the basic clamp in CRS2 is not responsible for the failure of CRS2 to rescue. As such, the only obvious difference between CRS2 and PTH is not responsible for CRS2's inability to rescue.
Wild-type CRS2 failed to rescue (Figure 9(a), streak 4), as was observed previously.7 A CRS2 variant was engineered to be more akin to a monomeric, bacterial PTH by changing the CAF binding site to resemble that of the presumably monomeric Bacillus subtilis PTH. These substitutions resulted in a significant increase in the recovery of CRS2 in the soluble fraction (Figure 9(b), compare lanes 5 and 4), which is consistent with these mutations generating a soluble, monomeric CRS2. Additionally, the substitutions Q105K and S133R were made to create the basic clamp that is characteristic of the bacterial enzymes and that is missing in wild-type CRS2. Despite these adjustments, CRS2(I42N,S44F,I50T,Q105K,S133R) was unable to rescue the pth(ts) E. coli strain (Figure 9(a), streak 5). The CRS2 variants used in this experiment started from a methionine substituted for the valine at position −2 (PTH numbering), so that they

Figure 9. Complementation assay for PTH activity of CRS2 and PTH variants. (a) PTH and CRS2 variants were cloned into the pAC28 expression vector27 and introduced into E. coli strain MF100, which has a temperature-sensitive mutation in pth.8 Expression of proteins cloned into pAC28 is under the control of the T7 promoter. T7 RNA polymerase was provided by leaky expression from the T7 RNA polymerase expression plasmid pAR1219, in the absence of inducer. Shown are streaks of MF100 cells harboring pAR1219 and co-transformed with (1) pAC28, (2) pAC28-PTH, (3) pAC28-PTH(K105Q,R133S), (4) pAC28-CRS2, (5) pAC28-CRS2(I42N,S44F,I50T,Q105K,S133R), and (6) pAC28-CAF2/CRS2. The temperature at which each plate was incubated is indicated; 30 °C is permissive for growth of MF100 and 42 °C is non-permissive. (b) Immunoblot analysis of the soluble fraction of E. coli cell lines expressing CRS2 and PTH variants. Lanes 1–6 show MF100 pAR1219 co-transformed with (1) pAC28, (2) pAC28-PTH, (3) pAC28-PTH(K105Q,R133S), (4) pAC28-CRS2, (5) pAC28-CRS2(I42N,S44F,I50T,Q105K,S133R), and (6) pAC28-CAF2/CRS2 grown in the absence of IPTG. Lane 7 shows MF100 pAR1219 pAC28-CAF2/CRS2 induced with 1 mM IPTG. Lanes 8 and 9 show BL21 (DE3) Star pAC28-CAF2/CRS2 uninduced and induced with 1 mM IPTG. The antibodies used are indicated.

lacked the chloroplast targeting peptide, and were full-length in that they possessed the CRS2-specific C-terminal extension. None of the CRS2 constructs were tagged in any way. Lastly, a CRS2–CAF2 co-expression construct did not restore growth of the mutant strain (Figure 9(a), streak 6). In the cell line BL21 (DE3) Star (Novagen), which has been optimized for protein overexpression, CRS2 and CAF2 are expressed and yield soluble CRS2–CAF2 complex that can be purified by column chromatography (unpublished data). Unfortunately, in the pth(ts) E. coli strain used in this study, expression of CAF2 was not detected (Figure 9(b)). Therefore, it is impossible to know if appreciable amounts of soluble CRS2–CAF2 complex were present in the pth(ts) E. coli strain, and so the PTH activity of CRS2 in complex with CAF2 remains ambiguous. However, it should be noted that extremely low expression of PTH is able to rescue the cell line used in this experiment. PTH under the control of a T7 RNA polymerase promoter is able to rescue this cell line in the absence of T7 RNA polymerase (not shown). In addition, archaeal and yeast PTHs are capable of rescuing a pth(ts) E. coli strain, demonstrating that PTHs that are much more highly divergent than CRS2 have detectable PTH activity in this assay.17,18 As such, the failure of the CRS2 constructs to rescue this cell line despite detectable accumulation of soluble CRS2 suggests that CRS2 lacks PTH activity. Taken together, these results eliminate the trivial explanations for the failure of CRS2 to act as a PTH, and provide evidence that CRS2 does not function as a PTH in the chloroplast. The failure to observe PTH activity raises two questions: (1) what property of CRS2 prevents it from functioning as a PTH; and (2) might the conservation of the PTH-like active site be relevant to the function of CRS2 in facilitating group II intron splicing?
Potential RNA interaction surface of CRS2

Protein co-factors for group I and group II splicing whose interactions with introns have been studied in vitro bind specific regions of their target introns with high affinity, and by doing so promote the productive folding of the intron.3,19–21 CRS2 does not fit readily into this paradigm because it does not have intrinsic high-affinity intron binding activity (unpublished data). Figure 10 shows electrostatic surface representations of the predicted tRNA-binding face of PTH and the corresponding face of CRS2. In CRS2 this face is considerably more basic than in PTH. CRS2 and intron RNA are coimmunoprecipitated from chloroplast extract by anti-CAF antisera,6 so it is likely that CRS2 contributes to splicing by interacting directly with the intron RNA. The increase in basic surface area relative to E. coli PTH suggests that CRS2 has acquired the ability to interact with an RNA more extensively than does its PTH ancestor. Conservation of basic residues among predicted orthologs of CRS2 supports the idea that these

Figure 10. Electrostatic surface representations of the tRNA binding site of PTH and the corresponding surface of CRS2. Basic surfaces are colored blue, and acidic surfaces are colored red. The residues of the PTH basic clamp are indicated. Basic residues conserved among predicted CRS2 orthologs are indicated. CRS2 possesses a pocket created by the conserved substitutions K105Q and R133S. H20 of the PTH active site is found in the middle of the prominent cleft in the protein.

substitutions are relevant to CRS2 function. Surface-exposed basic residues that could potentially contribute to RNA binding are indicated in Figure 1. Of these, R103, K107 and R131 have basic counterparts in many PTHs, and R127 has counterparts in plant PTH-like genes, but R86, R146, K192, K194, H196 and R197 are conserved only in CRS2 and its predicted orthologs. Residues R103, K107 and R131 form a cluster surrounding residues 105 and 133, which form the basic clamp in PTH. The basic residues R86 and R127 contribute to a second ring of basic residues surrounding this basic cluster (Figure 10), and could contribute to the binding of an RNA substrate.

The conserved CRS2-specific substitutions, K105Q and R133S, create a pocket on this putative RNA binding surface that is not found in PTH (Figures 8 and 10). In predicted CRS2 orthologs, the residues lining this pocket, Y92, D94, Q105 and S133, are conserved (Figure 1), raising the possibility that this pocket contributes to CRS2 function. Given the wealth of hydrogen bond donors and acceptors lining this pocket (Figure 8), its size, and the expectation that CRS2 binds RNA, an attractive hypothesis is that this pocket could bind a nucleotide base.

Plant organellar CRS2-like and PTH-like genes

PTH-like and CRS2-like proteins are encoded by a small gene family in plants, of which maize CRS2 is the only characterized member. Figure 1 shows a sequence alignment between four Arabidopsis proteins, maize CRS2, the product of a second predicted PTH-like gene in maize, and representative bacterial PTHs. All of the plant proteins have predicted organellar targeting sequences and are designated as either mitochondrial (M) or chloroplast (C) localized, based on predictions of the TargetP algorithm.22 The plant proteins are assigned as “CRS2” rather than “PTH” based upon the following features: (1) maintenance of a potential CAF binding site consisting of the cluster of conserved uncharged/hydrophobic residues, T41, I42, Q43, S44, L47 and I50; (2) substitution of glutamine and serine for PTH basic clamp residues K105 and R133; and (3) the presence of the highly conserved QKYKFHRV putative RNA binding motif at their C terminus. Arabidopsis encodes two highly similar predicted chloroplast-targeted CRS2-like proteins, At5g38290 and At5g16140, which in Figure 1 are labeled At.C_CRS2.A and At.C_CRS2.B. A T-DNA insertion in At.C_CRS2.A had no phenotype (data not shown), consistent with redundant CRS2-like proteins being targeted to the Arabidopsis chloroplast.
Available genome sequence data provide evidence for just one such gene in both maize and rice, suggesting that there may have been a duplication of the ancestral CRS2 gene in the Arabidopsis ancestor, after the divergence of monocots and dicots. Organellar translation presumably necessitates PTH activity, based on the requirement for this activity in bacteria. In both maize and Arabidopsis, PTH activity in the chloroplast appears to be accounted for by predicted chloroplast-localized PTH-like proteins lacking the CRS2 signature motifs (see At.C_PTH and Zm.C_PTH in Figure 1), suggesting that CRS2 need not supply PTH activity to the chloroplast. This is consistent with the inability of CRS2 and its more soluble variants to rescue a pth(ts) E. coli strain. Arabidopsis protein At5g19830 possesses a mix of CRS2-like and PTH-like traits and is predicted to be targeted to mitochondria (At.M_PTH/CRS2 in Figure 1). Residues T41, I42, Q43, S44, L47, and I50 on CRS2's CAF-binding surface are not

conserved in this protein, but the substitutions maintain the hydrophobic nature of the surface. This surface could potentially bind to mitochondrially targeted CAF paralogs that are predicted from maize and Arabidopsis genome sequence data (unpublished observations). This protein possesses only half of the basic clamp, R133, but like CRS2 the other half of the clamp is Q105. Unlike PTH, this protein possesses a CRS2-like basic and aromatic tail. As such, this protein is either a mitochondrial PTH with a CRS2-like tail, a mitochondrial CRS2-like splicing factor, or both. The chimeric nature of this protein gives it the appearance of an evolutionary intermediate between PTH and CRS2. It will be interesting to determine if this chimeric protein is bi-functional, providing both PTH activity and group II intron splicing activity in plant mitochondria.
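The three features used above to assign plant proteins as “CRS2” rather than “PTH” can be written out as a small decision rule. The following Python sketch is only an illustration of that reasoning: the class name, field names and the three-way labels are hypothetical, and the feature values are encoded by hand from the text rather than computed from the alignment in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PthFamilyProtein:
    """Hand-encoded features of a PTH-family protein (PTH numbering)."""
    name: str
    residue_105: str             # one-letter code at the first clamp position
    residue_133: str             # one-letter code at the second clamp position
    caf_surface_conserved: bool  # T41, I42, Q43, S44, L47, I50 conserved
    has_crs2_tail: bool          # basic/aromatic C-terminal extension present


def crs2_traits(protein: PthFamilyProtein) -> dict:
    """Evaluate the three CRS2 signature features described in the text."""
    return {
        "CAF-binding surface": protein.caf_surface_conserved,
        "clamp replaced by Q105/S133": (
            protein.residue_105 == "Q" and protein.residue_133 == "S"
        ),
        "C-terminal tail": protein.has_crs2_tail,
    }


def classify(protein: PthFamilyProtein) -> str:
    """Label a protein by how many CRS2 signature features it carries."""
    count = sum(crs2_traits(protein).values())
    if count == 3:
        return "CRS2-like"
    if count == 0:
        return "PTH-like"
    return "mixed"


# Feature values as described in the text (assumed encoding, not alignment output).
EXAMPLES = [
    PthFamilyProtein("maize CRS2", "Q", "S", True, True),
    PthFamilyProtein("E. coli PTH", "K", "R", False, False),
    PthFamilyProtein("At5g19830", "Q", "R", False, True),
]

if __name__ == "__main__":
    for protein in EXAMPLES:
        print(protein.name, "->", classify(protein))
```

Under this encoding At5g19830 carries one of the three signature features and falls into the “mixed” class, which mirrors its description as a chimera of PTH-like and CRS2-like traits.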

Materials and Methods

Construction of protein expression plasmids

All primers used in the engineering of CRS2 constructs and their mutagenesis are listed in Table 2. The structures of three CRS2 constructs are presented here: CRS2ΔC, CRS2(I42N,S44F,I50T) and CRS2(I42E). The CRS2ΔC construct has an N-terminal hexahistidine tag and the last seven C-terminal residues were deleted, but it is otherwise wild-type. The crs2 gene was amplified using the primers CRS2C, which introduced an in-frame BamHI site immediately upstream of V57 (CRS2 numbering), and CRS2K, which placed a KpnI site downstream of the native stop codon. The resulting PCR product, CRS2CK, was cloned into the BamHI and KpnI sites of pQE40 (Qiagen) such that the residues MRGSHHHHHHGS were fused to V57 (CRS2 numbering). This “full-length”, wild-type protein exhibited poor solubility and proved exceedingly difficult to crystallize. Solubility was improved by removing the CRS2-specific C-terminal tail, KYKFHRV. CRS2 K250 (CRS2 numbering) was substituted with a stop codon using the QuikChange Site-Directed Mutagenesis kit (Stratagene) with the primers CRS2M and CRS2N to generate the plasmid pQE40-CRS2CKΔC, from which was expressed the protein referred to in the paper as CRS2ΔC. The protein CRS2(I42E) is the same as CRS2ΔC except that I42 (Figure 1, PTH numbering) has been replaced with glutamate. The substitution, I42E, was achieved by site-directed mutagenesis using the primers CRS2I42ET and CRS2I42EB. The CRS2(I42N,S44F,I50T) protein has a C-terminal hexahistidine tag and starts at CRS2 residue 57, which has been changed from a valine to a methionine. The crs2 gene was PCR amplified using the primers CRS2Y and CRS2Z. CRS2Y simultaneously introduces an NcoI site and makes the V57M substitution. CRS2Z introduces an XhoI site immediately downstream of CRS2 K250. Cloning of the CRS2YZ PCR product into the NcoI and XhoI sites of pET28 appends the sequence LEHHHHHH downstream of CRS2 K250 (K191 in PTH numbering).
Thus, in pET28CRS2YZ the basic, aromatic tail of CRS2 has been removed and replaced with the C-terminal hexahistidine tail. The expression plasmid for CRS2(I42N,S44F,I50T) was produced by sequential QuikChange (Stratagene) site-directed mutagenesis reactions that created the


Table 2. PCR amplification and site-directed mutagenesis primer pairs

Primer          Sequence
CRS2C           GCGGGATCCGTGGAATACACG
CRS2K           CGGGGTACCAAGTATTTCACAGATCC
CRS2G           GGGGCCCATGGAATACACGC
CRS2H           CCTCAAGAATTCAAACCCTGTGG
CRS2M           GATTCAACCTTGTGCAGTAGTACAAGTTCCACAGG
CRS2N           CCTGTGGAACTTGTACTACTGCACAAGGTTGAATC
CRS2S           TTCACCATGGTCTCCTCCGTC
CRS2T           CACGGATCCTCAAACATTCAA
CRS2Y           GGAGATATACCATGGAATACACGC
CRS2Z           GTGGAACTCGAGCTTCTGCACAAG
CRS2I42ET       GGGATTACGATGAACACAGAGCAGTCCAAGTCGCTTCTG
CRS2I42EB       CAGAAGCGACTTGGACTGCTCTGTGTTCATCGTAATCCC
CRS2I42NU       ATTACGATGAACACAAACCAGTCCAAGTCGCTT
CRS2I42NL       AAGCGACTTGGACTGGTTTGTGTTCATCGTAAT
CRS2S44FU       ATGAACACAAACCAGTTTAAGTCGCTTCTGGGA
CRS2S44FL       TCCCAGAAGCGACTTAAACTGGTTTGTGTTCAT
CRS2S44FU2      ATGAACACAATCCAGTTTAAGTCGCTTCTGGGA
CRS2S44FL2      TCCCAGAAGCGACTTAAACTGGATTGTGTTCAT
CRS2I50TU       AAGTCGCTTCTGGGAACTGGTTCAATTGGCGAG
CRS2I50TL       CTCGCCAATTGAACCAGTTCCCAGAAGCGACTT
CRS2Q105KU      AATGGTGTACTGCGGCTTAAAAAGAAAGGTGGTCATGGT
CRS2Q105KL      ACCATGACCACCTTTCTTTTTAAGCCGCAGTACACCATT
CRS2S133RU      GAATTTCCTCGTTTACGTATAGGCATTGGTAGC
CRS2S133RL      GCTACCAATGCCTATACGTAAACGAGGAAATTC
PTHN            CAAAAAAACATGTCGATTAAATTG
PTHC            CCGGATCCGCAGACAACGACTTA
PTHQU           CCTGGCGTCGCCAAATTTCAGTTGGGCGGTGGCCATGGT
PTHQL           ACCATGGCCACCGCCCAACTGAAATTTGGCGACGCCAGG
PTHSU           CCTAACTTTCACCGTTTAAGCATCGGAATCGGTCATCCG
PTHSL           CGGATGACCGATTCCGATGCTTAAACGGTGAAAGTTAGG
RCCRS2NcoI      TGCCATGGTCTCCTCCGTCCCAGA
RCCRS2XhoI      CCGCTCGAGTCAAACCCTGTGGAACTTGTA
pET28HindIII    GCAAGCTTCCCCTCTAGAAATAATTTTG
T7term          GCTAGTTATTGCTCAGCGG
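The mutagenic primers in Table 2 come in upper/lower (or top/bottom) pairs, and the cloning primers carry the restriction sites named in the text. As a minimal sketch of how such a table can be sanity-checked, the Python below tests whether the two primers of a pair are exact reverse complements and reports which recognition sequences a primer contains. The function and variable names are illustrative and are not part of the published methods; the sequences are copied from Table 2, and the recognition sequences are the standard ones for the enzymes listed.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# A few mutagenic primer pairs copied from Table 2: (upper/top, lower/bottom).
PRIMER_PAIRS = {
    "CRS2M/CRS2N": (
        "GATTCAACCTTGTGCAGTAGTACAAGTTCCACAGG",
        "CCTGTGGAACTTGTACTACTGCACAAGGTTGAATC",
    ),
    "CRS2I42ET/CRS2I42EB": (
        "GGGATTACGATGAACACAGAGCAGTCCAAGTCGCTTCTG",
        "CAGAAGCGACTTGGACTGCTCTGTGTTCATCGTAATCCC",
    ),
    "CRS2I42NU/CRS2I42NL": (
        "ATTACGATGAACACAAACCAGTCCAAGTCGCTT",
        "AAGCGACTTGGACTGGTTTGTGTTCATCGTAAT",
    ),
}

# Standard recognition sequences for enzymes named in the cloning steps.
RESTRICTION_SITES = {
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
    "NcoI": "CCATGG",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def is_complementary_pair(upper: str, lower: str) -> bool:
    """True if the lower primer is the exact reverse complement of the upper."""
    return reverse_complement(upper) == lower.upper()


def sites_in(primer: str) -> list:
    """Names of the listed restriction sites found in a primer, sorted."""
    seq = primer.upper()
    return sorted(name for name, site in RESTRICTION_SITES.items() if site in seq)


if __name__ == "__main__":
    for name, (upper, lower) in PRIMER_PAIRS.items():
        print(name, "reverse complements:", is_complementary_pair(upper, lower))
    print("CRS2C sites:", sites_in("GCGGGATCCGTGGAATACACG"))
    print("CRS2K sites:", sites_in("CGGGGTACCAAGTATTTCACAGATCC"))
```

A pair that fails the reverse-complement check would point to a transcription error in the table rather than to a design problem, since mutagenic primer pairs of this kind are designed to anneal to the same site on opposite strands.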

substitutions I42N, S44F, and I50T. The primer pairs for these reactions were CRS2I42NU-CRS2I42NL, CRS2S44FU-CRS2S44FL, and CRS2I50TU-CRS2I50TL, respectively.

CRS2ΔC expression, purification and crystallization

E. coli M15 cells containing the plasmids pREP4 and pQE40-CRS2CKΔC were grown in Luria–Bertani broth supplemented with 200 mg l⁻¹ ampicillin and 100 mg l⁻¹ kanamycin sulfate in a stirring fermenter at 37 °C to an A600 nm of 1.4, at which time the temperature was reduced to 25 °C and production of protein was induced by the addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 1 mM. The cell culture was induced for five hours and harvested by centrifugation, and the cell pellets were stored at −20 °C. CRS2ΔC was purified using immobilized-metal affinity chromatography performed under denaturing conditions. Cell pellets were resuspended in 250 mM sodium chloride, 50 mM sodium phosphate (pH 7.0), and 20 mM imidazole. Cells were lysed by sonication and insoluble material was pelleted by centrifugation (15,000g for 20 minutes). The insoluble material was resuspended in 8 M urea, 100 mM sodium phosphate (pH 8.0), 10 mM Tris, centrifuged again (15,000g for 20 minutes) to remove material insoluble in 8 M urea, and applied to a column of Ni²⁺-NTA resin (Qiagen). The column was washed with 8 M urea, 100 mM sodium phosphate (pH 6.0), 10 mM Tris, and eluted with 8 M urea, 150 mM sodium acetate (pH 4.5). Soluble CRS2ΔC was generated by refolding the denatured protein and purified by gel filtration chromatography. Denatured CRS2ΔC in 8 M urea was refolded by diluting to a concentration of 1 mg ml⁻¹ in deionized

water followed immediately by dialysis against 5% glycerol, 30 mM sodium acetate (pH 4.5). To separate misfolded and/or aggregated protein from soluble protein, the crude refolded protein solution was passed through a Hi-Load Superdex 75 gel filtration column (Pharmacia). CRS2ΔC eluted in two peaks, one at the void volume, which was attributed to misfolded and/or aggregated protein, and a second peak whose elution volume was consistent with a monomeric 22 kDa protein. Fractions containing the monomeric CRS2ΔC were pooled, dialyzed against 5% glycerol, 20 mM sodium acetate (pH 6.0), and concentrated to ~10 mg ml⁻¹ using a Centriprep-10 device (Amicon) prior to crystallization trials. CRS2ΔC was crystallized at room temperature using the vapor-diffusion method with sitting drops. Crystals grew from drops that were initially equal volumes of 10 mg ml⁻¹ CRS2ΔC and mother liquor: 24% PEG 3400, 100 mM sodium chloride, 100 mM Caps (pH 10.5). Immediately upon mixing of protein solution and precipitant a heavy precipitate formed. Crystals grew out of this precipitate within 24 hours. To increase the size of the crystals, the drops were supplemented with additional protein and mother liquor solution mixed 1 : 1 that had been centrifuged to remove precipitate. CRS2ΔC crystals were 0.4 mm × 0.2 mm × 0.1 mm rectangular prisms.

CRS2(I42N,S44F,I50T) expression, purification and crystallization

E. coli BL21 (DE3) Gold cells containing pET28CRS2YZ(I42N,S44F,I50T) were grown in Luria–Bertani

broth supplemented with 100 mg l⁻¹ kanamycin sulfate in shaking flasks at 37 °C to an A600 nm of 1.0, at which time the temperature was reduced to 16 °C and production of protein was induced by the addition of isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 1 mM. The cell culture was induced overnight and harvested by centrifugation, and the cell pellets were stored at −20 °C. Cell pellets were resuspended in lysis buffer that consisted of 10% glycerol, 200 mM sodium chloride and 25 mM Tris (pH 8.0). Cells were lysed by sonication and insoluble material was removed by centrifugation (15,000g for 20 minutes). The soluble material was applied to a column of Ni²⁺-NTA resin (Qiagen) equilibrated in lysis buffer. The column was washed with ten column volumes of lysis buffer followed by ten column volumes of lysis buffer supplemented with 20 mM imidazole. Protein was eluted with a linear gradient of 20 mM to 200 mM imidazole in lysis buffer. Fractions containing CRS2(I42N,S44F,I50T) were pooled and supplemented with lithium sulfate to a final concentration of 200 mM. The pooled fractions were concentrated using a Centriprep-10 device (Amicon), and the crude Ni²⁺-NTA eluate was passed through a Hi-Load Superdex 75 gel filtration column (Pharmacia) with 10% glycerol, 200 mM sodium chloride, 200 mM lithium sulfate and 20 mM Tris (pH 8.0) as buffer. The elution volume of CRS2(I42N,S44F,I50T) was consistent with a monomeric 22 kDa protein. CRS2(I42N,S44F,I50T) gel filtration fractions were pooled and concentrated to 11.4 mg ml⁻¹ using a Centriprep-10 device (Amicon) prior to crystallization trials. CRS2(I42N,S44F,I50T) was crystallized at room temperature using the vapor diffusion method with sitting drops. Crystals grew from drops initially containing equal volumes of CRS2(I42N,S44F,I50T) and mother liquor consisting of 30% PEG 4000, 200 mM lithium sulfate and 100 mM Tris (pH 8.4).
CRS2(I42N,S44F,I50T) crystals were 0.2 mm × 0.1 mm × 0.1 mm rectangular prisms. Prior to collecting data, crystals were flash-cooled by direct transfer into a 100 K liquid nitrogen cold stream. A single data set was collected at the Advanced Light Source using beamline 8.2.2.

CRS2(I42E) expression, purification and crystallization

The method of CRS2(I42E) expression and purification was the same as for CRS2ΔC. CRS2(I42E) was crystallized at room temperature using the vapor diffusion method with sitting drops. Crystals grew from drops initially containing equal volumes of ~10 mg ml⁻¹ CRS2(I42E) and mother liquor: 10% PEG 3400, 100 mM sodium citrate (pH 5.5). CRS2(I42E) crystals were 0.4 mm × 0.2 mm × 0.2 mm rectangular prisms. Prior to collecting data, crystals were transferred to paratone, in which excess precipitant solution was removed, and then flash-cooled by direct transfer into a 100 K liquid nitrogen cold stream. A single data set was collected at the Advanced Light Source using beamline 5.0.3.

Structure determination of CRS2ΔC

Prior to data collection, crystals were transferred to paratone, in which excess precipitant solution was removed, and then flash-cooled by direct transfer into a 100 K liquid nitrogen cold stream. The structure was solved from a single data set that was collected at the Stanford Synchrotron Radiation Laboratory using beamline 9-1. Indexing and integration of reflections were

performed using MOSFLM.23 Scaling was performed using SCALA.23 Initial phases were obtained by molecular replacement with the program epmr.24 The search model used was the E. coli peptidyl-tRNA hydrolase structure11 of which non-conserved and surface-exposed residues had been trimmed to alanine. The final model was generated by multiple rounds of refinement using TNT25 and model building using Xfit.26

Structure determination of CRS2(I42N,S44F,I50T)

Prior to data collection, crystals were flash-cooled by direct transfer into a 100 K liquid nitrogen cold stream. A single data set was collected at the Advanced Light Source using beamline 8.2.2. Indexing and integration of reflections were performed using MOSFLM.23 Scaling was performed using SCALA.23 Initial phases were obtained by molecular replacement with the program epmr24 using residues 1–178 of the CRS2ΔC structure, of which surface-exposed residues had been trimmed to alanine, as the search model. The final model was generated by model building using Xfit26 and refinement using TNT.25

Structure determination of CRS2(I42E)

Prior to data collection, crystals were transferred to paratone, in which excess precipitant solution was removed, and then flash-cooled by direct transfer into a 100 K liquid nitrogen cold stream. The structure was solved from a single data set that was collected at the Advanced Light Source using beamline 5.0.3. Indexing and integration of reflections were performed using MOSFLM.23 Scaling was performed using SCALA.23 Initial phases were obtained by molecular replacement with the program epmr24 using residues 1–178 of the CRS2ΔC structure, of which surface-exposed residues had been trimmed to alanine, as the search model.
The model was generated by multiple rounds of model building using Xfit26 and refinement using TNT.25

Directed yeast two-hybrid assay of the CAF binding site

A directed yeast two-hybrid assay was employed to test the hypothesis that CRS2 residues I42, S44 and I50 (PTH numbering) participate in CAF binding. The plasmid pBD-CRS2 expresses CRS2 fused to the DNA binding domain of Gal4 as described.6 The plasmids pAD-CAF1 and pAD-CAF2 express full-length CAF1 and CAF2 fused to the Gal4 transcriptional activation domain and are the plasmids originally identified in the yeast two-hybrid screen that discovered these two proteins. QuikChange site-directed mutagenesis (Stratagene) was employed to generate the substitutions I42N, S44F and I50T. The substituting amino acids were chosen to convert the residues of the hydrophobic patch to those residues found in the presumably monomeric B. subtilis PTH. All possible individual and pair-wise combinations as well as the triple mutant were generated using the primer pairs CRS2I42NU-CRS2I42NL, CRS2S44FU-CRS2S44FL, and CRS2I50TU-CRS2I50TL, with the addition of the primer pair CRS2S44FU2-CRS2S44FL2 (Table 2) that permitted the S44F substitution while maintaining I42 as wild-type. The yeast strains were generated by sequential transformation using the S.c. EasyComp Transformation Kit (Invitrogen) according to the manufacturer's instructions. First, the strain YRG-2 was transformed with either pAD-CAF1 or pAD-CAF2. The YRG-2 pAD-CAF1 and YRG-2 pAD-CAF2 strains were then made competent and

transformed with the eight pBD-CRS2 constructs: pBD-CRS2, pBD-CRS2(N), pBD-CRS2(F), pBD-CRS2(T), pBD-CRS2(NF), pBD-CRS2(NT), pBD-CRS2(FT), and pBD-CRS2(I42N,S44F,I50T), and plated on SD medium lacking tryptophan and leucine. Colonies from these plates were then streaked on the same medium either containing or lacking histidine, and grown for four days at 30 °C.

Complementation assay for PTH activity

Five protein constructs were tested for their ability to rescue a pth(ts) strain: wild-type E. coli peptidyl-tRNA hydrolase (PTH), PTH with the substitutions K105Q and R133S, wild-type CRS2, CRS2 with the substitutions I42N, S44F, I50T, Q105K and S133R, and a construct expressing both CAF2 and CRS2. The full-length, wild-type pth gene was PCR amplified from E. coli genomic DNA using the primers PTHN and PTHC, which introduce AflIII and BamHI sites at the 5′ and 3′ ends of the gene, respectively. The PTHNC PCR product was cloned into the NcoI and BamHI sites of pET28, and then shuttled into pAC28 (ref. 27) using the restriction endonucleases XbaI and ClaI. The substitutions K105Q and R133S were introduced into pAC28-PTH using the primer pairs PTHQU-PTHQL and PTHSU-PTHSL. The wild-type crs2 gene was PCR amplified from a maize cDNA using the primers CRS2G and CRS2H, which introduce NcoI and EcoRI sites at the 5′ and 3′ ends of the desired coding sequence. The NcoI site was positioned such that the CRS2 protein started from a methionine substituted for V57 (CRS2 numbering), so that the chloroplast targeting sequence would not be included in this construct and the start of this protein coincides with the start of the homology to bacterial PTHs. The EcoRI site was positioned downstream of the native stop codon. The CRS2GH PCR product was cloned into the NcoI and EcoRI sites of pET28, and then shuttled into pAC28 using the restriction endonucleases XbaI and ClaI.
The substitutions I42N, S44F, I50T, Q105K and S133R were introduced into pAC28-CRS2GH using the primer pairs CRS2I42NU-CRS2I42NL, CRS2S44FU-CRS2S44FL, CRS2I50TU-CRS2I50TL, CRS2Q105KU-CRS2Q105KL and CRS2S133RU-CRS2S133RL. The wild-type caf2 gene was PCR amplified from pAD-CAF2 (ref. 6) using the primers CAF2P and CAF2Q, which introduce BamHI and HindIII sites at the 5′ and 3′ ends of the desired coding region. The BamHI site was positioned to remove the predicted chloroplast targeting peptide, with the amplified CAF2 reading frame starting from Q59. The HindIII site was positioned downstream of the native stop codon. The CAF2PQ PCR product was cloned into the BamHI and HindIII sites of pET-28a, fusing the protein to the N-terminal hexahistidine and T7 tags encoded by this vector. A CRS2 open reading frame was inserted downstream of the CAF2 coding sequence to generate a CAF2-CRS2 co-expression plasmid as follows. The wild-type crs2 gene was PCR amplified from maize cDNA using the primers RCCRS2NcoI and RCCRS2XhoI, which introduce NcoI and XhoI sites at the 5′ and 3′ ends of the desired coding sequence. The NcoI site was positioned such that this CRS2 protein started from V46 (CRS2 numbering), which is the predicted chloroplast targeting peptide cleavage site. The NcoI site was introduced so as to substitute methionine for V45 (CRS2 numbering). The XhoI site was positioned downstream of the native stop codon. After cloning into the NcoI and XhoI sites of pET28b, the CRS2 sequence plus the plasmid-encoded ribosome binding site was PCR amplified using the primers pET28HindIII and T7term and cloned into the

HindIII and XhoI sites downstream of CAF2 in pET28CAF2. The CAF2-CRS2 cassette was subsequently transferred into pAC28 using the restriction endonucleases XbaI and ClaI. The E. coli pth(ts) strain MF100 (ref. 8) was transformed sequentially with the expression constructs for PTH, PTH(K105Q,R133S), CRS2, CRS2(I42N,S44F,I50T,Q105K,S133R) and CRS2-CAF2 followed by pAR1219, which is a T7 RNA polymerase over-expression construct that has the polymerase gene cloned into pBR322.28 This vector is “leaky” and expresses T7 RNA polymerase in the absence of IPTG. MF100 strains containing both a pAC28 construct and a pAR1219 construct were grown overnight in LB broth supplemented with 200 mg l⁻¹ ampicillin and 100 mg l⁻¹ kanamycin sulfate at 30 °C. Overnight cultures were diluted into fresh media and grown four to eight hours at 30 °C to an A600 nm of ~1. These cultures were streaked out on LB plates supplemented with 200 mg l⁻¹ ampicillin and 100 mg l⁻¹ kanamycin and grown at either 30 or 42 °C. In addition, aliquots of the liquid cultures were harvested and stored at −20 °C for subsequent Western blot analysis for the expression of CRS2 or CAF2. As a positive control for CAF2 and CRS2 expression from the pAC28-CAF2/CRS2 construct, E. coli BL21 (DE3) Star pAC28-CAF2/CRS2 cells were grown to an A600 nm of ~1 and induced with 1 mM IPTG for two hours. The presence of CRS2 and CAF2 in the soluble fraction of these cell lines was assayed by Western blot. The antibodies raised against CAF2 (ref. 6) and CRS2 (ref. 7) have been described.

Protein Data Bank accession codes

Coordinates and structure factors have been deposited with the RCSB Protein Data Bank. The PDB ID codes are 1RYB for CRS2, 1RYN for CRS2(I42N,S44F,I50T) and 1RYM for CRS2(I42E).

Acknowledgements

We thank Blaine Mooers and Rosalind Williams-Carrier for invaluable assistance. In addition, we thank the staffs of the Stanford Synchrotron Radiation Laboratory and the Advanced Light Source for their technical support. S.S.R.L. and A.L.S. are supported by the Department of Energy. This work was supported in part by the NIH (GM20066, B.W.M.), the National Science Foundation (MCB-0314597, A.B.) and NIH Training Grant GM07759 (G.J.O.).

References

1. Cech, T. R. (1993). Structure and mechanism of the large catalytic RNAs: group I and group II introns and ribonuclease P. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 239–269, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
2. Lambowitz, A. M., Caprara, M. G., Zimmerly, S. & Perlman, P. S. (1999). Group I and group II ribozymes as RNPs: clues to the past and guides to the future. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), pp. 451–485, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
3. Matsuura, M., Noah, J. W. & Lambowitz, A. M. (2001). Mechanism of maturase-promoted group II intron splicing. EMBO J. 20, 7259–7270.
4. Jenkins, B. D., Kulhanek, D. J. & Barkan, A. (1997). Nuclear mutations that block group II intron splicing in maize chloroplasts reveal several intron classes with distinct requirements for splicing factors. Plant Cell, 9, 283–296.
5. Vogel, J., Borner, T. & Hess, W. (1999). Comparative analysis of splicing of the complete set of chloroplast group II introns in three higher plants mutants. Nucl. Acids Res. 27, 3866–3874.
6. Ostheimer, G. J., Williams-Carrier, R., Belcher, S., Osborne, E., Gierke, J. & Barkan, A. (2003). Group II intron splicing factors derived by diversification of an ancient RNA-binding domain. EMBO J. 22, 3919–3929.
7. Jenkins, B. D. & Barkan, A. (2001). Recruitment of a peptidyl-tRNA hydrolase as a facilitator of group II intron splicing in chloroplasts. EMBO J. 20, 872–879.
8. Menninger, J. (1976). Peptidyl-transfer RNA dissociates during protein synthesis from ribosomes of Escherichia coli. J. Biol. Chem. 251, 3392–3398.
9. Menninger, J. (1979). Accumulation of peptidyl-tRNA is lethal to Escherichia coli. J. Bacteriol. 137, 694–696.
10. Heurgue-Hamard, V., Mora, L., Guarneros, G. & Buckingham, R. H. (1996). The growth defect in Escherichia coli deficient in peptidyl-tRNA hydrolase is due to starvation for Lys-tRNALys. EMBO J. 15, 2826–2833.
11. Schmitt, E., Mechulam, Y., Fromant, M., Plateau, P. & Blanquet, S. (1997). Crystal structure at 1.2 Å resolution and active site mapping of Escherichia coli peptidyl-tRNA hydrolase. EMBO J. 16, 4760–4769.
12. Goodall, J. J., Chen, G. J. & Page, M. G. P. (2004). Essential role of histidine 20 in the catalytic mechanism of Escherichia coli peptidyl-tRNA hydrolase. Biochemistry, 43, 4583–4591.
13. Shiloach, J., Bauer, S., Groot, N. d. & Lapidot, Y. (1975). The influence of peptide chain length on the activity of peptidyl-tRNA hydrolase from E. coli. Nucl. Acids Res. 2, 1941–1950.
14. Fromant, M., Plateau, P., Schmitt, E., Mechulam, Y. & Blanquet, S. (1999). Receptor site for the 5′-phosphate of elongator tRNAs governs substrate selection by peptidyl-tRNA hydrolase. Biochemistry, 38, 4982–4987.
15. Schulman, L. H. & Pelka, H. (1975). The structural basis for the resistance of Escherichia coli formylmethionyl transfer ribonucleic acid to cleavage by Escherichia coli peptidyl transfer ribonucleic acid hydrolase. J. Biol. Chem. 250, 542–547.
16. Dutka, S., Meinnel, T., Lazennec, C., Mechulam, Y. & Blanquet, S. (1993). Role of the 1–72 base pair in tRNAs for the activity of Escherichia coli peptidyl-tRNA hydrolase. Nucl. Acids Res. 21, 4025–4030.
17. Menez, J., Buckingham, R. H., de Zamaroczy, M. & Campelli, C. K. (2002). Peptidyl-tRNA hydrolase in Bacillus subtilis, encoded by spoVC, is essential to vegetative growth, whereas the homologous enzyme in Saccharomyces cerevisiae is dispensable. Mol. Microbiol. 45, 123–129.
18. Rosas-Sandoval, G., Ambrogelly, A., Rinehart, J., Wei, D., Cruz-Vera, L. R., Graham, D. E. et al. (2002). Orthologs of a novel archaeal and of the bacterial peptidyl-tRNA hydrolase are nonessential in yeast. Proc. Natl Acad. Sci. USA, 99, 16707–16712.
19. Weeks, K. & Cech, T. R. (1995). Protein facilitation of group I intron splicing by assembly of the catalytic core and the 5′ splice site domain. Cell, 82, 221–230.
20. Weeks, K. & Cech, T. R. (1995). Efficient protein-facilitated splicing of the yeast mitochondrial bI5 intron. Biochemistry, 34, 7728–7738.
21. Solem, A., Chatterjee, P. & Caprara, M. G. (2002). A novel mechanism for protein-assisted group I intron splicing. RNA, 8, 412–425.
22. Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005–1016.
23. CCP4 (Collaborative Computational Project 4). (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallog. sect. D, 50, 760–763.
24. Kissinger, C. R., Gehlhaar, D. K. & Fogel, D. B. (1999). Rapid automated molecular replacement by evolutionary search. Acta Crystallog. sect. D, 55, 484–491.
25. Tronrud, D. E. (1997). The TNT refinement package. Methods Enzymol. 277, 306–319.
26. McRee, D. E. (1999). XtalView/Xfit—a versatile program for manipulating atomic coordinates and electron density. J. Struct. Biol. 125, 156–165.
27. Kholod, N. & Mustelin, T. (2001). Novel vectors for co-expression of two proteins in E. coli. BioTechniques, 31, 1–4.
28. Davanloo, P., Rosenberg, A. H., Dunn, J. J. & Studier, F. W. (1984). Cloning and expression of the gene for bacteriophage T7 RNA polymerase. Proc. Natl Acad. Sci. USA, 81, 2035–2039.

Edited by J. Doudna (Received 16 June 2004; received in revised form 2 October 2004; accepted 11 October 2004)