Structural Insights into the Protease-like Antigen Plasmodium falciparum SERA5 and Its Noncanonical Active-Site Serine


J. Mol. Biol. (2009) 392, 154–165

doi:10.1016/j.jmb.2009.07.007

Available online at www.sciencedirect.com

Anthony N. Hodder 1†, Robyn L. Malby 1†, Oliver B. Clarke 1,2†, W. Douglas Fairlie 1, Peter M. Colman 1, Brendan S. Crabb 1⁎ and Brian J. Smith 1⁎

1 The Walter and Eliza Hall Institute of Medical Research, Melbourne 3052, Australia
2 Department of Medical Biology, University of Melbourne, Melbourne 3010, Australia

Received 23 April 2009; received in revised form 30 June 2009; accepted 2 July 2009
Available online 8 July 2009

The sera genes of the malaria-causing parasite Plasmodium encode a family of unique proteins that are maximally expressed at the time of egress of parasites from infected red blood cells. These multi-domain proteins contain a central papain-like cysteine-protease fragment enclosed between disulfide-linked N- and C-terminal domains. However, the central fragment of several members of this family, including serine repeat antigen 5 (SERA5), contains a serine (S596) in place of the active-site cysteine. Here we report the crystal structure of the central protease-like domain of Plasmodium falciparum SERA5, which reveals a number of anomalies in addition to the putative nucleophilic serine: (1) the structure of the putative active site is not conducive to binding substrate in the canonical cysteine-protease manner; (2) the side chain of D594 restricts access of substrate to the putative active site; and (3) the S2 specificity pocket is occupied by the side chain of Y735, reducing this site to a small depression on the protein surface. Attempts to determine the structure in complex with known inhibitors were not successful. Thus, although its structure is now known, the function of the catalytic domain of SERA5 remains an enigma. © 2009 Elsevier Ltd. All rights reserved.

Edited by R. Huber

Keywords: Plasmodium falciparum; SERA5; structure; X-ray crystallography

*Corresponding authors. † A.N.H., R.L.M., and O.B.C. contributed equally to this work.
Abbreviations used: SERA, serine repeat antigens; SeMet, selenomethionine; RBCs, red blood cells; DMSO, dimethylsulfoxide; PEG, polyethylene glycol.

0022-2836/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.

Introduction

Members of the family of serine repeat antigens (SERAs) of Plasmodium falciparum have been tentatively implicated in the egress of daughter merozoites from infected red blood cells (RBCs) and/or in the subsequent invasion of RBCs by merozoites.1–3 Moreover, the presence of protease domains and the prominence of one member, SERA5, as a strong antigen have led to interest in these proteins as targets for drug and vaccine development.4–10 Blood-stage SERA proteins are synthesised as 100- to 130-kDa precursors, which are exported to the parasitophorous vacuole within an infected RBC.1,8,11,12 In the case of SERA5, the most abundant and best-studied member of the family, the precursor is processed into three fragments. The N- and C-terminal fragments remain covalently linked (via disulfide bonds) and appear to be attached to the merozoite surface.13–15 The central 56-kDa fragment is further processed at its C terminus to a stable 50-kDa fragment (T391–N828; SERA5 numbering is used here and elsewhere), which is shed at the time of schizont rupture14 and is found in abundance in the supernatant of parasite cultures. All SERA homologues exhibit high sequence identity (>50%) in the C-terminal regions (E578–N828) of the central fragment and ∼20% sequence identity to the papain-like family of cysteine proteases.7 However, some SERAs have substitutions in residues of the canonical papain-like catalytic triad (Cys, His, Asn), raising questions about their proteolytic competence. Phylogenetic studies of SERA revealed two distinct clusters across the genus, according to whether they possess cysteine or serine at the position of the putative catalytic nucleophile.7,16–18 In P. falciparum, SERA1–5 and SERA9 have serine and SERA6–8 have cysteine at the key catalytic position; however, only SERA5 and SERA6 appear to be essential for parasite viability in the blood stage.8,18 Hence, phylogenetic and functional data suggest that the Ser-type and Cys-type SERAs perform distinct functions within the erythrocytic cycle.

Recently, the subtilase-like enzyme Plasmodium falciparum subtilisin-like protease-1 was shown to be responsible for the primary processing of SERA5 (and other SERAs) in vitro.19,20 Inhibition of Plasmodium falciparum subtilisin-like protease-1 and of the cysteine protease dipeptidyl peptidase 3 prevents parasite egress as well as SERA maturation, consistent with a role for SERAs in schizont rupture.19,21 Autolytic activity and a weak chymotrypsin-like proteolytic activity have been reported in vitro for the central fragment of SERA5 (T391–N828) expressed in Escherichia coli, although the biological relevance of these activities remains unclear.7 In a recent large-scale analysis of seven metazoan genomes, a large number of enzyme homologues with mutations in their "catalytic" residues were identified.22 Nineteen such cases were found in the papain family of cysteine proteases, most of them involving a cysteine-to-serine substitution. Notably, proteolytic activity has not been demonstrated for any of these proteins. Two members of the papain family previously23 identified as relatives of SERA5 harbouring a catalytic serine, testin24 and silicatein,25,26 also lack protease activity. In contrast, replacing the active-site cysteine and flanking residues of cathepsin L with the corresponding residues of silicatein (i.e., Ser–Cys–Trp to Ala–Ser–Tyr, including the catalytic Cys-to-Ser substitution) produced an enzyme capable of condensing silicates.27 The tubulointerstitial nephritis antigens also harbour a catalytic serine; while nothing is known about their protease activity, they can bind laminin or type IV collagen, suggesting a role in adhesion rather than proteolysis.28

To address the nature of the catalytic site of SERA5, we have determined the crystal structure of its enzyme domain. We show that, while the SERA5 active site shares many structural features with the catalytic centre of papain-like cysteine proteases, SERA5 also has some key anomalies that could restrict the function of this domain as a protease.
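The Introduction quotes >50% identity among SERA C-terminal regions and ∼20% identity to papain-like proteases, and the Discussion quotes 57% identity between the SERA5 and SERA6 protease domains. Such figures depend on how identity is normalised, which the text does not specify. As an illustration only, here is a minimal Python sketch of one common convention: identical residues divided by the number of alignment columns in which neither sequence has a gap. The helper name and the toy strings are hypothetical, and other conventions (for example dividing by the length of the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Hypothetical helper: only columns where both sequences have a residue are
    compared, and identical residues are divided by that column count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned strings for illustration (not real SERA sequences).
print(percent_identity("ACDEFGH-", "ACDKFGHL"))
```

Here the last column is skipped because of the gap, so 6 of 7 compared columns are identical.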

Results

We have solved the crystal structure of an E. coli-expressed, in vitro-refolded form of the SERA5 enzyme domain (SERA5E, residues D560–N828)7 under three different crystallisation conditions; we refer to these structures as structures 1 to 3. Data collection and refinement statistics are presented in Table 1.

Table 1. X-ray data collection and refinement statistics

| | Native 1ᵃ | SeMet | Native 2ᵃ | Native 3 |
|---|---|---|---|---|
| Data collection | | | | |
| Space group | R3 | R3 | R3 | R3 |
| Cell dimensions a, c (Å) | 103.4, 72.5 | 103.1, 71.9 | 103.6, 72.1 | 102.4, 71.4 |
| Wavelength (Å) | 0.968 | 0.979 | 1.541 | 0.957 |
| Resolution (Å) | 24.74–1.79 (1.85–1.79) | 56.25–2.30 (2.38–2.30) | 33.46–1.80 (1.86–1.80) | 29.55–1.60 (1.66–1.60) |
| Rsym or Rmerge | 0.049 (0.31) | 0.068 (0.24) | 0.093 (0.81) | 0.068 (0.67) |
| I/σI | 35 (2.5) | 35 (6) | 14.1 (1.9) | 24.5 (2.6) |
| Completeness (%) | 98.5 (87) | 99.0 (94.5) | 100 (100) | 100 (100) |
| Redundancy | 5.3 (3.1) | 5.7 (4.6) | 4.2 (4.0) | 5.5 (5.1) |
| Refinement | | | | |
| Refinement program | REFMAC5 | – | REFMAC5 | PHENIX |
| Resolution (Å) | 1.8 | – | 1.8 | 1.6 |
| Reflections (all/test) | 26,810/1344 | – | 26,788/1341 | 38,541/1841 |
| Rwork/Rfree (5%) | 0.166/0.194 | – | 0.166/0.197 | 0.153/0.180 |
| Number of atoms (non-hydrogen) | 2265 | – | 2326 | 2412 |
| Protein | 2065 | – | 2095 | 2134 |
| Ligands | 11 (1 K+, 2 H2PO4−) | – | 1 (Ca2+) | 30 (2 Ca2+, 4 Cl−, 6 DMSO) |
| Water | 189 | – | 230 | 248 |
| ⟨B-factor⟩ (Å²) | 26 | – | 29 | 23 |
| r.m.s.d. bond lengths (Å) | 0.014 | – | 0.016 | 0.006 |
| r.m.s.d. bond angles (°) | 1.4 | – | 1.5 | 0.8 |
| Ramachandran favoured (/total) | 245/252 | – | 242/252 | 258/263 |
| Ramachandran allowed | 7 | – | 9 | 4 |
| Ramachandran outliers | 0 | – | 1 | 1 |

Values in parentheses correspond to the highest-resolution shell.
ᵃ These crystals were soaked with a putative inhibitor peptide, derived from phage display;10 however, there was no evidence of the peptide in the diffraction data.
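Several entries in Table 1 are arithmetically related, which gives a quick consistency check on the table. The non-hydrogen atom total should equal protein plus ligand plus water atoms. The ligand count should match the listed species: K+, Ca2+ and Cl− are single atoms, phosphate has five non-hydrogen atoms and DMSO four. The test set should be about 5% of the reflections (the Rfree column header). The three Ramachandran categories should sum to the stated total. The sketch below hard-codes the refinement values from the table; the dictionary layout and function name are illustrative only.

```python
# Non-hydrogen atoms per ligand species: K+, Ca2+ and Cl- are single atoms,
# phosphate is P + 4 O, DMSO is C2SO.
HEAVY_ATOMS = {"K+": 1, "Ca2+": 1, "Cl-": 1, "phosphate": 5, "DMSO": 4}

# Refinement columns of Table 1 (structures 1-3).
TABLE1 = {
    "structure 1": {"all": 26810, "test": 1344, "total_atoms": 2265,
                    "protein": 2065, "ligands": 11, "water": 189,
                    "ligand_counts": {"K+": 1, "phosphate": 2},
                    "rama": (245, 7, 0), "rama_total": 252},
    "structure 2": {"all": 26788, "test": 1341, "total_atoms": 2326,
                    "protein": 2095, "ligands": 1, "water": 230,
                    "ligand_counts": {"Ca2+": 1},
                    "rama": (242, 9, 1), "rama_total": 252},
    "structure 3": {"all": 38541, "test": 1841, "total_atoms": 2412,
                    "protein": 2134, "ligands": 30, "water": 248,
                    "ligand_counts": {"Ca2+": 2, "Cl-": 4, "DMSO": 6},
                    "rama": (258, 4, 1), "rama_total": 263},
}


def check_row(row: dict) -> dict:
    """Derived quantities used to sanity-check one refinement column."""
    favoured, allowed, outliers = row["rama"]
    ligand_atoms = sum(HEAVY_ATOMS[s] * n for s, n in row["ligand_counts"].items())
    return {
        "atoms_consistent":
            row["protein"] + row["ligands"] + row["water"] == row["total_atoms"],
        "ligands_consistent": ligand_atoms == row["ligands"],
        "rama_consistent": favoured + allowed + outliers == row["rama_total"],
        "test_fraction_pct": 100.0 * row["test"] / row["all"],
        "favoured_pct": 100.0 * favoured / row["rama_total"],
    }


for name, row in TABLE1.items():
    r = check_row(row)
    print(f"{name}: atoms {r['atoms_consistent']}, ligands {r['ligands_consistent']}, "
          f"Ramachandran {r['rama_consistent']}, test set {r['test_fraction_pct']:.1f}%, "
          f"favoured {r['favoured_pct']:.1f}%")
```

With these values every check passes. The test sets come to about 5.0%, 5.0%, and 4.8% of reflections, and 97.2%, 96.0%, and 98.1% of residues fall in the favoured Ramachandran region.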


Fig. 1. Sequence alignment of cathepsin L and SERA5E. (a) Structural alignment29 of SERA5E with the enzyme domain of cathepsin L (CatL; PDB 1mhw). Secondary-structure elements of SERA5E are indicated (helices as cylinders, strands as arrows). Lowercase letters indicate structurally unaligned regions. (b) Secondary-structure topology of SERA5E, showing disulfide bonds between cysteine residues; disulfide bonds unique to SERA5E are marked with an asterisk (1–2* between C567 and C572; 3–5* between C581 and C610). Black circles indicate the relative positions of the catalytic-triad residues S596, H762, and N787.
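Fig. 1 and the statement that S596, H762, and N787 "closely overlay" the cathepsin L triad both rest on superposing one structure on another. DALI, used for the homologue search described in the text, compares intramolecular distance matrices rather than fitting a single rigid-body transformation. What follows is therefore a different, simpler technique: a minimal NumPy sketch of Kabsch superposition and RMSD for two sets of already-paired atoms. The function name and the synthetic coordinates are illustrative. In practice, the paired atoms would first have to be extracted from aligned residues of the PDB entries.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between paired coordinates P and Q (both N x 3) after the optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P (stored as row vectors) and measure the residual to Q.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic check: Q is P rotated 30 degrees about z and then translated.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q))  # close to zero
```

The reflection check matters for mirror-image point sets. Without it, the SVD solution could "superpose" a structure onto its mirror image, which no physical rotation can do.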



Structure 1 (PDB 3ch3)

Crystals for structure 1 were grown in phosphate buffer. The final model includes residues N564–H690 and G700–N828 (the segment N691–D699 is disordered), together with a phosphate anion (H2PO4−) located adjacent to S596, the putative catalytic nucleophile. Despite relatively low sequence identity, the structure of SERA5E is similar to that of the papain family of cysteine proteases, as previously predicted.7 In a DALI29 search, SERA5E is most similar to rhodesain (PDB 2p7u; Z-score 23.5) and cruzain (PDB 1ewl; Z-score 23.3), the major cysteine proteases of Trypanosoma brucei rhodesiense30 and Trypanosoma cruzi,31 respectively, and to ervatamin B32 (PDB 1iwd; Z-score 23.3). Other structural homologues identified by DALI include the endopeptidases cathepsins F, K, L, and S and the exopeptidase cathepsin H. A comparison of the structure of SERA5E with that of the archetypical cysteine protease cathepsin L (PDB 1mhw)33 is presented in Fig. 1. SERA5E residues S596, H762, and N787 closely overlay the cathepsin L catalytic triad C25, H163, and N187, respectively (Fig. 2). S596 is located within a prominent cleft between the two lobes of the protease domain, reminiscent of the well-characterised substrate-binding cleft of papain-family enzymes. Residues lining the walls of this cleft in SERA5E include D637, E638, S641, M643, K701, Y703, A705, E707, R710, Y735, S816, and V818. G639 and S640, which lie adjacent to S596, form an extended strand that in cysteine proteases would normally orient peptide substrates. However, the orientation of the peptide plane observed here in SERA5E is orthogonal to that in the canonical cysteine-protease structures (Fig. 2c). Moreover, the side chain of Y735 (A135 in cathepsin L) occupies the S2 pocket, in place of the smaller residue (Val, Ala, or Gly) found in typical cysteine proteases. In cathepsin L, L69 and M70 (together with A135) form the S2 binding site. In SERA5E, the equivalent residues are S641 and P642, and the S2 site is reduced to a shallow hydrophobic depression on the surface of the protein.

The first β-strand of the C-terminal lobe of SERA5E is G702–E708. In all SERAs, the segments adjoining this strand at its N- and C-terminal ends contain insertions of 10 or more residues relative to cysteine proteases, including the region N691–D699 that is disordered in structures 1 and 2 (see Fig. 2). This novel structure immediately adjacent to the G702–S708 strand potentially extends the putative substrate-binding cleft. In SERA5E, Q590 is the structural equivalent of Q19 in cathepsin L, which forms part of the oxy-anion hole; this suggests a possible role for SERA5E Q590 in engaging the carbonyl group of the scissile peptide bond. D594 is located in the active-site cleft close to S596; aspartic acid is uncommon at this position among SERAs and among cysteine proteases, where a glycine is common. SERA5E has two additional disulfide bonds compared with cathepsin L, C567–C572 and C581–C610, both in the N-terminal region of SERA5E and both conserved among all the SERAs (and likely to be unique to this family) (Fig. 1). The latter tethers the N-terminal segment of SERA5E (C581) to the C-terminal end of the central helix (C610). The former holds the first two short β-strands (β1 and β2) in a type I β-turn.

Structure 2 (PDB 3ch2)

In structure 1, a phosphate ion (H2PO4−) is hydrogen-bonded to the catalytic residues S596 and H762. We also determined the structure of SERA5E in crystals grown under phosphate-free conditions. In this structure, the carboxylate side chain of D594 is observed in two conformations: one identical to that in structure 1 and the other close to the position of the phosphate ion in structure 1. No other significant differences in the protein structure were observed.

Structure 3 (PDB 2wbf)

In structures 1 and 2, electron density for residues N691–D699 could not be resolved. Structure-3 crystals included 10% dimethyl sulfoxide (DMSO), giving a structure mostly indistinguishable from structure 2 except for clearly resolvable density for the N691–D699 loop.
The ordered N691–D699 loop packs against residues M742–G748, between α9 and β6, of an adjacent molecule in the crystal, resulting in a slightly altered conformation of this region in structure 3 compared with structures 1 and 2. That residues M742–S747 adopt different conformations depending on whether or not the N691–D699 loop is resolved in the electron density indicates that the loop does not adopt its structure-3 conformation in either structure 1 or structure 2. Residues Y792 and W793 also contact the ordered N691–D699 loop in structure 3. A single molecule of DMSO binds in a pocket created by the altered polypeptide conformation near G748 and G794 and presumably helps to stabilize the N691–D699 loop. In this structure, and in one side-chain rotamer of structure 2, the carboxylate side chain of D594 forms a pair of water-mediated hydrogen bonds with S596 of the catalytic triad. Notably, the sulfur atom of cysteine does not usually engage in hydrogen bonding, so a cysteine (the canonical catalytic-triad nucleophile) in place of S596 could not tether D594 in an analogous manner.

Fig. 2. Structural comparison of SERA5E and cathepsin L. (a) Cartoon diagram of SERA5E. The N-terminal domain is to the left and the C-terminal domain to the right of the central catalytic cleft. The side-chain atoms of the catalytic triad, S596, H762, and N787, are shown along with those of residues that line the substrate-binding cleft: Q590, D594, D637, E638, S641, M643, K701, Y703, A705, E707, R710, Y735, S816, and V818. The residues connecting N691 to D699 in SERA5E are disordered in structures 1 and 2. (b) Cartoon diagram of cathepsin L. The side-chain atoms of the cathepsin L catalytic triad, C25, H163, and N187, are highlighted, along with the side-chain atoms of A135, which forms part of the S2 pocket. (c) Structural comparison of the substrate-binding sites. The peptide backbone atoms of SERA5E G639–S641 and cathepsin L G67–L69 are shown. The peptide plane of G639 in structures 1 and 2 of SERA5E is rotated ∼90° relative to the orientation of the equivalent residue in cathepsin L, whereas in structure 3 this peptide plane and the following one (S640) are each rotated by ∼45°. The loop in the N-terminal domain leading to the strand responsible for orienting substrates [C627–G639 and C56–G67 in SERA5E (tan) and cathepsin L (cyan), respectively] is highlighted; this loop follows a different path in SERA5E from that in cathepsin L. The disulfide bond at the C-terminal end of this loop is indicated. The side chains of the catalytic-triad residues in SERA5E (S596, H762, N787) occupy the same positions and orientations as those in cathepsin L (C25, H163, N187). The S2 pocket in cathepsin L, formed in part by A135, is occupied by Y735 in SERA5E.
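Fig. 2c describes the G639 and S640 peptide planes of SERA5E as rotated by about 90° or 45° relative to cathepsin L. The text does not say how these angles were measured. One conventional way to put numbers on such a flip is to compare backbone torsion angles: a peptide-plane flip typically changes ψ of the preceding residue and φ of the following residue by roughly equal and opposite amounts. As a minimal sketch, assuming atom coordinates have already been extracted from the relevant PDB entries, the standard four-atom torsion formula and a wrap-around angle difference can be written as follows (both helper names are hypothetical):

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees, in [-180, 180], defined by four points.

    For the backbone psi angle of residue i, pass N(i), CA(i), C(i), N(i+1).
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # Components of the two outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def torsion_difference(a: float, b: float) -> float:
    """Signed difference a - b between two torsion angles, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


# Idealised geometry: central bond along z, outer bonds perpendicular to it.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # about +90
print(torsion_difference(170.0, -100.0))                      # -90.0
```

The wrap-around difference matters because torsions are periodic. For example, 170° and −100° differ by 90°, not 270°.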

Discussion

The substrate-binding cleft of the papain family of cysteine proteases lies at the interface of the two domains of these proteins, a mostly α-helical domain and a mostly β-strand domain (the left and right domains, respectively). At the middle of this cleft lie the catalytic residues: a cysteine from one domain and a histidine and an asparagine from the other. A glutamine residue, several residues N-terminal to the catalytic cysteine, helps form the oxy-anion hole that stabilizes the reactive intermediate during amide-bond scission. The binding cleft can be divided into subsites, each of which accommodates one residue of the substrate. These sites have been characterised through their interactions with inhibitors and with pro-region peptides.34,35 The substrate is cleaved at the amide bond between residues P1 and P1′. Residues N-terminal to the cleavage site, Pn, bind to sites Sn (non-primed), whereas residues C-terminal to the cleavage site, Pn′, bind to sites Sn′ (primed). Beyond the S3 and S2′ sites, however, there appears to be little consensus among cysteine proteases on the location of specific binding sites.35 For individual enzymes, binding sites beyond S3 and S2′ can nevertheless be defined from their interactions with residues of the pro-region.

Cathepsin L is the archetypical cysteine protease. In cathepsin L, the S1 site is defined by the catalytic Cys and the oxy-anion-forming Gln residue. In SERA5E, the side chain of D594, which replaces a glycine found in other cysteine proteases, can adopt at least two orientations and occupies much of the space in which the side chain of the P1 residue must be situated, thereby limiting access of substrates to this site. The S2 site dictates much of the substrate selectivity in cysteine proteases. The conserved -GG- motif in many cysteine proteases adopts an extended conformation capable of engaging the substrate backbone in an antiparallel, β-strand-like geometry. In cathepsin L, the size of the S2 side-chain binding site is largely determined by L69 and A135, from the left and right domains, respectively, which form a large pocket that dictates a preference for substrates with Phe as the P2 residue.36,37 In SERA5E, S641 and Y735 replace L69 and A135 of cathepsin L, respectively, with the side chain of Y735 occupying the S2 site. Notably, the side chain of F78 of the cathepsin L pro-region occupies the S2 site in a manner similar to that of Y735 in SERA5E (PDB 1cs8).38 In SERA5E, this site is reduced to a small depression on the surface of the protein. The S3 site in cathepsin L is formed by residues of the E63–L69 loop, tethered through a disulfide bond to the loop defining the S1 site. In SERA5E, this region corresponds to the D634–S641 loop. The polypeptide chain follows a different path in cathepsin L and SERA5E and is significantly more open in SERA5E. The S1′ site in cathepsin L is formed by the side chains of the catalytic histidine and of a tryptophan residue highly conserved among cysteine proteases (W189 in cathepsin L, W789 in SERA5E). In addition, the main-chain atoms of A138–G139 in cathepsin L form one side of this site. In SERA5E, the corresponding residues, A738–E739, form part of a small α-helix (α9), capped at its N-terminal end by the side chain of D761; this site in SERA5E is therefore slightly smaller than in cathepsin L. The S2′ site includes the loop residues between the oxy-anion-forming Gln and the catalytic Cys. This site is comparable in cathepsin L and SERA5E, with the notable exception of D594 in SERA5E. Cathepsin B is a carboxypeptidase; its structure revealed how an "occluding loop" prevents access of substrates to the sites beyond S2′.39 In contrast, cathepsins C and H are aminopeptidases.
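The subsite numbering used in this Discussion can be expressed as a small labelling helper. Pn and Pn′ residues bind subsites Sn and Sn′, counted outward from the scissile P1–P1′ bond. The sketch below is illustrative only, and its function name is hypothetical. As a check, it reproduces the assignments made later in this Discussion for SERA5PE autolysis after Y559, where S558 is P2 and E554 is P6.

```python
def subsite_label(residue: int, p1: int) -> str:
    """Position of `residue` in Pn/Pn' numbering for cleavage after residue `p1`.

    Residues N-terminal to the scissile bond (residue <= p1) are P1, P2, ...
    counting away from the bond; residues C-terminal to it are P1', P2', ...
    A residue labelled Pn (Pn') binds the enzyme subsite Sn (Sn').
    """
    if residue <= p1:
        return f"P{p1 - residue + 1}"
    return f"P{residue - p1}'"


# SERA5PE autolysis: cleavage on the C-terminal side of Y559.
for res in (554, 558, 559, 560):
    print(res, subsite_label(res, p1=559))
```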
The structure of cathepsin C revealed a large "exclusion" domain occupying the active site beyond S2.40 The structure of cathepsin H revealed a mini-peptide occluding the S2 and S3 sites,41 with its carboxyl terminus forming part of the S1 site and capable of binding the N terminus of the substrate. The carboxyl terminus of the cathepsin H mini-peptide lies in the vicinity of D594 in SERA5E, raising the possibility that SERA5E could also function as an aminopeptidase. Arguing against this, however, the active-site cleft is not occluded in SERA5E (as it is in cathepsins B, C, and H), although it is less well defined owing to the absence of a distinct S2 pocket.

Two anomalies in the SERA5E active site are particularly noteworthy. First, SERA5E lacks an appropriately oriented peptide strand for engaging substrates. The structure and environment of the segment E638–S641, whose homologue in cathepsin L orients polypeptide substrates, differ from those seen in cysteine proteases. In SERA5E, S640 is the structural equivalent of G68 of cathepsin L (Fig. 1), which is conserved as a glycyl residue in the papain family. The peptide bond preceding S640 is rotated ∼90° in structures 1 and 2 and ∼45° in structure 3 relative to its orientation in cathepsin L, so that it is no longer correctly positioned to form hydrogen bonds with the substrate backbone. In structure 3, the plane of the peptide bond of S640 is also rotated by ∼45° relative to its position in cathepsin L. The structural variation among the three SERA5E structures in this region suggests that this strand is intrinsically flexible. As illustrated in Fig. 2c, the polypeptide backbones of cathepsin L and SERA5E N-terminal to this strand follow different paths. In cathepsin L, the orientation of the conserved glycine residue is maintained by the interaction between two extended antiparallel strands consisting of residues N62–E63 and N66–G67–G68. In SERA5E, only one of these strands, E638–G639–S640, is apparent; the segment C627–G639 lacks any regular secondary structure. Sequence comparisons of SERA5 and SERA6 with cathepsin L suggest that the SERA6 structure resembles that of SERA5 (Fig. 3) and hence is also likely to differ from known cysteine proteases in this respect. Indeed, only SERA8 and SERA9 contain the signature -GG- sequence characteristic of cysteine proteases. A consequence of this anomaly is either the absence of canonical hydrogen bonding to substrate in SERA5, and probably also in SERA6, or a conformational change in the protein when substrate is engaged. The dust mite proteolytic allergen Der p 1 also lacks the -GG- sequence and consequently also lacks the second strand (PDB 1xkg, 2as8).42,43 Importantly, however, Der p 1 exhibits proteolytic activity,44,45 although its substrate specificity has not yet been determined. The catalytic domain studied here was excised from the precursor protein, and we cannot exclude the possibility that, in the intact SERA5 parent molecule, other polypeptide segments help to align the E638–S641 strand and so assist in the correct formation of the "active" site.

Fig. 3. Sequence alignment of the putative catalytic domains of SERA5–9 from P. falciparum and cathepsin L. The signature -GG- sequence characteristic of cysteine proteases is highlighted in green. The catalytic-triad residues (C, H, N) and the oxy-anion glutamine (Q) are highlighted in yellow. Blue highlighting indicates the unusual residues in SERA5, including the aspartate D594 (aligned with glycyl residues in SERA6–8 and cathepsin L), the putative catalytic S596, and Y735, which occupies the S2 pocket. Residues identical between SERA5 and SERA6 are shown in bold. Cysteine residues are numbered according to their position in SERA5E. The putative catalytic serine is marked with an asterisk; SERA1–5 and SERA9 have a serine at this position.

The second anomaly in SERA5E is the presence of aspartic acid at position 594, two residues from the active-site serine. This is unusual among the sequences of cysteine proteases, in which glycine is widely conserved at this position (e.g., G23 in cathepsin L). One exception is papaya proteinase IV, an enzyme with unusual substrate specificity that cleaves C-terminal to glycine and has a glutamyl residue at this position (PDB 1gec).46 This residue, together with an arginine at the position homologous to G639 in SERA5E, restricts substrate access to papaya proteinase IV unless glycine occupies the P1 position. In the SERA family, the residue homologous to SERA5E D594 is glycine in SERA6–8 (i.e., the cysteine-containing enzymes) and alanine in all other serine-containing SERAs. The carboxylate side chain of D594 is close to the phosphate ion observed in the active site of structure 1. The pKa of D594 is estimated to be 8.7 in the presence of the phosphate ion and 6.2 in its absence. The electrostatic potentials on the substrate-binding faces of SERA5E and cathepsin L differ significantly (Fig. 4). The surface of SERA5E is strongly negatively charged in the region near D594 (including D637 and E638), whereas in cathepsin L this region is neutral. Additionally, in cathepsin L the surface at the S6 binding site, where the charged K82 of the pro-region binds, is predominantly negative,47 whereas in SERA5E this surface is neutral. Our attempts to soak SERA5E crystals with the generic serine-protease inhibitors antipain and 4-(2-aminoethyl)benzenesulfonyl fluoride, or with a cyclic peptide inhibitor,10 revealed nothing bound to the protein. This may be attributable to D594: in crystals grown in the absence of phosphate (structures 2 and 3), the carboxylate side chain of (one rotamer of) D594 occupies the phosphate-binding site and limits access to the S1 site. We have been able to identify only one other sequence of a putative cysteine protease (Anopheles gambiae Q7Q9Y5) that contains both a serine at the nucleophile position and an aspartate in place of the glycine two residues N-terminal to it;22 however, the enzymatic activity of this protein is currently unknown.

Fig. 4. Comparison of the molecular surfaces of SERA5E (left) and cathepsin L (right), coloured according to electrostatic potential from red through white to blue over the range −30 to 0 to +30 kJ mol−1. The positions of D594 and Y735 in SERA5E are circled. Note the strong negative potential around D594, D637, and E638; the equivalent surface is neutral in cathepsin L. The side chain of Y735 occupies the S2 pocket, whereas this pocket is clearly visible on the surface of cathepsin L. K82 of the pro-segment of cathepsin L lies in the acidic S6 site; this region is neutral on SERA5E.

Notwithstanding these two anomalous features of the active site, autolytic activity has been attributed to the recombinantly expressed proenzyme domain of SERA5.7 This recombinant protein, SERA5PE, is autolytically processed on the C-terminal side of Y559 (to yield SERA5E), placing S558 at the P2 position.7 The side chain of Y735 projects into the SERA5E S2 specificity pocket (Fig. 3); this residue is A135 in cathepsin L and V133 in papain. This suggests that SERA5 prefers a smaller P2 residue in its substrates than either cathepsin L or papain, which prefer substrates with phenylalanine or valine, respectively, at P2.36,37 Consistent with this, SERA5PE has been reported to cleave two synthetic substrates,7 one with proline and the other with valine at P2. We note that cleavage after Y559 places the negatively charged E554 at the P6 position, which would bring it close to K701 and R710. Thus, the cleavage profile reported in the protease-activity studies7 is consistent with the three-dimensional structure we observe: an elongated substrate-binding cleft, with Y735 occupying the S2 pocket, that favours P2 residues with compact side chains such as proline, serine, and valine.

The pro-regions of cysteine proteases bind in the active-site cleft in the reverse orientation relative to substrate.46 In procathepsin L and procaricain (and, by homology, propapain), a glycyl residue located 41–44 residues N-terminal to the catalytic cysteine lies adjacent to that cysteine. Similarly, in procathepsin B, G43 of the pro-sequence (49 residues N-terminal to the essential cysteine) is located immediately adjacent to the catalytic cysteine.48 One exception to this pattern of pro-sequences placing a glycyl residue at the catalytic centre is proDer p 1,41 but there the relevant pro-sequence is helical rather than extended as in the examples above. In SERA5, G543 (53 residues N-terminal to the putative catalytic S596) has a counterpart in most P. falciparum SERA sequences, within a conserved GXXD motif in which X is I or V. These observations are consistent with G543 and adjacent residues occupying the active-site cleft in SERA5PE.

The protease domain of SERA6 shares 57% sequence identity with SERA5E, and the two proteins are clearly structural homologues. In the substrate-binding cleft the level of sequence identity is significantly higher, suggesting a common substrate specificity for the two proteins; the most notable exception is D594 in SERA5E, which is a glycine in SERA6. The SERA8 orthologue in Plasmodium berghei is necessary for sporozoite egress from oocysts.49 The sequence of P. falciparum SERA8 has all the hallmarks of a "regular" cysteine protease (Fig. 3): a catalytic cysteine, a glycine two residues N-terminal to this cysteine, the signature -GG- sequence, and a relatively small serine residue forming the S2 pocket. SERA8 in P. falciparum is therefore likely to function as an active cysteine protease and to participate in sporozoite egress from oocysts. SERA5 is produced in large amounts during the late-trophozoite and schizont stages of parasite development, and interference with SERA5 is known to impair development in the blood stage of the parasite life cycle.10 Hence, compounds targeting the SERA5 active site may have potential as chemotherapeutic agents. The structural details of the unusual SERA active site presented here may offer opportunities for the rational design of novel, parasite-specific inhibitory compounds.
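The two pKa estimates for D594 given above can be made concrete with the Henderson–Hasselbalch relation. This relation gives the fraction of an acidic group that is deprotonated (negatively charged) at a given pH as 1/(1 + 10^(pKa − pH)). The sketch below evaluates it at pH 6.5, the pH of the Bis-Tris buffer in which the protein was held before crystallisation. That pH is an illustrative assumption, and the relation treats D594 as an isolated titratable group, ignoring coupling to neighbouring charges. The function name is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a single acidic group that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 6.5  # illustrative assumption: pH of the protein's Bis-Tris buffer
for label, pka in (("phosphate bound", 8.7), ("phosphate absent", 6.2)):
    f = fraction_deprotonated(pka, PH)
    print(f"D594, {label}: pKa {pka} -> {100.0 * f:.1f}% deprotonated at pH {PH}")
```

With these inputs the sketch gives about 0.6% and 67% deprotonated. Under this simple model, D594 would be mostly neutral when phosphate is bound and mostly negatively charged in its absence.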

Materials and Methods Cloning, expression, and purification The SERA5E enzyme domain including residues V544 to N828, corresponding to the moderately stable fragment following proteolysis of the proenzyme domain (SERA5PE) with elastase (5PEa),7 was inserted into the expression vector pProExHTb (Invitrogen) with an N-terminal hexaHis tag. It was produced and purified in a similar manner to that described for the larger fragment SERA5PE.7 Briefly, the protein was synthesised in E. coli strain BL21(DE3) for 3 h at 37 °C and deposited into insoluble inclusion bodies. The cells were lysed by sonication, and the insoluble inclusion bodies were solubilised by the addition of 6 M guanidine–HCl, 20 mM β-mercaptoethanol (pH 8.0). The solubilised protein was isolated by metal-chelate affinity chromatography on a Ni–NTA (Qiagen) column and eluted in 8 M urea, 250 mM NaCl, 1 M imidazole, and 20 mM Tris (pH 8.0). The protein was diluted 50-fold and refolded at 4 °C for 7 days in 2 M urea, 100 mM NaCl, 20 mM Tris (pH 8.0), with 1 mM reduced glutathione and 0.25 mM oxidised glutathione, respectively, included to facilitate disulfide-bond formation. Refolded protein was isolated by
anion-exchange chromatography and concentrated to 10 mg/ml for crystallisation. The N-terminal fusion tag was removed from the expressed protein using TEV protease, leaving residues V544 through N828 and five N-terminal residues (GAMGS) from the vector. Selenomethionine (SeMet)-substituted protein was produced following growth and expression in PASM-5052 medium.50

Crystallisation and data collection

The protein, buffered in 20 mM Bis-Tris (pH 6.5) and 20 mM NaCl, was initially crystallised by vapour diffusion with 0.2 M KH2PO4 (pH 4.6), 12% polyethylene glycol (PEG)-8000, and 5% PEG-400 in the reservoir solution. Equal volumes of reservoir and protein (10 mg/ml) were mixed and equilibrated in a sitting-drop experiment at 20 °C. During crystallisation, part of the pro-sequence was lost, presumably through autolysis, yielding D560 as the new N terminus (5PEc).7 We refer to this protein as SERA5E. SeMet-substituted protein was crystallised by a similar method, except that the mixture of equal volumes of protein and reservoir solution was centrifuged for 10 min at 13,000 rpm and 4 °C to remove precipitated protein before the sitting drops were set up. Glycerol (16%) was added immediately prior to flash cooling in liquid nitrogen. High-resolution (1.8 Å) data were collected at the Advanced Photon Source on beamline 14-ID-B from a crystal grown in the presence of a putative peptide inhibitor (a 14-residue disulfide-bonded cyclic peptide with the sequence LVCHPAVPALLCAR).10 However, there was no evidence of the peptide in the electron density, and this data set has been treated as “native.” Data were collected from a SeMet-substituted SERA5E crystal at the Swiss Light Source at 100 K. The crystals belong to space group R3, with one monomer per asymmetric unit (Table 1).
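Speeds quoted in rpm, such as the 13,000 rpm spin used to clear the SeMet protein drops, correspond to a relative centrifugal force that depends on the rotor radius: RCF = rω²/g, with ω = 2πN/60 for a speed of N rpm. A minimal Python sketch of the conversion follows; the function names are hypothetical, and the 7.3 cm radius in the usage example is an assumed value, because the rotor used is not specified in the text.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rcf_from_rpm(rpm: float, radius_m: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_m for a rotor at rpm."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return radius_m * omega ** 2 / STANDARD_GRAVITY


def rpm_from_rcf(rcf: float, radius_m: float) -> float:
    """Inverse conversion: rotor speed needed to reach a given RCF at radius_m."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / radius_m)
    return omega * 60.0 / (2.0 * math.pi)


if __name__ == "__main__":
    # Assumed microcentrifuge rotor radius of 7.3 cm (hypothetical value).
    print(round(rcf_from_rpm(13_000, 0.073)))  # roughly 13,800 x g
```

Reporting the force in multiples of g, rather than rpm, makes the clarification step transferable between centrifuges with different rotors.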
Phosphate-free crystallisation conditions were subsequently identified using a reservoir solution of unbuffered 18% PEG-3350 and 0.2 M CaCl2; the structures determined from crystals grown under these conditions are referred to here as structure 2 and structure 3. Equal volumes of protein (2.5 mg/ml) and precipitant were combined and equilibrated against the reservoir in a sitting-drop vapour-diffusion setup. Crystals formed within 24 h and were isomorphous with those described above. For structure-2 and structure-3 crystals, high-resolution diffraction was obtained only after crystal dehydration: prior to data collection, crystals were equilibrated overnight against a reservoir solution containing an increased concentration (22%) of PEG-3350. The dehydrated crystals were cryoprotected by streaking through Paratone-N and cryo-cooled in liquid nitrogen. For structure 3, 10% DMSO was added immediately prior to cryoprotection. Data for structure-2 and structure-3 crystals were collected at the Australian Synchrotron, beamline PX1.

Structure determination, refinement, and analysis

Data for all crystals were processed with HKL2000.51 The peak data from the SeMet crystal were used for single-wavelength anomalous diffraction (SAD) analysis with HKL2MAP52 to locate eight (of an expected total of nine) Se atoms. Phasing with SOLVE/RESOLVE53 produced an interpretable electron density map (mean figure of merit, 0.62). Model building in COOT54 and refinement in REFMAC555 led to the final model for structure 1. Structures 2 and 3 were solved by molecular replacement with PHASER56 using structure 1 as the search model.
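The mean figure of merit quoted above (0.62) summarises phase quality. For each reflection, the figure of merit m is the magnitude of the probability-weighted average of the unit phase vector, which can be read as the expected cosine of the phase error; m is then averaged over reflections. The Python sketch below illustrates this definition on a discretised phase distribution. It is not how SOLVE/RESOLVE computes its statistics, the unweighted average is an assumption, and the function names and toy distributions are hypothetical.

```python
import cmath
import math
from typing import Sequence


def figure_of_merit(phases_deg: Sequence[float], probs: Sequence[float]) -> float:
    """Figure of merit for one reflection from a discretised phase probability P(phi).

    m = |sum_k P(phi_k) * exp(i * phi_k)| / sum_k P(phi_k)
    m is 1 for a single sharp phase and close to 0 when the phase is undetermined.
    """
    if len(phases_deg) != len(probs):
        raise ValueError("phases and probabilities must have the same length")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    centroid = sum(p * cmath.exp(1j * math.radians(phi))
                   for phi, p in zip(phases_deg, probs))
    return abs(centroid) / total


def mean_figure_of_merit(reflections) -> float:
    """Unweighted mean figure of merit over (phases_deg, probs) pairs."""
    foms = [figure_of_merit(ph, pr) for ph, pr in reflections]
    return sum(foms) / len(foms)


if __name__ == "__main__":
    # Hypothetical toy distributions, not SERA5 data.
    sharp = ([0, 90, 180, 270], [1, 0, 0, 0])  # phase well determined
    flat = ([0, 90, 180, 270], [1, 1, 1, 1])   # phase undetermined
    print(round(mean_figure_of_merit([sharp, flat]), 3))  # 0.5
```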

The ionization states of protein side chains were predicted with the multi-conformation continuum electrostatics (MCCE) approach implemented in the MCCE 2.2 program.57,58 PARSE partial charges and radii were applied to all protein atoms,59 and the pairwise electrostatic interactions and reaction-field energies were calculated with the DelPhi program.60 The method is generally reliable to within 1 pH unit.58 The electrostatic potential displayed in Fig. 4 was calculated with the MEAD 2.2 program.61 Ionizable residues were assigned the protonation states predicted at pH 7; in particular, D594 was neutral.

Accession numbers

Coordinates have been deposited in the PDB with accession numbers 3ch3 (structure 1), 3ch2 (structure 2), and 2wbf (structure 3).
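The neutral protonation state assigned to D594 at pH 7 implies a strongly perturbed pKa. Under a simple, isolated-site Henderson–Hasselbalch picture, an Asp side chain is mostly neutral at pH 7 only if its pKa lies above 7, far from the value of about 4 typical of a solvent-exposed carboxyl. MCCE accounts for coupled titration of many sites and multiple conformers, so the Python sketch below is only a simplification; it also shows how the ±1 pH unit reliability of the method translates into a range of protonated fractions. The function name and the pKa values used are hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of an acidic group in its protonated form.

    For an Asp carboxyl the protonated form is neutral, so this is the
    probability of finding the side chain uncharged at the given pH
    (isolated-site approximation; no coupling to other titratable groups).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical pKa values; the computed pKa of D594 is not given here.
    for pka in (3.9, 6.0, 7.0, 8.0, 9.0):
        frac = fraction_protonated(pka, 7.0)
        print(f"pKa {pka:4.1f}: fraction neutral at pH 7 = {frac:.3f}")
```

A computed pKa of, say, 8 with a 1-unit uncertainty would span fractions from 0.5 to about 0.99, so the neutral assignment is robust only if the predicted shift is well above 7.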

Acknowledgements

We thank Mike Gorman, Trevor Huyton, and the beamline staff at SLS and APS (BioCARS) for data collection. Data were also collected at the Australian Synchrotron, Victoria, Australia. We thank Jo McCoubrie and Paul Gilson for helpful comments on the manuscript. O.B.C. is supported by an Australian Postgraduate award. This work was supported by the National Health and Medical Research Council (NHMRC) of Australia and the Wellcome Trust, UK. Infrastructure support from the NHMRC IRIIS (361646) and a Victorian State Government OIS grant is gratefully acknowledged. R.L.M. was supported by an NHMRC CJ Martin Fellowship. B.S.C. is an International Research Scholar of the Howard Hughes Medical Institute.

References

1. Delplace, P., Bhatia, A., Cagnard, M., Camus, D., Colombet, G., Debrabant, A. et al. (1988). Protein p126: a parasitophorous vacuole antigen associated with the release of Plasmodium falciparum merozoites. Biol. Cell. 64, 215–221.
2. Pang, X. L., Mitamura, T. & Horii, T. (1999). Antibodies reactive with the N-terminal domain of Plasmodium falciparum serine repeat antigen inhibit cell proliferation by agglutinating merozoites and schizonts. Infect. Immun. 67, 1821–1827.
3. Blackman, M. J. (2008). Malarial proteases and host cell egress: an ‘emerging’ cascade. Cell. Microbiol. 10, 1925–1934.
4. Aoki, S., Li, J., Itagaki, S., Okech, B. A., Egwang, T. G., Matsuoka, H. et al. (2002). Serine repeat antigen (SERA5) is predominantly expressed among the SERA multigene family of Plasmodium falciparum, and the acquired antibody titers correlate with serum inhibition of the parasite growth. J. Biol. Chem. 277, 47533–47540.
5. Bzik, D. J., Li, W. B., Horii, T. & Inselburg, J. (1988). Amino acid sequence of the serine-repeat antigen (SERA) of Plasmodium falciparum determined from cloned cDNA. Mol. Biochem. Parasitol. 30, 279–288.

6. Eakin, A. E., Higaki, J. N., McKerrow, J. H. & Craik, C. S. (1989). Cysteine or serine proteinase? Nature, 342, 132.
7. Hodder, A. N., Drew, D. R., Epa, V. C., Delorenzi, M., Bourgon, R., Miller, S. K. et al. (2003). Enzymic, phylogenetic, and structural characterization of the unusual papain-like protease domain of Plasmodium falciparum SERA5. J. Biol. Chem. 278, 48169–48177.
8. Miller, S. K., Good, R. T., Drew, D. R., Delorenzi, M., Sanders, P. R., Hodder, A. N. et al. (2002). A subset of Plasmodium falciparum SERA genes are expressed and appear to play an important role in the erythrocytic cycle. J. Biol. Chem. 277, 47524–47532.
9. Okech, B., Mujuzi, G., Ogwal, A., Shirai, H., Horii, T. & Egwang, T. G. (2006). High titers of IgG antibodies against Plasmodium falciparum serine repeat antigen 5 (SERA5) are associated with protection against severe malaria in Ugandan children. Am. J. Trop. Med. Hyg. 74, 191–197.
10. Fairlie, W. D., Spurck, T. P., McCoubrie, J. E., Gilson, P. R., Miller, S. K., McFadden, G. I. et al. (2008). Inhibition of malaria parasite development by a cyclic peptide that targets the vital parasite protein SERA5. Infect. Immun. 76, 4332–4344.
11. Knapp, B., Hundt, E., Nau, U. & Kupper, H. A. (1989). Molecular cloning, genomic structure and localization in a blood stage antigen of Plasmodium falciparum characterized by a serine stretch. Mol. Biochem. Parasitol. 32, 73–83.
12. Knapp, B., Nau, U., Hundt, E. & Kupper, H. A. (1991). A new blood stage antigen of Plasmodium falciparum highly homologous to the serine-stretch protein SERP. Mol. Biochem. Parasitol. 44, 1–13.
13. Debrabant, A., Maes, P., Delplace, P., Dubremetz, J. F., Tartar, A. & Camus, D. (1992). Intramolecular mapping of Plasmodium falciparum P126 proteolytic fragments by N-terminal amino acid sequencing. Mol. Biochem. Parasitol. 53, 89–95.
14. Li, J., Matsuoka, H., Mitamura, T. & Horii, T. (2002). Characterization of proteases involved in the processing of Plasmodium falciparum serine repeat antigen (SERA). Mol. Biochem. Parasitol. 120, 177–186.
15. Li, J., Mitamura, T., Fox, B., Bzik, D. & Horii, T. (2002). Differential localization of processed fragments of Plasmodium falciparum serine repeat antigen and further processing of its N-terminal 47 kDa fragment. Parasitol. Int. 51, 343–352.
16. Arisue, N., Hirai, M., Arai, M., Matsuoka, H. & Horii, T. (2007). Phylogeny and evolution of the SERA multigene family in the genus Plasmodium. J. Mol. Evol. 65, 82–91.
17. Bourgon, R., Delorenzi, M., Sargeant, T., Hodder, A. N., Crabb, B. S. & Speed, T. P. (2004). The serine repeat antigen (SERA) gene family phylogeny in Plasmodium: the impact of GC content, and reconciliation of gene and species trees. Mol. Biol. Evol. 21, 2161–2171.
18. McCoubrie, J. E., Miller, S. K., Sargeant, T., Good, R. T., Hodder, A. N., Speed, T. P. et al. (2007). Evidence for a common role for the serine-type Plasmodium falciparum SERA proteases: implications for vaccine and drug design. Infect. Immun. 75, 5565–5574.
19. Yeoh, S., O'Donnell, R. A., Koussis, K., Dluzewski, A. R., Ansell, K. H., Osborne, S. A. et al. (2007). Subcellular discharge of a serine protease mediates release of invasive malaria parasites from host erythrocytes. Cell, 131, 1072–1083.
20. Koussis, K., Withers-Martinez, C., Yeoh, S., Child, M., Hackett, F., Knuepfer, E. et al. (2009). A multifunctional serine protease primes the malaria parasite for red blood cell invasion. EMBO J. 28, 725–735.


21. Arastu-Kapur, S., Ponder, E. L., Fonovic, U. P., Yeoh, S., Yuan, F., Fonovic, M. et al. (2008). Identification of proteases that regulate erythrocyte rupture by the malaria parasite Plasmodium falciparum. Nat. Chem. Biol. 4, 203–213.
22. Pils, B. & Schultz, J. (2004). Inactive enzyme-homologues find new function in regulatory processes. J. Mol. Biol. 340, 399–404.
23. Sato, D., Li, J., Mitamura, T. & Horii, T. (2005). Plasmodium falciparum serine-repeat antigen (SERA) forms a homodimer through disulfide bond. Parasitol. Int. 54, 261–265.
24. Grima, J., Zhu, L. J., Zong, S. D., Catterall, J. F., Bardin, C. W. & Cheng, C. Y. (1995). Rat testin is a newly identified component of the junctional complexes in various tissues whose mRNA is predominantly expressed in the testis and ovary. Biol. Reprod. 52, 340–355.
25. Shimizu, K., Cha, J., Stucky, G. D. & Morse, D. E. (1998). Silicatein α: cathepsin L-like protein in sponge biosilica. Proc. Natl Acad. Sci. USA, 95, 6234–6238.
26. Brutchey, R. L. & Morse, D. E. (2008). Silicatein and the translation of its molecular mechanism of biosilicification into low temperature nanomaterial synthesis. Chem. Rev. 108, 4915–4934.
27. Fairhead, M., Johnson, K. A., Kowatz, T., McMahon, S. A., Carter, L. G., Oke, M. et al. (2008). Crystal structure and silica condensing activities of silicatein α-cathepsin L chimeras. Chem. Commun., 1765–1767.
28. Kalfa, T. A., Thull, J. D., Butkowski, R. J. & Charonis, A. S. (1994). Tubulointerstitial nephritis antigen interacts with laminin and type IV collagen and promotes cell adhesion. J. Biol. Chem. 269, 1654–1659.
29. Holm, L. & Park, J. (2000). DaliLite workbench for protein structure comparison. Bioinformatics, 16, 566–567.
30. Caffrey, C. R., Hansell, E., Lucas, K. D., Brinen, L. S., Hernandez, A. A., Cheng, J. et al. (2001). Active site mapping, biochemical properties and subcellular localization of rhodesain, the major cysteine protease of Trypanosoma brucei rhodesiense. Mol. Biochem. Parasitol. 118, 61–73.
31. Gillmor, S. A., Craik, C. S. & Fletterick, R. J. (1997). Structural determinants of specificity in the cysteine protease cruzain. Protein Sci. 6, 1603–1611.
32. Biswas, S., Chakrabarti, C., Kundu, S., Jagannadham, M. V. & Dattagupta, J. K. (2003). Proposed amino acid sequence and the 1.63 Å X-ray crystal structure of a plant cysteine protease, ervatamin B: some insights into the structural basis of its stability and substrate specificity. Proteins, 51, 489–497.
33. Chowdhury, S., Sivaraman, J., Wang, J., Devanathan, G., Lachance, P., Qi, H. et al. (2002). Design of noncovalent inhibitors of human cathepsin L. From the 96-residue proregion to optimized tripeptides. J. Med. Chem. 45, 5321–5329.
34. Turk, D., Guncar, G., Podobnik, M. & Turk, B. (1998). Revised definition of substrate binding sites of papain-like cysteine proteases. Biol. Chem. 379, 137–147.
35. Turk, D. & Guncar, G. (2003). Lysosomal cysteine proteases (cathepsins): promising drug targets. Acta Crystallogr., Sect. D: Biol. Crystallogr. D59, 203–213.
36. Puzer, L., Cotrin, S. S., Alves, M. F. M., Egborge, T., Araújo, M. S., Juliano, M. A. et al. (2004). Comparative substrate specificity analysis of recombinant human cathepsin V and cathepsin L. Arch. Biochem. Biophys. 430, 274–283.
37. Choe, Y., Leonetti, F., Greenbaum, D. C., Lecaille, F., Bogyo, M., Brömme, D. et al. (2006). Substrate profiling of cysteine proteases using a combinatorial peptide library identifies functionally unique specificities. J. Biol. Chem. 281, 12824–12832.
38. Groves, M. R., Coulombe, R., Jenkins, J. & Cygler, M. (1998). Structural basis for specificity of papain-like cysteine protease proregions toward their cognate enzymes. Proteins, 32, 504–514.
39. Musil, D., Zucic, D., Turk, D., Engh, R. A., Mayr, I., Huber, R. et al. (1991). The refined 2.15 Å X-ray crystal structure of human liver cathepsin B: the structural basis for its specificity. EMBO J. 10, 2321–2330.
40. Molgaard, A., Arnau, J., Lauritzen, C., Larsen, S., Petersen, G. & Pedersen, J. (2007). The crystal structure of human dipeptidyl peptidase I (cathepsin C) in complex with the inhibitor Gly–Phe–CHN2. Biochem. J. 401, 645–650.
41. Guncar, G., Podobnik, M., Pungercar, J., Strukelj, B., Turk, V. & Turk, D. (1998). Crystal structure of porcine cathepsin H determined at 2.1 Å resolution: location of the mini-chain C-terminal carboxyl group defines cathepsin H aminopeptidase function. Structure, 6, 51–61.
42. Meno, K., Thorsted, P. B., Ipsen, H., Kristensen, O., Larsen, J. N., Spangfort, M. D. et al. (2005). The crystal structure of recombinant proDer p 1, a major house dust mite proteolytic allergen. J. Immunol. 175, 3835–3845.
43. de Halleux, S., Stura, E., VanderElst, L., Carlier, V., Jacquemin, M. & Saint-Remy, J. M. (2006). Three-dimensional structure and IgE-binding properties of mature fully active Der p 1, a clinically relevant major allergen. J. Allergy Clin. Immunol. 117, 571–576.
44. Hewitt, C. R. A., Brown, A. P., Hart, B. J. & Pritchard, D. I. (1995). A major house dust mite allergen disrupts the immunoglobulin E network by selectively cleaving CD23: innate protection by antiproteases. J. Exp. Med. 182, 1537–1544.
45. Schulz, O., Sewell, H. F. & Shakib, F. (1998). Proteolytic cleavage of CD25, the α subunit of the human T cell interleukin 2 receptor, by Der p 1, a major mite allergen with cysteine protease activity. J. Exp. Med. 187, 271–275.
46. O'Hara, B. P., Hemmings, A. M., Buttle, D. J. & Pearl, L. H. (1995). Crystal structure of glycyl endopeptidase from Carica papaya: a cysteine endopeptidase of unusual substrate specificity. Biochemistry, 34, 13190–13195.
47. Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, J. S. & Cygler, M. (1996). Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment. EMBO J. 15, 5492–5503.
48. Podobnik, M., Kuhelj, R., Turk, V. & Turk, D. (1997). Crystal structure of the wild-type human procathepsin B at 2.5 Å resolution reveals the native active site of a papain-like cysteine protease zymogen. J. Mol. Biol. 271, 774–788.
49. Aly, A. S. I. & Matuschewski, K. (2005). A malarial cysteine protease is necessary for Plasmodium sporozoite egress from oocysts. J. Exp. Med. 202, 225–230.
50. Studier, F. W. (2005). Protein production by autoinduction in high density shaking cultures. Protein Expression Purif. 41, 207–234.
51. Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Macromol. Crystallogr., Part A, 276, 307–326.
52. Pape, T. & Schneider, T. R. (2004). HKL2MAP: a graphical user interface for macromolecular phasing with SHELX programs. J. Appl. Crystallogr. 37, 843–844.

53. Terwilliger, T. C. & Berendzen, J. (1999). Automated MAD and MIR structure solution. Acta Crystallogr., Sect. D: Biol. Crystallogr. 55, 849–861.
54. Emsley, P. & Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallogr., Sect. D: Biol. Crystallogr. 60, 2126–2132.
55. Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr., Sect. D: Biol. Crystallogr. 53, 240–255.
56. McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674.
57. Alexov, E. G. & Gunner, M. R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys. J. 72, 2075–2093.

58. Georgescu, R. E., Alexov, E. G. & Gunner, M. R. (2002). Combining conformational flexibility and continuum electrostatics for calculating pK(a)s in proteins. Biophys. J. 83, 1731–1748.
59. Sitkoff, D., Sharp, K. A. & Honig, B. (1994). Accurate calculation of hydration free-energies using macroscopic solvent models. J. Phys. Chem. 98, 1978–1988.
60. Nicholls, A. & Honig, B. (1991). A rapid finite difference algorithm utilizing successive over-relaxation to solve the Poisson–Boltzmann equation. J. Comp. Chem. 12, 435–445.
61. Bashford, D. (1997). An object-oriented programming suite for electrostatic effects in biological molecules. In Scientific Computing in Object-Oriented Parallel Environments. Lecture Notes in Computer Science, vol. 1343, pp. 233–240, Springer, Berlin.