Soluble expression, purification and functional characterisation of carboxypeptidase G2 and its individual domains


Accepted Manuscript

Dhadchayini Jeyaharan, Philip Aston, Angela Garcia-Perez, James Schouten, Paul Davis, Dr Ann M. Dixon

PII: S1046-5928(16)30115-2
DOI: 10.1016/j.pep.2016.06.015
Reference: YPREP 4958
To appear in: Protein Expression and Purification
Received Date: 12 April 2016
Revised Date: 27 June 2016
Accepted Date: 29 June 2016

Please cite this article as: D. Jeyaharan, P. Aston, A. Garcia-Perez, J. Schouten, P. Davis, A.M. Dixon, Soluble expression, purification and functional characterisation of carboxypeptidase G2 and its individual domains, Protein Expression and Purification (2016), doi: 10.1016/j.pep.2016.06.015. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Dhadchayini Jeyaharan†, Philip Aston†, Angela Garcia-Perez†, James Schouten§, Paul Davis§, and Ann M. Dixon†*

Running Title: Soluble expression and characterization of CPG2

§ Mologic Ltd, Bedford Technology Park, Thurleigh, Bedford, MK44 2YP, UK and † Department of Chemistry, University of Warwick, Coventry, CV4 7AL, UK.

*To whom correspondence should be addressed: Dr Ann Dixon, Department of Chemistry, University of Warwick, Coventry, CV4 7AL, UK, Telephone: +44 2476 150037; FAX: +44 2476 524112; email: [email protected]

ABSTRACT

Due to its applications in the treatment of cancer and autoimmune diseases, the 42 kDa zinc-dependent metalloenzyme carboxypeptidase G2 (CPG2) is of great therapeutic interest. An X-ray crystal structure of unliganded CPG2 reported in 1997 revealed the domain architecture and informed early rational drug design efforts; however, further efforts at co-crystallization of CPG2 with ligands, substrates or inhibitors have not been reported. Thus key features of CPG2 such as the location of the active site, the presence of additional ligand-binding sites, stability, oligomeric state, and the molecular basis of activity remain largely unknown, with the current working understanding of CPG2 activity based primarily on computational modelling. To facilitate renewed efforts in CPG2 structural biology, we report the first high-yield (250 mg L-1) recombinant expression (and purification) of soluble and active CPG2 using the E. coli expression system. We used this protocol to produce full-length enzyme, as well as protein fragments corresponding to the individual catalytic and dimerization domains, and the activity and stability of each construct were characterised. We adapted our protocol to allow for uniform incorporation of NMR labels (13C, 15N and 2H) and present preliminary solution-state NMR spectra of high quality. Taken together, our results offer a route for production and solution-state characterization that supports renewed effort in CPG2 structural biology as well as design of significantly truncated CPG2 proteins, which retain activity while yielding (potentially) improved immunogenicity.

KEYWORDS

Carboxypeptidase G2; bacterial expression; isotopic labelling; activity measurements; nuclear magnetic resonance; mass spectrometry


INTRODUCTION

Carboxypeptidase G2 (CPG2) is a 42 kDa, zinc-dependent metalloenzyme from Pseudomonas that cleaves the glutamic acid moiety from folic acid and its analogues. CPG2 is currently being exploited in a rescue therapy following high doses of the highly cytotoxic drug methotrexate (MTX), commonly used in the treatment of cancer and autoimmune diseases, to metabolise the drug into two non-toxic metabolites: 2,4-diamino-N10-methylpteroic acid (DAMPA) and glutamate (Fig. 1A) [1, 2]. CPG2 has also played a key role in the development of antibody directed enzyme pro-drug therapy (ADEPT, see Fig. 1B and caption for full description) [3], an anti-cancer therapy aimed at limiting the action of cytotoxic drugs to tumour sites, thus amplifying their selectivity and effectiveness and diminishing the lethal effects on normal tissue [4]. It is envisaged that CPG2 will be used to generate active drugs (i.e. benzoic acid mustard drugs) from a variety of glutamated pro-drugs (i.e. glutamated benzoic acid mustard pro-drugs). CPG2 is ideal for use in ADEPT because it has no mammalian homologue, thus no endogenous enzymes would act on a pro-drug specific for CPG2, and being a bacterial enzyme has the advantage of enhanced kinetics with substrate turnover [5].

Due to its therapeutic applications in the treatment of cancer and autoimmune diseases, a robust understanding of the molecular determinants of CPG2 activity is of great interest for further development of therapeutics and applications. However, apart from an X-ray crystal structure of unliganded CPG2 reported by Rowsell and co-workers in 1997 [6] (PDB ID: 1CG2), our current working understanding of the molecular basis of CPG2 activity is largely based on sequence/structural homology and molecular modelling [7]. As shown in Fig. 1C, the crystal structure revealed a two-domain architecture for CPG2 consisting of a non-contiguous catalytic domain (Fig. 1D) and a dimerization domain thought to stabilize CPG2 homodimer formation. While the crystal structure informed early rational drug design efforts [6, 8], further attempts at co-crystallization with ligands, substrates or inhibitors have not been reported and very little progress has been made to establish key features of CPG2 such as the location of the active site, the presence of additional ligand-binding sites, stability, oligomeric state, and the molecular basis of activity. This lack of progress in CPG2 structural biology may be due (in part) to difficulty in forming crystals of the ligand-bound enzyme.

To take the next important steps toward characterizing CPG2, we wished to develop a robust method of making soluble, active enzyme in high yield with ready incorporation of a range of labels required for solution-state structural characterization. To date, CPG2 has been obtained either from the native strain Pseudomonas sp. RS-16 [9], or via expression using the Escherichia coli (E. coli) expression system [10] either in very low yield (100-fold lower than Pseudomonas) [11] or in an insoluble form requiring extensive unfolding and refolding steps and often resulting in low yields of active protein [12, 13]. The work presented here describes the first high-yield (250 mg L-1) recombinant expression (and purification) of soluble and active CPG2 using the E. coli expression system, achieved in part by removal of the N-terminal 22-residue signal peptide. This protocol was used to produce the full-length enzyme, as well as protein fragments corresponding to the individual catalytic and dimerization domains. A significantly truncated version of the catalytic domain was also designed and produced to narrow down the minimal requirements for CPG2 function. The activity and stability of each construct were characterised. The expression protocol was readily extended to the preparation of isotopically 2H/15N/13C-labelled CPG2 proteins (at yields of up to 109 mg L-1) for preliminary NMR analyses. NMR data were collected for all proteins, including the 42 kDa full-length enzyme, and an optimal candidate for future structure / binding studies was identified.


EXPERIMENTAL METHODS

Preparation of bacterial expression vector encoding CPG2 and individual domains.

A codon-optimised version of the full-length CPG2 gene from Pseudomonas sp. strain RS-16 in the pET28a plasmid (Novagen) was obtained from Mologic Ltd. (Thurleigh, UK) and used as a template to amplify the mature wild-type CPG2 gene in the absence of the signal peptide (CPG2_23-415, residues 23-415), as well as the isolated non-contiguous catalytic domain (CPG2_CAT, composed of residues 23-214 and residues 323-415 linked via a single, non-native alanine residue) and dimerization domain (CPG2_DIM, residues 213-322), using the polymerase chain reaction (PCR). A construct containing the N-terminal contiguous region of the catalytic domain was also produced (CPG2_CAT’, composed of residues 23-198). The 5’-sense primers encoded a CACC site, and the 3’-antisense primers were designed to add a stop codon for translation termination. The amplified gene segments were ligated into the pET151/D-TOPO vector (Invitrogen, UK) with an N-terminal tag containing the V5 epitope and a hexahistidine affinity tag, which could be removed by TEV cleavage. All constructs were confirmed by PCR and DNA sequencing (GATC, UK).

Protein expression.

C41 (DE3), C43 (DE3) and BL21 (DE3) E. coli strains were tested for expression, and the BL21 (DE3) strain (New England Biolabs, UK) was selected as the final expression host for CPG2 and its individual domains. Typically, protein expression was carried out by inoculating 1 L of 2 × YT medium (supplemented with 100 µg L-1 ampicillin) with a 100 mL overnight starter culture followed by shaking at 180 rpm and 37°C until an OD600 nm = 2 was reached. Induction of CPG2 expression was initiated with the addition of isopropyl β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.5 mM (unless otherwise stated), and cells were grown at either 37°C (4 hours), 25°C (16 hours / overnight) or 15°C (48 hours), with 15°C yielding the highest expression levels. Crude post-induction samples (1 mL) were collected 4, 24 and 48 hours after induction and analysed using SDS-PAGE to determine optimal expression time. Cells were harvested by centrifugation at 4229 × g for 20 min at 4°C.

To prepare uniformly 15N/13C-labelled samples of the proteins, bacteria were grown in 2 × YT rich medium at 37°C to a cell density OD600 = 1, at which point the cells were gently spun down at 1519 × g for 5 min and resuspended in M9 minimal salts medium (0.4% D-glucose, 0.1% NH4Cl, 50 mM Na2HPO4·7H2O, 50 mM KH2PO4, 5 mM NaCl and 10 mM MgSO4) supplemented with 100 µg L-1 ampicillin, 100 µM FeCl3, 4 g L-1 13C-D-glucose, 1 g L-1 15NH4Cl, and 10 µg mL-1 BME vitamins (all obtained from Sigma-Aldrich, UK). The pellet was repeatedly washed with M9 to remove rich medium, and cells were finally resuspended in a condensed volume of minimal medium (25% of the original volume) [14]. Cells were then cultured for another 1-1.5 h at 37°C (until the OD600 increased by 1-2 units) to allow discharge of unlabelled metabolites before induction of protein expression using IPTG (0.5 mM). Cells were incubated at 15°C for 20 hours to achieve optimal expression.

A double colony selection was performed for bacterial expression of triply 15N/13C/2H-labelled protein samples in D2O [15]. In the first colony selection, a glycerol stock of BL21 (DE3) cells transformed with the CPG2-expression plasmid was streaked onto LB-agar plates and left overnight in a 37°C incubator. Rich medium (50% D2O) was inoculated with freshly transformed cells, grown at 37°C overnight, and spread onto an M9 agar plate containing 50% D2O. Eight colonies from this plate were separately grown in 10 mL M9 containing 50% D2O, induced with 0.5 mM IPTG, and grown for a further 24 hours at 15°C before analysing CPG2 expression levels using SDS-PAGE and Western blots. The colony yielding the largest CPG2 band was selected from the master plate and the entire procedure was repeated in media containing 99% D2O. At the end of the procedure, the colony expressing the largest yield of CPG2 in 99% D2O was selected and used for protein production.

Protein purification.

The recovered cell pellets were resuspended in 10 mL Tris buffer (20 mM Tris, 137 mM NaCl, 1 mg mL-1 lysozyme; 1 mM 4-(2-aminoethyl) benzenesulfonyl fluoride hydrochloride; pH 7.3) and cells were lysed using a cell disruptor (30 kPSI). Once lysed, the insoluble fraction was pelleted by centrifugation (47,850 × g, 20 min, 4°C). The soluble fraction was collected and incubated with 10 mM imidazole and Ni-NTA resin (Novagen, 50% slurry, pre-equilibrated with 10 mM imidazole) for 1 hour at 4°C, then loaded into a disposable column and the flow-through collected as the resin settled under gravity. The column was washed with ten bed volumes of Tris buffer (20 mM Tris, 100 mM NaCl, 10 mM imidazole, pH 8), and the protein was eluted with five bed volumes of elution buffer (20 mM Tris, 100 mM NaCl, 250 mM imidazole, pH 8). To remove imidazole, the elution buffer was exchanged against Tris buffer using a PD-10 column. At this stage, if necessary, the N-terminal His-tag was removed by incubating CPG2 with TEV protease at 4°C for 48 hours. Cleaved protein was further purified using Ni-NTA resin (Novagen, 50% slurry, pre-equilibrated with 10 mM imidazole) followed by FPLC purification on a HiLoad 16/600 Superdex 75 GL column (GE Healthcare UK Ltd. Buckinghamshire, UK) pre-equilibrated with 20 mM Tris HCl buffer (pH 8) containing 100 mM NaCl (Sigma, UK). FPLC was performed at 4°C and at a flow rate of 0.5 mL min-1, and protein elution was monitored via absorbance at 280 nm. Calibration standards, containing 5 mg mL-1 each of β-amylase (200 kDa), albumin (66 kDa) and cytochrome c (12.4 kDa), were analysed under identical conditions. The bicinchoninic acid (BCA) assay was used for colorimetric detection and quantitation of total protein (Thermo Scientific Pierce, UK). For buffer screening using the hanging drop method, the protein was concentrated down to 23 mg mL-1 using a centrifugal concentrator.

Spectroscopic assay for CPG2 activity.

The activity of CPG2 was estimated by measuring its ability to cleave methotrexate (MTX, absorbs at 320 nm) into two non-absorbing species (see Fig. 1A). In the assay, 50 µL of CPG2 solution (20 mM Tris buffer, 100 mM NaCl, pH 7.3) was added to a 1 mL quartz cuvette containing 900 µL of the assay buffer (0.1 M Tris-HCl, 0.2 mM ZnCl2, pH 7.5) and 50 µL of 0.6 mM methotrexate solution (0.1 M Tris-HCl, 0.2 mM ZnCl2, pH 7.5), pre-equilibrated to 37°C before addition of CPG2. The A320 was measured using a Perkin Elmer Lambda 35 UV/VIS Spectrometer with Peltier temperature control unit for 1 minute, in time steps of 2 s, and absorbance versus time (min) was plotted for each sample. The slope was calculated for the linear portion of the plot over which the absorbance at 320 nm declined. By default, this is defined as occurring between 0 and 1 min. The activity was obtained in units per millilitre (U mL-1), where one unit is the amount of enzyme required to catalyse the hydrolysis of 1 µmol MTX per minute at 37°C. The measured absorbance of the assay buffer (blank) was subtracted from all measurements.

Mass spectrometry.

Protein size and purity were confirmed using electrospray ultra-high resolution quadrupole time-of-flight mass spectrometry (LC-ESI UHR QTOF). Samples containing 20 µM protein dissolved in 20 µM ammonium bicarbonate were diluted five-fold with 30% acetonitrile (ACN) with 0.1% formic acid (FA), and 40 µL aliquots of these were loaded onto a C4 column with 100% water and 0.1% FA as solvent A and 100% ACN with 0.1% FA as solvent B using a 3-step gradient (75% of solvent A for 16 min, 100% of solvent B for 5 min and 75% of solvent A for 11.5 min) and eluted directly into the MaXis II QTOF mass spectrometer (Bruker Daltonics). The resulting total ion count (TIC) data (summed intensity across the entire range of masses being detected at every point in the analysis) was extracted and mass averaged in the region containing the proteins of interest, and deconvoluted using the Data Analysis Software (Bruker Daltonik GmbH; Bremen, Germany).

Buffer Screen.

Initially, three buffer conditions were tested to extend the stability of CPG2_23-415, and these are summarized in Table 1. To improve long-term solubility, stability and sample conditions, we screened 363 conditions covering a range of buffer types (all used at a final concentration of 50 mM), pH values (5.2 - 8.53), added salts (NaCl, NaBr, LiCl, KCl, KSCN, Na2SO4, and Arg and Glu) and salt concentrations (50 - 200 mM). The influence of each buffer composition on the stability of concentrated CPG2 samples (~23 mg mL-1) was determined using the hanging drop method (used in X-ray crystallography) [16], and is summarized in Fig. S3. Briefly, 1 µL aliquots of protein solution and 1 µL of 50 mM reservoir buffer solution were pipetted onto a glass cover slip. The cover slip was inverted and sealed onto the wells of a 24-well tissue culture plate using paraffin. The plates were left at room temperature for vapour diffusion to take place. The amount of precipitate was checked after 5 and 11 days by visual examination under a microscope.

Nuclear magnetic resonance (NMR) spectroscopy.

NMR samples contained approximately 0.4 mM protein dissolved in 20 mM Tris buffer (pH 7.2) containing 100 mM NaCl. Spectra were collected on either a 700 MHz Bruker Avance spectrometer (Bruker, UK), equipped with a cryoprobe, housed at the University of Warwick or on a 900 MHz Bruker AVANCE III spectrometer (equipped with a cryoprobe) housed at the Henry Wellcome Building for NMR (HWB-NMR) at the University of Birmingham, UK. Two-dimensional 1H/15N heteronuclear single quantum correlation (HSQC) spectra were acquired at 298 K, 303 K and 310 K with 2048 × 128 data points and 256 scans at 700 MHz and/or 900 MHz. Water suppression was achieved using a 3-9-19 pulse sequence with z-gradients [17, 18]. The carrier positions were set to 114.994 ppm for 15N and 4.719 ppm for 1H, and the spectral widths were 14 ppm in the 1H dimension and 40 ppm in the 15N dimension. Data were processed using TOPSPIN 2.1 (Bruker, UK) and analysed using SPARKY [19, 20].

RESULTS AND DISCUSSION

Expression of codon-optimised full-length CPG2 in E. coli leads to intermittent activity and existence of two CPG2 subspecies.

An early aim of this work was the development of an E. coli-based protocol for overexpression of CPG2 in a soluble and active form. E. coli was a more attractive alternative to Pseudomonas sp. RS-16, the natural CPG2 source used previously to produce CPG2 for crystallographic studies, as it offers shorter culturing time, fast high-density cultivation and easier genetic manipulation. Because Pseudomonas and E. coli differ greatly in their relative codon usage, the CPG2 gene from Pseudomonas sp. strain RS-16 was codon-optimised and cloned into the pET28a vector (containing an N-terminal 6 × His tag) for expression in the BL21 (DE3) E. coli strain [21]. This optimization did result in high-yield expression of full-length CPG2 in E. coli, but in its insoluble form [21]. Such proteins, forming dense aggregates (i.e. inclusion bodies), are frequently improperly folded and require lengthy and complex refolding protocols to restore activity.


A number of tactics for the redirection of proteins from inclusion bodies to soluble cytoplasmic fractions are described in the literature, and generally require systematic optimization of a number of environmental factors [22]. To increase the levels of soluble full-length CPG2 in vivo, we varied the E. coli strain, IPTG concentration, and induction temperature. The E. coli strain BL21 (DE3), as well as two mutant strains known to promote soluble protein expression in previous studies [23] (C41 (DE3) and C43 (DE3)), were tested at 20°C using 1 mM IPTG to induce protein expression. Comparable protein expression was observed in C41 (DE3) and BL21 (DE3) (Fig. 2A); however, cell growth was limited in C43 (DE3) and the optical density never exceeded an OD600nm of 0.1, thus cells never entered the exponential growth phase. BL21 (DE3) was selected as the host for the remainder of the study. Reduction of the expression temperature and IPTG concentration are also commonly used approaches to limit in vivo protein aggregation in BL21 (DE3) [24]. As shown in Fig. 2B, while the IPTG concentration had little effect on expression levels, the yield was significantly improved by the use of a low induction temperature (15°C). The ideal assay to evaluate the quality of the soluble CPG2 produced was the enzymatic activity based on cleavage of the substrate MTX (Fig. 1A). However, the enzymatic activity of replicate batches of CPG2 showed high inconsistency (Fig. 2C).

SDS-PAGE gels indicated the presence of two bands close to the expected molecular weight of CPG2 (indicated by arrows, Fig. 2D, left panel). In-gel tryptic digestion and protein identification using mass spectrometry were performed to identify the band of interest, as it was expected that the two populations corresponded to immature (containing the N-terminal, 22-residue signal sequence) and mature (signal sequence removed) CPG2. In fact, both bands corresponded to different isoforms of immature CPG2 (Fig. 2D, right panel), accounting for the poor batch-to-batch reproducibility of the activity assay (Fig. 2C). These results suggest that reducing the temperature of induction (and thus expression level) did not lead to expression of CPG2 in its soluble, periplasmic form.

Removal of the 22-residue signal peptide leads to expression of a single, soluble CPG2 species.

Two approaches were then considered in order to express biologically active soluble protein using BL21 (DE3) as a host. The first was removal of the N-terminal signal peptide sequence, which was unlikely to be recognised by E. coli. The second was replacement of the native signal peptide (residues 1-22) with a leader sequence (e.g. pelB) used for periplasmic localisation in E. coli bacterial strains [25]. The latter usually produces much lower yields since not all expressed protein is secreted into the periplasm. Therefore, the first option was prioritised, as it restricts the localisation of the protein to one compartment of the cell (cytoplasm). The pET28a plasmid containing the full-length CPG2 gene was used as a template to generate a cDNA sequence of CPG2 without the leader sequence (CPG2_23-415), which was then subcloned into the pET151/D-TOPO vector (Invitrogen) and expressed and purified as detailed in Experimental Methods. Removal of the signal peptide led to a single band in SDS-PAGE (Fig. 3A, top panel, lanes 1-5) and a 6-fold increase in protein expression yield (Table 2). Because Ni2+ IMAC purification was not effective at completely isolating cleaved CPG2_23-415 from His6-CPG2_23-415, an extra gel filtration purification step was introduced. CPG2_23-415 eluted in two peaks which, when compared to a set of calibration standards as shown in Fig. 3B, corresponded to the monomeric and dimeric species. Mass spectrometric analysis confirmed the presence of CPG2_23-415 protein in the pooled, pure fractions (Table 2 and Fig. S1A). The purified, cleaved CPG2_23-415 (Fig. 3A) was tested for its ability to catalyse the hydrolysis of MTX and, as shown in Fig. 3C, yielded an enzymatic activity of 100.3 U mL-1 (Table 2), indicating that the recombinant enzyme produced with no signal peptide in the E. coli cytoplasm was biologically active.
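As an illustration of how an activity in U mL-1 could be derived from the A320 time course described in the spectroscopic assay section, the following minimal Python sketch fits the slope of absorbance versus time and converts it using the Beer-Lambert law. The volumes (50 µL enzyme in a 1 mL assay) and the unit definition come from the text; the differential extinction coefficient of 8.3 mM-1 cm-1 for MTX hydrolysis at 320 nm and the 1 cm path length are assumptions that are not stated in this manuscript, and the function names are hypothetical.

```python
def slope_per_min(times_min, absorbances):
    """Ordinary least-squares slope of absorbance versus time (A per minute)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a)
              for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def activity_u_per_ml(times_min, a320,
                      delta_eps_mM_cm=8.3,   # ASSUMED literature-style value
                      path_cm=1.0,           # ASSUMED standard cuvette
                      total_vol_ml=1.0,      # 900 + 50 + 50 uL, from the text
                      enzyme_vol_ml=0.05):   # 50 uL of CPG2 solution
    """Activity of the enzyme stock in U/mL (1 U = 1 umol MTX per minute)."""
    slope = slope_per_min(times_min, a320)
    # A falling A320 means MTX is consumed, so the rate is the negated slope.
    rate_mM_per_min = -slope / (delta_eps_mM_cm * path_cm)  # = umol/mL/min
    umol_per_min = rate_mM_per_min * total_vol_ml
    return umol_per_min / enzyme_vol_ml


if __name__ == "__main__":
    # Synthetic trace: 1 minute sampled every 2 s, losing 0.415 A per minute.
    times = [i * 2.0 / 60.0 for i in range(31)]
    trace = [1.0 - 0.415 * t for t in times]
    print(f"{activity_u_per_ml(times, trace):.2f} U/mL")
```

The synthetic trace is illustrative only; with real data the blank-subtracted A320 readings would be passed in, and the fit would be restricted to the linear portion of the decline.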


High-yield expression and solution NMR measurements of isotopically-enriched CPG2_23-415.

The lack of progress in CPG2 structural characterization suggests that future studies may benefit from application of complementary methods in addition to X-ray crystallography. One method we suggest is well-suited to mapping substrate/ligand interactions and active site conformation is solution-state NMR. Solution-state NMR is amenable to dynamic systems and can access a large range of binding affinities, but requires 2H, 13C and 15N-enriched protein samples for high-resolution structural studies of large proteins, such as the CPG2 enzyme. This is primarily due to the slow tumbling of large proteins in solution, which leads to broad NMR signals that are ultimately unobservable [26]. While this limitation is constantly driven back by progress made in experimental NMR techniques and development of novel strategies for isotopic labelling [27], the low sensitivity of triple resonance experiments required for backbone assignment means progress is slow [28]. Indeed, few structural studies have been reported in the literature for proteins larger than thirty kilodaltons [29].
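To make the link between protein size and slow tumbling concrete, the sketch below estimates a rotational correlation time from molecular weight using the Stokes-Einstein-Debye relation for a rigid sphere. This is a rough, textbook-style approximation rather than anything reported in this manuscript: the partial specific volume (0.73 cm3 g-1), hydration layer (3.2 Å) and water viscosity (0.89 mPa s at 298 K) are assumed values, the function names are hypothetical, and a non-spherical or dimeric protein would tumble more slowly than this estimate.

```python
import math

AVOGADRO = 6.02214076e23      # mol^-1
BOLTZMANN = 1.380649e-23      # J K^-1


def hydrated_radius_m(mw_da, partial_specific_volume_cm3_g=0.73,
                      hydration_layer_m=3.2e-10):
    """Radius of an equivalent hydrated sphere (assumed parameters)."""
    # Volume per molecule: (g/mol * cm^3/g) / (1/mol) = cm^3, then cm^3 -> m^3.
    volume_m3 = mw_da * partial_specific_volume_cm3_g / AVOGADRO * 1e-6
    dry_radius = (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return dry_radius + hydration_layer_m


def rotational_correlation_time_s(mw_da, temperature_k=298.0,
                                  viscosity_pa_s=0.89e-3):
    """Stokes-Einstein-Debye estimate: tau_c = 4*pi*eta*r^3 / (3*k*T)."""
    r = hydrated_radius_m(mw_da)
    return (4.0 * math.pi * viscosity_pa_s * r ** 3) / (
        3.0 * BOLTZMANN * temperature_k)


if __name__ == "__main__":
    # 42 kDa is the CPG2 monomer mass quoted in the text; 84 kDa would
    # correspond to a homodimer; 12.4 kDa is included for comparison.
    for mw in (12_400, 42_000, 84_000):
        tau_ns = rotational_correlation_time_s(mw) * 1e9
        print(f"{mw / 1000:.1f} kDa -> tau_c of roughly {tau_ns:.0f} ns")
```

Because transverse relaxation rates, and hence linewidths, increase with the correlation time, the estimate shows qualitatively why a 42 kDa enzyme (or its dimer) gives broader signals than a small protein and why deuteration and TROSY-type experiments are helpful.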

Investigation of the suitability of CPG2_23-415 for solution NMR was initiated by E. coli expression of the enzyme enriched with 2H, 13C and 15N labels in M9 minimal media, in which the investigator has full control over the sources of carbon, nitrogen and hydrogen utilized by the cell, followed by purification. SDS-PAGE analyses and estimation of concentration using the BCA assay indicated over 50% reduction in soluble protein yield of CPG2_23-415 upon switching the medium from 2 × YT to M9 containing isotopic labels (Table 2). To reduce the cost of each sample, an innovative expression method described by Sivashanmugam et al. in 2009 was used in which cell density is increased without interfering with the expression vector (referred to herein as the cell condensation method). Firstly, a double colony selection was performed (see Fig. S2A and caption for details) [30], followed by expression using the cell condensation method (Fig. S2B) [14]. The degree of condensation yielding maximal expression differs depending on the protein expressed, and must be investigated. Small-scale time-course experiments were carried out for 1-fold, 2-fold, 4-fold and 8-fold condensations (respectively 100%, 50%, 25% and 12.5% of the initial volume of rich medium), and visual inspection of western blots indicated that 2-fold and 4-fold condensation produced similar, optimal yields (Fig. S2B). A four-fold condensation was used for the remainder of the work, reducing the cost of protein samples four-fold (as compared to conventional labelling, i.e. 1-fold).
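The arithmetic behind the cell condensation method can be sketched as follows. The example computes, for a given volume of rich medium and condensation factor, the volume of labelled M9 medium and the mass of each labelled reagent, using the concentrations given in the Experimental Methods (4 g L-1 13C-D-glucose and 1 g L-1 15NH4Cl). The function name and dictionary keys are hypothetical, and the sketch assumes that reagent consumption scales linearly with the labelled-medium volume.

```python
def condensation_plan(rich_volume_l, fold,
                      glucose_g_per_l=4.0,   # 13C-D-glucose, from the methods
                      nh4cl_g_per_l=1.0):    # 15NH4Cl, from the methods
    """Labelled-medium volume and isotope masses for an n-fold condensation."""
    if fold < 1:
        raise ValueError("condensation factor must be at least 1")
    m9_volume_l = rich_volume_l / fold
    return {
        "m9_volume_l": m9_volume_l,
        "percent_of_rich_volume": 100.0 / fold,
        "glucose_13c_g": glucose_g_per_l * m9_volume_l,
        "nh4cl_15n_g": nh4cl_g_per_l * m9_volume_l,
    }


if __name__ == "__main__":
    # The four condensation factors screened in the text, for 1 L of 2 x YT.
    for fold in (1, 2, 4, 8):
        plan = condensation_plan(1.0, fold)
        print(f"{fold}-fold: {plan['percent_of_rich_volume']}% volume, "
              f"{plan['glucose_13c_g']} g 13C-glucose, "
              f"{plan['nh4cl_15n_g']} g 15NH4Cl")
```

Under this assumption, a four-fold condensation of a 1 L rich-medium culture needs 0.25 L of labelled medium, which is the source of the four-fold saving in labelled reagents relative to a 1-fold (conventional) preparation.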

While we had successfully expressed CPG2_23-415 in a soluble, active form in E. coli, the resulting protein displayed limited long-term stability (activity deteriorated 24 hours post-purification) and a propensity to aggregate over time in Tris buffer containing 50 mM NaCl (pH 7.2). The sensitivity of CPG2_23-415 to salt concentration was investigated by solubilisation of the enzyme in three buffers containing either 50, 137, or 500 mM NaCl (Table 1). As shown in Fig. 4, CPG2_23-415 was active at only one of the three salt concentrations, 137 mM, with an activity of 63.4 U mL-1 which was maintained over days. Protein solubility could not account for differences in activity, as substitution of NaCl with 50 mM L-arginine and 50 mM L-glutamate (previously suggested to enhance solubility [31]) did not enhance activity (Fig. 4). More comprehensive screening of 363 additional buffer conditions was achieved using a method typically employed for crystallography studies [16], and the resulting precipitation pattern observed for CPG2_23-415 in crystallisation plates is described in Fig. S3. Analysis of the precipitation pattern, as well as our smaller buffer screens, suggested that 20 mM Tris buffer (pH 7.3) containing 100 mM NaCl yielded the highest enzyme activities and no precipitation at 18°C. Therefore, CPG2_23-415 samples were prepared in this buffer for further NMR analyses.


NMR has not been used to study CPG2 previously, but affords the possibility of studying this enzyme in solution without the need to generate crystals. The molecular weight of CPG2_23-415 (42 kDa) and its dynamic nature (L-shaped protein) make NMR assignment and subsequent structural investigation of the full-length enzyme an ambitious goal. We applied TROSY (transverse relaxation-optimised spectroscopy) based techniques to measurements of 2H, 15N, 13C-labelled CPG2_23-415 as this method minimises line-broadening due to limited molecular tumbling in large proteins [32, 33] and can result in significant improvements in resolution and sensitivity. The 1H/15N TROSY-HSQC spectrum of 2H, 15N, 13C-labelled CPG2_23-415 solubilized in 20 mM Tris buffer (pH 7.3, 100 mM NaCl) is shown in Fig. 5. Although this buffer contained a NaCl concentration that may lead to sensitivity loss when using the cryogenic NMR probe, it yielded protein samples with the highest stability while retaining activity. The spectrum displayed reasonable signal to noise, narrow linewidths, and a larger-than-expected number of observable peaks (342 peaks, or 80% of the expected peaks) for a protein of this size. While this spectrum was promising for such a large protein, the issues with spectral overlap and broad/missing resonances hampered efforts to assign the protein using 3D NMR methods (which are substantially less sensitive than the 2D HSQC), and a new approach was employed to obtain information about the regions of CPG2 crucial for activity.

Preparation of isolated CPG2 dimerization and catalytic domains leads to improved NMR spectra and narrows requirements for activity of the enzyme.

Investigation of the structure and ligand-binding modes for CPG2 in solution using solution NMR requires the assignment of NMR resonances to specific nuclei of the protein. As demonstrated above, this assignment can be problematic for large proteins due to increased spectral overlap and broadening of peaks. Our efforts at more comprehensive labelling (15N/13C/2H) were unable to overcome these issues, hence a “divide-and-conquer” approach [34] was utilized to study smaller domains of this large enzyme with solution properties more amenable to NMR measurements (e.g. faster tumbling, slower relaxation and improved lineshape).
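To give a sense of how much smaller the fragments are than the full-length enzyme, the sketch below tallies residue counts from the construct boundaries given in the Experimental Methods. The dictionary layout and function name are hypothetical, the construct labels are plain-text renderings of the subscripted names, and the counts exclude any residues contributed by the N-terminal V5/hexahistidine tag.

```python
# Each construct is a list of parts: (first, last) residue ranges taken from
# the Experimental Methods, or a literal linker sequence.
ALA_LINKER = "A"  # single non-native alanine joining the two catalytic regions

CONSTRUCTS = {
    "CPG2 23-415 (mature enzyme)": [(23, 415)],
    "CPG2 CAT": [(23, 214), ALA_LINKER, (323, 415)],
    "CPG2 DIM": [(213, 322)],
    "CPG2 CAT'": [(23, 198)],
}


def construct_length(parts):
    """Number of residues in a construct (tag residues not included)."""
    total = 0
    for part in parts:
        if isinstance(part, tuple):
            first, last = part
            total += last - first + 1   # residue ranges are inclusive
        else:
            total += len(part)          # literal linker residues
    return total


if __name__ == "__main__":
    full = construct_length(CONSTRUCTS["CPG2 23-415 (mature enzyme)"])
    for name, parts in CONSTRUCTS.items():
        n = construct_length(parts)
        print(f"{name}: {n} residues ({100.0 * n / full:.0f}% of mature enzyme)")
```

The 110-residue count obtained for the dimerization domain fragment agrees with the size quoted for that construct in the text; the other counts follow directly from the stated residue ranges.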

According to the crystal structure (PDB ID: 1CG2), CPG2 consists of distinct protein domains associated with different functions, namely a dimerization domain inserted between two non-contiguous regions of the catalytic domain6. Considering this molecular domain organisation, and making use of the available crystal structure, we designed constructs to dissect the protein into three smaller fragments: the isolated dimerization domain fragment (CPG2DIM, residues 213-322); the full catalytic domain fragment (CPG2CAT, composed of residues 23-214 and residues 323-415, which we linked together using a single, non-native alanine residue, the location of which is shown in Fig. 1C); and the largest contiguous region of the catalytic domain (CPG2CAT’, composed of residues 23-198) (Fig. 6A). Each of the above three constructs was expressed with uniform 15N/13C labelling in E. coli, and optimal expression was achieved at 25°C using 0.7 mM IPTG for induction (Fig. S4A). All three constructs were purified as described for CPG223-415 (Fig. S4B), and mass spectrometry was used to confirm the identity and purity of unlabelled samples (Fig. S1). The activity of each construct was assessed using the methotrexate-based activity assay (Fig. 6B) and the activities are provided in Table 2. CPG2CAT yielded the highest activity of the three fragments, retaining 62% of the activity observed for CPG223-415 and confirming that linking these two polypeptide chains via an alanine residue did not disrupt the domain structure so severely that all function was lost. Removal of residues 323-415 from the catalytic domain, to create the CPG2CAT’ protein, led to a severe (92%) reduction in catalytic activity compared to CPG223-415, suggesting that the C-terminal region of the catalytic domain is implicated in substrate recognition / activity. Unsurprisingly, CPG2DIM yielded low activity (only 1.5% of CPG223-415 activity).

2D 15N-HSQC NMR spectra were acquired for all three protein constructs in 20 mM Tris
buffer containing 100 mM NaCl to explore their suitability for further structural characterization, and the spectra are shown in Figs. 6C-E. The 110-residue protein derived from the CPG2 dimerization domain (CPG2DIM) did not yield a well-dispersed 1H/15N-HSQC spectrum, with most of the backbone amide 1H chemical shifts falling between 7.7 and 8.4 ppm. These results indicate that the dimerization domain is highly unstructured in isolation and likely requires the presence of the catalytic domain to fold correctly. Improved chemical shift dispersion and reduced signal overlap were observed for the CPG2CAT protein, suggesting that the non-contiguous catalytic domain is able to fold in the absence of the dimerization domain. An overlay of the CPG2CAT HSQC spectrum with the corresponding spectrum of CPG223-415 (Fig. S5A) revealed that, while many of the chemical shifts have changed upon removal of the dimerization domain (likely because of the absence of contacts between these domains), the envelope of chemical shifts in both spectra is remarkably similar and many of the chemical shift changes appear to be small. This behaviour suggests a similar fold for the catalytic domain in the absence and presence of the dimerization domain, and is supported by the retention of activity against methotrexate (Table 2) for this isolated domain. The largest improvement in NMR peak width and in the variability of peak intensity was observed for the 176-residue CPG2CAT’ protein, which produced a well-dispersed 1H/15N-HSQC spectrum suggesting a well-folded protein (Fig. 6E). Overlaying the CPG2CAT’ HSQC spectrum with that of CPG223-415 (Fig. S5B) further emphasized the improvement in peak shape; however, the low activity observed for this construct (8.2 U mL-1) reinforces the key role of the C-terminal region of the catalytic domain in function.

The only protein which combined a significant reduction in size (from 42 kDa to 30 kDa) with retention of activity was the CPG2CAT protein. The smaller size improved the NMR linewidths, but may also have implications for the immunogenicity of this important therapeutic protein. Currently, when CPG2 is administered to patients, antibody formation is a common response and one of the major drawbacks to CPG2 treatment. As summarized in a review by Chester and co-workers35, specific sequences of amino acids within CPG2 known as T-cell epitopes can trigger an immune response by activating T-cells. Modification of these T-cell epitopes is one of the primary strategies used to reduce the immunogenicity of foreign enzymes. Comparison of our CPG2CAT protein sequence to the known T-cell epitopes within the full-length enzyme as described by Chester and co-workers35 suggests that our removal of residues 215-322 (i.e. the dimerization domain) removes approximately half of the known T-cell epitopes within the sequence of wild-type CPG2. We have also used in silico tools to predict the immunogenicity of full-length CPG2 and CPG2CAT from the T-cell epitope content in the protein primary sequence, as such approaches have been shown to correlate well with in vivo measurements of immunogenicity. The EpiToolKit server36 was used here to analyse the primary sequences of the CPG223-415 and CPG2CAT proteins for the presence of MHC Class I and Class II binding epitopes from all available SYFPEITHI37 alleles. The CPG223-415 protein was found to contain 421 peptide sequences (between 8 and 11 amino acids in length) predicted to bind to known HLA-A and HLA-B alleles. The number of binding epitopes was reduced by 30% in the CPG2CAT protein, again suggesting that truncating the protein could have a significant positive impact on immunogenicity. This requires experimental validation in vivo, but suggests an exciting direction for future research.
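To give a feel for the scale of such a sequence scan, the sketch below enumerates every 8- to 11-residue window in a protein sequence, which is the pool of candidate peptides an epitope predictor would score. It is only an illustration: it does not reproduce the EpiToolKit or SYFPEITHI scoring, the function names and the toy sequence are hypothetical, and the 393-residue length simply corresponds to residues 23-415 with any purification tag ignored.

```python
def count_windows(seq_len, lengths=range(8, 12)):
    """Number of contiguous peptides of the given lengths in a sequence."""
    return sum(max(seq_len - k + 1, 0) for k in lengths)


def enumerate_peptides(seq, lengths=range(8, 12)):
    """Yield (1-based start position, peptide) for every window of each length."""
    for k in lengths:
        for start in range(len(seq) - k + 1):
            yield start + 1, seq[start:start + k]


# Candidate 8-11-mers in a 393-residue chain (residues 23-415, tag ignored).
candidates_full = count_windows(393)

# Toy 10-residue sequence, used only to show the enumeration itself.
toy_peptides = list(enumerate_peptides("ACDEFGHIKL"))

# A 30% reduction from the 421 predicted binders reported in the text.
remaining_binders = round(421 * (1 - 0.30))

print(candidates_full, len(toy_peptides), remaining_binders)
```

Only a fraction of these candidate windows are predicted to bind; applying the reported 30% reduction to the 421 predicted binders corresponds to roughly 295 remaining in CPG2CAT.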


CONCLUSIONS

Here we describe the first reported method for high-yield expression of active and soluble mature CPG2 (in the absence of the leader peptide) and its individual catalytic and dimerization domains in E. coli. Such a method is of key importance for future structural and binding studies involving this enzyme, which we suggest may have been hampered by difficulties in obtaining large yields of the protein as well as labelling of the protein. Using the condensation methods outlined by Sivashanmugam et al.14 and Marley et al.30, we can routinely produce milligram quantities of 15N and 2H/13C/15N isotopically-labelled protein suitable for NMR studies. Solution NMR is well-suited to the investigation of ligand-binding characteristics of CPG2, which is of high importance in applications using this enzyme (such as ADEPT). NMR can also be used to obtain information on conformational dynamics, which may have held back crystallographic approaches to ligand-binding studies thus far. Although the NMR data described here for the full-length, mature CPG2 are preliminary, and included here only to illustrate the improvement in peak width as enzyme size was decreased, work is ongoing to use our optimised expression protocol to identify conditions that yield a full complement of 3D NMR data to facilitate assignment.

We utilized a “divide-and-conquer” approach to investigate, for the first time, the isolated catalytic and dimerization domains in CPG2. This was motivated in part by the quest for high-quality NMR peak shapes, and in part as an effort to reduce the immunogenicity of CPG2 by reducing the overall size while maintaining activity. NMR data clearly indicated that the isolated dimerization domain did not fold, but the isolated catalytic domain (created by fusing the two non-contiguous regions of this domain through a single Ala residue, Fig. 6A) was able to fold in the absence of the dimerization domain and cleave MTX with 62% of wild-type activity. To further narrow down the components necessary for activity, the largest contiguous region of the catalytic domain (CPG2CAT’, Fig. 6A) was expressed and purified. The CPG2CAT’ protein appeared folded from the appearance of the NMR data (i.e. peaks are narrow and well-dispersed), but did not retain the parent protein’s enzymatic activity. This lack of activity demonstrates that, while the dimerization domain is not required, the C-terminal region of the catalytic domain (residues 323-415) is critical in substrate recognition / activity. The residues proposed6 to form the active site in CPG2 are mapped onto the crystal structure in Fig. 7 and illustrate that the two non-contiguous regions come together to form the complete active site. Removal of the C-terminal region leads to removal of two residues from the active site (shown in grey), specifically His 385 and Arg 324, the latter of which is the only residue to have been shown experimentally via mutagenesis to play a role in catalysis. Therefore, future work will focus on NMR structural elucidation and ligand binding studies for the CPG2CAT protein, which shows great promise for downstream structural studies, as well as investigation of the immunogenicity of the CPG2CAT protein.

ACKNOWLEDGEMENTS

This work was supported by a Biotechnology and Biological Sciences Research Council Industrial Case Studentship (BB/I015965/1) awarded to AMD and PD. The authors wish to thank M. Chow for helpful discussions, I. Prokes (University of Warwick, Coventry, UK) for NMR assistance, and N. Chmel (University of Warwick, Coventry, UK) for circular dichroism assistance.

BIBLIOGRAPHY

1. Levy, C. C., and Goldman, P. (1967) The enzymatic hydrolysis of methotrexate and folic acid, J. Biol. Chem. 242, 2933-2938.
2. Albrecht, A. M., Boldizsar, E., and Hutchison, D. J. (1978) Carboxypeptidase displaying differential velocity in hydrolysis of methotrexate, 5-methyltetrahydrofolic acid, and leucovorin, J. Bacteriol. 134, 506-513.
3. Bagshawe, K. D. (1987) Antibody directed enzymes revive anticancer prodrugs concept, Brit. J. Cancer 56, 531-532.
4. Bagshawe, K. D. (1985) Cancer drug targeting, Clin. Radiol. 36, 545-551.
5. Searle, F., Bagshawe, K. D., Boden, J., Bier, C., Green, A. J., Pedley, R. B., Melton, R. G., and Sherwood, R. F. (1986) Carboxypeptidase G2 conjugates with localizing antitumor antibodies - Potential therapeutic agents, Tumour Biol. 7, 320-320.
6. Rowsell, S., Pauptit, R. A., Tucker, A. D., Melton, R. G., Blow, D. M., and Brick, P. (1997) Crystal structure of carboxypeptidase G(2), a bacterial enzyme with applications in cancer therapy, Structure 5, 337-347.
7. Lindner, H. A., Lunin, V. V., Alary, A., Hecker, R., Cygler, M., and Menard, R. (2003) Essential roles of zinc ligation and enzyme dimerization for catalysis in the aminoacylase-1/M20 family, J. Biol. Chem. 278, 44496-44504.
8. Khan, T. H., Eno-Amooquaye, E. A., Searle, F., Browne, P. J., Osborn, H. M. I., and Burke, P. J. (1999) Novel inhibitors of carboxypeptidase G(2) (CPG(2)): Potential use in antibody-directed enzyme prodrug therapy, Journal of Medicinal Chemistry 42, 951-956.
9. Minton, N. P., and Clarke, L. E. (1985) Identification of the promoter of the Pseudomonas gene coding for carboxypeptidase G2, J. Mol. Appl. Genet. 3, 26-35.
10. Sorensen, H. P., and Mortensen, K. K. (2005) Advanced genetic strategies for recombinant protein expression in Escherichia coli, J. Biotechnol. 115, 113-128.
11. Minton, N. P., Atkinson, T., and Sherwood, R. F. (1983) Molecular cloning of the Pseudomonas carboxypeptidase G2 gene and its expression in Escherichia coli and Pseudomonas putida, J. Bacteriol. 156, 1222-1227.
12. Goda, S. K., Rashidi, F. A. B., Fakharo, A. A., and Al-obaidli, A. (2009) Functional overexpression and purification of a codon optimized synthetic Glucarpidase (Carboxypeptidase G2) in Escherichia coli, Protein J. 28, 435-442.
13. Danel, C., Duval, C., Azaroual, N., Vaccher, C., Bonte, J. P., Bailly, C., Landy, D., and Goossens, J. F. (2011) Complexation of triptolide and its succinate derivative with cyclodextrins: Affinity capillary electrophoresis, isothermal titration calorimetry and (1)H NMR studies, J. Chromatogr. A 1218, 8708-8714.
14. Sivashanmugam, A., Murray, V., Cui, C. X., Zhang, Y. H., Wang, J. J., and Li, Q. Q. (2009) Practical protocols for production of very high yields of recombinant proteins using Escherichia coli, Prot. Sci. 18, 936-948.
15. Murray, V., Huang, Y. F., Chen, J. L., Wang, J. J., and Li, Q. Q. (2012) A novel bacterial expression method with optimized parameters for very high yield production of triple-labeled proteins, In Protein NMR Techniques, Third Edition (Shekhtman, A., and Burz, D. S., Eds.), pp 1-18, Humana Press Inc, Totowa.
16. Ducat, T., Declerck, N., Gostan, T., Kochoyan, M., and Demene, H. (2006) Rapid determination of protein solubility and stability conditions for NMR studies using incomplete factorial design, J. Biomol. NMR 34, 137-151.
17. Piotto, M., Saudek, V., and Sklenar, V. (1992) Gradient-tailored excitation for single-quantum NMR spectroscopy of aqueous solutions, J. Biomol. NMR 2, 661-665.
18. Sklenar, V., Piotto, M., Leppik, R., and Saudek, V. (1993) Gradient-tailored water suppression for H1-N15 HSQC experiments optimized to retain full sensitivity, J. Magn. Reson. Ser. A 102, 241-245.
19. Kneller, D. G., and Kuntz, I. D. (1993) UCSF SPARKY - An NMR display, annotation and assignment tool, J. Cell. Biochem., 254-254.
20. Lee, W., Westler, W. M., Bahrami, A., Eghbalnia, H. R., and Markley, J. L. (2009) PINE-SPARKY: Graphical interface for evaluating automated probabilistic peak assignments in protein NMR spectroscopy, Bioinformatics 25, 2085-2087.
21. Goto, N. K., and Kay, L. E. (2000) New developments in isotope labeling strategies for protein solution NMR spectroscopy, Curr. Opin. Struct. Biol. 10, 585-592.
22. Bentley, W. E., and Kompala, D. S. (1990) Optimal induction of protein-synthesis in recombinant bacterial cultures, Ann. N.Y. Acad. Sci. 589, 121-138.
23. Miroux, B., and Walker, J. E. (1996) Over-production of proteins in Escherichia coli: Mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels, J. Mol. Biol. 260, 289-298.
24. Weickert, M. J., Doherty, D. H., Best, E. A., and Olins, P. O. (1996) Optimization of heterologous protein production in Escherichia coli, Curr. Opin. Biotechnol. 7, 494-499.
25. Choi, J. H., and Lee, S. Y. (2004) Secretory and extracellular production of recombinant proteins using Escherichia coli, Appl. Microbiol. Biotechnol. 64, 625-635.
26. Nietlispach, D., Clowes, R. T., Broadhurst, R. W., Ito, Y., Keeler, J., Kelly, M., Ashurst, J., Oschkinat, H., Domaille, P. J., and Laue, E. D. (1996) An approach to the structure determination of larger proteins using triple resonance NMR experiments in conjunction with random fractional deuteration, J. Am. Chem. Soc. 118, 407-415.
27. Wider, G. (2005) NMR techniques used with very large biological macromolecules in solution, In Nuclear Magnetic Resonance of Biological Macromolecules, Part C (James, T. L., Ed.), pp 382-398.
28. Bayrhuber, M., and Riek, R. (2011) Very simple combination of TROSY, CRINEPT and multiple quantum coherence for signal enhancement in an HN(CO)CA experiment for large proteins, J. Magn. Reson. 209, 310-314.
29. Fernandez, C., and Wider, G. (2003) TROSY in NMR studies of the structure and function of large biological macromolecules, Curr. Opin. Struct. Biol. 13, 570-580.
30. Marley, J., Lu, M., and Bracken, C. (2001) A method for efficient isotopic labeling of recombinant proteins, J. Biomol. NMR 20, 71-75.
31. Kelly, A. E., Ou, H. D., Withers, R., and Dötsch, V. (2002) Low-conductivity buffers for high-sensitivity NMR measurements, J. Am. Chem. Soc. 124, 12013-12019.
32. Ollerenshaw, J. E., Tugarinov, V., and Kay, L. E. (2003) Methyl TROSY: explanation and experimental verification, Magn. Reson. Chem. 41, 843-852.
33. Wüthrich, K. (1998) The second decade into the third millenium, Nat. Struct. Biol. 5, 492-495.
34. Pandey, A., Sarker, M., Scott, E. M., Liu, X. Q., and Rainey, J. K. (2014) Dividing to conquer - facilitating recombinant expression and NMR spectroscopy, Biochem. Cell Biol. 92, 576-576.
35. Chester, K. A., Baker, M., and Mayer, A. (2005) Overcoming the immunologic response to foreign enzymes in cancer therapy, Expert Rev. Clin. Immunol. 1, 549-559.
36. Feldhahn, M., Thiel, P., Schuler, M. M., Hillen, N., Stevanovic, S., Rammensee, H.-G., and Kohlbacher, O. (2008) EpiToolKit--a web server for computational immunomics, Nucl. Acids Res. 1, W519-522.
37. Rammensee, H., Bachmann, J., Emmerich, N. P., Bachor, O. A., and Stevanovic, S. (1999) SYFPEITHI: database for MHC ligands and peptide motifs, Immunogenetics 50, 213-219.
38. Bagshawe, K. D., Searle, F., Springer, C., Boden, J., Melton, R., Sherwood, R., and Jarman, M. (1989) Tumor Site Activation of Cyto-Toxic Agent, Brit. J. Cancer 59, 312-312.

TABLES.

Table 1. Buffer composition for Buffers 1-3 used to investigate the effect of NaCl concentration on activity and fold of CPG223-415.

| Buffer | [NaPi], mM | [KH2PO4], mM | [KCl], mM | [NaCl], mM | pH |
|---|---|---|---|---|---|
| Buffer 1 | 10 | 2 | 2.7 | 500 | 7.2 |
| Buffer 2 (Gomori) | 20 | | | 50 | 7.3 |
| Buffer 3 (1×PBS) | 10 | 1.8 | 2.7 | 137 | 7.3 |

Table 2. Theoretical and measured masses, enzymatic activities against methotrexate (MTX), and protein yields (per liter of culture) for CPG2 and derivatives solubilized in 20 mM Tris buffer (pH 7.3) containing 0.2 mM ZnCl2 and 100 mM NaCl. Specific activities are also shown in units per mg of enzyme.

| Protein Construct | Theoretical Mass + His6 (Da) | Measured Mass (Da) | Activity (U mL-1) | Spec. Activity (U mg-1) | Protein yield (mg L-1), Rich Medium (2×YT) | Protein yield (mg L-1), 15N (M9) | Protein yield (mg L-1), 2H, 13C, 15N (M9) |
|---|---|---|---|---|---|---|---|
| CPG2 | | | 32.9 | 5.6 | 38.6 | nda | nda |
| CPG223-415 | 45478.6 | 45477.8 | 100.3 | 75.5 | 251.6 | nda | 70-109.4 |
| CPG2CAT | 33889.4 | 33889.7 | 62.8 | 42.7 | 91.4 | 65.8 | 95.2 |
| CPG2CAT’ | 22580.6 | 20845.7 | 8.2 | 3.6 | nda | 21.6 | nda |
| CPG2DIM | 15649.7 | 15649.1 | 1.5 | 2.9 | 49.8 | 5 | nda |

a) not determined.
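
As a quick consistency check, the relative activities quoted in the text can be recomputed from the Activity column of Table 2. The sketch below is illustrative only: the values are copied from the table, while the dictionary and function names are assumptions introduced here.

```python
# Activities (U mL-1) copied from the "Activity" column of Table 2.
# Names below are illustrative and not part of the original work.
ACTIVITY_U_PER_ML = {
    "CPG223-415": 100.3,
    "CPG2CAT": 62.8,
    "CPG2CAT'": 8.2,
    "CPG2DIM": 1.5,
}


def relative_activity(activities, reference="CPG223-415"):
    """Return each construct's activity as a percentage of the reference."""
    ref = activities[reference]
    return {name: 100.0 * value / ref for name, value in activities.items()}


relative = relative_activity(ACTIVITY_U_PER_ML)
for name, percent in relative.items():
    print(f"{name}: {percent:.1f}% of reference ({100.0 - percent:.1f}% reduction)")
```

With these inputs CPG2CAT comes out at a little under 63% of the CPG223-415 activity and CPG2CAT’ at about 8%, in line with the 62% retention and 92% reduction quoted in the text.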


FIGURE LEGENDS.

Figure 1. A) CPG2 activity can be assayed by its enzymatic hydrolysis of the substrate methotrexate (MTX), yielding a decrease in MTX absorbance at 320 nm as it is cleaved into two non-absorbing species (DAMPA and glutamate). B) Schematic describing the various stages of antibody-directed enzyme prodrug therapy (ADEPT). In Stage 1, an antibody-enzyme conjugate (Ab-CPG2) is directed at a tumour-associated antigen and retained at the tumor site. Stage 2 involves administration of a glycosylated enzyme inactivating/clearing agent, which inhibits the enzyme and rapidly removes excess immunoconjugate from the blood via hepatic receptors. Stage 3 describes administration of a non-toxic prodrug leading to localized generation of a potent cytotoxic agent (e.g. benzoic acid mustards) at tumor sites via CPG2-mediated cleavage of the protective moiety from the prodrug38. C) CPG2 crystal structure (PDB: 1CG2) highlighting the location of the dimerization domain (purple), thought to stabilize the dimeric form of the enzyme, and the catalytic domain (green), which delivers the substrates for the two zinc ions in the active site. Also shown are the locations (●) and identities of the residues at the boundaries of the catalytic domain between which a non-native Ala linker was placed to create the CPG2CAT construct. D) Domain organization of CPG2 mapped onto linear sequence illustrating the non-contiguous nature of the catalytic domain and the residues thought to define the boundaries of each domain.

Figure 2. A) Bacterial expression of full-length CPG2 in C41 (DE3) and BL21 (DE3) E. coli strains, inducing with 1 mM IPTG at 20°C for 4 hours. The SDS-PAGE gels contain a protein marker (M) followed by the soluble (S) and insoluble (I) fractions after lysis. B) SDS-PAGE analyses of soluble and insoluble fractions after protein expression using BL21 (DE3) as the host for samples collected 24 hours post-induction with IPTG concentrations between 0.1 and 0.9 mM. C) Activity of replicate batches of expressed protein, as evaluated by monitoring the cleavage of methotrexate (MTX, A320 nm) into two non-absorbing species. D) SDS-PAGE gel (left panel) illustrating the presence of two CPG2 species post-expression and purification, both of which were confirmed as immature (i.e. containing N-terminal signal peptide) CPG2 by mass spectrometry (right panel).

Figure 3. A) Coomassie-stained SDS-PAGE gel (top) and anti-His western blot (bottom) summarizing purification and cleavage of the His-tag from CPG223-415. Lanes 1-5 show elution of the protein from a Ni-NTA affinity column, followed by desalting of the protein using a PD10 column, and cleavage of the His-tag from CPG2 using TEV protease. Cleavage was approximately 50% complete after 48 hours, at which point a second round of nickel affinity chromatography was used to collect the untagged CPG223-415 in the flow through (FT) and first elution fraction (E). B) Gel filtration chromatograms of a set of three gel filtration standards (solid line) and CPG223-415 (dashed line) following isocratic elution from a HiLoad 16/600 Superdex 75 GL column equilibrated with 20 mM Tris (pH 8) containing 100 mM NaCl. Inset shows a plot of gel phase distribution coefficient (Kav = (Ve-Vo)/(Vc-Vo), where Ve = elution volume, Vo = column void volume, Vc = geometric column volume) versus log molecular mass for the calibration standards (●). The resulting calibration curve (dashed line) was used to estimate the molecular weights of the two untagged CPG223-415 species (○) that eluted under identical conditions and corresponded closely to the expected masses for the monomeric (M) and dimeric (D) forms. C) Enzymatic activity measurements for CPG223-415 as measured using the methotrexate cleavage assay. Error was estimated using the standard error of the mean from duplicate measurements. Data were collected in the absence of any ZnCl2 in the assay buffer (inset) for comparison.
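
The gel filtration calibration described for Figure 3B can be sketched in a few lines. The example below is a minimal illustration, assuming a linear relationship between Kav and log10(molecular mass) over the calibrated range; the function names and every numerical value (void volume, column volume, standards, elution volume) are hypothetical placeholders rather than data from this work.

```python
import math


def kav(ve, vo, vc):
    """Gel-phase distribution coefficient, Kav = (Ve - Vo) / (Vc - Vo)."""
    if vc <= vo:
        raise ValueError("column volume must exceed the void volume")
    return (ve - vo) / (vc - vo)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass(ve, vo, vc, slope, intercept):
    """Invert the calibration line to estimate a molecular mass in Da."""
    return 10 ** ((kav(ve, vo, vc) - intercept) / slope)


# Hypothetical column parameters and standards (mass in Da, elution volume in mL).
VOID_VOLUME_ML = 40.0
COLUMN_VOLUME_ML = 120.0
STANDARDS = [(75_000, 52.0), (44_000, 58.0), (13_700, 72.0)]

log_masses = [math.log10(mass) for mass, _ in STANDARDS]
kav_values = [kav(ve, VOID_VOLUME_ML, COLUMN_VOLUME_ML) for _, ve in STANDARDS]
slope, intercept = fit_line(log_masses, kav_values)

# Estimate the mass of a species eluting at a hypothetical 55 mL.
estimated = estimate_mass(55.0, VOID_VOLUME_ML, COLUMN_VOLUME_ML, slope, intercept)
print(f"Estimated mass: {estimated:.0f} Da")
```

Because larger species elute earlier, the fitted slope is negative, and an estimate is only meaningful for elution volumes that fall within the range spanned by the standards.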


Figure 4. Activity measurements for CPG223-415 against the substrate methotrexate measured in different buffer conditions, primarily differing in their NaCl content. Activity was observed in 137 mM NaCl and 50 mM L-arginine / 50 mM L-glutamate.

Figure 5. 1H-15N TROSY-HSQC spectrum acquired for 2H/15N/13C-labelled CPG223-415 solubilized in 20 mM Tris (pH 7.3) containing 100 mM NaCl. The spectrum was acquired at 25°C and 700 MHz, and yielded 342 resolved peaks.

Figure 6. A) Schematic summarizing the three isolated CPG2 domains studied in this work: the dimerization domain (CPG2DIM, residues 213-322); the full, non-contiguous catalytic domain (CPG2CAT, composed of residues 23-214 and residues 323-415 linked via a non-native alanine); and the largest contiguous region of the catalytic domain (CPG2CAT’, composed of residues 23-198). B) Activity measurements for each isolated domain against methotrexate (quantitative activities are given in Table 2). C-E) 1H-15N HSQC spectra acquired for C) [U-15N] CPG2DIM, D) [U-2H,13C,15N] CPG2CAT, and E) [U-15N] CPG2CAT’ proteins solubilized in 20 mM Tris (pH 7.3) containing 100 mM NaCl. All three spectra were acquired at 25°C and 700 MHz.

Figure 7. The CPG2 catalytic domain as shown in the crystal structure (PDB: 1CG2), highlighting the region included in the CPG2CAT’ protein (shown in green) as well as the non-contiguous C-terminal region of the catalytic domain (gray). The residues previously proposed to form the active site (specifically H112, D141, E175, E176, E200, R324, and H385) are shown in space filling representation to illustrate that the inactive CPG2CAT’ protein is missing Arg 324 and His 385.


HIGHLIGHTS FOR: Soluble expression, purification and functional characterisation of carboxypeptidase G2 and its individual domains

Dhadchayini Jeyaharan†, Philip Aston†, Angela Garcia-Perez†, James Schouten§, Paul Davis§, and Ann M. Dixon†*

• Carboxypeptidase G2 is of therapeutic interest in cancer and autoimmune treatment.
• Characterization of key features of CPG2 has been hampered by low protein yields.
• We report the first high-yield bacterial expression of soluble and active CPG2.
• We also report the first study of the isolated catalytic and dimerization domains.
• The results led to design of a highly truncated enzyme which retains 62% activity.