Biochimie 86 (2004) 601–605 www.elsevier.com/locate/biochi
High throughput cloning and expression strategies for protein production
Jean-Michel Betton *
Unité de Repliement et Modélisation des Protéines, Institut Pasteur, CNRS-URA2185, 28, rue du Docteur Roux, 75724 Paris cedex 15, France
Received 6 April 2004; accepted 6 July 2004
Available online 06 August 2004
Abstract
Traditionally, producing a recombinant protein requires that the target gene first be cloned into an expression vector before its cellular expression can be evaluated. Among current methods, site-specific recombination cloning techniques, which eliminate the use of restriction endonucleases and ligase, offer several advantages in the context of high throughput (HT) procedures. Rapid and highly efficient, recombinational cloning technology is widely used in structural genomics and functional proteomics. However, the correct expression of some genes requires further optimization steps that are time-consuming and carried out at relatively late stages in the cloning-expression process. An alternative strategy is described in which expression is tested in vitro before the target gene is cloned. This technology, amenable to automation for HT studies, makes it possible to express several hundred genes from PCR products in cell-free transcription–translation systems. Once this preliminary step is complete, the PCR product that gives a satisfactory expression level is selected and then cloned in a plasmid for its cellular expression and perpetuation.
© 2004 Elsevier SAS. All rights reserved.
Keywords: PCR; Cloning; Site-specific recombination; Cell-free expression; Protein production
1. Introduction
Protein research is responding to the demands of postgenomic strategies by developing systematic approaches to understand complete sets of proteins through efficient and rapid expression of recombinant proteins [1,2]. However, converting the growing body of genomic information, and discoveries of unknown biological functions such as the identification of new therapeutic targets, into purified native proteins remains a continuing challenge [3,4]. Simultaneous and parallel studies of several hundred proteins require HT methods for gene delivery and rapid protein expression systems [5]. To fulfill their function, proteins must adopt a precise three-dimensional conformation, the native state, acquired during cellular protein folding [6]. This process is universally conserved in all organisms and enables bacteria, for example, to produce large amounts of human proteins [7]. Although Escherichia coli is the simplest and by far the most widely used organism, the folding pathway of many recombinant proteins is often impaired in that host. Consequently, the misfolded polypeptides aggregate and form inclusion bodies instead of the native protein. The conventional molecular mechanism of inclusion body formation suggests that nascent polypeptide chains partition between two alternative and competing pathways, folding and misfolding [8]. Another major limitation of bacterial expression systems is the lack of eukaryotic post-translational modifications. However, the choice of an expression system for recombinant proteins in HT applications depends mainly on the success rate, that is, the fraction of proteins that can be produced in practical yields [9]. Thus, in the absence of a universal cellular system to produce recombinant proteins, this short communication will be limited to bacterial expression systems.
Within the structural genomics program at the Pasteur Institute, I have been charged with developing gene cloning and expression strategies to establish a centralized cloning facility to produce proteins from Mycobacterium tuberculosis and Mycobacterium leprae. Here, two such strategies are described. They share, as the first step, a PCR amplification of the coding sequence, but differ substantially in the second step, where the order of cloning and expression is inverted.
* Corresponding author. Tel.: +33-1-45-68-69, fax: +33-1-40-61-30-43. E-mail address: [email protected] (J.-M. Betton).
0300-9084/$ - see front matter © 2004 Elsevier SAS. All rights reserved. doi:10.1016/j.biochi.2004.07.004
J.-M. Betton / Biochimie 86 (2004) 601–605
2. Gene cloning then expression
Handling several hundred genes simultaneously requires a generic, highly efficient directional cloning technique, both to eliminate restriction site analysis of the target genes prior to cloning and to reduce the background, thereby limiting the number of recombinant clones that must be tested. For these reasons, the in vitro recombination method (Gateway™, Invitrogen) was implemented in the laboratory [10]. This method is based on the site-specific recombination reactions involved in the integration and excision of bacteriophage λ (Fig. 1). Incorporating strong positive selections and modifying the nucleotide sequences of specific att sites allows correct expression clones to be recovered at high efficiency (>90%). Our initial approach was based on the Invitrogen pDEST17 vector, which creates fusion proteins bearing an N-terminal His6-tag, so as to provide a generic strategy for protein purification as well. A simple PCR protocol and cloning conditions were optimized for the high G + C content of M. tuberculosis genes. In the final construct, the target genes cloned in the pDEST17 vector are positioned downstream of the bacteriophage T7 promoter [11], which allows their expression in E. coli cells (BL21(DE3) strains) and in bacterial cell-free systems [12]. However, the recombinant proteins carry an additional 21 amino acids at their N-terminus, including the affinity tag and residues encoded by the attB1 recombination site. It is generally and reasonably assumed that additional unstructured regions at the termini can cause problems in protein folding or crystallization. Thus, the experimental protocol was slightly modified to introduce a TEV protease cleavage site at the PCR step [13]. Before robotics was introduced, the throughput was about 100 clones per person per month.
This strategy, which is becoming conventional in structural genomics and proteomics, follows a relatively traditional procedure in which vectors containing target genes are constructed before protein expression is tested. Consequently, the efficient expression of a given protein is established only after obligatory cloning steps. A procedure that would make it possible to test gene expression before cloning the gene in a vector would have obvious advantages in terms of time and cost. In addition, if several constructs of the same gene could be tested to optimize critical parameters, the only construct that would need to be cloned is the one yielding a satisfactory expression level. This alternative strategy, with an inverted rationale, is now provided by optimized cell-free protein expression systems.
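The primer layout implied by this protocol (attB site, optional TEV cleavage site, gene-specific annealing region) can be sketched in a few lines of Python. The sketch is illustrative only and is not the laboratory's actual protocol: the attB1/attB2 sequences, the GGGG 5′ clamp, the two-base frame spacer and the codons chosen for the TEV site (ENLYFQG) are assumptions based on commonly quoted Gateway primer conventions and should be checked against the vendor documentation. The function `design_primers`, its parameters and the example ORF are hypothetical.

```python
"""Sketch: attB-flanked PCR primers for recombinational cloning (assumptions noted)."""

# Assumed 25-nt attB sites, as commonly quoted for Gateway primers.
ATTB1 = "ACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "ACCACTTTGTACAAGAAAGCTGGGT"
CLAMP = "GGGG"        # assumed 5' clamp placed before each attB site
FRAME_SPACER = "TC"   # assumed: 25 + 2 nt keeps attB1 in frame with the ORF
TEV_SITE = "GAAAACCTGTATTTTCAGGGC"  # one possible coding sequence for ENLYFQG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(orf: str, anneal_len: int = 20, with_tev: bool = True):
    """Return (sense, antisense) primers for an ORF written 5'->3', stop codon included."""
    orf = orf.upper()
    if set(orf) - set("ACGT"):
        raise ValueError("ORF must contain only A, C, G and T")
    if len(orf) % 3 != 0 or len(orf) < anneal_len:
        raise ValueError("ORF must be whole codons and at least anneal_len long")
    tev = TEV_SITE if with_tev else ""
    # Sense primer: clamp, attB1, frame spacer, optional TEV site, start of the ORF.
    sense = CLAMP + ATTB1 + FRAME_SPACER + tev + orf[:anneal_len]
    # Antisense primer: clamp, attB2, reverse complement of the end of the ORF.
    antisense = CLAMP + ATTB2 + reverse_complement(orf[-anneal_len:])
    return sense, antisense


if __name__ == "__main__":
    # Hypothetical ORF: only the first 24 nt follow Table 1 (ML0180); the rest is invented.
    example_orf = "ATGCCGACCTACAGCTACGAGTGTGGCGCCGATCTGAAAGCGCTGTAA"
    sense, antisense = design_primers(example_orf)
    print("sense    :", sense)
    print("antisense:", antisense)
```

A fixed annealing length ignores melting temperature, which matters for G+C-rich mycobacterial genes; a practical design would adjust the gene-specific region for each target.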
Fig. 1. Cloning by in vitro recombination: the Gateway™ technology. The target gene is amplified by PCR with a sense primer containing the specific recombination sequence attB1 and an antisense primer containing the sequence attB2 (step 1). The attB sequences, each 25 bases long, are fused at the 5′ and 3′ ends of the target gene, respectively. The PCR product is then subcloned into the shuttle vector pDONR201 in the presence of the Int and IHF proteins, which catalyze site-specific recombination between the attB and attP sites (step 2, BP cloning). The DNA products of this first cloning step, which define the ‘entry clone’ in Gateway terminology, can either be isolated after transformation and selection on plates containing kanamycin, or be mixed directly (‘one-tube protocol’), without any purification step, with the expression vector pDEST17 in the presence of the Int, IHF and Xis proteins, which catalyze site-specific recombination between the attL and attR sites (step 3, LR cloning), as indicated here. The DNA products of this second cloning step are used directly to transform bacterial cells, and screening only two colonies grown on plates containing ampicillin is sufficient to obtain the desired final clone. Indeed, by incorporating a strong negative selection (the ccdB gene kills bacteria transformed either by the by-products or by unrecombined plasmids from both recombination reactions) and two different positive selections (the pDONR201 and pDEST17 vectors do not carry the same antibiotic resistance genes), the expression vector is recovered at high efficiency.
3. Expression then cloning
Cell-free protein expression systems, involving coupled in vitro transcription–translation reactions, produce recombinant proteins from added DNA [14]. Over the past several years, considerable progress (optimization of energy regeneration reactions and biochemical enhancement of bacterial lysates) has been made in improving the protein yield of these systems, up to several hundred micrograms [15]. Efficient protein production is feasible using a linear DNA template containing the target gene with all the regulatory elements necessary to drive the synthesis of a protein [16]. By combining PCR techniques with cell-free expression systems (expression-PCR), this strategy eliminates the need for cloning and allows protein expression to be analyzed in a short time [17]. Several improved cell-free systems are now commercially available, in which the T7 bacteriophage RNA polymerase performs transcription while an E. coli S30 lysate, enriched with limiting factors such as rare tRNAs, provides the translation machinery. The main properties of these systems are: (i) complete automation and standardization of the gene-to-protein process, which takes a few hours, (ii) no reliance on living cells, whose viability would be compromised by the production of toxic proteins, and (iii) an open design that permits direct manipulation of the reaction conditions [18]. Programming a cell-free system with PCR products was made possible by adding a potent inhibitor of the exonucleases naturally present in the bacterial lysate, which considerably increases the lifetime of linear DNA templates. With this development, all the mutagenic PCR techniques used in protein engineering (site-directed or random mutagenesis, exchange of codons, insertion or deletion, gene fusion, etc.) can be directly coupled with protein production [19]. To evaluate this alternative strategy, a limited number of genes that were not successfully expressed by the recombination cloning strategy described above were selected for sequence optimization. Indeed, messenger RNAs of mycobacterial genes, which are naturally GC-rich, can adopt stable secondary structures in the translation initiation region (TIR) that limit ribosome binding and thus the level of expression [20]. The TIR optimization was performed with knowledge-based calculation software (ProteoExpert) that defines silent mutations in the first seven codons of genes, minimizing base pair probability and G+C content [21]. A set of mutants harboring base changes that do not modify the encoded amino acid sequence was designed within the oligonucleotide sequences defining the forward primers. In the first step (PCR1), these specific primers were used to amplify the target gene. In the second step (PCR2), the T7 regulatory elements (promoter, ribosome binding site, and terminator) and a C-terminal His6-tag were introduced by overlap extension PCR using the PCR1 product as template. The final PCR2 product is used directly for in vitro expression, without any purification. In the example given in Fig. 2, the production of the unknown ML0180 protein from M. leprae, starting from 11 different PCR2 products expressing the wild-type gene
and 10 silent mutants (Table 1), was tested in the Roche cell-free system (RTS 100 E. coli HY). After 2 h of synthesis, proteins were separated by electrophoresis (SDS-PAGE), transferred to a nitrocellulose membrane, and revealed by immunodetection with anti-His6-tag antibodies coupled to peroxidase. Visual inspection of the Western blot (Fig. 2) clearly shows that the silent mutations considerably increase the expression level. These in vitro experiments allow protein production to be analyzed in one day. Furthermore, cell-free expression systems allow detergents, molecular chaperones, and appropriate ligands to be added during protein synthesis, which may facilitate productive protein folding. Once expression has been analyzed, the best constructs are converted to circular plasmids by conventional PCR cloning methods. The selected expression plasmids are then ready to produce target proteins either in vitro, in a large-scale exchange cell-free system (such as RTS 500 E. coli HY) with protein yields in the milligram range, or in vivo, in the bacterial strain BL21(DE3). Since most proteins appear to behave similarly in both systems [12], the cell-free expression system may serve as a reliable screen for protein expression in HT studies. In Fig. 2, two selected PCR2 products, amplified from the wild-type and M3 mutant genes, were cloned in a plasmid for bacterial expression. To perform this final step, a rapid blunt-end PCR cloning method with a strong positive selection was developed [22]. In this specific case, the synonymous codon changes in the TIR of the gene improved the expression level not only in vitro, but also in bacterial cells. This alternative strategy is particularly attractive for manipulating the expression/cloning conditions, which are frequently critical to protein yields.
Fig. 2. Expression of PCR products: the RTS technology. To introduce the T7 regulatory elements (promoter, ribosome binding site, and terminator), an overlap extension PCR using the Linear Template Generation Set kit (Roche) was performed by amplifying the ML0180 gene of M. leprae by two-step PCR. In the first step (step 1), both primers contain gene-specific hybridizing sequences and 5′ extensions, which serve as annealing sites in the second PCR. The sense primers, numbered 1 to 10, introduce several silent mutations into the first seven codons of the target gene (see Table 1). The PCR product of the first step (PCR1) is then used as the DNA template for the overlap extension PCR (step 2). The 3′ ends of the linear DNA fragments coding the regulatory elements act as primers and are extended to a full-length product, which is finally amplified via two short external primers. At this step, a His6-tag sequence was fused at the C-terminus of the ML0180 proteins. The final PCR2 product is expressed directly, without any purification step, in the cell-free Rapid Translation System RTS 100 E. coli HY kit (Roche). After 2 h at 30 °C, proteins produced in vitro are separated by electrophoresis (SDS-PAGE), transferred to a nitrocellulose membrane, and immunodetected using anti-His6 antibody coupled to peroxidase (step 3). The PCR2 product that gave the highest protein yield is then cloned in a vector to provide a permanent source of the amplified target gene for scale-up expression. This preparative production can be performed either in vitro, in a scale-up reaction with exchange (RTS 500 and RTS 9000 E. coli HY kits), or in vivo, in bacterial cells. Here, the purified PCR2 products containing the wild-type (WT) and mutant 3 (noted M3) versions of the ML0180 gene were cloned in a vector and transformed into the BL21(DE3) strain. After induction, the bacterial lysates from two independent transformants (noted with indices 1 and 2) were separated by electrophoresis (SDS-PAGE), and proteins were detected by Coomassie blue staining (step 4). An arrow indicates the ML0180 protein.
4. Conclusions
The future of structural genomics and functional proteomics depends on the ability to obtain large numbers of different native proteins. The two strategies presented here can be used in a complementary fashion, dictated by the target proteins and the need for optimization. Practical experience shows that the return is worth both the effort and the investment in sequence optimization. However, carrying out optimization from the beginning avoids having to return to the start at a later stage of the expression strategy because the sequence was not optimal. Both methods can also be used to screen many different constructs efficiently (orthologues, tags, protein domains, etc.) and to identify rapidly those that produce a high yield of soluble protein in bacterial expression systems. Although post-translational modifications such as glycosylation will probably remain, for the present at least, beyond the reach of bacterial expression systems, E. coli retains its dominant position as the prime choice of expression system for reasons of speed and simplicity [23]. It is tempting to conclude that, with continued developments in molecular biology, even more powerful methods for targeting complex proteins will emerge to supplement these strategies.
Table 1
Silent base substitutions within codons 2–7 of the ML0180 gene
Codon  1    2    3    4    5    6    7    8
WT     ATG  CCG  ACC  TAC  AGC  TAC  GAG  TGT
M1     ATG  CCA  ACT  TAT  TCA  TAT  GAA  TGT
M2     ATG  CCA  ACT  TAT  TCA  TAT  GAG  TGT
M3     ATG  CCA  ACA  TAT  TCA  TAT  GAG  TGT
M4     ATG  CCA  ACC  TAT  TCA  TAT  GAA  TGT
M5     ATG  CCA  ACT  TAC  TCA  TAT  GAA  TGT
M6     ATG  CCA  ACT  TAC  TCT  TAT  GAA  TGT
M7     ATG  CCA  ACC  TAT  TCA  TAT  GAG  TGT
M8     ATG  CCA  ACT  TAT  TCA  TAC  GAG  TGT
M9     ATG  CCA  ACT  TAT  TCA  TAC  GAA  TGT
M10    ATG  CCA  ACA  TAT  TCA  TAC  GAG  TGT
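As a quick consistency check on Table 1, the sketch below translates the first eight codons of each variant and computes the G+C fraction of that window. It is only an illustration under stated assumptions: it does not reproduce ProteoExpert, it ignores the base pair probability term that the software also minimizes, and the names `VARIANTS`, `translate`, `gc_fraction` and `summarize` are hypothetical. The codon table is deliberately restricted to the codons that appear in Table 1.

```python
"""Sketch: check that the Table 1 substitutions are silent and compare G+C content."""

# Subset of the standard genetic code covering only the codons listed in Table 1.
CODON_TO_AA = {
    "ATG": "M",
    "CCG": "P", "CCA": "P",
    "ACC": "T", "ACT": "T", "ACA": "T",
    "TAC": "Y", "TAT": "Y",
    "AGC": "S", "TCA": "S", "TCT": "S",
    "GAG": "E", "GAA": "E",
    "TGT": "C",
}

# First eight codons of ML0180 (wild type and mutants M1 to M10), transcribed from Table 1.
VARIANTS = {
    "WT":  "ATG CCG ACC TAC AGC TAC GAG TGT",
    "M1":  "ATG CCA ACT TAT TCA TAT GAA TGT",
    "M2":  "ATG CCA ACT TAT TCA TAT GAG TGT",
    "M3":  "ATG CCA ACA TAT TCA TAT GAG TGT",
    "M4":  "ATG CCA ACC TAT TCA TAT GAA TGT",
    "M5":  "ATG CCA ACT TAC TCA TAT GAA TGT",
    "M6":  "ATG CCA ACT TAC TCT TAT GAA TGT",
    "M7":  "ATG CCA ACC TAT TCA TAT GAG TGT",
    "M8":  "ATG CCA ACT TAT TCA TAC GAG TGT",
    "M9":  "ATG CCA ACT TAT TCA TAC GAA TGT",
    "M10": "ATG CCA ACA TAT TCA TAC GAG TGT",
}


def translate(codons):
    """Translate a list of codons into a one-letter peptide string."""
    return "".join(CODON_TO_AA[codon] for codon in codons)


def gc_fraction(codons):
    """Return the fraction of G and C bases in the concatenated codons."""
    seq = "".join(codons)
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(variants=VARIANTS):
    """Map each variant name to (peptide, G+C fraction) for its codon window."""
    summary = {}
    for name, text in variants.items():
        codons = text.split()
        summary[name] = (translate(codons), gc_fraction(codons))
    return summary


if __name__ == "__main__":
    summary = summarize()
    wt_peptide = summary["WT"][0]
    for name, (peptide, gc) in summary.items():
        status = "silent" if peptide == wt_peptide else "NOT silent"
        print(f"{name:<4} {peptide}  G+C = {gc:.2f}  {status}")
```

Under these assumptions, every variant translates to the same eight residues, and each mutant has a lower G+C fraction than the wild type within this window. The check says nothing about mRNA secondary structure, which would require a folding calculation.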
References
[1] S.K. Burley, An overview of structural genomics, Nat. Struct. Biol. 7 (Suppl) (2000) 932–934.
[2] M.R. Martzen, S.M. McCraith, S.L. Spinelli, F.M. Torres, S. Fields, E.J. Grayhack, E.M. Phizicky, A biochemical genomics approach for identifying genes by the activity of their products, Science 286 (1999) 1153–1155.
[3] S.A. Marshall, G.A. Lazar, A.J. Chirino, J.R. Desjarlais, Rational design and engineering of therapeutic proteins, Drug Discov. Today 8 (2003) 212–221.
[4] J.R. Swartz, Advances in Escherichia coli production of therapeutic proteins, Curr. Opin. Biotechnol. 12 (2001) 195–201.
[5] S.P. Chambers, High-throughput protein expression for the postgenomic era, Drug Discov. Today 7 (2002) 759–765.
[6] F.-U. Hartl, M. Hayer-Hartl, Molecular chaperones in the cytosol: from nascent chain to folded protein, Science 295 (2002) 1852–1858.
[7] P. Braun, Y. Hu, B. Shen, A. Halleck, M. Koundinya, E. Harlow, J. LaBaer, Proteome-scale purification of human proteins from bacteria, Proc. Natl. Acad. Sci. USA 99 (2002) 2654–2659.
[8] S. Betts, J. King, There’s a right way and a wrong way: in vivo and in vitro folding, misfolding and subunit assembly of the P22 tailspike, Structure 7 (1999) R131–R139.
[9] P. Braun, J. LaBaer, High throughput protein production for functional proteomics, Trends Biotechnol. 21 (2003) 383–388.
[10] J.L. Hartley, G.F. Temple, M.A. Brasch, DNA cloning using in vitro site-specific recombination, Genome Res. 10 (2000) 1788–1795.
[11] F.W. Studier, A.H. Rosenberg, J.J. Dunn, J.W. Dubendorff, Use of T7 polymerase to direct expression of cloned genes, Methods Enzymol. 185 (1990) 60–89.
[12] D. Busso, R. Kim, S.-H. Kim, Expression of soluble recombinant proteins in a cell-free system using a 96-well format, J. Biochem. Biophys. Methods 55 (2003) 233–240.
[13] T.D. Parks, K.K. Leuther, E.D. Howard, S.A. Johnston, W.G. Dougherty, Release of proteins and peptides from fusion proteins using a recombinant plant virus proteinase, Anal. Biochem. 216 (1994) 413–417.
[14] A.S. Spirin, V.I. Baranov, L.A. Ryabova, S. Ovodov, Y.B. Alakhov, A continuous cell-free translation system capable of producing polypeptides in high yield, Science 242 (1988) 1162–1164.
[15] T. Kigawa, T. Yabuki, Y. Yoshida, M. Tsutsui, Y. Ito, T. Shibata, S. Yokoyama, Cell-free production and stable-isotope labeling of milligram quantities of proteins, FEBS Lett. 442 (1999) 15–19.
[16] K.A. Martemyanov, A.S. Spirin, A.T. Gudkov, Direct expression of PCR products in a cell-free transcription/translation system: synthesis of antibacterial peptide cecropin, FEBS Lett. 414 (1997) 268–270.
[17] D.E. Lanar, K.C. Kain, Expression-PCR (E-PCR): overview and applications, PCR Meth. Appl. 4 (1994) 92–96.
[18] J.-M. Betton, Rapid Translation System (RTS): a promising alternative for recombinant protein production, Curr. Protein Pept. Sci. 4 (2003) 73–80.
[19] E.A. Burks, G. Chen, G. Georgiou, B.L. Iverson, In vitro scanning saturation mutagenesis of an antibody binding pocket, Proc. Natl. Acad. Sci. USA 94 (1997) 412–417.
[20] M.H. de Smit, J. van Duin, Control of prokaryotic translation initiation by mRNA secondary structure, Prog. Nucleic Acid Res. Mol. Biol. 38 (1990) 1–35.
[21] D. Voges, M. Watzele, C. Nemetz, S. Wizemann, B. Buchberger, Analyzing and enhancing mRNA translational efficiency in an Escherichia coli in vitro expression system, Biochem. Biophys. Res. Commun. 318 (2004) 601–614.
[22] J.-M. Betton, Cloning vectors for expression-PCR products, BioTechniques 37 (2004) 346–347.
[23] F. Baneyx, Recombinant protein expression in Escherichia coli, Curr. Opin. Biotechnol. 10 (1999) 411–421.