Protein Expression and Purification 47 (2006) 562–570 www.elsevier.com/locate/yprep
High efficiency single step production of expression plasmids from cDNA clones using the Flexi Vector cloning system ☆
Paul G. Blommel, Peter A. Martin, Russell L. Wrobel, Eric Steffen, Brian G. Fox *
Department of Biochemistry, Center for Eukaryotic Structural Genomics, University of Wisconsin, Madison, WI 53706-1549, USA
Received 21 October 2005, and in revised form 10 November 2005. Available online 5 December 2005
Abstract
The success of structural genomics and proteomics initiatives depends on the availability of target genes in vectors suitable for protein production. Here, we compare two high-throughput methods for producing expression vectors from plasmid-derived cDNA fragments. Expression vectors were constructed for compatibility with the Gateway recombination cloning system and the Flexi Vector restriction-based cloning system. Cloning protocols for each system were conducted in parallel for 96 different target genes, from PCR through the production of sequence-verified expression clones. The short nucleotide sequences required to prepare the target open reading frames for Flexi Vector cloning allowed a single-step PCR protocol, resulting in fewer mutations relative to the Gateway protocol. Furthermore, because the target open reading frames were initially cloned directly into an expression vector, the Flexi Vector system gave time and cost savings compared with the protocol required for the Gateway system. Within the Flexi Vector system, genes were transferred between four different expression vectors. The efficiency of gene transfer between Flexi Vectors depended on including a region of sequence identity adjacent to one of the restriction sites. With the proper construction of the flanking sequence of the vector, gene transfer efficiencies of 95–98% were demonstrated.
© 2005 Elsevier Inc. All rights reserved.
Keywords: High-throughput cloning; Proteomics; Expression vectors
The large number of gene sequences now available has led to proteome-scale projects to evaluate protein structure and function [1–4]. Cloning of target genes into expression vectors is an essential part of these efforts [5]. Many structural genomics and proteomics groups use the recombination-based Gateway system [6–9] or ligation-independent cloning to produce expression clones [10,11]. As part of the Protein Structure Initiative, the Center for Eukaryotic Structural Genomics (CESG)¹ previously developed and applied a two-step Gateway protocol to evaluate ~3500 target genes [6]. As part of ongoing efforts, we are interested in evaluating other cloning systems. Flexi Vector, a restriction enzyme cloning system, appeared to offer two advantages: high-throughput cloning of PCR products directly into an expression vector, and serial transfer from the first, sequence-verified expression vector to others. In this work, we compare the subcloning of 96 human target genes from a plasmid cDNA source using these two cloning systems, and we measure the efficiency of transferring the coding sequences between different Flexi Vectors.
☆ This work was supported by the National Institutes of Health, Protein Structure Initiative Grant P50 GM-64598.
* Corresponding author. Fax: +1 608 262 3453. E-mail address: [email protected] (B.G. Fox).
¹ Abbreviation used: CESG, Center for Eukaryotic Structural Genomics.
1046-5928/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.pep.2005.11.007
Materials and methods
Materials
Flexi Vector reagents, high-concentration T4 DNA ligase, PCR clean-up kits (SV96 and Magnesil), Select 96 competent cells, PCR Master Mix, the Magnebot II magnetic bead separation block, expression vectors pF1K, pF6A, and pF6K, and DNA molecular weight markers were from Promega (Madison, WI). Gateway reagents, Escherichia coli Top 10 competent cells, and 2% agarose E-Gel 96 gels were from Invitrogen (Carlsbad, CA). Oligonucleotide primers were from Integrated DNA Technologies (Coralville, IA). The ExSite PCR mutagenesis kit and Yield Ace polymerase were from Stratagene (La Jolla, CA). CircleGrow medium was from Q-Biogene (Irvine, CA). Aeraseal gas-permeable sealing tape was from Excel Scientific (Wrightwood, CA). BigDye version 3.1 sequencing reagents were from Applied Biosystems (Foster City, CA). Miscellaneous reagents were as previously reported [6].
Target genes
A set of 96 target genes selected by CESG for Gateway cloning was used for the comparative study. The genes were obtained as full-length cDNA clones from Open Biosystems (Huntsville, AL). This set consisted of human ORFs 600–1200 nucleotides in length. The target genes are listed in the Supplementary data.
Vectors
Features of CESG expression vectors are reported elsewhere [6]. Standard molecular cloning techniques [12] were used unless specified. pVP33K and pVP33A were produced from pVP16 in several steps. These included: addition of NsiI and PacI sites flanking the solubility domain by ExSite PCR mutagenesis; modification of the linker to include cleavage sites for the 3C and TEV proteases, the tetraCys motif, and the SgfI and PmeI cloning sites, also by using the ExSite system; creation of kanamycin (pVP33K) and ampicillin (pVP33A) resistance cassettes flanked by AvrII and BsiWI, and insertion of these cassettes by digestion and ligation into a PCR-amplified vector backbone modified to contain the AvrII and BsiWI sites flanking the positions of the resistance genes; reintroduction of the MBP solubility domain by digestion and ligation; production of the Bar-CAT cassette by overlap extension PCR [13]; insertion of the Bar-CAT cassette into the SgfI and PmeI sites of pF1K by digestion and ligation; and transfer of the Bar-CAT cassette to pVP33A/K using PCR, restriction digestion, and ligation. The coding region from promoter through terminator was sequenced after each step to verify mutagenesis and the absence of second-site mutations. pVP33A shares 128 bp of sequence identity with pF6A/K 3′ to the PmeI site, while pVP33K does not. This region of identity was used to reduce the frequency of double backbone ligation products through the formation of a non-replicating extensive DNA palindrome [14]. To include the region of identity, the Bar-CAT cassette was amplified out of pF1K using the T7 terminator primer and ligated into pVP33A without digesting the cassette with PmeI (Fig. 1). pEU-His-Flexi was produced by modifying pEU-His [15] to include the SgfI and PmeI sites, followed by insertion of the Bar-CAT cassette including the 128 bp of 3′ identity. The parental pVP33K, pVP33A, and pEU-His-Flexi plasmids must be propagated in a barnase-resistant strain (E. coli BR610, Promega).
[Fig. 1: maps of pVP16 (7631 bp) and pVP33K/pVP33A (7122/7229 bp). Labeled elements include the ColE1 origin, lacIq, T5 promoter, λ t0 terminator, resistance genes (Ampr; Kanr/Ampr flanked by AvrII and BsiWI), His8 and MBP N-terminal fusions (MBP flanked by NsiI and PacI in pVP33K/A), the attB1–ccdB/CAT–attB2 Gateway cassette, the 3C–C4–TEV linker, the SgfI–Barnase/CAT–PmeI Bar-CAT cassette, and the 3′ MCS homology region (pVP33A only).]
Fig. 1. Vectors used in this study. Vectors were derived from pQE80 (Qiagen, Valencia, CA) and contain a ColE1 origin. After cloning, the target gene sequence replaces the Gateway cassette in pVP16 or the Bar-CAT cassette in pVP33K and pVP33A. Both vectors contain N-terminal fusions to allow standardized purification (His8 tag) and increased target protein solubility (MBP). Target proteins may be liberated from the N-terminal fusions using TEV protease. For pVP16, the TEV protease recognition site is incorporated during amplification of the target gene using the two-step PCR protocol. pVP33K and pVP33A contain a vector-encoded TEV recognition site. pVP33K and pVP33A also contain a tetraCys motif (labeled C4 in the figure) for site-specific fluorescent labeling of the fusion protein and a 3C protease recognition site. Restriction sites in pVP33K and pVP33A allow switching of the solubility-enhancing tag (NsiI and PacI) or the antibiotic resistance (AvrII and BsiWI). The differences in total sizes of pVP33K and pVP33A arise from the sum of differences in the region designated 3′ MCS homology (present in pVP33A only) and in the sizes of the genes encoding the respective antibiotic resistances.
PCR primers
For Flexi Vector cloning, target genes were amplified using a single-step PCR. The forward and reverse primers were 28–36 nucleotides in length. The gene-specific portion included 14–23 nucleotides that exactly matched the target gene beginning at the second codon. Whenever possible, the gene-specific primers ended with a C or G nucleotide to enhance DNA polymerase initiation. The invariant sequences 5′-GGTTgcgatcgcC-3′ (including an SgfI site, lowercase) and 5′-GTGTgtttaaacCTA-3′ (including a PmeI site, lowercase) were added to the forward and reverse primers, respectively. The four additional 5′ nucleotides of each tail (GGTT and GTGT) were added to promote restriction of the PCR products. A two-step PCR procedure was used for Gateway cloning [6]. The first-PCR forward and reverse primers were 32–41 nucleotides in length and contained the same gene-specific nucleotides as the Flexi Vector primers. An invariant sequence was added to the first-PCR forward primers to encode most of the TEV protease site, while the first-PCR reverse primer matched the gene including the stop codon and added a portion of the attB2 site. For the second PCR, universal primers [6] were used to add the nucleotides required to complete the TEV, attB1, and attB2 sites (85 additional nucleotides in total).
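The Flexi Vector primer rules above are mechanical enough to script. The sketch below is a hypothetical helper, not the authors' tool: it prepends the invariant tails given in the text to gene-specific regions of 14–23 nt, choosing the shortest region whose 3′ base is G or C (one possible reading of "whenever possible"). Two details are assumptions inferred from the tail sequences rather than stated in the text: the reverse primer anneals immediately upstream of the native stop codon, and the CTA in the reverse tail supplies a TAG stop. Melting temperature and internal SgfI/PmeI sites are not checked.

```python
"""Hypothetical sketch of Flexi Vector (SgfI/PmeI) primer design."""

# Invariant 5' tails from the Methods (SgfI = GCGATCGC, PmeI = GTTTAAAC).
FORWARD_TAIL = "GGTTGCGATCGCC"    # 4-nt clamp + SgfI site + C
REVERSE_TAIL = "GTGTGTTTAAACCTA"  # 4-nt clamp + PmeI site + CTA (antisense TAG; assumption)
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def pick_annealing_region(seq: str, min_len: int = 14, max_len: int = 23) -> str:
    """Shortest 5' prefix of seq (min_len..max_len nt) whose last base is G or C.

    Falls back to max_len nucleotides when no such prefix exists.
    """
    for n in range(min_len, min(max_len, len(seq)) + 1):
        if seq[n - 1] in "GC":
            return seq[:n]
    return seq[:max_len]


def flexi_primers(orf: str) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'."""
    orf = orf.upper()
    if not orf.startswith("ATG") or len(orf) % 3 != 0:
        raise ValueError("expected an in-frame ORF starting with ATG")
    # Drop the native stop codon; the reverse tail supplies TAG instead (assumption).
    coding = orf[:-3] if orf[-3:] in STOP_CODONS else orf
    forward = FORWARD_TAIL + pick_annealing_region(coding[3:])  # starts at codon 2
    reverse = REVERSE_TAIL + pick_annealing_region(reverse_complement(coding))
    return forward, reverse


if __name__ == "__main__":
    toy_orf = "ATGAAAACCGATTTAGAGAAAGCGTTACTGGAAAATATTAAACGCGGCTAA"
    fwd, rev = flexi_primers(toy_orf)
    print(fwd, len(fwd))
    print(rev, len(rev))
```

For the toy ORF, the forward primer is 28 nt and the reverse primer 33 nt. Reading the reverse primer back as a reverse complement gives ...CGCGGCTAGGTTTAAACACAC, the same 3′ arrangement (stop codon, PmeI site, clamp) as the sequence example under "Screening for inserts".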
PCR amplification
For the Flexi Vector system, PCRs were assembled in 50 μL reactions from 1 μL of cDNA plasmid containing the target gene, 10 pmol each of forward and reverse primers, 2.5 U of Yield Ace polymerase, 5 μL of 10× Yield Ace buffer, and 10 nmol of each dNTP, with the balance water. The cycling parameters were 95 °C for 5 min; 5 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 1 min 15 s; 25 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min 15 s plus a 10 s increase per cycle; then a 30 min extension at 72 °C and a 4 °C soak. Gateway PCR amplification protocols were as previously described [6].
Flexi Vector protocol for ORF capture
Each restriction digestion reaction contained 6 μL of the PCR product (no purification was needed) and 29 μL of a restriction master mix composed of 0.7 μL of 10× Flexi Vector enzyme blend, 7 μL of 5× Flexi Vector buffer, and 21.3 μL water. The acceptor vector pVP33K was digested in bulk in a separate reaction containing 13.2 μL (4 μg) of vector, 4.4 μL of 10× Flexi Vector enzyme blend, 88 μL of 5× FlexiDigest buffer, and 335 μL water. This provided enough digested vector to test the effect of purification by either Magnesil or SV96 on 96 PCR digest ligations. The PCR product digests were incubated at 37 °C for 100 min and held at 4 °C until the clean-up protocol was started. The pVP33K vector digest was incubated at 37 °C for 100 min and then heat-inactivated at 65 °C for 20 min. The restriction digests were purified using either the Wizard SV96 PCR Clean-up system or the Wizard Magnesil PCR Clean-up system. A 20 μL aliquot of the restriction digest was treated using the SV96 system according to the manufacturer's protocol, except that the elution volume was reduced to 75 μL to increase the concentration of the eluted DNA. For the Magnesil clean-up, 10 μL of the restriction digest was added to a 96-well PCR plate containing 10 μL of Magnesil yellow. A Magnebot II plate was used to separate the Magnesil particles from the liquid. After removal of the supernatant, the Magnesil particles were washed once with 20 μL of the wash solution supplied by the manufacturer and twice with 30 μL of 80% ethanol. After the final wash, the plate was incubated for a few minutes on a 42 °C block to evaporate residual ethanol. The DNA was then eluted in 10 μL water.
Ligation reactions were performed in a 96-well PCR plate using the restriction-digested and purified DNA products. Individual reactions contained 5 μL of PCR product, 2 μL of pVP33K digest, and 3.5 μL of a ligation master mix consisting of 1.0 μL of 10× ligase buffer, 0.5 μL of T4 DNA ligase (10 U), and 2.0 μL water. The ligation reactions were incubated at 25 °C for 3.5 h, and 1 μL was then used to transform 15 μL of Select96 competent cells. The transformed cells were plated on LB agar plates supplemented with 4 g/L glucose and containing 50 μg/mL kanamycin.
Transfer of genes between Flexi Vectors
Gene coding regions were moved between Flexi Vector expression vectors by restriction digestion and ligation, using differing antibiotic resistance for positive selection and the lethal barnase gene for selection against the parental vector. Individual reactions contained 1 μL of donor vector (~80–100 ng) and 4 μL of digestion master mix containing 1 μL of 5× FlexiDigest buffer, 0.05 μL Flexi enzyme blend, and ~30 ng of acceptor vector, with the balance water. The digests therefore contained an insert to vector ratio of ~3. Reactions were incubated at 37 °C for 100 min and heat-inactivated at 65 °C for 20 min. For the pF6K to pEU-His-Flexi transfer, the 37 °C incubation was decreased to 40 min. To perform the ligation, 5.5 μL of a ligation master mix consisting of 1.0 μL of 10× ligase buffer, 0.5 μL of T4 DNA ligase (10 U), and 4.0 μL water was added directly to the 5 μL restriction digests. The ligation reactions were then incubated at 25 °C for 1–2 h before transforming JM109 or Select 96 competent cells with the ligation product.
Gateway cloning
The attB-flanked PCR products were transferred to pDONR221 as in [6], except that the BP reactions were scaled down to 5 μL total volume. Each BP reaction contained 1 μL of PCR product and 4 μL of a master mix consisting of 0.5 μL of pDONR221 (~75 ng), 1 μL of 5× BP Clonase buffer, 1 μL of BP Clonase, and 1.5 μL of TE buffer, pH 8.0. The transformation, isolation of plasmid DNA, screening for inserts, and transfer to the destination vector were as described previously, except that the LR reactions were decreased to 5 μL total volume with all other components scaled accordingly.
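Because several reagents in these protocols are pre-mixed and dispensed per well, it is convenient to scale the per-reaction recipes to a full plate. The sketch below uses the per-reaction volumes given for the Flexi Vector ORF-capture restriction master mix and the transfer ligation master mix; the 10% overage and the helper names are assumptions for illustration, not values from the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    ul_per_reaction: float  # microlitres per reaction


# Per-reaction recipes as given in the Methods.
ORF_CAPTURE_DIGEST_MIX = [  # 29 uL, added to 6 uL of unpurified PCR product
    Component("10x Flexi Vector enzyme blend", 0.7),
    Component("5x Flexi Vector buffer", 7.0),
    Component("water", 21.3),
]
TRANSFER_LIGATION_MIX = [  # 5.5 uL, added directly to each 5 uL digest
    Component("10x ligase buffer", 1.0),
    Component("T4 DNA ligase (10 U)", 0.5),
    Component("water", 4.0),
]


def per_reaction_volume(recipe: list[Component]) -> float:
    return sum(c.ul_per_reaction for c in recipe)


def scale_master_mix(recipe: list[Component], n_reactions: int,
                     overage: float = 0.10) -> dict[str, float]:
    """Volume (uL) of each component for n_reactions plus a fractional overage.

    The 10% default overage is an assumption, not a value from the paper.
    """
    if n_reactions < 1 or overage < 0:
        raise ValueError("need n_reactions >= 1 and overage >= 0")
    factor = n_reactions * (1.0 + overage)
    return {c.name: round(c.ul_per_reaction * factor, 2) for c in recipe}


if __name__ == "__main__":
    for label, recipe in [("ORF-capture digest", ORF_CAPTURE_DIGEST_MIX),
                          ("transfer ligation", TRANSFER_LIGATION_MIX)]:
        print(f"{label}: {per_reaction_volume(recipe):.1f} uL per reaction")
        for name, volume in scale_master_mix(recipe, 96).items():
            print(f"  {name}: {volume} uL")
```

With the assumed 10% overage, a 96-well plate of ORF-capture digests needs 29 × 105.6 ≈ 3062 μL of restriction master mix.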
Screening of transformants
Two different single colonies were picked using micropipette tips and used to inoculate 1 mL of CircleGrow medium supplemented with 50 μg/mL kanamycin (Flexi Vector pVP33K and pF6K expression clones and Gateway pDONR221 entry clones) or 100 μg/mL ampicillin (pVP33A, pF6A, pVP16, and pEU-His-GW expression clones) in 96-well growth blocks. The growth blocks were covered with Aeraseal gas-permeable tape and incubated overnight at 37 °C with moderate shaking, and the cells were collected by centrifugation. Plasmid DNA was purified using a BioRobot 800 and the QIAprep 96 Turbo BioRobot kit. A PCR amplification step was used to screen for target genes in Flexi Vector expression clones by reusing the same gene-specific primers used for the first amplification from cDNA clones. A similar PCR amplification step was used to check for the presence of the target gene in the Gateway system [6], except that universal primers that annealed to the vector were used. In both cases, the cycling parameters for screening were 95 °C for 5 min; 20 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 75 s; then 72 °C for 10 min and a 4 °C soak.
Results
Vectors
Fig. 1 shows expression vectors created for protein production at CESG. The Gateway plasmid pVP16 provides a fusion of His8-MBP with the target protein that can be separated by tobacco etch virus (TEV) protease [16,17]. The lethal ccdB gene in the pVP16 Gateway cassette provides negative selection against vector background during cloning, while the chloramphenicol acetyltransferase gene (CAT) allows positive selection during propagation in ccdB-resistant strains. The Flexi Vector plasmids pVP33K and pVP33A are identical to each other with two exceptions: the antibiotic resistance gene (ampicillin for pVP33A, kanamycin for pVP33K) and a 128 bp sequence in pVP33A corresponding to the multiple cloning site of several Promega vectors.
Proteins expressed from pVP33K and pVP33A also yield a His8-MBP fusion but differ from the fusion protein expressed from pVP16 by containing a modified linker between the C-terminus of MBP and the N-terminus of the target. This linker contains a tetraCys motif for fluorescent labeling [18,19] flanked by human rhinovirus 14 3C and TEV protease cleavage sites to facilitate studies of proteolysis by fluorescence polarization [18,19]. Unique restriction sites are present in pVP33A and pVP33K to allow switching of the antibiotic resistance and solubility domains. The Bar-CAT cassette in pVP33 provides a CAT gene to select for the presence of the cassette during construction and propagation of the vector, and the lethal barnase gene to select against the parental plasmid during cloning.
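The requirement that the NsiI, PacI, AvrII, BsiWI, SgfI, and PmeI sites be unique can be checked computationally. The sketch below is a hypothetical helper (not from the paper) that counts recognition sites in a plasmid sequence, treating it as circular so that a site spanning the sequence origin is still found. The recognition sequences are the standard ones for these enzymes; all six are palindromic, but the reverse strand is also searched so that the helper works for non-palindromic sites. Run in linear mode on an ORF, the same function would flag internal SgfI or PmeI sites in a target gene.

```python
"""Hypothetical helper: count restriction sites in a plasmid or ORF sequence."""

RECOGNITION_SITES = {
    "SgfI": "GCGATCGC",
    "PmeI": "GTTTAAAC",
    "NsiI": "ATGCAT",
    "PacI": "TTAATTAA",
    "AvrII": "CCTAGG",
    "BsiWI": "CGTACG",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str, circular: bool = True) -> list[int]:
    """0-based top-strand start positions of site (on either strand) in seq."""
    seq, site = seq.upper(), site.upper()
    # For a circular plasmid, append the first len(site) - 1 bases so that a
    # site spanning the end/start junction is detected exactly once.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = set()
    for motif in {site, reverse_complement(site)}:
        start = search.find(motif)
        while start != -1 and start < len(seq):
            hits.add(start)
            start = search.find(motif, start + 1)
    return sorted(hits)


def site_counts(seq: str, circular: bool = True) -> dict[str, int]:
    return {name: len(find_sites(seq, site, circular))
            for name, site in RECOGNITION_SITES.items()}


if __name__ == "__main__":
    toy_plasmid = "AAAAGCGATCGCTTTTCCCCGTTTAAACGGGG"
    counts = site_counts(toy_plasmid)
    print(counts)
    print("unique:", [name for name, n in counts.items() if n == 1])
```

A real vector check would run `site_counts` on the full plasmid sequence and confirm that each enzyme used for cassette swapping returns exactly one site.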
Fig. 2. Comparison of workflow for Gateway and Flexi Vector cloning procedures. The timeline listed on the right indicates the schedule achieved during the comparison of the two protocols. The three days for sequence verification include the turnaround time for sequencing (typically two days).
PCR amplification
Fig. 3A shows the Flexi Vector PCR amplification results as analyzed using 2% E-Gel 96 gels. Table 1 shows that single-pass PCR amplification gave a comparable frequency of success (92 PCR+ for Flexi Vector and 91 PCR+ for Gateway). The single clone that did not amplify initially in the Gateway system was successfully amplified in a rework effort using additional PCR cycles. The four remaining clones that did not amplify in either system were subsequently shown to be different cDNA clones from those originally requested from the supplier.
Gene capture protocols
We first evaluated each step of the manufacturer's original Flexi Vector protocol using small groups of target genes. From this, a revised protocol was assembled that required fewer steps, used smaller volumes, allowed pre-mixing of key reagents, and substituted cheaper reagents. This protocol is shown in Fig. 2 and is compared with our previously established Gateway protocol [6]. These two protocols were then applied in side-by-side cloning of 96 human targets whose properties are described more fully in the Supplementary data. Preliminary work on a small set of clones established that the initial PCR clean-up step in the Flexi Vector system could be eliminated without decreasing the insert frequency. However, clean-up of the restriction enzyme digest prior to ligation was found to be essential, and the SV96 and Magnesil clean-up methods were tested. After ligation of the purified PCR product digests and parental vector, the ligation mixture was used to transform Select 96 E. coli.
The transformed cells were incubated overnight on agar plates, and two colonies were screened for inserts. Fig. 3B shows a typical screening PCR, in this case from PCR product digests cleaned up by the Magnesil method. These results can be compared with the initial PCR amplification (Fig. 3A). All 92 PCR+ targets were recovered in the expression vector pVP33K (Fig. 1) from the Flexi Vector system after SV96 clean-up, while 90 were recovered after Magnesil clean-up. The overall success rate for obtaining clones was virtually identical (179 for SV96 vs. 178 for Magnesil out of 184 colonies picked); the two targets missed with the Magnesil clean-up method were missed in both picks, suggesting a liquid-handling failure. Given the similar behavior of these two clean-up methods, we chose to incorporate the Magnesil method into our standard Flexi Vector cloning protocol because it was cheaper and the DNA elution volume could be scaled down to increase the concentration of digested PCR product for ligation. In the Gateway protocol (Fig. 2), the initial PCR product was used in a BP reaction, and entry clones were obtained for 90 of 91 PCR+ targets after screening two colonies.
Table 1 Comparison of single pass cloning results for Flexi Vector and Gateway
Step | Gateway | Flexi Vector, SV clean-up | Flexi Vector, Magnesil clean-up
PCR (+) | 91 | 92 | 92
Entry vector colonies screened | 182 | NN (b) | NN
Entry vector colonies screening (+) (a) | 175 (96%) | NN | NN
Entry vector targets recovered | 90 | NN | NN
Destination/expression vector colonies screened | 180 | 184 | 184
Destination/expression vector colonies screening (+) | 174 (97%) | 179 (97%) | 178 (97%)
Destination/expression vector targets recovered | 90 | 92 | 90
(a) Plasmids were isolated from individual colonies and screened for the presence of insert by agarose gel electrophoresis of PCR screening reactions.
(b) NN, not necessary because these steps are not part of the protocol.
[Fig. 3: agarose gel images; panels A and B each show 96-well plate rows A–H, lanes 1–12.]
Fig. 3. Agarose gel analysis of initial PCR amplification and clone screening. Panel A shows PCR amplification products prior to Flexi Vector cloning. Panel B is an example screening of Flexi Vector expression clones for target genes ligated into pVP33K after Magnesil clean-up of restriction digests. Wells B1, D1, E2, and G8 failed to produce an initial PCR product, while wells G1, H7, and H8 were successfully amplified but did not contain target genes in this screen.
Sequencing results
Table 2 shows an analysis of the cloned gene sequences. Clones acceptable for protein production were scored as sequence (+) (complete agreement with a published nucleotide sequence), silent (nucleotide change but identical amino acid sequence), or missense (nucleotide changes leading to one or two amino acid substitutions). In the two-step Gateway protocol, originally developed for capture of low-abundance mRNAs [6], 50 PCR cycles were used to add the recombination and TEV cleavage sites (52 and 33 nucleotides from the 5′ and 3′ primers, respectively) to the gene-specific sequences of the primers. In contrast, the restriction sites required for the Flexi Vector protocol could be added in a single PCR (13 and 15 additional nucleotides from the 5′ and 3′ primers), and 30 PCR cycles were sufficient. Table 2 shows that the additional PCR cycles for the Gateway system resulted in more missense and silent mutations than for the Flexi Vector system. Both systems yielded similar frequencies of sequence (−) clones, a category that included any DNA change that would render a protein unsuitable for expression. The majority of sequence (−) clones (seven out of nine) had deletions arising from errors in primer synthesis that led to frameshifts. One sequence (−) clone had a nucleotide substitution that introduced a premature stop codon into the gene, and the remaining sequence (−) clone had an incomplete PmeI site.
Table 2 Comparison of single pass sequencing results from Flexi Vector and Gateway
Sequencing result (a) | Gateway entry clone | Flexi Vector expression clone in pVP33K
Sequence (+) | 71 | 78
Missense | 11 | 6
Silent | 5 | 3
Sequence (−) | 4 | 5
No clone [PCR (−)] | 5 | 4
Total | 96 | 96
(a) Sequence (+) clones matched published sequences for the target gene exactly. Missense and silent mutations differed in nucleotide sequence from the published sequences only by substitution of nucleotides. Sequence (−) clones contained fatal errors, including primer deletions causing a frameshift, nucleotide substitutions resulting in a premature stop codon, or loss of the PmeI site due to cloning artifacts.
Transfer of sequence-verified genes to other plasmids
To produce an expression clone in the Gateway system, sequence-verified entry clones are subjected to a second recombination reaction (LR reaction) that transfers the gene into a destination vector. By screening two colonies of LR reaction products, all of the target genes were recovered as expression clones, with 97% of the total number of colonies screened containing the target gene (Table 1). Although transfer of the target gene is not required to produce an expression clone in the Flexi Vector system, it is often advantageous to have the flexibility to transfer the target gene into other expression contexts. In the Flexi Vector system, positive selection for transfer occurs by alternating the antibiotic resistance of the donor and acceptor plasmids.
We studied the efficiency of transfers from our initial His8-MBP expression vector, pVP33K (kanamycin selection), to pVP33A (ampicillin selection), and from these two vectors to the commercially available vectors pF6A and pF6K, respectively. A previous study on palindromic DNA found that plasmids containing long, continuous palindromes could not be replicated in E. coli [14]. Thus, intentional inclusion of an identical region adjacent to either the SgfI or the PmeI site in the Flexi Vector backbone should produce a long palindrome and prevent replication of faulty ligation products that do not contain the target genes. As part of this work, we studied transfers between vectors containing different lengths of sequence identity to assess more fully the effect of the region of identity on transfer success. Fig. 4 summarizes the results obtained for transfers between several different Flexi Vector plasmids. Among these, pVP33K and pVP33A have ~1400 bp of identity 5′ to the SgfI site. For the transfer from pVP33K to pVP33A, 184 colonies were screened for inserts and 181 contained the correct gene, corresponding to 98.4% success in transfer. One target gene accounted for two of the failures, so a pipetting error may have been responsible. When transfer of this clone was repeated, both colonies picked for analysis screened positive for the correct insert. All clones obtained by transfer to pVP33A were sequenced to determine the fidelity of the transfer and, importantly, the integrity of the PmeI site. Of the 184 sequences analyzed, all but one contained functional PmeI sites. Since no PCR steps were involved in the transfers, mutations in the coding regions were not expected and, indeed, no changes in amino acid coding were found.
The transfers from pVP33A to pF6K and from pVP33K to pF6A provide an opportunity to investigate the consequences of a relatively short 3′ palindrome for cloning efficiency: pVP33A and pF6K have 128 bp of identity 3′ to the PmeI site, while pVP33K and pF6A have no sequence identity 3′ to the PmeI site. For the transfer from pVP33A to pF6K, a transfer efficiency of 95% was observed. In contrast, transfer from pVP33K to pF6A gave a lower transfer efficiency of 82%. Sequencing confirmed that most of the pVP33K to pF6A cloning failures could be attributed to growth of colonies that contained the pVP33K and pF6A backbones ligated at the PmeI and SgfI sites. The transfer of target genes from pF6K to pEU-His-Flexi provides additional support that 128 bp of identity is sufficient to prevent growth of double backbone ligation products. In this case, 97% of the clones screened contained the correct target gene in a vector suitable for production of protein in a cell-free wheat germ extract [15].
Fig. 4. Analysis of success rates for gene capture and transfer steps in Flexi Vector and Gateway cloning. For gene capture, the efficiencies represent the successful insert rate obtained by picking a single clone for each target gene, excluding unacceptable sequence problems. Transfer efficiencies represent the single-clone success rate of target gene transfer. Sequence identity within the Flexi Vectors was either 5′ to the SgfI site (pVP33K to pVP33A) or 3′ to the PmeI site (pVP33A to pF6K and pF6K to pEU-His-Flexi).
Discussion
High-throughput cloning
High-throughput cloning requires high-fidelity PCR amplification from different DNA preparations (e.g., tissue-derived cDNA libraries, plasmids containing cDNA fragments, or genomic DNA). After amplification, methods to capture the PCR product efficiently are required. Sequence verification is also essential to eliminate uncertainties arising from eukaryotic gene models and PCR artifacts. Once the expense of sequence verification has been incurred, the ability to perform high-fidelity serial transfer of the gene into other expression contexts is desirable from both financial and experimental viewpoints. A recent review article lists these characteristics [5]. In a high-throughput environment, the outcome of a single-pass execution of a protocol is a highly relevant performance metric. Consequently, results from this type of evaluation are presented here, along with indications of whether rework efforts can efficiently recover failures. In this side-by-side comparison, we found that the restriction-based Flexi Vector system performed as well as or better than the Gateway system. Thus, Table 1 supports the conclusion that Flexi Vector can be executed as a high-throughput cloning platform. When targets fail along a production pipeline, significant time and effort can be needed to rework them separately.
In this regard, we note that by repeating the single main protocol step where a target failed, clones suitable for protein expression testing were obtained from both cloning systems for all 92 targets in the initial expression vector set (excluding the four targets verified to be incorrectly shipped cDNAs). For the Flexi Vector system, the only rework was sequencing of additional clones obtained from the initial capture into an expression vector, while for the Gateway system, rework was required at both the entry and destination cloning steps.
Gateway system
The Gateway system is well established for high-throughput cloning. It provides the ability to use a single entry clone as a source of DNA to subclone into a variety of different destination vectors. Since this system has been in development for some time, many destination vectors are available. At CESG, Gateway was used to create ~3500 His6 or His8-MBP fusion proteins that leave only one Ser residue on the N-terminus of a target protein after TEV protease treatment. On average, this Gateway protocol took 12 days from the initial PCR to the availability of sequence-verified expression clones and provided a greater than 95% probability of obtaining satisfactory expression clones. In addition to labor, the other major costs of the Gateway protocol are primer synthesis, sequencing, and the recombination reagents.
Several properties of the CESG Gateway cloning protocol prompted this investigation. First, the Gateway system requires an entry vector and subsequent cloning into a destination vector before expression testing, which increases the time required and the associated reagent costs. Second, the attB1 recombination site adds nine residues to the N-terminus of the target protein (Thr-Ser-Leu-Tyr-Lys-Lys-Ala-Gly-Ser), which required that a protease cleavage site also be included in the primers used to clone a target gene [6,8]. Consequently, CESG needed a two-step PCR with two sets of primers to create these extra sites. The two-step process introduces the possibility of sequence errors from primer synthesis and from PCR. Finally, others have raised concerns that the recombination site sequences may interfere with transcription or translation [8].
Flexi Vector system
The Flexi Vector system allowed high-frequency cloning directly into an expression vector (Table 1). The protocol developed at CESG took eight days from the first PCR step to the availability of sequence-verified clones suitable for expression testing. The pVP33 plasmids we developed for Flexi Vector cloning have the TEV protease site incorporated into the vector backbone.
This design allowed the use of a single set of primers containing only the SgfI and PmeI sites along with the gene-speciWc nucleotides. Since a single step PCR can be used with these primers, fewer ampliWcation cycles are needed. Consequently, the frequency of missense clones was lower for Flexi Vector than for Gateway (Table 2). When using plasmid-derived cDNA templates as in this study, it is likely that the number of PCR cycles could be reduced for both the Gateway (50 cycles) and Flexi Vector (30 cycles) protocols. This would further reduce the frequency of mutations caused by PCR. Reduction of the number of PCR cycles would not likely be possible for cloning directly from tissue-derived cDNA pools [6]. Our Flexi Vector protocol for initial gene capture is somewhat more complicated than the Gateway BP reaction, but the initial vector obtained from the Flexi Vector method can be directly used for gene expression. By considering that an additional transfer step, transformant growth, and plasmid screening are required to obtain a veriWed expression clone, the overall number of steps, time, and
expense to obtain an expression clone is higher for the Gateway system.

Suitability of restriction cloning

The presence of a required restriction site within a target gene is an inherent problem of restriction-based cloning. No prescreening was done in this work to eliminate targets containing SgfI or PmeI sites; if such sites were present and interest in the target was sufficiently high, an alternative cloning method would be needed. Nevertheless, genome-scale restriction mapping shows that the SgfI and PmeI combination would allow cloning of 98.9% of all human genes, 98.9% of mouse, 98.8% of rat, 98.5% of Caenorhabditis elegans, 97.8% of zebrafish, 97.6% of Arabidopsis, and 97.0% of yeast genes, suggesting broad overall utility (M. Slater, E. Strauss, unpublished data).

Design of 3′ sequence in Flexi Vector plasmids

Initial analysis of the Flexi Vector transfers showed that recovered clones lacking the target gene arose almost exclusively from ligation of two vector backbones to each other through the SgfI and PmeI sites. These products could be identified by growth of transformants on agar plates containing both ampicillin and kanamycin: more than 90% of the clones that lacked a target gene grew on the double-antibiotic plates. This problem can be eliminated by including a region of sequence identity adjacent to either the 3′ or 5′ end of the cloning cassette, so that ligation of the backbones through their compatible ends forms long palindromes [14]. Hisaki and coworkers showed that a palindrome of 95 bp allowed replication, whereas palindromes longer than 184 bp prevented it. In our tests, palindromes of ~1400 and 128 bp effectively eliminated the double-backbone ligation products. In contrast, the absence of a region of 3′ identity increased the frequency of double-backbone ligation products to unsatisfactory levels.
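How shared flanking sequence turns a double-backbone product into a long palindrome can be illustrated with a short Python sketch. It assumes each backbone end is written 5′→3′ starting at the blunt PmeI cut (beginning with the AAAC half-site) and reading into the vector, and that two such ends join head to head. On one strand the junction then reads revcomp(end_a) followed by end_b, so the perfect palindrome centred on the junction extends exactly as far as the two ends remain identical, adding 2 bp per shared base. The example ends are hypothetical, and the sketch ignores the SgfI end and imperfect inverted repeats.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_palindrome_length(end_a: str, end_b: str) -> int:
    """Length (bp) of the perfect palindrome centred on a head-to-head junction.

    end_a, end_b: top-strand sequence of each backbone, written 5'->3' starting
    at the blunt PmeI end (AAAC half-site) and reading away from the cut.
    """
    # On one strand, the ligated junction reads revcomp(end_a) followed by end_b.
    joined = reverse_complement(end_a) + end_b
    centre = len(end_a)
    arm = 0
    # Grow outward while the base left of centre pairs with the base right of it.
    while (
        centre - arm - 1 >= 0
        and centre + arm < len(joined)
        and joined[centre - arm - 1] == joined[centre + arm].translate(COMPLEMENT)
    ):
        arm += 1
    return 2 * arm


# Hypothetical backbone ends (not real vector sequences).
no_identity = junction_palindrome_length("AAACGGGGCATT", "AAACTTTTGACC")
shared = "AAACTTGGCCAATG"  # 14 nt identical in both backbones
with_identity = junction_palindrome_length(shared + "GGGG", shared + "CCCC")
print(no_identity)    # 8: only the regenerated PmeI site GTTTAAAC
print(with_identity)  # 28: two 14-nt arms
```

Palindrome lengths from such a calculation can be compared with the 95-bp and 184-bp values cited above.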
These results indicate that a region of either 3′ or 5′ sequence identity should be an essential design element of vectors being customized for the Flexi Vector system.

Screening for inserts

No problems were identified with the fidelity of the 5′ SgfI site during our sequencing efforts. However, heterogeneity near the PmeI site was identified in 3 of 106 sequences analyzed after the initial gene capture step. Because PmeI leaves a blunt end in the vector, a PCR product that was cut by SgfI but not by PmeI can still be cloned through its blunt 3′ end. Such events can be recognized from changes in the nucleotide sequence of the 3′ region. One example is the sequence …TAGgtttaaacACACAAAC…, in which TAG is the stop codon from the PCR product, gtttaaac is the primer-encoded PmeI site, and the remaining nucleotides comprise extra bases retained because the PCR product was not cut by PmeI, followed by vector sequence. Importantly, this addition does not alter the protein expressed from pVP33 or the ability to transfer the gene to a different expression vector. In one of the 106 clones analyzed after gene capture, a fragment that lacked a functional PmeI site was cloned; thus, on occasion (<1% of clones), the target may not be readily transferable to another expression vector. In addition, when PCR products are used without removing the polymerases, as in the protocols reported here, it is important to avoid a heat inactivation step, because residual endonuclease activity of the thermostable polymerases can trim the PCR products after restriction and increase the frequency of blunt-end cloning with a non-functional PmeI site.

In principle, restriction-based screening could identify these rare non-functional PmeI sites. However, our experience with Flexi Vector indicates that PCR with gene-specific primers is a more effective single-step screen for inserts than restriction mapping to verify the restriction sites, because of the occasional presence of E. coli genomic DNA or barnase mutants cloned into the SgfI and PmeI sites. Although PCR screening does not change the frequency of cloning these artifacts (rare overall, <3% of clones), it does prevent a spurious insert of similar size to the target gene from being misidentified and carried forward.

Expressed protein

One consequence of the Flexi Vector cloning approach presented here is that three non-native residues (Ala-Ile-Ala) remain on the target protein after its release from the fusion by TEV protease. A comparison of the potential effects of these residues on total expression, solubility, and proteolysis is in progress, using control sets assembled from previously characterized CESG clones.
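The origin of the Ala-Ile-Ala remnant can be traced with a short Python sketch. It assumes, hypothetically, that the vector-encoded TEV site is immediately followed by the SgfI site read as GCG ATC GCC, which the standard genetic code translates as Ala-Ile-Ala, and it models TEV cleavage as a plain string cut after the ENLYFQ recognition sequence, ignoring cleavage efficiency. For comparison it also builds a layout in which the TEV site is followed directly by a single Ser codon, matching the single Ser left by pVP16. All DNA sequences are illustrative placeholders, not the pVP33 or pVP16 sequences.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

TEV_SITE = "ENLYFQ"  # TEV protease cuts after the Gln (P1), before the P1' residue


def translate_to_stop(dna: str) -> str:
    """Translate in-frame DNA up to the first stop codon (one-letter code)."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tev_release(fusion: str) -> str:
    """Return the fragment C-terminal to the first TEV site (string match only)."""
    idx = fusion.find(TEV_SITE)
    if idx < 0:
        raise ValueError("no TEV recognition sequence in fusion")
    return fusion[idx + len(TEV_SITE):]


def non_native_prefix(released: str, target: str) -> str:
    """Residues left in front of the native target sequence after cleavage."""
    if not released.endswith(target):
        raise ValueError("target sequence not found at the C-terminus")
    return released[: len(released) - len(target)]


# Hypothetical DNA pieces; codon choices are illustrative only.
TEV_CODONS = "GAAAACCTGTATTTTCAG"  # Glu-Asn-Leu-Tyr-Phe-Gln
SGFI_JUNCTION = "GCGATCGCC"         # SgfI site GCGATCGC + C -> Ala-Ile-Ala
TARGET_ORF = "ATGAGCAAAGGCTAA"      # hypothetical target: Met-Ser-Lys-Gly-stop

target_protein = translate_to_stop(TARGET_ORF)

# Flexi-style layout: TEV site in the vector, then the SgfI junction, then the target.
flexi = tev_release(translate_to_stop(TEV_CODONS + SGFI_JUNCTION + TARGET_ORF))
# Alternative layout: TEV site followed directly by one Ser codon (AGC).
ser_only = tev_release(translate_to_stop(TEV_CODONS + "AGC" + TARGET_ORF))

print(non_native_prefix(flexi, target_protein))     # AIA
print(non_native_prefix(ser_only, target_protein))  # S
```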
It would also be possible to design a Flexi Vector cloning approach that gives the same single N-terminal Ser as pVP16 currently does, by encoding the TEV protease site in the primers used to clone the gene. This would likely require a two-step PCR, but would yield the minimal possible modification to the N-terminus of a target after proteolytic release from the fusion protein.

Transfer of genes between vectors

The Flexi Vector system also offers the potential for serial transfer of sequence-verified target genes from one vector to another, using alternating antibiotic resistance to select for the appropriate clones. The number of available vectors is currently smaller than for Gateway, but the transfers between pVP33, pF6, and pEUHis-Flexi vectors presented here provide some of the first examples. Additional development efforts will expand these offerings. With the inclusion of 3′ sequence identity in the design strategy
for producing vector pairs, a transfer efficiency of greater than 97% can be obtained with no alterations in the coding sequence of the target gene.

Considering the advantages we found in using the restriction endonuclease-based Flexi Vector cloning system compared with the recombination-based Gateway system, it is reasonable to ask why restriction-based cloning methods have not gained wide acceptance as high-throughput methods for producing expression clones. A robust high-throughput implementation of restriction-based cloning requires that several criteria be met. The restriction recognition sites must be absent from the genes to be cloned, so the sites must be infrequent. Both restriction enzymes must be active and show negligible star activity under identical reaction conditions. Cloning must be directional and must place the target gene in the correct reading frame. Negative selection against the parental vector using a lethal gene, and positive antibiotic selection for transfer between expression vectors, are necessary to eliminate undesirable cloning outcomes. A high frequency of double-backbone ligation products during transfer cannot be tolerated. Protocols that have been streamlined to remove unnecessary liquid-handling steps are essential for both robotic and manual implementation. To our knowledge, no prior restriction-based cloning system has met these requirements.
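Two of these criteria, absence of the recognition sites from the target and preservation of the reading frame, can be checked in silico before primers are ordered. The Python sketch below is a minimal illustration with hypothetical function names; it assumes the ORF is supplied 5′→3′ including its stop codon, and it relies on the fact that the SgfI (GCGATCGC) and PmeI (GTTTAAAC) recognition sequences are both palindromic, so scanning one strand also covers the other.

```python
import re

# Recognition sequences; both are palindromic, so one strand suffices.
SGFI_SITE = "GCGATCGC"
PMEI_SITE = "GTTTAAAC"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def flexi_problems(orf: str) -> list[str]:
    """List reasons an ORF (stop codon included) may not suit SgfI/PmeI cloning."""
    orf = orf.upper()
    problems = []
    if len(orf) % 3 != 0:
        problems.append("length is not a multiple of 3")
    for name, site in (("SgfI", SGFI_SITE), ("PmeI", PMEI_SITE)):
        hits = site_positions(orf, site)
        if hits:
            problems.append(f"internal {name} site at {hits}")
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("in-frame stop codon before the end")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    return problems


print(flexi_problems("ATGGCTAAAGGCTAA"))  # []
print(flexi_problems("ATGGCGATCGCTTAA"))  # ['internal SgfI site at [3]']
```

Targets flagged by such a check correspond to the small fraction of genes (about 1 to 3% in the genome-scale mapping cited above) that would need an alternative cloning method.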
Acknowledgments
This work was supported by a sponsored research agreement from Promega Corporation to B.G.F. and by the National Institutes of Health, Protein Structure Initiative Grants P50 GM-64598 and U54 GM074901 (J.L. Markley, PI; G.N. Phillips, Jr., Co-Investigator; B.G.F., Co-Investigator). B.G.F. is a consultant to Promega.

Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.pep.2005.11.007.

References

[1] R. Service, Structural biology, Structural genomics, round 2, Science 307 (2005) 1554–1558.
[2] J. Badger, J.M. Sauder, J.M. Adams, S. Antonysamy, K. Bain, M.G. Bergseid, S.G. Buchanan, M.D. Buchanan, Y. Batiyenko, J.A. Christopher, S. Emtage, A. Eroshkina, I. Feil, E.B. Furlong, K.S. Gajiwala, X. Gao, D. He, J. Hendle, A. Huber, K. Hoda, P. Kearins, C. Kissinger, B. Laubert, H.A. Lewis, J. Lin, K. Loomis, D. Lorimer, G. Louie, M. Maletic, C.D. Marsh, I. Miller, J. Molinari, H.J. Muller-Dieckmann, J.M. Newman, B.W. Noland, B. Pagarigan, F. Park, T.S. Peat, K.W. Post, S. Radojicic, A. Ramos, R. Romero, M.E. Rutter, W.E. Sanderson, K.D. Schwinn, J. Tresser, J. Winhoven, T.A. Wright, L. Wu, J. Xu, T.J. Harris, Structural analysis of a set of proteins resulting from a bacterial genomics project, Proteins 60 (2005) 787–796.
[3] T.B. Acton, K.C. Gunsalus, R. Xiao, L.C. Ma, J. Aramini, M.C. Baran, Y.W. Chiang, T. Climent, B. Cooper, N.G. Denissova, S.M. Douglas, J.K. Everett, C.K. Ho, D. Macapagal, P.K. Rajan, R. Shastry, L.Y. Shih, G.V. Swapna, M. Wilson, M. Wu, M. Gerstein, M. Inouye, J.F. Hunt, G.T. Montelione, Robotic cloning and protein production platform of the northeast structural genomics consortium, Methods Enzymol. 394 (2005) 210–243.
[4] Y. Kim, I. Dementieva, M. Zhou, R. Wu, L. Lezondra, P. Quartey, G. Joachimiak, O. Korolev, H. Li, A. Joachimiak, Automation of protein purification for structural genomics, J. Struct. Funct. Genomics 5 (2004) 111–118.
[5] G. Marsischky, J. LaBaer, Many paths to many clones: a comparative look at high-throughput cloning methods, Genome Res. 14 (2004) 2020–2028.
[6] S. Thao, Q. Zhao, T. Kimball, E. Steffen, P.G. Blommel, M. Riters, C.S. Newman, B.G. Fox, R.L. Wrobel, Results from high-throughput DNA cloning of Arabidopsis thaliana target genes using site-specific recombination, J. Struct. Funct. Genomics 5 (2004) 267–276.
[7] S. Yokoyama, Protein expression systems for structural genomics and proteomics, Curr. Opin. Chem. Biol. 7 (2003) 39–43.
[8] J.M. Betton, High-throughput cloning and expression strategies for protein production, Biochimie 86 (2004) 601–605.
[9] M.A. Brasch, J.L. Hartley, M. Vidal, Orfeome cloning and systems biology: Standardized mass production of the parts from the parts-list, Genome Res. 14 (2004) 2001–2009.
[10] L. Dieckman, M. Gu, L. Stols, M.I. Donnelly, F.R. Collart, High-throughput methods for gene cloning and expression, Protein Expr. Purif. 25 (2002) 1–7.
[11] L. Stols, M. Gu, L. Dieckman, R. Raffen, F.R. Collart, M.I. Donnelly, A new vector for high-throughput, ligation-independent cloning encoding a tobacco etch virus protease cleavage site, Protein Expr. Purif. 25 (2002) 8–15.
[12] J. Sambrook, D.W. Russell, Molecular Cloning: A Laboratory Manual, vol. 3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2001, pp. 15.44–15.48.
[13] S.N. Ho, H.D. Hunt, R.M. Horton, J.K. Pullen, L.R. Pease, Site-directed mutagenesis by overlap extension using the polymerase chain reaction, Gene 77 (1989) 51–59.
[14] H. Yoshimura, T. Yoshino, T. Hirose, Y. Nakamure, M. Higashi, T. Hase, K. Yamaguchi, H. Hirokawa, Y. Masamune, Biological characteristics of palindromic DNA (II), J. Gen. Appl. Microbiol. 32 (1986) 393–404.
[15] R.C. Tyler, D.J. Aceti, C.A. Bingman, C.C. Cornilescu, B.G. Fox, R.O. Frederick, W.B. Jeon, M.S. Lee, C.S. Newman, F.C. Peterson, G.N. Phillips Jr., M.N. Shahan, S. Singh, J. Song, H.K. Sreenath, E.M. Tyler, E.L. Ulrich, D.A. Vinarov, F.C. Vojtik, B.F. Volkman, R.L. Wrobel, Q. Zhao, J.L. Markley, Comparison of cell-based and cell-free protocols for producing target proteins from the Arabidopsis thaliana genome for structural studies, Proteins 59 (2005) 633–643.
[16] R.B. Kapust, J. Tozser, J.D. Fox, D.E. Anderson, S. Cherry, T.D. Copeland, D.S. Waugh, Tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency, Protein Eng. 14 (2001) 993–1000.
[17] R.B. Kapust, J. Tozser, T.D. Copeland, D.S. Waugh, The P1′ specificity of tobacco etch virus protease, Biochem. Biophys. Res. Commun. 294 (2002) 949–955.
[18] P.G. Blommel, B.G. Fox, Fluorescence anisotropy assay for proteolysis of specifically labeled fusion proteins, Anal. Biochem. 336 (2005) 75–86.
[19] S.R. Adams, R.E. Campbell, L.A. Gross, B.R. Martin, G.K. Walkup, Y. Yao, J. Llopis, R.Y. Tsien, New biarsenical ligands and tetracysteine motifs for protein labeling in vitro and in vivo: synthesis and biological applications, J. Am. Chem. Soc. 124 (2002) 6063–6076.