Production of protein complexes via co-expression


Protein Expression and Purification 75 (2011) 1–14

Contents lists available at ScienceDirect

Protein Expression and Purification journal homepage: www.elsevier.com/locate/yprep

Review

Production of protein complexes via co-expression
John J. Kerrigan (a), Qing Xie (b), Robert S. Ames (a), Quinn Lu (a, *)

(a) Biological Reagents & Assay Development, Platform Technology & Science, GlaxoSmithKline R&D, 1250 South Collegeville Road, Collegeville, PA 19426, USA
(b) Computational Biology, GlaxoSmithKline R&D, 709 Swedeland Road, King of Prussia, PA 19406, USA

Article info

Article history: Available online 6 August 2010
Keywords: Multi-protein complex; Protein–protein interaction

Abstract

Multi-protein complexes are involved in essentially all cellular processes. A protein’s function is defined by a combination of its own properties, its interacting partners, and the stoichiometry of each. Depending on its binding partners, a transcription factor can function as an activator in one instance and a repressor in another. The study of protein function or malfunction is best performed in the relevant context. While many protein complexes can be reconstituted from individually produced component proteins, many others require co-expression with their native partners in the host cells for proper folding, stability, and activity. Protein co-expression has led to the production of a variety of biologically active complexes in sufficient quantities for biochemical, biophysical, and structural studies and for high-throughput screens. This article summarizes examples of such cases and discusses critical considerations in selecting co-expression partners and strategies to achieve successful production of protein complexes. © 2010 Elsevier Inc. All rights reserved.

Contents
Introduction
Co-expression case studies
   Co-expression of nuclear receptors
   Expression of glycoprotein hormones
   Expression of soluble T-cell receptors
   Eukaryotic initiation factor 2B
Co-expression strategies and considerations
   Single vector strategies
   Multiple vector strategies
   Full length versus truncations/deletions/mutations
   Tagging options
   Stable versus transient expression
   Homologous expression versus heterologous expression
   In vitro expression versus in cell expression
   Expression in the presence of a compound ligand
Concluding remarks
Acknowledgments
References

* Corresponding author. Fax: +1 610 917 7385. E-mail address: [email protected] (Q. Lu).
1046-5928/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.pep.2010.07.015

Introduction

With the completion of genome sequencing projects, functional proteomics has come of age. After a decade of systematic genome-wide studies of DNA, RNA, proteins and their interplay, a reasonable and realistic picture has emerged: they work together in cells! While the number of genes and their encoded proteins within a particular genome may be enumerable, their functions are far from clear. This is because a protein’s function or activity is defined by multiple factors, including its primary sequence, secondary structure, post-translational modifications, and associated partners. Depending on its binding partners, a transcription factor can function as an activator in one instance and a repressor in another [1–4]. A protein’s function is defined by a combination of its inherent properties, cellular localization, interacting partners, and the stoichiometry of each in a specific cell type. Multi-protein complexes are involved in essentially all cellular processes. In some cases, targeting protein–protein interactions appears to be a viable approach [5] for intervening in a protein’s function or in a malfunction involved in a disease process. For the early stages of drug discovery, which require high throughput, increasing attention has been drawn to the use of physiologically relevant cells in cell-based assays/screens and of purified enzyme complexes in biochemical assays/screens.

Proteins that physically contact each other with an affinity above that of random interactions form a complex. Complex formation is believed to benefit the proteins involved with regard to stability and function. Several terms have been introduced in the literature to define protein complexes. Based on the nature of the interactions, they have been defined as “transient or permanent complexes” and “obligate or non-obligate complexes” [6]. Based on affinity, complex members can be classified as “core or peripheral components” [7]. While some definitions are subjective, others are more systematic. Gavin and colleagues [8] used the term “socio-affinity” to predict the strength of association between proteins in a complex. Several experimental methods have been developed to identify and characterize physical protein–protein interactions (PPIs)¹ [9]. High-throughput screening (HTS) approaches such as the yeast two-hybrid (Y2H) system and tandem affinity purification (TAP) followed by mass spectrometry (MS) have been applied widely. Other high-throughput approaches such as protein microarrays and phage display have also been used to identify PPIs.
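The affinity-based distinctions above (transient versus permanent, core versus peripheral) can be made concrete with the simplest binding model. The Python sketch below is illustrative only and assumes a single 1:1 equilibrium A + B ⇌ AB with dissociation constant Kd, ideal mass action, and no competing partners; none of these assumptions come from the studies cited here. It returns the fraction of A found in the complex for given total concentrations, showing how a weak (high-Kd) interaction leaves most of A unbound at the same concentration at which a tight one is nearly complete.

```python
import math


def fraction_in_complex(a_total: float, b_total: float, kd: float) -> float:
    """Fraction of A present as AB for the 1:1 equilibrium A + B <=> AB.

    Solves the mass-action quadratic exactly (no excess-partner
    approximation). a_total, b_total and kd must share one unit, e.g. uM.
    """
    if a_total <= 0 or b_total < 0 or kd <= 0:
        raise ValueError("a_total and kd must be positive, b_total non-negative")
    s = a_total + b_total + kd
    disc = s * s - 4.0 * a_total * b_total  # never negative for valid inputs
    # Smaller root of [AB]^2 - s*[AB] + A0*B0 = 0, written as A0*B0 / (larger root)
    # so that very small [AB] values are not lost to cancellation.
    ab = 2.0 * a_total * b_total / (s + math.sqrt(disc))
    return ab / a_total


if __name__ == "__main__":
    # Illustrative affinities at 1 uM of each partner (not data from this review).
    for kd in (0.01, 1.0, 10.0):
        print(f"Kd = {kd:>5} uM -> {fraction_in_complex(1.0, 1.0, kd):.3f} of A bound")
```

At equal 1 µM concentrations this prints roughly 0.905, 0.382 and 0.084 of A bound for Kd values of 0.01, 1 and 10 µM. The stable form of the quadratic root is a design choice so the function stays accurate for dilute, weakly bound partners.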
Protein interactions have also been independently verified using low-throughput methods such as confocal microscopy, co-immunoprecipitation (co-IP), X-ray crystallography, NMR spectroscopy, fluorescence resonance energy transfer (FRET), and surface plasmon resonance (SPR) [9,10]. Based on these studies, a number of PPI databases have been built and improved significantly over the years. These databases store PPIs derived from a variety of sources, including automated or manual literature mining, direct submission, orthologue inference, and prediction. Some of the human PPI databases are summarized in Table 1. Despite considerable overlap, evaluations and comparisons of the publicly available databases have revealed that PPI databases differ in scope and content. Some of the differences can be explained by identifier mapping, detection methods, and coverage of the curated literature [11,12]. While IntAct and BioGRID are the most comprehensive databases in terms of the number of unique PPIs, HPRD appears to be the most comprehensive public resource for human proteins. Most PPIs are mediated by domain–domain interactions (DDIs) via interface hydrophobicity and surface complementarity. Several DDI databases store DDIs identified from protein structure data or predicted by computational methods [13–15]. Note that the results of computational predictions are suggestive, providing guidelines for further dissection of protein complexes and validation of experimental observations.

Purification of small quantities of protein complexes for biochemical studies has often been achieved by co-IP or TAP. With TAP, a tag with multiple affinity/epitope sequences is added to the target protein, which is then over-expressed in cells to facilitate detection and purification of protein complexes [16,17] from a variety of organisms [18]. Because of the gentle, native conditions used, several protein complexes purified by TAP have been shown to retain biochemical activity [16,17,19]. However, both co-IP and TAP are limited by yield. In addition, if the target protein is over-expressed in a homologous host, paralogues/orthologues of associated proteins are likely to be co-purified. For the milligram quantities of protein complexes required for structural studies and HTS, recombinant co-over-expression of the major partners provides a more practical approach.

Over-expression of proteins for use in biochemical, biophysical, and structural studies and in drug discovery has been achieved with bacterial, yeast, insect, and mammalian host systems. Historically, the preferred method to obtain multi-protein complexes was to express and purify each subunit separately, followed by reconstitution in vitro. This approach has been used successfully to produce many protein complexes, including the eukaryotic initiation factor eIF2B [20], a set of CXCR4/peptide ligand pairs [21], and a set of soluble T-cell receptors (TCRs) [22]. This method usually requires re-folding of at least one member of the complex, and protein aggregation during reconstitution can reduce both yield and functionality. Co-expression and formation of the multi-protein complex in cultured cells is a preferred approach that, in many cases, has become a requirement for forming functional multi-protein complexes (see the case studies below). A study of protein–protein interactions among CARD family proteins produced via co-expression or reconstitution indicated that the interactions of proteins produced via co-expression correlated more closely with those detected by other methods [23].
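The database comparisons discussed above depend on representing interactions consistently before counting them: identifiers from different sources must be mapped to a common namespace, and an undirected pair A–B must compare equal to B–A. The Python sketch below illustrates this with two tiny, hypothetical interaction lists and a hand-written alias table (the gene symbols are real, but the lists are not exports of any database in Table 1, and the helper names are illustrative). It reports the shared interactions and the Jaccard index of the two sources, then ranks candidate partners of a target by how many sources support each interaction, one simple way to shortlist co-expression partners.

```python
from collections import defaultdict


def normalize(pairs, alias):
    """Map IDs to a shared namespace and store each pair order-independently."""
    edges = set()
    for a, b in pairs:
        a, b = alias.get(a, a), alias.get(b, b)
        if a != b:  # self-pairs (homodimers) are left out of this simple comparison
            edges.add(frozenset((a, b)))
    return edges


def overlap(edges_1, edges_2):
    """Return (number of shared interactions, Jaccard index)."""
    shared = edges_1 & edges_2
    union = edges_1 | edges_2
    return len(shared), (len(shared) / len(union) if union else 0.0)


def rank_partners(target, sources):
    """Rank partners of `target` by how many sources report the interaction."""
    support = defaultdict(int)
    for edges in sources:
        for edge in edges:
            if target in edge:
                (partner,) = edge - {target}
                support[partner] += 1
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy inputs: real gene symbols, invented source lists.
ALIAS = {"LXRA": "NR1H3", "SRC1": "NCOA1"}
source_a = normalize([("RARA", "RXRA"), ("NR1H3", "RXRA"), ("RXRA", "SRC1")], ALIAS)
source_b = normalize([("RXRA", "RARA"), ("LXRA", "RXRA")], ALIAS)

shared, jaccard = overlap(source_a, source_b)
print(shared, round(jaccard, 2))  # 2 0.67
print(rank_partners("RXRA", [source_a, source_b]))
# [('NR1H3', 2), ('RARA', 2), ('NCOA1', 1)]
```

Without the alias table, "LXRA" and "NR1H3" would count as different proteins and the overlap would drop to one shared interaction, which is the kind of identifier-mapping effect noted in the comparisons cited above.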
Co-expression can also increase the efficiency of production, as the entire complex is purified at one time rather than each component separately followed by reconstitution [24]. Production of recombinant tetrameric antibodies and engineered derivatives via co-expression is well established [25,26]. Some of these basic technologies have been applied to complexes in other protein classes, which bring the added challenge of multiple interacting partners with differing affinities. This article summarizes examples of such cases, discusses considerations involved in selecting co-expression partners, and describes strategies for producing active multi-protein complexes.

Co-expression case studies

Within the past decade, an increasing number of reports have described the production of protein complexes via co-expression. They represent a variety of target/protein classes, including intracellular proteins, transcription and translation factors, secreted proteins, and intracellular/extracellular domains of membrane proteins. While co-expression was used primarily as an alternative approach to produce active soluble proteins, many cases indicated that complex formation is required for optimal activity. Once the appropriate components of a complex are known, potential co-expression partners can be produced and tested. Table 2 lists examples from the literature of protein complexes produced by co-expression, with as many as six component proteins. Representative cases are highlighted below.

Co-expression of nuclear receptors

1 Abbreviations used: Co-IP, co-immunoprecipitation; DDI, domain–domain interaction; ECD, extracellular domain; HTS, high throughput screen; IRES, internal ribosomal entry site; IVTT, in vitro transcription and translation; LBD, ligand binding domain; MHC, major histocompatibility complex; ORF, open reading frame; Ori, origin of replication; PPI, protein–protein interaction; TAP, tandem affinity purification; VLP, viral-like particle.

Nuclear receptors belong to a class of ligand-activated transcription factors that regulate development and metabolism, and are important targets for drug discovery. Over-expression of their ligand-binding domains (LBDs) in Escherichia coli often leads to inclusion bodies, and production of soluble proteins usually requires the introduction of specific mutation(s) [27]. This hurdle has been overcome by co-expression with interacting partners such as the retinoid X receptor (RXR) and/or steroid receptor co-activator 1 (SRC-1). RXR is known as a master regulator in the nuclear receptor superfamily and has been shown to be the partner of many receptors, including RAR, PPAR, VDR, and LXR [28,29], whereas SRC-1 is a co-activator with histone acetyltransferase activity [30]. In the case of RAR, the RAR LBD expressed alone was largely insoluble, whereas RAR-LBD/RXR-LBD co-expression in E. coli resulted in a 1:1 ratio of RAR/RXR, with more than 95% of the expressed proteins in tight heterodimers [31]. Co-expression with RXR dramatically increased RAR solubility and promoted dimerization. Further analysis indicated that, in the RAR/RXR heterodimers, the ligand-binding capacity of both components was increased 5- to 10-fold. Similar observations have been made with other nuclear receptors co-expressed in E. coli with RXR, including VDR/RXR/Drip [32], PPARγ/RXRα [33], and LXRα/RXRα (Fig. 1). The insect orthologue of RXR, USP (ultraspiracle), has also been shown to facilitate production of insect ecdysone receptors in E. coli [31] and in baculovirus-infected insect cells [34]. Like RXR, SRC-1 has been used as a co-expression partner to facilitate production of a set of nuclear receptors in E. coli, including ER/SRC-1 [32], CAR/SRC-1 [35], PXR/SRC-1 [36,37], and PPARγ/SRC-1 [38]. In one case, PXR was produced in E. coli as a fusion with an SRC-1 peptide (PXR-SRC1p), and the resulting complex was crystallized [39].

Fig. 1. Co-expression of LXRα-LBD/RXRα-LBD heterodimers in E. coli. Expression analysis of LXRα-LBD/RXRα-LBD heterodimers co-expressed in E. coli BL21(DE3). Coomassie-stained SDS–PAGE of samples derived from soluble and insoluble fractions, together with those from a small-scale analytical purification trial using a Ni–NTA mini-column (www.qiagen.com). The ORFs for the human LXRα and RXRα LBDs were co-expressed as 6xHis-LXRα(aa205–447) and RXRα(aa225–462), carried on the pRSETa (www.invitrogen.com) and pACYC184 (www.neb.com) vectors, respectively. Co-expression enhanced the solubility of the LXRα LBD dramatically.

Table 1
List and description of the protein–protein interaction databases. Each entry gives: database, URL; organisms in database; number of proteins; number of interactions; database source [references] (NA, not available).

Manually curated (open source)
DIP, http://dip.doe-mbi.ucla.edu; 274 organisms; 21,891 proteins; 69,171 interactions; experimentally determined PPIs [125]
MINT, http://mint.bio.uniroma2.it/mint; >30 model organisms; 30,521 proteins; 83,419 interactions; experimentally verified PPIs mined from the scientific literature by expert curators [126]
IntAct, http://www.ebi.ac.uk/intact; >275 species; 63,588 proteins; 213,763 interactions; data extracted from the literature or from direct data deposition from high-throughput studies [127]
BioGRID, http://www.thebiogrid.org; 17 organisms; 29,984 proteins; 235,128 interactions; curated from the literature [128]
MIPS/Mpact, http://mips.helmholtz-muenchen.de/proj/ppi/; 10 mammalian species, Mus musculus as the reference species; >900 proteins; 1,728 interactions; published experimental evidence (>370 articles) curated by experts [129]
Reactome, http://www.reactome.org; human and 22 non-human species based on ortholog prediction; 2,975 proteins; 2,907 interactions; manually curated Reactome pathway datasets [130,131]
HPRD, http://www.hprd.org; human; 8,710 proteins; 38,167 interactions; PPIs derived from yeast two-hybrid analysis, in vitro and in vivo methods [132]

Predicted (open source)
OPHID, http://ophid.utoronto.ca; 7 species; 69,965 proteins; 430,782 interactions; predicted based on orthologs [133]
PIPs, http://www.compbio.dundee.ac.uk/www-pips; human; >37,000 high-probability interactions, of which >34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID; Bayesian prediction method [134]
IntNetDB, http://hanlab.genetics.ac.cn/sys; human; 9,901 proteins; 180,010 interactions; PPIs predicted by integration of 27 heterogeneous genomic, proteomic and functional annotation datasets [135]
STRING, http://string-db.org; 630 organisms; 2.5 million proteins; 50 million interactions; known and predicted PPIs from MINT, IntAct, HPRD, BIND, DIP, BioGRID, KEGG, Reactome, EcoCyc and Gene Ontology (GO) protein complexes [136]
HAPPI, http://discern.uits.iu.edu:8340/HAPPI/; human; 10,592 proteins; 142,956 interactions; integration of HPRD, BIND, MINT, STRING, MIPS and OPHID [137]

Commercial
Prolexys human interactome database (Prolexys HyNet), http://www.prolexys.com/; human; 11,500 proteins; 120,000 interactions; experimentally determined and verified human protein–protein interactions
MetaCore (GeneGo), http://www.genego.com; human and orthologs from 8 species mapped to human; proteins NA; >840,000 interactions; manually curated human PPIs from the literature
Ingenuity Knowledge Base (Ingenuity), http://www.ingenuity.com; human, mouse and rat, and orthologs from 8 species mapped to human; >20,000 human proteins; interactions NA; manually curated human PPIs from top journals
BINDplus (Thomson Reuters), http://thomsonreuters.com; >1500 unique organisms; 60,000 proteins; 200,000 interactions; extracted from the scientific literature by expert curators, part of Thomson Reuters BONDplus

Table 2
Examples of purified multi-protein complexes via co-expression. For each complex the table gives the number of components (two to six), the co-expression strategy (multiple vectors, single vector with multiple promoters, single vector with multiple cistrons, or single chain fusion), the host system (E. coli, yeast, Aspergillus, baculovirus-infected insect cells, Drosophila S2, COS/plasmid, CHO/plasmid, HEK293/adenovirus, HEK293/BacMam, or eukaryotic in vitro expression), and references. Complexes listed: holotranslocon YidC-SecYEGDF; TCR-CD3; RFC; rat eIF2B; yeast eIF2B; TFIID; TRAPP; bluetongue virus complex VLPs; SNAPc; PRC2; toluene-4-monooxygenase (T4MO); Skp1/F-box/Rbx1/Cul1; bluetongue virus (BTV); ESCRT-II; NuA4 HAT complex; LKB1/MO25α/STRADα; VHL-Elongin B-Elongin C (VBC); Y. pestis virulence factors YopN/SycN/YscB; Cdk7/Cyclin H/MAT1; chorionic gonadotropin/LHR ECD (βα-ECD); TAF5/6/9; VDR/RXR/Drip; yeast ISW1b; yeast TAF17/60; yeast Trm8/Trm82; soluble TCRαβ; soluble MHC II-IA; BCL-XL/BAD; BCL-XL/BIM-S; IL-18; PA28αβ; RAR/RXR; LXRα/RXRβ; CAR LBD/SRC-1; EcR/USP; HPV L1/L2 VLPs; hemoglobin (Hb A); protein farnesyltransferase; MSH2/MSH6; MTMR2/Sbf2; rotavirus VLPs of simian SA11 genes 2 and 6; IL-12; DFF40/DFF45; KChIP1/Kv4.3; pRB/Ad5 E1a; pRB/HPV16 E7; rhodopsin fragments; RNA polymerases; neoculin; ScoAB; SRF/Elk-1; SRF/SAP-1; TFIID/TFIIH; translin/TRAX; Fz4 ECD/Wnt5; NS4A-NS3; NS3-NS4A; HIV RT p66/p51; lutropin (LH). Two-component complexes produced as single chain fusions (continued part of the table): PXR/SRC-1 (E. coli [39]); single chain monellin (yeast [170]); CuZn superoxide dismutase homodimer (E. coli [171]); chorionic gonadotropin (CG) (baculovirus [48]); luteinizing hormone (LH) (CHO/plasmid [46]; yeast [47]).
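Subunit ratios such as the 1:1 RAR/RXR stoichiometry described above are often estimated from stained gels. A minimal Python sketch follows, assuming that Coomassie band intensity is roughly proportional to protein mass (an approximation; dye binding varies between proteins) and that construct masses can be estimated at about 110 Da per residue. The residue ranges are those of the Fig. 1 constructs, counting only six extra histidines for the tag (any other vector-encoded residues are ignored); the band intensities are hypothetical values, not measurements from this study, and the function names are illustrative.

```python
AVG_RESIDUE_DA = 110.0  # rough average residue mass; an approximation, not a measured value


def approx_mass_kda(first: int, last: int, extra_residues: int = 0) -> float:
    """Approximate mass (kDa) of a construct spanning residues first..last inclusive."""
    n_residues = last - first + 1 + extra_residues
    return n_residues * AVG_RESIDUE_DA / 1000.0


def molar_ratios(intensity, mass_kda):
    """Divide band intensity by mass (assumes staining ~ mass), normalize to the lowest."""
    moles = {name: intensity[name] / mass_kda[name] for name in intensity}
    lowest = min(moles.values())
    return {name: value / lowest for name, value in moles.items()}


# Construct boundaries from Fig. 1; only the six His residues of the tag are counted.
masses = {
    "6xHis-LXRα LBD": approx_mass_kda(205, 447, extra_residues=6),
    "RXRα LBD": approx_mass_kda(225, 462),
}
# Hypothetical densitometry readings (arbitrary units), for illustration only.
bands = {"6xHis-LXRα LBD": 1.05, "RXRα LBD": 1.00}

for name, ratio in molar_ratios(bands, masses).items():
    print(f"{name}: {masses[name]:.1f} kDa, relative molar amount {ratio:.2f}")
```

Because the two LBD constructs have similar masses, similar band intensities translate into a molar ratio close to 1:1; the mass correction matters more when subunits differ greatly in size, as in multi-subunit complexes such as eIF2B.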

Expression of glycoprotein hormones

Glycoprotein hormones are a family of proteins consisting of placenta-derived chorionic gonadotropin (CG) and pituitary-derived lutropin (LH), follitropin (FSH), and thyrotropin (TSH) in humans, primates and equines. They are heterodimeric proteins with a common α subunit and a hormone-specific β subunit. Given the requirement for protein glycosylation, production of these hormones has typically been achieved in eukaryotic systems. Early expression studies of this class of hormones indicated that expression of both subunits is required for efficient assembly and secretion of LH and TSH heterodimers [40,41]. Stable heterodimeric equine LH has been produced in Sf9 cells and in COS-7 cells via subunit co-expression [42,43], and the protein has been purified from insect cell medium [42,44]. Alternatively, active heterodimeric glycohormones can be produced by fusing the α and β subunits together as a single chain protein [45]. A variety of such single chain glycohormones in the βα or αβ configuration have been produced in CHO cells, yeast, and baculovirus-infected Sf9 cells [44,46–48]. A fused human CG has also been shown to enable production of the extracellular domain (ECD) of its receptor, LHR, when fused to the ECD (as βα-ECD), whereas expression of the LHR ECD alone in E. coli or insect cells was unsuccessful [49].

Expression of soluble T-cell receptors

T-cell receptors (TCRs) are antigen recognition molecules on the surface of T cells and belong to the immunoglobulin superfamily. They are heterodimeric proteins composed of either αβ or γδ chains. Each chain contains an extracellular domain (ECD) with a variable region and a constant region, a single transmembrane domain, and a short cytoplasmic domain. The variable regions confer specificity for interaction with the peptide–major histocompatibility complex (peptide–MHC) presented on antigen-presenting cells (APCs). Because these proteins are membrane-associated, production efforts have focused on obtaining soluble forms of the heterodimeric ECDs or of the variable regions. Although expression in E. coli has been used successfully to produce sufficient amounts of protein for crystallography, the proteins typically form inclusion bodies and require re-folding, whether the variable regions are expressed separately or jointly as a single chain [22,50]. Direct production of certain soluble heterodimeric TCR forms has been achieved via co-expression in eukaryotic hosts, including co-expression and secretion of heterodimeric ECDs from CHO cells [51,52] and baculovirus-infected insect cells [53]. However, it has long been observed that the transmembrane domain plays a role in the formation of TCR heterodimers [54], and co-expression of the N15 TCR α and β chain ECDs in insect cells failed to yield the desired heterodimers [53,55]. This issue was addressed by a chain pairing approach, in which the α and β chains were tagged with complementary, specifically designed basic and acidic coiled-coil leucine zipper peptides [56], respectively. Co-expression of the helix-tagged N15 TCR α and β chains (αTCR-LZbasic, βTCR-LZacid) in insect cells resulted in efficient heterodimer assembly and secretion [55]. This chain pairing approach has also been used to produce other soluble TCRs and MHC molecules, including murine D10 sTCR αβ heterodimers in E. coli [57], soluble human HLA-DR2 αβ heterodimers in yeast [58], and the soluble murine MHC class II molecule IA α1α2β1β2 heterodimer in Drosophila cells [59].

Eukaryotic initiation factor 2B

Eukaryotic initiation factor 2B (eIF2B) is an essential translation initiation factor with five subunits. Because eIF2B is a multi-subunit phosphoprotein, eukaryotic hosts have been used for its recombinant production. For the rat eIF2B complex, the five subunits were co-expressed in insect cells using multiple baculoviruses that each expressed one or two subunits, each tagged with an N-terminal FLAG epitope. The heteropentameric holoenzyme was subsequently purified on an anti-FLAG affinity column followed by gel filtration chromatography [20]. Analysis of the recombinant holoenzyme indicated that the five subunits were present in equal ratios, similar to native eIF2B purified from rat liver, and that its specific activity was comparable to that of the native complex. The human eIF2B pentameric complex has also been expressed in insect cells and was found to be active in stimulating translation in an in vitro system [60]. The Saccharomyces cerevisiae eIF2B complex was produced in S. cerevisiae via co-expression from a combination of vectors each expressing one to three subunits, with one or two subunits tagged with FLAG [61,62].

Co-expression strategies and considerations

Protein partner co-expression can be achieved via a variety of strategies and in different expression hosts: via a single expression vector or multiple expression vectors, and via multiple ORFs or a single ORF. The component proteins can be expressed stably or transiently, in a homologous or a heterologous host. The pros and cons of these expression strategies are examined below. Table 3 summarizes commonly used expression hosts and forms of vector delivery and maintenance. In principle, the vectors, hosts, and expression strategies developed to produce a single protein can be applied to producing protein complexes via co-expression. To facilitate discussion of expression strategies, we use the following terms for protein production needs: small scale for 1 milligram or less, medium scale for 100 milligrams or less, and large scale for 1 gram or more.

Table 3
Commonly used expression hosts and forms of vector delivery and maintenance. Hosts: bacteria (E. coli); yeast (Pichia, Saccharomyces); insect (Drosophila S2, Sf9, baculovirus/Sf9); mammalian (CHO, HEK293, HEK293EBNA, COS, BacMam/HEK293, adenovirus/HEK293). Forms of vector delivery and maintenance: chromosomal, episomal, plasmid transient, viral transient, and in vitro expression.

Single vector strategies

A single vector system for co-expression can be achieved either via multiple expression cassettes or via a single expression cassette (polycistronic or monocistronic), where an expression cassette is a single transcriptional unit consisting of “promoter–ORF(s)–terminator” sequence elements. With the multi-cassette approach, individual genes/ORFs are expressed separately, although the cassettes are carried on a single vector. A variety of such vectors have been described in the literature; those that are commercially available are listed in Table 4. With the single-cassette approach, multiple ORFs are linked together to form a single transcriptional unit with either multiple translational units (polycistronic) or a single translational unit encoding a single-chain protein (Fig. 2). Polycistronic co-expression vectors can be constructed by linking individual ORFs with a sequence containing a ribosome entry/binding site (RES). For expression in E. coli, this RES can simply be a purine-rich sequence or a Shine-Dalgarno (SD) sequence such as 5′-GGGAGAG-3′, located approximately 5 nucleotides upstream of the initiation codon; for expression in eukaryotic cells, an internal ribosome entry site (IRES) is used. A variety of IRES sequences have been used successfully in eukaryotic expression vectors. For example, the IRES derived from encephalomyocarditis virus (EMCV) has been used for mammalian expression [63], and an IRES derived from Rhopalosiphum padi virus has been used in insect cells [64]. Because the SD sequences used in E. coli polycistronic vectors are short, the ORFs can be joined together via overlap PCR and then subcloned into a conventional expression vector. For a eukaryotic polycistronic vector, or a single vector with multiple expression cassettes, the commonly used IRES sequences and promoters (CMV, RSV, SV40) are on the order of 0.5 kb long, so it is often more convenient to insert the individual ORFs into an expression vector that already carries the IRES sequences (the IRES vectors) or promoters. In both cases, sequential subcloning steps are typically required. Examples of co-expression via a polycistronic expression cassette can be found in Table 2.

Table 4
Commercially available vectors with multiple promoters for co-expression.
E. coli
  Novagen (www.emdchemicals.com): pACYCDuet-1, pETDuet-1, pCDFDuet-1, pRSFDuet-1, pCOLADuet-1
  Addgene (www.addgene.com): pQLinkH, pQLinkG, pQLinkN, pQLinkHD, pQLinkGD
Yeast
  Stratagene (www.genomics.agilent.com): pESC vectors
Baculovirus
  Invitrogen (www.invitrogen.com): pFastBac Dual
Mammalian
  Novagen (www.emdchemicals.com): pTandem-1
  Invitrogen (www.invitrogen.com): pBudCE4.1
  Clontech (www.clontech.com): pIRES vectors
  InvivoGen (www.invivogen.com): pDUO vectors, pVITRO vectors, pVIVO vectors
  Sigma–Aldrich (www.sigmaaldrich.com): pCMV-BICEP-4, pBICEP-CMV-3

J.J. Kerrigan et al. / Protein Expression and Purification 75 (2011) 1–14

[Fig. 2 schematic; panel labels: "Multiple ORFs", "Single ORF"; element labels: "SD or IRES sequences", "Linker sequences".]

Fig. 2. Definition of expression vectors and co-expression strategies. The diagram depicts expression vectors and strategies for co-expression in E. coli. Ovals represent E. coli cells, and rectangles represent expression vectors. Bent arrows represent individual expression cassettes, and shaded rectangles represent individual ORFs. Round dots represent either ribosome binding sites or linker sequences, as indicated. Each vector can be used alone or in combination with others. SD, Shine-Dalgarno sequence; IRES, internal ribosome entry site; ORF, open reading frame.

For proteins that naturally function as heterodimers or homodimers, their genes can be linked together to form a hybrid fusion gene that encodes a single polypeptide with two distinct functional domains. Such a gene dimerization approach has been applied to many proteins, including single-chain antibodies (scFv) [65] and a retroviral protease. Retroviral protease is a member of the aspartyl protease family, which also includes the cellular enzymes pepsin, chymosin, and cathepsin D [66]. The viral protease contains only a single protease domain and functions as a homodimer, whereas each of the cellular proteases contains two structurally similar protease domains and functions as a monomer. Single-chain viral protease homodimers produced in E. coli were found to be 2- to 3-fold more active than the unlinked homodimeric enzyme [67]. A common observation with single-chain fusions is that linker length and composition can play critical roles in protein stability, solubility, and activity [65,68], so an optimal configuration may need to be determined or selected experimentally. A good example is the dimerization of monellin, an intensely sweet protein from an African berry. Monellin is a heterodimer of an A chain (45 amino acid residues) and a B chain (50 amino acid residues). Hybrid proteins in which the B and A chains were covalently linked by various peptide linkers of 6–8 residues were produced in E. coli and tested in various assays. Their sweetness differed significantly among variants; one variant was as potently sweet as natural monellin and appears to be more stable than the native two-chain protein [69].
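To make the two single-cassette layouts of Fig. 2 concrete, the sketch below assembles (i) a polycistronic E. coli cassette in which each downstream ORF is preceded by an SD element ending about 5 nucleotides upstream of its start codon, and (ii) a single-chain fusion in which two protein chains are joined by a peptide linker, as in the tethered monellin variants. It is a minimal illustration under stated assumptions: the vector is assumed to supply the promoter, the first ribosome binding site, and the terminator; the helper names, the 5-nt `AAAAA` spacer, the `GGSGGS` linker, and the placeholder sequences are hypothetical and are not the constructs used in the cited studies.

```python
# Sketch of the two single-cassette layouts in Fig. 2 (hypothetical helper names).
# Assumptions: the vector supplies the promoter, the first ribosome binding site and
# the terminator; the 5-nt spacer and GGSGGS linker are illustrative choices only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_orf(orf: str) -> str:
    """Upper-case an ORF and check its start codon, stop codon and reading frame."""
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    if not seq.startswith("ATG"):
        raise ValueError("ORF does not start with ATG")
    if seq[-3:] not in STOP_CODONS:
        raise ValueError("ORF does not end with a stop codon")
    return seq


def build_polycistron(orfs, sd="GGGAGAG", spacer="AAAAA"):
    """Polycistronic layout: ORF1 + (SD + spacer + ORFn) for each downstream ORF.

    The SD element ends len(spacer) nucleotides upstream of each downstream ATG.
    """
    checked = [check_orf(orf) for orf in orfs]
    linker = sd.upper() + spacer.upper()
    return checked[0] + "".join(linker + orf for orf in checked[1:])


def tether_chains(first: str, second: str, linker: str = "GGSGGS") -> str:
    """Single-chain layout at the protein level: first chain + linker + second chain."""
    return first + linker + second


# Toy ORFs (not real genes): a 4-codon and a 3-codon ORF.
cassette = build_polycistron(["ATGAAACCCTAA", "ATGGGGTGA"])
print(cassette)  # ATGAAACCCTAAGGGAGAGAAAAAATGGGGTGA

# Placeholder chains with the residue counts given for monellin (B: 50, A: 45).
single_chain = tether_chains("X" * 50, "Z" * 45)
print(len(single_chain))  # 101
```

Because the linker is an explicit parameter, variants with different lengths or compositions can be generated side by side, which matches the observation that the optimal linker usually has to be found experimentally.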
Examples of this type of protein tethering/fusion for additional heterodimeric and homodimeric proteins can be found in the case studies for glycoprotein hormones and in Table 2. In all cases, proteins with the desired properties were obtained or selected, typically for increased stability, solubility, and/or activity. A variation of the tethering/fusion approach is to link the two proteins with a self-processing sequence, such as the 20-residue 2A sequence (QLLNFDLLKLAGDVESNPGP) derived from foot-and-mouth disease virus (FMDV), which mediates co-translational cleavage and separation of the component proteins [70]. This approach has been used extensively in viral delivery systems to co-express multiple genes for functional studies in animal models or cultured cells, and its feasibility for protein production has been demonstrated [71]. The single vector approach expresses all component proteins from a single expression construct, which makes expression strains easy to establish and ensures that all component proteins are expressed in the same host cell. When the component proteins are expressed as a single-chain protein, appropriate heterodimer formation with the desired stoichiometry is guaranteed. The approach also facilitates further co-expression with additional partner(s) and/or modifying enzymes carried on additional compatible vectors [72]. However, it has limitations. Aside from the sequential steps needed to generate the constructs, a fixed gene ratio may cause an imbalance of proteins. With the single-cassette polycistronic approach, upstream ORFs are typically translated more efficiently than downstream ones [64,73,74]. With the multi-cassette approach, even though the gene copy number of each expression cassette is fixed (1:1), the protein products may still be imbalanced because of differences in the rates of transcription, translation, and translocation, and in the stability of the RNA and protein products [75]. The optimal gene dose and order may therefore need to be determined experimentally.
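The 2A strategy can be illustrated with a short sketch that predicts the two products of an upstream–2A–downstream polyprotein. It assumes the commonly described mechanism in which the ribosome skips peptide-bond formation between the Gly and the final Pro of the …NPGP motif, so the upstream protein retains 19 residues of 2A at its C-terminus and the downstream protein begins with Pro. The helper name and the toy sequences are hypothetical.

```python
# Sketch: predicted products of a polyprotein linked by the FMDV 2A peptide.
# Assumption: "cleavage" occurs between the Gly and the final Pro (...NPG|P), so the
# upstream protein keeps 2A minus its last residue and the downstream protein gains Pro.

F2A = "QLLNFDLLKLAGDVESNPGP"  # 20-residue FMDV 2A sequence quoted in the text


def two_a_products(upstream: str, downstream: str, peptide: str = F2A):
    """Return (upstream product, downstream product) for upstream + 2A + downstream."""
    if not peptide.endswith("NPGP"):
        raise ValueError("expected a 2A-type peptide ending in NPGP")
    polyprotein = upstream + peptide + downstream
    split_at = len(upstream) + len(peptide) - 1  # between the final Gly and Pro
    return polyprotein[:split_at], polyprotein[split_at:]


# Toy protein sequences (hypothetical):
first, second = two_a_products("MAAAK", "MSSSR")
print(first)   # MAAAKQLLNFDLLKLAGDVESNPG
print(second)  # PMSSSR
```

The C-terminal 2A remnant on the upstream protein and the extra N-terminal Pro on the downstream one may be worth weighing when 2A is used for protein production, particularly if those termini take part in complex assembly.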
With the single vector approach, generating the expression constructs can be a challenge. Within a limited total length (up to about 5 kb), the component ORFs or expression cassettes can easily be linked together via overlap PCR before being subcloned into a vector, or the fragments can be assembled into a vector via In-Fusion or LIC (ligation-independent cloning) [76]. Beyond this size or number of component fragments, a stepwise subcloning approach may be more practical. The pQLink, MultiBac, and Gateway® MultiSite (http://www.invitrogen.com) vector systems were created to enable such a strategy. pQLink vectors are E. coli expression vectors with Gateway® compatibility. Each pQLink vector can be used to express a single ORF under the control of the Ptac promoter, or to accept additional expression cassettes from sister pQLink vectors through specially designed flanking LINK sequences via LIC [77]. The MultiSite Gateway® system allows assembly of multiple DNA fragments, precloned in specific entry/donor vectors, in an orientation- and order-specific manner [78]. With an RES sequence added to the 5′ end of each ORF, multicistronic constructs have been generated for co-expression [79] and for production of protein complexes [80]. Tan and colleagues [81,82] described a different approach for the sequential construction of multicistronic vectors: each component ORF, together with a translational enhancer and an SD element, is subcloned into a transfer vector with distinct flanking restriction sites, and the translational cassette in each transfer vector can then be transferred into an expression vector via these unique flanking sites. The MultiBac system combines a specially designed multiplication module with Cre/lox recombination to construct baculovirus transfer vectors for bacmid generation in E. coli [83–85].
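For constructs within the size limit mentioned above, overlap PCR joins fragments through chimeric primers whose 5′ tails overlap. The sketch below only does the sequence bookkeeping for a two-fragment fusion; it does not model PCR itself. The 20-nt annealing region and 12-nt tails (giving a 24-nt overlap) are assumed values, real primers would normally be balanced by melting temperature, and the function names and toy fragments are hypothetical.

```python
# Sketch: chimeric junction primers for overlap-extension PCR joining fragment a
# (upstream) to fragment b (downstream). Lengths are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_primers(a: str, b: str, anneal: int = 20, tail: int = 12):
    """Return (reverse primer for a, forward primer for b), both written 5'->3'.

    a_rev anneals to the 3' end of a and carries a 5' tail matching the start of b;
    b_fwd anneals to the 5' end of b and carries a 5' tail matching the end of a.
    """
    a_rev = revcomp(a[-anneal:] + b[:tail])
    b_fwd = a[-tail:] + b[:anneal]
    return a_rev, b_fwd


def fuse(a_product: str, b_product: str, overlap: int) -> str:
    """Model the fusion step: two PCR products sharing `overlap` nt at the junction."""
    if a_product[-overlap:] != b_product[:overlap]:
        raise ValueError("PCR products do not share the expected overlap")
    return a_product + b_product[overlap:]


a = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # toy fragment
b = "GGAGTTGTCCCAATTCTTGTTGAATTAGAT"  # toy fragment
a_rev, b_fwd = junction_primers(a, b)
a_product = a + b[:12]   # first-round product of a, extended by the a_rev tail
b_product = a[-12:] + b  # first-round product of b, extended by the b_fwd tail
print(fuse(a_product, b_product, overlap=24) == a + b)  # True
```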
In the MultiBac system, specific donor and acceptor transfer vectors were created, each bearing a dual expression cassette with the polh/p10 promoters together with a multiplication module. Multiple expression cassettes can be generated in each vector via the module, and the expression cassettes in the donor vector(s) can be combined with those in the acceptor vector via Cre/lox recombination [85,86]. The MultiBac system thus allows combinatorial assembly of expression vectors whose component ORFs, or mutations/truncations of them, are carried on the acceptor/donor vectors, enabling selection and expression of appropriate protein complexes for structural studies. The same principle has also been used to generate composite expression vectors for expression in E. coli [76].

Multiple vector strategies
A multi-vector co-expression system uses two or more expression vectors, each expressing one or two component proteins (Fig. 2). For stable expression in eukaryotic cells, cells carrying multiple vectors can be selected via different selectable markers, with the vectors maintained either episomally or by genomic insertion. Co-expression in E. coli via this strategy requires each vector to carry a different origin of replication (ori) and a different selectable marker, because plasmids with the same ori are incompatible and do not stably co-exist [87]. The Duet vectors marketed by Novagen (www.emdchemicals.com) carry compatible ori sequences derived from ColE1, p15A, CloDF13, and RSF1030. With one or two proteins carried on each vector, up to eight proteins can be co-expressed. Nevertheless, co-expression of the human DNA fragmentation factor DFF40–DFF45 heterodimer in E. coli was achieved using two vectors with the same ori but different selectable markers [88]. In this case plasmid co-existence was apparently forced by antibiotic selection, and the general applicability of this approach remains to be seen. For heterodimeric proteins with weak affinity, dimer formation can be enhanced by fusing each component protein to one member of a pair of short peptides known to form heterodimers.
Such heterodimerization pairs include the coiled-coil domains of leucine zipper proteins [56], the helix-loop-helix domains of four-helix bundle proteins [89], the constant domains (CH1 and CL) of an antibody [90], and Fc fragments with engineered CH3 domains [91]. Synthetic polyionic peptide pairs, such as ACE8 and ACR8, have also been used [92,93]. For example, coiled-coil leucine zipper domains have been used to direct heterodimerization of antibody variable regions to form helix-stabilized Fv [94], and heterodimerization of the extracellular domains of membrane proteins for biochemical studies (see the case study for TCRs). Co-expression via multiple vectors provides flexibility in construct generation and expression optimization. The expression constructs for each component can be generated in parallel and used in expression trials individually or in combination, which makes it possible to study the role of individual component proteins in complex formation. Because eukaryotic cells can take up several different plasmid DNAs and/or viruses upon transfection and/or transduction, the ratio of gene expression can be modulated via the ratio of input plasmid DNA or the viral multiplicity of infection (MOI). However, the appropriate input ratio may need to be determined empirically, given differences in RNA/protein stability and turnover [75]. Establishing a stable mammalian cell line with multiple vectors may require double or triple antibiotic selection, which may affect cell viability. Multiple vector co-expression in E. coli also requires that the component genes be expressed from vectors with compatible ori. In addition, vectors with different origins of replication have different copy numbers, which may result in an undesirable imbalance in the abundance of the component proteins.
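The E. coli multi-vector rules described above (a distinct ori and a distinct selectable marker for each plasmid) lend themselves to a simple pre-flight check. In this sketch the replicons follow the text's description of the Duet vectors, while the resistance markers are assumed from the vendor's vector maps and should be verified; `pHypo-Kan` is a hypothetical plasmid, and the function name is illustrative. The check encodes only the general rule: the DNA fragmentation factor example, in which two plasmids with the same ori were co-maintained under dual antibiotic selection, would be flagged even though it worked. Copy-number imbalance is not modeled.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Plasmid:
    name: str
    ori: str     # replicon; plasmids sharing a replicon are treated as incompatible
    marker: str  # antibiotic resistance used for selection


def co_transformation_problems(plasmids):
    """List pairwise ori or marker clashes for a planned multi-plasmid co-expression."""
    problems = []
    for p, q in combinations(plasmids, 2):
        if p.ori == q.ori:
            problems.append(f"{p.name} and {q.name} share ori {p.ori}")
        if p.marker == q.marker:
            problems.append(f"{p.name} and {q.name} share marker {p.marker}")
    return problems


# Replicons as given in the text; resistance markers assumed from vendor maps (verify).
duet = [
    Plasmid("pETDuet-1", "ColE1", "ampicillin"),
    Plasmid("pACYCDuet-1", "p15A", "chloramphenicol"),
    Plasmid("pCDFDuet-1", "CloDF13", "streptomycin"),
    Plasmid("pRSFDuet-1", "RSF1030", "kanamycin"),
]
print(co_transformation_problems(duet))  # []
print(2 * len(duet))                     # 8 (up to two ORFs per Duet vector)

# Hypothetical second ColE1 plasmid carrying a different marker:
clash = [duet[0], Plasmid("pHypo-Kan", "ColE1", "kanamycin")]
print(co_transformation_problems(clash))  # ['pETDuet-1 and pHypo-Kan share ori ColE1']
```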

Full length versus truncations/deletions/mutations
Either full-length proteins or modified versions have been used for co-expression. Extracellular domains (ECDs) of membrane proteins are typically used for co-expression (for example, see the case study for TCRs). In some cases, only the functional domains of the component proteins are co-expressed. Using deletion constructs for each component of the heterotrimeric yeast Piccolo NuA4 histone acetyltransferase (HAT) complex, Song and colleagues [95] used a combinatorial co-expression approach to dissect the regions of the component proteins responsible for nucleosome-specific HAT activity. Deletion and/or mutation constructs have also been used to identify suitable constructs for crystallization [85] and to define the regions of component proteins responsible for assembly of complex bluetongue virus-like particles (VLPs) [96]. For proteins predicted to have intrinsically disordered regions (also described as intrinsically unstructured or low-complexity regions), inclusion of a specific region may be needed to facilitate complex formation, since these regions are rich in post-translational modification sites and are often involved in interactions with other proteins/domains [97]. A number of online databases are available for such analysis and prediction [98–100].

Tagging options
To enhance solubility and facilitate purification of the assembled protein complexes, one or more components of the complex are typically tagged, most conveniently at the N- or C-terminus of the target protein. In some cases, all components are tagged, either with the same tag or with different tags. Using the same tag may yield heterogeneous purified complexes, whereas different tags enable further purification of the core complex. Solubility tags are often used to enhance the solubility of a component protein [77,81,101,102]. All of the popular epitope and affinity tags have been used; however, shorter ones, such as FLAG and 6×His, are used more often.
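As an illustration of N-terminal tagging with an optional protease site for tag removal, the sketch below builds a Met–6×His–TEV site–target construct and predicts the TEV cleavage products. It assumes the commonly used TEV recognition sequence ENLYFQ followed by Gly, with cleavage between Gln and Gly; the helper names, the decision to drop the target's own initiator Met, and the toy target sequence are assumptions.

```python
HIS6 = "HHHHHH"
TEV_SITE = "ENLYFQG"  # assumed TEV site; protease cuts between Q and G (ENLYFQ|G)


def n_terminal_tag(protein: str, tag: str = HIS6, site: str = TEV_SITE) -> str:
    """Build Met + tag + protease site + target (target's own initiator Met removed)."""
    body = protein[1:] if protein.startswith("M") else protein
    return "M" + tag + site + body


def tev_cleave(seq: str, site: str = TEV_SITE):
    """Return (released tag fragment, processed protein), or (seq, None) if no site."""
    i = seq.find(site)
    if i < 0:
        return seq, None
    cut = i + len(site) - 1  # the final G of the site stays on the target protein
    return seq[:cut], seq[cut:]


construct = n_terminal_tag("MSEQNKL")  # toy target sequence
print(construct)                       # MHHHHHHENLYFQGSEQNKL
tag_part, target = tev_cleave(construct)
print(tag_part, target)                # MHHHHHHENLYFQ GSEQNKL
```

In this layout the released target keeps one extra N-terminal Gly; whether such a residual scar matters for complex assembly has to be judged case by case.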
It is reasonable to assume that shorter tags will interfere minimally with protein complex assembly, but most of the commonly used larger tags can enhance the solubility of their fusion partners and thereby enable assembly of the protein complex [72,102]. When using tags, an additional consideration is whether to include a protease cleavage site, such as a TEV or thrombin site, so that the tag can be removed after purification if needed.

Stable versus transient expression
A critical choice in producing recombinant proteins is whether to establish a stable cell line or to use transient expression. For stable expression in mammalian cells, the genes are usually inserted into the host genome via random or targeted insertion followed by selection. Mammalian stable cell lines provide sustained production of recombinant protein and are the preferred system for production at the scale required for therapeutic proteins and antibodies. Alternatively, expression vectors carrying a viral ori derived from Epstein-Barr virus (EBV) or simian virus 40 (SV40) can be maintained as stable episomes in mammalian cells expressing EBV nuclear antigen 1 (EBNA-1) or SV40 large T antigen, respectively [103,104]. For transient expression, the genes carried on individual vectors can be delivered into the host via plasmid DNA co-transfection or via viral co-infection or co-transduction. Depending on the host cell used, viral delivery and expression systems can be lytic or non-lytic. Lytic expression systems use a replication-permissive host, in which the virus replicates upon cell entry (infection) and eventually causes cell lysis; examples are the baculovirus/Sf9 and adenovirus/HEK293 virus–host combinations. Non-lytic expression systems use a non-permissive host, in which the virus cannot replicate upon cell entry (transduction) and is eventually lost as the cells grow; an example is the BacMam/HEK293 combination [105,106].
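When component genes are delivered on separate plasmids or viruses, the input ratio is usually set by simple arithmetic. The sketch below converts a desired molar ratio of plasmids into masses, assuming as an approximation that the mass of a double-stranded plasmid is proportional to its length, and computes the virus volume for a target MOI as MOI × cell number ÷ titer. Function names and all numbers are hypothetical; as noted above, the ratio that actually gives balanced protein levels usually has to be found empirically.

```python
def plasmid_masses(total_ug, sizes_bp, molar_ratio):
    """Split a total DNA mass (ug) so plasmids are delivered at a given molar ratio.

    For double-stranded plasmids, mass per mole scales with length, so each
    plasmid's share of the mass is proportional to (molar share x length).
    """
    if len(sizes_bp) != len(molar_ratio):
        raise ValueError("need one molar-ratio entry per plasmid")
    weights = [r * s for r, s in zip(molar_ratio, sizes_bp)]
    total_weight = sum(weights)
    return [w * total_ug / total_weight for w in weights]


def virus_volume_ml(moi, cell_count, titer_per_ml):
    """Volume of virus stock giving the requested multiplicity of infection (MOI)."""
    return moi * cell_count / titer_per_ml


# Hypothetical: a 5 kb and a 10 kb plasmid at a 1:1 molar ratio, 30 ug DNA in total.
print(plasmid_masses(30, [5000, 10000], [1, 1]))  # [10.0, 20.0]

# Hypothetical: MOI 5 on 1e6 cells with a stock titered at 1e8 infectious units per mL.
print(virus_volume_ml(5, 1e6, 1e8))  # 0.05
```

The same helpers can be rerun with different molar ratios or MOIs to set up the input-ratio series that an empirical optimization would compare.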
In the non-lytic systems, the virus serves merely as a delivery agent, a role functionally similar to that of transfection reagents in delivering plasmid DNA into mammalian cells. Because the virus replicates and late-stage promoters are used, the baculovirus expression system often provides high-level expression of recombinant genes, and it has been used extensively to produce protein complexes at medium scale (on the order of 100 mg) [86]. The BacMam/HEK293 system has also been used to co-express the ECD of a G protein-coupled receptor (GPCR) with its ligand [106]. With advances in plasmid DNA preparation and transient transfection technologies, medium-scale protein production via transient transfection can be cost-effective [107]. Establishing a stable line takes longer and requires more effort, especially when high-level sustained expression of all the component proteins is needed; once established, however, the stable line can be used in combination with other approaches. Transient systems allow expression of proteins at medium scale. By incorporating the EBV origin of replication (OriP) and EBNA-1 into recombinant viral genomes, sustained gene expression has been achieved with non-lytic delivery via BacMam and adenovirus [108,109].

Homologous expression versus heterologous expression
Whether expression is considered homologous or heterologous is mainly determined by the evolutionary distance between the organism from which the genes originate and the expression host. Expression of mammalian genes in a mammalian cell line is considered homologous expression, whereas expression of mammalian genes in yeast, insect, or bacterial hosts is considered heterologous expression. In certain cases, human proteins may need to be produced in a human cell line for the expression to be strictly homologous.
As with the production of individual proteins, the choice of expression host for producing protein complexes is influenced by multiple factors, including the post-translational modifications, solubility, and stability of each component. Vaccine proteins and immunogens such as virus-like particles (VLPs) are routinely expressed in non-mammalian hosts to avoid contamination with mammalian-derived components [110]. The HPV vaccines Cervarix and Gardasil are VLPs of papillomavirus major capsid proteins, expressed and assembled in insect T. ni cells via baculovirus and in the yeast S. cerevisiae, respectively [111]. On the other hand, homologous expression allows expression and assembly of close-to-native complexes, including their patterns of post-translational modification. It also allows incorporation of potentially unknown factors that may facilitate complex formation and function. However, given the homologous host background, paralogues and orthologues of the component proteins (for example, the hamster counterparts when a human gene is expressed in CHO cells) are likely to be incorporated into the protein complex, introducing heterogeneity into the purified complex. A good illustration comes from the expression and analysis of recombinant histone deacetylases (HDACs) [112]. Attempts to produce active HDACs in E. coli, yeast, and baculovirus-infected insect cells have so far been unsuccessful, possibly because essential co-factors are lacking in these heterologous hosts. Active recombinant HDACs could be purified only from mammalian hosts, and purified 6×His-tagged HDAC1 or HDAC3 preparations were found to be multi-protein complexes containing HDAC1, HDAC2, and HDAC3 [112]. Heterologous expression can enable assembly of protein complexes with greater homogeneity and purity. However, the post-translational modifications of the expressed proteins are likely to be different or absent. This is particularly true when human proteins are expressed in E. coli, where post-translational modifications such as phosphorylation, glycosylation, and acetylation are typically missing. For example, the transcription factor Sp3 is an activator when acetylated but a repressor when unmodified [113]. Nevertheless, this feature has made E. coli the default host for protein production for structural studies, and it has also been used in specific applications that require unmodified proteins, such as the production of kinase substrates.

In vitro expression versus in cell expression
In vitro expression, which uses cellular extracts to drive transcription and translation (IVTT), provides an alternative method for producing proteins and protein complexes. The system has been explored to produce a TCR–CD3 complex [114] and to produce scFvs for screening purposes [115]. A major advantage of IVTT is that toxic proteins can be produced and the roles of individual components can be rapidly assessed. In addition, the abundance of certain cellular factors or proteins can be easily manipulated [60,116]. However, given current limitations on scale and reagent cost, it is typically used to produce small quantities of protein complexes (1 mg or less).

Expression in the presence of a compound ligand
Co-expression with a high-affinity ligand or metal co-factor can also enhance the stability and solubility of recombinant proteins. This approach has been used extensively for producing members of the nuclear receptor family, whose expression was stabilized in the presence of a high-affinity ligand for crystal structure studies [117]. The overall improvement gained by adding a high-affinity ligand depends on the potency and solubility of the ligand. Significant enhancement was observed when ligand-binding domains (LBDs) carrying specific mutations were expressed in the presence of ligand [118–120]. In some instances, a lower-affinity ligand may be needed because of toxicity or solubility issues seen with higher-affinity ligands; after purification, a higher-affinity compound ligand may then be added for further stabilization if needed [120].

Concluding remarks
Protein production is a complex process involving vector and host engineering, gene delivery, cell growth, cell lysis, protein purification, and storage.
This article focuses on the initial stages of the process, including construct design and expression strategy. The major challenges in protein production are solubility, stability, activity, and yield, which mainly result from over-expression and the absence of major native partners. Increasing evidence indicates that many of these issues can be alleviated when the protein is overproduced in a host together with its major partners or with a stabilizer that physically binds to it. As discussed in the sections above, designing a suitable expression strategy for a desired protein complex depends on partner selection, the amount of protein complex required, and the specific application of the complex. Critical considerations in designing a co-expression strategy are outlined in Fig. 3. Which protein(s) should be co-expressed with a specific target protein? This fundamental question is tied to the target biology and the purpose of the study; in other words, which complex is the biologically relevant target of the study? A decision can be made only after considering the specifics of target biology, pathway interactions, intracellular localization, and therapeutic purpose (Fig. 3). This can be illustrated by the 14-3-3 proteins. 14-3-3 proteins are homo- or heterodimers formed from members of a family of seven isoforms, each able to bind a variety of target proteins and regulate diverse cellular processes, including signal transduction, subcellular targeting, and cell cycle control [121]. They appear to interact with motifs containing phosphoserine or phosphothreonine residues located predominantly in intrinsically disordered/unstructured regions of their binding partners, leading to a disorder-to-order transition [122,123]. Such conformational changes influence further complex formation and direct the complexes to their functional sites within the cell [121]. Specific target/14-3-3 complexes therefore have distinct functions and should be selected carefully to enable relevant studies. A similar consideration applies to which of the multiple splice variants of a target or its partners to use in co-expression. In producing human DFF45/DFF40 heterodimers, only a long-form splice variant of DFF45, and not a short form, formed a functional complex with DFF40 when co-expressed [124]. With the help of databases derived from experimental data and associated bioinformatics tools (Table 1), together with information on the tissue co-distribution of the partners, it should be possible to identify and study the correct biological complexes with relevant partners and relevant splice variants.

Fig. 3. Flowchart of critical considerations in designing a co-expression strategy. A purpose-driven flowchart is shown, outlining the critical steps in designing a co-expression strategy. The choice of specific partner(s) for co-expression results from database searches and literature study of the target and its biology, together with a judgment on using a relevant splice variant (if any). The expression vectors could be any one or more of the vectors with single or multiple expression cassettes, polycistronic cassettes, and/or single-chain fusions. The host could be cells of mammalian, insect, yeast, or bacterial origin.

Advances in protein expression and purification technologies have enabled production of many proteins and protein complexes for activity studies, structural studies, and HTS. While all the strategies and accumulated learning from producing individual proteins should be applied to producing protein complexes, specific considerations are required to maintain the stoichiometry of the component proteins. Plasmid copy number; the expression efficiency, stability, and solubility of the component proteins; and host and growth conditions can all have an impact. In addition, the experiments planned with the protein complex can influence the choice of expression strategy. In some cases, truncations of the component proteins are more suitable for higher activity or for crystallization; in other cases, solubility tags are used to enhance the solubility of a component protein. When designing truncation constructs, potentially disordered regions in a component protein deserve consideration, since these regions are potential binding sites and frequently interface with other proteins [97]. Determining the optimal strategy for a particular task usually requires multiple parallel approaches, which may involve any step of the protein production process, including construct generation, expression trials in multiple hosts, medium supplements, and purification procedures. High-throughput approaches [76,86,95,101] may help reach a fit-for-purpose strategy and process sooner. Co-expression technology has also been used to supply biological components that enhance protein production, including proteins that modulate transcription, translation, translocation, protein modification, and protein folding and processing. In addition, co-expression has been used extensively to deliver multiple genes into cultured cells for cell-based assays that support gene function studies, pathway analysis, and target validation. With a better understanding of biological systems and advances in co-expression technologies, it is likely that more challenging protein complexes can be produced to enrich our understanding of protein function and facilitate drug discovery efforts.

Acknowledgments
We thank Tom Kost, Christopher S. Jones, Gordon D. McIntyre, Kyung O. Johanson, Christina Pao, and James Fornwald for critical reading of the manuscript, and our managers Gordon McIntyre and Thomas Meek at GlaxoSmithKline for their continuous support and encouragement.

References
[1] J. Wang, K. Scully, X. Zhu, L. Cai, J. Zhang, G.G. Prefontaine, A. Krones, K.A. Ohgi, P. Zhu, I. Garcia-Bassets, F. Liu, H. Taylor, J. Lozach, F.L. Jayes, K.S. Korach, C.K. Glass, X.D. Fu, M.G. Rosenfeld, Opposing LSD1 complexes function in developmental gene activation and repression programmes, Nature 446 (2007) 882–887. [2] J. Torchia, C. Glass, M.G. Rosenfeld, Co-activators and co-repressors in the integration of transcriptional responses, Curr. Opin. Cell Biol. 10 (1998) 373–383. [3] O. Laptenko, C. Prives, Transcriptional regulation by p53: one protein, many possibilities, Cell Death Differ. 13 (2006) 951–961. [4] V. Blank, Small Maf proteins in mammalian gene control: mere dimerization partners or dynamic transcriptional regulators? J. Mol. Biol. 376 (2008) 913–925. [5] P.M. Fischer, D.P. Lane, Small-molecule inhibitors of the p53 suppressor HDM2: have protein–protein interactions come of age as drug targets? Trends Pharmacol. Sci. 25 (2004) 343–346. [6] I.M. Nooren, J.M. Thornton, Diversity of protein–protein interactions, EMBO J. 22 (2003) 3486–3492. [7] D. Devos, R.B. Russell, A more complete, complexed and structured interactome, Curr. Opin. Struct. Biol. 17 (2007) 370–377. [8] A.C. Gavin, P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L.J. Jensen, S. Bastuck, B. Dumpelfeld, A. Edelmann, M.A. Heurtier, V. Hoffman, C. Hoefert, K. Klein, M. Hudak, A.M. Michon, M. Schelder, M. Schirle, M. Remor, T. Rudi, S. Hooper, A. Bauer, T. Bouwmeester, G. Casari, G. Drewes, G. Neubauer, J.M. Rick, B. Kuster, P. Bork, R.B. Russell, G. Superti-Furga, Proteome survey reveals modularity of the yeast cell machinery, Nature 440 (2006) 631–636. [9] B.A. Shoemaker, A.R. Panchenko, Deciphering protein–protein interactions. Part I. Experimental techniques and databases, PLoS Comput. Biol. 3 (2007) e42. [10] T. Berggard, S. Linse, P.
James, Methods for the detection and analysis of protein–protein interactions, Proteomics 7 (2007) 2833–2842. [11] S. Mathivanan, B. Periaswamy, T.K. Gandhi, K. Kandasamy, S. Suresh, R. Mohmood, Y.L. Ramachandra, A. Pandey, An evaluation of human protein–protein interaction data in the public domain, BMC Bioinformatics 7 (Suppl. 5) (2006) S19. [12] B. Lehne, T. Schlitt, Protein–protein interaction databases: keeping up with growing interactomes, Hum. Genomics 3 (2009) 291–297. [13] M. Liu, X.W. Chen, R. Jothi, Knowledge-guided inference of domain–domain interactions from incomplete protein–protein interaction networks, Bioinformatics 25 (2009) 2492–2499. [14] C. Guda, B.R. King, L.R. Pal, P. Guda, A top–down approach to infer and compare domain–domain interactions across eight model organisms, PLoS One 4 (2009) e5096. [15] H.X. Ta, L. Holm, Evaluation of different domain-based methods in protein interaction prediction, Biochem. Biophys. Res. Commun. 390 (2009) 357–362. [16] G. Rigaut, A. Shevchenko, B. Rutz, M. Wilm, M. Mann, B. Seraphin, A generic protein purification method for protein complex characterization and proteome exploration, Nat. Biotechnol. 17 (1999) 1030–1032. [17] O. Puig, F. Caspary, G. Rigaut, B. Rutz, E. Bouveret, E. Bragado-Nilsson, M. Wilm, B. Seraphin, The tandem affinity purification (TAP) method: a general procedure of protein complex purification, Methods 24 (2001) 218–229. [18] X. Xu, Y. Song, Y. Li, J. Chang, H. Zhang, L. An, The tandem affinity purification method: an efficient system for protein complex purification and protein interaction identification, Protein Expr. Purif. 72 (2010) 149–156. [19] S. Honey, B.L. Schneider, D.M. Schieltz, J.R. Yates, B. Futcher, A novel multiple affinity purification tag, its use in identification of proteins associated with a cyclin–CDK complex, Nucleic Acids Res. 29 (2001) E24. [20] J.R. Fabian, S.R. Kimball, L.S.
Jefferson, Reconstitution and purification of eukaryotic initiation factor 2B (eIF2B) expressed in Sf21 insect cells, Protein Expr. Purif. 13 (1998) 16–22. [21] A. Dukkipati, J. Vaclavikova, D. Waghray, K.C. Garcia, In vitro reconstitution and preparative purification of complexes between the chemokine receptor CXCR4 and its ligands SDF-1alpha, gp120-CD4 and AMD3100, Protein Expr. Purif. 50 (2006) 203–214. [22] G.I. van Boxel, G. Stewart-Jones, S. Holmes, S. Sainsbury, D. Shepherd, G.M. Gillespie, K. Harlos, D.I. Stuart, R. Owens, E.Y. Jones, Some lessons from the systematic production and structural analysis of soluble (alpha)(beta) T-cell receptors, J. Immunol. Methods 350 (2009) 14–21. [23] W. Shen, S. Yun, B. Tam, K. Dalal, F.F. Pio, Target selection of soluble protein complexes for structural proteomics studies, Proteome Sci. 3 (2005) 3. [24] J. Finkelstein, E. Antony, M.M. Hingorani, M. O’Donnell, Overproduction and analysis of eukaryotic multiprotein complexes in Escherichia coli using a dual-vector strategy, Anal. Biochem. 319 (2003) 78–87. [25] H.E. Chadd, S.M. Chamow, Therapeutic antibody expression technology, Curr. Opin. Biotechnol. 12 (2001) 188–194. [26] J.R. Birch, A.J. Racher, Antibody production, Adv. Drug Deliv. Rev. 58 (2006) 671–685. [27] D.E. Mossakowska, Expression of nuclear hormone receptors in Escherichia coli, Curr. Opin. Biotechnol. 9 (1998) 502–505.


[28] T.H. Bugge, J. Pohl, O. Lonnoy, H.G. Stunnenberg, RXR alpha, a promiscuous partner of retinoic acid and thyroid hormone receptors, EMBO J. 11 (1992) 1409–1418. [29] D.J. Mangelsdorf, R.M. Evans, The RXR heterodimers and orphan receptors, Cell 83 (1995) 841–850. [30] C. Leo, J.D. Chen, The SRC family of nuclear receptor coactivators, Gene 245 (2000) 1–11. [31] C. Li, J.W. Schwabe, E. Banayo, R.M. Evans, Coexpression of nuclear receptor partners increases their solubility and biological activities, Proc. Natl. Acad. Sci. USA 94 (1997) 2278–2283. [32] C. Romier, J.M. Ben, S. Albeck, G. Buchwald, D. Busso, P.H. Celie, E. Christodoulou, M. De, V.G.S. van, P. Knipscheer, J.H. Lebbink, V. Notenboom, A. Poterszman, N. Rochel, S.X. Cohen, T. Unger, J.L. Sussman, D. Moras, T.K. Sixma, A. Perrakis, Co-expression of protein complexes in prokaryotic and eukaryotic hosts: experimental procedures, database tracking and case studies, Acta Crystallogr. D Biol. Crystallogr. 62 (2006) 1232–1242. [33] R.T. Gampe Jr., V.G. Montana, M.H. Lambert, A.B. Miller, R.K. Bledsoe, M.V. Milburn, S.A. Kliewer, T.M. Willson, H.E. Xu, Asymmetry in the PPARgamma/RXRalpha crystal structure reveals the molecular basis of heterodimerization among nuclear receptors, Mol. Cell 5 (2000) 545–555. [34] L.D. Graham, P.A. Pilling, R.E. Eaton, J.J. Gorman, C. Braybrook, G.N. Hannan, A. Pawlak-Skrzecz, L. Noyce, G.O. Lovrecz, L. Lu, R.J. Hill, Purification and characterization of recombinant ligand-binding domains from the ecdysone receptors of four pest insects, Protein Expr. Purif. 53 (2007) 309–324. [35] L. Shan, J. Vincent, J.S. Brunzelle, I. Dussault, M. Lin, I. Ianculescu, M.A. Sherman, B.M. Forman, E.J. Fernandez, Structure of the murine constitutive androstane receptor complexed to androstenol: a molecular basis for inverse agonism, Mol. Cell 16 (2004) 907–917. [36] R.E. Watkins, G.B. Wisely, L.B. Moore, J.L. Collins, M.H. Lambert, S.P. Williams, T.M. Willson, S.A. Kliewer, M.R.
Redinbo, The human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity, Science 292 (2001) 2329–2333. [37] S.A. Jones, L.B. Moore, G.B. Wisely, S.A. Kliewer, Use of in vitro pregnane X receptor assays to assess CYP3A4 induction potential of drug candidates, Methods Enzymol. 357 (2002) 161–170. [38] R.T. Nolte, G.B. Wisely, S. Westin, J.E. Cobb, M.H. Lambert, R. Kurokawa, M.G. Rosenfeld, T.M. Willson, C.K. Glass, M.V. Milburn, Ligand binding and coactivator assembly of the peroxisome proliferator-activated receptor-gamma, Nature 395 (1998) 137–143. [39] W. Wang, W.W. Prosise, J. Chen, S.S. Taremi, H.V. Le, V. Madison, X. Cui, A. Thomas, K.C. Cheng, C.A. Lesburg, Construction and characterization of a fully active PXR/SRC-1 tethered protein with increased stability, Protein Eng Des Sel 21 (2008) 425–433. [40] M.M. Matzuk, C.M. Kornmeier, G.K. Whitfield, I.A. Kourides, I. Boime, The glycoprotein alpha-subunit is critical for secretion and stability of the human thyrotropin beta-subunit, Mol. Endocrinol. 2 (1988) 95–100. [41] C.L. Corless, M.M. Matzuk, T.V. Ramabhadran, A. Krichevsky, I. Boime, Gonadotropin beta subunits determine the rate of assembly and the oligosaccharide processing of hormone dimer in transfected cells, J. Cell Biol. 104 (1987) 1173–1181. [42] S. Legardinier, M. Duonor-Cerutti, G. Devauchelle, Y. Combarnous, C. Cahoreau, Biological activities of recombinant equine luteinizing hormone/ chorionic gonadotropin (eLH/CG) expressed in Sf9 and Mimic insect cell lines, J. Mol. Endocrinol. 34 (2005) 47–60. [43] M. Chopineau, N. Martinat, C. Troispoux, H. Marichatou, Y. Combarnous, F. Stewart, F. Guillou, Expression of horse and donkey LH in COS-7 cells: evidence for low FSH activity in donkey LH compared with horse LH, J. Endocrinol. 152 (1997) 371–377. [44] S. Legardinier, J.C. Poirier, D. Klett, Y. Combarnous, C. 
Cahoreau, Stability and biological activities of heterodimeric and single-chain equine LH/chorionic gonadotropin variants, J. Mol. Endocrinol. 40 (2008) 185–198. [45] D. Ben-Menahem, I. Boime, Converting heterodimeric gonadotropins to genetically linked single chains: new approaches to structure activity relationships and analogue design, Trends Endocrinol. Metab 7 (1996) 100– 105. [46] A. Jablonka-Shariff, J.F. Roser, G.R. Bousfield, M.W. Wolfe, L.E. Sibley, M. Colgin, I. Boime, Expression and bioactivity of a single chain recombinant equine luteinizing hormone (reLH), Theriogenology 67 (2007) 311–320. [47] H. Kasuto, B. Levavi-Sivan, Production of biologically active tethered tilapia LHbetaalpha by the methylotrophic yeast Pichia pastoris, Gen. Comp Endocrinol. 140 (2005) 222–232. [48] G.B. Fralish, P. Narayan, D. Puett, Consequences of single-chain translation on the structures of two chorionic gonadotropin yoked analogs in alpha-beta and beta-alpha configurations, Mol. Endocrinol. 17 (2003) 757–767. [49] G.B. Fralish, P. Narayan, D. Puett, High-level expression of a functional singlechain human chorionic gonadotropin-luteinizing hormone receptor ectodomain complex in insect cells, Endocrinology 142 (2001) 1517–1524. [50] D.H. Fremont, W.A. Rees, H. Kozono, Biophysical studies of T-cell receptors and their ligands, Curr. Opin. Immunol. 8 (1996) 93–100. [51] R.K. Strong, D.M. Penny, R.M. Feldman, L.P. Weiner, J.J. Boniface, M.M. Davis, P.J. Bjorkman, Engineering and expression of a secreted murine TCR with reduced N-linked glycosylation, J. Immunol. 153 (1994) 4111–4121. [52] F. Davodeau, I. Houde, G. Boulot, F. Romagne, A. Necker, N. Canavo, M.A. Peyrat, M.M. Hallet, H. Vie, Y. Jacques, Secretion of disulfide-linked human Tcell receptor gamma delta heterodimers, J. Biol. Chem. 268 (1993) 15455– 15460.

[53] J. Kappler, J. White, H. Kozono, J. Clements, P. Marrack, Binding of a soluble alpha beta T-cell receptor to superantigen/major histocompatibility complex ligands, Proc. Natl. Acad. Sci. USA 91 (1994) 8462–8466.
[54] N. Manolios, J.S. Bonifacino, R.D. Klausner, Transmembrane helical interactions and the assembly of the T cell receptor complex, Science 249 (1990) 274–277.
[55] H.C. Chang, Z. Bao, Y. Yao, A.G. Tse, E.C. Goyarts, M. Madsen, E. Kawasaki, P.P. Brauer, J.C. Sacchettini, S.G. Nathenson, A general method for facilitating heterodimeric pairing between two proteins: application to expression of alpha and beta T-cell receptor extracellular segments, Proc. Natl. Acad. Sci. USA 91 (1994) 11408–11412.
[56] E.K. O’Shea, R. Rutkowski, W.F. Stafford III, P.S. Kim, Preferential heterodimer formation by isolated leucine zippers from fos and jun, Science 245 (1989) 646–648.
[57] A. Golden, S.S. Khandekar, M.S. Osburne, E. Kawasaki, E.L. Reinherz, T.H. Grossman, High-level production of a secreted, heterodimeric alpha beta murine T-cell receptor in Escherichia coli, J. Immunol. Methods 206 (1997) 163–169.
[58] A. Kalandadze, M. Galleno, L. Foncerrada, J.L. Strominger, K.W. Wucherpfennig, Expression of recombinant HLA-DR2 molecules. Replacement of the hydrophobic transmembrane region by a leucine zipper dimerization motif allows the assembly and secretion of soluble DR alpha beta heterodimers, J. Biol. Chem. 271 (1996) 20156–20162.
[59] C.A. Scott, K.C. Garcia, F.R. Carbone, I.A. Wilson, L. Teyton, Role of chain pairing for the production of functional soluble IA major histocompatibility complex class II molecules, J. Exp. Med. 183 (1996) 2087–2095.
[60] S. Mikami, M. Masutani, N. Sonenberg, S. Yokoyama, H. Imataka, An efficient mammalian cell-free translation system supplemented with translation factors, Protein Expr. Purif. 46 (2006) 348–357.
[61] J. Nika, W. Yang, G.D. Pavitt, A.G. Hinnebusch, E.M. Hannig, Purification and kinetic analysis of eIF2B from Saccharomyces cerevisiae, J. Biol. Chem. 275 (2000) 26011–26017.
[62] S.S. Mohammad-Qureshi, R. Haddad, K.S. Palmer, J.P. Richardson, E. Gomez, G.D. Pavitt, Purification of FLAG-tagged eukaryotic initiation factor 2B complexes, subcomplexes, and fragments from Saccharomyces cerevisiae, Methods Enzymol. 431 (2007) 1–13.
[63] I.R. Ghattas, J.R. Sanes, J.E. Majors, The encephalomyocarditis virus internal ribosome entry site allows efficient coexpression of two genes from a recombinant provirus in cultured cells and in embryos, Mol. Cell. Biol. 11 (1991) 5848–5859.
[64] Y.J. Chen, W.S. Chen, T.Y. Wu, Development of a bi-cistronic baculovirus expression vector by the Rhopalosiphum padi virus 5’ internal ribosome entry site, Biochem. Biophys. Res. Commun. 335 (2005) 616–623.
[65] J.S. Huston, M. Mudgett-Hunter, M.S. Tai, J. McCartney, F. Warren, E. Haber, H. Oppermann, Protein engineering of single-chain Fv analogs and fusion proteins, Methods Enzymol. 203 (1991) 46–88.
[66] M. Miller, M. Jaskolski, J.K. Rao, J. Leis, A. Wlodawer, Crystal structure of a retroviral protease proves relationship to aspartic protease family, Nature 337 (1989) 576–579.
[67] D. Bizub, I.T. Weber, C.E. Cameron, J.P. Leis, A.M. Skalka, A range of catalytic efficiencies with avian retroviral protease subunits genetically linked to form single polypeptide chains, J. Biol. Chem. 266 (1991) 4951–4958.
[68] C.R. Robinson, R.T. Sauer, Optimizing the stability of single-chain proteins by linker length and composition mutagenesis, Proc. Natl. Acad. Sci. USA 95 (1998) 5929–5934.
[69] S.H. Kim, C.H. Kang, R. Kim, J.M. Cho, Y.B. Lee, T.K. Lee, Redesigning a sweet protein: increased stability and renaturability, Protein Eng. 2 (1989) 571–575.
[70] P. de Felipe, G.A. Luke, L.E. Hughes, D. Gani, C. Halpin, M.D. Ryan, E unum pluribus: multiple proteins from a self-processing polyprotein, Trends Biotechnol. 24 (2006) 68–75.
[71] P.J. Chaplin, E.B. Camon, B. Villarreal-Ramos, M. Flint, M.D. Ryan, R.A. Collins, Production of interleukin-12 as a self-processing 2A polypeptide, J. Interferon Cytokine Res. 19 (1999) 235–241.
[72] N.H. Tolia, L. Joshua-Tor, Strategies for protein coexpression in Escherichia coli, Nat. Methods 3 (2006) 55–64.
[73] H. Mizuguchi, Z. Xu, A. Ishii-Watabe, E. Uchida, T. Hayakawa, IRES-dependent second gene expression is significantly lower than cap-dependent first gene expression in a bicistronic vector, Mol. Ther. 1 (2000) 376–382.
[74] K.J. Kim, H.E. Kim, K.H. Lee, W. Han, M.J. Yi, J. Jeong, B.H. Oh, Two-promoter vector is highly efficient for overproduction of protein complexes, Protein Sci. 13 (2004) 1698–1703.
[75] S. Schlatter, S.H. Stansfield, D.M. Dinnis, A.J. Racher, J.R. Birch, D.C. James, On the optimal ratio of heavy to light chain genes for efficient recombinant antibody production by CHO cells, Biotechnol. Prog. 21 (2005) 122–133.
[76] C. Bieniossek, Y. Nie, D. Frey, N. Olieric, C. Schaffitzel, I. Collinson, C. Romier, P. Berger, T.J. Richmond, M.O. Steinmetz, I. Berger, Automated unrestricted multigene recombineering for multiprotein complex production, Nat. Methods 6 (2009) 447–450.
[77] C. Scheich, D. Kummel, D. Soumailakakis, U. Heinemann, K. Bussow, Vectors for co-expression of an unrestricted number of proteins, Nucleic Acids Res. 35 (2007) e43.
[78] D.L. Cheo, S.A. Titus, D.R. Byrd, J.L. Hartley, G.F. Temple, M.A. Brasch, Concerted assembly and cloning of multiple DNA segments using in vitro site-specific recombination: functional analysis of multi-segment expression clones, Genome Res. 14 (2004) 2111–2120.

[79] Y. Sasaki, T. Sone, K. Yahata, H. Kishine, J. Hotta, J.D. Chesnut, T. Honda, F. Imamoto, Multi-gene gateway clone design for expression of multiple heterologous genes in living cells: eukaryotic clones containing two and three ORF multi-gene cassettes expressed from a single promoter, J. Biotechnol. 136 (2008) 103–112.
[80] F.D. Schubot, D.S. Waugh, A pivotal role for reductive methylation in the de novo crystallization of a ternary complex composed of Yersinia pestis virulence factors YopN, SycN and YscB, Acta Crystallogr. D Biol. Crystallogr. 60 (2004) 1981–1986.
[81] S. Tan, A modular polycistronic expression system for overexpressing protein complexes in Escherichia coli, Protein Expr. Purif. 21 (2001) 224–234.
[82] S. Tan, R.C. Kern, W. Selleck, The pST44 polycistronic expression system for producing protein complexes in Escherichia coli, Protein Expr. Purif. 40 (2005) 385–395.
[83] I. Berger, D.J. Fitzgerald, T.J. Richmond, Baculovirus expression system for heterologous multiprotein complexes, Nat. Biotechnol. 22 (2004) 1583–1587.
[84] C. Bieniossek, T.J. Richmond, I. Berger, MultiBac: multigene baculovirus-based eukaryotic protein complex production, Curr. Protoc. Protein Sci., Chapter 5 (2008). Unit.
[85] D.J. Fitzgerald, C. Schaffitzel, P. Berger, R. Wellinger, C. Bieniossek, T.J. Richmond, I. Berger, Multiprotein expression strategy for structural biology of eukaryotic complexes, Structure 15 (2007) 275–279.
[86] S. Trowitzsch, C. Bieniossek, Y. Nie, F. Garzoni, I. Berger, New baculovirus expression tools for recombinant protein complex production, J. Struct. Biol., 2010, doi:10.1016/j.jsb.2010.02.010.
[87] G. Selzer, T. Som, T. Itoh, J. Tomizawa, The origin of replication of plasmid p15A and comparative studies on the nucleotide sequences around the origin of related plasmids, Cell 32 (1983) 119–129.
[88] W. Yang, L. Zhang, Z. Lu, W. Tao, Z. Zhai, A new method for protein coexpression in Escherichia coli using two incompatible plasmids, Protein Expr. Purif. 22 (2001) 472–478.
[89] W.F. DeGrado, L. Regan, S.P. Ho, The design of a four-helix bundle protein, Cold Spring Harb. Symp. Quant. Biol. 52 (1987) 521–526.
[90] K.M. Muller, K.M. Arndt, T. Alber, Protein fusions to coiled-coil domains, Methods Enzymol. 328 (2000) 261–282.
[91] J.B. Ridgway, L.G. Presta, P. Carter, ‘Knobs-into-holes’ engineering of antibody CH3 domains for heavy chain heterodimerization, Protein Eng. 9 (1996) 617–621.
[92] S.A. Richter, K. Stubenrauch, H. Lilie, R. Rudolph, Polyionic fusion peptides function as specific dimerization motifs, Protein Eng. 14 (2001) 775–783.
[93] M. Kleinschmidt, R. Rudolph, H. Lilie, Design of a modular immunotoxin connected by polyionic adapter peptides, J. Mol. Biol. 327 (2003) 445–452.
[94] K.M. Arndt, K.M. Muller, A. Pluckthun, Helix-stabilized Fv (hsFv) antibody fragments: substituting the constant domains of a Fab fragment for a heterodimeric coiled-coil domain, J. Mol. Biol. 312 (2001) 221–228.
[95] W. Selleck, I. Fortin, D. Sermwittayawong, J. Cote, S. Tan, The Saccharomyces cerevisiae Piccolo NuA4 histone acetyltransferase complex requires the Enhancer of Polycomb A domain and chromodomain to acetylate nucleosomes, Mol. Cell. Biol. 25 (2005) 5535–5542.
[96] P. Roy, M. Mikhailov, D.H. Bishop, Baculovirus multigene expression vectors and their use for understanding the assembly process of architecturally complex virus particles, Gene 190 (1997) 119–129.
[97] H.J. Dyson, P.E. Wright, Intrinsically unstructured proteins and their functions, Nat. Rev. Mol. Cell Biol. 6 (2005) 197–208.
[98] J. Prilusky, C.E. Felder, T. Zeev-Ben-Mordehai, E.H. Rydberg, O. Man, J.S. Beckmann, I. Silman, J.L. Sussman, FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded, Bioinformatics 21 (2005) 3435–3438.
[99] R. Linding, R.B. Russell, V. Neduva, T.J. Gibson, GlobPlot: exploring protein sequences for globularity and disorder, Nucleic Acids Res. 31 (2003) 3701–3708.
[100] M. Sickmeier, J.A. Hamilton, T. LeGall, V. Vacic, M.S. Cortese, A. Tantos, B. Szabo, P. Tompa, J. Chen, V.N. Uversky, Z. Obradovic, A.K. Dunker, DisProt: the database of disordered proteins, Nucleic Acids Res. 35 (2007) D786–D793.
[101] A. Alexandrov, M. Vignali, D.J. LaCount, E. Quartley, V.C. de, R.D. De, J. Babulski, S.F. Mitchell, L.W. Schoenfeld, S. Fields, W.G. Hol, M.E. Dumont, E.M. Phizicky, E.J. Grayhack, A facile method for high-throughput co-expression of protein pairs, Mol. Cell. Proteomics 3 (2004) 934–938.
[102] E. Domingues, T. Brillet, C. Vasseur, V. Agier, M.C. Marden, V. Baudin-Creuza, Construction of a new polycistronic vector for over-expression and rapid purification of human hemoglobin, Plasmid 61 (2009) 71–77.
[103] J.L. Yates, N. Warren, B. Sugden, Stable replication of plasmids derived from Epstein-Barr virus in various mammalian cells, Nature 313 (1985) 812–815.
[104] K. Van Craenenbroeck, P. Vanhoenacker, G. Haegeman, Episomal vectors for gene expression in mammalian cells, Eur. J. Biochem. 267 (2000) 5665–5678.
[105] M.J. Scott, S.S. Modha, A.D. Rhodes, N.M. Broadway, P.I. Hardwicke, H.J. Zhao, K.M. Kennedy-Wilson, S.M. Sweitzer, S.L. Martin, Efficient expression of secreted proteases via recombinant BacMam virus, Protein Expr. Purif. 52 (2007) 104–116.
[106] A. Dukkipati, H.H. Park, D. Waghray, S. Fischer, K.C. Garcia, BacMam system for high-level expression of recombinant soluble and membrane glycoproteins for structural studies, Protein Expr. Purif. 62 (2008) 160–170.
[107] L. Baldi, D.L. Hacker, M. Adam, F.M. Wurm, Recombinant protein production by large-scale transient gene expression in mammalian cells: state of the art and future perspectives, Biotechnol. Lett. 29 (2007) 677–684.

[108] L. Shan, L. Wang, J. Yin, P. Zhong, J. Zhong, An OriP/EBNA-1-based baculovirus vector with prolonged and enhanced transgene expression, J. Gene Med. 8 (2006) 1400–1406.
[109] F. Kreppel, S. Kochanek, Long-term transgene expression in proliferating cells mediated by episomally maintained high-capacity adenovirus vectors, J. Virol. 78 (2004) 9–22.
[110] R. Noad, P. Roy, Virus-like particles as immunogens, Trends Microbiol. 11 (2003) 438–444.
[111] A. Szarewski, HPV vaccine: Cervarix, Expert Opin. Biol. Ther. 10 (2010) 477–487.
[112] J. Li, M.J. Staver, M.L. Curtin, J.H. Holms, R.R. Frey, R. Edalji, R. Smith, M.R. Michaelides, S.K. Davidsen, K.B. Glaser, Expression and functional characterization of recombinant human HDAC1 and HDAC3, Life Sci. 74 (2004) 2693–2705.
[113] S. Ammanamanchi, J.W. Freeman, M.G. Brattain, Acetylated sp3 is a transcriptional activator, J. Biol. Chem. 278 (2003) 35775–35780.
[114] J.B. Huppa, H.L. Ploegh, In vitro translation and assembly of a complete T cell receptor-CD3 complex, J. Exp. Med. 186 (1997) 393–403.
[115] S. Rungpragayphan, T. Yamane, H. Nakano, SIMPLEX: single-molecule PCR-linked in vitro expression: a novel method for high-throughput construction and screening of protein libraries, Methods Mol. Biol. 375 (2007) 79–94.
[116] M.C. Jewett, A. Voloshin, J.R. Swartz, Prokaryotic systems for in vitro expression, in: M.P. Weiner, Q. Lu (Eds.), Gene Cloning and Expression Technologies, Westborough, MA, 2002, pp. 391–411.
[117] A.M. Hassell, G. An, R.K. Bledsoe, J.M. Bynum, H.L. Carter III, S.J. Deng, R.T. Gampe, T.E. Grisard, K.P. Madauss, R.T. Nolte, W.J. Rocque, L. Wang, K.L. Weaver, S.P. Williams, G.B. Wisely, R. Xu, L.M. Shewchuk, Crystallization of protein–ligand complexes, Acta Crystallogr. D Biol. Crystallogr. 63 (2007) 72–79.
[118] R.K. Bledsoe, V.G. Montana, T.B. Stanley, C.J. Delves, C.J. Apolito, D.D. McKee, T.G. Consler, D.J. Parks, E.L. Stewart, T.M. Willson, M.H. Lambert, J.T. Moore, K.H. Pearce, H.E. Xu, Crystal structure of the glucocorticoid receptor ligand binding domain reveals a novel mode of receptor dimerization and coactivator recognition, Cell 110 (2002) 93–105.
[119] R.K. Bledsoe, K.P. Madauss, J.A. Holt, C.J. Apolito, M.H. Lambert, K.H. Pearce, T.B. Stanley, E.L. Stewart, R.P. Trump, T.M. Willson, S.P. Williams, A ligand-mediated hydrogen bond network required for the activation of the mineralocorticoid receptor, J. Biol. Chem. 280 (2005) 31283–31293.
[120] S.P. Williams, P.B. Sigler, Atomic structure of progesterone complexed with its receptor, Nature 393 (1998) 392–396.
[121] C. Mackintosh, Dynamic interactions between 14-3-3 proteins and phosphoproteins regulate diverse cellular processes, Biochem. J. 381 (2004) 329–342.
[122] T. Obsil, R. Ghirlando, D.C. Klein, S. Ganguly, F. Dyda, Crystal structure of the 14-3-3zeta:serotonin N-acetyltransferase complex. A role for scaffolding in enzyme regulation, Cell 105 (2001) 257–267.
[123] D.M. Bustos, A.A. Iglesias, Intrinsic disorder is a key characteristic in partners that bind 14-3-3 proteins, Proteins 63 (2006) 35–42.
[124] H. Sakahira, M. Enari, S. Nagata, Functional differences of two forms of the inhibitor of caspase-activated DNase, ICAD-L, and ICAD-S, J. Biol. Chem. 274 (1999) 15740–15744.
[125] L. Salwinski, C.S. Miller, A.J. Smith, F.K. Pettit, J.U. Bowie, D. Eisenberg, The Database of Interacting Proteins: 2004 update, Nucleic Acids Res. 32 (2004) D449–D451.
[126] A. Ceol, A. Chatr-aryamontri, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli, G. Cesareni, MINT, the molecular interaction database: 2009 update, Nucleic Acids Res. 38 (2010) D532–D539.
[127] B. Aranda, P. Achuthan, Y. Alam-Faruque, I. Armean, A. Bridge, C. Derow, M. Feuermann, A.T. Ghanbarian, S. Kerrien, J. Khadake, J. Kerssemakers, C. Leroy, M. Menden, M. Michaut, L. Montecchi-Palazzi, S.N. Neuhauser, S. Orchard, V. Perreau, B. Roechert, K. van Eijk, H. Hermjakob, The IntAct molecular interaction database in 2010, Nucleic Acids Res. 38 (2010) D525–D531.
[128] B.J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, M. Livstone, R. Oughtred, D.H. Lackner, J. Bahler, V. Wood, K. Dolinski, M. Tyers, The BioGRID Interaction Database: 2008 update, Nucleic Acids Res. 36 (2008) D637–D640.
[129] P. Pagel, S. Kovac, M. Oesterheld, B. Brauner, I. Dunger-Kaltenbach, G. Frishman, C. Montrone, P. Mark, V. Stumpflen, H.W. Mewes, A. Ruepp, D. Frishman, The MIPS mammalian protein–protein interaction database, Bioinformatics 21 (2005) 832–834.
[130] I. Vastrik, P. D’Eustachio, E. Schmidt, G. Gopinath, D. Croft, B. de Bono, M. Gillespie, B. Jassal, S. Lewis, L. Matthews, G. Wu, E. Birney, L. Stein, Reactome: a knowledge base of biologic pathways and processes, Genome Biol. 8 (2007) R39.
[131] L. Matthews, G. Gopinath, M. Gillespie, M. Caudy, D. Croft, B. de Bono, P. Garapati, J. Hemish, H. Hermjakob, B. Jassal, A. Kanapin, S. Lewis, S. Mahajan, B. May, E. Schmidt, I. Vastrik, G. Wu, E. Birney, L. Stein, P. D’Eustachio, Reactome knowledgebase of human biological pathways and processes, Nucleic Acids Res. 37 (2009) D619–D622.
[132] T.S. Keshava Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D. Telikicherla, R. Raju, B. Shafreen, A. Venugopal, L. Balakrishnan, A. Marimuthu, S. Banerjee, D.S. Somanathan, A. Sebastian, S. Rani, S. Ray, C.J. Harrys Kishore, S. Kanth, M. Ahmed, M.K. Kashyap, R. Mohmood, Y.L. Ramachandra, V. Krishna, B.A. Rahiman, S. Mohan, P. Ranganathan, S. Ramabadran, R. Chaerkady, A. Pandey, Human protein reference database–2009 update, Nucleic Acids Res. 37 (2009) D767–D772.

[133] K.R. Brown, I. Jurisica, Online predicted human interaction database, Bioinformatics 21 (2005) 2076–2082.
[134] M.D. McDowall, M.S. Scott, G.J. Barton, PIPs: human protein–protein interaction prediction database, Nucleic Acids Res. 37 (2009) D651–D656.
[135] K. Xia, D. Dong, J.D. Han, IntNetDB v1.0: an integrated protein–protein interaction network database generated by a probabilistic model, BMC Bioinformatics 7 (2006) 508.
[136] L.J. Jensen, M. Kuhn, M. Stark, S. Chaffron, C. Creevey, J. Muller, T. Doerks, P. Julien, A. Roth, M. Simonovic, P. Bork, C. von Mering, STRING 8 – a global view on proteins and their functional interactions in 630 organisms, Nucleic Acids Res. 37 (2009) D412–D416.
[137] J.Y. Chen, S. Mamidipalli, T. Huan, HAPPI: an online database of comprehensive human annotated and predicted protein interactions, BMC Genomics 10 (Suppl 1) (2009) S16.
[138] P.M.J. Burgers, Overexpression of multisubunit replication factors in yeast, Methods: A Companion to Methods in Enzymology 18 (1999) 349–355.
[139] A.S. Belyaev, P. Roy, Development of baculovirus triple and quadruple expression vectors: co-expression of three or four bluetongue virus proteins and the synthesis of bluetongue virus-like particles in insect cells, Nucleic Acids Res. 21 (1993) 1219–1223.
[140] A. Hanzlowsky, B. Jelencic, G. Jawdekar, C.S. Hinkley, J.H. Geiger, R.W. Henry, Co-expression of multiple subunits enables recombinant SNAPC assembly and function for transcription by human RNA polymerases II and III, Protein Expr. Purif. 48 (2006) 215–223.
[141] M. Nekrasov, B. Wild, J. Muller, Nucleosome binding and histone methyltransferase activity of Drosophila PRC2, EMBO Rep. 6 (2005) 348–353.
[142] J.M. Studts, K.H. Mitchell, J.D. Pikus, K. McClay, R.J. Steffan, B.G. Fox, Optimized expression and purification of toluene 4-monooxygenase hydroxylase, Protein Expr. Purif. 20 (2000) 58–65.
[143] A. Hierro, J. Kim, J.H. Hurley, Polycistronic expression and purification of the ESCRT-II endosomal trafficking complex, Methods Enzymol. 403 (2005) 322–332.
[144] D. Neumann, M. Suter, R. Tuerk, U. Riek, T. Wallimann, Co-expression of LKB1, MO25alpha and STRADalpha in bacteria yield the functional and active heterotrimeric complex, Mol. Biotechnol. 36 (2007) 220–231.
[145] R.B. Kirkpatrick, P.J. McDevitt, R.E. Matico, S. Nwagwu, S.H. Trulli, J. Mao, D.D. Moore, A.F. Yorke, M.M. McLaughlin, K.A. Knecht, L.C. Elefante, A.S. Calamari, J.A. Fornwald, J.J. Trill, Z.L. Jonak, J. Kane, P.S. Patel, G.M. Sathe, A.R. Shatzman, P.M. Tapley, K.O. Johanson, A bicistronic expression system for bacterial production of authentic human interleukin-18, Protein Expr. Purif. 27 (2003) 279–292.
[146] A.Y. Le Feuvre, C. Dantas-Barbosa, V. Baldin, O. Coux, High yield bacterial expression and purification of active recombinant PA28alphabeta complex, Protein Expr. Purif. 64 (2009) 219–224.
[147] S. Svensson, T. Ostberg, M. Jacobsson, C. Norstrom, K. Stefansson, D. Hallen, I.C. Johansson, K. Zachrisson, D. Ogg, L. Jendeberg, Crystal structure of the heterodimeric complex of LXRalpha and RXRbeta ligand-binding domains in a fully agonistic conformation, EMBO J. 22 (2003) 4625–4633.
[148] R. Kirnbauer, J. Taub, H. Greenstone, R. Roden, M. Durst, L. Gissmann, D.R. Lowy, J.T. Schiller, Efficient self-assembly of human papillomavirus type 16 L1 and L1–L2 into virus-like particles, J. Virol. 67 (1993) 6929–6936.
[149] M.E. Hagensee, N. Yaegashi, D.A. Galloway, Self-assembly of human papillomavirus type 1 capsids by expression of the L1 protein alone or by coexpression of the L1 and L2 capsid proteins, J. Virol. 67 (1993) 315–322.
[150] W.J. Chen, J.F. Moomaw, L. Overton, T.A. Kost, P.J. Casey, High level expression of mammalian protein farnesyltransferase in a baculovirus system. The purified protein contains zinc, J. Biol. Chem. 268 (1993) 9675–9680.
[151] C. Bieniossek, I. Berger, Towards eukaryotic structural complexomics, J. Struct. Funct. Genomics 10 (2009) 37–46.
[152] A. Bertolotti-Ciarlet, M. Ciarlet, S.E. Crawford, M.E. Conner, M.K. Estes, Immunogenicity and protective efficacy of rotavirus 2/6-virus-like particles produced by a dual baculovirus expression vector and administered intramuscularly, intranasally, or orally to mice, Vaccine 21 (2003) 3885–3900.
[153] J. Bramson, M. Hitt, W.S. Gallichan, K.L. Rosenthal, J. Gauldie, F.L. Graham, Construction of a double recombinant adenovirus vector expressing a heterodimeric cytokine: in vitro and in vivo production of biologically active interleukin-12, Hum. Gene Ther. 7 (1996) 333–342.
[154] T. Kokuho, S. Watanabe, Y. Yokomizo, S. Inumaru, Production of biologically active, heterodimeric porcine interleukin-12 using a monocistronic baculoviral expression system, Vet. Immunol. Immunopathol. 72 (1999) 289–302.
[155] O.K. Dzivenu, H.H. Park, H. Wu, General co-expression vectors for the overexpression of heterodimeric protein complexes in Escherichia coli, Protein Expr. Purif. 38 (2004) 1–8.
[156] P.K. Chanda, W.A. Edris, J.D. Kennedy, A set of ligation-independent expression vectors for co-expression of proteins in Escherichia coli, Protein Expr. Purif. 47 (2006) 217–224.
[157] A. Clements, K. Johnston, J.M. Mazzarelli, R.P. Ricciardi, R. Marmorstein, Oligomerization properties of the viral oncoproteins adenovirus E1A and human papillomavirus E7 and their complexes with the retinoblastoma protein, Biochemistry 39 (2000) 16033–16045.
[158] K.D. Ridge, S.S. Lee, N.G. Abdulaev, Examining rhodopsin folding and assembly through expression of polypeptide fragments, J. Biol. Chem. 271 (1996) 7860–7867.

[159] S. Nottebaum, L. Tan, D. Trzaska, H.C. Carney, R.O. Weinzierl, The RNA polymerase factory: a robotic in vitro assembly platform for high-throughput production of recombinant protein complexes, Nucleic Acids Res. 36 (2008) 245–252.
[160] R. Zuniga, S. Sengupta, C. Snyder, O. Leon, M.J. Roth, Expression of the C-terminus of HIV-1 reverse transcriptase p66 and p51 subunits as a single polypeptide with RNase H activity, Protein Eng. Des. Sel. 17 (2004) 581–587.
[161] K. Nakajima, T. Asakura, J. Maruyama, Y. Morita, H. Oike, A. Shimizu-Ibuka, T. Misaka, H. Sorimachi, S. Arai, K. Kitamoto, K. Abe, Extracellular production of neoculin, a sweet-tasting heterodimeric protein with taste-modifying activity, by Aspergillus oryzae, Appl. Environ. Microbiol. 72 (2006) 3716–3723.
[162] L. Stols, M. Zhou, W.H. Eschenfeldt, C.S. Millard, J. Abdullah, F.R. Collart, Y. Kim, M.I. Donnelly, New vectors for co-expression of proteins: structure of Bacillus subtilis ScoAB obtained by high-throughput protocols, Protein Expr. Purif. 53 (2007) 396–403.
[163] K. Johnston, A. Clements, R.N. Venkataramani, R.C. Trievel, R. Marmorstein, Coexpression of proteins in bacteria using T7-based expression plasmids: expression of heteromeric cell-cycle and transcriptional regulatory complexes, Protein Expr. Purif. 20 (2000) 435–443.
[164] S. Fribourg, C. Romier, S. Werten, Y.G. Gangloff, A. Poterszman, D. Moras, Dissecting the interaction network of multiprotein complexes by pairwise coexpression of subunits in E. coli, J. Mol. Biol. 306 (2001) 363–373.

[165] G.D. Gupta, R.D. Makde, R.P. Kamdar, J.S. D’Souza, M.G. Kulkarni, V. Kumar, B.J. Rao, Co-expressed recombinant human Translin–Trax complex binds DNA, FEBS Lett. 579 (2005) 3141–3146.
[166] J. Maynard, E.J. Adams, M. Krogsgaard, K. Petersson, C.W. Liu, K.C. Garcia, High-level bacterial secretion of single-chain alphabeta T-cell receptors, J. Immunol. Methods 306 (2005) 51–67.
[167] S.S. Taremi, B. Beyer, M. Maher, N. Yao, W. Prosise, P.C. Weber, B.A. Malcolm, Construction, expression, and characterization of a novel fully activated recombinant single-chain hepatitis C virus protease, Protein Sci. 7 (1998) 2143–2149.
[168] D.L. Sali, R. Ingram, M. Wendel, D. Gupta, C. McNemar, A. Tsarbopoulos, J.W. Chen, Z. Hong, R. Chase, C. Risano, R. Zhang, N. Yao, A.D. Kwong, L. Ramanathan, H.V. Le, P.C. Weber, Serine protease of hepatitis C virus expressed in insect cells as the NS3/4A complex, Biochemistry 37 (1998) 3392–3401.
[169] V. Garcia-Campayo, A. Sato, B. Hirsch, T. Sugahara, M. Muyan, A.J. Hsueh, I. Boime, Design of stable biologically active recombinant lutropin analogs, Nat. Biotechnol. 15 (1997) 663–667.
[170] Y.H. Sung, J. Shin, H.J. Chang, J.M. Cho, W. Lee, Solution structure, backbone dynamics, and stability of a double mutant single-chain monellin. Structural origin of sweetness, J. Biol. Chem. 276 (2001) 19624–19630.
[171] R.A. Hallewell, I. Laria, A. Tabrizi, G. Carlin, E.D. Getzoff, J.A. Tainer, L.S. Cousens, G.T. Mullenbach, Genetically engineered polymers of human CuZn superoxide dismutase. Biochemistry and serum half-lives, J. Biol. Chem. 264 (1989) 5260–5268.