CURRENT TECHNIQUES
Cloning by function: expression cloning in mammalian cells
Henrik Simonsen and Harvey F. Lodish
In cloning the cDNA encoding a mammalian protein, the most important decision is the selection of a suitable strategy. Until recently, cloning required prior purification of the protein, and the corresponding cDNA was isolated by one of two approaches: (1) Micro-sequencing of all or parts of the purified protein was used to generate nucleic acid probes for screening of cDNA libraries by hybridization or polymerase chain reaction (PCR). (2) The purified protein was used to generate antisera used in immune screening of phage lambda cDNA expression libraries. Both methods depended on identification of DNA sequences, rather than on functional proteins. Generally, the isolated clones were not full length, and hence not immediately available for functional studies. In addition, the clones isolated may have encoded an irrelevant protein that, fortuitously, shared some amino acid sequences with the desired protein.
The first generally applicable approach for functional cloning of mammalian receptor, channel, and transport proteins involved microinjection of mRNA into oocytes from the frog Xenopus laevis. The biological properties of the clones could then be analysed by electrophysiological techniques. This method relied on the ability of the oocytes to correctly process the protein to the cell surface and to transduce the intracellular signals generated by it. Other limitations of this approach were that hundreds of eggs must be individually microinjected and analysed in order to isolate pools of cDNAs that encode the desired protein, and that the eggs display seasonal variations in their properties. The procedure, nevertheless, has been quite successful for the isolation of functional cDNA clones (Ref. 1).
© 1994, Elsevier Science Ltd
In recent years, overexpression of cDNA libraries in mammalian host cells has become an extremely powerful method for isolating cDNAs encoding novel proteins using functional assays. Mammalian cells are essential for cloning in situations where appropriate post-translational processing of the target molecule, such as glycosylation or expression on the cell surface, is required for function. These cloning systems involve transient expression of exceptionally high levels of protein, leading to manageable signal-to-noise ratios.
Host cells and cloning vectors
All shuttle vectors used for transient overexpression in mammalian cells contain four basic elements: an origin for DNA replication in bacterial cells, a genetic marker for selection of plasmids in bacteria, a constitutively active mammalian promoter in front of the cloning site into which the cDNA library is inserted, and an origin of replication of a mammalian virus (Fig. 1a). The matching mammalian host cells have been engineered to constitutively express the corresponding viral nuclear antigen that catalyses replication of DNAs containing the origin of viral replication (Table 1). COS cells, the most frequently used hosts, are modified monkey kidney epithelial cells that constitutively express the simian virus 40 (SV40) large T nuclear antigen. Large T antigen causes high-level replication of transfected plasmids containing the SV40 origin of replication. Two systems, the SV40 large T antigen boost system (Ref. 2) and the EBO-pcD-XN vector (Ref. 3), deliver the nuclear antigen along with the cDNA to be screened to the host cell, thus obviating the need for a particular cell line constitutively expressing viral nuclear antigen. This high replication level has two important consequences:
(1) The large number of plasmids in the transfected cells allows the recovery of the plasmids from selected positive COS cells. (2) A correspondingly large amount of the protein encoded by the inserted cDNA (typically about 10^6 molecules per cell) is synthesized. The large overexpression of proteins encoded by the insert cDNAs makes the system transient, since the majority of the biosynthetic apparatus in the transfected cell is diverted to the production of the heterologous protein. Optimally, the host cells are used in a screening assay 48-72 h post-transfection.
Proteins cloned in mammalian cells by overexpression
Secreted proteins
A large variety of molecules have been isolated by mammalian expression cloning (Table 2). The system was first used by Wong and coworkers to isolate cDNAs encoding the secreted cytokine granulocyte-macrophage colony-stimulating factor (GM-CSF) (Ref. 4): COS cells were transfected with pools of cDNAs prepared from a cell line expressing GM-CSF activity, and aliquots of the medium from the transfected COS cells were used in assays for colony formation by suitable responding cells. Positive pools of cDNA were maintained in bacteria, subdivided, and re-screened until individual clones encoding GM-CSF were isolated. Subsequently, many secreted growth factors have been isolated by this approach.
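The subdivide-and-re-screen cycle just described (often called sib selection) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the protocol of Wong and coworkers: `pool_is_positive` is a hypothetical stand-in for the whole wet-lab step of transfecting a pool and assaying the medium, the assay is assumed to be noise-free, and the names `sib_select` and `n_subpools` are invented for this example.

```python
from typing import Callable, Sequence, Tuple


def sib_select(
    clones: Sequence[str],
    pool_is_positive: Callable[[Sequence[str]], bool],
    n_subpools: int = 10,
) -> Tuple[str, int]:
    """Narrow a positive pool down to one clone by repeated subdivision.

    `pool_is_positive` is a hypothetical stand-in for transfecting the pool
    into host cells and running the functional assay on them.
    Returns the isolated clone and the number of subdivision rounds used.
    """
    pool = list(clones)
    if not pool_is_positive(pool):
        raise ValueError("starting pool gives no signal in the assay")
    rounds = 0
    while len(pool) > 1:
        k = min(n_subpools, len(pool))
        # Deal the clones into k sub-pools (every k-th clone goes together).
        subpools = [pool[i::k] for i in range(k)]
        # Keep the first sub-pool that still scores positive.
        pool = next(sp for sp in subpools if pool_is_positive(sp))
        rounds += 1
    return pool[0], rounds


if __name__ == "__main__":
    library = [f"clone_{i:04d}" for i in range(1000)]
    target = "clone_0637"  # assumed: exactly one positive clone in the pool
    clone, rounds = sib_select(library, lambda pool: target in pool)
    print(clone, rounds)
```

With ten sub-pools per round, the number of rounds grows only with the logarithm of the starting pool size, which is why the signal obtainable from the initial pool, rather than the subdivision itself, tends to be the limiting factor.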
Cell-surface molecules
A refinement to the method involved combining overexpression of cell-surface proteins in COS cells with an immunoselection procedure called panning (Refs 5,6). Transfected COS cells expressing the desired cell-surface antigen were specifically retained on dishes coated with the corresponding monoclonal antibody, and their content of plasmid DNA was extracted by the Hirt procedure (Ref. 7). These cDNAs were transformed into bacteria, and used in subsequent
TiPS - December 1994 (Vol. 15)
Fig. 1. The principle of mammalian expression cloning. a: The shuttle vectors used in transient overexpression contain an origin of viral DNA replication. When transfected into mammalian host cells expressing the matching viral nuclear antigen, the shuttle vector is replicated to high copy numbers. Constitutive transcription from the promoter leads to overexpression of the protein encoded by the cDNA insert, which enhances the signal detected by the screening assay. b: To isolate cDNAs for the desired molecule, host cells are transfected with pools of cDNA, and the host cells are then subjected to a screening assay. Plasmid cDNA pools giving rise to a positive signal are maintained in bacteria, iteratively subdivided, and then re-screened by transfection in the host cells, until eventually a single clone is obtained. Alternatively, plasmids are extracted from transfected host cells selected for the expression of the desired protein, amplified in bacteria, and subjected to secondary rounds of screening until eventually a single clone is obtained.
rounds of panning until pure clones were isolated. Other methods for selecting positive cells, such as panning transfected cells on immobilized ligand (Refs 3,8) and fluorescence-activated cell sorting, have since been used to isolate cDNAs encoding cell-surface antigens, receptors (Ref. 9), and a fatty-acid transport protein (Ref. 10).
Receptors
Further enhancements have since been introduced (Ref. 11). Pools of cDNAs encoding the murine erythropoietin receptor were detected by adding radioiodinated erythropoietin to monolayers of transfected COS cells, incubating to allow hormone binding and endocytosis, and then counting the entire culture dish. This approach, while successful, suffered from a poor signal-to-noise ratio: the signal from a positive pool of 200 to
1000 cDNAs was about 15% above the nonspecific background. Furthermore, since the screening procedure analysed the signal from a pool rather than from an individual cell, care had to be exercised in the calculation of pool size because the detected signal would depend on the relative abundance of the erythropoietin receptor cDNA in a pool. The approach was modified by detecting the radioactive signal microscopically in situ using emulsion autoradiography (Ref. 12). Using this procedure, individual positive cells, which express approximately 10” cell-surface receptors, were detected in the midst of thousands of negative cells. This signal enhancement allows the screening of very large pools of cDNAs, typically 10^4 clones per pool, depending on the transfection efficiency. Specifically, if a pool of 10^4 clones contains one positive cDNA, and if the efficiency of COS cell transfection is 20%, then a dish of 3 × 10^5 COS cells will contain about six positive cells that are visualized by ligand binding followed by autoradiography. Even if COS cells endogenously express the desired receptor, it is still possible to clone it by expression in these cells. For instance, COS cells express on their surface approximately 5 X lOI receptors for transforming growth factor β (TGF-β), yet one can easily detect cells expressing lG+ TGF-β receptors above this background. This property was essential to the expression cloning of the types II and III TGF-β receptors (Refs 13,14).
Intracellular proteins
Other modifications of this technique have made use of immunostaining procedures to isolate cDNAs
Table 1. Host cells used in mammalian expression cloning

Host cell system   Origin                          Nuclear antigen        Vector copy number
CHOP               Chinese hamster ovary           Polyoma virus          n.a.
CHOH               Chinese hamster ovary           Hamster papova virus   n.a.
COS                Monkey kidney epithelial CV-1   Simian virus 40        10000-100000
CV-1/EBNA          Monkey kidney epithelial CV-1   Epstein-Barr virus     n.a.
KF3027             Murine B16 melanoma             Polyoma virus          n.a.
MOP                Murine NIH/3T3 fibroblasts      Polyoma virus          100-10000
WOP                Murine NIH/3T3 fibroblasts      Polyoma virus          1000-10000
293                Human fibroblasts               Adenovirus             n.a.
SV40 T boost       Most mammalian cells (a)        Simian virus 40        n.a.
EBO-pcD-XN         Most mammalian cells            Epstein-Barr virus     n.a.

(a) Simian virus 40 (SV40) large T antigen-driven plasmid replication works poorly in murine cells. n.a., not available.
encoding intracellular antigens (Ref. 15), of uptake assays to clone transporter cDNAs (Refs 10,16,17) and of sensitive DNA gel-shift assays to clone the erythroid transcription factor GATA-1 (Ref. 18). A particularly clever modification to the method was the construction of a soluble CD27 receptor that was used as a probe in 'reverse-binding' assays for the isolation of the cDNA encoding its cell-associated ligand (Ref. 19).
Identifying a source of mRNA
A typical mammalian cell contains about 10^10 protein molecules, encoded by a population of 10^6 mRNAs of 10^4 to 3 × 10^4 different species. Rare messages are present in only a few copies per cell or, if transcription is cell-cycle dependent, on average in less than one copy per cell. Thus, in expression cloning, one frequently must efficiently screen millions of cDNA clones in an attempt to isolate the single desired one. Practical limitations in the screening procedures, therefore, make it very important to select a cell source that expresses high levels of the desired protein. It is often possible to induce specific messages by stimulation with phorbol esters (Ref. 20) or with hormones, or by triggering cell differentiation. However, screening many different cell lines with an appropriate assay, and selecting the highest expressor, is generally the easiest way of enriching a desired mRNA. Tissues contain many different cell types, only a few of which are expected to express the desired target. Thus, a cDNA library made from a tissue is generally not practical. It is often possible to derive an estimate of the relative abundance of the desired mRNA (and hence the number of clones to be screened) if the level of the corresponding protein can be determined. Abundantly expressed receptors, such as the epidermal growth factor receptor, are typically present in 10^5 copies per cell surface, or approximately one part in 10^5 of total cell protein. Receptors expressed at an intermediate level, such as the insulin receptor, are present at 10^4 copies per cell surface. Only 10^3 copies of the erythropoietin receptor are expressed on the cell surface. Importantly, only 1% of cellular erythropoietin receptors are found on the plasma membrane; the remainder are in the endoplasmic reticulum, Golgi complex, or endosomes (Ref. 21). Thus, the level of the receptor, and presumably of the homologous mRNA, is approximately 100-fold higher than might have been anticipated from ligand-binding studies.
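The abundance argument above can be turned into a rough estimate of how many clones must be screened. The sketch below is a minimal illustration, assuming the figures in the text read as 10^10 total protein molecules per cell, 10^5 surface copies for an abundant receptor and 10^3 surface copies (1% of the cellular total) for the erythropoietin receptor. It also assumes, as the text does only tentatively, that mRNA abundance tracks protein abundance. The formula N = ln(1 - P) / ln(1 - f) is the standard library-representation estimate and is an addition here, not something taken from the article; the function names are hypothetical.

```python
import math

TOTAL_PROTEINS_PER_CELL = 1e10  # assumed reading of the figure given in the text


def target_fraction(surface_copies, surface_fraction=1.0,
                    total_proteins=TOTAL_PROTEINS_PER_CELL):
    """Fraction of total cell protein contributed by the target.

    surface_fraction is the share of the cellular pool found on the plasma
    membrane (0.01 for the erythropoietin receptor, per the text).
    """
    if not 0 < surface_fraction <= 1:
        raise ValueError("surface_fraction must lie in (0, 1]")
    return (surface_copies / surface_fraction) / total_proteins


def clones_to_screen(fraction, confidence=0.99):
    """Clones needed to include at least one target clone with given confidence.

    Assumes clone frequency in the library equals `fraction` (mRNA abundance
    proportional to protein abundance), which is only an approximation.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log1p(-fraction))


if __name__ == "__main__":
    egf = target_fraction(1e5)                         # abundant receptor
    epo = target_fraction(1e3, surface_fraction=0.01)  # mostly intracellular
    for name, f in (("EGF receptor", egf), ("EPO receptor", epo)):
        print(name, f, clones_to_screen(f))
```

Under these assumptions both receptors come out at roughly one part in 10^5, so several hundred thousand clones would have to be screened for a 99% chance of including the target, in line with the remark that millions of clones must often be screened for rarer messages.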
The screening procedure
The crux of any screening procedure is to determine whether a given pool of cDNAs is positive, that is, whether it contains at least one cDNA encoding the protein of interest. For cell-surface receptors, binding of radiolabelled ligand to monolayers of transfected cells followed by emulsion autoradiography offers the highest sensitivity, and allows the screening of large cDNA pools, typically 10^4 clones per pool (Fig. 1b). This is a very laborious procedure, but is generally the most applicable. Selection of positive transfected host cells and subsequent extraction of their content of plasmid DNA for rescreening is particularly useful where monoclonal antibodies are available that bind to the desired cell-surface protein but not to the host cells. Panning of the cells can be used (Ref. 5), or, if the probe can be made fluorescent, flow cytometry can be used for sorting (Refs 9,10). Although the Hirt procedure for harvesting plasmid DNAs from eukaryotic cells is efficient, it has inherent problems: for poorly understood reasons, plasmids that have replicated in COS cells often become damaged and emerge from the cells as smaller DNAs. These consist chiefly of the vector sequences necessary for plasmid replication, and have an advantage in DNA replication. Hence, the desired cDNA clone may not be enriched during subsequent rounds of screening. This problem can be overcome by using alternative DNA transfection procedures, such as protoplast fusion (Ref. 5), that allow transfection of only one or very few plasmids per host cell. Alternatively, one can conduct one round of transfection, selection, and Hirt extraction, generate pools of plasmids, maintain them
Table 2. Different classes of proteins cloned by functional expression in mammalian cells

Type of target         Localization            Name                                        Assay                                 Isolation               Refs
Antigen                Surface                 CD2                                         Antibody panning                      Hirt                    5
                       Lysosome                CD63                                        Immunoreactivity                      Hirt                    15
Ligand                 Secreted                GM-CSF                                      Colony formation                      Sib                     4
                       Surface                 CD27 ligand                                 Reverse binding and autoradiography   Sib                     19
                                               Fas ligand                                  Reverse binding and panning           Hirt                    20
Receptor               Surface                 Erythropoietin receptor                     Binding and counting                  Sib                     11
                                               GM-CSF receptor                             Binding and autoradiography           Sib                     12
                                               Types II and III TGF-β receptor             Binding and autoradiography           Sib                     13,14
                                               γ-Interferon receptor                       Binding and autoradiography           Hirt and Sib            23
                                               Interleukin 6 receptor                      Binding and FACS                      Hirt                    9
                                               FGF-binding heparan sulphate proteoglycan   Ligand panning                        Hirt                    3
Transporter            Surface                 Mevalonate transporter                      Uptake and counting                   Sib                     17
                                               5-HT transporter                            Uptake and autoradiography            Sib and hybridization   16
                                               Fatty-acid transporter                      Uptake and FACS                       Hirt                    10
Enzyme                 Lysosome                Fucosyl transferase                         Panning (a)                           Hirt                    27
                       Endoplasmic reticulum   17β-hydroxysteroid dehydrogenase 2          Enzyme activity                       Sib                     28
Transcription factor   Nucleus                 GATA-1                                      Gel shift                             Sib                     18

Hirt, plasmid extraction from positive cells; Sib, subdivision of positive cDNA pools (for example, sib selection); Reverse binding, binding of soluble receptor to cells. (a) Indirect assay: fucosyl transferase allows the surface expression of immunoreactive ELAM-1 ligand. FACS, fluorescence-activated cell sorting; FGF, fibroblast growth factor; GM-CSF, granulocyte-macrophage colony-stimulating factor; TGF-β, transforming growth factor β.
exclusively in bacteria, and then screen and subdivide these pools until a single clone is obtained (Ref. 23). For the cloning of secreted proteins, biological screening of the medium from dishes of cells that are transfected with pools of cDNAs is the most straightforward approach. The difficulty is to determine the size of a pool of cDNAs that will generate a sufficient signal. Control experiments are particularly useful in which a previously cloned cDNA, encoding a related secreted protein, is diluted with a cDNA encoding an irrelevant protein.
Many control experiments enhance the probability of success. The efficiency of transfection in each round of screening should be determined by performing a parallel transfection of a control cDNA inserted in the same cloning vector. In Fig. 2a, a β-galactosidase cDNA has been used. Using an in situ staining procedure, the fraction of transfected cells that express this enzyme can be determined; ideally, the transfection efficiency should be 10 to 25%. The integrity of the screening assay (whether it is ligand binding or transport) should be verified each time that it is used. This is another reason to use cell lines, and not tissues, as the source of mRNA for the cDNA library. The screening is typically done on live cells, and a parallel incubation of the cell line from which the mRNA was isolated assures the functionality of the assay. Where control cell lines are unavailable, cDNAs with analogous function can be transfected into the host cells and assayed under similar conditions. In the case of receptors detected by single-cell autoradiography, a cloned cDNA encoding an isoform of the receptor, or a receptor for a related hormone, is a useful control (Fig. 2b). To distinguish artefacts from true positive pools and to demonstrate the specificity of the signal, one should rescreen apparently positive cDNA pools (Fig. 1b) using radiolabelled ligand together with an excess of the same or of unrelated ligands as inhibitors.
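The measured transfection efficiency feeds directly into the pool-size arithmetic used for single-cell autoradiography. The sketch below is a hedged illustration rather than the authors' calculation: it assumes a pool of 10^4 clones and a dish of 3 × 10^5 cells, values chosen because they reproduce the "about six positive cells" quoted in the text for 20% efficiency, and it makes the simplifying assumption that each transfected cell expresses a single clone drawn at random from the pool. The Poisson step and the function names are additions for illustration.

```python
import math


def expected_positive_cells(cells_per_dish, transfection_efficiency,
                            pool_size, positive_clones=1):
    """Mean number of positive cells on one dish.

    Simplifying assumption: every transfected cell expresses one clone
    picked uniformly at random from the pool.
    """
    transfected_cells = cells_per_dish * transfection_efficiency
    return transfected_cells * positive_clones / pool_size


def prob_at_least_one_positive(mean_positive_cells):
    """Chance of seeing one or more positive cells (Poisson approximation)."""
    return 1.0 - math.exp(-mean_positive_cells)


if __name__ == "__main__":
    # Efficiencies spanning the 10 to 25% range recommended in the text.
    for efficiency in (0.10, 0.20, 0.25):
        mean = expected_positive_cells(3e5, efficiency, 1e4)
        print(efficiency, round(mean, 1),
              round(prob_at_least_one_positive(mean), 4))
```

Under this model, halving the efficiency halves the expected number of positive cells, which is one reason to run the β-galactosidase control alongside every round of screening.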
Limitations
It should be noted that while these expression cloning techniques are powerful, they are limited to the isolation of cDNAs encoding proteins that function as a single polypeptide chain. This may be a problem with certain enzyme systems, but most receptors can be isolated by the expression cloning method, since
Fig. 2. Controls used in expression cloning. a: Photomicrograph of COS cells transfected with a vector containing the E. coli β-galactosidase gene, and stained in situ for β-galactosidase. The fraction of β-galactosidase-expressing cells is equal to the frequency of transfection. b: Photomicrograph of a monolayer of COS cells transfected with a vector containing the endothelin ET,, receptor, incubated with radiolabelled endothelin 1, and then subjected to autoradiography. The silver grains over the positive cells indicate abundant expression of endothelin receptors. Analogous receptors can be used to optimize the time of autoradiographic exposure.
virtually all oligomeric receptor complexes have at least one subunit that can bind ligand by itself. A potential pitfall with G protein-coupled receptors is that they have two affinities for ligand that depend on the interaction of the receptor with a heterotrimeric G protein: a relative shortage of the appropriate G protein in the host cell that is expressing large amounts of the transfected receptor would lead to a prevailing low-affinity state of the transfected receptor and might prevent its detection by ligand binding. This problem has been addressed by Ishihara and colleagues (Ref. 24), who used COS cells that were stably transfected with G,, as host cells for expression cloning of the secretin receptor cDNA. Another phenomenon is that the isolated clone can be a regulator of the functional expression of the desired molecule. This could be a transcriptional activator, an enzyme that catalyses processing of a precursor form of the target, or another regulatory component (Ref. 25). This has been a frequent occurrence in the attempted isolation of cell-surface antigens identified by monoclonal antibodies. The isolated clones, rather than encoding the antigen itself, were the glycotransferases that attach sugar moieties to the precursor antigen, thus forming the epitope recognized by the monoclonal antibody used as a probe (Ref. 26).
Concluding remarks
Recent developments in DNA transfer, heterologous protein expression, and screening strategies have led to the emergence of the expression cloning technique as a state-of-the-art molecular cloning method. By relying on functional assays while bypassing biochemical purification schemes, it conceptually represents the direct route from a biological activity to the cloned protein. It is possibly the most generally applicable approach for isolating the cDNA encoding a protein when little is known about it except the nature of its biological function.
Selected references
1 Masu, Y. et al. (1987) Nature 329, 836-838
2 de Chasseval, R. and de Villartay, J-P. (1992) Nucleic Acids Res. 20, 245-250
3 Kiefer, M. C., Stephans, J. C., Crawford, K., Okino, K. and Barr, P. J. (1990) Proc. Natl Acad. Sci. USA 87, 6985-6989
4 Wong, G. G. et al. (1985) Science 228, 810-815
5 Seed, B. and Aruffo, A. (1987) Proc. Natl Acad. Sci. USA 84, 3365-3369
6 Aruffo, A. and Seed, B. (1987) Proc. Natl Acad. Sci. USA 84, 8573-8577
7 Hirt, B. (1967) J. Mol. Biol. 26, 365-369
8 Xie, G. X., Miyajima, A. and Goldstein, A. (1992) Proc. Natl Acad. Sci. USA 89, 4124-4128
9 Yamasaki, K. et al. (1988) Science 241, 825-828
10 Schaffer, J. E. and Lodish, H. F. Cell (in press)
11 D'Andrea, A. D., Lodish, H. F. and Wong, G. G. (1989) Cell 57, 277-285
12 Gearing, D. P., King, J. A., Gough, N. M. and Nicola, N. A. (1989) EMBO J. 8, 3667-3676
13 Wang, X. F. et al. (1991) Cell 67, 797-805
14 Lin, H. Y., Wang, X. F., Ng-Eaton, E., Weinberg, R. A. and Lodish, H. F. (1992) Cell 68, 775-785
15 Metzelaar, M. J. et al. (1991) J. Biol. Chem. 266, 3239-3245
16 Hoffman, B. J., Mezey, E. and Brownstein, M. J. (1991) Science 254, 579-580
17 Kim, C. M., Goldstein, J. L. and Brown, M. S. (1992) J. Biol. Chem. 267, 23113-23121
18 Tsai, S-F. et al. (1989) Nature 339, 446-451
19 Goodwin, R. G. et al. (1993) Cell 73, 447-456
20 Suda, T., Takahashi, T., Golstein, P. and Nagata, S. (1993) Cell 75, 1169-1178
21 Yoshimura, A., D'Andrea, A. D. and Lodish, H. F. (1990) Proc. Natl Acad. Sci. USA 87, 4139-4143
22 Hollenbaugh, D. and Aruffo, A. (1993) in Current Protocols in Molecular Biology (Ausubel, F. M. et al., eds), pp. 6.11.1-6.11.16, John Wiley and Sons
23 Munro, S. and Maniatis, T. (1989) Proc. Natl Acad. Sci. USA 86, 9248-9252
24 Ishihara, T. et al. (1991) EMBO J. 10, 1635-1641
25 Pullman, W. E. and Bodmer, W. F. (1992) Nature 356, 529-532
26 Bast, B. J. et al. (1992) J. Cell Biol. 116, 423-435
27 Goelz, S. E. et al. (1990) Cell 63, 1349-1356
28 Wu, L. et al. (1993) J. Biol. Chem. 268, 12964-12969
TiPS - December 1994 (Vol. 15)