Trends in Biotechnology, Vol. 1, No. 4, 1983
High expression of cloned genes in E. coli and its consequences
M. J. Carrier, M. E. Nugent, W. C. A. Tacon and S. B. Primrose
Using recombinant DNA technology it is relatively easy to construct strains of bacteria which over-produce foreign proteins such as those from human, animal or viral sources. In the near future the same methodology will be used to increase the production of natural products of microorganisms by manipulating metabolic pathways, e.g. those involved in antibiotic or amino acid syntheses. This article reviews the factors which lead to over-expression of proteins in E. coli and the possible consequences when large scale culture is undertaken.
S. B. Primrose is Director of Microbiology & Process Research, Searle Research & Development, High Wycombe, U.K. M. J. Carrier, M. E. Nugent and W. C. A. Tacon are members of the Gene Expression Group.
There are two absolute requirements for the expression of a cloned gene. First, the gene must be placed downstream from a promoter sequence to permit transcription. Second, ribosomes must be able to bind to the RNA transcript and initiate translation at the start of the cloned gene sequence. Many factors can affect the efficiency of this expression, e.g. choice of promoter, the sequence of bases in the vicinity of the ribosome-binding site, the plasmid copy number, the presence of transcription terminators, the choice of bacterial host and the way in which the recombinants are grown. If we are looking for high levels of expression of a cloned gene then transcription should be maximized by use of a strong promoter. Since constitutive expression will almost certainly be detrimental to the host organism (see below), the promoter should also be controllable, i.e. it should incorporate an operator region. A variety of promoters have been used to regulate expression, the most commonly used being the E. coli lac UV5 and trp promoters and the PL promoter from coliphage lambda. Expression plasmids of still greater transcriptional strength have recently been constructed which utilize the E. coli lipoprotein (lpp) promoter or a hybrid trp-lac UV5 promoter (the tac promoter) (Ref. 2). High rates of transcription can also be attained by increasing the number of promoters which transcribe into the cloned gene. By the use of double and triple trp promoters the expression of urogastrone has been increased 3.5-fold and 4.5-fold respectively. Efficient translation depends on ribosomes recognizing the initiation codon at the start of a gene sequence. With most E. coli genes translation is initiated after a ribosome binds to a ribosome recognition site on the mRNA - the so-called Shine-Dalgarno (SD) site. This is a region which on average is 5 to 9 nucleotides upstream of the start codon. Usually it contains
part of the sequence 5'-AGGAGGTG-3' which is complementary to the 3' terminus of the 16S ribosomal RNA (-CACCUCCUA-OH). When expressing foreign genes in E. coli it is essential that such a region precedes the coding sequence. The precise location of the SD site relative to the initiation codon and the composition of the intervening mRNA can affect dramatically the level of expression. For example, Shepard et al. (Ref. 4) reduced from 11 nucleotides to 6 the distance between the SD site and the initiating AUG codon of a β-interferon gene and obtained a 100-fold increase in the level of expression. Varying this distance in this way can result in increases and decreases in the level of expression and these have been attributed to changes in mRNA secondary structure. Most of the cloning vectors in current use are based on multicopy ColE1-related plasmids such as pBR322 and pAT153. These two plasmids have copy numbers of 50 and 150 per chromosome respectively and mutants with even higher copy numbers are
Glossary
direct repeats - exact repeat of a sequence on the same DNA strand, e.g.
5' AGTCT . . . . . AGTCT . . . . . 3'
3' TCAGA . . . . . TCAGA . . . . . 5'
IS elements - movable pieces of DNA widely distributed throughout prokaryotic and eukaryotic DNA. The E. coli chromosome contains at least five different IS elements: IS1, 2, 3, 4 and 5. These IS elements can insert themselves into different regions of the chromosome or into co-resident plasmids. The mechanism of their insertion ensures that they retain a copy of themselves at the donor site.
transposons - a class of insertion sequence encoding a recognizable gene product. The most commonly occurring transposons encode antibiotic-resistance genes. Insertion sequences and transposons are important in bringing about various changes in genetic information. They cause mutations by inserting into genes and can cause deletions, duplications and insertion of DNA. By their very nature they can disseminate the genes that are encoded by them.
transposon-induced co-integrates - during transposition a co-integrate structure is made in which donor and recipient replicons are fused with a directly repeated copy of the transposon at each juncture point. Transposition is completed by recombination between the direct repeats of the transposon.
read-through - a term given to the situation in transcription where the RNA polymerase molecule continues to transcribe beyond the expected termination point.
iso-acceptor tRNAs - tRNA molecules which recognize and bind to the same codon, but not necessarily with the same efficiency.
plasmid runaway replication mutant - a plasmid which has lost the ability to control its copy number at increased growth temperatures. At 30°C mini derivatives of the runaway replication mutant of the plasmid R1 have normal copy number (10-25 copies per chromosome). At temperatures above 35°C control of replication is lost, resulting in an increase in copy number to more than 2000 copies per chromosome equivalent.
plasmid multimers - plasmid molecules normally exist in the cell as monomers but in recombination-proficient cells recombination between identical plasmid molecules can generate dimers, trimers and higher oligomers.
© 1983, Elsevier Science Publishers B.V., Amsterdam 0166-9430/83/$01.00
available (Ref. 5). The reason for the use of high copy number plasmids is simple: the more plasmid copies there are, the more RNA transcripts of the gene there are. This is important since the rate-limiting step in protein synthesis is the binding of a ribosome to the mRNA and one of the ways of speeding up the process is to increase the number of mRNA molecules. One disadvantage of the use of high copy number plasmids is that they impose a heavy energetic drain on the host cell. An alternative expression system makes use of a temperature-sensitive mutant of the low copy number plasmid R1. When the growth temperature of the host cell is increased runaway plasmid replication occurs (Refs 6, 7) and the copy number can increase several hundred fold.
A factor which has not been exploited fully is the use of transcription terminators to minimize the length of the recombinant transcript. Ideally transcription should be made to terminate shortly after the coding sequence. In some instances the use of transcription terminators is essential when using strong constitutive promoters to drive expression (Ref. 8). The E. coli strain chosen as the host for a recombinant plasmid can greatly influence the levels of expression of a cloned gene. In many instances there is, as yet, no rational explanation for this effect, but in some cases it is a reflection of the level of protease activity within the cell. Eight proteases have been detected in wild-type E. coli cells and there may be more (Ref. 9). One of these
proteases is absent in lon mutants and such mutants now are widely used. An alternative approach to the problem of proteases is to include in the cloning vector the antiprotease gene of phage T4 (Ref. 10). This phage gene product reduces proteolysis and its use can result in increased levels of expression (see front page). Finally, it should not be forgotten that the physiology of the host cell can greatly influence the levels of expression. The choice of nutrients, the way in which they are supplied to the culture and environmental parameters such as temperature, dissolved oxygen tension etc. are particularly important and a rational fermentation programme is essential if high yields of product are to be obtained. By careful attention to all the above factors it should be possible to obtain 10-20% of total cell protein as recombinant gene product when the cell yield is 50-100 g dry wt/litre.
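The closing figures of this section (10-20% of total cell protein at 50-100 g dry wt/litre) can be turned into a rough product titre. The sketch below is only an order-of-magnitude illustration: the fraction of dry weight that is protein is not given in the article, so the value of 0.55 is an assumption, and the function name is hypothetical.

```python
def recombinant_yield_g_per_litre(dry_weight_g_per_litre,
                                  recombinant_fraction,
                                  protein_fraction_of_dry_weight=0.55):
    """Estimate grams of recombinant protein per litre of culture.

    dry_weight_g_per_litre: cell yield (g dry wt/litre), from the text.
    recombinant_fraction: recombinant product as a fraction of total
        cell protein (0.10-0.20 in the text).
    protein_fraction_of_dry_weight: ASSUMED value, not from the article.
    """
    total_protein = dry_weight_g_per_litre * protein_fraction_of_dry_weight
    return total_protein * recombinant_fraction


# Lower and upper ends of the range quoted in the text.
low = recombinant_yield_g_per_litre(50, 0.10)
high = recombinant_yield_g_per_litre(100, 0.20)
print(f"roughly {low:.1f} to {high:.1f} g recombinant protein per litre")
```

Under that assumed protein content the quoted range corresponds to a few grams of product per litre; a different protein fraction scales the result proportionally.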
The effects of high expression on plasmid stability
Having maximized the expression of a particular gene it is important to consider what effects this will have on the bacterium harbouring the recombinant plasmid. Increases in the levels of expression of recombinant genes lead to reductions in cell growth rates and may result in morphological changes such as filamentation and increased cell fragility. If a mutant arises which has either lost the recombinant plasmid, or has undergone structural rearrangement such that the recombinant gene is no longer expressed, or has a reduced plasmid copy number, then this will have a faster growth rate and may quickly take over and predominate in the culture (Fig. 1).
Fig. 1. Competition between a typical slow-growing production strain and a faster-growing mutant generated from it (cell number per ml plotted against time in hours). At time zero there are 2 × 10⁶ cells/ml of the parent strain and a single mutant cell per ml. Notice that when the parent cell density is 10¹⁰ cells/ml the mutant cell density is 10⁷ cells/ml. On the next sub-culture the mutant cells will come to predominance.
The loss of plasmids due to defective partitioning is called segregative instability. Naturally occurring plasmids are stably maintained because they contain a partitioning function, par, which ensures that they are accurately segregated at each cell division. Such par regions are essential for the stability of low copy number plasmids. The higher copy number plasmid ColE1 also contains a par region, but this region is deleted in pBR322 which is segregated randomly at cell division. Although the copy number of pBR322 is high and the probability of plasmid-free cells arising is very low, under certain conditions such as nutrient limitations or during rapid host cell growth, plasmid free
cells may arise (Refs 3, 11). This problem can be obviated by maintaining antibiotic selection. However this may not be a desirable solution for large scale culture because of cost and waste disposal considerations. The par region from a plasmid such as pSC101 may be cloned into pBR322-type vectors thus stabilizing the plasmid. Plasmid instability may also arise due to the formation of multimeric forms of a plasmid. The mechanism which controls the copy number of a plasmid ensures a fixed number of plasmid origins per bacterium. Cells containing multimeric plasmids have the same number of plasmid origins but fewer plasmid molecules which leads to greater instability if these plasmids lack a partitioning function. These multimeric forms are not seen with ColE1 which has a natural method of resolving the multimers back to monomers. It contains a highly recombinogenic resolution site that resolves multimers in a way analogous to the resolution of transposon-induced cointegrates. The resolution site of ColE1 has been cloned into pBR322-type plasmids and has eliminated problems due to multimerization (Summers and Sherratt, personal communication). Structural instability of plasmids may arise by deletion, insertion or rearrangements of DNA. Some of the earliest reports of deletions were in chimeric plasmids which can replicate both in Escherichia coli and Bacillus subtilis (for reviews see Refs 12 and 13). Spontaneous deletions have now been observed in a wide range of plasmid, virus and chromosomal DNAs. A common feature of these deletions is the involvement of homologous recombination between short direct repeats (Ref. 14). Artificial plasmids with multiple tandem promoters are particularly prone to deletion formation. As well as homologous recombination between sites on a plasmid, structural instability may be mediated by insertion sequence (IS) elements or transposons resident in the host chromosome or on plasmids.
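The takeover described in the Fig. 1 caption can be checked with a little arithmetic. The sketch below assumes pure exponential growth of both strains and, for the sub-culture, that the culture is diluted by the same factor by which the parent grew (an assumption, not stated in the article). The function names are hypothetical; the cell densities are those quoted in the caption.

```python
import math


def implied_rate_ratio(parent_start, parent_end, mutant_start, mutant_end):
    """Growth-rate ratio (mutant/parent) implied by two exponential curves
    that run for the same length of time."""
    parent_doublings = math.log2(parent_end / parent_start)
    mutant_doublings = math.log2(mutant_end / mutant_start)
    return mutant_doublings / parent_doublings


def crossover_doublings(parent_start, mutant_start, rate_ratio):
    """Number of parent doublings after which mutant density equals parent
    density, given parent = P0 * 2**d and mutant = M0 * 2**(rate_ratio * d)."""
    if rate_ratio <= 1:
        raise ValueError("mutant must grow faster than the parent")
    return math.log2(parent_start / mutant_start) / (rate_ratio - 1)


# Densities (cells/ml) from the Fig. 1 caption.
ratio = implied_rate_ratio(2e6, 1e10, 1, 1e7)

# ASSUMPTION: each culture runs until the parent reaches 1e10 cells/ml and
# is then diluted back to 2e6 parent cells/ml (a 5000-fold dilution).
dilution = 1e10 / 2e6
doublings_per_culture = math.log2(dilution)

first = crossover_doublings(2e6, 1, ratio)
second = crossover_doublings(2e6, 1e7 / dilution, ratio)

print(f"implied growth-rate ratio: {ratio:.2f}")
print(f"parent doublings per culture: {doublings_per_culture:.1f}")
print(f"crossover in first culture after {first:.1f} parent doublings")
print(f"crossover in sub-culture after {second:.1f} parent doublings")
```

Under these assumptions the mutant does not catch up within the first culture, but it does overtake the parent before the end of the following sub-culture, which agrees with the statement in the caption.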
Both of these elements can cause spontaneous mutations by their insertion, adjacent deletion or inversion of DNA. There are many reports of plasmid instability due to insertion of IS elements from the chromosome. For example, transformation of tyrR strains of E. coli with multicopy plasmids carrying the tyrosine operon gave rise to modified plasmids with either insertions or deletions (Ref. 15). These effects were due to IS1 insertion. Mutant strains may also arise which result in a drop in copy number of plasmids which they harbour. These 'low cop' mutations are chromosomal in origin and may be selected against using increased antibiotic concentrations. High antibiotic concentrations, whilst maintaining a population of 'high cop' plasmids, will not prevent the emergence of a high cop plasmid which no longer produces the recombinant protein. Plasmid low cop mutants may arise which have a slower replication rate than normal. If such a mutation arose in vivo in one plasmid molecule out of a total cellular population of at least 50 (in the case of pBR322) then it is very unlikely that this mutant would ever take over in the population since the normal plasmids would replicate to compensate for the mutant and maintain the normal cellular copy number of pBR322. Under some circumstances, however, low cop plasmid mutants may be constructed in vitro.
Fig. 2. The replication control of plasmid ColE1. ROP controls the transcription of RNA II and RNA I inhibits the processing of RNA II by RNase H. The figures represent the base pair co-ordinates measured from the origin of replication.
Two negatively acting components are involved in the replication of ColE1-type plasmids. One is a 108 base pair (bp) untranslated RNA molecule called RNA I (Ref. 16) (Fig. 2), the other is a protein repressor (ROP) (Ref. 17). RNA II is an RNA molecule which is processed by RNase H to give a 555 bp primer for the initiation of ColE1 replication (Ref. 18). RNA I inhibits DNA replication by base pairing with a complementary sequence on RNA II and preventing RNA II processing by RNase H. ROP is a 63 amino acid protein which controls the initiation of transcription of RNA II. If transcription of either RNA I or ROP is increased then this may lead to a reduction in the rate of plasmid replication and hence a lowered copy number. For example, a strong promoter from coliphage T5 was cloned into a pBR322-like plasmid such that there
was strong read-through into the tetracycline-resistance gene. Without a strong terminator before the tetracycline-resistance gene there was read-through into the ROP gene and a reduction in plasmid copy number (Ref. 19). Possible read-through problems must be considered when using strong promoters. To prevent such problems of instability on scale-up it is desirable to minimize expression of the cloned gene until the organism is introduced into the final fermentation vessel. One strategy is to use controllable promoters, another is to use R1 derived runaway replication mutants. Better still, both techniques can be combined (Ref. 6).
The accuracy of translation
One of the most poorly understood aspects of high expression systems is the accuracy with which recombinant genes are translated and the subsequent stability of the product inside the cell. The normal error rate for translation in vivo is about 1 codon in 3000 read (Ref. 20). Depending on the position of the error, the resulting protein may retain some activity whilst being more or less susceptible to proteolysis. Stressing of E. coli, e.g. by increased growth temperatures or amino acid starvation, tends to increase translational errors. Cellular strategies have been developed to minimize these errors and maximize the polypeptide elongation rate e.g. optimized codon usage, proof-reading, protease activity and translational shutdown during amino acid starvation. In E. coli a clear correlation exists between codon usage, isoacceptor tRNA concentration and protein abundance (Ref. 21). Thus highly expressed proteins such as EF-Tu, recA and ompA have optimized codons whereas low abundance proteins do not. Under conditions of amino acid limitation the error frequency tends to increase significantly; indeed under even moderate tRNA imbalance
the error frequency increases. This is because for susceptible codons the time tRNA occupies the ribosomal acceptor site (A site) is increased thereby reducing the rate of elongation and increasing the probability of premature termination, missense errors or frameshifting. An artificial tRNA imbalance may be set up if a highly expressing recombinant gene contains a high proportion of infrequently used codons. Minor tRNA species which are now required in much larger quantities may be sequestered on ribosomes to such an extent that they cause a potentially serious starvation of the cognate tRNA, hence increasing errors and reducing elongation rate. Although no concrete evidence is available to support this idea there is an increasing body of supportive data which suggests optimization of codon usage is important. For example, specialized tissues which make predominantly one product, e.g. silk fibroin in Bombyx mori, show striking tRNA/codon relationships (Ref. 22). Also the addition of tRNA from one species to an in vitro translation system of another leads to increased errors. The most obvious solution is the use of artificially synthesized genes with fully optimized codons. The problem here is our lack of understanding of messenger RNA structure and its influence on rates of translation. The formation of novel codon contexts in synthetic genes may result in undesirable mRNA structures. Reduction of expression to a level where tRNA limitation of some codon is not a problem is possible, but this defeats the purpose of maximizing expression. A more drastic solution is to clone minor tRNA genes and various proteins involved in transcription and translation, e.g. EF-Tu and RNA polymerase.
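Two of the quantities discussed in this section lend themselves to a quick calculation: the share of polypeptide chains made without any error at the quoted rate of 1 codon in 3000, and the proportion of infrequently used codons in a coding sequence. The following is a minimal sketch. It assumes errors occur independently at each codon, and the rare-codon set is an illustrative list of codons commonly regarded as infrequent in E. coli; it is not taken from the article and should be replaced by a measured codon usage table for real work. All names are hypothetical.

```python
ERROR_RATE_PER_CODON = 1 / 3000  # in vivo rate quoted in the text


def error_free_fraction(n_codons, error_rate=ERROR_RATE_PER_CODON):
    """Fraction of chains translated with no error, assuming each codon
    is misread independently with the same probability."""
    return (1 - error_rate) ** n_codons


# ASSUMPTION: illustrative set of codons often described as rare in E. coli.
RARE_CODONS = frozenset(
    {"AGA", "AGG", "AUA", "CUA", "CGA", "CGG", "CCC", "GGA"}
)


def rare_codon_fraction(coding_sequence, rare=RARE_CODONS):
    """Fraction of in-frame codons that belong to the rare set.
    Accepts DNA or RNA letters; trailing incomplete codons are ignored."""
    mrna = coding_sequence.upper().replace("T", "U")
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon in rare for codon in codons) / len(codons)


# A 300-codon protein at the normal error rate.
print(f"error-free chains: {error_free_fraction(300):.3f}")

# Toy sequence: AUG AGA AGG CUG UAA, two of five codons in the rare set.
print(f"rare codon fraction: {rare_codon_fraction('ATGAGAAGGCTGTAA'):.2f}")
```

At the normal error rate roughly one chain in ten of a 300-residue protein carries at least one error under this simple model, so any increase in error frequency caused by tRNA imbalance acts on an already appreciable baseline.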
Proteolysis
Not all recombinant proteins are unstable in E. coli and at present it is not possible to predict whether a protein will be rapidly turned over. The variation in half life for so-called 'normal' proteins in E. coli is very large. For example, 7% of all proteins have a t1/2 of less than 15 min, another 20-30% are not broken down except under starvation conditions while the remainder are not turned over at all (Ref. 23). The structural features of proteins which affect their half life are not known. Various methods have been used to overcome the problem of proteolysis, e.g. the use of lon mutants or the T4 antiprotease as described above. Another is to increase expression to such an extent that the cellular proteases are swamped by substrate, thus allowing a high proportion of recombinant proteins to remain intact. If this method is adopted a relatively rapid burst of expression extending over no more than one or two generations is preferable to continuous low expression over an extended period. The classical solution has been to produce fusion polypeptides. This involves the protection of the recombinant protein by the whole or part of a host protein molecule fused to the N terminus of the recombinant protein. Thus β-galactosidase affords considerable protection for the low molecular weight polypeptide somatostatin, whose synthesis in E. coli is undetectable without the fusion leader (Ref. 24). The advantage of this method is that the host fusion leader has its own SD sequence ensuring successful initiation of translation. The disadvantage is that a fusion polypeptide is produced and if this is not acceptable, methods for converting it to the natural product must be found.
Fig. 3. Phase contrast photomicrograph of E. coli showing phase-bright intracellular inclusions of a recombinant protein.
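The half-life figure quoted in the Proteolysis section gives a feel for how quickly an unstable product disappears once synthesis stops. The sketch below assumes simple first-order decay, which the article does not state, and uses a hypothetical function name.

```python
def fraction_remaining(minutes, half_life_minutes):
    """Fraction of a protein pool still intact after a given time,
    ASSUMING first-order (exponential) degradation and no new synthesis."""
    return 0.5 ** (minutes / half_life_minutes)


# A protein at the 15 min half-life boundary mentioned in the text.
for elapsed in (15, 30, 60):
    left = fraction_remaining(elapsed, 15)
    print(f"after {elapsed} min: {left:.1%} remaining")
```

With a 15 min half-life only about 6% of the pool survives an hour, which is one way of seeing why a short, intense burst of expression is preferred to prolonged low-level expression for unstable products.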
Extraction and purification
In many instances high expression of recombinant proteins leads to the formation of high molecular weight aggregates or inclusions (Fig. 3). The inclusions appear to fall into two categories: (1) paracrystalline arrays as found in E. coli producing human insulin chains A and B or pro-insulin and in this state the protein is presumably in a stable conformation which may or may not be native; (2) amorphous aggregates which contain partially and completely denatured proteins as well as aberrant protein synthesized as a result of inaccurate translation as described above. Although inclusions probably afford protection against proteases they do present problems of extraction and purification. In most instances denaturants, e.g. SDS or urea, have to be used to extract the protein. For proteins of pharmaceutical interest, particularly if for parenteral administration, the use of detergents is undesirable since it is difficult to remove them completely. SDS and Triton are especially troublesome. If urea is used as the extractant, amination of glutamate and aspartate residues can occur. Regardless of the extraction method used the protein almost certainly will have to be renatured and this may prove difficult, if not impossible.
Final comments
From the foregoing it is clear that there is no problem in obtaining high expression of recombinant proteins in E. coli. However, culture stability can be a problem even on a small scale and the methods used to minimize/prevent it must be applicable on a large scale.
The major impact of instability is on the yield and efficiency of the fermentation process. However for pharmaceuticals which have to be manufactured under GMP (Good Manufacturing Practice) conditions the drug regulatory authorities consider instability to be undesirable. Where the recombinant protein is for industrial use, e.g. enzymes for inclusion in detergents etc., errors in the amino acid sequence introduced by mistranslation are of little importance unless they significantly decrease the specific activity. Again, however, they are of great significance for drugs intended for human use.
References
1 Hawley, D. K. and McClure, W. R. (1983) Nucl. Acids Res. 11, 2237-2255
2 de Boer, H. A., Comstock, L. J. and Vasser, M. (1983) Proc. Natl Acad. Sci. 80, 21-25
3 Nugent, M. E., Primrose, S. B. and Tacon, W. C. (1983) Develop. Ind. Microbiol. 24 (in press)
4 Shepard, H. M., Yelverton, E. and Goeddel, D. V. (1982) DNA 1, 125-131
5 Gelfand, D. H., Shepard, M., O'Farrell, P. N. and Polisky, B. (1978) Proc. Natl Acad. Sci. 75, 5869-5873
6 Sninsky, J. J., Uhlin, B. E., Gustafsson, P. and Cohen, S. N. (1981) Gene 16, 275-286
7 Uhlin, B. E. and Nordstrom, K. (1977) Plasmid 1, 1-7
8 Gentz, R., Langer, A., Chang, A. C. Y., Cohen, S. N. and Bujard, H. (1981) Proc. Natl Acad. Sci. 78, 4936-4940
9 Swamy, K. H. S. and Goldberg, A. L. (1982) J. Bacteriol. 149, 1027-1033
10 Simon, L. D., Randolph, B., Irwin, N. and Binkowski, G. (1983) Proc. Natl Acad. Sci. 80, 2059-2062
11 Jones, I. M., Primrose, S. B., Robinson, A. and Ellwood, D. C. (1980) Mol. Gen. Genet. 180, 579-584
12 Kreft, J. and Hughes, C. (1982) in Gene Cloning in Organisms other than E. coli (P. H. Hofschneider and W. Goebel, eds), pp. 1-17, Springer Verlag
13 Ehrlich, S. D., Niaudet, B. and Michel, B. (1982) in Gene Cloning in Organisms other than E. coli (P. H. Hofschneider and W. Goebel, eds), pp. 19-29, Springer Verlag
14 Jones, I. W., Primrose, S. B. and Ehrlich, S. D. (1982) Mol. Gen. Genet. 188, 486-489
15 Rood, J. I., Sneddon, M. K. and Morrison, J. F. (1980) J. Bacteriol. 144, 552-559
16 Tomizawa, J. and Itoh, T. (1981) Mol. Gen. Genet. 178, 525-533
17 Cesarini, G., Muesing, M. A. and Polisky, B. (1982) Proc. Natl Acad. Sci. 79, 6313-6317
18 Tomizawa, J. and Itoh, T. (1982) Cell 31, 575-583
19 Stueber, D. and Bujard, H. (1982) EMBO Journal 1, 1399-1404
20 Edelmann, P. and Gallant, J. (1977) Cell 10, 131-137
21 Grosjean, H. and Fiers, W. (1982) Gene 18, 199-209
22 Chavancy, G., Daillie, J. and Garel, J. P. (1971) Biochimie 53, 1187-1197
23 Nath, K. and Koch, A. L. (1971) J. Biol. Chem. 246, 6956-6967
24 Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F. and Boyer, H. W. (1977) Science 198, 1056-1063

Mass transfer in fermentation
Klaas van 't Riet
Klaas van 't Riet is Professor of Food- and Bio-Process Engineering, Department of Process Engineering, Wageningen Agricultural University, De Dreijen 12, 6703 BC Wageningen, The Netherlands.
An essential consideration in the design and operation of commercial fermenters is to ensure adequate mass transfer. The complex composition of fermentation liquids makes it difficult to predict accurately the mass transfer characteristics in large vessels. Here various aspects of mass transfer are discussed and their relationships examined. Strategies for predicting the most important type of mass transfer - between gases and liquids - in large scale fermentations are presented.
The changing environment of the organism
At first sight, the relevant engineering parameters for the design of large-scale bioreactors are heat transfer, mixing, cell volume and mass transfer. Order of magnitude calculations show which is the restricting factor in most cases. These calculations are based on the application of elemental balances and the growth equations. These equations show the relationships between the engineering parameters, the growth of the organism and the concentration of nutrients. The restricting engineering parameter in each case can be estimated
by comparing sets of calculated values with literature data for fermenter engineering parameters and with the characteristics of the relevant organism. In this way 'engineering' and 'microbiology' are coupled. An aerobic organism in a stirred or bubble-column type of fermenter is used as an example. For specific applications the same procedure can be followed to check whether the result is very different.
Firstly the oxygen uptake rate, OUR (mol m⁻³ s⁻¹), of a broth of given density $C_x$ (mol m⁻³) and a given growth rate $\mu$ (s⁻¹) is calculated. The application of elemental balances for growth of biomass ($\mathrm{CH_{1.64}N_{0.16}O_{0.52}S_{0.0046}P_{0.0054}}$) on CH₂O, NH₃, H₃PO₄ and H₂SO₄ leads to (Ref. 1):
$$r_o = r_s - 1.04\, r_x \qquad (1)$$
where:
$r_o$ = oxygen consumption rate (mol s⁻¹)
$r_s$ = CH₂O consumption rate (mol s⁻¹)
$r_x$ = biomass production rate (mol s⁻¹)
By introducing the linear growth equation:
$$r_s = \frac{1}{Y_{sx}}\, r_x + m_s M_x \qquad (2)$$
where:
$Y_{sx}$ = yield of biomass on substrate (mol carbon in biomass per mol carbon in substrate)
$m_s$ = maintenance coefficient (mol mol⁻¹ s⁻¹)
$M_x$ = biomass in the reactor (mol)
Eqn (1) can be modified to:
$$\mathrm{OUR} = \left\{ \left( \frac{1}{Y_{sx}} - 1.04 \right) \mu + m_s \right\} C_x \qquad (3)$$
Fig. 1 shows the OUR values calculated from Eqn 3 for $Y_{sx}$ = 0.60 mol mol⁻¹ and $m_s$ = 10⁻⁵ mol mol⁻¹ s⁻¹, taken from Ref. 1 as 'average' values. At steady state conditions the OUR value is equal to the OTR value (oxygen transfer rate, mol m⁻³ s⁻¹). Later it will be shown that from the viewpoint of mass transfer limitations OTR = 2.8 × 10⁻² mol m⁻³ s⁻¹ (= 100 mol m⁻³ h⁻¹) must be regarded as a rather high value for many types of fermenter. This means that the growth of a culture can