GENE: An International Journal on Genes and Genomes
ELSEVIER
Gene 174 (1996) 51-58
Construction and expression of a synthetic wheat storage protein gene
Olin D. Anderson *, Joseph C. Kuhl, Angie Tam 1
Western Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, 800 Buchanan Street, Albany, CA 94710, USA
Received 7 March 1996; accepted 9 April 1996
Abstract
A synthetic wheat high-molecular-weight (HMW) glutenin storage protein gene analog was constructed for expression in E. coli. This first synthetic HMW-glutenin gene and its future modifications are intended to allow systematic dissection of the molecular basis of the role of HMW-glutenins in the visco-elastic properties critical for wheat product processing and utilization. The design of the gene included four features: different construction strategies for the separate assembly of the major polypeptide domains, the inclusion of convenient restriction sites for modifications, codon selection similar to that of highly expressed E. coli genes, and the ability to produce repetitive sequence domains with exact numbers of defined repeats. The complete synthetic HMW-glutenin construct was 1908 bp and contained 32 identical copies of one of the HMW-glutenin repetitive domain motifs. The gene expressed the novel HMW-glutenin protein at relatively high levels in bacterial cultures, and the protein exhibited the known anomalous behavior of HMW-glutenins in SDS-PAGE.
Keywords: High-molecular-weight glutenin; Repetitive domain; Bacterial expression; pET3a vector; Oligonucleotides; Polymerase chain reaction; Synthetic gene
1. Introduction
The wheat HMW-glutenin polypeptides are critical contributors to the visco-elastic properties responsible for the processing characteristics and uses of wheat doughs (Shewry et al., 1995). It is theorized that the HMW-glutenin contribution to visco-elasticity results from two main protein structural features: the location of cysteine residues (the basis of the cross-linked gluten matrix) in the terminal non-repetitive domains of the polypeptide, and a central domain composed of repeats of short peptide motifs comprising 77-84% of the polypeptide in published sequences (Shewry et al., 1992). The HMW-glutenins are one class of wheat prolamines (seed storage proteins high in proline and glutamine), and share significant features with the other wheat prolamine storage proteins: the low-molecular-weight glutenins, and the α-, γ- and ω-gliadins. The total number of prolamines produced in the wheat endosperm varies greatly, but as many as 100 distinct proteins are found. This number, combined with the similarity in their physical-chemical characteristics, makes it difficult to study the contributions of individual proteins to rheological properties.
One approach has been to express cloned wheat prolamine genes in heterologous systems from which single prolamines can be isolated and studied (HMW-glutenins in bacteria, Galili, 1989; α-gliadins in yeast, Blechl et al., 1992; α-gliadins in bacteria, Neill et al., 1987; γ-gliadins in yeast, Pratt et al., 1991). Besides the advantages microbial systems have in the rapid production of large amounts of specific proteins, these systems can be combined with recombinant DNA technology to modify polypeptides and to construct simple-structure peptides for study. Such approaches have been used to study several polypeptides related to elasticity. Goldberg et al. (1989) constructed a reading frame of 32 copies of the collagen analog peptide GPP fused to 14 N-terminal amino acids of the CII protein. McPherson et al. (1992) similarly constructed, in two steps, a gene for an elastin-like polypeptide containing 20 copies of the motif VPGVG fused to a glutathione S-transferase for efficient affinity purification of the elastin-like polypeptide. Martin et al. (1995) accomplished the complete synthesis of a 2210-bp copy of a human tropoelastin gene, assembled from 8 cloned sections each made from 6-8 complementary and overlapping oligos.
In order to allow more systematic study of the molecular basis of HMW-glutenin contributions to dough elasticity, a completely synthetic HMW-glutenin gene was constructed for expression in bacteria.

* Corresponding author. Tel. +1 510 559 5773; Fax +1 510 559 5777; E-mail: [email protected]
1 Current address: Metabolex, Inc., 3876 Bay Center Place, Hayward, CA 94545, USA.
Abbreviations: aa, amino acid(s); Ap, ampicillin; BIS, N,N'-methylenebisacrylamide; bp, base pair(s); CAI, codon adaptation index; E., Escherichia; HMW, high-molecular-weight glutenin; IPTG, isopropyl-β-D-thiogalactopyranoside; kb, kilobase(s) or 1000 bp; LB, Luria-Bertani (medium); mg, milligram; nt, nucleotide(s); oligo(s), oligodeoxyribonucleotide(s); PAGE, polyacrylamide gel electrophoresis; PCR, polymerase chain reaction; SDS, sodium dodecyl sulfate; ss, single stranded; TE, 10 mM Tris-HCl/1 mM EDTA pH 8.0; XGal, 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
0378-1119/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. PII S0378-1119(96)00315-0
2. Results and discussion
2.1. Structure of the synthetic HMW-glutenin gene
The most important structural features of the HMW-glutenins, both from theoretical structure/function considerations and for the construction of genes de novo, are the terminal, cysteine-containing domains and the central repetitive domain (Fig. 1A). The central domain is composed of 45-80 copies of short peptide motifs which are believed to form a regular higher order structure, although the exact form of the structure is not known. To understand the molecular basis of HMW-glutenin function, a strategy was developed for assembling synthetic HMW-glutenin genes with four features.
The first feature is the separate construction of fused terminal domains into which the separately constructed repetitive domains can be inserted (Fig. 1B). This simplifies the construction by allowing assembly of the two disparate sections by different strategies, and allows "mixing-and-matching" of terminal regions and repetitive domains from the eventual sets of available constructions. The construct containing the fused terminal domains was modeled on the Dx5 HMW-glutenin gene (Glu-D1-1d; Anderson et al., 1989), which is highly correlated with wheat quality parameters and has been theorized to play a central role in dough visco-elasticity (Greene et al., 1989). The DNA and amino acid sequences of the synthetic terminal construct and the Dx5 gene are shown in Fig. 2. The terminal construct does not encode the signal peptide cleaved off during protein processing in the wheat endosperm.
The second feature of the synthetic HMW-glutenin construct is the placement of restriction sites useful in modifying cysteine residue number and position. This strategy was used by Ferretti et al. (1986) in the construction of a synthetic rhodopsin gene: 72 oligos, 15-40 bp long, included 28 unique restriction sites about 60 bp apart for systematic structure/function studies. Similarly, Martin et al.
(1995) placed restriction sites at 270-300-bp intervals in their tropoelastin construct. The synthetic HMW-glutenin terminal sequence in Fig. 2 has 12 unique restriction sites (five additional sites occur in the vector but could be used, if necessary, by partial digest strategies). For example, the first 13 amino acids, including the first cysteine residue, could be modified by replacement of the NdeI-NruI DNA fragment. A site for insertion of a repetitive domain is provided by bases 340 to 357 (Fig. 2) at the XbaI and SpeI sites. When the repeat domain (see Section 2.3) is inserted into the terminal domain construct in the correct orientation, the reading frame is intact throughout the three protein domains.
The fourth feature in constructing a synthetic HMW-glutenin gene was to use codons correlated with highly expressed E. coli genes (Sharp et al., 1988). Although the significance is still unclear, it has been observed that organisms tend to use specific codon subsets for highly expressed genes (Sharp et al., 1986, 1988). When eukaryotic genes are expressed in E. coli the foreign gene may include a high percentage of codons rarely used by the host, and there is experimental evidence that the use of rare codons may prevent maximum heterologous expression (Robinson et al., 1984; Makoff et al., 1989). Martin et al. (1995) reported a significant increase in the synthesis of tropoelastin from a synthetic gene utilizing frequently used E. coli codons as compared to a human tropoelastin sequence. Thus, the appropriate codons were substituted for the naturally occurring Dx5 terminal domain codons as shown in Fig. 2. At three sites a less than optimal (moderate frequency) codon was used to create the NruI, MscI and Eco47III sites. At four sites base changes were used to create restriction sites and resulted in neutral amino acid changes: two Asp to Glu, one Glu to Asp, and one Thr to Ser exchange. Other exceptions were made in the repetitive domain, where two less frequently used proline and serine codons were substituted to reduce oligo internal annealing (see Section 2.3) and to create restriction sites. Table 1 gives the codon frequency in highly expressed E. coli proteins, and the codon distribution of the natural Dx5 gene and the planned complete synthetic HMW-glutenin gene. The codon adaptation index (CAI; Sharp and Li, 1987) for genes expressed in E. coli is a function of the distribution of codons as compared to the most commonly used codons in highly expressed E. coli genes: CAI = 1.000 would indicate a gene containing only the optimal codons. The natural Dx5 has a CAI = 0.116, a very unfavorable codon distribution, while the synthetic HMW-glutenin gene (Fig. 2) has a CAI = 0.757, consistent with highly expressed E. coli genes.

Fig. 1. HMW-glutenin protein structure. (A) Model of an HMW-glutenin protein based on the cloned Dx5 gene (Anderson et al., 1989). Open boxes are terminal, non-repetitive domains. The striped box is the repetitive domain composed of interspersed copies of variants of the peptide motifs PGQGQQ, PGQGQQGQQ, and PGQGQQGYYPTSPQQ. S = cysteine residues. (B) Strategy for the three-stage construction of a synthetic HMW-glutenin gene based on the Dx5 gene. The protein sequence encoded by the synthetic gene components is shown. Open boxes are terminal domains. The arrowhead indicates the junction of the fused, terminal polypeptide domains. The filled box is the repeat domain composed of tandem copies of the 15 amino acid motif. The terminal domains are constructed as a single clone with the two terminal domains fused. The repetitive domain is a second construct. The complete synthetic HMW-glutenin is assembled by inserting the repetitive domain DNA construct into restriction sites separating the terminal domain DNAs.

Fig. 2. DNA and protein sequences of natural and synthetic HMW-glutenin terminal domains. The DNA and protein sequences of the planned synthetic HMW-glutenin terminal DNA construct and encoded polypeptide are shown. Differences from the naturally occurring Dx5 are shown above (DNA) and below (polypeptide) the synthetic sequences. Restriction sites engineered into the synthetic sequence are indicated by arrowheads and enzyme names. Three base substitutions were necessary to create restriction sites to allow convenient alteration of cysteine residue number and placement, and made neutral changes in the encoded amino acid residues. The insertion of the repetitive domain into the XbaI + SpeI sites removes the intervening bases.
Restriction sites labeled in the figure, in 5' to 3' order: NdeI, NruI, HindIII, AvaII, AccIII, BsmI, MscI, KpnI, Eco47III, SmaI, XbaI, EcoRV, SpeI, EspI, NaeI, Eam1105I, BamHI.
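The CAI comparison above can be illustrated with a small calculation. The sketch below assumes the usual definition of the index (the geometric mean, over the codons of a gene, of each codon's RSCU divided by the highest RSCU among its synonymous codons) and uses only a four-amino-acid subset of the E. coli RSCU values quoted in Table 1. The function name `cai` and the two test sequences are illustrative only; codons with an RSCU of 0.00, which need special handling in a full implementation, are left out.

```python
import math

# Subset of RSCU values for highly expressed E. coli genes (Table 1).
# Only four amino acids are included; this is not a complete codon table.
RSCU = {
    "Gln": {"CAA": 0.12, "CAG": 1.88},
    "Pro": {"CCA": 0.42, "CCC": 0.02, "CCG": 3.41, "CCU": 0.15},
    "Tyr": {"UAC": 1.63, "UAU": 0.37},
    "Thr": {"ACA": 0.10, "ACC": 1.91, "ACG": 0.12, "ACU": 1.87},
}

# Relative adaptiveness of each codon: RSCU divided by the maximum RSCU
# among the codons for the same amino acid.
W = {
    codon: value / max(group.values())
    for group in RSCU.values()
    for codon, value in group.items()
}


def cai(seq):
    """Geometric mean of relative adaptiveness over the codons of seq.

    Accepts DNA or RNA; raises KeyError for codons outside the toy table.
    """
    seq = seq.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(W[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


# A stretch built only from the optimal codons scores 1.0; the same
# peptide (Gln-Pro-Tyr-Thr) written with rarely used codons scores far lower.
print(round(cai("CAGCCGTACACC"), 3))
print(round(cai("CAACCATATACA"), 3))
```

This mirrors the reasoning in the text: replacing wheat-preferred codons such as CAA with CAG moves each codon's contribution toward 1 and raises the index for the whole gene.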
2.2. Construction and expression of the terminal domains
A DNA fragment encoding the two fused terminal domains (Fig. 2) was synthesized using PCR to assemble the final sequence from a set of oligos: 14 oligos of 40-68 bases with complementary ends (Fig. 3A), and a 15th oligo of 21 bases (oligo 7c) to serve as the backward primer. The 522-bp BamHI + BstXI cut PCR product was cloned into pBluescript KS+ and a correct sequence clone was generated as described (Fig. 3). This insert was then transferred to pET3a and expressed as shown in Fig. 4, confirming the presence of a functional translation context and reading frame. The induced terminal polypeptide (157 aa) typically comprises 10-25% of the total bacterial protein and migrates on SDS-PAGE at an apparent molecular mass of 27 kDa, significantly higher than the predicted molecular mass of 17 kDa (discussed in Section 2.4).
Table 1
Codon usage and codon composition of the complete natural and synthetic HMW-glutenin genes

AA a   Cod. b   E.c. c   Dx5 d   sDx5 e
Ala    GCA      1.09     15      1
       GCC      0.18     3       0
       GCG      0.71     5       2
       GCU      2.02     2       11
Arg    AGG      0.00     2       0
       CGA      0.00     1       0
       CGC      1.53     2       3
       CGG      0.00     4       0
       CGU      4.47     1       4
Asp    GAC      1.49     4       3
       GAU      0.51     0       1
Cys    UGC      1.40     3       4
       UGU      0.60     2       1
Gln    CAA      0.12     203     0
       CAG      1.88     96      184
Glu    GAA      1.64     2       12
       GAG      0.36     13      0
Gly    GGA      0.00     67      0
       GGC      1.68     7       2
       GGG      0.04     75      0
       GGU      2.27     17      103
His    CAC      1.55     2       3
       CAU      0.45     2       0
Ile    AUA      0.01     2       0
       AUC      ?        4       ?
       AUU      0.48     1       ?
Leu    CUA      0.04     11      0
       CUC      0.17     7       0
       CUG      5.54     10      13
       UUG      0.07     8       0
Lys    AAA      1.63     2       4
       AAG      0.37     4       1
Phe    UUC      1.66     1       2
       UUU      0.34     1       0
Pro    CCA      0.42     80      64
       CCC      0.02     4       0
       CCG      3.41     24      46
       CCU      0.15     1       0
Ser    AGC      0.93     6       32
       AGU      0.13     1       1
       UCA      0.06     12      0
       UCG      0.01     6       0
       UCU      2.81     22      13
Thr    ACA      0.10     2       0
       ACC      1.91     1       2
       ACG      0.12     1       0
       ACU      1.87     20      34
Tyr    UAC      1.63     44      70
       UAU      0.37     2       1
Val    GUA      1.12     2       0
       GUC      0.08     0       0
       GUG      0.40     5       0
       GUU      2.41     2       12

a AA = amino acid.
b Cod. = codon.
c E.c. = RSCU, relative synonymous codon usage in highly expressed E. coli genes (Sharp et al., 1988).
d Dx5 = codon occurrence in the Dx5 (Glu-D1-1d) gene.
e sDx5 = codon occurrence in the synthetic HMW-glutenin gene.
2.3. Construction of repetitive domain
The first repetitive domain chosen to be constructed was based on the pentadecapeptide motif shown in Fig. 1A (PGQGQQGYYPTSPQQ). Translation of the wheat motif sequence into the E. coli optimal expression codons showed no restriction sites. However, restriction ends could be generated by using the sequence SPQQPGQGQQGYYPT and substituting the moderately used Ser AGC codon for TCT (Fig. 5A). When two of these repeats are fused in the correct orientation, complete motifs are reformed, with a partial repeat at each end of the domain. The monomer (encoding the 15 amino acid repeat motif) was then removed by a NheI + SpeI double digest, isolated by PAGE, and ligated at high insert concentration to the vector pNSpUC19 cut with the same two enzymes (Fig. 5B). Since all four ends in the ligation are compatible, all possible ligation orientations should occur. Multiple insertion clones cut with NheI + SpeI would restrict only where NheI-NheI or SpeI-SpeI ligations occurred. The head-to-tail ligations would destroy both sites, so only DNA fragments with tandemly arrayed repeats would be released from the vector. This strategy was used to construct correct dimer inserts that were used in a second round to construct inserts with four, six, and eight excisable copies of the monomer in the same orientation. Sequencing confirmed the correct sequence of these multi-copy insertions.
To build higher numbers of repeats, a second procedure utilized the NheI and SpeI sites at the ends of the repeat polymer (Fig. 5C). This assembly strategy will allow building repeat domains of any size possible from the available clones; e.g., 23 = 1 + 6 + 16, 48 = 16 + 32, etc.
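The arithmetic behind "any size possible from the available clones" can be sketched as a small search for the fewest clones whose repeat counts sum to a target. The list of clone sizes below (1, 2, 4, 6, 8, 16 and 32 copies) is an assumption drawn from the clones mentioned in this section and in Fig. 5; the function name `decompose` is hypothetical, and the sketch says nothing about the cloning steps themselves.

```python
def decompose(target, sizes=(1, 2, 4, 6, 8, 16, 32)):
    """Return the fewest clone sizes that sum to target, largest first.

    sizes is an assumed list of available repeat-copy numbers; a size may
    be used more than once. Returns None if no combination exists.
    """
    best = {0: []}  # best[n] = shortest list of sizes summing to n
    for n in range(1, target + 1):
        options = [best[n - s] + [s] for s in sizes if (n - s) in best]
        if options:
            best[n] = min(options, key=len)
    combo = best.get(target)
    return sorted(combo, reverse=True) if combo is not None else None


# The two examples given in the text.
print(decompose(23))
print(decompose(48))
```

Each join in such a plan would correspond to one round of the directional NheI/SpeI cloning of Fig. 5C, so minimizing the number of pieces also minimizes the number of cloning rounds.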
2.4. Assembly and expression of complete synthetic gene
The complete HMW-glutenin construct was assembled and expressed as described in Fig. 6. We typically obtain HMW-glutenin comprising 10-20% of total bacterial protein, or 15-30 mg per liter of culture. The synthesized protein has the same unusual extraction properties characteristic of the HMW-glutenins (solubility in dilute alcohols), and its identity has been further confirmed by N-terminal sequencing (D.D. Kasarda, personal communication). The completed construct expresses a protein of approximately 101 kDa (as calculated from standard protein markers), considerably higher than the molecular mass of 71 kDa derived from the DNA sequence. Such anomalous migrations in SDS-PAGE are characteristic of the HMW-glutenins, and have been thought to be due to the unusual structure of the
(A)
Oligo   Sequence (5' to 3')
A1      ...AACTGAATTCCATATGGAAG (5' portion, carrying the BstXI site, not legible)
1b      CGAACTGAATTCCATATGGAAGGTGAAGCGTCTGAACAGCTGCAGTGCGATCGCGAA
1c      TGCTGGCAAGCTTTCAGTTCACGTTCCTGCAGTTCCTGCAGTTCGCGATCGCACTGC
2b      AACTGAAAGCTTGCCAGCAGGTTATGGACCAGCAGCTGCGTGACATCTCTCCGGAA
2c      TGCTGTTCGTACTGGCCAGCAACCGGAGAAACAACAACCGGGTGGCATTCCGGAGAGATGTCACGCAG
3       CTGGCCAGTACGAACAGCAGATCGTTGTTCCGCCGAAAGGTGGTACCTTCTATCCGGGTGAAAC
3c      CCCCAGAAGATACGCTGCTGCAGCTGCTGCGGCGGAGTGGTTTCACCCGGATAGAAGGTAC
4       AGCAGCGTATCTTCTGGGGCATCCCGGCTCTGCTGAAGCGCTACTACCCGTCTGTTACTT
4c      GCCTGACCCGGGTAGTAAGAAACCTGCTGCGGACAAGTAACAGACGGGTAGTAGCG
5       TCTTACTACCCGGGTCAGGCTTCTCCGCAGCGTTCTAGAGATATCACTAGTTCTTCTT
5c      CAGCCTGGTGTTCAACAGAAACGTGGTAAGAAGAACTAGTGATATCTCTAGAACGC
6       TTCTGTTGAACACCAGGCTGCATCTCTGAAAGTTGCTAAAGCTCAG
6c      CCACCTTCCAGGCGGCACATAGCCGGCAGCTGCGCAGCCAGCTGCTGAGCTTTAGCAACTTTCAGAGA
7       GCTATGTGCCGCCTGGAAGGTGGTGACGCTCTGTCTGCTTCTCAGTGATAGGGATCCATCCGT
7c      ACGGATGGATCCCTATCACTG

(B) [Schematic: the overlapping oligos annealed in alternating orientation and amplified by polymerase chain reaction using primers 7c and A1.]
Fig. 3. Assembly of the termini construct. (A) Names and sequences of the set of oligos for the terminal domain construct. (B) PCR assembly of the intact synthetic HMW-glutenin terminal domain construct. Methods: A specific HMW-glutenin gene (Glu-D1-1d, or Dx5 gene encoding the Dx5 HMW-glutenin) correlated with good functional properties was chosen as the model for constructing the terminal domains. A terminal domain construct contained the two terminal regions fused at a site able to receive the separately constructed repeat domain. Oligos were chosen by the following criteria: a maximum length of approximately 65 bases, maximum specificity of complementary ends, and minimal self-complementarity. The Oligo Primer Analysis Software (National Biosciences, Inc.) version 2.0 was used to analyze individual oligos. Oligos were synthesized on an Applied Biosystems Model 391 PCR-Mate DNA Synthesizer. Columns were eluted with 1.5 M ammonium hydroxide, dried with nitrogen while heating at 55°C in a waterbath, and resuspended in 500 µl water. The assembly of the complete termini construct sequence was carried out by two PCR reactions. The first reaction contained 0.1 µM each interior oligo, 1 µM of the PCR priming oligos (A1 and 7c), 200 µM each nucleotide triphosphate, 10 mM Tris pH 8.4, 2.5 mM MgCl2, 50 mM KCl, in 100 µl. Reaction conditions were 35 cycles with 1 min at 94°C, 1 min annealing at 50°C, 3 min extension at 72°C, terminating with a 7°C soak step. The second reaction utilized 1 µl of the first reaction product in the same reaction conditions except that only 1 µM each of the two terminal primers (A1 and 7c) were included, the extension stage was for 1 min, and 20 cycles were used. The PCR product was phenol/chloroform extracted, precipitated with ethanol, restricted with BamHI + BstXI and ligated with pBluescript KS+ (Stratagene) cut with the same two enzymes. Ligations were transformed into the E. coli strain NM522 and plated on LB + ampicillin (30 µg/ml). Resistant colonies were screened for the plasmid carrying a 522-bp insert that could be removed with NdeI + BamHI. Six positive clones were sequenced using dideoxynucleotide kits from Pharmacia and Sequenase T7 DNA Polymerase from United States Biochemical. Sequencing was performed on single-stranded plasmid DNA prepared by single-strand rescue. Sequencing primers were the universal M13 primer and the construct assembly oligos (Fig. 3). All construct isolates contained single base changes and one isolate was missing 50 bp. An accurate sequence was obtained by exchanging DNA fragments at the XbaI site between two isolates each with a single base substitution, and the resulting plasmid was named puDx5. For expression of the terminal domains, puDx5 was cut with XbaI + SpeI to remove the stuffer fragment intended to be replaced by the repeat domain in the complete synthetic gene. The cut plasmid was then reclosed, destroying both restriction sites, and named puDx5SX. The NdeI + BamHI fragment of puDx5SX was transferred to pET3a (Studier et al., 1991) and named pET3a-uDx5SX. The plasmid puDx5 was used to receive the repeat domain.
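The "complementary ends" criterion used in choosing the oligos can be checked in silico. The sketch below takes oligos 1b, 1c and 2b as read from Fig. 3A, reverse-complements the bottom-strand oligo (1c), and merges the three through their overlapping ends. It only verifies that the designed ends fit together; it is not a model of the PCR itself, and the helper names (`revcomp`, `overlap`, `assemble`) and the 10-base minimum overlap are illustrative choices, not part of the original method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligos 1b, 1c and 2b as read from Fig. 3A, written 5' to 3'.
OLIGO_1B = "CGAACTGAATTCCATATGGAAGGTGAAGCGTCTGAACAGCTGCAGTGCGATCGCGAA"
OLIGO_1C = "TGCTGGCAAGCTTTCAGTTCACGTTCCTGCAGTTCCTGCAGTTCGCGATCGCACTGC"
OLIGO_2B = "AACTGAAAGCTTGCCAGCAGGTTATGGACCAGCAGCTGCGTGACATCTCTCCGGAA"


def revcomp(seq):
    """Reverse complement of a DNA sequence (upper-case A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(left, right, min_len=10):
    """Length of the longest suffix of left that equals a prefix of right."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(top_strand_pieces):
    """Merge pieces (all written as top strand) through their overlaps."""
    merged = top_strand_pieces[0]
    for piece in top_strand_pieces[1:]:
        k = overlap(merged, piece)
        if k == 0:
            raise ValueError("adjacent oligos do not overlap")
        merged += piece[k:]
    return merged


# 1c is a bottom-strand oligo, so it is reverse-complemented before merging.
pieces = [OLIGO_1B, revcomp(OLIGO_1C), OLIGO_2B]
assembled = assemble(pieces)
print(overlap(OLIGO_1B, revcomp(OLIGO_1C)), len(assembled))
print(assembled)
```

The merged sequence begins with the EcoRI and NdeI sites and continues into the start of the coding sequence of Fig. 2, which is what the alternating top-strand and bottom-strand design is meant to achieve.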
[Gel image: lanes A and B; molecular mass markers (kDa): 116, 97, 66, 45, 31, 22.]
Fig. 4. Expression of the termini construct. Lane A: Induction of the HMW-glutenin terminal construct. The arrowhead indicates the induced terminal domain polypeptide. Lane B: Induction of control bacteria containing the pET3a plasmid without a coding DNA insertion. Method: Constructs pET3a-uDx5SX and pET3a were transformed into E. coli strain BL21(DE3)pLysS and plated on LB plus ampicillin (30 µg/ml). Single colonies were transferred to NZ media (11 g/L NZ-amine, 4 g/L yeast extract, 1 g/L casamino acids, 5 g/L NaCl, 5 mM MgCl2, pH 7.5), 0.4% glucose, 40 mg/L ampicillin, and 34 mg/L chloramphenicol. Cultures were grown at 37°C to approximately 35 Klett units (Klett-Summerson Photoelectric Colorimeter), induced with 0.4 mM IPTG, and grown for 5 h. Cells from 1-ml samples of each culture were pelleted in a microfuge for 1 min at 14 000 rpm and either frozen at -80°C or immediately analyzed. Total bacterial protein was prepared by resuspending cell pellets in extraction buffer (66 mM Tris pH 6.8, 3% SDS, 2% β-mercaptoethanol, 3% glycerol, bromophenol blue). Pellets were vigorously vortexed to resuspend and incubated at room temperature for 20-30 min. Resuspended samples were centrifuged at 14 000 rpm for 10 min prior to loading. Extracts were analyzed directly by PAGE consisting of a stacking gel (3.95% acrylamide, 0.05% Bis, 125 mM Tris pH 6.8, 0.1% SDS) and a separation gel (16.93% acrylamide, 0.08% Bis, 0.5 mM Tris pH 8.8, 0.1% SDS). The running buffer was 25 mM Tris, 190 mM glycine, 0.1% SDS. Staining was with Coomassie Brilliant Blue R-250 at 2 g/L in 10% acetic acid, 45% methanol. Destaining was in one wash in 5% acetic acid, 5% methanol for 1 h, followed by water washes. Quantitation of expression levels was by densitometry using the Alpha-Innotech IS-1000 Digital Imaging System.
repetitive domain (Goldsbrough et al., 1989), which is predicted to be highly hydrophilic. However, our previous work (Shani et al., 1992) and the expression of the synthetic terminal construct (Section 2.2) showed that the N-terminal non-repetitive region of the HMW-glutenins can also unexpectedly contribute to anomalous migrations by an as yet unknown mechanism.
The use of E. coli high expression codons did not consistently increase levels of synthesis over that of cloned HMW-glutenin genes expressed in bacteria. Since Robinson et al. (1984) reported that optimal codons were effective only at the highest levels of synthesis, it is possible that the substitution of several suboptimal codons may have limited synthesis levels despite the high CAI of the synthetic gene. The unusual nature of the HMW-glutenin proteins may also place a limit on levels of bacterial synthesis. Another possibility is that the Coomassie Brilliant Blue dye used to identify proteins binds less to the synthetic HMW-glutenin. While the exact mechanism of Coomassie Blue binding is not clear, there are several reports that positively charged residues are the main targets (Davies, 1988; Tal et al., 1980). Since the synthetic repeat contains no positively charged residues, Coomassie Blue may be binding poorly.
Goldberg et al. (1989), in studying construction and expression of a collagen-analog synthetic gene, speculated that their failure to obtain larger repetitive clones could be related to E. coli's known instability toward simple repeat sequences (Lohe and Brutlag, 1986; Sadler et al., 1980). The possibility of deletions in the HMW-glutenin repeat domain was a concern, particularly when one of the first cultures produced a second, smaller HMW-glutenin protein in addition to the expected protein. However, tests of the stability of the synthetic HMW-glutenin gene indicate that infrequent deletions are observed only in specific initial bacterial isolates and not in established cell lines. In one experiment, a cell line with the correct size insert was grown serially through approximately 150 generations without evidence of deletions in individual cells. Nevertheless, as a precaution, each preparation of synthesized protein is tested for protein size before passing on to further protein experimentation, and DNA insert stability is periodically tested.
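The anomalous SDS-PAGE migration discussed in this section can be put in numbers by dividing each apparent molecular mass by the mass predicted from the sequence. The sketch below uses only the four values reported in Sections 2.2 and 2.4; the function name `migration_ratio` is illustrative.

```python
# (apparent kDa on SDS-PAGE, kDa predicted from the sequence),
# as reported in Sections 2.2 and 2.4.
OBSERVATIONS = {
    "fused terminal domains": (27.0, 17.0),
    "complete synthetic HMW-glutenin": (101.0, 71.0),
}


def migration_ratio(apparent_kda, predicted_kda):
    """Ratio of apparent to predicted molecular mass."""
    return apparent_kda / predicted_kda


for name, (apparent, predicted) in OBSERVATIONS.items():
    print(f"{name}: {migration_ratio(apparent, predicted):.2f}x")
```

By this measure the terminal construct alone, which contains no repeats, migrates at least as anomalously as the complete protein, in line with the statement that the non-repetitive N-terminal region contributes to the effect.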
3. Conclusions
We have described the procedures used to successfully assemble a synthetic HMW-glutenin gene. The new gene contains convenient restriction sites for future modification of the terminal domains (particularly the number and placement of cysteine residues), and the set of repeat polymer clones will allow the construction of 15-aa motif repeat domains of any size, limited only by the ability of E. coli strains to maintain the repeated DNA. The construct supported synthesis of a novel HMW-glutenin protein at levels at least comparable to that of cloned wheat HMW-glutenin genes expressed in E. coli. This initial synthetic HMW-glutenin protein is under study for its use in physical-chemical studies of the molecular basis of HMW-glutenin functionality. Its projected uses include studies on rheology, specificity of cysteine crosslinking, polymer formation, and protein ultrastructure. Additional constructs are now possible, including HMW-glutenins with different multiples of the
(A)
Encoded peptide: S P Q Q P G Q G Q Q G Y Y P T
5'-AGCAGCTAGCCCACAGCAGCCAGGTCAGGGTCAGCAGGGTTACTACCCGACTAGTGCAT-3'
3'-TCGTCGATCGGGTGTCGTCGGTCCAGTCCCAGTCGTCCCAATGATGGGCTGATCACGTA-5'
Restriction sites: NheI (left), SpeI (right)

Monomer (NheI + SpeI fragment):
5'-CTAGCCCACAGCAGCCAGGTCAGGGTCAGCAGGGTTACTACCCGA-3'
3'-    GGGTGTCGTCGGTCCAGTCCCAGTCGTCCCAATGATGGGCTGATC-5'

(B) [Schematic: ligation of monomers with NheI (N) and SpeI (S) ends into NheI + SpeI cut pNSpUC19.]
(C) [Schematic: joining of two repeat-containing fragments through the NheI, SpeI and AflIII sites.]
Fig. 5. Repeat domain assembly. (A) The repetitive domain was constructed starting with the complementary 59-bp oligos LS-15AA (upper strand sequence) and US-15AA (lower strand sequence). Above are the encoded amino acids that form the peptide repeat and below are the included restriction sites. The two oligos were annealed and blunt-end ligated into the SmaI site of pUC13. One isolate was chosen for further repeat assembly and named p15AA. (B) To create multiples of the monomer, a vector was needed that would contain NheI and SpeI sites within a polylinker. No such vector was commercially available, so one was constructed. The oligos LSpUC19 (GATCCTTGGATCCTGAGCTAGCCCGGGTACTAGTG) and USpUC19 (GAACCTAGGACTCGATCGGGCCCATGATCACTTAA), when annealed and ligated into BamHI + EcoRI cut pUC13, create a polylinker with internal NheI, SpeI, and SmaI sites, plus an additional BamHI site. When ligated into pUC13 the successful insertion is seen as a white colony in the background of blue colonies. A white colony was selected and the insertion was confirmed by sequencing. A BamHI digest and religation then removed the extra NheI site and restored the reading frame through the β-galactosidase gene to allow selection of blue colonies on a white colony background. One isolate of the new vector was selected and named pNSpUC19. Construction of multiples of the repeat monomer was accomplished by ligating high concentrations of repeat inserts (with NheI and SpeI ends) to NheI + SpeI cut pNSpUC19. In the first round the repeat monomer with identical 5' overhanging termini was prepared from p15AA by NheI + SpeI digestion, followed by centrifugation in Microcon Microconcentrator 100 tubes, recovering the 45-bp repeat fragment in the filtrate. The monomer was ligated alone for 5 min at 20 µg/ml, then 0.1 µg vector was added and the ligation continued for 1 h at 15°C. The ligation product was transformed into E. coli NM522 cells and plated on ampicillin (40 µg/ml). Screening of 96 colonies found 54 with inserts the size of a dimer (90 bp) or larger. Digesting with NheI and SpeI separately and together and separation on a 15% acrylamide gel found 6 isolates with the desired head-to-tail orientation to regenerate the NheI and SpeI end sites: 4 dimers and 2 trimers. Sequencing confirmed the correct structure of one of the dimers, and this clone was used to isolate the 90-bp dimer unit for a second round of ligations. Out of 128 clones screened, 48 contained multimers of the dimer, of which 6 yielded clones with 4, 6, and 8 copies of the original monomer unit in correct tandem order. An 8 copy clone (p25D) was used to construct larger repeat units. (C) Further increases in repeat size used the unique and compatible NheI and SpeI sites flanking the repeat multimers. Plasmid p8R was cut with SpeI + AflIII and the repeat-containing fragment isolated. Similarly, p8R was cut with NheI + AflIII and the repeat-containing fragment isolated. The two repeat-containing fragments were then ligated together, resulting in directional cloning that reformed the plasmid DNA and created a repeat insert of 16 copies (p16R). Clone p16R was used to repeat the procedure to create clone p32R, which contains 32 tandem copies of the repeat monomer.
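The head-to-tail logic of this assembly can be checked with a short in silico sketch. It assumes the 45-base monomer top strand read from Fig. 5A and the standard recognition sequences of NheI (GCTAGC) and SpeI (ACTAGT), whose cuts leave identical CTAG overhangs. The flanking "G" and "CTAGT" stand in for the vector bases that complete the two end sites; the function names and the minimal codon table are illustrative only.

```python
# Top strand of the NheI + SpeI monomer fragment as read from Fig. 5A.
MONOMER = "CTAGCCCACAGCAGCCAGGTCAGGGTCAGCAGGGTTACTACCCGA"
NHEI, SPEI = "GCTAGC", "ACTAGT"

# Only the codons that occur in the repeat unit.
CODON_TABLE = {
    "AGC": "S", "CCA": "P", "CAG": "Q", "GGT": "G",
    "TAC": "Y", "CCG": "P", "ACT": "T",
}


def tandem_insert(copies):
    """Top strand of head-to-tail monomers, flanked by the bases that
    complete the NheI site on the left and the SpeI site on the right."""
    return "G" + MONOMER * copies + "CTAGT"


def translate_repeats(dna, copies):
    """Translate the repeat region; the frame starts at the AGC codon
    that overlaps the NheI site (index 3 of the flanked insert)."""
    coding = dna[3:3 + len(MONOMER) * copies]
    return "".join(CODON_TABLE[coding[i:i + 3]]
                   for i in range(0, len(coding), 3))


insert = tandem_insert(4)
# Each head-to-tail junction reads ACTAGC, which neither enzyme cuts,
# so only the two end sites remain.
print(insert.count(NHEI), insert.count(SPEI))
protein = translate_repeats(insert, 4)
print(protein)
# Complete wheat-type motifs are reformed across the junctions.
print(protein.count("PGQGQQGYYPTSPQQ"))
```

Because the junction destroys both sites, a correctly oriented multimer can always be excised as a single NheI + SpeI fragment and reused as the insert for the next round, which is the property the doubling from 8 to 16 to 32 copies relies on.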
[Gel image: lanes A, B and C; molecular mass markers (kDa): 116, 97, 66, 45.]
Fig. 6. Expression of the complete synthetic HMW-glutenin gene analog. Lane A: Wheat cultivar Cheyenne seed extract. HMW-glutenins 2*, 5, 7, 9, and 10 are indicated on the left. Lane B: Bacterial extract from induced cells containing the pET3a plasmid with no insert. Lane C: Bacterial extract from induced cells containing the complete synthetic HMW-glutenin expression construct. Protein molecular weight marker positions are indicated on the right. Method: The insert from clone p32R (containing 32 copies of the repeat monomer) was excised as a NheI + SpeI fragment and ligated to XbaI + SpeI cut puDx5 (Fig. 3) as planned in Fig. 1. An isolate with the repeat inserted into puDx5 in the correct orientation was identified and named puDx5-32mer. The complete synthetic HMW-glutenin gene was isolated from puDx5-32mer as a NdeI + BamHI fragment, inserted into pET3a, and named pET3a-Dx5-32mer. For expression of the synthetic HMW-glutenin, construct pET3a-Dx5-32mer was transformed into E. coli strain BL21(DE3)pLysS, induced, and analyzed as in Fig. 4 with the exception that samples were run on a 10% acrylamide separation gel (9.95% acrylamide, 0.05% Bis, 0.4 M Tris pH 8.8, 0.1% SDS).

15 amino acid motif, with multiples of the 6 and 9 residue motifs, and with modifications of the number and placement of cysteine residues.
Acknowledgement
Part of this research was supported by Research Grant Awards Nos. US-1548-88 and US-2318-9~ from BARD - The United States-Israel Binational Agricultural Research and Development Fund.
References
Anderson, O.D., Yip, R.E., Halford, N.G., Forde, J., Shewry, P.R., Malpica-Romero, J.-M. and Greene, F.C. (1989) Nucleotide sequences of two high-molecular-weight glutenin subunit genes from the D-genome of a hexaploid bread wheat, Triticum aestivum L. cv Cheyenne. Nucleic Acids Res. 17, 461-462.
Blechl, A.E., Thrasher, K.S., Vensel, W.H. and Greene, F.C. (1992) Purification and characterization of wheat α-gliadin synthesized in the yeast, Saccharomyces cerevisiae. Gene 116, 119-127.
Davies, E.M. (1988) Protein assays: a review of common techniques. American Biotechnology Laboratory 8, pp. 28-37.
Ferretti, L., Karnik, S.S., Khorana, H.G., Nassal, M. and Oprian, D.D. (1986) Total synthesis of a gene for bovine rhodopsin. Proc. Natl. Acad. Sci. USA 83, 599-603.
Galili, G. (1989) Heterologous expression of a wheat high molecular weight glutenin gene in Escherichia coli. Proc. Natl. Acad. Sci. USA 86, 7756-7760.
Goldberg, I., Salerno, A.J., Patterson, T. and Williams, J.I. (1989) Cloning and expression of a collagen-analog-encoding synthetic gene in Escherichia coli. Gene 80, 305-314.
Goldsbrough, A.P., Bulleid, N.J., Freedman, R.B. and Flavell, R.B. (1989) Conformational differences between two wheat (Triticum aestivum) high-molecular-weight glutenin subunits are due to a short region containing six amino acid differences. Biochem. J. 263, 837-842.
Greene, F.C., Anderson, O.D., Yip, R.E., Halford, N.G., Malpica-Romero, J.-M. and Shewry, P.R. (1989) Analysis of possible quality-related sequence variations in the D glutenin high molecular weight subunit genes of wheat. Proc. Int. Wheat Genet. Symp. 7th, Cambridge 1, 735-740.
Lohe, A.R. and Brutlag, D.L. (1986) Multiplicity of satellite DNA sequences in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 83, 696-700.
Makoff, A.J., Oxer, M.D., Romanos, M.A., Fairweather, N.F. and Ballantine, S. (1989) Expression of tetanus toxin fragment C in E. coli: high level expression by removing rare codons. Nucleic Acids Res. 17, 10191-10202.
Martin, S.L., Vrhovski, B. and Weiss, A.S. (1995) Total synthesis and expression in Escherichia coli of a gene encoding human tropoelastin. Gene 154, 159-166.
McPherson, D.T., Morrow, C., Minehan, D.S., Wu, J., Hunter, E. and Urry, D.W. (1992) Production and purification of a recombinant elastomeric polypeptide, G-(VPGVG)19-VPGV, from Escherichia coli. Biotechnol. Prog. 8, 347-352.
Neill, J.D., Litts, J.C., Anderson, O.D., Greene, F.C. and Stiles, J.I. (1987) Expression of a wheat alpha-gliadin gene in Saccharomyces cerevisiae. Gene 55, 303-317.
Pratt, K.A., Madgwick, P.J. and Shewry, P.R. (1991) Expression of a wheat gliadin protein in yeast (Saccharomyces cerevisiae). J. Cereal. Sci. 14, 223-229.
Robinson, M., Lilley, R., Little, S., Emtage, J.S., Yarranton, G., Stephens, P., Millican, A., Eaton, M. and Humphreys, G. (1984) Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res. 12, 6663-6670.
Sadler, J.R., Tecklenburg, M. and Betz, J.L. (1980) Plasmids containing many tandem copies of a synthetic lactose operator. Gene 8, 279-300.
Shani, N., Steffen-Campbell, J.D., Anderson, O.D., Greene, F.C. and Galili, G. (1992) Role of the amino- and carboxy-terminal regions in the folding and oligomerization of wheat high molecular weight glutenin subunits. Plant Physiol. 98, 433-441.
Sharp, P.M., Tuohy, T.M.F. and Mosurski, K.R. (1986) Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125-5143.
Sharp, P.M. and Li, W.H. (1987) The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281-1295.
Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H. and Wright, F. (1988) Codon usage in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16, 8207-8211.
Shewry, P.R., Halford, N.G. and Tatham, A.S. (1992) High molecular weight subunits of wheat glutenin. J. Cereal. Sci. 15, 105-120.
Shewry, P.R., Tatham, A.S., Barro, F., Barcelo, P. and Lazzeri, P. (1995) Biotechnology of breadmaking: unraveling and manipulating the multi-protein gluten complex. Bio/Technology 13, 1185-1190.
Studier, F.W., Rosenberg, A.H., Dunn, J.J. and Dubendorff, J.W. (1991) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185, 60-89.
Tal, M., Silberstein, A. and Nusser, E. (1980) Why does coomassie brilliant blue R interact differently with different proteins. J. Biol. Chem. 260, 9976-9980.