Gene, 27 (1984) 81-99
Elsevier
GENE 940
Cloning of eukaryotic genes in single-strand phage vectors: the human interferon genes
(Recombinant DNA; bacteriophage f1; leukocyte; fibroblast; lacUV5 promoter; Escherichia coli host; cDNA; DNA sequencing)
Donald W. Bowden, Jen-i Mao, Tina Gill, Kathy Hsiao, Jay S. Lillquist, Douglas Testa * and Gerald F. Vovis
Collaborative Research, Inc., 128 Spring St., Lexington, MA 02173 (U.S.A.) Tel. (617) 861-9700, and * Interferon Sciences, Inc., 783 Jersey Avenue, New Brunswick, NJ 08901 (U.S.A.) Tel. (201) 249-3232
(Received July 19th, 1983)
(Revision received October 10th, 1983)
(Accepted October 11th, 1983)
SUMMARY
Using oligonucleotide probes with defined sequences, we have selected clones from a human lymphocyte cDNA library which represent human leukocyte (HuIFN-α) and fibroblast (HuIFN-β) interferon gene sequences. Double-stranded f1 phage DNA was used as the vector for initial cloning of cDNA. Clones carrying interferon gene sequences were identified by hybridization with the oligonucleotide probes. The same oligonucleotide probes were used as primers for dideoxy chain termination sequencing of the clones. One HuIFN-α clone, 201, has a nucleotide sequence different from published HuIFN-α sequences. Under control of the lacUV5 promoter, the 201 gene has been used to express biologically active HuIFN-α in Escherichia coli.
INTRODUCTION
A wide variety of methods have been used to clone eukaryotic genes into prokaryotic systems. In this article we describe a method for cDNA cloning directly into the bacteriophage f1. Previously, the single-stranded phage cloning systems have been used primarily as second-stage cloning systems, i.e., for subcloning after the initial cDNA cloning in a double-stranded vector (Messing, 1981). The cloning system described here has a number of novel features which aid in the identification, DNA sequencing, storage and manipulation of cloned material.

As an example of eukaryotic genes cloned into f1, we have chosen the genes for human leukocyte interferon (HuIFN-α) and human fibroblast interferon (HuIFN-β). The interferons are a family of proteins which have the ability to confer a virus-resistant state upon their target cells (Isaacs and Lindenmann, 1970; Stewart, 1979). Several groups have cloned cDNAs for the HuIFN-α (Nagata et al., 1980a,b; Goeddel et al., 1980a) and HuIFN-β genes (Taniguchi et al., 1980; Goeddel et al., 1980b; Derynck et al., 1980). Analysis of the cDNAs for HuIFN-α revealed a family of 20 or more distinct, but related genes (Goeddel et al., 1981; Mantei et al., 1980; Streuli et al., 1980; Lawn et al., 1981b; Nagata et al., 1980b). In contrast, there is apparently only one gene coding for HuIFN-β (Ohno and Taniguchi, 1981; Lawn et al., 1981c; Degrave et al., 1981). We have found representatives of both gene types in a cDNA library generated from Sendai virus-induced lymphocyte mRNA.

Abbreviations: AMV, avian myeloblastosis virus; bp, base pairs; cDNA, DNA complementary to mRNA; CPE, cytopathic effect; DTT, dithiothreitol; HuIFN, human interferon (α, leukocyte; β, fibroblast); p, plasmid; pLac, lacZpUV5 promoter; RF, replicative form of phage f1 DNA; SDS, sodium dodecyl sulfate; TY broth, see MATERIALS AND METHODS, section a; [ ], indicates plasmid carrier state.

0378-1119/84/$03.00 © 1984 Elsevier Science Publishers
MATERIALS AND METHODS
(a) Strains and media

E. coli strain JM101 [(F' traD36 proAB lacIqZΔM15) in a Δ(lac-pro) supE thi background] was used as the bacteriophage f1 host. Transfections were carried out using E. coli strain BNN45 (hsdR- hsdM+ supE44 supF met thi) of Davis et al. (1982). Strain CGE43 [F' Δ(lac-pro)XIII] (Davis et al., 1982) was used as a host for expression experiments. Plasmid pGL101 (Guarente et al., 1980) carries the E. coli lac promoter (UV5 allele) on a derivative of pBR322. This pLac promoter on pGL101 is constitutively expressed. Rich medium for growth of bacteria was TY broth (10 g Bactotryptone, 8 g NaCl, 1 g yeast extract, 1 g glucose, and 60 mg NaOH per liter). Minimal medium was M9 as described by Miller (1972) with the addition of 0.5% casamino acids. Human peripheral blood leukocytes were induced with Sendai virus and the cells were harvested 4 h after induction.
(b) Enzymes, synthetic oligonucleotides, and special chemicals

T4 DNA ligase and T4 polynucleotide kinase were products of Collaborative Research, Inc. DNA polymerase I Klenow fragment was purchased from Boehringer-Mannheim. Restriction enzymes were purchased from New England Biolabs and used as recommended by the supplier. AMV reverse transcriptase was purchased from Life Sciences, Inc., St. Petersburg, FL. HindIII linkers (CCAAGCTTGG), oligo(dT)-cellulose, oligo(dT)12-18 and synthetic oligonucleotides were products of Collaborative Research, Inc.
(c) Isolation of poly(A)RNA

Total RNA (4 mg) was isolated from 3.6 g Sendai virus-induced lymphocytes by the method of Foster et al. (1980). In this method RNA is separated from a guanidinium salt cell homogenate by centrifugation through a cushion of 5.7 M CsCl. Poly(A)RNA (40 µg) was obtained by chromatography on a 1-ml oligo(dT)-cellulose column as described by Desrosiers et al. (1975).

(d) Synthesis and cloning of double-stranded cDNA

Double-stranded cDNA (about 0.1 µg) was synthesized from 25 µg poly(A)RNA, purified, and inserted via HindIII synthetic linkers into the DNA of filamentous phage f1 derivative CGF4, using the procedure that Moir et al. (1982) described for generating a cDNA library from calf stomach mRNA. These procedures generated a population of replicative form CGF4 DNA containing cDNA copies of lymphocyte mRNA. The cDNA was inserted at the engineered HindIII site (Moir et al., 1982) in the intergenic space of CGF4. This mixture of recombinant DNA was used to transform BNN45 to produce a library of phage plaques. The plaques produced by transformation were stored on TY plates.

(e) Plaque hybridization and sizing of cDNA inserts

Recombinant phage plaques were transferred to nitrocellulose filters (Schleicher and Schuell, Inc.) using the method of Benton and Davis (1977). Plaques were hybridized with [5'-32P]oligonucleotides by the method described by Wallace et al. (1981) (see Fig. 2). The size of insert DNA present in the recombinant phage was measured by electrophoresis of whole phage in agarose gels (Moir et al., 1982). Alternatively, double-stranded RFI phage DNA was produced using a modification of the isobutanol boiling method (Holmes and Quigley, 1981). The length of inserts in such DNA could subsequently be measured by standard restriction enzyme analysis.

(f) DNA isolation and DNA sequencing

Double-stranded RFI DNA was isolated from bacteriophage f1-infected bacteria by the method of Model and Zinder (1974). Single-stranded circular f1 phage DNA was prepared from plate stocks of f1 using the method described by Zinder and Boeke (1982). Phage f1 cDNA clones were screened by DNA sequencing using the dideoxy chain termination method of Sanger et al. (1977). Single-stranded circular phage DNAs carrying cDNA inserts which hybridized to interferon probes were used as templates, with the synthetic interferon oligonucleotides as primers, in the dideoxy chain termination sequencing. The complete nucleotide sequence of clone 201 was obtained by using the sequencing method of Maxam and Gilbert (1980). Plasmid pBR322 and derivatives were purified by standard procedures.

(g) Construction of a 5'-coding sequence for the mature HuIFN-α type 201 gene

Before it was expressed in E. coli, the cloned 201 HuIFN-α gene was modified to delete the preinterferon leader coding sequence with the concomitant addition of an ATG initiation codon. The ATG was placed 5' to the trinucleotide sequence TGT (nucleotides 70-72 of Fig. 5) which codes for the first amino acid of mature interferon (Mantei et al., 1980; Nagata et al., 1980; Lawn et al., 1981a; Goeddel et al., 1981). Because Sau3A cuts at the 3'-side of the TGT codon, a self-complementary oligonucleotide (ACACATCGATGTGT), which is recognized by ClaI and contains an ATG-TGT sequence, was synthesized. Initially the 1237-bp cDNA insert was purified from phage 201 RFI DNA by agarose gel electrophoresis following HindIII digestion. A Sau3A fragment which contains the coding sequence for amino acid residues 2 to 61 (nucleotides 73 through 252, Figs. 4 and 5) was obtained by digesting 30 µg of the 1237-bp fragment with 10 units Sau3A (4 h at 37°C, 50-µl volume). The appropriate Sau3A fragment was purified after polyacrylamide gel electrophoresis by electroelution, followed by phenol extraction and ethanol precipitation.
The DNA pellet was suspended in H2O, and the cohesive ends were filled in by treating the DNA with DNA polymerase I Klenow fragment (30 min, room temperature; in 0.1 mM each nucleoside triphosphate, 66 mM Tris·HCl pH 7.5, 10 mM MgCl2 and 6.6 mM DTT). The oligonucleotide containing the ClaI site was ligated onto this blunt-ended Sau3A fragment by adding 5 µg of 32P-end-labelled oligonucleotide, ATP to 1 mM, and T4 DNA ligase to the previous reaction (Goodman and MacDonald, 1979), followed by overnight incubation at 17°C. This ligation restored the first codon of mature interferon, TGT, and placed an ATG initiation codon at its 5'-end. This ligated DNA preparation was digested with ClaI. The DNA fragment containing the original Sau3A site and now with ClaI termini was purified by agarose gel electrophoresis and ligated into the ClaI site of pBR322 to produce pCGE32 (see Fig. 7). The resulting DNA was used to transform CGE43. The plasmid DNA in ampicillin-resistant colonies was isolated and characterized by restriction enzyme digestion. One transformant with a DNA fragment containing the interferon coding sequence for amino acids 1 to 61 (nucleotides 70-252, Figs. 4 and 5) in the ClaI site of pBR322 was designated CGE79, and the resident plasmid pCGE32. The structure of this plasmid is shown in Fig. 7.

(h) Construction of an expression vector for mature HuIFN-α type 201 under control of the lacUV5 promoter

A plasmid designed to express mature HuIFN-α type 201 under control of the E. coli lacUV5 promoter (pLac) was constructed by the four-molecule ligation outlined in Fig. 7. The plasmid pGL101 was digested with PvuII and PstI, and the DNA fragment containing pLac was purified using agarose gel electrophoresis. The plasmid pBR322 was digested with PstI and HindIII and the large DNA fragment containing the plasmid origin of replication (ori) was purified using agarose gel electrophoresis. The plasmid pCGE32 was first digested with HindIII, then treated with DNA polymerase I Klenow fragment as described in section g above to fill in the cohesive ends and produce blunt-ended termini. The resulting DNA was digested with EcoRI and the short HindIII-EcoRI DNA fragment containing the 5'-coding sequence for mature interferon (ATG TGT...; bp 70-181 of Figs. 4 and 5) was purified using agarose gel electrophoresis.
The 1237-bp fragment containing the entire 201 cDNA insert was purified after HindIII digestion of the RFI DNA. This DNA fragment was subjected to a partial digest with EcoRI and the EcoRI-HindIII fragment containing the 3'-coding region of interferon (bp 182-1202 of Figs. 4 and 5) was purified by agarose gel electrophoresis. The four different fragments were pooled and ligated with T4 DNA ligase in the manner described above (MATERIALS AND METHODS, section g). After ligation the DNA was used to transform CGE43. Transformants were selected by growth on ampicillin. The plasmid DNA of ampicillin-resistant colonies was isolated and analyzed by restriction enzyme digestion. The nucleotide sequence of three different plasmids was determined in the junction region of the promoter and mature interferon coding sequence to ensure that the ATG for initiation of translation was in the correct position. The three plasmids were designated pCGE35, pCGE36, and pCGE37 (Fig. 7) and the strains in which they were grown CGE88, CGE89, and CGE90, respectively.

(i) Preparation of extracts and assay for interferon activity in E. coli extracts

Bacteria were grown in 50-ml cultures of TY broth with 20 µg/ml ampicillin to a cell density corresponding to a reading of 100 on a Klett-Summerson colorimeter. The cells were collected by centrifugation, washed once with 10 ml M9 medium and resuspended in 0.6 ml of a buffer containing 0.01 M Tris·HCl pH 7.5, 0.05 M NaCl, 0.5 mM EDTA, 5% (v/v) glycerol. The resuspended cells were frozen at -20°C overnight. The cells were thawed by brief exposure to room temperature and 0.05 ml egg white lysozyme (20 mg/ml) was added to the mixture. The mixture was incubated at 4°C for 1 h, followed by a brief sonication to reduce the viscosity of the lysed bacterial suspension. The mixture was clarified by centrifugation, and the supernatant was assayed for IFN-α activity. Interferon titers in bacterial extracts were determined by comparison with the NIH HuIFN-α standards using the CPE assay. The indicator cells, WISH, were challenged with vesicular stomatitis virus.
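The three properties claimed for the linker in section g (self-complementary, recognized by ClaI, and carrying ATG-TGT) can be checked with a short script. This is a minimal sketch and not part of the original methods: the helper name `reverse_complement` and the constant names are illustrative, the 14-mer is copied from section g, and ATCGAT is the ClaI recognition sequence.

```python
# Sketch: verify the design properties of the linker from section g.
# Names below are illustrative; only the sequences come from the text.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


LINKER = "ACACATCGATGTGT"   # self-complementary oligonucleotide (section g)
CLAI_SITE = "ATCGAT"         # ClaI recognition sequence

# A self-complementary oligonucleotide equals its own reverse complement,
# so two copies anneal to form a blunt-ended duplex.
is_self_complementary = reverse_complement(LINKER) == LINKER

# The linker should carry exactly one ClaI site.
clai_sites = LINKER.count(CLAI_SITE)

# The 3' end should read ATG followed by the first mature codon, TGT.
ends_with_start_and_cys = LINKER.endswith("ATG" + "TGT")

print(is_self_complementary, clai_sites, ends_with_start_and_cys)
```

Because the duplex is symmetric, either end of the ligated linker presents the same ATG-TGT sequence to the blunt-ended Sau3A fragment.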
RESULTS

(a) Construction of an f1-lymphocyte cDNA library

Poly(A)RNA was isolated from Sendai virus-induced lymphocytes (MATERIALS AND METHODS, sections a and c) and used as a substrate for the synthesis of double-stranded cDNA (MATERIALS AND METHODS, section d). Following addition of HindIII linkers (MATERIALS AND METHODS, section d), this cDNA was ligated to phage CGF4 double-stranded DNA which had been cut with HindIII and treated with calf alkaline phosphatase. A portion of the ligated DNA was used to transfect E. coli strain BNN45 (MATERIALS AND METHODS, section d). A library of approx. 8500 phage plaques was generated on TY plates. Control experiments indicated that 90% of the plaques carried cDNA inserts.

(b) Identification of cDNA clones carrying HuIFN-α and HuIFN-β DNA sequences

The CGF4-cDNA library was screened by hybridization, using 32P-labeled oligonucleotide probes (MATERIALS AND METHODS, section e). The nucleotide sequences of the probes are shown in Fig. 1. The sequence for the 18-base HuIFN-α gene probe, 18A, was chosen to be complementary to a sequence conserved in the HuIFN-α gene family (Goeddel et al., 1981). The sequence for the HuIFN-β probe, 18B, was chosen to be complementary to a region of similar sequence in the HuIFN-β gene (Taniguchi et al., 1980). These probes were hybridized to nitrocellulose filter replicas of the CGF4-cDNA library (MATERIALS AND METHODS, section e). These filters were used to expose X-ray film. Representative results are shown in Fig. 2. Two replica filters were made from each plate. One was probed with the HuIFN-α probe 18A and the other filter was probed with the HuIFN-β probe 18B. The two probes hybridized with different plaques on the replica filters, even though the two probes differed in only four of eighteen nucleotides. Approx. 450 cDNA-containing plaques were present on each plate, but only a small number of plaques were selected by the probes. This result indicated the method was very selective. About 0.6% of the plaques hybridized with the 18A probe and 0.1% of the plaques hybridized to the 18B probe, suggesting both HuIFN-α and HuIFN-β gene sequences were present in the cDNA library.

Fig. 1. Sequences of oligonucleotide probes. DNA sequences of HuIFN-α probe 18A and HuIFN-β probe 18B. The boxes mark nucleotides which differ.

Fig. 2. Hybridization of cDNA recombinant plaques with oligonucleotide probes. Plaque transfers and DNA denaturation were performed as described in MATERIALS AND METHODS, section e. Nitrocellulose filters were hybridized with 5'-32P-labeled oligonucleotide probes (approx. 10^8 cpm/µg, 10^7 cpm/filter) overnight at 50°C. Hybridization buffer was 0.05 M sodium phosphate, 0.9 M NaCl, 5 mM Na2·EDTA, 0.1% SDS, 0.1% pyrophosphate, 0.02% bovine serum albumin, 0.02% polyvinylpyrrolidone, 0.02% Ficoll, pH 7.0. Filters were washed with 2 x SSPE buffer (0.36 M NaCl, 0.02 M NaPO4, 2 mM Na2·EDTA, pH 7.0), air dried, and placed against X-ray film for autoradiography. Autoradiographic replicas of the filters are shown. Filters 15 and 16 were probed with 18A (left) or 18B (right). The numbers identify plaques which hybridized to the labeled probes (200-299 selected by the HuIFN-α probe; 300-399 selected by the HuIFN-β probe) and consequently exposed the X-ray film.

(c) Characterization of clones by DNA sequencing

DNA from plaques selected by the oligonucleotide probes was purified and partially sequenced (MATERIALS AND METHODS, section f). Using the oligonucleotide probes as primers, the single-stranded DNAs were used as templates for sequencing by the dideoxy chain termination method of Sanger et al. (1977). Initially, four plaques selected by the HuIFN-α probe 18A (designated 201, 202, 203 and 204) were chosen for DNA sequencing. Short stretches of sequence determined from these experiments are shown in Fig. 3A. The sequences derived from clones 203 and 204 were identical. The sequences from 201, 202 and 203 corresponded to three published nucleotide sequences designated HuIFN-α type B, A and D respectively by Goeddel et al. (1981). Using the same technique, a clone selected by the 18B probe was shown to carry a cDNA insert corresponding to the HuIFN-β gene. Figure 3B shows a portion of the nucleotide sequence determined from cDNA clone 304 using the 18B probe as a primer. Comparison of the 304 sequence with the published sequence for HuIFN-β (Taniguchi et al., 1980) indicated the clone 304 is a HuIFN-β sequence.

The oligonucleotide probes 18A or 18B can only select half of the phage plaques which carry interferon-related sequences. Since the HindIII linker-tailed cDNA could have inserted into the original CGF4 vector in either orientation, the mature single-stranded phage DNA carries only one strand of the cDNA insert. Interferon clones inserted in only one orientation can be selected by the probes, because only the phage which contained single-stranded DNA complementary to the probes will hybridize. Interferon clones carrying DNA inserts in the opposite orientation would not carry sequences complementary to the probes, and consequently would not hybridize. To find this other class of clones, nick-translated, 32P-labeled clone 201 RFI (double-stranded) DNA was used to rescreen the nitrocellulose filters by the plaque-filter method of Benton and Davis (1977). This second screening of the filters identified additional plaques not found with the oligonucleotide probe 18A, since both strands of interferon sequence of clone 201 were available for hybridization. As expected, about half of the plaques selected by the nick-translated probe were previously identified by the 18A probe. After plaque purification and rescreening with the radiolabeled probes, 50 plaques carrying HuIFN-α interferon-related sequences were obtained. Ten plaques carrying HuIFN-β interferon sequences were isolated.

[Fig. 3 sequence panels: (A) HuIFN-α clones 201, 202 and 203, read from primer 18A; (B) HuIFN-β clone 304, read from primer 18B.]

Fig. 3. DNA sequences of recombinant plaques. Recombinant f1 plaques were selected and sequenced using the dideoxy termination method as described in the text. (A) The limited sequences could be used to assign HuIFN-α types using the boxed nucleotides as indicators. (B) The determined sequence of 304 is identical to the corresponding region of the published HuIFN-β sequence (Taniguchi et al., 1980).

(d) Characterization of the interferon clones

Phage selected by the 18A or 18B probes were sized by measuring their electrophoretic mobility (MATERIALS AND METHODS, section e) and DNA insert lengths were determined by agarose gel electrophoretic analysis of HindIII restriction endonuclease-digested RFI DNA molecules (MATERIALS AND METHODS, section e). Twelve of the 50 HuIFN-α clones carried cDNA inserts of over 800 bp in length. This observation suggested that these recombinants might contain full-length HuIFN-α genes. All of the HuIFN-β clones carried inserts of less than 500 bp. This observation suggested that these recombinant phage do not contain full-length HuIFN-β genes. At this time we have not further characterized the HuIFN-β clones. One HuIFN-α clone, phage 201, was selected for further study.

Fig. 4. A map of restriction enzyme cleavage sites in the cDNA insert of recombinant phage 201. Nucleotides are numbered beginning with the A of the preinterferon ATG initiation codon. The direction is 5' to 3', left to right. The restriction enzyme site shown in parentheses is present at additional, unmarked sites in the cDNA insert. The marked site was used for DNA sequencing. The horizontal arrows mark the length of DNA sequence determined from a restriction fragment labeled at the site marked by a vertical line associated with each arrow. The dotted horizontal arrow indicates the sequence determined by the dideoxy chain termination method (see MATERIALS AND METHODS, section f). The solid and dashed horizontal arrows denote the sequences determined by the method of Maxam and Gilbert (1980). Solid arrows denote 5'-end labeling with T4 polynucleotide kinase and dashed arrows denote 3'-end labeling with cordycepin.

(e) Characterization of clone 201

The initial sequence information obtained by the dideoxy chain termination method suggested that the clone 201 was identical to the HuIFN-α type B clone of Goeddel et al. (1981). More detailed analysis of the insert DNA in 201 revealed significant differences between the 201 sequence and the HuIFN-α type B. Initially it was observed that the 201 insert DNA was approx. 1200 bp long, whereas the published B sequence is 1042 bp long (Goeddel et al., 1981). A detailed restriction enzyme map, shown in Fig. 4, was generated for the 201 insert DNA. The 201 insert contains four Sau3A restriction enzyme sites, while the published B sequence contains five Sau3A sites. The "missing" site is in the region of 201 corresponding to the protein coding sequence at nucleotide 371 of the published B sequence (Goeddel et al., 1981) (nucleotide 369 of the 201 sequence shown in Fig. 5). In addition an AccI site found in the 3'-noncoding region of the B sequence is not present in the 201 sequence. Finally, two sites for HgiAI were found in the 3'-noncoding region of 201, while only one HgiAI site is present in the published B sequence. These differences in the restriction enzyme map suggested the 201 clone might represent a different
HuIFN-α gene rather than an allele of the HuIFN-α type B gene. To test this possibility the entire 201 insert DNA was sequenced using the strategy outlined in Fig. 4. The sequence, shown in Fig. 5, demonstrated that the 201 clone is a cDNA copy of a HuIFN-α gene distinct from the type B gene of Goeddel et al. (1981). The 201 gene has a 5'-noncoding sequence 12 bp longer, and a 3'-noncoding sequence 177 bp longer, than the published HuIFN-α type B. The total length of the DNA insert is 1237 bp which includes an open reading frame of over 600 nucleotides. This sequence codes for a preinterferon leader peptide 23 amino acids in length and an interferon polypeptide of 166 amino acids. Such a structure is consistent with published interferon structures (Mantei et al., 1980; Nagata et al., 1980; Lawn et al., 1981a; Goeddel et al., 1981). While no poly(A) was found at the 3'-end of the clone, there are two potential poly(A) addition sites (ATTAAA) (Proudfoot and Brownlee, 1976) present. One begins at nucleotide 920, while the other begins at nucleotide 993 of the sequence. The absence of a poly(A) sequence at the 3'-end of the 201 clone is not unusual. In a number of cases cDNA clones derived from oligo(dT)-cellulose purified mRNA have been found to lack 3'-poly(A) sequences (Moir et al., 1982; Streuli et al., 1980; Seeburg et al., 1983; Parnes et al., 1981).

Within the sequence common to both 201 and HuIFN-α B, 201 contains six single-base additions and five single-base deletions, relative to HuIFN-α B. In addition there are thirteen single-base changes within the sequence common to both: four transversions and nine transitions. All but three of these 24 differences are in the noncoding region. Within the coding region the two genes differ from each other by one insertion, one deletion, and one transversion. The 201 clone has a cytosine residue at nucleotide position 22 (Fig. 5) while the B type clone has an adenosine residue. This region of the sequence is thought to code for the signal sequence of the interferon protein. The C residue of 201 is part of a CTG triplet coding for leucine, while the A residue of the type B sequence is part of an ATG triplet coding for methionine.

The most striking difference between the two related HuIFN-α sequences is the one-base deletion and subsequent one-base insertion found in the coding sequence of 201. There is an insertion of a G residue in the 201 sequence (as compared to the B sequence) between nucleotides 371 and 373 of Fig. 5, and deletion of one base following nucleotide 359 (as compared to the B clone). Fig. 6 compares the nucleotide sequences in the B and 201 clones and shows the derived protein sequences coded by these nucleotides. In addition Fig. 6 compares these sequences to another HuIFN-α gene designated D by Goeddel et al. (1981). The one-base deletion in 201 changes a GAA triplet to GAG, but does not alter the amino acid, glutamic acid, at this position. A cysteine is conserved in this region for all three sequences, while the insertion in 201 places its coding sequence back in frame with B and D. The protein sequences immediately before and after the area of the deletion-insertion are the same for B, D and 201. Although the 201 nucleotide sequence codes for different amino acids in the region between the insertion and deletion, the interferon protein coded for by this DNA sequence is biologically active (see section f, below).

[Fig. 5 sequence: only the coding region (nucleotides 1-570) is reproduced; the 5'- and 3'-noncoding rows and the difference symbols are not legible in this copy.]

  1  ATG GCC TTG ACT TTT TAT TTA CTG GTC GCC CTA GTG GTG
     met ala leu thr phe tyr leu leu val ala leu val val
 40  CTC AGC TAC AAG TCA TTC AGC TCT CTG GGC TGT GAT CTG CCT CAG ACT CAC AGC CTG GGT AAC AGG AGG GCC TTG ATA CTC
     leu ser tyr lys ser phe ser ser leu gly cys asp leu pro gln thr his ser leu gly asn arg arg ala leu ile leu
121  CTG GCA CAA ATG CGA AGA ATC TCT CCT TTC TCC TGC CTG AAG GAC AGA CAT GAC TTT GAA TTC CCC CAG GAG GAG TTT GAT
     leu ala gln met arg arg ile ser pro phe ser cys leu lys asp arg his asp phe glu phe pro gln glu glu phe asp
202  GAT AAA CAG TTC CAG AAG GCT CAA GCC ATC TCT GTC CTC CAT GAG ATG ATC CAG CAG ACC TTC AAC CTC TTC AGC ACA AAG
     asp lys gln phe gln lys ala gln ala ile ser val leu his glu met ile gln gln thr phe asn leu phe ser thr lys
283  GAC TCA TCT GCT GCT TTG GAT GAG ACC CTT CTA GAT GAA TTC TAC ATC GAA CTT GAC CAG CAG CTG AAT GAC CTG GAG TCC
     asp ser ser ala ala leu asp glu thr leu leu asp glu phe tyr ile glu leu asp gln gln leu asn asp leu glu ser
364  TGT GTG ATG CAG GAA GTG GGG GTG ATA GAG TCT CCC CTG ATG TAC GAG GAC TCC ATC CTG GCT GTG AGG AAA TAC TTC CAA
     cys val met gln glu val gly val ile glu ser pro leu met tyr glu asp ser ile leu ala val arg lys tyr phe gln
445  AGA ATC ACT CTA TAT CTG ACA GAG AAG AAA TAC AGC TCT TGT GCC TGG GAG GTT GTC AGA GCA GAA ATC ATG AGA TCC TTC
     arg ile thr leu tyr leu thr glu lys lys tyr ser ser cys ala trp glu val val arg ala glu ile met arg ser phe
526  TCT TTA TCA ATC AAC TTG CAA AAA AGA TTG AAG AGT AAG GAA TGA
     ser leu ser ile asn leu gln lys arg leu lys ser lys glu

Fig. 5. The nucleotide and derived amino acid sequence of preinterferon encoded by the cDNA insert of phage 201. The amino acids are numbered consecutively from the initial methionine of the signal sequence. Nucleotides are numbered as in Fig. 4. Points where the coding sequence of 201 and the HuIFN-α type B of Goeddel et al. (1981) differ are noted (see RESULTS, section e). Transitions and transversions are marked by a symbol and are shown as the base which replaces a base found in the 201 sequence. Bases which are deleted in the type B sequence are marked by a second symbol, and insertions by a third symbol with the inserting base.
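The substitution classes counted in section e follow a simple rule: a change between two purines (A, G) or between two pyrimidines (C, T) is a transition, and a purine-pyrimidine change is a transversion. The sketch below illustrates that rule and re-adds the difference counts quoted in the text; the function name `classify_substitution` is illustrative and is not taken from the paper.

```python
# Sketch: classify single-base substitutions and tally the 201 vs. type B
# differences quoted in section e. Function and variable names are illustrative.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(base_a: str, base_b: str) -> str:
    """Return 'transition' or 'transversion' for two different DNA bases."""
    if base_a == base_b:
        raise ValueError("bases are identical; no substitution")
    pair = {base_a, base_b}
    if not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected bases from A, C, G, T")
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Position 22: C in clone 201 (CTG, leucine) versus A in type B (ATG, methionine).
position_22 = classify_substitution("C", "A")

# Counts stated in the text for the sequence common to both clones.
additions, deletions = 6, 5
transversions, transitions = 4, 9
total_differences = additions + deletions + transversions + transitions

print(position_22, total_differences)
```

The C/A difference at position 22 comes out as a transversion, which agrees with the statement that the coding region differs by one insertion, one deletion, and one transversion.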
[Fig. 6, lower drawing: derived amino acid sequences in the region of the deletion-insertion]

B (Goeddel et al., 1981)      ASN ASP LEU GLU VAL LEU CYS ASP GLN GLU
D (Goeddel et al., 1981)      ASN ASP LEU GLU ALA CYS VAL MET GLN GLU
201 (this paper)              ASN ASP LEU GLU SER CYS VAL MET GLN GLU

Fig. 6. Comparison of the nucleotide and amino acid sequences of 201 and HuIFN-α type B. Upper drawing shows the one-base deletion and insertion. Lower drawing shows the derived amino acid sequences of HuIFN-α types 201, B, and D in the region of the suggested deletion-insertion. The boxes to the left and right indicate the amino acids at the points of the suggested deletion-insertion. The central box marks the conserved cysteine residue.

(f) Expression of active HuIFN-α in E. coli

To see if the HuIFN-α 201 DNA coded for a gene product with interferon activity, plasmids were constructed to express the coding sequence in E. coli under control of pLac (MATERIALS AND METHODS, sections g and h; Fig. 7). To do this the preinterferon leader sequence was excised and replaced with a methionine initiation codon, ATG, on the 5'-side of the TGT encoding the putative amino terminal cysteine of mature HuIFN-α interferon (Mantei et al., 1980; Nagata et al., 1980; Lawn et al., 1981a; Goeddel et al., 1981). Plasmids containing the entire coding sequence of mature HuIFN-α type 201 under control of pLac were constructed by a four-molecule ligation described in MATERIALS AND METHODS, section h and illustrated in Fig. 7. DNA sequence analysis indicated that these plasmids, pCGE35, pCGE36, and pCGE37, had 9, 10, and 8 nucleotides respectively between the ribosome-binding site of pLac and the ATG initiation codon of the interferon gene (not shown). When the strains carrying these plasmids were tested for their ability to produce active interferon the results shown in Table I were obtained. Strains carrying pCGE35 and pCGE36 both produced measurable levels of interferon, while the control strain, carrying pBR322, and CGE90, carrying pCGE37, produced no detectable interferon activity. At this time we do not know the reason for the precipitous drop in expression when the distance between the ribosome-binding site and the ATG is reduced to 8 nucleotides.
TABLE I
Expression of HuIFN-α type 201 in E. coli hosts
Construction (a)    RBS-ATG (b) (bp)    IFN titer (c) (units/ml)
CGE43[pBR322]       -                   < 10
CGE88[pCGE35]       9                   4700
CGE89[pCGE36]       10                  9400
CGE90[pCGE37]       8                   < 10
(a) Plasmids were constructed as described in MATERIALS AND METHODS, section h, and in Fig. 7. The plasmids were transformed into strain CGE43 to produce the strains tested for HuIFN-α expression.
(b) Number of bp between the ribosome-binding site (Shine and Dalgarno, 1975) and the ATG initiation codon of mature interferon.
(c) Bacteria were grown and extracts prepared as described in MATERIALS AND METHODS, section i. Interferon titers are expressed in units of interferon per ml of bacterial extract.
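The spacing and titer values in Table I can be restated as a small data structure, which makes the comparison between constructs explicit. This is only a sketch: the class name, the field names, and the use of None for the vector control and for titers reported as "< 10" are illustrative assumptions, and the numbers are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Construct:
    plasmid: str
    spacer_bp: Optional[int]  # bp between RBS and ATG; None for the vector control
    titer: Optional[int]      # units/ml; None where Table I reports "< 10"


TABLE_I = [
    Construct("pBR322", None, None),
    Construct("pCGE35", 9, 4700),
    Construct("pCGE36", 10, 9400),
    Construct("pCGE37", 8, None),
]


def expressing(rows):
    """Return the constructs with a titer above the reported detection limit."""
    return [row for row in rows if row.titer is not None]


active = expressing(TABLE_I)
best = max(active, key=lambda row: row.titer)

for row in active:
    fraction = row.titer / best.titer
    print(f"{row.plasmid}: spacer {row.spacer_bp} bp, "
          f"{row.titer} units/ml ({fraction:.2f} of the best construct)")
```

With the values as tabulated, the 10-bp spacer gives twice the titer of the 9-bp spacer, and the 8-bp spacer falls below the detection limit.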
DISCUSSION
We have used engineered strains of bacteriophage f1 as primary vectors for the cloning, identification, and DNA sequencing of human interferon genes. We have previously used such vectors to clone the gene for calf chymosin (Moir et al., 1982). This method has a number of novel features which aid in the identification and manipulation of cloned DNA sequences. With 18-base oligonucleotide probes of defined sequence we were able to efficiently identify recombinants that make up 1% or less of the population of cDNA clones. Two 18-base oligonucleotide probes were used, one specific for HuIFN-α and the other for HuIFN-β. Although the probe sequences differed in only 4 of 18 possible sites, they selected different populations of clones. Wallace et al. (1981) have reported that mixed probes, differing in as little as one base, are able to discriminate between completely and partially complementary sequences.
The 18-base oligonucleotide probes were also used as primers for rapidly determining the nucleotide sequences of selected recombinant phage using the dideoxy chain termination method (Sanger et al., 1977). Availability of defined oligonucleotide probes for interferon made screening by sequencing easy and rapid. The advantages of defined oligonucleotide probes can be multiplied by the use of "universal" primers for f1 (Vovis, G.F., unpublished) which make sequencing of any cDNA insert a simple task.
Those clones selected by the probes 18A or 18B for which the sequence was investigated carried HuIFN sequences. Clones 201, 202, 203, and 204, which were selected by the 18A probe, carried sequences suggesting they were identical to published HuIFN-α gene sequences. Clone 304, which was selected by the 18B probe, contains a sequence identical to the published HuIFN-β sequence. Further analysis of the 201 clone indicated that it represented a gene whose sequence was related to, but different from, the published HuIFN-α type B (Goeddel et al., 1981).
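The probe discrimination described above, in which two 18-base probes differing at 4 of 18 positions select different clones, can be sketched as a mismatch count between a probe and its best site on a template. The two probe sequences and the template below are hypothetical and were chosen only so that they differ at four positions; they are not the 18A and 18B sequences, and the helper names are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def best_site(probe, template):
    """Slide the probe's complementary site along the template.

    Returns (start, mismatches) for the window with the fewest mismatches.
    """
    site = reverse_complement(probe)
    width = len(site)
    scored = [
        (len(mismatch_positions(site, template[start:start + width])), start)
        for start in range(len(template) - width + 1)
    ]
    count, start = min(scored)
    return start, count


# Hypothetical 18-base probes differing at 4 of 18 positions.
PROBE_A = "AGCTGGTACCTGAATCGA"
PROBE_B = "AGTTGGTGCCTAAATGGA"

# Hypothetical single-stranded template carrying a perfect site for PROBE_A.
TEMPLATE = "GGATCC" + reverse_complement(PROBE_A) + "AAGCTT"

print(mismatch_positions(PROBE_A, PROBE_B))  # [2, 7, 11, 15]
print(best_site(PROBE_A, TEMPLATE))          # (6, 0)
print(best_site(PROBE_B, TEMPLATE))
```

A real hybridization screen depends on temperature and salt conditions rather than a simple count, so the mismatch number here stands in only for the distinction between completely and partially complementary sequences.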
In addition, at least one of the twelve clones which could carry a full-length HuIFN-α gene contained a nucleotide sequence which appeared to be related to, but was clearly different from, the published HuIFN-α sequences (not shown). The 201 DNA sequence, when compared to the
published HuIFN-α type B of Goeddel et al. (1981), has a one-base deletion followed by a distal one-base addition twelve nucleotides away. The deletion-addition has the effect of putting the protein coding sequence out of, and then back into, the type B translational reading frame. Comparison of the 201 nucleotide sequence and published HuIFN-α sequences (Goeddel et al., 1981) suggests an evolutionary path. Goeddel et al. (1981) have suggested that HuIFN-α type B is descended from a parental type, such as HuIFN-α type D, by single-base insertion and deletion. The gene represented by 201 could have descended from HuIFN-α type B by a subsequent, additional, single-base deletion and insertion in the following manner:
D:    CTG GAA GCC TGT GTG ATA CAG
               ↓ insert T   ↓ delete A
B:    CTG GAA GTCCTGT GTG AT  CAG
            ↓ delete A      ↓ insert G
201:  CTG GA  GTCCTGT GTG ATG CAG
An additional point suggesting that the 201 gene evolved from the type B gene is the close homology between the 3'- and 5'-noncoding regions of 201 and B. There are five single-base additions, four single-base deletions, and twelve single-base changes in 502 nucleotides of noncoding sequence which the two genes have in common (comparing 201 to B). In spite of changes in the coding sequence induced by the deletion-insertion in the 201 gene, active HuIFN-α can be expressed in bacteria under control of pLac (Table I).
The advantages which come from using the single-stranded phage system are central to the utility of the method described in this paper. These advantages are summarized by Zinder and Boeke (1982). f1 is an easy phage to grow, manipulate and store. Plaques carrying inserted DNA can be added to broth, heated at 65°C and stored for long periods of time. Mature single-stranded circular phage DNA, a convenient template for the dideoxy sequencing method, is simply prepared from plate stocks of phage. Double-stranded f1 RF1 is prepared by methods analogous to those in use for preparation of plasmids such as pBR322. In addition, f1 RF1 can be manipulated and transfected in a manner analogous to plasmid vectors.
Fig. 7. Construction of plasmids to express HuIFN-α 201 gene in E. coli (MATERIALS AND METHODS, section h and RESULTS, section f for experimental details). Dashed arrows indicate the 5'-to-3' direction of transcription. The joints in the four-molecule ligation are (clockwise from pLac): PvuII (blunt) to blunt HindIII, EcoRI to EcoRI, HindIII to HindIII, and PstI to PstI. Abbreviations: HIII, HindIII; RI, EcoRI; ApR, ampicillin resistance.
ACKNOWLEDGEMENTS
We thank Robert Breeze and Chris Goff for critically reading the manuscript. This investigation was supported by a contract with Interferon Sciences, Inc.
REFERENCES
Beaudoin, J.: Studies on coat protein mutants and on multiple length particles of coliphage M13. Ph.D. Thesis, Univ. of Wisconsin, Madison, WI, 1970.
Benton, W.D. and Davis, R.W.: Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196 (1977) 180-182.
Davis, R.W., Botstein, D. and Roth, J.R.: Advanced Bacterial Genetics. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1980, p. 7.
Degrave, W., Derynck, R., Tavernier, J., Haegeman, G. and Fiers, W.: Nucleotide sequence of the chromosomal gene for human fibroblast (β1) interferon and of the flanking regions. Gene 14 (1981) 137-143.
Derynck, R., Content, J., De Clercq, E., Volckaert, G., Tavernier, J., Devos, R. and Fiers, W.: Isolation and structure of a human fibroblast interferon gene. Nature 285 (1980) 542-549.
Desrosiers, R.C., Friderici, K.H. and Rottman, F.M.: Characterization of Novikoff hepatoma mRNA methylation and heterogeneity in the methylated 5' terminus. Biochemistry 14 (1975) 4367-4374.
Foster, J.A., Rich, C.B., Fletcher, S., Karr, S.R. and Przybyla, A.: Translation of chick aortic elastin messenger ribonucleic acid. Comparison of elastin synthesis in chick aorta organ culture. Biochemistry 19 (1980) 857-864.
Goeddel, D.V., Yelverton, E., Ullrich, A., Heyneker, H.L., Miozzari, G., Holmes, W., Seeburg, P.H., Dull, T., May, L., Stebbing, N., Crea, R., Maeda, S., McCandliss, R., Sloma, A., Tabor, J.M., Gross, M., Familletti, P.C. and Pestka, S.: Human leukocyte interferon produced by E. coli is biologically active. Nature 287 (1980a) 411-416.
Goeddel, D.V., Shepard, H.M., Yelverton, E., Leung, D. and Crea, R.: Synthesis of human fibroblast interferon in E. coli. Nucl. Acids Res. 8 (1980b) 4057-4074.
Goeddel, D.V., Leung, D.W., Dull, T.J., Gross, M., Lawn, R.M., McCandliss, R., Seeburg, P.H., Ullrich, A., Yelverton, E. and Gray, P.W.: The structure of eight distinct cloned human leukocyte interferon cDNAs. Nature 290 (1981) 20-26.
Goodman, H.M. and MacDonald, R.J.: Cloning of hormone genes from a mixture of cDNA molecules, in Wu, R. (Ed.),
Methods in Enzymology, Vol. 68, Academic Press, New York, 1979, pp. 75-90.
Guarente, L., Lauer, G., Roberts, T.M. and Ptashne, M.: Improved methods for maximizing expression of a cloned gene: a bacterium that synthesizes rabbit β-globin. Cell 20 (1980) 543-553.
Holmes, D.S. and Quigley, M.: A rapid boiling method for the preparation of bacterial plasmids. Anal. Biochem. 114 (1981) 193-197.
Isaacs, A. and Lindenmann, J.: Virus interference I. The interferon. Proc. Roy. Soc. B147 (1957) 258-267.
Lawn, R.M., Gross, M., Houck, C.M., Franke, A.E., Gray, P.W. and Goeddel, D.V.: DNA sequence of a major human leukocyte interferon gene. Proc. Natl. Acad. Sci. USA 78 (1981a) 5435-5439.
Lawn, R.M., Adelman, J., Dull, T.J., Gross, M., Goeddel, D. and Ullrich, A.: DNA sequence of two closely linked human leukocyte interferon genes. Science 212 (1981b) 1159-1162.
Lawn, R.M., Adelman, J., Franke, A.E., Houck, C.M., Gross, M., Najarian, R. and Goeddel, D.V.: Human fibroblast interferon gene lacks introns. Nucl. Acids Res. 9 (1981c) 1045-1052.
Mantei, N., Schwarzstein, M., Streuli, M., Panem, S., Nagata, S. and Weissmann, C.: The nucleotide sequence of a cloned human leukocyte interferon cDNA. Gene 10 (1980) 1-10.
Maxam, A. and Gilbert, W.: Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and Moldave, K. (Eds.), Methods in Enzymology, Vol. 65, Academic Press, New York, 1980, pp. 499-560.
Messing, J.: M13mp2 and derivatives: A molecular cloning system for DNA sequencing, strand specific hybridization, and in vitro mutagenesis, in Walton, A.G. (Ed.), Proceedings of the Third Cleveland Symposium on Macromolecules. Elsevier, Amsterdam, 1981, pp. 143-153.
Miller, J.H.: Experiments in Molecular Genetics. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1972, p. 431.
Model, P. and Zinder, N.D.: In vitro synthesis of bacteriophage f1 protein. J. Mol. Biol. 83 (1974) 231-251.
Moir, D., Mao, J., Schumm, J., Vovis, G., Alford, B.L. and Taunton-Rigby, A.: Molecular cloning and characterization of double-stranded cDNA coding for bovine chymosin. Gene 19 (1982) 127-138.
Nagata, S., Taira, H., Hall, A., Johnsrud, L., Streuli, M., Ecsodi, J., Boll, W., Cantell, K. and Weissmann, C.: Synthesis in E. coli of a polypeptide with human leukocyte interferon activity. Nature 284 (1980a) 316-320.
Nagata, S., Mantei, N. and Weissmann, C.: The structure of one of the eight or more distinct chromosomal genes for human interferon-α. Nature 287 (1980b) 401-408.
Ohno, S. and Taniguchi, T.: Structure of a chromosomal gene for human interferon β. Proc. Natl. Acad. Sci. USA 78 (1981) 5305-5309.
Parnes, J.R., Baruch, V., Felsenfeld, A., Ramanathan, L., Ferrini, U., Appella, E. and Seidman, J.G.: Mouse β2-microglobulin cDNA clones: A screening procedure for cDNA clones corresponding to rare mRNAs. Proc. Natl. Acad. Sci. USA 78 (1981) 2253-2257.
Proudfoot, N.J. and Brownlee, G.G.: 3' Non-coding region sequences in eukaryotic messenger RNA. Nature 263 (1976) 211-214.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Seeburg, P.H., Sias, S., Adelman, J., De Boer, H.A., Hayflick, J., Jhurani, P., Goeddel, D.V. and Heyneker, H.L.: Efficient bacterial expression of bovine and porcine growth hormones. DNA 2 (1983) 37-45.
Shine, J. and Dalgarno, L.: Determinant of cistron specificity in bacterial ribosomes. Nature 254 (1975) 34-38.
Stewart II, W.E.: The Interferon System. Springer Verlag, New York, 1979.
Streuli, M., Nagata, S. and Weissmann, C.: At least three human type α interferons: structure of α2. Science 209 (1980) 1343-1347.
Taniguchi, T., Ohno, S., Fujii-Kuriyama, Y. and Muramatsu, M.: The nucleotide sequence of human fibroblast interferon cDNA. Gene 10 (1980) 11-15.
Wallace, R.B., Johnson, M.J., Hirose, T., Miyake, T., Kawashima, E.H. and Itakura, K.: The use of synthetic oligonucleotides as hybridization probes, II. Hybridization of oligonucleotides of mixed sequence to rabbit β-globin DNA. Nucl. Acids Res. 9 (1981) 879-894.
Young, R.A. and Davis, R.W.: Efficient isolation of genes by using antibody probes. Proc. Natl. Acad. Sci. USA 80 (1983) 1194-1198.
Zinder, N.D. and Boeke, J.D.: The filamentous phage (Ff) as vectors for recombinant DNA - a review. Gene 19 (1982) 1-10.
Communicated by J. Carbon.