Nucleotide sequence of the lysozyme gene of bacteriophage T4


J. Mol. Biol. (1983) 165, 229-248

Nucleotide Sequence of the Lysozyme Gene of Bacteriophage T4

Analysis of Mutations Involving Repeated Sequences

JOYCE EMRICH OWEN, DENNIS W. SCHULTZ†, ANDREW TAYLOR† AND GERALD R. SMITH†

Institute of Molecular Biology, University of Oregon, Eugene, Ore. 97403, U.S.A.

(Received 5 June 1981, and in revised form 16 August 1982)

The nucleotide sequence of the lysozyme (e) gene of bacteriophage T4 and approximately 130 additional nucleotides on each side has been determined. The 5'-end of the gene for internal protein III appears to be located about 70 base-pairs from the 3'-end of the lysozyme gene. Nucleotide sequence analysis of mutant e genes confirmed that three identified hotspots of frameshift mutations are runs of five A nucleotides in the wild-type gene. The endpoints of two deletions are direct repeats of eight base-pairs in the wild-type gene. Two frameshift mutations with high reversion frequencies are duplications of five or seven base-pairs. The cloning and nucleotide sequence determination of the lysozyme gene will facilitate further study of the molecular biology of T4 lysozyme.

1. Introduction

The bacteriophage T4 lysozyme gene-enzyme system has been widely used for the study of basic problems in molecular biology, such as the relationship between gene and protein sequences, the mechanisms of mutagenesis and the mechanism of enzymatic catalysis. The ability to study these problems is enhanced by the availability of many mutants and convenient genetic mapping procedures, and by knowledge of the amino acid sequence (Inouye et al., 1970b) and three-dimensional structure of the protein (Remington et al., 1978). Molecular cloning and nucleotide sequence determination of the lysozyme gene, reported here, will facilitate further studies with T4 lysozyme.

The mechanism by which mutations arise has been studied extensively in the lysozyme gene. Streisinger et al. (1966) found a site within the gene at which frameshift mutations revert at high frequency. Okada et al. (1972) subsequently showed by comparisons of mutant and wild-type amino acid sequences that this frameshift hotspot was a run of five A nucleotides in the wild-type gene. Owen (1971) found that this site and four others in the lysozyme gene were hotspots for mutations with high forward and reverse frequencies. Amino acid sequence comparisons showed that one of these hotspots was a run of five A nucleotides like the previously analyzed site (G. Streisinger & J. Owen, unpublished observations). These results suggest that reiterated bases are sites of frequent mutation, and this led Streisinger and his colleagues (Streisinger et al., 1966; Okada et al., 1972) to propose a mechanism by which these mutations arise by "slippage" of primer and template during repair DNA synthesis.

Nucleotide sequence analysis of other kinds of mutations might similarly suggest mechanisms by which they arise. For example, Farabaugh et al. (1978) and Pribnow et al. (1981) found that several spontaneously occurring deletions arose between direct repeats of about five to ten base-pairs in the Escherichia coli lacI gene and in the T4 rII gene; these and other observations have led to models of deletion formation (Efstratiadis et al., 1980; Ripley, 1982; Albertini et al., 1982). Reported here is the complete nucleotide sequence of the T4 lysozyme gene and the sequence alterations of three classes of mutations: frameshifts at the hotspots discussed above, deletions of 70 to 100 bp‡, and duplications of 5 to 7 bp. A common feature of all of these mutations is the presence of repeated nucleotide sequences.

† Present address: Hutchinson Cancer Research Center, 1124 Columbia Street, Seattle, Wash. 98104, U.S.A.

0022-2836/83/100229-20 $03.00/0 © 1983 Academic Press Inc. (London) Ltd.

2. Materials and Methods

(a) Phage

All T4 phages are from our collections except strain T4D denA- denB- 56amE1, which was obtained from E. Kutter (Evergreen State College), and the λ590 vector, which was obtained from N. Murray (University of Edinburgh). The denB- mutation is a deletion extending into rII (Kutter et al., 1975).

(b) Construction of T4D denA- denB- 56- strains containing the e+ and mutant e genes of T4B

To prepare cytosine-containing DNA for nucleotide sequence analysis of the lysozyme (e) gene of phage T4 strain B, hybrids between T4 strain D and T4B were constructed by the following crosses. A deletion of the entire e gene (eG19 in Fig. 1) was crossed into the T4D denA- denB- 56- triple mutant strain as follows. T4B eG19 phage were irradiated with 25 lethal hits of u.v. light and then crossed to the T4D triple mutant at a ratio of 1 T4B to 10 T4D. The lysate was plated under conditions permissive for all mutations present (SuIII+ strain Ymel on citrate plates with egg-white lysozyme: Okada et al., 1966). The "rescued" mutant e plaques (~2 × 10⁻³ of the total) were picked and tested for the presence of the denB- mutation by plating on E. coli strain K110 (λ) SuIII+ (permissive for am, restrictive for rII) on citrate plates with egg-white lysozyme, and then were tested for the presence of the 56- mutation by plating on Su- strain E. coli B on citrate plates with egg-white lysozyme. In no case was the wild-type denB or 56 gene "rescued" from the T4B strains along with the lysozyme mutation; the presence of denA was not tested.

The "wild-type" T4B lysozyme gene was then crossed into the T4D denA- denB- 56- eG19 strain using 1 u.v.-irradiated T4B to 10 unirradiated T4D; again, rescue was ~2 × 10⁻³. Next, a number of e deletions and frameshift mutations were crossed into the T4D denA- denB- 56- e+ strain. In all cases, the phage carrying the T4B lysozyme marker were denB- 56-.
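As a rough guide to what "25 lethal hits" implies for the irradiated T4B parent, a minimal sketch follows. It assumes the usual single-hit Poisson model of u.v. inactivation, which the text does not state explicitly, and the function name is illustrative only.

```python
import math


def surviving_fraction(mean_lethal_hits: float) -> float:
    """Fraction of phage expected to receive zero lethal hits.

    Assumes lethal hits are Poisson-distributed (an assumption made for
    this sketch; the inactivation model is not given in the text).
    """
    return math.exp(-mean_lethal_hits)


# 25 lethal hits, the dose given to the T4B parent in the crosses above
print(f"{surviving_fraction(25):.2e}")
```

Under this assumption essentially no irradiated T4B genome survives intact, so plaques carrying the T4B e marker would be expected to arise by rescue of that marker into the unirradiated T4D parent.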

(c) Preparation of T4 DNA containing cytosine

DNA from the phages described in section (b), above, was prepared as follows. E. coli strain BB was grown to 2 × 10⁸ cells/ml in M9 medium supplemented with Casamino acids (Difco), infected with 7 phage/cell, and incubated with shaking at 37°C for 90 min. Cells were collected by centrifugation, washed, and resuspended in 0.1 vol. TEN buffer (10 mM-Tris·HCl (pH 8.1), 100 mM-NaCl, 10 mM-EDTA). The suspension was lysed by the addition of sodium dodecyl sulfate (0.5%, w/v) and proteinase K (100 µg/ml) and incubation for 2 h at 37°C. Then it was extracted 3 times with phenol and 3 times with ether, treated with pancreatic RNase (20 µg/ml) for 1 h at 37°C, extracted once more with phenol and extensively dialyzed against TEN buffer. DNA concentrations, measured by optical absorbance, ranged from 0.2 to 1.0 mg/ml when prepared from infected bacteria, but were 0.03 mg/ml when prepared from uninfected bacteria, indicating that the bulk of the DNA from infected cells was T4 DNA.

(d) Preparation of the EIII fragment from T4 DNA

A sample of e+ T4 DNA, prepared as described in section (c), above, was digested with endonuclease HindIII (New England BioLabs, Beverly, MA) and subjected to electrophoresis as described in the legend to Fig. 2. DNA fragment EIII (see Results and Fig. 2) was extracted from the gel by pulverizing the gel through a syringe, freezing it in liquid nitrogen, and centrifuging it at 200,000 g for 45 min. DNA from the supernate was purified by DEAE-cellulose chromatography and precipitation with ethanol.

(e) Recombination between T4 e mutants and λ-EIII hybrid phage

E. coli K12 cells (sensitive to λ and T4) were mixedly infected with a candidate λ hybrid clone and a T4 e mutant and were then spotted onto a lawn of E. coli B cells (resistant to λ but sensitive to T4). Since only T4 e+ phage can lyse the E. coli B cells, formation of plaques at high frequency ("marker rescue") indicated the presence of the e gene in the clone.

(f) Nucleotide sequence determination

As described in Results, T4 EIII fragments containing the e+ or various e mutant genes were cloned into the λ590 vector constructed by Murray et al. (1977); the e+ EIII fragment was purified as described above before cloning, while the mutant e EIII fragments were cloned from a "shotgun" of HindIII fragments as described in Results, section (b). The resulting λ-EIII hybrid phage were grown, DNA was extracted from them with sodium dodecyl sulfate, and the DNA was digested with endonuclease HindIII (Sprague et al., 1978). The EIII fragment was separated from other DNA fragments by electrophoresis through a 5% (w/v) polyacrylamide (BioRad, Richmond, CA) gel and purified by electroelution, extraction with phenol and precipitation with ethanol (Sprague et al., 1978). The purified DNA was cleaved with an appropriate endonuclease, 5'-end-labelled by the T4 polynucleotide kinase exchange reaction (Berkner & Folk, 1977), and cleaved with a second endonuclease. The labelled fragments were separated by polyacrylamide gel electrophoresis and purified as for the EIII fragment. Further details are given in the Figure legends. Nucleotide sequences were determined by the chemical degradation methods of Maxam & Gilbert (1980). Individual gel readings were merged into a master sequence using the computer programs of Staden (1979).

‡ Abbreviation used: bp, base-pairs.
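The idea of merging individual gel readings into a master sequence can be illustrated with a minimal sketch. This is not the Staden (1979) program; it is a simple greedy suffix-prefix merge written for illustration, and the function names, the minimum overlap of 4 bases, and the three short "readings" (hypothetical pieces of the 15-base wild-type segment shown in Fig. 7) are all assumptions.

```python
def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge_readings(readings: list[str], min_len: int = 4) -> str:
    """Greedily join readings, always merging the pair with the largest overlap."""
    contigs = list(readings)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            raise ValueError("readings do not overlap enough to be merged")
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs[0]


# Hypothetical overlapping readings covering a short wild-type segment
readings = ["ATATTTGA", "TTTGAAATG", "AAATGTTA"]
print(merge_readings(readings))
```

A greedy merge of this kind can be misled by repeated sequences, which is one reason each region of a real sequence is read several times and from more than one labelled end.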

3. Results

(a) Preparation of cytosine-containing DNA from strain T4B

Most restriction endonucleases do not cut bacteriophage T4 DNA because it contains glucosylated hydroxymethylcytosine instead of cytosine (Kaplan & Nierlich, 1975). Soon after infection, T4 makes endonucleases, coded by genes denA (Sadowski et al., 1971) and denB (Sadowski & Vetter, 1973), which break down cytosine-containing DNA (normally the host DNA). T4 gene 56 encodes a dCTPase whose action reduces incorporation of cytosine into DNA (Wiberg, 1967). T4 with mutations in these three genes makes DNA containing cytosine instead of hydroxymethylcytosine, and the resulting DNA is sensitive to restriction endonucleases (Velten et al., 1976). Consequently, a triple mutant strain, T4D denA- denB- 56-, was used in this work.

Because of the vast amount of lysozyme genetic and amino acid replacement data already obtained with T4B strains, the T4B lysozyme gene (e) was crossed into the T4D triple mutant strain. As described in Materials and Methods, the mutation eG19, which deletes the entire e gene (Fig. 1), was first crossed into T4D. The deletion was replaced with e+ from T4B, and finally various T4B e mutations were crossed into this T4D derivative containing the T4B e+ gene.

The large quantity of T4 DNA made after phage infection made possible the preparation of T4 DNA directly from infected cells, without purification of phage particles, as described in Materials and Methods.

[FIG. 1: map drawing not reproduced.]

FIG. 1. Genetic and physical map of a segment of the genome of phage T4 including the lysozyme (e) gene. The top line shows the position and orientation of e relative to neighboring genes. Below and relative to that line are indicated deletions removing part or all of e and used for identification of HindIII fragment EIII (Fig. 2). The lengths of these deletions were determined by endonuclease cleavage analyses (data not shown) or were from Wilson et al. (1972). The second line shows the EIII fragment used for DNA sequence analysis. Positions of endonuclease cleavage sites and the extent of the e gene (heavy bar) are drawn to scale. The third line shows the portion of the EIII fragment whose nucleotide sequence was determined (Fig. 3). Asterisks indicate runs of 5 A nucleotides identified as hotspots of frameshift mutations exemplified by eJD11, eJ382 and eK30, the last 2 of which were sequenced (see the text). Brackets indicate the extents of 2 deletions (eG59 and eG348) sequenced, and inverted triangles indicate the positions of 2 small insertions (eJ320 and eJ440) sequenced. Arrows below the line indicate the extents and directions of sequences determined from DNA fragments labeled at the 5'-ends of the indicated endonuclease cleavage sites. Broken lines indicate extents of the deletions (eG59 and eG348) sequenced. The complete nucleotide sequence is shown in Fig. 3. kb, 10³ bases.

(b) Cloning wild-type and mutant lysozyme genes

To identify a fragment of T4 DNA containing the lysozyme gene, DNAs from phages with e+ and some of the e deletions shown in Figure 1 were digested with endonuclease HindIII. Electrophoresis of these digests (Fig. 2) revealed a fragment, about 3200 bp long and designated EIII, that was present in e+ digests but missing from the digests of two deletions, eG19 and eG59. Thus, this fragment contains at least part of the e gene. The finding that digests of the large, non-overlapping deletions eG414 and eG326 also lack the EIII fragment suggested that the entire e gene was present on the EIII fragment (data not shown). Subsequent analysis at the end of this section verified this point.

[FIG. 2: gel photograph not reproduced.]

FIG. 2. Detection and isolation of HindIII-generated fragment EIII containing the T4 lysozyme gene. Deletions eG59 and eG19 (Fig. 1) were transferred from strain T4B, in which they were isolated, to strain T4D denA- denB- 56-, from which DNA was prepared as described in Materials and Methods. Samples of about 25 to 50 µg of DNA from the indicated e deletion mutants and 100 µg of e+ were digested with 20 to 100 units of endonuclease HindIII for 24 h at 37°C. They were then fractionated by electrophoresis (from top to bottom) in a 1.2% agarose vertical slab gel (0.6 cm × 12 cm × 25 cm) in E buffer (Sharp et al., 1973) at 1.0 V/cm for 48 h. DNA was visualized by staining with ethidium bromide. The arrow indicates the EIII fragment deduced to contain the e gene (see the text). The EIII fragment DNA was eluted from the gel as described in Materials and Methods. Lane a, eG19; lane b, eG59; lane c, e+.

To facilitate preparation of DNA for nucleotide sequence analysis, the EIII fragment with the e+ gene was purified as described in the legend to Figure 2 and cloned into the λ590 vector (Murray et al., 1977). Insertion of the EIII fragment into the HindIII site in the imm434 cI gene of λ590 produced phage with clear plaque morphology, as expected (Murray et al., 1977). The λ phages with the inserted EIII fragment proved to be virulent: their plating efficiency on a λimm434 lysogen was the same as that on non-lysogens, but on the lysogens some of the λ-EIII hybrids made small plaques, while others made large plaques. These results suggest that one or more T4 promoters are present on the EIII fragment and that the strength of virulence depends upon the orientation of the fragment. Opposite orientation of the EIII fragment in strains with different plaque sizes was confirmed by endonuclease EcoRI cleavage analysis (data not shown). In subsequent shotgun experiments, in which unfractionated HindIII digests of T4 DNA were ligated to λ590, 10% of the hybrid phage (i.e. those with clear plaque morphology) were virulent. With the assumption that fragments are inserted at random during cloning, about 10 of the 100 or so HindIII fragments of T4 appear to contain promoters detectable by this test.

T4 DNA from each of the e mutants was digested with HindIII and "shotgunned" into λ590. λ phage containing the EIII fragments with the mutant e gene were identified by their ability to form e+ recombinants with different tester T4B e mutants as described in Materials and Methods. About 10% of the virulent clones contained the e gene, as expected from the frequency of apparent promoter-containing HindIII fragments of T4 DNA noted above. Marker rescue with appropriate T4 e mutants confirmed that the clones contained the expected e mutations.

That the EIII fragment contains the entire e+ gene was confirmed by showing that T4 eG19, with a deletion of the entire e gene (Fig. 1; Owen, 1971), recombines with the λ-EIII e+ hybrid to produce T4 e+.
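The arithmetic behind the two 10% figures in this section can be written out as a short sketch. It assumes, as the text does, that fragments are inserted at random, and further assumes that every promoter-bearing fragment is equally likely to be recovered as a virulent clone; the function names are illustrative.

```python
def promoter_fragments(virulent_fraction: float, n_fragments: int) -> float:
    """Expected number of fragments carrying a detectable promoter."""
    return virulent_fraction * n_fragments


def share_carrying_target(n_promoter_fragments: float) -> float:
    """Expected share of virulent clones carrying one particular fragment,
    assuming each promoter-bearing fragment is cloned equally often."""
    return 1.0 / n_promoter_fragments


# 10% of hybrid phage were virulent; T4 has about 100 HindIII fragments
n_promoter = promoter_fragments(0.10, 100)
print(n_promoter)                         # about 10 fragments
print(share_carrying_target(n_promoter))  # about 0.1 of virulent clones
```

On these assumptions the EIII fragment should account for roughly one virulent clone in ten, in line with the observed frequency of e-containing clones.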

(c) Nucleotide sequence of the lysozyme gene

The following observations allowed the location of the e gene on the EIII fragment. An endonuclease cleavage map of the EIII fragment (Fig. 1) was constructed by analysis of mixed digests and partial digests (data not shown). Endonuclease EcoRI cleaved the EIII fragment only once, producing fragments about 2800 bp and 400 bp long. In an EcoRI digest of the EIII fragment with deletion eG348 (internal to the e gene; see Fig. 1) the 400 bp fragment was shorter by about 50 to 100 bp. These results show that the e gene is near the EcoRI end of the EIII fragment. Since the entire e gene is on the EIII fragment and the length of lysozyme predicts the e gene to be 492 bp long (Inouye et al., 1970b), the EcoRI site must be within the e gene. Inspection of the amino acid sequence of lysozyme (Inouye et al., 1970b) revealed that only one site for EcoRI was possible within e, near the middle of the gene at a site corresponding to amino acids 77 and 78. A portion of the EIII fragment surrounding the EcoRI site was therefore analyzed further.

The nucleotide sequence of 750 bp at the e end of the EIII fragment was determined using the methods of Maxam & Gilbert (1980). The labelled DNA fragments analyzed are shown in Figure 1. All regions were sequenced two to six times from unambiguous regions of the gels. All cleavage sites used for labelling were sequenced from some other site, except for the EcoRI site, which was predicted to be unique by the amino acid sequence. With this exception, the DNA sequence was determined independently of the amino acid sequence. The amino acid sequence predicted by the final DNA sequence (Fig. 3) is identical to the amino acid sequence determined by Inouye et al. (1970b). The final sequence extends 134 bp to the left (beyond the 5') end of the lysozyme gene and 124 bp to the right.

The following interesting features can be noted in the DNA sequence.
As expected, close to the initiating AUG codon in the messenger RNA there is a

SEQUENCE OF T4 LYSOZYME GENE $'GATTCCAGAGATGG I -~ ~.!/P_l /KMfZ

-~o

GATGGGTCCTTCACTTTACCG

-80

]

1-70

235

ACGCTTTTGCTCTTATTCCTCGTACTCAGTGGCAAThTGT -~o -~oo

-90

AATAATGAACAACCTCTTTTAATTTTATAAATACCTTCT

-60

-50

-~0

-30

AOaII

Met Asn Ile Phe Glu Met Leu Arc] Ile Asp Glu ATAAATACTTAGGAGGTATTATGAATATAIToTTTTGGAAAAATIGT2TACGTATAGATGAAGGTCTTA

Gly

Leu

Arc] Leu Lys lie Tyr Ly$ Asp Thr Glu Gly Tyr Tyr Thr Ile Gly ile Gly His Leu G k C T T A A A A T C T A T A A A G A C h C h G A AG G C T A T T A C A C T A T T G G C A T C G G T C ~ T T T G 50 60 70 60 90

Leu

-20

- 10

1

J320 ~apllest $on

Thr

Lys Ser Pro Ser Leu Asn AIo AIo CAAAAAGTCCATCACTTAATGCTGCTAAATCTG 1to

JD1] (+A)

30

I1¢0

G59 d e l e ¢ i o n

Lys

Ser

I endpoin¢

~3o

Glu Leu Asp Lys AIo Ile Gly AATTAGATAAAGCTATTGGGCGTAATT ~.o 15o

Cys Asn Gly Val Iie Thr Lys Asp Glu AIo Glu Lys Leu Phe G C A A T G G T G T A A T T A C A A A AG h T G A G G C T G A A A A A C T C T T T 170 180 190 200 J382 (+h) hlo Vol Arc] Gly Ile Led CTIGTTCG CGG AATTCTG emli--'-~ po c 230 J Ec?OR! VOI Arq Arc] Cys AIo Leu

~0

Arg

C T T A 100

Asn 16o

Asn Gin Asp Vol Asp AIo A A T C AG G h T G T I ' G A T G C T G 210 I ->20

G59 deletion

Arc] hsn AIo Lys Leu Lys Pro Vol Tyr Asp Ser AG A A A T G C T A h ATT A hA A CCGG6TITT ATG ATTCTC 2qO 250 I ] 270 GJ48 deletion endpoint H/nfI Tie Ash Met Vol Phe Gin Met Gly Glu Thr Gly

Asp Ala T TG A T G C G G ~80

Leu

VoI

AIo

Gly

TTCGTCGCTGTGCATTGATTAATATGGTTTTCChAATGGGAGAAhCCGGTGTGGCAGGAT 290

300

310

320

I

330 I

G348

Phe

Thr

TTACTA

Asn

Ser

Leu

ACTCTTTA 350

A r c ] Met

Leu

CG T A T G C T T C 360

3~o

deletion endpoint

Gin Gin Lys Arc] Trp Asp Glu AIo AIo Veil A A C h A A A A C G C T G G G A T G A AG C AG C A ~ T T 370 380 390

Ash

Leu

h A CITI"AG

~00

g3a f-A)

AIo

Ly$

CTAAA

Set ArQ Trp Tyr Asn AG T AAG A T G G, GCT A T AAATT] C G ~20 J440

Arc] Thr Gly Thr Trp GAACTGGCACTTGGGACG

Asp

TGATAGTATATTCACAATTACTTG 530

Lys

Thr

Tyr

Thr

Pro

Ash

ATA A C A C CTT A A T ~30 C GT CG

Arg A

AIo Ly$ Arc] VoI C A AU~G A~A C I G A G T I C]

duplication

~70

Met

Gin

Gin

A T A T G A A A A C A T A T C A AG A A T T T .590 600

Thr

Thr

Phe qbO

H~'nfl

AIo Tyr Lys Asn Leu eH CGTATAAAAATCTATAAAGCTGTTTACTTTCTCT1G~ q80 ~ ~90 500

540 Glu Phe

lie

kI T~50 TACA O

A ATAG

ACA b50

ATTACTA

ATTkAAATATT~ 560

AAT'fG b20

510 AAAGG

A A AC

b'(O

Ile AIo Glu AIo A T T G C C G[A AG C T TI3' 610 H/elI][

FIC:. 3. Nucleotide sequence of the lysozyme (e) gene of bacteriophage T4 and part of the presumed sequence of the IPIII gene. Nucleotides are numbered from the translation initiating codon of e. Hyphens are omitted for simplicity. Above the nueleotide sequence (l to 492) is shown the deduced amino acid sequence of lysozyme, which agrees with that determined by Inouye et al. (1970b). Above the nueleotide sequence (583 to 615) is shown the deduced amino acid sequence of the beginning of IPIII, which agrees with that determined by Isobe et al. (1976) except for the third amino acid (see Results). Certain endonuclease recognition sequences are shown to aid comparison with Fig. 1, which shows the regions sequenced to produce this composite sequence. The last two nucleotides were not sequenced directly but are inferred from the sequence specificity of endonuclease HindIII (Old el at, 1975). The sites and sequence alterations of certain mutations are shown (see Results, Figs ] and 4, and Table 2). Runs of 5 A nucleotides, hotspots of frameshift mutations, are underlined twice. Brackets indicate the repeated sequences within which the endpoints of two deletions (cO59 and eG348) occur. Sequences repeated beneath the main sequence indicate the duplications (eJ320 and eJ440) present at the points of the arrows.

s e q u e n c e c o m p l e m e n t a r y to t h e 3'-end o f t h e 16 S r i b o s o m a l R N A d e t e r m i n e d b y S h i n e & D a l g a r n o (1974). T h i s s e q u e n c e o f s e v e n bases, 5' A - G - G - A - G - G - T 3', is identical to t h e p o t e n t i a l r i b o s o m e b i n d i n g site o f t h e A m R N A o f p h a g e s R 1 7 (Steitz, 1969) and MS2 (Fiers et al., 1975), a n d is a s u b s e t o f t h e r i b o s o m e - b i n d i n g site o f t h e cro m R N A o f p h a g e ~ (Steege, 1977). T h e r e l a t i v e l y l o w G + C c o n t e n t (35°/o) o f T4 D N A ( F a s m a n , 1976) is f o u n d b o t h w i t h i n t h e e gene (36%) a n d in t h e f l a n k i n g regions (35%). A s e x p e c t e d , G a n d C are less f r e q u e n t l y used (38/164

~o0

J. E. O W E N E T

236

AL.

TABLE 1 Codous used in the 7'4 lysozyme gene TTT-Phe TTC-Phe TTA-Leu TTG-Leu CTT-Leu CTC-Leu CTA-Leu CTG-Leu ATT-Ile ATC-IIe ATA-IIe ATG-Met GTr-Val GTC-Val GTA-Val GTG-Val

4 1 5 2 6 l l l 6 2 2 5 6 l l 1

TCT-Ser TCC-Ser TCA-Ser TCG-Ser CCT-Pro CCC-Pro CCA-Pro CCG-Pro ACT-Thr ACC-Thr ACA-Thr ACG-Thr GCT-AIa GCC-AIa GCA-AIa GCG-AIa

3 0 1 0 1 0 l l 4 1 5 l 8 0 5 2

TAT-Tyr 5 TAC-Tyr I TAA1 TAG0 CAT-His l CAC-His 0 CAA-GIn 4 CAG-Gln 1 AAT-Asn l0 AAC-Asn 2 AAA-Lys 13 AAG-Lys 0 GAT-Asp 8 GAC-Asp 2 GAA-GIu 7 GAG-Glu 1

TGT-Cys TGC-Cys TGATGG-Trp CGT-Arg CGC-Arg CGA-Arg CGG-Arg AGT-Ser AGC-Ser AGA-Arg AGG-Arg GGT-GIy GGC-Giy GGA-GIy GGG-GIy

1 1 0 3 4 4 l 0 2 0 4 0 4 3 3 l

cases; 23%) t h a n A and U in the third positions of codons (Table 1). Fifteen direct repeats of eight or more base-pairs occur within the sequence {Table 2) ; as noted later the endpoints of certain deletions are within some of these repeats. (d) Location of the gene for internal protein I I I Black (1974) determined t h a t the genes for internal proteins I I and I I I ( I P I I and I P I I I ) are located near the 3'-end of the e gene, in the order e - I P I I I - I P I I . G. Stormo & L. Gold (personal communication) noted t h a t the 3'-end of the nucleotide sequence in Figure 3 corresponds, with two exceptions, to the first 11 amino acids of I P I I I (Isobe et al., 1976). I n addition, just before the A T G assigned to Met, there is a potentially strong ribosome-binding site (Shine & Dalgarno, 1974; G. Stormo & L. Gold, personal communication). Amino acid 3 predicted b y the D N A sequence (Fig. 3) is Thr, while amino acid 3 in I P I I I is reported to be Val (Isobe etal., 1976). This point of the D N A sequence was determined u n a m b i g u o u s l y six times. Possibly the discrepancy is due to a difference between T4B, the source of the e gene and an u n d e t e r m i n e d a m o u n t of flanking D N A used in the present work, and T4D, used in the work of I s o b e et al. {1976). We do not know whether the I P I I I gene in our hybrids derives from T4B or from T4D. The first 11 amino acids of I P I I I are, however, identical to a sequence of 11 amino acids starting a t amino acid 6 in I P I I {Isobe et al., 1976), except a t amino acid 3 ; I P I I has T h r a t t h a t position, identical to t h a t predicted from the D N A sequence for I P I I I . Possibly a recombination e v e n t between the I P I I and I P I I I genes produced an I P I I I sequence coding for T h r in the T4 strain used here b u t a n o t h e r coding for Val in the strain used b y Isobe et al. {1976). 
T h e last nueleotide (T) of the sequence in Figure 3 should be the first nucleotide of the eodon for the 12th amino acid of I P I I I , reported b y Isobe et al. (1976) to be Thr. Only codons beginning with A code for Thr. No resolution to this problem is a p p a r e n t . This T was n o t sequenced directly; its presence was inferred from the

SEQUENCE

O F T4 L Y S O Z Y M E

237

GENE

TABLE 2 Direct repeats within and near the T4 lysozyme e gene Length of repeat (bp)

Positions t

Span between repeated units (bp)

Sequence of repeated unit

8 8 8 8 8 8 8 8 8 8 8 9 9 9 13

--23, 50 --23,489 19, 352 98, 173 119, 215 123 243 140 492 248 559 253 323 267 507 577 587 --34 --21 196 346 534 550 46, 485

72 511 333 75 96 120 352 311 70 24O 10 13 153 17 440

5' T-C-T-A-T-A-A-A3' 5' T-C-T-A-T-A-A-A3' 5' T-T-A-C-G-T-A-T3' 5' T-T-A-C-A-A-A-A3' 5' A ~ - G ~ - G - C - T 3 ' 5' T-G-C-T-A-A-A-T3' 5' A-T-A-A-A-G-C-T3' 5' A-A-T-T-A-A-A-A3' 5' A-A-A-C-C-G-G-T3' 5' T-T-C-T-C-T-T-G3' 5' A-A-A-C-A-T-A-T3' 5' T-A-T-A-A-A-T-A-C3' 5' A-A-C-T-C-T-T-T-A3' 5' A-C-A-A-T-T~A-C-T3' 5' A-A-A-A-T-C-T-A-T-A-A-A-G3'

Deletion mutations~

eG19(?) eG59

e~48

eG223 (?)

This list includes all direct repeats of 8 bp or more within the sequence shown in Fig. 3. Positions refer to the first base of the e sequence, following the numbering given in Fig. 3. :~ Deletions eG59 and eG348 were sequenced (see Results and Figs 9 and 10). Deletions eG19 and eG223 are suggested to be between the indicated repeats based upon their sizes, recombination properties, and the positions of repeats (see Discussion).

H i n d I I I site at the end o f the E I I I f r a g m e n t , since e n d o n u c l e a s e H i n d I I I is r e p o r t e d to cleave o n l y a t t h e sequence 5' A - A - G - C - T - T 3' (Old et al., 1975). S t r a i n differences, as n o t e d above, m a y a c c o u n t for the a p p a r e n t d i s c r e p a n c y . T h e location o f the P I I I gene j u s t to the r i g h t o f the e gene h a d been indicated b y the analysis o f deletions r e m o v i n g p a r t or all of the e gene (Black, 1974). Deletions eG19 a n d eG223 r e m o v e n e a r l y all o f the e gene, a n d their right e n d p o i n t s m a y be n e a r n u c l e o t i d e 490 (Fig. 3 a n d T a b l e 2) for t h e reasons outlined in the Discussion. These t w o deletions do n o t abolish I P I I I synthesis, while deletion eG342, whose left e n d p o i n t is within e a n d whose r i g h t e n d p o i n t is less t h a n 1000 bp to the right o f e (Wilson et al., 1972) does abolish I P I I I synthesis (Black, 1974). These results are consistent with t h e p l a c e m e n t s of the I P I I I gene a n d o f t h e e n d p o i n t s o f t h e deletions eG19 a n d eG223 s h o w n in F i g u r e s 1 a n d 3 a n d T a b l e 2. (e) Nucleotide sequence alterations of e mutations Sequence analyses of E I I I f r a g m e n t s bearing e m u t a t i o n s s h o w e d the following. ej382 is an insertion of an A in t h e r u n o f five A residues a t nucleotides 191 to 195 (Fig. 4). eK30 is a deletion o f a n A in the run o f five A residues a t nueleotides 368 to 372 (Fig. 5). eJ320 is t h e d u p l i c a t i o n o f t h e seven nucleotides 11 to 17, as s h o w n in Figures 6 a n d 7. eJ440 is a similar t a n d e m d u p l i c a t i o n o f the five nucleotides 413 to 417 (Fig. 8). eG59 is the deletion of 96 nucleotides (Fig. 9) a n d eG348 is the deletion of 70 nucleotides (Fig. 10) b e t w e e n direct repeats o f 8 bp (Table 2). T h e

238

J.E.

O W E N E T AL.

~ % ~ . i ~ ~,

~

~ !~!L~,~~ . ~ i ~ ~

Fro. 4.

SEQUENCE

O F T4 L Y S O Z Y M E G E N E

239

last result agrees with the sizes of the small EcoRI fragments from e + and eG348 DNA noted earlier. The possibility that additional e deletions occur between repeated sequences shown in Figure 3 and Table 2 is discussed later. 4. Discussion (a) Hotspots of fl'ameshifl mutations Streisinger, (Emrich) Owen and their colleagues (Streisinger etal., 1966) identified sites in the lysozyme (e) gene at which frameshift mutations arise and revert at particularly high frequency. These mutations are of two types: (1) those that are induced by proflavine at high frequency and revert spontaneously at high frequency (high-reverting or hr mutations) and (2) those t h a t arise spontaneously at rather high frequency and revert by proflavine treatment, but not spontaneously, at high frequency (low reverting or lr mutations). (See Table 3.) Streisinger et al. (1966), Owen (1971), Okada et al. (1972), and G. Streisinger & J. Owen (unpublished observations) determined t h a t two of these hotspots are runs of five consecutive A nucleotides in the wild-type gene, at nucleotides 102 to 106 and 368 to 372 in Figure 3. They further showed that hr mutations (eJD11 and eJ335) at each of these sites adds an A to the run of five A nucleotides while one lr mutation (eJ42) at the former site deletes an A. The results reported here show t h a t the lr mutation eK30 deletes an A from the latter site mentioned above, as predicted from genetic data (Owen, 1971). In addition the results here confirm that a third hotspot is a run of five A nucleotides (nucleotides 191 to 195) and that the hr mutation ej382 adds an A at that site, as predicted. This site is also a hotspot for lr mutations (J. Owen, unpublished data), but these mutations have not been studied further. A fourth run of five A residues at nucleotides 484 to 488 may be the site of another hotspot for lr mutations. 
Of 102 independent spontaneous e mutations, 15 map at or near this fourth site (to the right of amber mutation HS92 at nucleotide 473 (Inouye et al., 1970a)), eight are identical to eJ42, eight are identical to eK30, and 12 map at or near the eJ382 site (J. Owen, unpublished data). Mutations at this fourth site are extremely "leaky", presumably because they are located within the region coding for the last two amino acids. They do not revert with proflavine and have not been demonstrated to be frameshifts. No hr mutations have been found at this site (Owen, 1971). In summary, three and possibly all four runs of five A nucleotides in the e gene appear to be hotspots for frameshifts. (The longest runs of other nucleotides in the e gene are 4 T, 3 G and 2 C.)

FIG. 4. Autoradiograph of a DNA sequencing gel showing the eJ382 mutation. The EIII fragment DNA containing the eJ382 mutation was cleaved with endonuclease EcoRI, labeled at 5'-ends with 32P, and cleaved with endonuclease HindII. A fragment about 1050 bp long was isolated by electrophoresis through a 4% polyacrylamide gel, purified, and sequenced by the methods of Maxam & Gilbert (1980). The digests (A, G, C+T, and C-specific cleavages from left to right) were electrophoresed through a 20% polyacrylamide gel until the xylene cyanol FF had migrated 9.5 cm. The bracket indicates the run of 6 T nucleotides present in the mutant in place of the 5 T nucleotides present in the wild-type at nucleotides 191 to 195 (Fig. 3); the wild-type sequence determination is shown in the left part of Fig. 9(b).
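The identification of homopolymer runs discussed in section (a) can be illustrated with a short sketch. It is not the procedure used in this work: the function names are hypothetical, and the toy sequence below is invented for illustration and is not the e gene sequence of Fig. 3.

```python
import re

# Matches maximal runs of a single nucleotide.
_RUN = re.compile(r"A+|C+|G+|T+")


def homopolymer_runs(seq, min_len=5):
    """Return (base, start, end) for each run of one base at least min_len long.

    Positions are 1-based and inclusive, in the style of "nucleotides 102 to 106".
    """
    runs = []
    for m in _RUN.finditer(seq.upper()):
        if len(m.group()) >= min_len:
            runs.append((m.group()[0], m.start() + 1, m.end()))
    return runs


def longest_runs(seq):
    """Return the length of the longest run of each base in seq."""
    best = {base: 0 for base in "ACGT"}
    for m in _RUN.finditer(seq.upper()):
        base = m.group()[0]
        best[base] = max(best[base], len(m.group()))
    return best


# Hypothetical toy sequence (an assumption, NOT the T4 e gene).
toy = "GCTAAAAATGCCGTTTTACGAAAAAC"

print(homopolymer_runs(toy))
print(longest_runs(toy))
```

Applied to the sequence of Fig. 3, a scan of this kind would be expected to report the runs of five A nucleotides named in the text; that output is not reproduced here because the full sequence is not part of this section.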


                                   Duplication
eJ320 sequence       A A T A T A T [T T G A A A T] [T T G A A A T] G T T A C G T

                     Reversion of eJ320

Wild-type sequence   A T A T [T T G A A A T] G T T A

FIG. 7. Nucleotide sequence of the eJ320 duplication mutation. eJ320 is a duplication of the 7 bp indicated. Reversion occurs by deletion of one of the duplicate sequences to regenerate the wild-type sequence. Mutation eJ440 is a similar duplication of 5 bp (see text). Sequence hyphens have been omitted for clarity.

(b) Duplication mutations reverting at high frequency

The proflavine-induced mutations eJ320 and eJ440 revert at relatively high frequency, about 10⁻⁶, compared to 10⁻⁵ for the hr mutations described in section (a), above (see Table 3). High-reverting, proflavine-induced mutations frequently arise at or near the sites of eJ320 and eJ440, which thus appeared to be hotspots of mutation similar to those of the frameshift mutations just discussed. Genetic and mutant protein analyses failed to confirm this view, however. Pseudo-wild-type revertants of both eJ320 and eJ440 produced lysozyme with additional amino acids from which the base changes in the mutations could not be deduced. Neither the original mutation nor the second compensating mutation could be isolated from any pseudo-wild-type derived from these two mutations (J. Owen, unpublished data). The results reported in Figures 5 to 7 show that eJ320 results from a duplication of 7 bp (5' T-T-G-A-A-A-T 3') at nucleotides 11 to 17, and eJ440 results from a duplication of 5 bp (5' G-G-T-A-T 3') at nucleotides 413 to 417. Calos et al. (1978) reported a duplication of 88 bp in the E. coli lacI gene which reverted at high frequency.
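The relationship between the eJ320 duplication and its reversion can be made concrete with a small sketch. The two sequence strings are the local wild-type and mutant contexts transcribed from Fig. 7 (trimmed to the same flanks); the function names are hypothetical and the code is an illustration only, not an analysis performed in this work.

```python
def tandem_duplication_placements(wild_type, mutant):
    """Return every (offset, unit) such that repeating wild_type[offset:offset+n]
    in tandem yields mutant. Offsets are 0-based within wild_type."""
    extra = len(mutant) - len(wild_type)
    if extra <= 0:
        return []
    placements = []
    for i in range(len(wild_type) - extra + 1):
        unit = wild_type[i:i + extra]
        if wild_type[:i + extra] + unit + wild_type[i + extra:] == mutant:
            placements.append((i, unit))
    return placements


def revert(mutant, offset, unit):
    """Delete the second tandem copy of unit to regenerate the parental sequence."""
    start = offset + len(unit)
    if mutant[start:start + len(unit)] != unit:
        raise ValueError("no second copy of the unit at the expected position")
    return mutant[:start] + mutant[start + len(unit):]


# Local sequence contexts transcribed from Fig. 7 (hyphens omitted).
WILD_TYPE = "ATATTTGAAATGTTA"
EJ320 = "ATATTTGAAATTTGAAATGTTA"

placements = tandem_duplication_placements(WILD_TYPE, EJ320)
for offset, unit in placements:
    print(offset, unit, "length mod 3 =", len(unit) % 3)

print(revert(EJ320, 4, "TTGAAAT") == WILD_TYPE)
```

Because the duplicated unit is flanked by matching bases, the sketch reports more than one equivalent placement of the 7 bp unit; the placement at offset 4 (TTGAAAT) corresponds to the description in the text. A 7 bp or 5 bp insertion is not a multiple of three, which is consistent with these mutations behaving as frameshifts.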

FIG. 5. Autoradiograph of DNA sequencing gels showing the eK30 mutation and the corresponding wild-type sequence. The EIII fragment DNA was cleaved with HindII, labeled at the 5'-ends, and cleaved with HaeIII. Fragments about 800 bp long were purified from a 6% polyacrylamide gel. Products of chemical cleavage (A, G, C+T, and C left to right in (a) and (b)) were electrophoresed through 20% polyacrylamide gels until the xylene cyanol FF had migrated 9 cm. The brackets indicate (a) the run of 5 T nucleotides present in wild-type DNA complementary to nucleotides 368 to 372 (Fig. 3), and (b) the run of 4 T nucleotides present in eK30 mutant DNA.

FIG. 6. Autoradiograph of DNA sequencing gels showing the eJ320 mutation and the corresponding wild-type sequence. Wild-type EIII fragment DNA was cleaved with AvaII, labeled at the 5'-ends, and cleaved with HindII; a fragment about 460 bp long was purified from a 4% polyacrylamide gel. Mutant eJ320 EIII fragment DNA was cleaved with HinfI, labeled at the 5'-ends, and cleaved with HpaII; a fragment about 390 bp long was purified from an 8% gel. Products of chemical cleavage (A, G, C+T, C left to right in (a) and (b)) were electrophoresed through 20% polyacrylamide gels until the xylene cyanol FF had migrated 34 cm (wild-type) or 55 cm (eJ320). In these particular experiments, the "C+T cleavage" cleaved lightly at T and G nucleotides. The brackets indicate the sequence 5' T-T-G-A-A-A-T 3' present (a) once in wild-type DNA at nucleotides 11 to 17 (Fig. 3) or (b) twice in mutant eJ320 DNA.


FIG. 8. Autoradiograph of DNA sequencing gels showing the eJ440 mutation and the corresponding wild-type sequence. The EIII fragment DNA was cleaved with HinfI, labeled at the 5'-ends, and cleaved with HpaII. Fragments about 120 bp long were purified from 8% polyacrylamide gels. Products of chemical cleavage (A, G, C+T, C left to right in (a) and (b)) were electrophoresed through 8% polyacrylamide gels until the xylene cyanol FF had migrated 17 cm (wild-type) or 14 cm (eJ440). The brackets indicate the sequence 5' A-T-A-C-C 3' present (a) once in wild-type DNA or (b) twice in mutant eJ440 DNA; the complement of this sequence is at nucleotides 413 to 417 in wild-type DNA (Fig. 3).


FIG. 9. Autoradiograph of DNA sequencing gels showing the eG59 mutation and the corresponding wild-type sequence. The EIII fragment was cleaved with EcoRI, labeled at the 5'-ends, and cleaved with HindII. Fragments about 1050 bp long were purified from 5% polyacrylamide gels. Products of chemical cleavage (A, G, C+T, C left to right in each set of 4 lanes) were electrophoresed through 20% polyacrylamide gels until the xylene cyanol FF had migrated 11 cm ((a) and (c)) or through an 8% polyacrylamide gel until the xylene cyanol FF had migrated 20 cm (left part of (b)) or 43 cm (right part of (b)). The brackets indicate the sequence 5' A-G-C-A-G-C-A-T 3' within which the endpoints of the eG59 deletion occur. (a) The wild-type sequence around the right end of the deletion endpoint. The right part of (b) shows the wild-type sequence around the left end of the deletion endpoint. (The complementary sequences are at nucleotides 215 to 222 and nucleotides 119 to 126, respectively, in Fig. 3.) (c) The eG59 mutant sequence; the bracket indicates the single copy of the 8 bp sequence (nucleotides 215 to 222 or 119 to 126 in Fig. 3) present in the mutant. All nucleotides between this repeated sequence are deleted in the eG59 mutant. (The bracket in the left part of (b) indicates the run of 5 T nucleotides present in wild-type DNA complementary to nucleotides 191 to 195 (Fig. 3) at which the eJ382 mutation occurred; see Fig. 4.)


FIG. 10. Autoradiograph of DNA sequencing gels showing the eG348 mutation and the corresponding wild-type sequence. The EIII fragment was cleaved with EcoRI, labeled at the 5'-ends, and cleaved with HindII. Fragments about 160 bp (wild-type) or about 90 bp (eG348 mutant) were purified from 5% polyacrylamide gels. Products of chemical cleavage (A, G, C+T, C left to right in each set of 4 lanes) were electrophoresed through 12% polyacrylamide gels until the xylene cyanol FF had migrated (a) 19 cm or (b) 43 cm, or through 20% polyacrylamide gels until the xylene cyanol FF had migrated 32 cm (left part of (c)) or 10 cm (right part of (c)). The brackets indicate the sequence 5' A-A-A-C-C-G-G-T 3' within which the endpoints of the eG348 deletion occur. (a) The wild-type sequence around the left end of the deletion endpoint (nucleotides 253 to 260 in Fig. 3). (b) The wild-type sequence around the right end of the deletion endpoint (nucleotides 323 to 330 in Fig. 3). (c) The eG348 mutant sequence; the brackets indicate the single copy of the 8 bp sequence (nucleotides 253 to 260 or 323 to 330 in Fig. 3) present in the mutant. All nucleotides between this repeated sequence are deleted in the eG348 mutant.

TABLE 3
Nucleotide sequence and mutant frequencies of some mutations within the T4 lysozyme e gene

                      Nucleotide sequence§
Site‡     Allele†     Wild-type    Mutant
102-106   eJD11       AAAAA        AAAAAA
191-195   eJ382       AAAAA        AAAAAA
368-372   eJ335       AAAAA        AAAAAA
102-106   eJ42        AAAAA        AAAA
368-372   eK30        AAAAA        AAAA
11-17     eJ320       TTGAAAT      TTGAAATTTGAAAT
413-417   eJ440       GGTAT        GGTATGGTAT

Frequency of mutants × 10⁷‖
Spontaneous: Formation, Reversion. Proflavine-induced: Formation, Reversion.

--
1.3 -3.5 4-6 5.7 -20
284 78 104 0.057 <0.023 9.5 82-164
200 82-240 195 23.2 22.6 82-328
11
545 170 243 91 8.3 44

Source¶
(2, 3)
(1) (2) (1) (1) (1) (2, 3)

† Prefixes for allele designations indicate the origins of the mutations: J, proflavine-induced; JD, derived from pseudowild revertants of J mutants; K, spontaneous.
‡ Numbering refers to that in Fig. 3.
§ Nucleotide sequences are from this work (see Results) except for those for eJD11 (Okada et al., 1972), eJ335 (J. Owen, unpublished data), and eJ42 (Terzaghi et al., 1966). The hyphens have been omitted for clarity.
‖ Frequencies of mutants are given as the number of mutants per total phage produced in a single cycle of infection. Data are from the sources cited.
¶ (1) G. Streisinger & J. Owen (unpublished data); (2) Owen (1971); (3) J. Owen (unpublished data).
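The sequence changes listed in Table 3 can be summarized by their net length change. The sketch below is an illustration only: the site, allele and sequence entries are transcribed from Table 3, while the function name and the output layout are hypothetical.

```python
# Rows transcribed from Table 3: (site, allele, wild-type, mutant).
TABLE3 = [
    ("102-106", "eJD11", "AAAAA", "AAAAAA"),
    ("191-195", "eJ382", "AAAAA", "AAAAAA"),
    ("368-372", "eJ335", "AAAAA", "AAAAAA"),
    ("102-106", "eJ42", "AAAAA", "AAAA"),
    ("368-372", "eK30", "AAAAA", "AAAA"),
    ("11-17", "eJ320", "TTGAAAT", "TTGAAATTTGAAAT"),
    ("413-417", "eJ440", "GGTAT", "GGTATGGTAT"),
]


def classify(wild_type, mutant):
    """Return (net length change in bp, kind of change, shifts reading frame?)."""
    delta = len(mutant) - len(wild_type)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    else:
        kind = "no length change"
    # A change that is not a multiple of 3 bp shifts the reading frame.
    return delta, kind, delta % 3 != 0


for site, allele, wt, mut in TABLE3:
    delta, kind, shifts = classify(wt, mut)
    print(f"{allele:6s} {site:8s} {delta:+d} bp  {kind:9s} frameshift={shifts}")
```

Every entry changes the length by a number of base-pairs that is not a multiple of three, in line with the description of these mutations as frameshifts.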


At present the mechanism by which these duplication mutations arise is not understood. The sequence immediately preceding the duplicated bases at the first site is 5' T-A-T-A-T 3' (nucleotides 6 to 10) and that at the second site is similar: 5' T-A-G-A-T 3' (nucleotides 408 to 412). A sequence similar to these, however, does not precede the duplication reported by Calos et al. (1978).

(c) Deletions formed between direct repeats

The endpoints of two internal deletions, eG59 and eG348, are at repeated sequences within the wild-type gene (Table 2). In each case an 8 bp sequence is present twice in the same orientation in wild-type but only once in the mutants, and all nucleotides between the repeats are deleted. Two other deletions, eG19 and eG223, have endpoints within or close to the e gene (see Fig. 1 and Results, section (d)). Although the extents of these deletions have not been determined by nucleotide sequence analysis, their endpoints may be at repeats listed in Table 2, for the following reasons. The sizes of eG19 and eG223 were determined to be 500 and 800 bp by electron microscopy of heteroduplex DNAs (Wilson et al., 1972). eG19 deletes all tested markers in the e gene, while eG223 deletes all markers to the right of eJ320 (Owen, 1971). Finally, these two deletions have endpoints within the limits of the EIII fragment as determined by marker rescue with λEIII hybrids (data not shown). All of these properties are consistent with assignments of the endpoints for deletions eG19 and eG223 shown in Table 2.

Deletions in other genes occur between direct repeats. Farabaugh et al. (1978) showed that four different mutations within the E. coli lacI gene delete 12 to 115 bases plus one copy of repeated sequences 5 or 8 bp long. Pribnow et al. (1981) have shown that two, and possibly three, deletions in the rII locus of T4 delete about 230 to 260 bp plus one copy of repeated sequences 7 to 11 bp long. Several of the lacI and rII deletions have been isolated more than once. It appears that deletions are frequently formed between direct repeats. Efstratiadis et al. (1980) have proposed a mechanism for the formation of long deletions between short repeats, based upon the mechanism proposed by Streisinger et al. (1966) for the formation of short deletions within reiterated sequences. The data reported here, as well as those of Farabaugh et al. (1978) and Pribnow et al. (1981), are consistent with the mechanism proposed by Efstratiadis et al. (1980). (See also Albertini et al., 1982.)
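The outcome of a deletion between direct repeats, as described in section (c), can be sketched as follows. The repeat coordinates for eG59 and eG348 are those given in the legends to Figs 9 and 10; the function names, the toy sequence and its 8 bp repeat are hypothetical and serve only to illustrate the bookkeeping, not the molecular mechanism.

```python
def deleted_length(first_start, second_start):
    """Base-pairs removed when one repeat copy and everything between the
    two copies are lost (coordinates are the 1-based starts of each copy)."""
    return second_start - first_start


def delete_between_repeats(seq, first_start, second_start, repeat_len):
    """Return seq with the first repeat copy and the intervening bases removed,
    leaving a single copy of the repeat. Starts are 1-based."""
    a, b = first_start - 1, second_start - 1
    if seq[a:a + repeat_len] != seq[b:b + repeat_len]:
        raise ValueError("the two positions do not carry the same repeat")
    return seq[:a] + seq[b:]


# Repeat coordinates from the legends to Figs 9 and 10 (numbering of Fig. 3).
print("eG59 :", deleted_length(119, 215), "bp")
print("eG348:", deleted_length(253, 323), "bp")

# Hypothetical toy sequence with an invented 8 bp direct repeat.
REPEAT = "ACGTACCA"
toy = "GG" + REPEAT + "CCCTTT" + REPEAT + "AA"
print(delete_between_repeats(toy, 3, 17, 8))
```

On this reckoning eG348 removes 70 bp, which agrees with the fragments of about 160 bp (wild-type) and about 90 bp (mutant) noted in the legend to Fig. 10.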

We thank Larry Gold, Britta Singer and George Streisinger for unpublished data and many valuable discussions; Betty Kutter and Noreen Murray for phage and unpublished data; Karen Sprague and the people in her laboratory for advice about DNA sequencing; Sue Grace and Don Ennis for technical assistance; Julie Dunn for preparation of the manuscript; and the Stanford MOLGEN Project and its staff, the National Institutes of Health SUMEX-AIM facility and Roger Staden for computer programs and help in their use. This research was supported by grants from the National Institutes of Health (GM22731-04 to G. Streisinger and AI13079 and GM26798 to G.R.S.) and from the National Science Foundation (PCM75-18739 to G.R.S.). One of us (G.R.S.) is the recipient of a Research Career Development Award from the National Institutes of Health (AI 00385).


REFERENCES

Albertini, A. M., Hofer, M., Calos, M. P. & Miller, J. H. (1982). Cell, 29, 319-328.
Berkner, K. L. & Folk, W. R. (1977). J. Biol. Chem. 252, 3176-3184.
Black, L. W. (1974). Virology, 60, 166-179.
Calos, M. P., Galas, D. & Miller, J. H. (1978). J. Mol. Biol. 126, 865-869.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980). Cell, 21, 653-668.
Farabaugh, P. J., Schmeissner, U., Hofer, M. & Miller, J. H. (1978). J. Mol. Biol. 126, 847-857.
Fasman, G. (1976). In Handbook of Biochemistry and Molecular Biology, 3rd edit., p. 277, CRC Press, Cleveland.
Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Merregaert, J., Min Jou, W., Raeymaekers, A., Volckaert, G., Ysebaert, M., Van de Kerckhove, J., Nolf, F. & Van Montagu, M. (1975). Nature (London), 256, 273-278.
Inouye, M., Akaboshi, E., Kuroda, M. & Tsugita, A. (1970a). J. Mol. Biol. 50, 71-81.
Inouye, M., Imada, M. & Tsugita, A. (1970b). J. Biol. Chem. 245, 3479-3484.
Isobe, T., Black, L. W. & Tsugita, A. (1976). Proc. Nat. Acad. Sci., U.S.A. 73, 4205-4209.
Kaplan, D. & Nierlich, D. (1975). J. Biol. Chem. 250, 2395-2397.
Kutter, E., Berg, A., Sluss, R., Jensen, L. & Bradley, D. (1975). J. Mol. Biol. 99, 591-607.
Maxam, A. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Murray, N. E., Brammar, W. J. & Murray, K. (1977). Mol. Gen. Genet. 150, 53-61.
Okada, Y., Terzaghi, E., Streisinger, G., Emrich, J., Inouye, M. & Tsugita, A. (1966). Proc. Nat. Acad. Sci., U.S.A. 56, 1692-1698.
Okada, Y., Streisinger, G., (Emrich) Owen, J., Newton, J., Tsugita, A. & Inouye, M. (1972). Nature (London), 236, 338-341.
Old, R., Murray, K. & Roizes, G. (1975). J. Mol. Biol. 92, 331-339.
Owen, J. E. (1971). Ph.D. thesis, University of Oregon.
Pribnow, D., Sigurdson, D. C., Gold, L., Singer, B. S., Napoli, C., Brosius, J., Dull, T. J. & Noller, H. F. (1981). J. Mol. Biol. 149, 337-376.
Remington, S. J., Anderson, W. F., Owen, J., Ten Eyck, L. F., Grainger, C. T. & Matthews, B. W. (1978). J. Mol. Biol. 118, 81-98.
Ripley, L. S. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 4128-4132.
Sadowski, P. D. & Vetter, D. (1973). Virology, 54, 544-546.
Sadowski, P. D., Warner, H. R., Hercules, K., Munro, J., Mendelsohn, S. & Wiberg, J. S. (1971). J. Biol. Chem. 246, 3431-3433.
Sharp, P. A., Sugden, B. & Sambrook, J. (1973). Biochemistry, 12, 3055-3063.
Shine, J. & Dalgarno, L. (1974). Proc. Nat. Acad. Sci., U.S.A. 71, 1342-1346.
Sprague, K. U., Faulds, D. H. & Smith, G. R. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 6182-6186.
Staden, R. (1979). Nucl. Acids Res. 6, 2601-2610.
Steege, D. (1977). J. Mol. Biol. 114, 559-568.
Steitz, J. A. (1969). Nature (London), 224, 957-964.
Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E. & Inouye, M. (1966). Cold Spring Harbor Symp. Quant. Biol. 31, 77-84.
Terzaghi, E., Okada, Y., Streisinger, G., Emrich, J., Inouye, M. & Tsugita, A. (1966). Proc. Nat. Acad. Sci., U.S.A. 56, 500-507.
Velten, J., Fukada, K. & Abelson, J. (1976). Gene, 1, 93-106.
Wiberg, J. (1967). J. Biol. Chem. 242, 5824-5829.
Wilson, J., Kim, J. & Abelson, J. (1972). J. Mol. Biol. 71, 547-556.

Edited by M. Gottesman