The ovalbumin gene family



J. Mol. Biol. (1982) 156, 1-19

The Ovalbumin Gene Family
The 5’ End Region of the X and Y Genes

ROLAND HEILIG, RHEINHOLD MURASKOWSKY AND JEAN-LOUIS MANDEL

Laboratoire de Génétique Moléculaire des Eucaryotes du CNRS, Unité 184 de Biologie Moléculaire et de Génie Génétique de l’INSERM, Institut de Chimie Biologique, Faculté de Médecine, 11, rue Humann, Strasbourg, France

(Received 18 September 1981)

A cluster of three genes, the X, Y and ovalbumin genes, which are expressed in the oviduct under steroid hormone control, composes the ovalbumin gene family in the chicken. We have now exactly mapped the 5’ end of X and Y precursor RNAs on the corresponding gene sequences and have obtained extensive sequence information on the regions located upstream from the transcription initiation sites. Conservation of the “TATA box” sequences and of the leader exon sequences in all three genes indicates that these elements were included in the gene unit that was duplicated to give rise to this gene family. However, contrary to the ovalbumin and Y leader exons, the corresponding exon in the X gene contains an initiation codon, in phase with the rest of the X protein-coding sequence. Sequence homology between the three genes does not extend more than 80 nucleotides upstream from the transcription initiation sites. A peculiar 306-nucleotide long T+C-rich sequence element has been found in the Y gene 5’-flanking region which has no counterpart in the two other genes. An 11 base-pair repeat of the X gene TATA box sequence, located 760 base-pairs upstream from the X RNA start site, appears to function as a weak promoter in the oviduct of laying hens. A computer search for sequence homologies in the 5’-flanking sequences of six chicken genes inducible in the oviduct by steroid hormones did not reveal any striking similarities which might have corresponded to control regions.

1. Introduction

A cluster of three genes, the X, Y and ovalbumin genes, composes the ovalbumin gene family in the chicken. These genes arose by duplication events from a common ancestor, as demonstrated by the great similarity in their exon-intron organization and by the sequence homologies found between corresponding exons (Royal et al., 1979; Heilig et al., 1980). These three protein-coding genes are expressed in chick oviduct under similar, but non-identical, steroid hormone control. In the oviduct of laying hens, ovalbumin messenger RNA is 50 times more abundant than X and Y RNAs (LeMeur et al., 1981). In oviducts of chicks withdrawn from oestrogen stimulation, further treatment with either oestradiol, progesterone, or dexamethasone results in a dramatic increase in ovalbumin mRNA concentration


(McKnight & Palmiter, 1979; Hager et al., 1980). Oestradiol stimulation also induces accumulation of X and Y mRNAs (Royal et al., 1979; Colbert et al., 1980; LeMeur et al., 1981). However, dexamethasone treatment appears to have little or no effect on X and Y mRNA accumulation, and progesterone stimulation can induce Y but not X mRNA accumulation. These differences are due, at least in part, to differences in transcription rates (LeMeur et al., 1981). As a necessary step towards the understanding of steroid hormone control of transcription, we have undertaken a detailed structural study of the DNA region involved in initiation of RNA synthesis in the X and Y genes. We have accurately mapped the 5’ end of their transcription unit, determined the structure of their first exon (the “leader” exon), and we have obtained extensive sequence information on the regions located upstream from the transcription initiation sites of the X, Y and ovalbumin genes.

2. Materials and Methods

(a) Enzymes

The various enzymes used were obtained from Biolabs or B.R.L. (restriction nucleases), Miles (S1 nuclease), Boehringer (DNA polymerase I, polynucleotide kinase). Reverse transcriptase from avian myeloblastosis virus was a gift from Dr J. Beard (Viral Cancer Program, National Cancer Institute).

Conditions for 5’ end-labelling with [γ-32P]ATP and polynucleotide kinase were those described by Maxam & Gilbert (1980). To separate the labelled ends, fragments were recut with an appropriate restriction nuclease and/or submitted to electrophoretic strand separation. Conditions for strand separation were those described by Maxam & Gilbert (1980), except that the DNA sample to be electrophoresed contained 50% (v/v) dimethyl sulfoxide. Fragments were isolated from the gel as previously described (Heilig et al., 1980); however, the DEAE-cellulose step was omitted for single-stranded fragments. In addition to the 4 standard sequencing reactions we generally performed the alternate A > C reaction (Maxam & Gilbert, 1980). When a totally unambiguous sequence was obtained for one strand, we did not systematically sequence the opposite strand. All restriction sites used for 5’ end-labelling were overlapped in a different sequence.

(c) S1 nuclease mapping

A total of 150 to 200 ng of a single-stranded restriction fragment (400 to 600 nucleotides long), 32P-labelled at its 5’ end, was mixed with 150 µg of poly(A)+ RNA from oviducts of laying hens and heated to 85°C for 10 min. Hybridization was then performed in 10 mM-PIPES (pH 6.5), 400 mM-NaCl at 68°C for 6 h (under these conditions the DNA probe is in at least a 10-fold excess over the estimated amount of precursor RNA). Digestion with S1 nuclease was as described previously (Heilig et al., 1980).

(d) Reverse transcription

The 5’ end-labelled single-stranded primer was hybridized to purified X RNA at 68°C for 1 h as described for S1 nuclease mapping (see above). After precipitation with ethanol, reverse transcription was performed for 3 h at 41°C in 50 µl, with 1 mM of all 4 deoxynucleoside triphosphates, and 8 or 16 units of reverse transcriptase, in the buffer described by Benoist & Chambon (1981).
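The probe-excess statement in the S1 nuclease mapping protocol rests on converting a probe mass into a molar amount. The following is a minimal sketch of that arithmetic, assuming an average mass of about 330 g/mol per nucleotide of single-stranded DNA (a common rule of thumb, not a value given in the text); the function names are hypothetical.

```python
# Rule-of-thumb average mass of one nucleotide residue in single-stranded
# DNA (g/mol). This value is an assumption, not a number from the paper.
AVG_NT_MASS_SSDNA = 330.0


def ssdna_pmol(mass_ng: float, length_nt: int) -> float:
    """Convert a mass of single-stranded DNA (ng) into picomoles of molecules."""
    grams = mass_ng * 1e-9
    moles = grams / (length_nt * AVG_NT_MASS_SSDNA)
    return moles * 1e12


def max_target_pmol(probe_pmol: float, fold_excess: float = 10.0) -> float:
    """Largest amount of target RNA for which the probe keeps the given excess."""
    return probe_pmol / fold_excess


if __name__ == "__main__":
    # The two extremes of the stated ranges: least probe on the longest
    # fragment, most probe on the shortest fragment.
    for mass_ng, length_nt in [(150, 600), (200, 400)]:
        probe = ssdna_pmol(mass_ng, length_nt)
        print(f"{mass_ng} ng of a {length_nt}-nt probe: {probe:.2f} pmol; "
              f"10-fold excess holds up to {max_target_pmol(probe):.3f} pmol RNA")
```

Under this assumption the stated probe amounts correspond to roughly 0.8 to 1.5 pmol of fragment, so a 10-fold excess implies an estimated precursor RNA content on the order of 0.1 pmol or less in the hybridization.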

3. Results

(a) Localization of the 5’ ends of X and Y gene transcription units

The cloning of chicken DNA segments containing the X and Y genes has allowed us to elucidate the general organization of these genes (Heilig et al., 1980). Electron microscopic examination of hybrids between X and Y mRNAs and the corresponding cloned genes revealed that in both cases the first exon is very small (about 50 nucleotides), and thus resembles the 47 base-pair-long non-protein-coding leader exon of the ovalbumin gene. This analogy prompted us to call the first exon of the X and Y genes “leader”, although we could not exclude that one or both of them contain an initiation codon for the corresponding protein. Comparison of the exon-intron map, obtained by electron microscopy, with the restriction map of the cloned segments showed that the leader exons of X and Y genes are located at the 3’ end region of a 1.7 kb† and a 1.95 kb EcoRI fragment, respectively (Heilig et al., 1980). In order to locate more precisely the 5’ ends of X and Y gene transcription units and to determine if they correspond to the 5’ ends of the leader exons, we used the S1 mapping technique described by Berk & Sharp (1977) as modified by Weaver & Weissmann (1979). We used as hybridization probe DNA fragments containing the leader exon and 5’-labelled with 32P at sites known to be within intron A of X or Y genes. These fragments were hybridized to poly(A)+ RNA from laying hen oviduct, subjected to S1 nuclease digestion and the labelled protected DNA segments were analyzed by autoradiography after electrophoresis on denaturing polyacrylamide gels. Since the label is on a restriction site within intron A, it can be protected only by hybridization to polyadenylated precursor RNAs containing an intron A transcript, and this type of analysis allows one to establish the location of the sequence coding for the 5’ end of such precursors.

Using a ClaI-EcoRI fragment 5’-labelled at its EcoRI site as hybridization probe, the 5’ end of the X gene was first localized within the 1.7 kb EcoRI fragment (Fig. 1(a)) at about 340 bp upstream from the 3’ EcoRI site. This placed a previously mapped Fnu4HI site within intron A, close to the leader coding sequence. A single-stranded hybridization probe, 5’ end-labelled at this site (ClaI-Fnu4HI fragment: Fig. 1(b)), was then used for a precise determination of the 5’ end of the X gene. The S1 nuclease-resistant DNA segments were analyzed on a sequencing-type acrylamide/urea gel (Fig. 2(a), lanes 1 to 3) in parallel with Maxam-Gilbert sequencing reactions of the starting ClaI-Fnu4HI-labelled fragment. This analysis revealed four labelled bands spaced one nucleotide apart. Treatment of the hybrids with increasing concentrations of S1 nuclease resulted in a proportional increase in intensity of the two lower bands, while intensity of the uppermost band diminished correspondingly. Alignment of these bands with the sequence reactions run in parallel located the 5’ end of the X gene transcription unit within a four-nucleotide long area (starred nucleotides on the RNA coding strand: Fig. 2(a)). Using a similar approach, the 5’ end of the Y gene was first located at about

† Abbreviations used: kb, 10^3 bases or base-pairs where appropriate; bp, base-pairs.

FIG. 1. Maps of the sequenced 5’ end fragments of the X, Y and ovalbumin genes. (a) Map of the 1.7 kb EcoRI fragment which contains the X “leader” exon (L, heavy line) and the 5’ end of intron A (Heilig et al., 1980). The location (at position -760, see Fig. 3) of an exact 11 bp repeat of the X gene TATA box sequence is also indicated (TATA). (b) Location of the Fnu4HI-ClaI fragment used for S1 nuclease mapping of the 5’ end of the X gene and of the Fnu4HI-PstI fragment used to analyze transcription directed by the upstream T-A-T-A sequence (see text and Fig. 9). The asterisks denote the labelled sites and the arrows the unlabelled ends of the fragments protected against S1 nuclease. (c) Map of the 2.1 kb XbaI-HindIII fragment (Heilig et al., 1980) which contains the Y “leader” exon (L, heavy line) and the 5’ end of intron A. The dotted line corresponds to the T+C-rich repetitive sequence (see text). (d) Location of the HinfI-AccI fragment used for S1 nuclease mapping of the 5’ end of the Y gene (symbols as in (b)). (e) Map of the 1.7 kb PstI-EcoRI fragment (Gannon et al., 1979) which contains the ovalbumin “leader” exon (L, heavy line). The region previously sequenced (Benoist et al., 1980; Gannon et al., 1980) is underlined with a broken line.

600 bp upstream from the HindIII site present in intron A (see Fig. 1(c); results not shown). In a subsequent experiment, a single-stranded AccI-HinfI DNA fragment, 32P-labelled at the HinfI site within intron A, was used as hybridization probe (Fig. 1(d)). Three major bands corresponding to S1 nuclease-resistant hybrids were detected (Fig. 2(b), lanes 1 to 3) and were positioned with respect to the corresponding sequence ladder. The relative intensity of the three bands did not vary when the S1 nuclease concentration was increased by two- and fourfold (lanes 1 to 3, respectively). This observation, and the similar one made in the X gene mapping (Fig. 2(a)), suggest a microheterogeneity for the 5’ ends of both X and Y primary transcripts.

(b) Sequence of the 5’ end regions of X, Y and ovalbumin genes

The complete sequence of the 1.7 kb EcoRI fragment encompassing the X leader was determined using the Maxam & Gilbert (1980) method, and is shown in Figure 3. A 2100 bp XbaI-HindIII fragment which contains the Y leader was similarly sequenced (Fig. 4). Finally, the sequence previously available around the 5’ end of the ovalbumin gene (Benoist et al., 1980; Gannon et al., 1980) was extended by sequencing the complete 1.7 kb PstI-EcoRI fragment (Fig. 1(e)) and is shown in Figure 5. The location of the “leader” exon sequences for X and Y genes was derived both from the results of S1 mapping experiments described above, and from the homology with the corresponding region of the ovalbumin gene (see also Fig. 6).

FIG. 2. S1 mapping of the 5’ ends of X and Y gene transcription units. (a) The 1.7 kb EcoRI fragment of the X gene was cut with Fnu4HI, 5’ end-labelled and recut with ClaI (see Fig. 1(b)). The mixture of fragments was subjected to strand separation by polyacrylamide gel electrophoresis (Maxam & Gilbert, 1980). The appropriate ClaI-Fnu4HI fragment was isolated, hybridized to poly(A)+ RNA from laying hen oviduct, and S1 mapping was performed as described in Materials and Methods. Protected DNA fragments were analyzed on an 8% sequencing polyacrylamide gel in parallel with the chemical sequencing reactions done on the starting labelled fragment. Lanes 1 to 3: fragments resistant to a 2-h digestion at 25°C with 800, 1600 and 3200 units of S1 nuclease, respectively. Nucleotide positions corresponding to the 3’ end of the S1-resistant fragments are indicated by a star (bearing in mind that an S1 nuclease-resistant fragment migrates one position above the corresponding position in the sequence track). (b) The 2.1 kb XbaI-HindIII fragment of the Y gene was cut with HinfI, 5’ end-labelled and recut with AccI (Fig. 1(d)). Isolation of the single-stranded labelled probe, hybridization and S1 nuclease mapping were performed as described for (a). Lanes 1 to 3 correspond to the same digestion conditions as in (a). The arrowheads correspond to the major bands discussed in the text (the 2 intercalated minor bands appear to be an artefact that we cannot explain at present, since they occupy the space for a single nucleotide position).

FIG. 3. Complete sequence of the 1.7 kb EcoRI fragment which contains the leader exon and 5’-flanking sequences of the X gene. Position +1 corresponds to the 5’ end of the X gene transcription unit, based on S1 nuclease mapping and homology with the ovalbumin gene (see also Fig. 6). The sequence of the leader exon is boxed and T-A-T-A sequences which are discussed in the text are underlined. Locations of some imperfect palindromes are indicated by arrows. The hyphens have been omitted for clarity.

From the comparison between Y and ovalbumin gene sequences, there appears to be little ambiguity in defining the 3’ end of the Y leader, since the Y gene sequence has conserved 19 out of the 24 nucleotides around the ovalbumin splice junction (Fig. 6). This is not true, however, in the case of the X gene, since there is less homology with the ovalbumin or Y genes towards the 3’ end of the L exon and since several possible splice site sequences can be found, two of which (C-T/G-T-A and A-A/G-T-A) are located within four nucleotides in the area corresponding to the Y or ovalbumin splice site (Fig. 6). Although the second of these sequences fits better the model splice junction donor sequence derived from the comparison of many genes (purine-purine/GTA; see Breathnach & Chambon, 1981), it would place the AUG codon found in the X “leader” out of phase with the protein-coding sequence in exon 1, while phase would be respected if the C-T/G-T-A sequence defines the splice junction. This ambiguity was resolved by a direct determination of the 3’ end of the X

FIG. 4. Complete sequence of the 2.1 kb XbaI-HindIII fragment which contains the leader exon and 5’-flanking sequences of the Y gene. Position +1 is defined as for the X gene (see Fig. 3). The T+C-rich region discussed in the text is underlined by dots. The leader exon is boxed and the corresponding T-A-T-A sequence is underlined. The hyphens have been omitted for clarity.

“leader” exon. An X complementary DNA containing this area was synthesized by specific primer elongation on an X mRNA template and was then sequenced using the Maxam & Gilbert (1980) method. The primer chosen was a 99 bp Fnu4HI-HaeIII fragment, entirely contained within the exon 1 sequence, and was prepared from a digest of the 4.4 kb EcoRI fragment of the X gene (Heilig et al., 1980). The fragment was labelled with 32P at its 5’ ends and the strands were separated by gel electrophoresis. The mRNA coding strand was used for hybridization with X RNA purified by prior hybridization selection (LeMeur et al., 1981). Purification of the

FIG. 5. Complete sequence of the PstI-EcoRI fragment which contains the leader exon and 5’-flanking sequences of the ovalbumin gene. Position +1 corresponds to the major cap site as determined by Malek et al. (1981). The leader exon is boxed. Its corresponding TATA box and the additional TATA box-like sequences present in the region -870 to -700 are underlined. Locations of 2 imperfect palindromes are indicated by arrows. The hyphens have been omitted for clarity.

RNA appeared necessary since the primer shared 79% homology with the corresponding region of the ovalbumin gene, and thus might have reacted with ovalbumin mRNA, which is 50-fold more abundant than X mRNA in poly(A)+ RNA from a laying hen oviduct. After elongation with reverse transcriptase in the presence of cold deoxynucleotides, the labelled complementary DNA produced was analyzed by acrylamide gel electrophoresis. The full transcript was purified before sequencing to avoid problems due to incomplete products (the presence of specific premature stops is evident in Fig. 7, lane 2, the importance of which is much reduced if more enzyme is used during reverse transcription: see lane 3). The comparison of the complementary DNA sequence with the genomic sequences in the region of the leader exon and exon 1 demonstrated conclusively that the C-T/G-T-A sequence corresponds to the 3’ limit of the L exon in the X gene (Fig. 7), which places the

FIG. 6. Sequence comparison of the 5’ ends of ovalbumin, X and Y genes. Bases which are common in at least one gene pair are shown in capital letters. Existence of deletions or insertions has been assumed when they allow us to align functional elements (TATA box, splice junction) and when they have a large effect in maximizing homology. Introduction of additional insertions/deletions in the -25 to -1 region would slightly improve homology. The T-A-T-A sequences have been boxed, as well as the ATG codon present in the X gene leader exon. The starred bases correspond to possible RNA start sites for X and Y genes as detected by S1 mapping (Fig. 2) or reverse transcription (Fig. 7), and to the 3 cap sites found for ovalbumin mRNA by Malek et al. (1981). The first nucleotide of X and Y leader exons has been arbitrarily defined on the basis of its correspondence to the major ovalbumin cap site. The 2 formally possible splice sites in the X leader exon are underlined (see text). The hyphens have been omitted for clarity.

AUG codon in the X “leader” in phase with the protein-coding sequence in the following exons (Heilig et al., unpublished results). This experiment also gave additional support for the existence of microheterogeneity at the 5’ end of X gene transcripts (in this case X mRNA). The complete complementary DNA, when analyzed by polyacrylamide gel electrophoresis, gave rise to two major bands, and three or four minor ones above them, with a one-nucleotide spacing between two successive bands (Fig. 7, lanes 2 and 3). Comparison with the length marker (lane 1) or analysis of the sequence ladder (lanes 4 to 7) shows that the strong lower band corresponds to the A taken as first nucleotide of the X gene sequence in Figure 6.
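The phase argument used to choose between the two candidate donor sites can be sketched in a few lines. The example below is illustrative only: the sequence is a made-up toy string (not the X gene sequence), the function names are hypothetical, and the phase of exon 1 is passed in as an assumed parameter. It shows that two donor sites four nucleotides apart cannot both keep a leader AUG in phase with the same downstream exon.

```python
def nts_needed_from_next_exon(atg_index: int, donor_end: int) -> int:
    """Nucleotides the next exon must supply to complete the last leader codon.

    atg_index: 0-based index of the A of the ATG in the genomic string.
    donor_end: 0-based index of the first intron base (the G of GTA),
               i.e. the leader exon is genomic[:donor_end].
    """
    coding_in_leader = donor_end - atg_index
    return (3 - coding_in_leader % 3) % 3


def in_phase(atg_index: int, donor_end: int, next_exon_offset: int) -> bool:
    """True if the ATG reads into the next exon in its known frame.

    next_exon_offset: number of bases at the start of the next exon that
    precede its first complete codon (0, 1 or 2); assumed known.
    """
    return nts_needed_from_next_exon(atg_index, donor_end) == next_exon_offset


if __name__ == "__main__":
    # Toy sequence with an ATG and two overlapping candidate donor sites,
    # CT/GTA and AA/GTA, four nucleotides apart. Not real sequence data.
    toy = "TTCATGTCTGGCTCTGTAAGTACT"
    atg = toy.find("ATG")
    donor_ct = toy.find("CTGTA") + 2  # junction after C-T
    donor_aa = toy.find("AAGTA") + 2  # junction after A-A
    for name, donor in [("CT/GTA", donor_ct), ("AA/GTA", donor_aa)]:
        print(name, "in phase:", in_phase(atg, donor, next_exon_offset=0))
```

Because the two junctions differ by four nucleotides, they differ by one position in reading frame, so sequencing across the junction in the cDNA is enough to decide between them.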

(d) Repetitive sequences upstream from the Y gene

We have previously localized repetitive sequences within a 1.4 kb region immediately upstream from the Y gene (Heilig et al., 1980). It was of interest to see whether the characteristic T+C-rich sequence at nucleotides -460 to -148 from the initiation site of the Y gene (Fig. 4) corresponds to part, or all, of this repetitive element.
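A region such as the T+C-rich element can be located in a sequence by scanning for windows with a high pyrimidine fraction. The following is a minimal sketch of such a scan; the window size, threshold, function names and the toy sequence are all assumptions chosen for illustration and do not come from the paper.

```python
def pyrimidine_fraction(seq: str) -> float:
    """Fraction of T and C residues in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("T") + seq.count("C")) / len(seq)


def tc_rich_windows(seq: str, window: int = 50, threshold: float = 0.9) -> list[int]:
    """Return 0-based start indices of windows whose T+C fraction >= threshold."""
    starts = []
    for start in range(0, len(seq) - window + 1):
        if pyrimidine_fraction(seq[start:start + window]) >= threshold:
            starts.append(start)
    return starts


if __name__ == "__main__":
    # Toy sequence: a 60-nt pyrimidine-only stretch flanked by purine runs.
    toy = "AGAG" * 5 + "TCTTCC" * 10 + "AGAG" * 5
    hits = tc_rich_windows(toy, window=20, threshold=1.0)
    print("first and last pure T+C window starts:", hits[0], hits[-1])
```

Merging consecutive window starts gives the boundaries of the pyrimidine-rich stretch; in gene coordinates both boundaries quoted in the text are upstream of +1, so the length is simply the difference of the two positions plus one.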


Nick-translated total chicken DNA was hybridized to restriction fragments originating from the XbaI-HindIII fragment of gene Y and immobilized on DBM paper (Alwine et al., 1977). Hybridization signals were detected in several fragments (Fig. 8(b)), allowing us to map repetitive sequences within the 400 bp AccI-Fnu4HI fragment which contains the T+C-rich sequence (region II, Fig. 8(c), line c). Another repetitive element (I), with no striking base composition, is located upstream, as demonstrated by the hybridization to a 280 bp DdeI fragment and a 263 bp HinfI fragment (Fig. 8, lanes 2 and 3). (The lack of hybridization to any of the MboII fragments (lane 1) can be accounted for, in the case of the T+C-rich region (II), by the very small size of the restriction fragments. The absence of signal for the two adjacent 296 and 440 bp MboII fragments which span region I indicates that the two repetitive elements I and II are non-contiguous, and suggests that the MboII site at position -878 cleaves element I in two parts, each of them being too short to give a detectable hybridization signal.) It should be noted that the hybridization signals elicited by both regions I and II are much weaker than those obtained under the same conditions with other repetitive regions in the area of the X and Y genes (see Heilig et al., 1980). That this might be due, at least in part, to poor homology between repeats is suggested by the preferential disappearance of bands specific to the T+C-rich region when the filter shown in Figure 8 was subjected to washings of increasing stringency (results not shown). Since simple sequences which are internally repetitive might be hot spots for recombinational events (Slightom et al., 1980), either in the chicken genome or in the bacteria that harbour the cloned gene, we compared the restriction map in the vicinity of the T+C-rich sequence in DNA from three different chickens and in two independently cloned Y genes.
The Fnu4HI fragment used as a probe (Fig. 4, positions -64 to +387) is devoid of repetitive sequences and allowed detection of fragments spanning the T+C-rich region. The length of HinfI and XbaI-HindIII fragments detected by this probe (see Fig. 8) appeared identical in all DNA samples, which suggests that there has been no major sequence re-organization in this area (results not shown).
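The fragment sizes discussed above can in principle be predicted from a known sequence by locating restriction sites and measuring the distances between cuts. The sketch below illustrates this for a linear molecule using a small enzyme table (standard recognition sequences and top-strand cut offsets); the function names and the toy sequence are assumptions for illustration, and no real gene sequence is used.

```python
import re

# Recognition site and top-strand cut offset (bases from the site start).
# N stands for any base.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "HinfI": ("GANTC", 1),     # G^ANTC
    "DdeI": ("CTNAG", 1),      # C^TNAG
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """0-based top-strand cut positions for one recognition site."""
    pattern = "".join(IUPAC[base] for base in site)
    # The lookahead lets overlapping occurrences of the site be reported.
    return [m.start() + offset for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def fragment_lengths(seq: str, enzyme_names: list[str]) -> list[int]:
    """Fragment lengths (5' to 3' order) after complete digestion of a linear DNA."""
    cuts = sorted({p for name in enzyme_names for p in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with one EcoRI site and one HinfI site.
    toy = "AAAA" + "GAATTC" + "CCCCCC" + "GACTC" + "TTTT"
    print("EcoRI:", fragment_lengths(toy, ["EcoRI"]))
    print("EcoRI + HinfI:", fragment_lengths(toy, ["EcoRI", "HinfI"]))
```

Comparing such predicted sizes with the bands seen on the gel is one way to check that a sequence and a restriction map agree; partial digestion products would show up as sums of adjacent predicted fragments.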

FIG. 7. Determination of the splice junction between exons L and 1 in the X gene. A Fnu4HI-HaeIII fragment (99 bp long) entirely contained within exon 1 was isolated from the 4.4 kb EcoRI X gene fragment (Heilig et al., 1980), 5’ end-labelled and then subjected to electrophoretic strand separation. The strand complementary to X mRNA was identified by sequencing and was used as a primer for complementary DNA synthesis using reverse transcriptase. 3 pmol primer were hybridized to about 3 µg X mRNA partially purified by one cycle of hybridization selection (LeMeur et al., 1981). After elongation with reverse transcriptase the full length transcript was isolated by electrophoresis on an 8% polyacrylamide/8 M-urea gel and sequenced using the method of Maxam & Gilbert (1980). Lane 1, length marker corresponding to a 5’ end-labelled MspI digest of pBR322. Lanes 2 and 3, analysis of the total products of primer elongation in 2 independent experiments, with 8 and 16 units of reverse transcriptase, respectively. On the original autoradiogram, the major band corresponding to the complete reverse transcript is clearly seen as a doublet. Lanes 4 to 7, chemical sequencing reactions performed on the full transcript (G, G+A, A > C and T+C reactions, respectively).

b

3.

2. I

'.

I

III

Uli

Ul

I

I

I

Lc-

Transcription 3’

---i+ I

HmdUI

lntron A

w

FIN:. X. Location of repetitive sequences in the 5’.flanking region of the Y gene. The XbnI-Hind111 fragment which contains the L exon of the Y gene was digested with various restriction enzymes and the samples (0.4 pg DNA/track) were electrophoresed on a 2.50/6 agarose gel and transferred onto diazobenzyloxymethyl (DBM) paper (Alwine et al.. 1977; Wahl et ctl.. 1979). The blotted paper was then hybridized for 18 h to about 40 x lo6 cts/min of total nick-translated chicken DNA, in 4096 formamide. @9 M-NaCI. 50 mM-sodium phosphate (pH 65). 8% dextran sulfate at 42°C. The paper was subsequently washed in 1 x SSC (SSC is @15 M-NaCl. 0015 kr-sodium citrate. pH 7.0). @l% ( w / v ) so d’ mm dodecyl sulfate at 53°C and subjected to autoradiography with an intensifying screen at -80°C for 18 h. (a) Ethidium bromide staining pattern of the agarose gel. The restriction enzymes used were: lane 1, MboII: lane 2, D&I plus Accl: lane 3. HinfI plus -4~1: lane 4. TugI plus E’nu4HI: lane 5. f’auI1 plus AccI. The size (in bp) of the Mb011 fragments is given on the left. (b) Hybridization pattern : lanes 1 to 5 are as in (a). Two bands in lane 3 of 960 bp and 330 bp and one band in lane 2 (of 880 bp) correspond to incompletely digested fragments which are barely visible with ethidium bromide. (c) Map of the XbaI-Hind111 fragment (part a) and location of the sites for the restriction enzymes used in this experiment (part b). The maps were established from the analysis of the sequence in Fig. 5 and are in agreement with the number and sizes of fragments determined experimentally (a). Lines 1 to 5 correspond to digests shown in the corresponding lanes in (a) and (b). Thin and heavy tines correspond to weaker and stronger hybridizing bands. respectirel>-. Approximate location of the repetitive regions I and II as deduced from this experiment (part c).

[Fig. 8 labels: fragment sizes 296, 290, 390, 440 and 562 bp; XbaI; T+C-rich region; 5’]

and in three other egg white protein genes

A computer program was used to search for sequence homologies arising from the gene duplication events in the ovalbumin gene family and to try to identify common sequence elements in six chicken genes under steroid hormone control. This program is a modification, devised by R. Fritz, of the SEQFIT program of Staden (1977), and allows one to compare all strings of bases of predefined length, derived from a given sequence, to all positions of a second sequence file (or to positions spaced by a specified increment, if the sequences to be analyzed are too long). When two strings show a homology greater than or equal to a specified percentage, their positions in the two sequence files are printed. The 5’-flanking regions of the X, Y and ovalbumin genes were compared pairwise. The search was performed for 35 bp strings, with a minimum homology of 60% and a position increment of 5. Under these conditions, 40 to 100 hits were obtained for each gene pair. When the corresponding positions were plotted on a two-dimensional map, no consistent pattern emerged for the sequences located upstream from the -80 position (results not shown). Using the same program, we looked for sequence homologies between the 5’-flanking regions of six chicken genes which can be induced by steroid hormones in the oviduct: in addition to the sequence data presented here, we used the information available for the 5’-flanking region of conalbumin (ovotransferrin) (Cochet et al., 1979; E. Mulvihill, personal communication), ovomucoid (Lai et al., 1979; P. Gerlinger, personal communication), and lysozyme (M. Grez, G. Schütz, A.
heterogeneity of X and Y RNA start sites

S1 mapping has allowed us to locate the sequences which correspond to the 5’ end of the X and Y precursor RNAs which contain intron A transcripts. These sequences are thus likely to define the sites for initiation of transcription of the two genes (see Breathnach & Chambon, 1981). The multiple bands obtained in the S1 mapping experiments suggest that some microheterogeneity might occur at the 5’ end of X and Y gene transcripts (Fig. 2). Additional evidence was obtained from the analysis of reverse transcripts in the case of X mRNA (Fig. 7). Both methods indicate that X gene transcripts extending one to three or four nucleotides upstream from the A in position +1 might exist in vivo (Fig. 6, the starred nucleotides locate the possible start sites). However, artifacts caused by the enzymes used cannot be ruled out, and only a direct analysis of cap sequences could


prove the multiplicity of mRNA start sites. Using the latter approach, Malek et al. (1981) have indeed demonstrated the presence of three different cap structures in ovalbumin mRNA, an indication that initiation can take place at several nucleotide positions (starred in Fig. 6), even for genes which have a TATA box appropriately located upstream from the start site. A similar phenomenon has also been found for viral messengers (Kahana et al., 1981; Baker & Ziff, 1981).

(b) Comparison of X, Y and ovalbumin gene sequences in the promoter and leader region

Since our experiments did not allow us to determine unambiguously the exact 5’ end(s) of the transcription unit of the X and Y genes, we have defined them on the basis of sequence homologies with the ovalbumin gene (Fig. 6). The position +1 corresponds to the major cap site in the ovalbumin gene and to the smallest resistant fragments obtained in the S1 mapping experiments (Fig. 2). Comparison of the sequences of the three genes in this area shows several interesting features. The T-A-T-A-T-A-T sequence which is found at position -32 to -26 (with respect to the transcription start site) in the ovalbumin gene is found at the same location in the X gene. It is also present in the Y gene, but with a one-nucleotide change (T-A-T-A-T-T-T), and at position -31 to -25. The gene X and ovalbumin sequences thus conform to the model T-A-T-A-(A/T)-A-(A/T) sequence (the Goldberg-Hogness or TATA box) which has been demonstrated to be part of the promoter sequence for eukaryotic RNA polymerase B (see Breathnach & Chambon, 1981). The T residue that is present at the sixth position in the gene Y sequence is quite uncommon among functional TATA box sequences which have been studied up to now (see Breathnach & Chambon, 1981). There is thus no apparent correlation between the exact sequence of the TATA box and in vivo transcription efficiency, since X and ovalbumin genes, which share an identical T-A-T-A sequence, are transcribed at efficiencies which differ by 20-fold, while X and Y genes, which have different T-A-T-A sequences, share similar levels of expression. Sequence similarity between X, Y and Ov genes persists for about 50 nucleotides preceding the TATA box: notably a 17 bp G-rich region and a C-A-A-A-6 sequence around -80 are common to the three genes (see Fig. 7). The latter sequence is somewhat related to a prototype sequence which has been proposed to occur at -70 to -80 with respect to the mRNA start site of several eukaryotic genes (Benoist et al., 1980). 
Sequence homology in this region appears to be especially well-defined for globin genes, where the C-C-A-A-T box appears in 12 members of the family (Efstratiadis et al., 1980), and might be important for transcription in vivo (Dierks et al., 1981). The region between the TATA box and the leader exon sequence shows some conservation between X and Ov genes, but not between Y and Ov genes. The overall homology, in each gene pair, for the leader sequence is about 50% (X/Y, X/Ov) and 70% (Ov/Y), excluding a three-nucleotide deletion in the ovalbumin sequence and a one-nucleotide deletion in the X gene. This level of homology is less than that found at replacement sites in protein-coding sequences (70 to 85% homology; see Heilig et al., 1980; and unpublished results) but is equivalent to the homology found at silent

An AUG codon is found within the “L” exon of the X gene which is in phase with the rest of the protein-coding sequences in the other exons (Heilig et al., unpublished results). For Y and ovalbumin genes, the initiation codon is located in exon 1 (this AUG codon is also conserved at the same place in the X gene). According to the scanning hypothesis (Kozak, 1978), the AUG codon in the L exon of the X gene should be used as an initiation codon and the X protein would thus have an additional nine-amino acid sequence at its N terminus.

(c) Divergence of the 5’-flanking sequences upstream from the -80 region

The sequences of the three genes cannot be aligned in a convincing manner upstream from the -80 position. This divergence is strikingly illustrated by the presence of a peculiar T+C-rich sequence located from nucleotides - 160 to - I18 in the Y gene (Fig. 4), which has no counterpart in the two other genes. This sequence is composed of 205 T, 88 C, and only 11 A and two G residues, and corresponds to one of the two repetitive elements located in this area (Fig. 8). It does not appear to be composed of repetitions of a simpler sequence, in contrast to the satellite-like sequences which have been found 3 kb upstream from the 5’ end of the conalbumin gene and in the C intron of the X gene (L. Maroteaux & R. Heilig, unpublished results). The lack of analogy between the three gene sequences upstream from the promoter region has been confirmed by a computer comparison of all the sequences reported here. Although some sequences 30 to 40 nucleotides long were found which show 60 to 70% homology in a gene pair (see for instance the 65% homologous 49 bp segments starting at positions -1085 and -1278 in the X and ovalbumin genes, respectively; other results not shown), no consistent pattern emerged. This cannot be taken as an indication that the 5’ limit of the duplication unit lies close to the promoter of the ancestral gene, since sequence conservation would not be expected for duplicated non-protein-coding regions in the absence of selective pressure. Sequences of introns at homologous positions in the three genes, for example, do not generally bear any resemblance (Heilig et al., 1980; and unpublished results).
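The string comparison used for this search (35 bp strings, 60% minimum homology, position increment of 5) can be sketched as follows. This is only an illustrative reconstruction from the description given in the text, not the modified SEQFIT program itself: the function name `window_hits`, its parameter names and the toy sequences are assumptions introduced here, and the comparison is taken to be ungapped.

```python
def window_hits(seq_a, seq_b, length=35, min_percent=60, increment=5):
    """Ungapped comparison of every `length`-long string of seq_a with the
    strings of seq_b that start at positions spaced by `increment`.

    Returns (start_in_a, start_in_b, percent_identity) tuples, 0-based.
    """
    hits = []
    for i in range(len(seq_a) - length + 1):
        a = seq_a[i:i + length]
        for j in range(0, len(seq_b) - length + 1, increment):
            b = seq_b[j:j + length]
            identical = sum(x == y for x, y in zip(a, b))
            # Integer test avoids floating-point edge cases at the threshold.
            if identical * 100 >= min_percent * length:
                hits.append((i, j, 100.0 * identical / length))
    return hits


# Made-up toy sequences, for illustration only (not data from this paper).
toy_a = "ACGTTGCAAT" * 4
toy_b = "GGGGG" + toy_a
toy_hits = window_hits(toy_a, toy_b)
```

Plotting the (start_in_a, start_in_b) pairs on a two-dimensional map would show conserved regions as diagonal runs of hits; scattered isolated hits correspond to the absence of a consistent pattern.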

Computer search for other specific sequence elements did not reveal large direct or inverted repeats within these sequenced fragments (the location of some of the palindromic sequences which were found are given in Figs 3 and 5). An exact 11 bp repeat of the X gene T-A-T-A sequence was found, however, located at the -760 position (Fig. 3). In order to determine whether this sequence is used in vivo in the oviduct to promote transcription, we performed an S1 nuclease mapping experiment analogous to those described in the legend to Figure 2. Indeed, we could detect a band corresponding to a 311 bp resistant hybrid (Fig. 9, lanes 1 to 3) labelled at the Fnu4HI site located at position -421 (Fig. 1, line b). The signal

[Fig. 9 lane labels: 1, 2, 3, G, A+G, A>C, T+C, C]

FIG. 9. Existence of an additional site of transcription initiation in the 5’-flanking region of gene X. A single-stranded Fnu4HI-PstI fragment labelled at its 5’ end (Fig. 1(b)) was prepared and used for S1 nuclease mapping as described in the legend to Fig. 2. Lanes 1 to 3, digestion with 800, 1600 and 3200 units of S1 nuclease, respectively. An exposure with an intensifying screen is shown for these 3 lanes, while the sequence pattern was obtained without a screen. The nucleotide position corresponding to the major band in lane 3 is indicated by a star. If an RNA species was transcribed under the control of the T-A-T-A sequence present at position -706, one would observe a band at the position marked by the arrow.

obtained was less intense than the signal obtained, under similar conditions, from hybridization with X or Y precursor RNAs. This experiment demonstrates that the TATA box at -760 is used in the oviduct as a transcription initiation signal for a polyadenylated RNA. We do not know whether the product of this transcription is a functional RNA or whether this result simply reflects a background level of transcription initiation at any T-A-T-A-like sequence in a chromatin “domain” which is in an active configuration. However, no RNA species was found in the same experiment which would correspond to the T-A-T-A-A-T-A sequence at position -706. Such a species would have produced an S1 nuclease-resistant fragment migrating at the position marked by an arrow in Figure 9. Several T-A-T-A-like sequences are also found in the ovalbumin 5’-flanking sequences, between positions -870 and -700 (see Fig. 5). Breathnach et al. (1980)


reported that, in mouse L cells transformed with the ovalbumin gene, a specific transcript was made with a 5’ end located around the -680 position (while no RNA with a normal 5’ end was detected). It is possible that one of the upstream T-A-T-A-like sequences is used preferentially in L cells, most likely the one located at -700 (C-A-T-A-A-A), which resembles globin promoter sequences (Efstratiadis et al., 1980).
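The conformity of the seven-nucleotide T-A-T-A sequences quoted in this paper to the model sequence can be checked position by position. The sketch below assumes that the model sequence is T-A-T-A-(A/T)-A-(A/T), as given by Breathnach & Chambon (1981); the names `MODEL` and `mismatches` are introduced here for illustration only.

```python
# Model TATA box, one entry per position; "AT" means A or T is allowed.
# Assumption: T-A-T-A-(A/T)-A-(A/T), after Breathnach & Chambon (1981).
MODEL = ["T", "A", "T", "A", "AT", "A", "AT"]


def mismatches(candidate):
    """Return the 1-based positions at which a 7-nt candidate departs from MODEL."""
    if len(candidate) != len(MODEL):
        raise ValueError("candidate must be 7 nucleotides long")
    return [
        position
        for position, (base, allowed) in enumerate(zip(candidate, MODEL), start=1)
        if base not in allowed
    ]


# Sequences quoted in the text, with the positions given there.
quoted = {
    "ovalbumin and X, -32 to -26": "TATATAT",
    "Y, -31 to -25": "TATATTT",
    "X upstream, around -706": "TATAATA",
}
departures = {name: mismatches(seq) for name, seq in quoted.items()}
```

Under this assumption the ovalbumin and X sequences show no departure from the model, whereas the Y sequence departs from it only at the sixth position, in agreement with the comparison made in the Discussion.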

5. Conclusion

Our results demonstrate that the duplication events which gave rise to the ovalbumin gene family in chicken included both the “promoter” region and the first “leader” exon, although the 5’ end of the duplication unit cannot be determined. Since then, mutations have modified the sequences involved in the expression of these genes. One example is the presence of a new initiation codon in the leader exon of the X gene. Evolution has also resulted in different levels of expression of the three genes and in altered hormonal responses (LeMeur et al., 1981). It was initially hoped that comparison of sequences of genes which share comparable regulation of expression might reveal the nucleotide signals involved. Such comparisons have demonstrated the existence of model sequences for splice signals (Breathnach et al., 1978; Seif et al., 1979) and of the T-A-T-A sequence (Goldberg, 1979) later shown to be part of the promoter of many genes (see Breathnach & Chambon, 1981). The size of such consensus sequences is, however, generally small (no more than 7 nucleotides) and wobble is allowed at most positions. These sequences could thus be recognized only because of their presence at a fixed distance from a predetermined functional element (start site for transcription or exon-intron junction). In the present case, nothing is known about the location of the putative regions involved in the hormonal control of transcription. One cannot even eliminate the possibility that they might be located, for instance, in the 3’-flanking regions, as suggested for elements controlling the expression of human foetal globin genes (see Maniatis et al., 1980). It is thus not too surprising that a sequence comparison in six chicken genes, which are under similar steroid hormone control in the same organ, has shed no light on the regulatory elements involved in this regulation. 
A functional assay where at least part of the control of transcription can be reproduced with the cloned genes is therefore needed. At the present time, the mechanisms of glucocorticoid regulation of the expression of rat α2u globulin (Kurtz, 1981) and of a mouse mammary tumor virus (Buetti & Diggelmann, 1981; Hynes et al., 1981) appear amenable to such analysis, using L cells transformed with the cloned genes. The finding that the progesterone receptor complex from chick oviduct is able to interact with specific DNA sequences from several egg white protein genes suggests that in vitro studies might be useful in defining control regions in these genes (Mulvihill et al., 1982).

We thank Mrs C. Kloepfer for excellent technical assistance, Mr R. Fritz for the computer programs, Dr M. Bellard and Dr M. LeMeur for useful advice, M. C. Chanal, B. Boulay and J. M. Gamier for help at various stages of this work and Dr B. Davison for editorial comments. We are grateful to Professor P. Chambon for helpful discussions, encouragement


and support. This work was supported by grants from the CNRS (ATP 006520/50 and ATP endocrinologie), from the Association pour le Développement de la Recherche sur le Cancer, from the Fondation pour la Recherche Médicale Française and from the Fondation Simone et Gino del Duca.

REFERENCES

Alwine, J. C., Kemp, D. J. & Stark, G. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5350-5354.
Baker, C. C. & Ziff, E. B. (1981). J. Mol. Biol. 149, 189-221.
Benoist, C. & Chambon, P. (1981). Nature (London), 290, 304-310.
Benoist, C., O’Hare, K., Breathnach, R. & Chambon, P. (1980). Nucl. Acids Res. 8, 127-142.
Berk, A. J. & Sharp, P. A. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 1274-1278.
Breathnach, R. & Chambon, P. (1981). Annu. Rev. Biochem. 50, 349-383.
Breathnach, R., Benoist, C., O’Hare, K., Gannon, F. & Chambon, P. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 4853-4857.
Breathnach, R., Mantei, N. & Chambon, P. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 740-744.
Buetti, E. & Diggelmann, H. (1981). Cell, 23, 335-345.
Cochet, M., Gannon, F., Hen, R., Maroteaux, L., Perrin, F. & Chambon, P. (1979). Nature (London), 282, 567-574.
Colbert, D. A., Knoll, B. J., Woo, S. L. C., Mace, M. L., Tsai, M. J. & O’Malley, B. W. (1980). Biochemistry, 19, 5586-5592.
Dierks, P., Van Ooyen, A., Mantei, N. & Weissmann, C. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 1411-1415.
Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O’Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980). Cell, 21, 653-668.
Gannon, F., O’Hare, K., Perrin, F., LePennec, J. P., Benoist, C., Cochet, M., Breathnach, R., Royal, A., Garapin, A., Cami, B. & Chambon, P. (1979). Nature (London), 278, 428-434.
Gannon, F., Jeltsch, J. M. & Perrin, F. (1980). Nucl. Acids Res. 8, 4405-4421.
Goldberg, M. L. (1979). Ph.D. thesis, Stanford University.
Hager, L. J., McKnight, G. S. & Palmiter, R. D. (1980). J. Biol. Chem. 255, 7796-7800.
Heilig, R., Perrin, F., Gannon, F., Mandel, J. L. & Chambon, P. (1980). Cell, 20, 625-637.
Hynes, N. E., Kennedy, N., Rahmsdorf, U. & Groner, B. (1981). Proc. Nat. Acad. Sci., U.S.A. 78, 2038-2042.
Kahana, C., Gidoni, D., Canaani, D. & Groner, Y. (1981). J. Virol. 37, 7-16.
Kozak, M. (1978). Cell, 15, 1109-1123.
Kurtz, D. T. (1981). Nature (London), 291, 629-631.
Lai, E. C., Stein, J. P., Catterall, J. F., Woo, S. L. C., Mace, M. L., Means, A. R. & O’Malley, B. W. (1979). Cell, 18, 829-842.
LeMeur, M., Glanville, N., Mandel, J. L., Gerlinger, P., Palmiter, R. & Chambon, P. (1981). Cell, 23, 561-571.
Malek, L. T., Eschenfeldt, W. H., Munns, T. W. & Rhoads, R. E. (1981). Nucl. Acids Res. 9, 1669-1673.
Maniatis, T., Fritsch, E. F., Lauer, J. & Lawn, R. M. (1980). Annu. Rev. Genet. 14, 145-178.
Maxam, A. M. & Gilbert, W. (1980). Methods in Enzymology, Nucleic Acids Part I (Grossman, L. & Moldave, K., eds), vol. 65, pp. 499-560, Academic Press, New York.
Mulvihill, E. R., LePennec, J. P. & Chambon, P. (1982). Cell, in the press.
McKnight, G. S. & Palmiter, R. D. (1979). J. Biol. Chem. 254, 9050-9058.
Royal, A., Garapin, A., Cami, B., Perrin, F., Mandel, J. L., LeMeur, M., Brégégère, F., Gannon, F., LePennec, J. P., Chambon, P. & Kourilsky, P. (1979). Nature (London), 279, 125-132.
Seif, I., Khoury, G. & Dhar, R. (1979). Nucl. Acids Res. 6, 3387-3397.
Slightom, J. L., Blechl, A. E. & Smithies, O. (1980). Cell, 21, 627-638.


Staden, R. (1977). Nucl. Acids Res. 4, 4037-4051.
Wahl, G. M., Stern, M. & Stark, G. R. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 3683-3687.
Weaver, R. F. & Weissmann, C. (1979). Nucl. Acids Res. 7, 1175-1193.

Edited by S. Brenner

Note added in proof: Some sequences around the transcription initiation site of X and Y genes have recently been reported by Knoll et al. (1981). The sequences of the 5’-flanking region of the chick lysozyme gene have now been published by Grez et al. (1981), together with comparisons with ovalbumin and conalbumin flanking sequences.

Grez, M., Land, H., Giesecke, K., Schütz, G., Jung, A. & Sippel, A. E. (1981). Cell, 25, 743-752.
Knoll, B. J., Woo, S. L. C., Beattie, W. & O’Malley, B. W. (1981). J. Biol. Chem. 256, 7949-7953.