Sequence homologies in the mouse protamine 1 and 2 genes


Biochimica et Biophysica Acta 950 (1988) 45-53 Elsevier


BBA 91806

Paula A. Johnson a, Jacques J. Peschon b, Pamela C. Yelick a, Richard D. Palmiter b and Norman B. Hecht a

a Department of Biology, Tufts University, Medford, MA and b Howard Hughes Medical Institute, Department of Biochemistry, University of Washington, Seattle, WA (U.S.A.)

(Received 23 July 1987)

Key words: Protamine gene; DNA sequence; Gene regulation; (Mouse)

To identify candidates for cis-acting sequences that regulate the stage and cell-specific expression of the two coordinately regulated protamine genes in the mouse, genomic clones were isolated and the nucleotide sequences of the 5' flanking regions and coding regions were compared. Unlike most histone genes and the multigene family of trout protamine genes which are intronless, each mouse protamine gene has a single, short intervening sequence. Although the coding regions do not share significant nucleotide homology, the 5' flanking regions contain several short homologous sequences that may be involved in gene regulation. An additional shared sequence is present in the 3' untranslated region surrounding the poly(A) addition signal in both genes.

Correspondence: N.B. Hecht, Department of Biology, Tufts University, Medford, MA 02155, U.S.A.

0167-4781/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

Introduction

Protamines are small, arginine-rich proteins that replace histones and transition proteins on the DNA during the later stages of spermatogenesis in many vertebrates [1-3]. These basic proteins enter the nucleus and aid in the compaction of the mammalian sperm head by neutralizing the charge of the DNA. In contrast to the trout protamines, which are quite similar to one another and appear to have arisen from a single primordial gene [4], the two mouse protamines (mP1 and mP2) are very distinct. mP2 is initially synthesized as a larger protein which is processed to produce the mature protein found in spermatozoa [5], whereas mP1 is initially synthesized as the mature protein [7]. Mature mP2 contains 13 histidines and lacks tyrosine, while mP1 contains three tyrosines and lacks histidine [5,7]. Both proteins have several clusters of arginine and contain cysteine interspersed throughout the molecules. At the nucleotide level, however, the cDNAs share little homology and do not cross-hybridize.

By the criterion of protein gel electrophoresis, the spermatozoa of most mammals examined appear to have only one protamine [6,8-16]. Exceptions are the mouse [6,17] and hamster (Ref. 18; Balhorn, R., personal communication), which each have two, and the human [19-24], which has three. The amount of protamine 2 in spermatozoa varies from about 70% of the protamine in the mouse [6] to a level undetectable by gel electrophoresis in the rat [14] and most other mammals. Although the reason for this variability is not understood, there is apparently much flexibility in the number and/or type of proteins needed to compact the DNA in sperm.

The genes for the mouse protamines are present as single-copy genes on chromosome 16 [25]. They are transcribed only in the testis and are translationally regulated [26]. By Northern analysis, mRNAs for both genes are first detectable in round spermatids, the first haploid precursors to mature spermatozoa [27]. These stable, very abundant mRNAs are then stored for up to 8 days before being translated in the elongating spermatids. A shortening of the poly(A) tail coincides with the presence of the mRNAs on polysomes [26].

To examine the coordinate expression of the mouse protamine genes we have sequenced and compared their flanking regions and have identified several sequences which may be important for their proper expression.

Materials and Methods

Isolation of mP1 and mP2 genomic clones

The mP1 clone was isolated by Peschon et al. [28]. C57BL mouse spleen DNA was cloned into lambda phage EMBL3 and 300 000 recombinants were screened with a pair of radiolabeled complementary oligonucleotides corresponding to nucleotides 232-287 of the mP1 cDNA [7]. Protamine 2 genomic clones were isolated by screening 2 × 10^6 recombinants of a lambda phage library (partial MboI-digested Balb/C mouse embryo DNA in λ Charon 28) with an mP2 cDNA clone [5] which was labeled by primer extension using random hexanucleotides as primers [29]. Specific activity of the probe ranged from 1 × 10^8 to 8 × 10^8 cpm/µg DNA. 50 positive plaques were identified in duplicate. Three of these clones were restriction mapped and characterized by Southern blot analysis [30].

Nucleotide sequencing

The mP1 genomic clone was sequenced using the dideoxy chain termination method of Sanger et al. [31] as described by Peschon et al. [28]. Restriction fragments of the mP2 clones were subcloned into the M13 mp18 and mp19 vectors and the nucleotide sequences of the genomic subclones were then determined by the dideoxy method of Sanger et al. [31]. 7-Deaza-dGTP was used instead of dGTP to read through difficult compressions [32]. The entire mP2 nucleotide sequence was determined for both strands of one or more clones. All enzymes used in cloning and sequencing were purchased from New England Biolabs or Bethesda Research Laboratories. [32P]dATP (800 Ci/mmol) was purchased from Amersham.

Sequence analysis

The nucleotide sequences were assembled and compared using the programs of Mount and Conrad [33].

S1 nuclease mapping

Total testis RNA was prepared using guanidine thiocyanate as previously described [27]. Single-stranded, uniformly labeled DNA probes were prepared as described by Ley et al. [34]. Primer-extended M13 clones were digested with a restriction enzyme that cleaves just 3' to the end of the insert, and the labeled single-stranded DNA probe was separated from the unlabeled strand and the M13 DNA on a denaturing 5 M urea/5% acrylamide gel. The probe was electroeluted and hybridized to total RNA in a mixture containing 10 µg total testis RNA, 40000-80000 cpm of probe DNA (spec. act. = 10^8 cpm/µg), 40 mM Pipes (pH 6.4), 1 mM EDTA, 0.4 M NaCl, and 80% formamide. Hybridizations were carried out at 58°C for 18 h. S1 nuclease (BRL, 100-1500 units) was then added in a buffer containing 0.28 M NaCl/5.0 mM sodium acetate (pH 4.6)/4.5 mM ZnSO4 in a volume 10-times the hybridization volume. Samples were incubated at 37°C for 1 h and precipitated with ethanol, and the products were analyzed by electrophoresis on 6 M urea/5% acrylamide sequencing gels. Exact sizes of the products were determined by comparison with a sequencing ladder that was run on the same gel.

Primer extension

Primer extension analysis was performed as described by Peschon et al. [28] by hybridizing RNA to an end-labeled oligonucleotide primer complementary to the mP1 sequence between +33 nt and +62 nt, relative to the mRNA cap site. The oligonucleotide was labeled using T4 kinase and γ-[32P]dATP. Hybridizations were carried out at 42°C for 90 min in 30 µl of 660 mM NaCl, 22 mM Tris-HCl (pH 8), 4.4 mM EDTA, 20 fmol end-labeled primer and 2 µg testis RNA. Reactions were diluted, precipitated, and resuspended in 30 µl of 100 mM KCl, 100 mM Tris-HCl (pH 8.3), 10 mM MgCl2, 10 mM dithiothreitol, 250 µg/ml bovine serum albumin, 500 µM each dNTP and 7 units AMV reverse transcriptase (Life Sciences) and incubated at 42°C for 30 min. Products were precipitated, denatured and analyzed on an 8 M urea/8% acrylamide sequencing gel.

Results and Discussion

Gene structure

Mouse protamine 1 and 2 genomic clones were sequenced and compared with their respective cDNA sequences (Fig. 1). Each gene contains one intron. These introns are 94 nucleotides in mP1 and 105 nucleotides in mP2, distinguishing them from the intronless trout protamine genes. Jankowski et al. [35] suggest that the trout genes may have been transmitted horizontally to some fish by retroviruses. This seems unlikely for the mouse genes because they contain introns.

The 3' ends of the second exons were determined by the positions of the poly(A) tails in the cDNA clones. The 5' ends of the mRNAs were determined experimentally. For mP1, this was done by primer extension using an oligonucleotide complementary to part of the 5' untranslated region (see Materials and Methods). The mRNA start site is within one nucleotide of the site indicated in Fig. 1 [28].

The 5' end of the mP2 mRNA was determined by an S1 nuclease protection assay (see Materials and Methods). The single-stranded labeled DNA probe was made using an M13 mp18 clone (containing the mP2 TaqI-PstI genomic fragment inserted into the M13 mp18 AccI and PstI sites) as a template. A 32P-labeled complementary strand was synthesized and the single-stranded fragment of interest was isolated by digestion with BamHI (see Materials and Methods). The 443-nucleotide-long probe therefore contains 9 nucleotides of M13 DNA on the 5' end and 31 nucleotides on the 3' end, and is homologous to the mRNA from the transcription start site through nucleotide +68 in Fig. 1. With 750 units of S1 nuclease, two protected fragments of 68 and 71 nucleotides appear as bands of approximately equal intensity (Fig. 2, lane 1). When the S1 is increased to 1500 units, the fragments appear to be somewhat degraded, giving several smaller, lighter bands and some free nucleotides at the bottom of the gel (Fig. 2, lane 2). Therefore, the mRNA start site is probably either the position indicated in Fig. 1, as deduced from the 68-nucleotide fragment, or 3 nucleotides upstream, as the 71-nucleotide band may indicate. It is also possible that both sites are used. The site indicated in Fig. 1 is consistent with the consensus mRNA start sequence Y-C-A-Y (in which A is position +1, and Y is a pyrimidine) [36].
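The arithmetic behind the S1 mapping can be sketched in a few lines of Python. This is an illustration only, not part of the original analysis: the function names are hypothetical, the numbering is assumed to have no position 0 (so -1 is followed directly by +1), and the four-base window "CCAT" around the mP2 cap site is an assumption read from Fig. 1.

```python
PYRIMIDINES = {"C", "T"}

def cap_position(last_protected, fragment_length):
    """Cap-relative position of the 5' end of a protected fragment.

    last_protected is the most 3' probe position protected by the mRNA
    (+68 for the mP2 probe). Numbering is assumed to skip zero.
    """
    offset = last_protected - fragment_length + 1  # coordinate that includes 0
    return offset if offset >= 1 else offset - 1   # shift past the missing 0

def fits_ycay(window):
    """True if a 4-base window (-2, -1, +1, +2) matches Y-C-A-Y."""
    w = window.upper()
    return (len(w) == 4 and w[0] in PYRIMIDINES and w[1] == "C"
            and w[2] == "A" and w[3] in PYRIMIDINES)

# The two protected fragments reported for mP2 (68 and 71 nucleotides).
starts = [cap_position(68, n) for n in (68, 71)]
print(starts)             # [1, -3]
print(fits_ycay("CCAT"))  # True (window assumed from Fig. 1)
```

Under these assumptions the 68-nucleotide fragment maps to +1 and the 71-nucleotide fragment to -3, which corresponds to the two candidate start sites discussed above.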

Sequence comparisons

The nucleotide sequences for the mP1 and mP2 genes are shown in Fig. 1. The mP1 genomic nucleotide sequence differs from the cDNA sequence at three positions, denoted by X in Fig. 1. Two of the differences are in the 3' untranslated region (positions 433 and 475) and one is in the coding region (position 149, G instead of A). The nucleotide difference in the coding region does not change the amino-acid sequence. These changes are probably due to mouse strain polymorphisms, since the cDNA sequence was derived from a CD1 mouse, whereas the genomic sequence is from a C57BL/6 mouse. The mP2 genomic and cDNA sequences are identical, although they are from Balb/C and CD1 DNA, respectively.

There are several sequence similarities between the two protamine genes. The optimal sequence for translation initiation by eukaryotic ribosomes (ACCATGG), as determined by Kozak [37], is present in both genes. Several promoter elements also appear to be conserved in the mP1 and mP2 5' flanking regions. The element closest to the transcription start site present in both genes is the TATAA box (mP1 at -32, mP2 at -37; all positions indicated are the nucleotide position of the 5'-most base of the sequence relative to the mRNA cap site). This sequence is found in most eukaryotic genes about 23-30 basepairs upstream of the transcribed region [36] and is thought to aid in site-specific initiation of RNA polymerase II transcription. Proceeding in a 5' direction, the next regulatory elements, generally found 40 to 100 nucleotides upstream of the start site, are called upstream elements. These include 'CCAAT' boxes and 'GC' boxes and appear to be important for efficient transcription from many promoters [38,39]. At position -81, the mP2 gene has a 'CCAAT' box (AATCAATCA) which differs slightly from the consensus 'CCAAT' sequence (GG(C/T)CAATCT)

MOUSE PROTAMINE 1 GENOMIC SEQUENCE

MOUSE PROTAMINE 2 GENOMIC SEQUENCE

-SSO -530 -510 GTCTAGIAAFGFCCAACACCTCCCICAGTCCAAACACIGCTCIGCAICCAIGTGGCICCC

-850 -830 -810 CTGTGGCCTGGCATGTGCCTAAGAACTGGACAAGTTGTGTTTCCCACTATAGCCCAGTGG

-470

-490

-450

ArTTATACCrGAAGCACTTGATGGGGCCTCAATGTTITACTAGAGCCCACCCCCCIGCAA -430

-410

-390

CTCTGAGACCCTCTGGATTTGrCTGTCAGIGCCTCACTGGGGCGTTGGArAATFTCTTAA

-790 -770 -750 CCACAGTAACAAATCAAACTATTATTTTCCAGCAAGAGTAGCCATCTACTGTGTACTCTG -730 -710 -690 CACTCTACCATAGACTGAAATTACTAGGTTGTGCGGTTGTGCACTTTTTGCAACTAGTTT

-3/0

-350

-330

-670 -550 -630 AGATCTTATTTAAGCTAGCACTACTAACTGCTCATTGTCCTGTGACAGGCTCAGAfTCAA

-310

-290

-270

-610 -590 -570 GAATTATGAGGGACTATGTCCCTCTGCCCCACTCCCACACCCTTCTGATTCTGGACTATT

-250

-230

AA~GTCAAGTTCCCTCAGCAGCATFCTCTGAGCAGTCIGAAGATGTGIGCT~TCACAGTT ACAAATCCAFGTGGCTGTTTCACCCACCTGCCTGGCCTTGGGTTATCTATCAGGACCTAG -210

CCTAGAAGCAGGIGTGTGGCACTTAACACCTAAGCTGAGIGACIAACTGAkCACTCAAGI -190

-170

-150

-110

-90

-490 -470 -450 GTAGGCAGACCTGGTTAGCCTAAGATCCAAGATAGGAAGTTAAGAAGTTAAGAAGCGCTT

UGATGCCATCTTTGTCACTTCTTGACTGTGACACAAGCAACTCCrGATGCCAAAGCCCTG -130

-550 -530 -510 CATCAGCAAGGTAGAGTTGAGAATCAACAAATGTCTCAGATATGCCTCAGGAAAGTAGAG

-430 -410 -390 ATCTGTGCCTCAGTTTCCCCCATGGGAAGATATGTTACACTGTCCTTAATGCACATGGAG

CCCACCCCTCTCAFGCCCATATTTGGACATGGTACAGGTCCTCACrGGCCAFGGTCIGTG AG~TCCTGGTCCTCTTTGACTTCATAATTCCTAGGGGCCACTAGTATCTATAAGAGGAAG

-370 -350 -330 TGGGTGGGAAGCTTTTGCTAGGAGTTGGTGAGTGCTCTCCTCGAATGCCCAACCAAGGCC

-10 +1 IO 30 AGGGTGCTGGCTCCCAGGCC ACAGCCCACAAAATTCCACCTGCTCACAGGTTGGCTGGC

-310 -290 -270 ATGCATGGGCTGCATCTCCAACACTGCATGTCCAGGGCATGGAAGCACACATATGTGATT

SO 70 90 TCGACCCAGGTGGTGTCCCCTGCTCTGAGCCAGCTCCCGGCCAAGCCAGCACC

-250 -230 -210 CCAACCCCTGGGAGGTAAAGGGACCAGGGTTAGAAGTTCAAGGTCArTCTTCATTGTGTG

-70

-SO

-30

ATG

IO0 120 140 GCC AGA TAC CGA TGC TGC CGC AGC AAA AGC AGG AGC AGA TGC CGC ALA ARG TYR ARG CYS CYS ARG SER LYS SEN ARG SER ARG CYS ARG CGF ARG

x 160 180 CGC AGG CGA AGA TGT CGC AGA CGG AGG AUG CGA TGC TGC CGG ARG ARG ARG ARG CYS ARG ARG ARG ARG ARG ARG CYS CYS ARG A

200 ?20 CGG AGG AGG CGA A gtaagtagagggct9ggctlggctgtggggggtgtggggtg ARG ARG ARG ARG A 250 270 290 cgggacttgggcatgtctgggagtccctctcaccacttttcttacctttctagGA TGC RG C¥S

3]0 330 TGC CGT CGC CGC CGC TCA TAC ACC ATA AGG TGT AAA AAA TAC TAG CYS ARG ARG ARG ARG SEA TYR THR ILE ANG CYS LYS LYS TYR * * * 350 370 390 ATGCACAGAATAGCAAGTCCATCAAAACTCCTGCGIGAGAATTTrACCAGACTICAAGAG 410

430

x

450

CAFCrCGCCACATCTrGAAAAATGCCACCGrCCGATGAAAAACAGGAGCCTGCrAAGGAA 470 x .. 490 V 510 CAArfiCCACCTGTCAATAAA~GTTGAAAACTCArCCCATTCCTGCCTCTTGGTCCTTGGG C 530 SSO S/O CITGGGGAGGGGIGCGCGGATGFGGTTAGGGAACATGACTGGTCAAATGGGAAGGGCNC 590 610 AAAAGAATTCCCAAFATTGACTACCAAGCCACCTGTACAGATCr

-190 -170 -150 AGGCCATCTCACATTCAATAAGTCAGCATGCTTCAAAGCAAGATGAGTAACTTGGCCCCT -130 -110 -90 AAGCCAGTCCTGCAAACCCTGTGCCGCCCTCACAGAGGGGACTGGGCAGGGrGGGAACAA

-70 -50 -30 TCAATCAGGGGTGGGCCGACAGGTCACAGTGGGGTTTACCTTTATAIATGAGCCCTCTGA -lO +I 10 30 GAGCCCCAAACACCAGACC ATCATCACCACCAAGAGCAGGTGGGCAGGCTTTCGTCCCr

SO 70 90 CCTCCTCCAATCCAGGTCAGCTGCAGCCTCAATCCAGAACCTCCTGATCTCCTGGCACC rio 130 ATG GTT CGC TAC CGA ATG AGG AGC CCC AGT GAG GGT CCG CAC CAG VAL ARG TYR ARG NET ARG SER PRO SER GLU GLY PRO HIS GLN 150 170 GGG CCT GGA CAA GAC CA[ GAA CGC GAG GAG CAG GGG CAG GGG CAA GLY PRO GLY GLN ASP H|S GLU ARG GLU GLU GLN GLY GLN GLY GLN ZOO 2~0 GGG CTG AGC CCA GAG CGC GTA GAG GAC TAT GGG AGG ACA CAC~AGG GLT LEU SER PRO GLU ARG ¥AL GLU ASP TYR GLY ARG THR H[S*ARG 240 260 GGC CAC CAC CAC CAC AGA CAC AGG CGC TGC TCT CGT AAG AGG CTA GLY HIS HIS HIS HIS ARG H]S ARG ARG CYS SER ARG LYS ARG LEU 290 310 CAr AGG ATC CAC AAG AGG CGT CGG TCA TGC AGA AGG CGG AGG AGA HIS ARG ILE HIS LYS ARG ARG ARG SER C¥S ARG ARG ANG ARG ARG

330 350 370 CAC TCC TGC CGC CAC AGG AGG CGG CAT CGCAGA G g t a a g c a c c c c a c a 9 HIS SER CYS ARG HIS ARG ARG ARG H|S ARG ARG G 390 410 430 cc9accccctggccacctgtgctgctgctgcccatctaaaccctgctgccttccaggca9 450 cctagcaaacctcgactttcctttctacagGC

Fig. 1. Nucleotide and predicted amino-acid sequences of the mP1 and mP2 genes. The bases are numbered relative to the mRNA cap sites. X over a base indicates that the genomic sequence differs from the cDNA sequence at that position (see text). The cDNA nucleotide is shown beneath the genomic nucleotide. A v sign shows the position of the poly(A) tail in each mature mRNA. The intervening sequences are in lowercase letters.
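The conventions of Fig. 1 (exons in uppercase, intervening sequence in lowercase, amino acids listed under each codon) can be illustrated with a minimal Python sketch. The helper names are hypothetical. The first 16 codons of mP1 are taken from the figure, but the intron in the junction example is a shortened, made-up stand-in, since the real intron cannot be read reliably here. The standard genetic code is assumed.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def splice(genomic):
    """Drop lowercase (intron) bases, keeping uppercase (exon) bases."""
    return "".join(ch for ch in genomic if ch.isupper())

def translate(coding):
    """Translate an uppercase coding sequence, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding) - 2, 3):
        residue = CODON_TABLE[coding[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# First 16 codons of mP1 as printed in Fig. 1.
MP1_START = "ATG GCC AGA TAC CGA TGC TGC CGC AGC AAA AGC AGG AGC AGA TGC CGC".replace(" ", "")
print(translate(MP1_START))  # MARYRCCRSKSRSRCR

# Exon 1 ends "...CGA A" and exon 2 begins "GA TGC"; the intron is hypothetical.
JUNCTION = "CGGAGGAGGCGAA" + "gtaagtctttctag" + "GATGC"
print(translate(splice(JUNCTION)))  # RRRRRC
```

The junction example shows why the figure prints a split "A / RG": the arginine codon AGA is interrupted by the intron after its first base.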

470 TGC AGA AGA TCC CGA AGG AGG LY CYS ARG ARG SER ARG ANG ARG

490 510 AGG AGA TGC AUG TGC AGG AAA TGT AGG AGG CAC CAT CAC TAA ARG ARG CYS ARG CY5 ARG LYS CYS ARG ARG HIS HiS HIS * * * 540 560 580 GCCTCCCCAGGCCTGTCCATTCTGCCTGGAGCCAAGGAAGTCACTTGCCCAAGGAATAGT 500 6ZO 640 CACCTGCCCAAGCAACATCATGTGAGGCCACACCACCATTCCATGTCGATGTCTGAGCCC 660 580 700 TGAGCTGCCAAGGAGCCACGAGATCTGAGTACTGAGCAAAGCCACCTGCCAAATAAAGCT V TGACACGAG


Fig. 2. S1 nuclease mapping of the mP2 mRNA transcription start site. Lane 1: 10 µg total testis RNA, 750 units S1 nuclease; Lane 2: 10 µg total testis RNA, 1500 units S1 nuclease; Lane 3: undigested DNA probe (443 nt long). The adjacent sequencing ladder was used to size the protected fragments. The sizes of the prominent bands are given to the right of the ladder in nucleotides. The gel is shown in three sections in order to show the top, middle, and bottom of a very long gel. There were no bands present in the excised regions.

[38]. The mP1 gene does not have any similar sequence. At position -116, the mP2 gene has a 'GC' box (CCGCCC) but mP1 lacks this sequence.

Several other sequences common to the 5' untranslated regions of the mP1 and mP2 genes have been identified. The sequences that are in approximately the same position in both genes are depicted in Fig. 3 as the hatched and shaded boxes labeled X, A, B, C, D, E and F. An arrow from right to left indicates that the sequence is present on the complementary DNA strand in the reverse orientation. X' and A' are sequences which are subsets of X or A, respectively. These shared sequences are listed in Fig. 4. Sequences C and D overlap in mP2. In mP1, however, the complement of D is present and is two nucleotides away from C. Some of these sequences may serve as alternative 'upstream elements' which affect the efficiency of transcription of the mouse protamine genes. Possible candidates may be CCTCACAGA (or the complement, TCTGTGAGG), which is found at -86 in the mP1 gene and -112 in the mP2 gene (Fig. 3, 4A, sequence E), or possibly TGGGCAGG, which is found at -144 in the mP1 gene and -97 in the mP2 gene (and also at +22 and +590 in mP2) (Fig. 3, 4A, sequence B).

In the 3' untranslated region of the mP1 and mP2 genes, the sequence GCCACCTG is present. This is a consensus sequence which exists between four binding sites within an immunoglobulin enhancer. In the immunoglobulin gene, three of these sites differ from this sequence by one nucleotide, while the fourth differs by two. Distinct factors have been shown to bind to each of these sequences [40]. It is curious that this consensus sequence is found in mP1 and mP2 immediately adjacent to the poly(A) addition signal, in a larger conserved region (Fig. 4B). This region may function as a transcriptional enhancer in the protamine genes, or it may play a role in the poly(A) shortening or translational regulation of the mRNAs. It is not specific to mouse. It has also been found in the bull, human and rat protamine genes (Refs. 41, 42, 47, 48; Johnson, P., unpublished results).
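Because several of the shared sequences are reported on either strand (for example CCTCACAGA and its complement TCTGTGAGG), a search for them has to scan both orientations. The following Python sketch shows one way to do that. It is not the program of Mount and Conrad [33] used in this work; the function names are hypothetical, the toy region is invented for illustration, and hits are reported as 0-based offsets rather than cap-relative positions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_all(text, pattern):
    """All start offsets of pattern in text, overlapping hits included."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def find_both_strands(region, motif):
    """List (offset, strand) for a motif on the given and complementary strand."""
    region, motif = region.upper(), motif.upper()
    hits = [(i, "+") for i in find_all(region, motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be reported twice
        hits += [(i, "-") for i in find_all(region, rc)]
    return sorted(hits)

# Invented toy region carrying sequence E once in each orientation.
toy_region = "AAACCTCACAGATTTTCTGTGAGGAAA"
print(reverse_complement("CCTCACAGA"))               # TCTGTGAGG
print(find_both_strands(toy_region, "CCTCACAGA"))    # [(3, '+'), (15, '-')]
```

A "-" hit corresponds to the right-to-left arrows used in Fig. 3 to mark a sequence on the complementary strand.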
A similar sequence is also found in the 3' end of a gene coding for an 18.5 kDa androgen-dependent secretory protein in the rat epididymis [43] (Fig. 4B). Additional sequences which are conserved in the 5' untranslated regions of the mP1 and mP2 genes (shown in Fig. 3 and 4) may possibly serve as enhancers, increasing the level of transcription and contributing to the abundance of these

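[Fig. 3 diagram not reproduced. Labels recoverable from the original: shared sequences X, X', A, A', F, C, D, B and E in the 5' regions of both genes; Z near the AATAAA signal; TATAA box at -32 (mP1) and TATATAT at -37 (mP2); ATCAAT at -79 (mP2); ATG. The mP1 map runs from -560 to 626 with marked positions 96, 198, 293, 339 and 494; the mP2 map runs from -859 to 717 with marked positions 103, 358, 464 and 525.]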

Fig. 3. Structure of the mouse protamine 1 and 2 genes. This diagram depicts the regions of the mP1 and mP2 genes that have been sequenced. Open boxes represent the amino-acid coding regions. Heavy lines represent the transcribed regions. Hatched and shaded boxes represent sequences that are found in the mP1 and mP2 genes in similar positions. These regions are also lettered and are listed in Fig. 4. An arrow under the box indicates that the sequence is present on the complementary DNA strand. Positions indicated in this figure are relative to the mRNA cap sites. The vertical solid bar in the mP2 coding region shows the position at which the protein precursor is processed to produce the mature protein [5].

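A.

Sequence   Position (mP1, mP2)   Nucleotide sequence                                      Homology
X          -553, -530            AATGTC(C/T)(A/C)A(C/G)A(C/T)(C/A)T(C/G)CCTCAG            15/21
X'         -444                  GTCTCAGAgtTGC                                            11/13
A          -490, -448            [not legible in this copy]                               15/21
A'         -415                  TCaGTGCCTCA                                              10/11
B          -144, -97             TGGGCAGG                                                 8/8
C          -199, -200            [first bases not legible]GCCATCT                         9/10
D          [not legible]         [not legible]                                            15/19
E          -86, -112             CCTCACAGA (one copy on the complementary strand)         9/9
F          [not legible]         [not legible]                                            15/18

Where two nucleotides are given in parentheses, the first is the mP1 nucleotide and the second the mP2 nucleotide.

B.

Sequence                 Position (mP1, mP2)   Nucleotide sequence                                            Homology
Z                        460, 682              CAAAGCCACCTGcCA(AATAAA)GcTTGA [stacked variants not legible]   18/24
Rat epididymal protein                         caCACCTGTgCaAAATAAAGCTTG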

Fig. 4. Conserved nucleotide sequences in the mP1 and mP2 genes. These sequences are found in similar positions in the two mouse protamine genes. They are diagrammed in Fig. 3 relative to the rest of the gene. In positions where the sequences differ, both nucleotides are shown with the mP1 nucleotide over the mP2 nucleotide. (A) These sequences are found in the 5' untranslated regions of both genes. X' and A' are subsets of X and A respectively. In these sequences, capital letters mean that the nucleotide matches either the mP1 sequence or the mP2 sequence. Small letters do not match either. An arrow underneath the position indicates that the sequence is found on the complementary DNA strand. (B) The sequence Z, depicted here, is found in the 3' untranslated region of the protamine genes, surrounding the poly(A) addition signal (shown in parentheses). The underlined sequence is a consensus sequence for an immunoglobulin heavy chain gene enhancer (see text). The rat 18.5 kDa secreted epididymal protein contains part of the Z sequence [43].
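The homology values in Fig. 4 are fractions of matching positions over the length of the aligned sequences. A minimal Python sketch of that count is given below. The function name is hypothetical, gaps are not handled, and the two copies of sequence X are an assumption: they are read from the 5' flanking sequences in Fig. 1, with apparent character damage in the mP1 copy corrected by eye.

```python
def homology(seq_a, seq_b):
    """Return (matching positions, aligned length) for two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches, len(seq_a)

# Sequence X as read from Fig. 1 (assumed reading, see above).
X_MP1 = "AATGTCCAACACCTCCCTCAG"
X_MP2 = "AATGTCTCAGATATGCCTCAG"

matches, length = homology(X_MP1, X_MP2)
print(f"{matches}/{length}")  # 15/21
```

With this reading the count reproduces the 15/21 listed for sequence X, and a perfectly conserved element such as TGGGCAGG (sequence B) gives 8/8.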

Sequence                 Position in mP1     Position in mP2   Sequence also found in gene for   Position
AGCACACAT                -338                -276              Rat H1t histone                   -138
GGTC?GCAG                -143 (<--)          +20               Rat H1t histone                   +787
[not legible]            -10 (<--)           -253              Human PGK2                        -1307
[not legible]            -510, -311 (<--)    -18, +121         Human PGK2 (2X)                   -1004 *, -640
GCCTCAAT                 -475                +66               Human PGK2                        -1110
??CCTCTG                 -434                -602              Human PGK2                        -254 *
[not legible]            -369                -600              Human PGK2                        -252
GGGGTGGG                 -455, -140 (<--)    -379, -72         Human PGK2 (2X)                   -454, -390
?ACCA                    -74 (<--)           -243              Trout protamine                   -43 *
[not legible]            -199                -200              Trout protamine                   -351

(<--) marks a sequence found on the complementary strand. A ? marks a base that is not legible in this copy.

Fig. 5. Sequences shared by the two mouse protamine genes and other testis-specific genes (Refs. 4, 45; McCarrey, J., personal communication). An * indicates that the position is given in nucleotides relative to the ATG start codon rather than the mRNA start site.

mRNAs. Repeated sequences that are present significantly more often than by random chance may play this role or have other biological significance. The sequence AGGTCC is repeated three times on the mP1 coding strand and once on the noncoding strand in the 5' untranslated region. The mP1 gene also contains the sequences CCTCACTGG and ATCCATGTGGCT repeated twice. The mP2 gene has the sequence TTAAGAAG repeated twice in tandem in the 5' untranslated region and has the complement of one of the mP1 repeats (CCTCACTGG) in the coding region at position +125.

Several short inverted repeats were also found in the coding region of each gene. CGCCGTCGAGGCG appears at +138 in mP1 and TACACCATAAGGTGTA at position +316. In mP2 these two inverted repeats exist: TGCCGCCACAGGAGGCGGCA at +331, and GGTGCAGGAAATGTAGGAGGCACC at +497. These two possible hairpin loops are on either side of the intervening sequence in each gene.
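An inverted repeat of this kind can form a hairpin because its two ends are reverse complements of each other. The short Python sketch below counts how many bases pair when such a sequence folds back on itself. The function name is hypothetical and only Watson-Crick pairs are counted, so this is a rough check rather than a structure prediction.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def stem_length(seq):
    """Number of consecutive base pairs formed from the two ends inward."""
    s = seq.upper()
    n = 0
    while n < len(s) // 2 and PAIR.get(s[n]) == s[-1 - n]:
        n += 1
    return n

# The four inverted repeats quoted in the text.
for name, seq in [
    ("mP1 +138", "CGCCGTCGAGGCG"),
    ("mP1 +316", "TACACCATAAGGTGTA"),
    ("mP2 +331", "TGCCGCCACAGGAGGCGGCA"),
    ("mP2 +497", "GGTGCAGGAAATGTAGGAGGCACC"),
]:
    print(name, stem_length(seq))
```

For the sequences as printed, the possible stems are 4, 6, 7 and 5 base pairs long, respectively.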

Comparison with other testis-specific genes

The nucleotide sequences of the untranslated regions of the mP1 and mP2 genes were compared with the untranslated regions of the trout protamine genes [4], the human testis-specific PGK2 gene (McCarrey, J., personal communication), the sea-urchin testis-specific histone variant H2B-2 gene [44], and the rat testis-specific H1t histone variant gene [45]. Of the sequences which were found in a similar position in the mP1 and mP2 genes (Fig. 4), only sequence 'C' was identified in these other testis-specific genes. This sequence was found in the 5' untranslated region of the trout protamine genes. Other sequences, however, that were found in the mP1 and mP2 5' regions in seemingly random positions were also found in other testis-specific genes (Fig. 5). These elements may be necessary for testis-specific expression, but are not specific to expression during spermiogenesis, since all the other testis-specific genes examined are initially transcribed in earlier stages of spermatogenesis than are the protamine genes.
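When short sequences turn up at unrelated positions in several genes, it helps to estimate how often a match of that length would occur by chance. The Python sketch below is a back-of-the-envelope estimate and not a calculation from the original study. It assumes equal base frequencies and independent positions, which real genomic DNA does not strictly satisfy; the function name and the 600-nucleotide region length are hypothetical.

```python
def expected_hits(region_length, k, both_strands=True):
    """Expected chance occurrences of one specific k-mer in a random region.

    Assumes each base is A, C, G or T with probability 1/4, independently.
    """
    windows = max(region_length - k + 1, 0)  # number of k-long windows
    strands = 2 if both_strands else 1
    return strands * windows / 4 ** k

# Hypothetical case: an 8-mer searched on both strands of a 600-nt region.
print(round(expected_hits(600, 8), 4))  # 0.0181
```

Under these assumptions a given 8-mer is expected in a given 600-nucleotide region well under once, but the chance of finding some shared 8-mer rises quickly when many candidate sequences and several genes are compared, so matches of this length are suggestive rather than conclusive.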


Summary

Protamine gene expression may be controlled at many levels. Different transcription control elements could be required for gene expression in the male as opposed to the female, in the testis as opposed to other tissues, and in the haploid spermatids as opposed to spermatogonia or spermatocytes. We have identified short nucleotide sequences which are present in the mouse P1 and P2 protamine genes which may be necessary for their proper expression. We have also identified sequences shared by the mouse protamine genes which are present in other testis-specific genes. These elements may function in testis-specific expression. Additional studies indicate that the 5' flanking region of mP2 described here and a 2.4 kb fragment of the mP1 gene containing 880 bp of 5' flanking region are sufficient to confer testis-specific and post-meiotic expression in transgenic mice [28,46]. Further experiments are in progress to determine the function of these shared sequences, if any, and to determine which regions of the flanking DNA are necessary for proper transcription.

Acknowledgements

We thank P. Leder for generously supplying the Balb/c mouse embryo genomic library. This work was supported by USPHS Grant GM-29224 to N.B.H. and by HD-01972 to R.D.P.

References

1 Bellvé, A.R. (1979) in Oxford Review of Reproductive Biology (Finn, C.A., ed.), Vol. 1, pp. 159-260, Oxford University Press, Oxford.
2 Hecht, N.B. (1986) in Experimental Approaches to Mammalian Embryonic Development (Rossant, J. and Pedersen, R., eds.), pp. 151-193, Cambridge University Press, London.
3 Hecht, N.B. (1988) in Histones and Other Basic Nuclear Proteins (Hnilica, L., Stein, G. and Stein, J., eds.), CRC Press, Boca Raton, in press.
4 Aiken, J.M., McKenzie, D., Zhao, H.-Z., States, J.C. and Dixon, G.H. (1983) Nucleic Acids Res. 11, 4907-4922.
5 Yelick, P.C., Balhorn, R., Johnson, P.A., Corzett, M., Mazrimas, J.A., Kleene, K.C. and Hecht, N.B. (1987) Mol. Cell. Biol. 7, 2173-2179.
6 Balhorn, R., Weston, S., Thomas, C. and Wyrobek, A.J. (1984) Exp. Cell Res. 150, 298-308.
7 Kleene, K.C., Distel, R.J. and Hecht, N.B. (1985) Biochemistry 24, 719-722.
8 Bellvé, A.R., Anderson, E. and Hanley-Bowdoin, L. (1975) Dev. Biol. 47, 349-365.
9 Coelingh, J.P., Monfoort, C.H., Rozijn, T.H., Gevers-Leuven, J.A., Schiphof, R., Steyn-Parve, E.P., Braunitzer, G., Schrank, G. and Ruhfus, A. (1972) Biochim. Biophys. Acta 285, 1-14.
10 Mazrimas, J.A., Corzett, M., Campos, C. and Balhorn, R. (1986) Biochim. Biophys. Acta 872, 11-15.
11 Monfoort, C.H., Schiphof, R., Rozijn, T.H. and Steyn-Parve, E.P. (1973) Biochim. Biophys. Acta 322, 173-177.
12 Kistler, W.S., Keim, P.S. and Heinrikson, R.L. (1976) Biochim. Biophys. Acta 427, 752-757.
13 Tobita, T., Tsutsumi, H., Kato, A., Suzuki, H., Nomoto, M., Nakano, M. and Ando, T. (1983) Biochim. Biophys. Acta 744, 141-146.
14 Calvin, H.I. (1976) Biochim. Biophys. Acta 434, 377-389.
15 Loir, M. and Lanneau, M. (1975) Exp. Cell Res. 92, 509-512.
16 Sautière, P., Belaiche, D., Martinage, A. and Loir, M. (1984) Eur. J. Biochem. 144, 121-125.
17 Bellvé, A.R. and Carraway, T. (1978) J. Cell Biol. 79, 177a.
18 Bower, P.A., Yelick, P.C. and Hecht, N.B. (1987) Biol. Reprod. 37, 479-488.
19 Gaastra, W., Lukkes-Hofstra, J. and Kolk, A.H.J. (1978) Biochem. Genet. 16, 525-529.
20 Puwaravutipanich, T. and Panyim, S. (1975) Exp. Cell Res. 90, 153-158.
21 McKay, D.J., Renaux, B.S. and Dixon, G.H. (1985) Biosci. Rep. 5, 383-391.
22 Kolk, A. and Samuel, T. (1975) Biochim. Biophys. Acta 393, 307-319.
23 McKay, D.J., Renaux, B.S. and Dixon, G.H. (1986) Eur. J. Biochem. 156, 5-8.
24 Ammer, H., Henschen, A. and Lee, C. (1986) Biol. Chem. (Hoppe-Seyler) 367, 515-522.
25 Hecht, N.B., Kleene, K.C., Yelick, P.C., Johnson, P.A., Pravtcheva, D.D. and Ruddle, F.H. (1986) Somatic Cell Mol. Genet. 12, 203-208.
26 Kleene, K.C., Distel, R.J. and Hecht, N.B. (1984) Dev. Biol. 105, 71-79.
27 Kleene, K.C., Distel, R.J. and Hecht, N.B. (1983) Dev. Biol. 98, 455-464.
28 Peschon, J.J., Behringer, R., Brinster, R.L. and Palmiter, R.D. (1987) Proc. Natl. Acad. Sci. USA 84, 5316-5319.
29 Feinberg, A.P. and Vogelstein, B. (1983) Anal. Biochem. 132, 6-12.
30 Southern, E. (1975) J. Mol. Biol. 98, 503-517.
31 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
32 Mizusawa, S., Nishimura, S. and Seela, F. (1986) Nucleic Acids Res. 14, 1319-1324.
33 Mount, D.W. and Conrad, B. (1984) Nucleic Acids Res. 12, 811-818; DNA Sequence Programs Vers. 1.6, Copyright © 1983, 1984.
34 Ley, T.J., Anagnou, N.P., Pepe, G. and Nienhuis, A.W. (1982) Proc. Natl. Acad. Sci. USA 79, 4775-4779.
35 Jankowski, J.M., States, J.C. and Dixon, G.H. (1986) J. Mol. Evol. 23, 1-10.
36 Breathnach, R. and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
37 Kozak, M. (1986) Cell 44, 283-292.
38 McKnight, S.L. and Kingsbury, R. (1982) Science 217, 316-324.
39 Kadonaga, J.T., Jones, K.A. and Tjian, R. (1986) Trends Biochem. Sci. 11, 20-23.
40 Weinberger, J., Baltimore, D. and Sharp, P.A. (1986) Nature 322, 846-848.
41 Lee, C., Mansouri, A., Hecht, W., Hecht, N.B. and Engel, W. (1987) Biol. Chem. (Hoppe-Seyler) 368, 131-135.
42 Krawetz, S.A., Connor, W. and Dixon, G.H. (1987) DNA 6, 47-57.
43 Brooks, D.E., Means, A.R., Wright, E.J., Singh, S.P. and Tiver, K.K. (1986) J. Biol. Chem. 261, 4956-4961.
44 Lai, Z. and Childs, G. (1986) Nucleic Acids Res. 14, 6845-6856.
45 Cole, K.D., Kandala, J.C. and Kistler, W.S. (1986) J. Biol. Chem. 261, 7178-7183.
46 Stewart, T.A., Hecht, N.B., Hollingshead, P.G., Johnson, P.A., Leong, J.C. and Pitts, S.L. (1988) Mol. Cell. Biol., in press.
47 Lee, C.-H., Hoyer-Fender, S. and Engel, W. (1987) Nucleic Acids Res. 15, 7639.
48 Tanhauser, S.M., Johnson, P.A., Yelick, P.C. and Hecht, N.B. (1983) J. Cell Biol. 105, 166A.