Nucleotide sequences of Caenorhabditis elegans core histone genes


J. Mol. Biol. (1989) 206, 567-577

Genes for Different Histone Classes Share Common Flanking Sequence Elements

Susan Boseman Roberts†, Scott W. Emmons and Geoffrey Childs‡

Departments of Genetics and Molecular Biology
Albert Einstein College of Medicine
Bronx, NY 10461, U.S.A.

(Received 23 May 1988, and in revised form 1 December 1988)

We have determined the nucleotide sequence of core histone genes and flanking regions from two of approximately 11 different genomic histone clusters of the nematode Caenorhabditis elegans. Four histone genes from one cluster (H3, H4, H2B, H2A) and two histone genes from another (H4 and H2A) were analyzed. The predicted amino acid sequences of the two H4 and H2A proteins from the two clusters are identical, whereas the nucleotide sequences of the genes have diverged 9% (H2A) and 12% (H4). Flanking sequences, which are mostly not similar, were compared to identify putative regulatory elements. A conserved sequence of 34 base-pairs is present 19 to 42 nucleotides 3’ of the termination codon of all the genes. Within the conserved sequence is a 16-base dyad sequence homologous to the one typically found at the 3’ end of histone genes from higher eukaryotes. The C. elegans core histone genes are organized as divergently transcribed pairs of H3-H4 and H2A-H2B and contain 5’ conserved sequence elements in the shared spacer regions. One of the sequence elements, 5’ CTCCNCCTNCCCACCNCANA 3’, is located immediately upstream from the canonical TATA homology of each gene. Another sequence element, 5’ CTGCGGGGACACATNT 3’, is present in the spacer of each heterotypic pair. These two 5’ conserved sequences are not present in the promoter region of histone genes from other organisms, where 5’ conserved sequences are usually different for each histone class. They are also not found in non-histone genes of C. elegans. These putative regulatory sequences of C. elegans core histone genes are similar to the regulatory elements of both higher and lower eukaryotes. The coding regions of the genes and the 3’ regulatory sequences are similar to those of higher eukaryotes, whereas the presence of common 5’ sequence elements upstream from genes of different histone classes is similar to histone promoter elements in yeast.

1. Introduction

Histone proteins are vital components of ordered chromatin structure, and the study of histone structure and function is important for understanding basic cellular processes (McGhee & Felsenfeld, 1980). Histones also provide an attractive model system for studying mechanisms of gene regulation. Histone transcripts are small and abundant, and are regulated transcriptionally and post-transcriptionally (for a review, see Wu et al., 1986); furthermore, the expression of various histone genes is regulated in both a temporal and a tissue-specific manner (Bird et al., 1985; Brown et al., 1985; Childs et al., 1979; Lieber et al., 1986). Regulation of the histone multi-gene family is complex and relevant to numerous unresolved biological problems. For example, it is unclear whether histone protein variants are functionally distinct and important for regulating other genes (Newrock et al., 1977). If they are not, why are multiple copies of each histone gene maintained? Also, the mechanisms that control the co-ordinate expression of the five histone classes and the signals that couple histone expression and DNA synthesis are not understood (Alterman et al., 1984; Coffino et al., 1984; Heintz et al., 1983; Hereford et al., 1981). Studies of the structure and sequence of histone genes and regulatory regions are necessary as the first step in addressing these problems.

The genome of the nematode Caenorhabditis elegans contains approximately 11 dispersed clusters of core histone genes. Many of the clusters have at least one copy of each core histone gene. The order of the genes with respect to one another within the clusters varies, and histone mRNAs are transcribed from both DNA strands (Roberts et al., 1987). Published studies (Certa & von Ehrenstein, 1981; VanFleteran & VanBeeumen, 1983; VanFleteran et al., 1986) have demonstrated that C. elegans chromatin contains multiple histone subtypes. Therefore, we have analyzed the nucleotide sequences of histone genes from two different clusters to determine whether the clusters encode different histone subtypes, and to identify conserved sequence elements that may be required for regulation of histone expression in C. elegans.

† Present address: Laboratory of Molecular Biology, Rockefeller University, New York, NY 10021, U.S.A.
‡ Author to whom correspondence should be addressed.

0022-2836/89/080567-11 $03.00/0 © 1989 Academic Press Limited

2. Materials and Methods

(a) Materials

Restriction enzymes and the Klenow fragment of DNA polymerase I were purchased from New England Biolabs and used according to NEB recommendations. Bacteriophage M13 primer and M13mp18 and M13mp19 cloning vectors were from Pharmacia P-L Biochemicals. Nuclease S1 was from Sigma. Radioisotopes were purchased from Amersham. Nitrocellulose was purchased from Schleicher and Schuell.

(b) Northern and Southern hybridizations

Nucleic acids were isolated from mixed populations of nematodes (N2). Total RNA was fractionated by gel electrophoresis on 2% (w/v) agarose/3% (v/v) formaldehyde gels in Mops buffer (0.01 M-Mops, pH 7.4, 0.001 M-EDTA) and transferred to nitrocellulose. Genomic DNA digested with HindIII was fractionated on 0.7% (w/v) agarose gels and transferred to nitrocellulose. Baked membranes were hybridized to an oligonucleotide, GC-7, or a C. elegans histone cluster, mCeh-2/18.3 (Roberts et al., 1984). Hybridization with mCeh-2/18.3, radiolabeled by primer extension, was carried out at 37°C in a solution of 50% formamide, 5 x Denhardt’s (1966) solution, 0.02 M-sodium phosphate (pH 6.8), 5 x SSC (0.75 M-NaCl, 0.075 M-sodium citrate, pH 7.1), 10% dextran sulfate, and 200 µg sheared denatured salmon sperm DNA/ml. Filters were washed in 0.5 x SSC, 0.1% SDS at 65°C for 2 h and autoradiographed.

An oligonucleotide GC-7 (100 ng) was radiolabeled at the 5’ end with 70 µCi of [γ-³²P]ATP (3000 Ci/mmol) using bacteriophage T4 polynucleotide kinase (Maniatis et al., 1982). Oligonucleotide hybridization was carried out in a solution of 7% SDS, 5 x Denhardt’s solution, 0.02 M-sodium phosphate (pH 6.8), and 200 µg sheared denatured salmon sperm DNA/ml at 52°C (Geliebter et al., 1986). Blots were washed in 1 x SSC, 0.1% SDS at room temperature.
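The wash and hybridization buffers above are all stated as multiples of SSC. As a minimal sketch, assuming only that SSC scales linearly from the 5 x composition given in the text (0.75 M-NaCl, 0.075 M-sodium citrate), the salt concentrations of the 0.5 x and 1 x washes can be worked out as follows. The function name `ssc_molarity` and the dictionary layout are illustrative choices, not part of the original methods.

```python
# Composition of 5 x SSC as stated in the hybridization protocol (mol/l).
SSC_5X = {"NaCl": 0.75, "sodium citrate": 0.075}


def ssc_molarity(strength, reference=SSC_5X, reference_strength=5.0):
    """Scale the reference SSC recipe linearly to another strength.

    `strength` is the multiple of SSC wanted (0.5 for 0.5 x SSC).
    Values are rounded to suppress floating-point noise.
    """
    factor = strength / reference_strength
    return {salt: round(conc * factor, 6) for salt, conc in reference.items()}


# Wash used after hybridization with the cloned cluster probe.
print("0.5 x SSC:", ssc_molarity(0.5))
# Wash used after hybridization with the oligonucleotide GC-7.
print("1 x SSC:  ", ssc_molarity(1.0))
```

The lower salt concentration and higher temperature of the 0.5 x SSC wash make it the more stringent of the two, which fits its use with the long cluster probe rather than the 20-nucleotide oligonucleotide.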

(c) Nuclease S1 analysis

The 5’ and 3’ ends of histone H2A and H4 transcripts were determined by nuclease S1 analysis as described (Knowles & Childs, 1984; Weaver & Weissman, 1979). Briefly, end-labeled DNA (10⁵ cts/min per assay) in 16 µl deionized formamide was heated to 50°C for 2 min and added to total C. elegans RNA (50 µg) in 4 µl of 10 x SSC (0.15 M-sodium citrate, 1.5 M-sodium chloride, pH 7.1). The mixture was incubated at 49°C for 18 h. The annealed mixture was diluted with 180 µl cold S1 buffer (0.25 M-sodium chloride, 0.03 M-sodium acetate, pH 4.6, 0.001 M-zinc chloride) and nuclease S1 (250 to 500 units) was added. The reaction mixture was incubated at 37°C for 1 h. The products were precipitated with ethanol and fractionated on 5% (w/v) acrylamide/7 M-urea gels (0.4 mm).

(d) Sequencing methods

The enzymatic procedure of Sanger et al. (1977) was used to determine all nucleotide sequences. Single-stranded templates were generated by subcloning fragments into M13mp18 or M13mp19 (Yanisch-Perron et al., 1985). Most of the fragments for sequencing were obtained from libraries generated by digesting large DNA fragments (agarose gel purified) with enzymes that cut frequently and cloning the products into appropriately digested M13 vector (Hong, 1982). In addition, deletion libraries (Dale et al., 1985; Henikoff, 1984) were used. Gels (0.4 mm) of 5% or 8% (w/v) polyacrylamide, 7 M-urea and either 40 cm or 80 cm long were used to resolve DNA fragments.

(e) Computer analysis

Programs available on Bionet were used for analyzing DNA sequences. In addition, computer programs based on the Korn/Queen algorithm as utilized by the Seq. program (Brutlag et al., 1982) and written for the VAX 780 computer at the Albert Einstein College of Medicine by David Steele, and modified by Lewis Berman, were used to search for homologies among sequences and to align overlapping M13 clones.

3. Results

(a) Nucleotide sequences of C. elegans histone genes

We have analyzed two histone clusters with different arrangements of histone genes. These clusters were chosen because their different gene order suggested that they might be more divergent and more likely to include different histone subtypes. Furthermore, if these two clusters are diverged, then essential regulatory sequences may be recognized as conserved sequences. The two clusters are named HIS1 and HIS3 (Roberts et al., 1987). HIS1, cloned on plasmid pCeh-1, contains four histone genes: his-1 (H4), his-2 (H3), his-3 (H2A) and his-4 (H2B). his-1 and his-3 were sequenced. HIS3 contains eight histone genes: his-9 (H3), his-10 (H4), his-11 (H2B), his-12 (H2A), his-13 (H3), his-14 (H4), his-15 (H2B) and his-16 (H2A). Of these, his-9, his-10, his-11 and his-12, all cloned on plasmid pCeh-3.2, were sequenced. The order and orientation of these genes, as well as the

[Figure 1: maps of the pCeh-3.2 insert (genes H3, H4, H2B, H2A) and the pCeh-1 insert (genes H4, H3, H2A, H2B), with arrows marking the sequenced regions; scale bar, 500 bp.]

Figure 1. Organization of clones and strategy for DNA sequencing. Nucleotide sequences of histone genes from cluster HIS3 (pCeh-3.2) and cluster HIS1 (pCeh-1) (Roberts et al., 1987) were determined by the method of Sanger et al. (1977). The 3400 base-pair (bp) EcoRI insert from pCeh-3.2 was digested with either HaeIII or HpaII, and fragments were subcloned in M13mp18. Random clones from each set were isolated. In addition, 2 different deletion libraries were constructed (Dale et al., 1985; Henikoff, 1984) to generate overlapping clones. A 3500 base-pair BglI-HindIII insert from pCeh-1 was digested with either HaeIII or HpaII, and fragments were subcloned in M13mp18. Random clones were isolated. Additional subclones were generated by digesting the HindIII insert from pCeh-1 with XhoI and subcloning the resulting fragments in M13mp18 and mp19. The nucleotide sequence of DNA from each subclone was determined, and data were organized using the Gel Computer Program of Bionet. The position and polarity of each histone gene on the clusters are indicated by thick black arrows. The direction of the thin arrows indicates the strand of the DNA sequence that was read. The nucleotide sequence of only the regions shown with thin arrows is reported.

regions sequenced, are shown in Figure 1. Throughout this paper, genes are referred to by histone class and cluster.

The nucleotide sequences of C. elegans core histone genes show that these genes encode proteins with amino acid sequences similar to core histone proteins from other organisms. Each gene has a single open reading frame and putative regulatory sequences, which are described below. There is no indication that any of them is a pseudogene. Direct evidence that these are functional genes is provided by the transcription studies described below.

Comparison of the predicted amino acid sequences of H2A and H4 from the two clusters (Figs 2 and 3) shows that the clusters encode the same H2A and H4 proteins. Nevertheless, the nucleotide sequences of the genes are divergent. The H2A genes have diverged 9% in nucleotide sequence, and the H4 genes have diverged 12% from one another. The H4 genes are approximately as divergent from one another as they are from the late-stage-specific H4 gene of sea urchin, Lytechinus pictus (15%) (Lieber et al., 1986).

VanFleteran et al. (1986) have determined the amino acid sequences of several C. elegans histones and demonstrated the presence of H2B variants. Both of the H2B variants reported by VanFleteran have lysine at positions 30 and 31, whereas the predicted amino acid sequence of the H2B gene studied here has, respectively, arginine and histidine at these two positions. This may indicate that the histone H2B gene on pCeh-3.2 encodes a minor variant, or that the gene is expressed in a restricted stage-specific or tissue-specific pattern.

The inferred amino acid sequence of each C. elegans histone protein was compared with histone protein sequences from other organisms (Protein Sequence Database of the National Biomedical Research Foundation). The amino-terminal region and the last five amino acids of the carboxy terminus of H2A (approximately 15% of the protein) are divergent from those of other organisms, whereas the remaining segment of the C. elegans H2A protein is conserved with only a few conservative amino acid substitutions. This pattern of conservation and divergence is similar to the pattern observed for histone H2A from other multicellular eukaryotes. A similar comparison of histone H4 sequences shows that C. elegans H4 is identical to sea urchin H4 and differs by one amino acid (Cys to Thr at position 74) from the vertebrate H4 sequence. Comparison of histone H3 with H3 from organisms as diverse as yeast, Drosophila, sea urchins and mice reveals conservative amino acid substitutions scattered throughout the C. elegans H3 sequence. Histone H2B protein sequences are divergent in the amino-terminal region, approximately one-third of the molecule. Conservative amino acid substitutions are found in the remaining two-thirds of the molecule. Overall, C. elegans histone proteins are typical, diverging only within regions previously known to be variable.

(b) Expression of selected histone genes

Untranslated regions 5’ and 3’ of the coding sequences of the H2A and H4 genes from cluster HIS1 are protected from nuclease S1 by C. elegans RNA. Mobilities of the protected fragments are shown in Figure 4. The presence of protected fragments extending beyond the coding regions indicates that these genes, or other genes identical to them, are expressed. In studies of histone transcripts in other organisms, the extensive homologies in the coding sequences of members of the histone multigene family yield nuclease S1-protected fragments corresponding to the length of the coding segment in addition to full-length

[Figure 2: nucleotide sequences and predicted amino acid sequences of the H2A and H2B genes and their flanking DNA; the sequence listing is not reproduced.]

Figure 2. Nucleotide sequence comparison of C. elegans H2A genes and the nucleotide sequence of a C. elegans H2B gene. The nucleotide sequences of the H2A (his-12) and H2B (his-11) genes and surrounding DNA from pCeh-3.2 and the H2A (his-3) gene from pCeh-1 are shown. The sequence of the H2A gene and the H2B gene from pCeh-3.2 is given on the top line. Base differences in the sequence of the H2A gene from pCeh-1 are printed in lower case letters beneath the sequence. Broken lines indicate homology between the 2 sequences, and the spaces in the flanking DNA represent deletions or insertions that were postulated in order to make the best alignment of the flanking sequences. Capital letters in the nucleotide sequences indicate conserved sequence elements that are discussed in the text. The coding strands of the H2A genes are shown; the anti-coding strand of H2B is shown. The arrows indicate directions of transcription. The 5’ and 3’ ends of H2A mRNAs were mapped by nuclease S1 analysis using the H2A gene from pCeh-1 as a probe: asterisks mark the DNA positions that correspond to the ends of the H2A mRNA.
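The kind of comparison summarized in Figure 2, where two genes differ in nucleotide sequence yet encode the same protein, can be illustrated with a short calculation. The following Python sketch counts mismatches between two aligned coding sequences and translates both with the standard genetic code. The two 18-nucleotide sequences are invented for illustration and are not taken from pCeh-1 or pCeh-3.2; the function names are likewise assumptions rather than the programs used in the study.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_sequence):
    """Translate a coding-strand DNA sequence whose length is a multiple of 3."""
    codons = (coding_sequence[i:i + 3] for i in range(0, len(coding_sequence), 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def percent_divergence(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Hypothetical toy sequences (NOT from the paper): four third-position changes.
toy_a = "ATGTCCGGACGTGGAAAA"
toy_b = "ATGTCTGGTCGCGGAAAG"

print(translate(toy_a), translate(toy_b))
print(f"{percent_divergence(toy_a, toy_b):.1f}% nucleotide divergence")
```

In this toy case every difference falls at a synonymous third codon position, so the two sequences diverge by about 22% while the predicted peptides are identical. The same logic underlies the observation that the H2A and H4 genes of the two clusters have diverged by 9% and 12% without any change in the encoded proteins.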

[Figure 3: nucleotide sequences and predicted amino acid sequences of the H4 and H3 genes and their flanking DNA; the sequence listing is not reproduced.]

Figure 3. Nucleotide sequence comparison of C. elegans H4 genes and the nucleotide sequence of a C. elegans H3 gene. The nucleotide sequences of the H4 (his-10) and H3 (his-9) genes and flanking regions from pCeh-3.2 and the H4 gene (his-1) from pCeh-1 are shown. The sequence of the H4 and H3 genes from pCeh-3.2 is given on the top line. Base differences in the H4 gene from pCeh-1 are printed in lower case letters beneath the sequence. Symbols used are described in the legend to Fig. 2. The coding strands of the H4 genes are shown; the anti-coding strand of H3 is given.

[Figure 4: autoradiograms of nuclease S1 protection assays for H4-1 and H2A-1, panels (a) to (d), with schematic diagrams of the probes.]

Figure 4. Identification of the 5’ and 3’ ends of histone H2A and H4 transcripts by S1 nuclease analysis. Samples (50 µg) of total C. elegans RNA from a mixed population of worms at various stages of development (egg to adult) were hybridized to 5’ end-labeled DNA probes ((a) and (c)) or 3’ end-labeled probes ((b) and (d)). Probes are from the H4 (his-1) and H2A (his-3) genes of cluster HIS1. RNA:DNA hybrids and non-hybridized probe were digested with nuclease S1, and the protected DNA fragments were fractionated on 5% (w/v) polyacrylamide/7 M-urea gels. Markers (M) are end-labeled pBR322 digested with HaeIII and φX174 digested with HaeIII; markers are the same for (a) and (c), and for (b) and (d). Probes are illustrated schematically below the autoradiograms. The letters (a, b, c and d) on the left indicate the probe used for each nuclease S1 analysis. The number shown above each arrow representing a 5’ probe (a and c) indicates the distance between the putative cap site consensus sequence and the radiolabeled end of the probe. The number shown above each arrow representing a 3’ probe (b and d) indicates the distance from the site in the conserved 3’ sequence where the histone mRNA is expected to terminate to the radiolabeled end of the probe. The number below each arrow (a, b, c, d) indicates the size of the fragment between the radiolabeled end and the end of the coding region. The size of each probe is indicated in each autoradiogram (p). Fragments corresponding in size to fully protected genes are marked (*) and fragments corresponding in size to protected gene coding regions are indicated (→). Fragments smaller than the size of the protected gene coding regions result from partial protection of histone H4 and H2A mRNAs that are not identical to the probe. nt, nucleotides.

fragments corresponding to the 5’ or 3’ ends of the individual family member used as probe (Knowles & Childs, 1984; Sittman et al., 1983).
A comparison of the ratio of full-length protected probe (*) with probe protected as far as the initiation codon (→) suggests that the H4-1 gene (his-1) encodes the major class of H4 transcripts in C. elegans (Fig. 4(a)), and that the H2A-1 gene (his-3) encodes H2A transcripts roughly representative of the gene’s copy number (Fig. 4(c)). In addition, comparison of the protected fragments in Figure 4(c) and (d) indicates that the 3’ untranslated regions of H2A genes are more similar than the corresponding 5’ leader sequences. The H2A fragments that are smaller than the full-length coding region represent mRNAs from divergent H2A genes.

The sequence 5’ CAACA 3’ is adjacent to the translational start site of both H4 genes (see below). Nuclease S1 analysis of histone H4 transcripts using H4-1 as a probe (Fig. 4(a)) shows that the shortest fragment protected by this probe (266 nucleotides) extends 5 base-pairs upstream from the translational start site (expected fragment of 261 nucleotides; see Fig. 4 legend). This result suggests that most H4 genes that are expressed have the 5’ CAACA 3’ sequence. Variations of the sequence are also adjacent to the translational start sites of the other core histone genes (see below). The transcriptional start sites of the H4 and H2A genes, determined from the data shown in Figure 4(a) and (c), fall within the sequence 5’ TTCTTCA 3’, which is present 23 to 33 base-pairs upstream from the translational start sites of all the genes. This sequence is similar to the conserved sequence 5’ PyCATTCPu 3’ found at the cap site of sea urchin histone genes (Hentschel & Birnstiel, 1981; Sures et al., 1980).

(c) Comparison of flanking DNA sequences: 5’ and spacer regions

The DNA flanking each gene was examined to determine if there are common 5’ sequences that may be important for histone gene expression in C. elegans. Discrete regions of sequence similarity and sequence divergence are present in both the 5’ and the 3’ flanking regions (Figs 2, 3 and 5). Overall, the leader sequences between the transcription and translation initiation sites of the two H4 genes are approximately 92% conserved; the leader sequences of the two H2A genes are approximately 65% conserved. The 5’ regions of sequence similarity are aligned in Figure 5, and a consensus sequence of each conserved element is shown.
The DNA regulatory element 5’ TATA 3’ is located 17 to 21 base-pairs upstream from the putative cap site consensus in all of the genes that were sequenced. Approximately 4 base-pairs upstream from the TATA homology (16 base-pairs in H4-3 (his-10)) is a sequence, 5’ CTCCNCCTNCCCACCNCANA 3’, that is present in all of the histone genes that were analyzed (Fig. 5). Hybridization of an oligonucleotide, 5’ GATCTCCGCCTACCCACCGCAGA 3’, containing this conserved sequence to two other cloned C. elegans histone clusters, HIS2 and HIS4 (Roberts et al., 1987), indicates that one of these clusters also contains the conserved sequence and the other one may not, suggesting that not all C. elegans histone genes have this 5’ element (data not shown). This sequence is not present in the promoter region of histone genes from other organisms (Perry et al., 1985; Wells, 1986). A second conserved sequence element, 5’ CTGCGGGGACACATNT 3’, is also present in each intergenic spacer region examined. Each histone gene described here is part of a heterotypic pair transcribed from a common spacer. The spacer between the H2A and H2B genes on pCeh-3.2 (his-11 and his-12) is 226 base-pairs; the spacer between the H4 and H3 genes (his-9 and his-10) is 272 base-pairs.

(d) Comparisons of flanking DNA sequences: 3’ regions

Nucleotide sequence comparisons of the DNA flanking the 3’ end of each histone gene reveal a perfectly conserved 34 base-pair sequence (5’ ACCGAACCCAACGGCCCTCTTTAGGGCCACAAAT 3’), as shown in Figure 6. The conserved sequence is located 19 to 42 nucleotides downstream from the stop codon of each histone gene. Sequences between the end of the coding regions and the beginning of the conserved sequence are not similar. Within the conserved sequence there is a perfect copy of the dyad sequence that is present in the histone genes of other multicellular organisms (Hentschel & Birnstiel, 1981). The conserved dyad has been shown by Birnstiel and co-workers (Birchmeier et al., 1982, 1983, 1984; Strub & Birnstiel, 1986) to be required for efficient processing of the 3’ ends of sea urchin histone transcripts injected into Xenopus oocytes. Indeed, nuclease S1 analysis of C. elegans H2A and H4 transcripts (Fig. 4) shows that the 3’ ends of these histone mRNAs fall within the conserved sequence (Fig. 6). The C. elegans sequence differs from the 3’ conserved sequences of other organisms only in that it extends over a longer region.

Other organisms that have the conserved dyad at the 3’ end of core histone genes also contain one or more genes encoding histone H1 that share the same sequence (Wells, 1986). If this were the case in C. elegans, then we could use this sequence as a means of isolating the H1 gene(s) of this organism. To determine whether C. elegans H1 genes contain the 3’ conserved sequence, and to determine whether all of the C. elegans core histone clusters contain genes that have the 3’ conserved sequence, a synthetic oligonucleotide with the sequence 5’ AGAGGGCCGTTGGGTTCGGT 3’ (GC-7) was used. GC-7 is identical to the anticoding strand of the C. elegans conserved sequence (Fig. 6). A Southern blot of genomic DNA (N2) digested with HindIII was hybridized to the radiolabeled oligonucleotide GC-7 and compared with a duplicate blot probed with a cloned C. elegans histone cluster, Ceh-2 (HIS2) (Roberts et al., 1987). Both probes hybridize to a similar set of DNA fragments (Fig. 7). This analysis suggests that all of the C. elegans core histone gene clusters contain genes that have the 3’ conserved sequence; and in addition, there are several fragments that are not detected by this probe. Assuming that all genes have the conserved sequence, the number of histone genes on each of the genomic restriction fragments can be estimated by comparing the relative intensities of hybridization with GC-7. For example, the largest fragment is a cluster duplication

[Figure 5: (a) alignment of the 5’ flanking regions of H2B-3.2, H2A-3.2, H2A-1, H3-3.2, H4-3.2 and H4-1, showing the conserved element (consensus CTCCNCCTNCCCACCNCANA), the TATA homology, the cap site sequence (consensus TTCTTCA) and the sequence adjacent to the ATG (consensus CAACA); (b) alignment of the spacer element of each divergently transcribed gene pair (consensus CTGCGGGGACACATNT).]

Figure 5. Sequences conserved in the 5’ flanking DNA of C. elegans core histone genes. (a) The nucleotide sequences of similar regions upstream from core histone genes are aligned. The number of base-pairs between each conserved region is indicated. The A residue of the first methionine codon of each gene is designated +1, and the total number of nucleotides within each region is shown as -n. The asterisks above the sequences of H2A-1 and H4-1 mark the transcriptional start sites mapped by nuclease S1 analysis. The arrow over the cap site consensus sequence indicates the direction of transcription. (b) The nucleotide sequence conserved in the spacer region of each divergently transcribed histone gene pair is shown. The number of base-pairs between the A residue of the first methionine codon of each histone gene and the conserved element is indicated; ellipses indicate that the distance has not been determined.
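The 5’ consensus elements are written with the ambiguity symbol N, which stands for any of the four bases. As a hedged illustration of how such a consensus can be located in a sequence, the Python sketch below scans the oligonucleotide quoted in the text (5’ GATCTCCGCCTACCCACCGCAGA 3’) for the consensus 5’ CTCCNCCTNCCCACCNCANA 3’. The function name `find_consensus` and the reduced symbol table are assumptions made for this example; this is not the homology-search software used in the study.

```python
# Subset of nucleotide ambiguity symbols needed for the consensus in the text.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# Both sequences are quoted in the text; only the code around them is new.
FIVE_PRIME_ELEMENT = "CTCCNCCTNCCCACCNCANA"
OLIGONUCLEOTIDE = "GATCTCCGCCTACCCACCGCAGA"


def find_consensus(consensus, sequence):
    """Return 0-based start positions where `sequence` matches `consensus`.

    Each consensus symbol must permit the base found at that position;
    N permits any base.
    """
    width = len(consensus)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(base in SYMBOLS[symbol] for symbol, base in zip(consensus, window)):
            hits.append(start)
    return hits


hits = find_consensus(FIVE_PRIME_ELEMENT, OLIGONUCLEOTIDE)
print(hits)  # start positions of the element within the oligonucleotide
```

The single match begins after the first three nucleotides (GAT) of the oligonucleotide, so the probe carries one full copy of the 20-nucleotide element.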

(pCeh-3; Roberts et al., 1987), which contains eight histone genes. Hybridization to this fragment is about twice as intense as to the third largest fragment, which contains one cluster of four genes. The estimated number of core histone genes (70), based on the analysis with GC-7, is higher than the number (45) estimated from hybridization experiments with gene-specific histone probes (Roberts et al., 1984), suggesting the existence of C. elegans histone genes with conserved 3’ ends but coding sequences divergent from those that have been

isolated. An intriguing alternative is that genes other than histones contain the 3’ conserved sequence. Evidence that the additional sequences detected by the oligonucleotide are not H1 genes was obtained by analyzing transcripts homologous to GC-7. The autoradiogram shown in Figure 8 demonstrates that GC-7 hybridizes only to histone mRNAs of the sizes expected for the core histones. The pattern of hybridization and the sizes of individual core histone mRNAs are well characterized (Roberts et al., 1987). The transcripts detected by GC-7 are not large enough to encode proteins the size of C. elegans H1 (VanFleteran & VanBeeumen, 1983). Furthermore, an H1 gene that encodes a 1.0 × 10³ base polyadenylated transcript has been cloned and sequenced (M. Sanicola, G. Childs & S. Emmons, unpublished results). This gene does not have the 3’ conserved sequence in its flanking DNA. If C. elegans has an H1 gene that contains the 3’ conserved sequence, it may be expressed in a stage-specific or tissue-specific manner that could not be detected in mRNA from the population of worms that was analyzed.

[Figure 6, sequence alignment: the C. elegans 3’ conserved sequence, with the region complementary to GC-7 marked, above the dyad consensus of sea urchin, Drosophila, Xenopus, chicken, mouse and human histone genes.]

Figure 6. Sequences conserved in the 3’ flanking DNA of C. elegans core histone genes. The top line is the nucleotide sequence that is conserved at the 3’ end of all C. elegans core histone genes that were analyzed. The conserved dyad that is present in the histone genes of many other organisms is shown below the C. elegans sequence. The arrows show the positions of dyads. The asterisks indicate the 3’ ends of histone transcripts mapped by nuclease S1 analysis. Numbers represent the nucleotides present between the stop codon of each histone gene and the 5’ end of the conserved sequences. The broken line indicates the sequence that is complementary to the oligonucleotide GC-7.

[Figures 7 and 8, autoradiograms. Lane labels: HIS2, GC-7. Band labels: H3, H2A/H2B, H4.]

Figure 7. Genomic histone clusters contain genes that have the 3’ conserved sequence. Genomic (N2) DNA was digested with HindIII and fractionated on 0.6% (w/v) agarose gels (4 µg/lane). Individual lanes from Southern blots of this DNA were hybridized to either a C. elegans histone cluster HIS2, containing one copy of each of the 4 histone genes (mCeh-2/l&3; Roberts et al., 1987), or to an oligonucleotide, GC-7. kb, 10³ bases.

Figure 8. C. elegans histone transcripts that contain the 3’ conserved sequence. C. elegans (N2) total RNA was fractionated on a 2% (w/v) agarose/3% (v/v) formaldehyde gel (15 µg/lane). A Northern blot of the RNA was hybridized to the oligonucleotide GC-7 as described in Materials and Methods.
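The gene-number estimate from the GC-7 Southern blot rests on simple proportionality: the signal on each restriction fragment is scaled to a fragment of known gene content. The following is a minimal sketch of that arithmetic, assuming the signal is linear in the number of 3’ conserved sequences on a fragment. The fragment names and intensity values are hypothetical; only the four-gene reference cluster and the roughly twofold signal of the largest fragment come from the text.

```python
def estimate_gene_numbers(intensities, reference_fragment, reference_genes):
    """Scale each fragment's signal to a reference fragment of known gene content."""
    signal_per_gene = intensities[reference_fragment] / reference_genes
    return {
        fragment: round(signal / signal_per_gene)
        for fragment, signal in intensities.items()
    }


# Hypothetical densitometry readings in arbitrary units (not measured data).
intensities = {
    "largest": 2.0,        # about twice the reference, as stated in the text
    "third_largest": 1.0,  # reference fragment carrying one cluster of four genes
    "small": 0.5,          # invented value for illustration
}

estimates = estimate_gene_numbers(intensities, "third_largest", 4)
total_genes = sum(estimates.values())
print(estimates, total_genes)
```

With these invented readings the largest fragment is assigned eight genes, matching the cluster duplication described in the text. Summing the per-fragment estimates over a whole lane is how a genome-wide figure would be reached, although any gene lacking the 3’ conserved sequence would be missed by this approach.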

4. Discussion

Characterization of the core histone genes of C. elegans by nucleic acid sequence analysis shows that the histone gene family of C. elegans has features in common with, and features that are different from, histone genes of other multicellular organisms. The 34 base-pair sequence at the 3’ end of all of the core histone genes is analogous to the conserved sequence present in the histone genes of other multicellular organisms (Hentschel & Birnstiel, 1981; Wells, 1986) and may be necessary for efficient processing of the 3’ ends of C. elegans histone transcripts. By analogy with other systems, we predict that C. elegans also has a U7 small nuclear RNA that is required for processing core histone precursor mRNAs within the conserved 3’ sequence (Mowry & Steitz, 1987; Schaufele et al., 1986).

A novel conserved sequence of 20 nucleotides is located immediately upstream from the TATA homology in the 5’ region of each of the six core histone genes that were analyzed. Another novel conserved sequence is located in the middle of each of the four spacer regions examined. Other conserved sequence elements upstream from each class of histone gene are characteristic of general RNA polymerase II promoter elements such as the TATA box and the CAP site (Breathnach & Chambon, 1981); however, the novel conserved sequences are not observed in the promoter regions of other C. elegans RNA polymerase II transcription units such as vitellogenin (Spieth et al., 1985), actin (Files et al., 1983) or the major sperm protein genes (Klass et al., 1984), nor are they present in the promoter region of histone genes from other organisms (Perry et al., 1985; Wells, 1986).

Osley et al. (1986) have recently reported the characterization of sequence elements in the spacer region of divergently transcribed yeast histone H2A-H2B genes that convey cell cycle regulation of transcription to associated transcription units. Two elements were identified. One functions as a cell cycle-dependent transcriptional activator, and another acts as a negative regulator of transcription. The cell cycle-dependent transcriptional activator element is found upstream from other yeast histone genes as well as in another cell cycle-regulated gene, HO. Moreover, different conserved sequence elements (upstream activator sequences) are present in other co-ordinately regulated yeast gene systems. Well-characterized yeast systems include general amino acid control and galactose, where both the upstream sequence elements and the activating proteins have been genetically and biochemically characterized (Brent & Ptashne, 1985; Donahue et al., 1983; Hope & Struhl, 1986). The conserved sequences in the promoter regions of C. elegans histone genes may be functionally analogous to yeast UAS elements.

In more complex eukaryotes, some genes that are co-ordinately regulated, such as the heat shock genes of Drosophila (Parker & Topol, 1984; Wu, 1984), share conserved regulatory sequences; however, genes that are not co-ordinately regulated also share sequence elements. For example, conserved sequences that bind the mammalian transcription factor Sp1 are found in numerous gene promoters that have no apparent requirement for shared regulation (Dynan & Tjian, 1985).

On the other hand, sequence elements shared by different classes of core histone genes, which are co-ordinately regulated, have not been found previously for the histone genes of multicellular eukaryotes. The multicellular eukaryotes that have been analyzed have histone genes that contain conserved sequence elements as part of their functional promoters; however, the elements are class-specific. For example, histone H2A genes have conserved promoter elements different from H2B, H3, H4 and H1 conserved promoter elements (Hentschel & Birnstiel, 1981; Perry et al., 1985). In addition to these class-specific sequences, there are essential promoter elements that correspond to sequences recognized by general transcription factors, such as Sp1, CAAT transcription factor (CTF) and octamer transcription factor (OTF) (Perry et al., 1985). Thus, higher eukaryotes may co-ordinate the ontological and cell-cycle control of five classes of histone genes by using both class-specific and generalized transcription factors.

Unlike histone genes of human, mouse, chicken, Xenopus and sea urchin, C. elegans histone genes of all classes share conserved upstream sequence elements. The less-complex arrangement of conserved sequence elements in C. elegans resembles yeast histone gene promoters more than those of higher eukaryotes (Choe et al., 1985; Matsumoto & Yanagida, 1985; Osley et al., 1986; Perry et al., 1985; Wells, 1986). Whereas the putative regulatory sequences 5’ of the genes resemble yeast UAS (upstream activator sequences) elements that co-ordinate expression of related gene families, the body of the histone genes and conserved 3’ sequence elements are typical of more complex eukaryotes. This arrangement suggests that C. elegans has an interesting position in the evolutionary history of histone gene organization and regulation. Further examination of other C. elegans genes and of histone genes of species evolutionarily related to nematodes may reveal the origins of the potentially different

modes of histone gene regulation. In further studies using modified transgenes and the analysis of DNA binding proteins, we would like to determine whether the conserved sequence elements shared by the different classes of C. elegans histone genes are important for regulation of the genes.

We thank Marie Lancellotti for careful preparation of this manuscript. This work was supported by grant GM11301 from the NIH. G. C. is the recipient of an Irma T. Hirschl Career Scientist Award, and S. E. is the recipient of The Harry Winston Established Fellowship for Research of the New York Heart Association.

References

Alterman, R. M., Ganguly, S., Schulze, D. H., Marzluff, W. F., Schildkraut, C. L. & Skoultchi, A. I. (1984). Mol. Cell. Biol. 4, 123-132.
Birchmeier, C., Grosschedl, R. & Birnstiel, M. L. (1982). Cell, 28, 739-745.
Birchmeier, C., Folk, W. & Birnstiel, M. L. (1983). Cell, 35, 433-440.
Birchmeier, C., Schumperli, D., Sconzo, G. & Birnstiel, M. L. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 1057-1061.
Bird, R. C., Jacobs, F. A., Stein, G., Stein, J. & Sells, B. H. (1985). Proc. Nat. Acad. Sci., U.S.A. 82, 6760-6764.
Breathnach, R. & Chambon, P. (1981). In Annu. Rev. Biochem. (Snell, E. E., Boyer, P. D., Meister, A. & Richardson, C. C., eds), vol. 50, pp. 344-383, Annual Reviews Inc., Palo Alto, California.
Brent, R. & Ptashne, M. (1985). Cell, 43, 729-736.
Brown, D. T., Wellman, S. E. & Sittman, D. B. (1985). Mol. Cell. Biol. 5, 2879-2886.
Brutlag, D. L., Clayton, J., Friedland, P. & Kedes, L. H. (1982). Nucl. Acids Res. 10, 279-293.
Certa, U. & von Ehrenstein, G. (1981). Anal. Biochem. 118, 147-154.
Childs, G. C., Maxson, R. & Kedes, L. H. (1979). Dev. Biol. 73, 153-173.
Choe, J., Shuster, T. & Grunstein, M. (1985). Mol. Cell. Biol. 5, 3261-3269.
Coffino, P., Stimac, E., Groppi, V. E. & Bieber, D. (1984). Histone Genes: Structure, Organization and Regulation (Stein, G. S., Stein, J. L. & Marzluff, W. F., eds), chapter 13, John Wiley and Sons Inc., New York.
Dale, R. M. K., McClure, B. A. & Houchins, J. P. (1985). Plasmid, 13, 31-40.
Denhardt, D. T. (1966). Biochem. Biophys. Res. Commun. 23, 641-646.
Donahue, T. F., Daves, R. S., Lucchini, G. & Fink, G. R. (1983). Cell, 32, 89-98.
Dynan, W. S. & Tjian, R. (1985). Nature (London), 316, 774-778.
Files, J. G., Carr, S. & Hirsch, D. (1983). J. Mol. Biol. 164, 355-375.
Geliebter, J., Zeff, R. A., Schulze, D. H., Pease, L. R., Weiss, E. H., Mellor, A. L., Flavell, R. A. & Nathensen, S. G. (1986). Mol. Cell. Biol. 6, 645-650.
Heintz, N., Sive, H. L. & Roeder, R. G. (1983). Mol. Cell. Biol. 3, 539-550.
Henikoff, S. (1984). Gene, 28, 351-359.
Hentschel, C. C. & Birnstiel, M. L. (1981). Cell, 25, 301-313.
Hereford, L. M., Osley, M. A., Ludwig, J. R. II & McLaughlin, C. S. (1981). Cell, 24, 367-375.
Hong, G. F. (1982). J. Mol. Biol. 158, 539-549.
Hope, I. A. & Struhl, K. (1986). Cell, 46, 885-894.
Klass, M. R., Kinsley, S. & Lopez, L. C. (1984). Mol. Cell. Biol. 4, 529-537.
Knowles, J. A. & Childs, G. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 2411-2415.
Lieber, T., Weisser, K. & Childs, G. (1986). Mol. Cell. Biol. 6, 2602-2612.
Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982). Molecular Cloning. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Matsumoto, S. & Yanagida, M. (1985). EMBO J. 4, 3531-3538.
McGhee, [...]
[...] 74, 5463-5467.
Schaufele, F., Gilmartin, G. M., Bannwarth, W. & Birnstiel, M. L. (1986). Nature (London), 323, 777-781.
Sittman, D. B., Graves, R. A. & Marzluff, W. F. (1983). Nucl. Acids Res. 11, 6679-6697.
Spieth, J., Denison, K., Kirtland, S., Cane, J. & Blumenthal, T. (1985). Nucl. Acids Res. 13, 5283-5295.
Strub, K. & Birnstiel, M. L. (1986). EMBO J. 5, 1675-1682.
Sures, I., Lowry, J. & Kedes, L. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 1265-1269.
VanFleteran, J. R. & VanBeeumen, J. J. (1983). Comp. Biochem. Physiol. 76B, 179-184.
VanFleteran, J. R., VanBun, S. M., Delcambe, L. L. & VanBeeumen, J. J. (1986). Biochem. J. 235, 769-773.
Weaver, R. F. & Weissmann, C. (1979). Nucl. Acids Res. 7, 1175-1193.
Wells, D. E. (1986). Nucl. Acids Res. 14, r119-r149.
Wu, C. (1984). Nature (London), 311, 81-84.
Wu, R. S., Panusz, H. T., Hatch, C. L. & Bonner, W. M. (1986). Crit. Rev. Biochem. 20, 201-263.
Yanisch-Perron, C., Vieira, J. & Messing, J. (1985). Gene, 33, 103-119.

Edited by S. Brenner