Characterization of the opposite-strand genes from the mouse bidirectionally transcribed HTF9 locus

Gene, 103 (1991) 201-209
© 1991 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/91/$03.50

GENE 05009

(CpG island; bidirectional transcription; housekeeping expression; nucleotide sequence; gene molecular organization; interspecific backcross mapping)

Alessandro Bressan a, Maria Patrizia Somma a, Joe Lewis b, Carlo Santolamazza a, Neal G. Copeland c, Debra J. Gilbert c, Nancy A. Jenkins c and Patrizia Lavia a

a Centro di Genetica Evoluzionistica del CNR, c/o Dipartimento di Genetica e Biologia Molecolare, Università 'La Sapienza', Rome 00185 (Italy) Tel. (39-6)445 6205; b Institute of Molecular Pathology, Vienna (Austria) Tel. (43-222)792636; and c Mammalian Genetics Laboratory, ABL-Basic Research Program, NCI-Frederick Cancer Research and Development Center, Frederick, MD 21702 (U.S.A.) Tel. (301)846-1260

Received by J.-P. Lecocq: 6 December 1990
Revised: 28 January 1991
Accepted: 14 February 1991

SUMMARY

The mouse HTF9 locus contains two genes that are bidirectionally transcribed with opposite polarity from a shared CpG-rich island. Both genes were previously shown to be expressed in a housekeeping fashion in mouse. We have now determined the molecular organization of the genes over 12 kb surrounding the island. In addition, we show that the HTF9 locus resides in the proximal region of mouse chromosome 16. We have sequenced the cDNAs corresponding to both divergent transcripts. Both genes appear to code for novel proteins that are structurally unrelated to each other. Finally, we show that both genes are highly conserved and efficiently expressed in human cells.

Correspondence to: Dr. P. Lavia, Centro di Genetica Evoluzionistica del CNR, c/o Dipartimento di Genetica e Biologia Molecolare, Università 'La Sapienza', Rome 00185 (Italy).

Abbreviations: aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; cM, centiMorgan(s); DHFR, dihydrofolate reductase-encoding gene; Gap43, growth-accelerating protein 43-encoding gene; HMG1, high-mobility group protein 1; HTF9, HpaII Tiny Fragment 9; kb, kilobase or 1000 bp; HPRT, hypoxanthine phosphoribosyltransferase-encoding gene; Igl-1, immunoglobulin λ chain-1-encoding gene; nt, nucleotide(s); ORF, open reading frame; PFGE, pulsed field gel electrophoresis; PGK, 3-phosphoglycerate kinase-encoding gene; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; poly(A)+ RNA, polyadenylated RNA; Prm-1, protamine-1-encoding gene; RFLP, restriction-fragment length polymorphism; SDS, sodium dodecyl sulfate; Smst, somatostatin-encoding gene; tsp, transcription start point(s).

INTRODUCTION

In higher eukaryotes only a minor proportion of the genome is accounted for by coding genes, whose estimated number ranges between 10000 and 50000, while the majority of eukaryotic sequences are noncoding. One shortcut to identifying novel genes from highly complex genomes is provided by the observation that many genes are associated with CpG-islands (Bird, 1987; Gardiner-Garden and Frommer, 1987). These sequences, being unmethylated in vivo, do not undergo the CpG suppression typical of 5-methylcytosine-containing DNA and are therefore discriminated by a variety of methyl-sensitive restriction enzymes recognizing C + G-rich sequences (Brown and Bird, 1986). The possibility of identifying CpG-islands has led to cloning of several potential genes (see for example Estivill et al., 1987; Rappold et al., 1987; Toniolo et al., 1988; Sargent et al., 1989).

The mouse sequence HTF9 was isolated during the characterization of the CpG-rich genomic fraction (Bird et al., 1985) and was shown to have typical 'island DNA' features: it is extensively unmethylated in vivo (Bird et al., 1985), remains resistant to in vitro methylation (Carotti et al., 1989) and has an accessible chromatin organization in nuclei (Antequera et al., 1989). Transcription studies showed that two genes that are arranged head-to-head are transcribed with opposite polarity from complementary DNA strands of HTF9 and are expressed in a variety of tissues (Lavia et al., 1987). No stringent regulatory signals such as the TATA box are found in the region of divergent initiation and consistently both genes are initiated at multiple sites on opposite DNA strands.

In mammals, bidirectional initiation occurs preferentially at CpG-rich promoters and may reflect features that are recurrent in this class of promoters, such as the lack of a TATA box and the frequent occurrence of target sites for factors that can activate transcription in both orientations, such as Sp1. Certain bidirectional loci include pairs of genes that are related in structure or in function. For example, related products are synthesized from the divergent genes encoding the α1(IV) and α2(IV) collagen chains (Burbelo et al., 1988). On the other hand, the divergent DHFR and rep-3 (formerly rep-1) genes, which originate from the same CpG-island, do not share any obvious similarity (Linton et al., 1989). Thus, we do not know whether bidirectional transcription is always associated with coordinate expression of related gene products.

The aim of the present work was the characterization of the mouse HTF9 locus, which included determining the

molecular organization of the genes transcribed from HTF9, i.e., HTF9-A leftwards and HTF9-C rightwards, as well as their chromosome location. We also wished to establish whether the divergent gene products were related to each other. Finally, we have sought to assess the expression from HTF9 in the human genome.

Fig. 1. Southern analysis and coding regions of the mouse HTF9 genes. cDNA probes were purified from 1.2% agarose or 6% polyacrylamide gels, eluted following standard procedures and labeled using the random-primer technique (Feinberg and Vogelstein, 1984). Digested genomic DNA was separated on 0.8% agarose gels and blotted on Hybond-N membranes. Hybridizations were carried out with standard methods. Phage λ DNA HindIII fragment sizes are indicated on the left margin in bp. (Panel A) Mouse liver DNA digested with HindIII (lane 1) and EcoRI (lane 2) and probed with the cDNA clone 19 corresponding to the HTF9-A gene. (Panel B) Mouse liver DNA digested with HindIII (lane 3) and EcoRI (lanes 4, 5 and 6) and probed with the entire HTF9-C cDNA clone (lane 4), with the 3' portion of the HTF9-C cDNA (lane 5) or with the genomic subclone pL9.5, mapping to the right of HTF9 (lane 6): both probes hybridize to the same genomic fragment. (C) Summary of the mapping data. Exon boundaries were approximately localized by hybridizing portions of the cDNA clones onto different regions of the genomic clones. Only the relevant restriction enzymes are indicated (B = BglI; H = HindIII; P = PstI; R = EcoRI; V = EcoRV). The extent of the CpG-rich region is indicated by a striped box. Vertical arrows indicate tsp.
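The 'island DNA' criterion that underlies the isolation of HTF9 can be made concrete with a short calculation. The sketch below is illustrative only and is not the procedure used in this work: it assumes the sequence-based thresholds proposed by Gardiner-Garden and Frommer (1987), namely a length of at least 200 bp, a G + C content above 50% and an observed/expected CpG ratio above 0.6. The function names and the toy sequences are hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, G+C fraction and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were randomly distributed is (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed thresholds (length, G+C content, obs/exp CpG)."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["obs_exp"] > min_obs_exp)


# Hypothetical toy sequences, not taken from HTF9
cpg_rich = "CG" * 100      # 200 nt, all C/G, CpG at every position pair
cpg_free = "CATG" * 50     # 200 nt, 50% G+C, no CpG dinucleotide

print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_stats(cpg_free), looks_like_cpg_island(cpg_free))
```

In practice such a test would be applied in a sliding window along a genomic sequence; the single-window form is kept here for brevity.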

RESULTS AND DISCUSSION

(a) Organization of the HTF9-A and HTF9-C genes in the mouse genome

We first analyzed the genomic organization of the HTF9-associated genes. The cDNA clones for the HTF9-A and HTF9-C genes had been isolated from a λgt10 mouse embryo library after screening with probes mapping to the left and to the right of the HTF9 CpG-island as described (Lavia et al., 1987). Selected phage clones were subcloned into pUC vectors and used to probe Southern blots of mouse liver DNA. Southern analysis using the HTF9-A cDNA as a probe revealed a complex pattern of bands with different enzymes (Fig. 1A). Formally it was possible that the HTF9-A gene was split into a high number of exons included in several genomic fragments. However, a purified fragment corresponding to exon 1 only also detected multiple fragments (data not shown), suggesting that several pseudogenes map at different genomic locations. On the other hand, Southern-blot hybridization experiments with the HTF9-C cDNA clone showed a pattern corresponding to a single-copy gene.

A tentative exon map was achieved by hybridizing progressive portions of both cDNAs to cloned genomic sequences around the HTF9 island. Genomic clones containing (pL9.2) or flanking (pL9.3 and pL9.5) the island were described earlier (Bird et al., 1985). Because available cloned genomic sequences on the left-hand side of HTF9 only extend about 4 kb away from the island, only the most 5' portion of the A gene could be mapped. Different regions of the HTF9-A cDNA probe were hybridized to the genomic subclones digested with multiple restriction enzymes, which enabled us to locate the first three exons of the HTF9-A gene on the left-hand side of HTF9 (summarized in Fig. 1C).

The genomic organization of the HTF9-C gene was similarly analyzed. The entire HTF9-C gene appeared to be contained within approx. 8 kb from HTF9 (Fig. 1B). In addition, the most 3' region of the HTF9-C cDNA hybridized to a genomic fragment mapping to the right-hand side of HTF9 that had been cloned previously (subclone pL9.5, lanes 5-6 in Fig. 1B). Therefore, mapping of the exon/intron structure could be achieved throughout the HTF9-C gene by hybridizing different portions of the cDNA onto the

genomic clones digested with a combination of restriction enzymes. The results of this analysis, showing the arrangement of the divergent genes over 12 kb of the HTF9 locus, are summarized in Fig. 1C.

Fig. 2. Structure of the major transcripts from the HTF9-A gene. (Panel A) Northern blot analysis of mouse liver poly(A)+ RNA hybridized with the cDNA clone 19 corresponding to the HTF9-A gene. Transcript sizes (approximately ranging from 1100-700 nt) were estimated in relation to the 28S and 18S ribosomal RNA bands. (Panel B) End-labeling of independently selected cDNA clones after EcoRI digestion, showing the variation in the most 3' region; approximate sizes were estimated from a 6% polyacrylamide gel; pUC19 DNA digested with HpaII was the marker. Lanes: 1-4, cDNA clones 10, 18, 16 and 11 have 3' fragments sized at 40 bp; 5-7, clones 12, 19 and 13 have 210-bp-long 3' fragments. The large 5' fragment was similar in all clones and ranged from 850-870 bp. (C) cDNA maps representative of each size class. The thick line represents cDNA sequences, the dashed lines at each end represent linker sequences originally attached to the λ clones and carrying the EcoRI sites (downward arrows) used for plasmid subcloning. The double-headed horizontal arrow shows the extent of the 5' region homologous to HTF9.

(b) Nucleotide sequence of the HTF9-A gene

Previous results had shown that several transcripts of different size originate from the lower strand of HTF9, with two prominent bands of roughly 700 and 1100 nt (Fig. 2A). Consistently, screening of a cDNA library with genomic probes derived from the left-hand side of HTF9 had yielded several clones of varying size (Lavia et al., 1987). All clones were digested with suitable enzymes, end-labeled using PolIk and sized on polyacrylamide gels (Fig. 2B). The results showed that the length variation among independent cDNA clones occurred in the 3' end region of the HTF9-A gene. Most clones fell into two major size classes, and the observed size variation was consistent with the transcript sizes estimated from Northern blots. Two independent cDNA clones representative of each size

class were then sequenced and compared. The nt sequence of both cDNA clones (clones 16 and 19) is shown in Fig. 3A. The size variation is due to the use of alternative polyadenylation signals and does not affect the ORF in the HTF9-A gene. In all clones the variant TATAAA polyadenylation signal is found at position 838. In the larger cDNA clones, of which clone 19 is an example, the canonical AATAAA sequence, which can be expected to be more efficiently recognized (Wickens, 1990), is found further 3' (position 1014). Indeed, the larger mRNA species, generated by the addition of the poly(A) tail after the type-19 polyadenylation signal, appear to be somewhat more abundant than the shorter transcripts in most analyzed tissues (Lavia et al., 1987; Fig. 2A).

Fig. 3. Nucleotide sequence of the HTF9-A and HTF9-C genes. (A) Two independent cDNA clones (16 and 19) corresponding to the HTF9-A gene (GenBank/EMBL accession Nos. X56046 and X56045, respectively). Start and stop codons are in bold letters. Alternative polyadenylation signals are underlined. RNA start sites, reported from previous S1 mapping data (Lavia et al., 1987), are also underlined. (B) Sequence of the HTF9-C gene (GenBank/EMBL accession No. X56044). Two major RNA tsp, obtained from previous S1 mapping experiments (Lavia et al., 1987), and the polyadenylation signal are underlined. Start and stop codons are in bold letters. All sequences were determined on both strands using the Sequenase method from clones carrying overlapping deletions.

Translation of the ORF from the HTF9-A cDNA (Fig. 4A) gives rise to a highly charged protein, very rich in hydrophilic aa residues. Indeed, 49% of all aa are electrically charged (21.6% = Glu + Asp; 9.5% = Ser + Thr; 17.6% = Lys + Arg). Four potential helix-forming domains can be identified, which account for most of the protein. The most abundant aa, glutamic acid, is mainly clustered in negative stretches in the regions forming helices I, III and IV, while helix-forming region II is rich in positively charged Lys and Arg residues (35% of all aa). Helices III and IV represent a potential helix-turn-helix domain. Although no extensive sequence homology was found in the EMBL (Heidelberg) or Genentech (San Francisco) protein libraries, these features are reminiscent of certain DNA-binding proteins.
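The residue-class percentages quoted for the HTF9-A protein (21.6% Glu + Asp, 9.5% Ser + Thr and 17.6% Lys + Arg, which sum to 48.7%, i.e. the quoted 49%) are simple composition counts over the deduced sequence. A minimal sketch of such a count is given below. The grouping dictionary mirrors the three classes named in the text; the function name and the ten-residue toy peptide are hypothetical and are not the HTF9-A sequence.

```python
from collections import Counter


def composition_percent(protein: str, groups: dict) -> dict:
    """Percentage of residues falling in each named group of one-letter aa codes."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty protein sequence")
    counts = Counter(seq)
    total = len(seq)
    return {
        name: 100.0 * sum(counts[aa] for aa in residues) / total
        for name, residues in groups.items()
    }


# Residue classes as grouped in the text (one-letter codes)
GROUPS = {"Glu+Asp": "ED", "Ser+Thr": "ST", "Lys+Arg": "KR"}

# Hypothetical toy peptide used only to exercise the function
toy_peptide = "MEEDKKRSTA"
for name, pct in composition_percent(toy_peptide, GROUPS).items():
    print(f"{name}: {pct:.1f}%")
```

Applied to the deduced HTF9-A sequence deposited under the accession numbers given in Fig. 3, the same function should return the percentages reported above, provided the same residue groupings are used.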

(c) Nucleotide sequence of the HTF9-C gene

The upper strand of HTF9 is transcribed into a discrete 2.4-kb mRNA (Lavia et al., 1987). One cDNA clone was originally isolated using a genomic probe mapping to the right of HTF9. For the purpose of sequencing, overlapping subclones carrying extensive deletions of the original insert were constructed using both M13 and pUC vectors. The sequence of the HTF9-C cDNA clone is shown in Fig. 3B. Analysis of the ORF revealed an unusually long (about 600 bp) leader sequence upstream from the start codon, which is entirely contained within the CpG-rich island. Translation of the ORF (Fig. 4B) showed a nonhelical protein with a hydrophobic domain near the C terminus. The HTF9-C protein is rich in Leu (10% of all aa, distributed in small clusters throughout the central portion of the protein) and in Ser (also 10% of all residues, mostly clustered at both the N and the C termini). A search in the protein data library revealed no extensive homology to known sequences. Thus, both the HTF9-A and HTF9-C products represent novel proteins. A matrix dot analysis showed that the two proteins are structurally unrelated to each other (Fig. 4C).

Fig. 4. Comparison of the HTF9-A and HTF9-C aa sequences. (A) Deduced aa sequence of the HTF9-A protein. Four potential helix-forming regions are underlined. (B) Deduced aa sequence of the HTF9-C protein. A hydrophobic domain is underlined. (C) Dot plot of the HTF9-A and HTF9-C proteins. A window of 9 aa, with four mismatches (including both identical and conserved aa) was used.

(d) Chromosomal location of the HTF9 locus in mouse

The mouse chromosome location of the HTF9 locus was determined by interspecific backcross analysis, using progeny generated by mating (C57BL/6J × Mus spretus)F1 females to C57BL/6J males as previously described (Buchberg et al., 1988). This interspecific backcross mapping panel has been typed for over 660 loci distributed among all the autosomes as well as the X chromosome. C57BL/6J and M. spretus DNAs were digested with several enzymes and analyzed by Southern-blot hybridization for informative RFLPs using the HTF9 genomic probe. Fragments of 5.1 and 3.3 kb were detected in SphI-digested C57BL/6J DNA; fragments of 11.0 and 7.0 kb were detected in SphI-digested M. spretus DNA. The 11.0 and 7.0-kb M. spretus SphI-specific RFLPs were used to follow the segregation of the HTF9 locus in backcross mice. The two fragments cosegregated and the mapping results indicated that HTF9 is located in the proximal region of chromosome 16 linked to Prm-1, Igl-1, Smst, and Gap43.

The following probes and RFLPs were used for the linkage studies around the HTF9 locus. The Prm-1 probe (prm-1) was a 400-bp mouse cDNA (Kleene et al., 1983) that detected a 4.4-kb TaqI fragment in C57BL/6J DNA and a 1.1-kb fragment in M. spretus DNA. The Igl-1 probe was a 1250-bp mouse cDNA containing the Vλ and Cλ regions (Scott et al., 1982). X&I digestion and hybridization with the Igl-1 probe produced fragments of 11.0, 7.2, 6.1, 3.9 kb in C57BL/6J DNA and of 13.0, 11.5, 6.1 and 1.9 kb in M. spretus DNA. The 13.0 and 1.9-kb M. spretus fragments

cosegregated in these studies. The Smst probe (pMST-1.4) was a 1.4-kb DmI mouse genomic clone (O'Hara et al., 1988) that detected an approx. 1%kb BglI fragment in C57BL/6J DNA and an approx. 6.2-kb BglI fragment in M. spretus DNA. The probe for the Gap43 locus was a 700-bp &I-EcoRI rat cDNA clone (Neve et al., 1987) that detected a 3.0-kb TaqI fragment in C57BL/6J DNA and 3.5- and 1.7-kb TaqI fragments in M. spretus DNA. The 3.5- and 1.7-kb M. spretus fragments cosegregated in these studies.

Although 90 mice were analyzed for every marker and are shown in the segregation analysis (Fig. 5), up to 189 mice were typed for some markers. Each locus was analyzed in pairwise combinations for recombination frequencies using the additional data. Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns. The ratios of mice exhibiting recombinant chromosomes to the total number of mice analyzed for each pair of loci and the most likely gene order are: centromere - Prm-1 - 4/95 - HTF9 - 1/189 - Igl-1 - 1/135 - Smst - 16/97 - Gap43. Recombination frequencies (expressed as genetic distances in cM ± the standard error) are Prm-1 - 4.2 ± 2.1 - HTF9 - 0.5 ± 0.5 - Igl-1 - 0.7 ± 0.7 - Smst - 16.5 ± 3.8 - Gap43. The placement of HTF9 between Prm-1, which is located on human chromosome 16, and Igl-1, which is located on human chromosome 22, suggests that the human homolog of HTF9 could be on either chromosome (Fig. 5).

(e) Conservation of the genes from HTF9 in the human genome

Fig. 5. [...] determined by interspecific backcross analysis. (C57BL/6J × M. spretus)F1 females were mated to C57BL/6J mice. A total of 205 F2 progeny were obtained. DNA isolation, restriction analysis, agarose gel electrophoresis and Southern hybridizations were as described (Jenkins et al., 1982). Blots were prepared with Zetabind membranes (AMF-Cuno). Washing was generally done to a final stringency of 0.6 × SSCP (20 × SSCP = 2.4 M NaCl/0.3 M Na3 citrate/0.3 M Na2HPO4/80 mM NaH2PO4)/0.1% SDS at 65°C. The segregation of HTF9 and flanking genes in 90 backcross animals that were typed in common for HTF9 is shown at the top of the figure. For individual pairs of loci, more than 90 animals were typed. Each column represents the chromosome identified in the backcross progeny that was inherited from the (C57BL/6J × M. spretus)F1 parent. Blackened boxes represent the presence of a C57BL/6J allele and open boxes represent the presence of a M. spretus allele. The number of offspring inheriting each type of chromosome is listed at the bottom of each column. A partial chromosome 16 linkage map is shown at the bottom of the figure. Recombination distances were calculated as described (Green, 1981) using the computer program SPRETUS MADNESS developed by D. Dave (Data Management Services, Inc., Frederick, MD) and A.M. Buchberg (NCI-FCRDC, ABL-BRP, Frederick, MD) and are shown in cM below the chromosome. Above, the positions of all loci except HTF9 in human chromosomes (Reeves et al., 1989) are shown.
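The genetic distances of the chromosome 16 linkage map can be checked against the recombinant ratios given for each pair of loci (4/95, 1/189, 1/135 and 16/97). The sketch below is not the SPRETUS MADNESS program and makes two labeled assumptions: the distance in cM is taken as 100 × r, where r is the fraction of recombinant animals, and the standard error is taken as the binomial value 100 × sqrt(r(1 - r)/N). Under these assumptions the calculation reproduces the published figures to one decimal place. The function name is hypothetical.

```python
import math


def recombination_cm(recombinants: int, total: int) -> tuple:
    """Return (distance in cM, standard error in cM) for a backcross interval.

    Assumes distance = 100 * r and a binomial standard error, with
    r = recombinants / total. No mapping function is applied.
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total
    se = math.sqrt(r * (1.0 - r) / total)
    return 100.0 * r, 100.0 * se


# Recombinant ratios as reported for the intervals around HTF9
intervals = [
    ("Prm-1 - HTF9", 4, 95),
    ("HTF9 - Igl-1", 1, 189),
    ("Igl-1 - Smst", 1, 135),
    ("Smst - Gap43", 16, 97),
]

for name, rec, n in intervals:
    dist, se = recombination_cm(rec, n)
    print(f"{name}: {dist:.1f} +/- {se:.1f} cM")
```

For short intervals such as these, the recombination fraction is a close approximation to map distance, which is presumably why no correction for double crossovers is needed to match the reported values.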

We wished to assess whether the HTF9-associated genes were conserved in the human genome. Southern analysis showed that the human homologue of HTF9 is a single-copy sequence which maintains the characteristics of a typical CpG island, as judged by the frequency of HpaII sites and by the extensive lack of methylation at these sites (data not shown). Attempts to investigate the gene organization in relation to the human HTF9 island were difficult for several reasons: first, the HTF9-A cDNA hybridized to many genomic bands, as would be expected if pseudogenes existed at genomic locations other than HTF9; second, the first exon of the HTF9-A gene, which falls in the CpG-rich domain and would have been suitable for investigating the linkage of the gene 5' ends in the human genome (see map in Fig. 1), is nearly entirely noncoding in mouse and may have diverged in humans. Conservation of expression from the human HTF9 homologue was investigated in Northern-blot experiments. Poly(A)+ mRNA was purified from human adenocarcinoma HeLa cells and hybridized with the mouse genomic HTF9 probe and with the cDNA clones for both genes (Fig. 6). The hybridization pattern of the genomic probe (lane 2) was virtually identical to that observed in mouse: the human transcript sizes corresponded to that of the unique HTF9-C transcript and to the cluster of HTF9-A transcripts. Hybridization using the cDNA clones as probes showed that the abundance of the HTF9-A and HTF9-C transcripts (Fig. 6, lanes 1 and 3, respectively) in human cells was very similar to that seen in mouse. Indeed, the proportion between the short and large HTF9-A mRNAs was also maintained, suggesting that the alternative polyadenylation signals were also conserved in the

human HTF9-A cDNA. The high-stringency conditions used in this experiment indicate that very little divergence has occurred in the coding sequences of either gene. Thus, both HTF9-associated genes are active in human cells. The detection of a single-copy island sequence in the human genome using the mouse HTF9 probe, which also hybridizes to both human transcripts, suggests that the genes are still linked at a human genomic locus containing a CpG-island homologous to the mouse HTF9.

Fig. 6. Northern-blot hybridization of HeLa poly(A)+ RNA. HeLa RNA was extracted using the guanidine·HCl method, and poly(A)+ RNA was purified by oligo(dT) cellulose chromatography and run on 1.2% agarose/2.2 M formaldehyde gels as described (Lavia et al., 1987). Hybridizations and washes were carried out at 65°C using: (1) the HTF9-A cDNA clone, (2) the genomic subclone pL9.2 containing HTF9 and (3) the HTF9-C cDNA clone. The migration of the 28S and 18S RNA bands is indicated.

(f) Conclusions

We have analyzed the molecular organization of the mouse HTF9-A and HTF9-C genes that are divergently transcribed from a bidirectional promoter contained in the CpG-island HTF9. The exon/intron structure of both genes was determined over 12 kb around HTF9. The HTF9-C gene, which is strictly single-copy, was entirely mapped to the right of the island. On the other hand, complete mapping of the HTF9-A gene on the left-hand side of HTF9 was not possible due to the occurrence of multiple hybridization signals, possibly reflecting the existence of several pseudogenes. The close proximity of the bidirectional genes may indicate a genomic compartment of high gene density. The mouse divergent surf-1 and surf-2 genes originate in the surfeit locus which contains at least six tightly linked genes (Williams et al., 1989; Huxley and Fried, 1990). Similarly, divergent transcripts have been identified in the major histocompatibility complex class-III locus (Sargent et al., 1989). The HTF9 locus is also included in a chromosome region with closely clustered CpG-islands, each of which is likely to mark a novel gene, as seen by PFGE mapping (Brown and Bird, 1986). We have now assigned this region to the proximal end of mouse chromosome 16. Interestingly, the composite map of chromosome 16 shows two sites of viral integration (namely, Akr-2 and Mtv-6) at the approximate location of HTF9, which also suggests an active chromosome domain.

We have fully sequenced both mouse cDNAs. Both products from the HTF9 locus represent novel proteins. The deduced sequence of the HTF9-A protein is similar to that of certain DNA-binding proteins. On the other hand, no obvious function can at present be attributed to the HTF9-C gene from inspection of the sequence alone. The presence of mature messengers from both genes in early mouse embryos and in at least eight differentiated tissues (Lavia et al., 1987) suggests that both products serve some basic function. Both transcripts are also expressed in human cells. The high conservation of the coding sequences is indicated by the high stringency conditions at which heterologous hybridization was detected in human cells.

In a number of systems, divergently transcribed genes from a shared promoter encode related products: this is well exemplified by the Drosophila genes encoding the H2a and H2b histones which are designed to assemble into nucleosome particles, by the yeast GAL1 and GAL10 genes that are co-induced by the binding of Gal-4 to their bidirectional promoter, and by the human α1(IV) and α2(IV) collagen genes whose products also assemble in the mature collagen molecule. In contrast, the HTF9-associated HTF9-A and HTF9-C gene sequences are totally unrelated and the predicted protein characteristics would result in totally different secondary structures which are unlikely to serve similar functions.

All known bidirectional genes in mammals are linked to a CpG-island. Therefore bidirectional activity has been thought of as an intrinsic property of CpG-rich promoters, possibly reflecting the absence of a positioning TATA box (Melton, 1987; Johnson and Friedmann, 1990) and/or the abundance of sites for transcription factors that are bidirectionally active, such as Sp1 (Dynan et al., 1986). This view is consistent with the ability of elements derived from certain CpG-rich promoters [for example, those of the human HPRT (Melton, 1987), of the human c-Ha-ras (Lowndes et al., 1989) and of the mouse c-Ki-ras (Hoffman et al., 1987) genes] to direct bidirectional transcription in vitro. However, this capability is clearly not absolute and does not apply, for example, to the CpG-island associated with the human PGK gene, which does not drive antisense transcription in similar in vitro assays (Johnson and Friedmann, 1990).

Bidirectional transcription may be associated with coordinate expression of genes that may be structurally unrelated yet are similarly required. The HTF9-A and HTF9-C transcript origins are exceptionally close in the mouse genome, their major start sites falling within 30 bp, and thus the genes can be expected to share some regulatory sequence. A functional study of the HTF9-associated bidirectional promoter supports this view (Somma et al., 1991). In this light, it may be of interest that the levels of transcription of both genes in human cells are similar to those observed in mouse tissues. Cloning of the human HTF9 homologue and S1 mapping of the human transcript origins will be required to ascertain whether the linkage of the divergent gene 5' ends is also maintained in the human genome.

ACKNOWLEDGEMENTS

We thank Paola Lucarelli for the gift of human Southern filters, Roger Greeves for probes and D.A. Swing, C.M. Silan and B. Cho for excellent technical assistance. This work was supported by Consiglio Nazionale delle Ricerche, by Progetto Finalizzato Ingegneria Genetica and by Fondazione Cenci-Bolognetti (PL) and by the National Cancer Institute, DHHS, under contract N01-CO-74101 with ABL.

ADDENDUM

A significant similarity (54%, including identical and conserved aa) between the HTF9-A and HMG1 proteins was observed after sorting the protein database with the P ZT-SC,lN program (DNA Star Inc., Computer Research Resources, Madison, WI).

REFERENCES

Antequera, F., Macleod, D. and Bird, A.P.: Specific protection of methylated CpGs in mammalian nuclei. Cell 58 (1989) 509-517.

Bird, A.P.: CpG-islands as gene markers in the vertebrate nucleus. Trends Genet. 3 (1987) 342-347.

Bird, A.P., Taggart, M., Frommer, M., Miller, O.J. and Macleod, D.: A fraction of the mouse genome that is derived from islands of non-methylated, CpG-rich DNA. Cell 40 (1985) 91-99.

Brown, W.R.A. and Bird, A.: Long-range restriction mapping of mammalian genomic DNA. Nature 322 (1986) 477-481.

Buchberg, A.M., Bedigian, H.G., Taylor, B.A., Brownell, E., Ihle, J.N., Nagata, S., Jenkins, N.A. and Copeland, N.G.: Localization of Evi-2 to chromosome 11: linkage to other proto-oncogene and growth factor loci using interspecific backcross mice. Oncogene Res. 2 (1988) 149-165.

Burbelo, P.D., Martin, G.R. and Yamada, Y.: α1(IV) and α2(IV) collagen genes are regulated by a bidirectional promoter and a shared enhancer. Proc. Natl. Acad. Sci. USA 85 (1988) 9679-9682.

Carotti, D., Palitti, F., Lavia, P. and Strom, R.: In vitro methylation of CpG-rich islands. Nucleic Acids Res. 17 (1989) 9219-9229.

Estivill, X., Farrall, M., Scambler, P., Bell, G., Hawley, K., Lench, N., Bates, G., Kruyer, H., Frederick, P., Stanier, P., Watson, E., Williamson, R. and Wainwright, B.: A candidate for the cystic fibrosis locus isolated by selection for methylation-free islands. Nature 326 (1987) 840-845.

Feinberg, A. and Vogelstein, B.: A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Addendum. Anal. Biochem. 137 (1984) 266-267.

Gardiner-Garden, M. and Frommer, M.: CpG islands in vertebrate genomes. J. Mol. Biol. 196 (1987) 261-282.

Green, E.L.: Linkage, recombination and mapping. In: Genetics and Probability in Animal Breeding Experiments. Macmillan, New York, 1981, pp. 77-113.

Hoffman, E., Trusko, S., Freeman, N.A. and George, D.: Structural and functional characterisation of the promoter region of the mouse c-Ki-ras gene. Mol. Cell. Biol. 7 (1987) 2592-2596.

Huxley, C. and Fried, M.: The mouse surfeit locus contains a cluster of six genes associated with four CpG-rich islands in 32 kilobases of genomic DNA. Mol. Cell. Biol. 10 (1990) 605-614.

Jenkins, N.A., Copeland, N.G., Taylor, B.A. and Lee, B.K.: Organization, distribution and stability of endogenous ecotropic murine leukemia virus DNA sequences in chromosomes of Mus musculus. J. Virol. 43 (1982) 26-36.

Johnson, P. and Friedmann, T.: Limited bidirectional activity of two housekeeping gene promoters: human HPRT and PGK. Gene 88 (1990) 207-213.

Kleene, K.C., Distel, R.J. and Hecht, N.B.: cDNA clones encoding cytoplasmic poly(A)+ RNAs which first appear at detectable levels in haploid phases of spermatogenesis in the mouse. Develop. Biol. 98 (1983) 455-464.

Lavia, P., Macleod, D. and Bird, A.: Coincident start sites for divergent transcripts at a randomly selected CpG-rich island of mouse. EMBO J. 6 (1987) 2773-2779.

Linton, J.P., Yen, J., Selby, E., Chen, Z., Chinsky, J.M., Liu, K., Kellems, R.E. and Crouse, G.F.: Dual bidirectional promoters at the mouse DHFR locus: cloning and characterization of two mRNA classes of the divergently transcribed rep-1 gene. Mol. Cell. Biol. 9 (1989) 3058-3072.

Lowndes, N.F., Paul, J., Wu, J. and Allan, M.: c-Ha-ras gene bidirectional promoter expressed in vitro: location and regulation. Mol. Cell. Biol. 9 (1989) 3758-3770.

Melton, D.: Strategies and mechanisms for the control of transcriptional initiation of mammalian protein-coding genes. J. Cell. Sci. 88 (1987) 267-270.

Neve, R.L., Perrone-Bizzozero, N.I., Finklestein, S., Zwiers, H., Bird, E., Kurnit, D.M. and Benowitz, L.I.: The neuronal growth-associated protein Gap-43 (B50-F1): neuronal specificity, developmental regulation and regional distribution of the human and rat mRNAs. Mol. Brain Res. 2 (1987) 177-183.

O'Hara, B.F., Bendotti, C., Reeves, R.H., Oster-Granite, M.L., Coyle, J.T. and Gearhart, J.D.: Genetic mapping and analysis of somatostatin in Snell dwarf mice. Mol. Brain Res. 4 (1988) 283-292.

Rappold, G., Stubbs, L., Labeit, S., Crkvenjakov, B. and Lehrach, H.: Identification of a testis-specific gene from the mouse t-complex next to a CpG-rich island. EMBO J. 6 (1987) 1975-1980.

Reeves, R.H., Gearhart, J.D., Hecht, N.B., Yelick, P., Johnson, P. and O'Brien, S.J.: Mapping of Prm-1 to human chromosome 16 and tight linkage of Prm-1 and Prm-2 on mouse chromosome 16. J. Hered. 80 (1989) 442-446.

Sargent, C.A., Dunham, I. and Campbell, R.D.: Identification of multiple HTF island-associated genes in the human major histocompatibility complex class III region. EMBO J. 8 (1989) 2305-2312.

Scott, C.L., Mushinski, J.F., Huppi, K., Weigert, M. and Potter, M.: Amplification of immunoglobulin λ constant genes in populations of wild mice. Nature 300 (1982) 757-760.

Somma, M.P., Pisano, C. and Lavia, P.: The housekeeping promoter from the mouse CpG-island HTF9 contains multiple protein-binding elements that are functionally redundant. Nucleic Acids Res. 19 (1991) in press.

Toniolo, D., Persico, M. and Alcalay, M.: A 'housekeeping' gene on the X chromosome encodes a protein similar to ubiquitin. Proc. Natl. Acad. Sci. USA 85 (1988) 851-855.

Wickens, M.: How the messenger got its tail: addition of poly(A) in the nucleus. Trends Biochem. Sci. 15 (1990) 277-281.

Williams, T., Yon, J., Huxley, C. and Fried, M.: The mouse surfeit locus contains a very tight cluster of four housekeeping genes that is conserved through evolution. Proc. Natl. Acad. Sci. USA 85 (1988) 3527-3530.