Gene, 103 (1991) 201-209
© 1991 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/91/$03.50

GENE 05009

Characterization of the opposite-strand genes from the mouse bidirectionally transcribed HTF9 locus

(CpG island; bidirectional transcription; housekeeping expression; nucleotide sequence; gene molecular organization; interspecific backcross mapping)

Alessandro Bressan a, Maria Patrizia Somma a, Joe Lewis b, Carlo Santolamazza a, Neal G. Copeland c, Debra J. Gilbert c, Nancy A. Jenkins c and Patrizia Lavia a

a Centro di Genetica Evoluzionistica del CNR, c/o Dipartimento di Genetica e Biologia Molecolare, Università 'La Sapienza', Rome 00185 (Italy) Tel. (39-6)4456205; b Institute of Molecular Pathology, Vienna (Austria) Tel. (43-222)792636; and c Mammalian Genetics Laboratory, ABL-Basic Research Program, NCI-Frederick Cancer Research and Development Center, Frederick, MD 21702 (U.S.A.) Tel. (301)846-1260

Received by J.-P. Lecocq: 6 December 1990
Revised: 28 January 1991
Accepted: 14 February 1991
SUMMARY
The mouse HTF9 locus contains two genes that are bidirectionally transcribed with opposite polarity from a shared CpG-rich island. Both genes were previously shown to be expressed in a housekeeping fashion in mouse. We have now determined the molecular organization of the genes over 12 kb surrounding the island. In addition, we show that the HTF9 locus resides in the proximal region of mouse chromosome 16. We have sequenced the cDNAs corresponding to both divergent transcripts. Both genes appear to code for novel proteins that are structurally unrelated to each other. Finally, we show that both genes are highly conserved and efficiently expressed in human cells.
Correspondence to: Dr. P. Lavia, Centro di Genetica Evoluzionistica del CNR, c/o Dipartimento di Genetica e Biologia Molecolare, Università 'La Sapienza', Rome 00185 (Italy).

Abbreviations: aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; cM, centiMorgan(s); DHFR, dihydrofolate reductase-encoding gene; Gap43, growth-accelerating protein 43-encoding gene; HMG1, high-mobility group protein 1; HTF9, HpaII Tiny Fragment 9; kb, kilobase or 1000 bp; HPRT, hypoxanthine phosphoribosyltransferase-encoding gene; Igl-1, immunoglobulin λ chain-1-encoding gene; nt, nucleotide(s); ORF, open reading frame; PFGE, pulsed field gel electrophoresis; PGK, 3-phosphoglycerate kinase-encoding gene; PolIk, Klenow (large) fragment of E. coli DNA polymerase I; poly(A)+RNA, polyadenylated RNA; Prm-1, protamine-1-encoding gene; RFLP, restriction-fragment length polymorphism; SDS, sodium dodecyl sulfate; Smst, somatostatin-encoding gene; tsp, transcription start point(s).

INTRODUCTION

In higher eukaryotes only a minor proportion of the genome is accounted for by coding genes, whose estimated number ranges between 10000 and 50000, while the majority of eukaryotic sequences are noncoding. One shortcut to identifying novel genes in highly complex genomes is provided by the observation that many genes are associated with CpG-islands (Bird, 1987; Gardiner-Garden and Frommer, 1987). These sequences, being unmethylated in vivo, do not undergo the CpG suppression typical of 5-methylcytosine-containing DNA and are therefore discriminated by a variety of methyl-sensitive restriction enzymes recognizing C + G-rich sequences (Brown and Bird, 1986). The possibility of identifying CpG-islands has led to the cloning of several potential genes (see for example Estivill et al., 1987; Rappold et al., 1987; Toniolo et al., 1988; Sargent et al., 1989). The mouse sequence HTF9 was isolated during the characterization of the CpG-rich genomic fraction (Bird et al., 1985) and was shown to have typical 'island DNA' features: it is extensively unmethylated in vivo (Bird et al., 1985), remains resistant to in vitro methylation (Carotti et al., 1989) and has an accessible chromatin organization in nuclei (Antequera et al., 1989). Transcription studies showed that two genes that are arranged head-to-head are transcribed with opposite polarity from complementary DNA strands of HTF9 and are expressed in a variety of tissues (Lavia et al., 1987). No stringent regulatory signals such as the TATA box are found in the region of divergent initiation and, consistently, both genes are initiated at multiple sites on opposite DNA strands.

In mammals, bidirectional initiation occurs preferentially at CpG-rich promoters and may reflect features that are recurrent in this class of promoters, such as the lack of a TATA box and the frequent occurrence of target sites for factors that can activate transcription in both orientations, such as Sp1. Certain bidirectional loci include pairs of genes that are related in structure or in function. For example, related products are synthesized from the divergent genes encoding the α1(IV) and α2(IV) collagen chains (Burbelo et al., 1988). On the other hand, the divergent DHFR and rep-3 (formerly rep-1) genes, which originate from the same CpG-island, do not share any obvious similarity (Linton et al., 1989). Thus, we do not know whether bidirectional transcription is always associated with coordinate expression of related gene products.

The aim of the present work was the characterization of the mouse HTF9 locus, which included determining the
Fig. 1. Southern analysis of the mouse HTF9 genes. cDNA probes were purified from 1.2% agarose or 6% polyacrylamide gels, eluted following standard procedures and labeled using the random-primer technique (Feinberg and Vogelstein, 1984). Digested genomic DNA was separated on 0.8% agarose gels and blotted on Hybond-N membranes. Hybridizations were carried out with standard methods. Phage λ DNA HindIII fragment sizes are indicated on the left margin in bp. (Panel A) Mouse liver DNA digested with HindIII (lane 1) and EcoRI (lane 2) and probed with the cDNA clone 19 corresponding to the HTF9-A gene. (Panel B) Mouse liver DNA digested with HindIII (lane 3) and EcoRI (lanes 4, 5 and 6) and probed with the entire HTF9-C cDNA clone (lane 4), with the 3' portion of the HTF9-C cDNA (lane 5) or with the genomic subclone pL9.5, mapping to the right of HTF9 (lane 6): both probes hybridize to the same genomic fragment. (C) Summary of the mapping data. Exon boundaries were approximately localized by hybridizing portions of the cDNA clones onto different regions of the genomic clones. Only the relevant restriction enzymes are indicated (B, BglI; H, HindIII; P, PstI; R, EcoRI; V, EcoRV). The extent of the CpG-rich region is indicated by a striped box. Vertical arrows indicate tsp.
molecular organization of the genes transcribed from HTF9, i.e., HTF9-A leftwards and HTF9-C rightwards, as well as their chromosome location. We also wished to establish whether the divergent gene products were related to each other. Finally, we have sought to assess the expression from HTF9 in the human genome.
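The CpG-island concept used throughout the Introduction can be made concrete with a small calculation. The sketch below is illustrative only and is not the procedure used to isolate HTF9 (which relied on methyl-sensitive restriction enzymes): it computes the C + G fraction and the observed/expected CpG ratio of a sequence and applies the sequence-based thresholds of Gardiner-Garden and Frommer (1987), i.e., length of at least 200 bp, C + G content above 50% and observed/expected CpG above 0.6. The function names and the toy sequences are assumptions introduced for this example.

```python
def cpg_stats(seq):
    """Return (C+G fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the Gardiner-Garden and Frommer (1987) sequence criteria."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_ratio


# Hypothetical toy sequences, not taken from the HTF9 locus.
toy_island = "CG" * 150      # CpG-rich, no CpG suppression
toy_depleted = "CATG" * 75   # same length, no CpG dinucleotides

print(cpg_stats(toy_island), looks_like_cpg_island(toy_island))
print(cpg_stats(toy_depleted), looks_like_cpg_island(toy_depleted))
```

Bulk vertebrate DNA shows a low observed/expected ratio because methylated CpGs are lost by mutation, whereas unmethylated islands such as HTF9 retain a ratio close to that expected from base composition.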
RESULTS AND DISCUSSION

(a) Organization of the HTF9-A and HTF9-C genes in the mouse genome

We first analyzed the genomic organization of the HTF9-associated genes. The cDNA clones for the HTF9-A and HTF9-C genes had been isolated from a λgt10 mouse embryo library after screening with probes mapping to the left and to the right of the HTF9 CpG-island as described (Lavia et al., 1987). Selected phage clones were subcloned into pUC vectors and used to probe Southern blots of mouse liver DNA. Southern analysis using the HTF9-A cDNA as a probe revealed a complex pattern of bands with different enzymes (Fig. 1A). Formally it was possible that the HTF9-A gene was split into a high number of exons included in several genomic fragments. However, a purified fragment corresponding to exon 1 only also detected multiple fragments (data not shown), suggesting that several pseudogenes map at different genomic locations. On the other hand, Southern-blot hybridization experiments with the HTF9-C cDNA clone showed a pattern corresponding to a single-copy gene.

A tentative exon map was achieved by hybridizing progressive portions of both cDNAs to cloned genomic sequences around the HTF9 island. Genomic clones containing (pL9.2) or flanking (pL9.3 and pL9.5) the island were described earlier (Bird et al., 1985). Because available cloned genomic sequences on the left-hand side of HTF9 only extend about 4 kb away from the island, only the most 5' portion of the A gene could be mapped. Different regions of the HTF9-A cDNA probe were hybridized to the genomic subclones digested with multiple restriction enzymes, which enabled us to locate the first three exons of the HTF9-A gene on the left-hand side of HTF9 (summarized in Fig. 1C).

The genomic organization of the HTF9-C gene was similarly analyzed. The entire HTF9-C gene appeared to be contained within approx. 8 kb from HTF9 (Fig. 1B). In addition, the most 3' region of the HTF9-C cDNA hybridized to a genomic fragment mapping to the right-hand side of HTF9 that had been cloned previously (subclone pL9.5, lanes 5-6 in Fig. 1B). Therefore, mapping of the exon/intron structure could be achieved throughout the HTF9-C gene by hybridizing different portions of the cDNA onto the genomic clones digested with a combination of restriction enzymes. The results of this analysis, showing the arrangement of the divergent genes over 12 kb of the HTF9 locus, are summarized in Fig. 1C.

Fig. 2. Structure of the major transcripts from the HTF9-A gene. (Panel A) Northern blot analysis of mouse liver poly(A)+RNA hybridized with the cDNA clone 19 corresponding to the HTF9-A gene. Transcript sizes (approximately ranging from 1100-700 nt) were estimated in relation to the 28S and 18S ribosomal RNA bands. (Panel B) End-labeling of independently selected cDNA clones after EcoRI digestion, showing the variation in the most 3' region; approximate sizes were estimated from a 6% polyacrylamide gel; pUC19 DNA digested with HpaII was the marker. Lanes: 1-4, cDNA clones 10, 18, 16 and 11 have 3' fragments sized at 40 bp; 5-7, clones 12, 19 and 13 have 210-bp-long 3' fragments. The large 5' fragment was similar in all clones and ranged from 850-870 bp. (C) cDNA maps representative of each size class. The thick line represents cDNA sequences, the dashed lines at each end represent linker sequences originally attached to the λ clones and carrying the EcoRI sites (downward arrows) used for plasmid subcloning. The double-headed horizontal arrow shows the extent of the 5' region homologous to HTF9.

(b) Nucleotide sequence of the HTF9-A gene

Previous results had shown that several transcripts of different size originate from the lower strand of HTF9, with two prominent bands sized at roughly 700 and 1100 bp (Fig. 2A). Consistently, screening of a cDNA library with genomic probes derived from the left-hand side of HTF9 had yielded several clones of varying size (Lavia et al., 1987). All clones were digested with suitable enzymes, end-labeled using PolIk and sized on polyacrylamide gels (Fig. 2B). The results showed that the length variation among independent cDNA clones occurred in the 3' end region of the HTF9-A gene. Most clones fell into two major size classes, and the observed size variation was consistent with the transcript sizes estimated from Northern blots. Two independent cDNA clones representative of each size
class were then sequenced and compared. The nt sequence of both cDNA clones (clones 16 and 19) is shown in Fig. 3A. The size variation is due to the use of alternative polyadenylation signals and does not affect the ORF in the HTF9-A gene. In all clones the variant TATAAA polyadenylation signal is found at position 838. In the larger cDNA clones, of which clone 19 is an example, the canonical AATAAA sequence, which can be expected to be more efficiently recognized (Wickens, 1990), is found further 3' (position 1014). Indeed, the larger mRNA species, generated by the addition of the poly(A) tail after the type-19 polyadenylation signal, appear to be somewhat more abundant than the shorter transcripts in most analyzed tissues (Lavia et al., 1987; Fig. 2A).

Fig. 3. Nucleotide sequence of the HTF9-A and HTF9-C genes. (A) Two independent cDNA clones (16 and 19) corresponding to the HTF9-A gene (GenBank/EMBL accession Nos. X56046 and X56045, respectively). RNA start sites, reported from previous S1 mapping experiments (Lavia et al., 1987), are underlined. Start and stop codons are in bold letters. Alternative polyadenylation signals are also underlined. (B) Sequence of the HTF9-C gene (GenBank/EMBL accession No. X56044). Two major RNA tsp, obtained from previous S1 mapping data (Lavia et al., 1987), and the polyadenylation signal are underlined. Start and stop codons are in bold letters. All sequences were determined on both strands using the Sequenase method from clones carrying overlapping deletions.

Translation of the ORF from the HTF9-A cDNA (Fig. 4A) gives rise to a highly charged protein, very rich in hydrophilic aa residues. Indeed, 49% of all aa are electrically charged (21.6% = Glu + Asp; 9.5% = Ser + Thr; 17.6% = Lys + Arg). Four potential helix-forming domains can be identified, which account for most of the protein. The most abundant aa, glutamic acid, is mainly clustered in negative stretches in the regions forming helices I, III and IV, while helix-forming region II is rich in positively charged Lys and Arg residues (35% of all aa). Helices III and IV represent a potential helix-turn-helix domain. Although no extensive sequence homology was found in the EMBL (Heidelberg) or Genentech (San Francisco) protein libraries, these features are reminiscent of certain DNA-binding proteins.
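The residue-class percentages quoted above for the HTF9-A protein are simple composition counts. As a minimal sketch of how such figures can be tallied from a deduced aa sequence, the code below groups residues into the three classes named in the text. The class labels, the function name and the ten-residue peptide are assumptions introduced for illustration; the peptide is hypothetical and is not part of the HTF9-A sequence.

```python
from collections import Counter

# Residue classes as grouped in the text (one-letter aa codes).
RESIDUE_CLASSES = {
    "Glu + Asp": "ED",
    "Ser + Thr": "ST",
    "Lys + Arg": "KR",
}


def class_percentages(protein):
    """Return the percentage of residues falling in each class."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sequence")
    return {
        name: 100.0 * sum(counts[aa] for aa in members) / total
        for name, members in RESIDUE_CLASSES.items()
    }


# Hypothetical peptide used only to exercise the function.
toy_peptide = "MEEKRDSTAL"
for name, pct in class_percentages(toy_peptide).items():
    print(f"{name}: {pct:.1f}%")
```

Applied to the full deduced sequence deposited under the accession numbers given in Fig. 3, the same tally would be expected to reproduce the percentages reported in the text, provided the same residue groupings are used.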
(c) Nucleotide sequence of the HTF9-C gene

The upper strand of HTF9 is transcribed into a discrete 2.4-kb mRNA (Lavia et al., 1987). One cDNA clone was originally isolated using a genomic probe mapping to the right of HTF9. For the purpose of sequencing, overlapping subclones carrying extensive deletions of the original insert were constructed using both M13 and pUC vectors. The sequence of the HTF9-C cDNA clone is shown in Fig. 3B. Analysis of the ORF revealed an unusually long (about 600 bp) leader sequence upstream from the start codon, which is entirely contained within the CpG-rich island. Translation of the ORF (Fig. 4B) showed a nonhelical protein with a hydrophobic domain near the C terminus. The HTF9-C protein is rich in Leu (10% of all aa, distributed in small clusters throughout the central portion of the protein) and in Ser (also 10% of all residues, mostly clustered at both the N and the C termini). A search in the protein data library revealed no extensive homology to known sequences. Thus, both the HTF9-A and HTF9-C products represent novel proteins. A matrix dot analysis showed that the two proteins are structurally unrelated to each other (Fig. 4C).

Fig. 4. Comparison of the HTF9-A and HTF9-C aa sequences. (A) Deduced aa sequence of the HTF9-A protein. Four potential helix-forming regions are underlined. (B) Deduced aa sequence of the HTF9-C protein. A hydrophobic domain is underlined. (C) Dot plot of the HTF9-A and HTF9-C proteins. A window of 9 aa, with four mismatches (including both identical and conserved aa) was used. [Panels A-C (aa sequence listings and dot plot) not reproduced.]

(d) Chromosomal location of the HTF9 locus in mouse

The mouse chromosome location of the HTF9 locus was determined by interspecific backcross analysis, using progeny generated by mating (C57BL/6J × Mus spretus)F1 females to C57BL/6J males as previously described (Buchberg et al., 1988). This interspecific backcross mapping panel has been typed for over 660 loci distributed among all the autosomes as well as the X chromosome. C57BL/6J and M. spretus DNAs were digested with several enzymes and analyzed by Southern-blot hybridization for informative RFLPs using the HTF9 genomic probe. Fragments of 5.1 and 3.3 kb were detected in SphI-digested C57BL/6J DNA; fragments of 11.0 and 7.0 kb were detected in SphI-digested M. spretus DNA. The 11.0 and 7.0-kb M. spretus SphI-specific RFLPs were used to follow the segregation of the HTF9 locus in backcross mice. The two fragments cosegregated and the mapping results indicated that HTF9 is located in the proximal region of chromosome 16 linked to Prm-1, Igl-1, Smst, and Gap43.

The following probes and RFLPs were used for the linkage studies around the HTF9 locus. The Prm-1 probe (prm-1) was a 400-bp mouse cDNA (Kleene et al., 1983) that detected a 4.4-kb TaqI fragment in C57BL/6J DNA and a 1.1-kb fragment in M. spretus DNA. The Igl-1 probe was a 1250-bp mouse cDNA containing the Vλ1 and Cλ1 regions (Scott et al., 1982). X&I digestion and hybridization with the Igl-1 probe produced fragments of 11.0, 7.2, 6.1, 3.9 kb in C57BL/6J DNA and of 13.0, 11.5, 6.1 and 1.9 kb in M. spretus DNA. The 13.0 and 1.9-kb M. spretus fragments cosegregated in these studies. The Smst probe (pMST-1.4) was a 1.4-kb DmI mouse genomic clone (O'Hara et al., 1988) that detected an approx. 18-kb BglI fragment in C57BL/6J DNA and an approx. 6.2-kb BglI fragment in M. spretus DNA. The probe for the Gap43 locus was a 700-bp &I-EcoRI rat cDNA clone (Neve et al., 1987) that detected a 3.0-kb TaqI fragment in C57BL/6J DNA and 3.5- and 1.7-kb TaqI fragments in M. spretus DNA. The 3.5- and 1.7-kb M. spretus fragments cosegregated in these studies.

Although 90 mice were analyzed for every marker and are shown in the segregation analysis (Fig. 5), up to 189 mice were typed for some markers. Each locus was analyzed in pairwise combinations for recombination frequencies using the additional data. Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns. The ratios of mice exhibiting recombinant chromosomes to the total number of mice analyzed for each pair of loci and the most likely gene order are: centromere - Prm-1 - 4/95 - HTF9 - 1/189 - Igl-1 - 1/135 - Smst - 16/97 - Gap43. Recombination frequencies (expressed as genetic distances in cM ± the standard error) are Prm-1 - 4.2 ± 2.1 - HTF9 - 0.5 ± 0.5 - Igl-1 - 0.7 ± 0.7 - Smst - 16.5 ± 3.8 - Gap43. The placement of HTF9 between Prm-1, which is located on human chromosome 16, and Igl-1, which is located on human chromosome 22, suggests that the human homolog of HTF9 could be on either chromosome (Fig. 5).

(e) Conservation of the genes from HTF9 in the human genome
Fig. 5. Position of HTF9 on mouse chromosome 16 determined by interspecific backcross analysis. (C57BL/6J × M. spretus)F1 females were mated to C57BL/6J mice. A total of 205 F2 progeny were obtained. DNA isolation, restriction analysis, agarose gel electrophoresis and Southern hybridizations were as described (Jenkins et al., 1982). Blots were prepared with Zetabind membranes (AMF-Cuno). Washing was generally done to a final stringency of 0.6 × SSCP (20 × SSCP = 2.4 M NaCl/0.3 M Na3 citrate/0.3 M Na2HPO4/80 mM NaH2PO4)/0.1% SDS at 65°C. The segregation of HTF9 and flanking genes in 90 backcross animals that were typed in common for HTF9 is shown at the top of the figure. For individual pairs of loci, more than 90 animals were typed. Each column represents the chromosome identified in the backcross progeny that was inherited from the (C57BL/6J × M. spretus)F1 parent. Blackened boxes represent the presence of a C57BL/6J allele and open boxes represent the presence of a M. spretus allele. The number of offspring inheriting each type of chromosome is listed at the bottom of each column. A partial chromosome 16 linkage map is shown at the bottom of the figure. Recombination distances were calculated as described (Green, 1981) using the computer program SPRETUS MADNESS developed by D. Dave (Data Management Services, Inc., Frederick, MD) and A.M. Buchberg (NCI-FCRDC, ABL-BRP, Frederick, MD) and are shown in cM below the chromosome. Above, the positions of all loci except HTF9 in human chromosomes (Reeves et al., 1989) are shown.
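The genetic distances shown in Fig. 5 follow directly from the recombinant ratios given in section (d). The sketch below assumes the simple backcross estimate, in which the recombination fraction is r = recombinants/total and its standard error is the binomial value sqrt(r(1 - r)/n); it is an illustration under that assumption and does not reproduce the SPRETUS MADNESS program itself. The function and variable names are introduced for this example only.

```python
import math


def genetic_distance_cm(recombinants, total):
    """Return (distance, standard error) in cM for one pair of loci."""
    r = recombinants / total
    se = math.sqrt(r * (1.0 - r) / total)  # binomial standard error
    return 100.0 * r, 100.0 * se


# Recombinant ratios as reported in section (d), in map order.
INTERVALS = [
    ("Prm-1", "HTF9", 4, 95),
    ("HTF9", "Igl-1", 1, 189),
    ("Igl-1", "Smst", 1, 135),
    ("Smst", "Gap43", 16, 97),
]

for left, right, recombinants, total in INTERVALS:
    distance, se = genetic_distance_cm(recombinants, total)
    print(f"{left} - {right}: {distance:.1f} +/- {se:.1f} cM")
```

Rounded to one decimal place, these four intervals give 4.2 ± 2.1, 0.5 ± 0.5, 0.7 ± 0.7 and 16.5 ± 3.8 cM, in agreement with the values quoted in the text.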
We wished to assess whether the HTF9-associated genes were conserved in the human genome. Southern analysis showed that the human homologue of HTF9 is a single-copy sequence which maintains the characteristics of a typical CpG island, as judged by the frequency of HpaII sites and by the extensive lack of methylation at these sites (data not shown). Attempts to investigate the gene organization in relation to the human HTF9 island were difficult for several reasons: first, the HTF9-A cDNA hybridized to many genomic bands, as would be expected if pseudogenes existed at genomic locations other than HTF9; second, the first exon of the HTF9-A gene, which falls in the CpG-rich domain and would have been suitable for investigating the linkage of the gene 5' ends in the human genome (see map in Fig. 1), is nearly entirely noncoding in mouse and may have diverged in humans.

Conservation of expression from the human HTF9 homologue was investigated in Northern-blot experiments. Poly(A)+ mRNA was purified from human adenocarcinoma HeLa cells and hybridized with the mouse genomic HTF9 probe and with the cDNA clones for both genes (Fig. 6). The hybridization pattern of the genomic probe (lane 2) was virtually identical to that observed in mouse: the human transcript sizes corresponded to that of the unique HTF9-C transcript and to the cluster of HTF9-A transcripts. Hybridization using the cDNA clones as probes showed that the abundance of the HTF9-A and HTF9-C transcripts (Fig. 6, lanes 1 and 3, respectively) in human cells was very similar to that seen in mouse. Indeed, the proportion between the short and large HTF9-A mRNAs was also maintained, suggesting that the alternative polyadenylation signals were also conserved in the human HTF9-A cDNA. The high-stringency conditions used in this experiment indicate that very little divergence has occurred in the coding sequences of either gene. Thus, both HTF9-associated genes are active in human cells. The detection of a single-copy island sequence in the human genome using the mouse HTF9 probe, which also hybridizes to both human transcripts, suggests that the genes are still linked at a human genomic locus containing a CpG-island homologous to the mouse HTF9.

Fig. 6. Northern-blot hybridization of HeLa poly(A)+RNA. HeLa RNA was extracted using the guanidine·HCl method, and poly(A)+RNA was purified by oligo(dT) cellulose chromatography and run on 1.2% agarose/2.2 M formaldehyde gels as described (Lavia et al., 1987). Hybridizations and washes were carried out at 65°C using: (1) the HTF9-A cDNA clone, (2) the genomic subclone pL9.2 containing HTF9 and (3) the HTF9-C cDNA clone. The migration of the 28S and 18S RNA bands is indicated.

(f) Conclusions

We have analyzed the molecular organization of the mouse HTF9-A and HTF9-C genes that are divergently transcribed from a bidirectional promoter contained in the CpG-island HTF9. The exon/intron structure of both genes was determined over 12 kb around HTF9. The HTF9-C gene, which is strictly single-copy, was entirely mapped to the right of the island. On the other hand, mapping of the HTF9-A gene on the left-hand side of HTF9 was not possible due to the occurrence of multiple hybridization signals, possibly reflecting the existence of several pseudogenes.

The close proximity of the bidirectional genes may indicate a genomic compartment of high gene density. The mouse divergent surf-1 and surf-2 genes originate in the surfeit locus, which contains at least six tightly linked genes (Williams et al., 1988; Huxley and Fried, 1990). Similarly, divergent transcripts have been identified in the major histocompatibility complex class-III locus (Sargent et al., 1989). The HTF9 locus is also included in a chromosome region with closely clustered CpG-islands, each of which is likely to mark a novel gene, as seen by PFGE mapping (Brown and Bird, 1986). We have now assigned this region to the proximal end of mouse chromosome 16. Interestingly, the composite map of chromosome 16 shows two sites of viral integration (namely, Akv-2 and Mtv-6) at the approximate location of HTF9, which also suggests an active chromosome domain.

We have fully sequenced both mouse cDNAs. Both products from the HTF9 locus represent novel proteins. The deduced sequence of the HTF9-A protein is similar to that of certain DNA-binding proteins. On the other hand, no obvious function can at present be attributed to the HTF9-C gene from inspection of the sequence alone. The presence of mature messengers from both genes in early mouse embryos and in at least eight differentiated tissues (Lavia et al., 1987) suggests that both products serve some basic function. Both transcripts are also expressed in human cells. The high conservation of the coding sequences is indicated by the high stringency conditions at which heterologous hybridization was detected in human cells.

In a number of systems, divergently transcribed genes from a shared promoter encode related products: this is well exemplified by the Drosophila genes encoding the H2a and H2b histones, which are designed to assemble into nucleosome particles, by the yeast GAL1 and GAL10 genes that are co-induced by the binding of Gal-4 to their bidirectional promoter, and by the human α1(IV) and α2(IV) collagen genes whose products also assemble in the mature collagen molecule. In contrast, the HTF9-associated HTF9-A and HTF9-C gene sequences are totally unrelated and the predicted protein characteristics would result in totally different secondary structures which are unlikely to serve similar functions.

All known bidirectional genes in mammals are linked to a CpG-island. Therefore bidirectional activity has been thought of as an intrinsic property of CpG-rich promoters, possibly reflecting the absence of a positioning TATA box (Melton, 1987; Johnson and Friedmann, 1990) and/or the abundance of sites for transcription factors that are bidirectionally active, such as Sp1 (Dynan et al., 1986). This view is consistent with the ability of elements derived from certain CpG-rich promoters [for example, those of the human HPRT (Melton, 1987), of the human c-Ha-ras (Lowndes et al., 1989) and of the mouse c-Ki-ras (Hoffman et al., 1987) genes] to direct bidirectional transcription in vitro. However, this capability is clearly not absolute and does not apply, for example, to the CpG-island associated with the human PGK gene, which does not drive antisense transcription in similar in vitro assays (Johnson and Friedmann, 1990).

Bidirectional transcription may be associated with coordinate expression of genes that may be structurally unrelated yet are similarly required. The HTF9-A and HTF9-C transcript origins are exceptionally close in the mouse genome, their major start sites falling within 30 bp, and thus the genes can be expected to share some regulatory sequence. A functional study of the HTF9-associated bidirectional promoter supports this view (Somma et al., 1991). In this light, it may be of interest that the levels of transcription of both genes in human cells are similar to those observed in mouse tissues. Cloning of the human HTF9 homologue and S1 mapping of the human transcript origins will be required to ascertain whether the linkage of the divergent gene 5' ends is also maintained in the human genome.
ACKNOWLEDGEMENTS

We thank Paola Lucarelli for the gift of human genomic Southern filters, Roger Greeves for probes and D.A. Swing, C.M. Silan and B. Cho for excellent technical assistance. This work was supported by Consiglio Nazionale delle Ricerche, by Progetto Finalizzato Ingegneria Genetica and by Fondazione Cenci-Bolognetti (PL) and by the National Cancer Institute, DHHS, under contract NO1-CO-74101 with ABL.

ADDENDUM

A significant similarity (54%, including identical and conserved aa) between the HTF9-A and HMG1 proteins was observed after sorting the protein database with the PROT-SCAN program (DNA Star Inc., Computer Research Resources, Madison, WI).

REFERENCES

Antequera, F., Macleod, D. and Bird, A.P.: Specific protection of methylated CpGs in mammalian nuclei. Cell 58 (1989) 509-517.
Bird, A.P.: CpG-islands as gene markers in the vertebrate nucleus. Trends Genet. 3 (1987) 342-347.
Bird, A.P., Taggart, M., Frommer, M., Miller, O.J. and Macleod, D.: A fraction of the mouse genome that is derived from islands of non-methylated, CpG-rich DNA. Cell 40 (1985) 91-99.
Brown, W.R.A. and Bird, A.: Long-range restriction mapping of mammalian genomic DNA. Nature 322 (1986) 477-481.
Buchberg, A.M., Bedigian, H.G., Taylor, B.A., Brownell, E., Ihle, J.N., Nagata, S., Jenkins, N.A. and Copeland, N.G.: Localization of Evi-2 to chromosome 11: linkage to other proto-oncogene and growth factor loci using interspecific backcross mice. Oncogene Res. 2 (1988) 149-165.
Burbelo, P.D., Martin, G.R. and Yamada, Y.: α1(IV) and α2(IV) collagen genes are regulated by a bidirectional promoter and a shared enhancer. Proc. Natl. Acad. Sci. USA 85 (1988) 9679-9682.
Carotti, D., Palitti, F., Lavia, P. and Strom, R.: In vitro methylation of CpG-rich islands. Nucleic Acids Res. 17 (1989) 9219-9229.
Estivill, X., Farrall, M., Scambler, P., Bell, G., Hawley, K., Lench, N., Bates, G., Kruyer, H., Frederick, P., Stanier, P., Watson, E., Williamson, R. and Wainwright, B.: A candidate for the cystic fibrosis locus isolated by selection for methylation-free islands. Nature 326 (1987) 840-845.
Feinberg, A. and Vogelstein, B.: A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 137 (1984) 266-267.
Gardiner-Garden, M. and Frommer, M.: CpG islands in vertebrate genomes. J. Mol. Biol. 196 (1987) 261-282.
Green, E.L.: Linkage, recombination and mapping. In: Genetics and Probability in Animal Breeding Experiments. Macmillan, New York, 1981, pp. 77-113.
Hoffman, E., Trusko, S., Freeman, N.A. and George, D.: Structural and functional characterisation of the promoter region of the mouse c-Ki-ras gene. Mol. Cell. Biol. 7 (1987) 2592-2596.
Huxley, C. and Fried, M.: The mouse surfeit locus contains a cluster of six genes associated with four CpG-rich islands in 32 kilobases of genomic DNA. Mol. Cell. Biol. 10 (1990) 605-614.
Jenkins, N.A., Copeland, N.G., Taylor, B.A. and Lee, B.K.: Organization, distribution and stability of endogenous ecotropic murine leukemia virus DNA sequences in chromosomes of Mus musculus. J. Virol. 43 (1982) 26-36.
Johnson, P. and Friedmann, T.: Limited bidirectional activity of two housekeeping gene promoters: human HPRT and PGK. Gene 88 (1990) 207-213.
Kleene, K.C., Distel, R.J. and Hecht, N.B.: cDNA clones encoding cytoplasmic poly(A)+RNAs which first appear at detectable levels in haploid phases of spermatogenesis in the mouse. Develop. Biol. 98 (1983) 455-464.
Lavia, P., Macleod, D. and Bird, A.: Coincident start sites for divergent transcripts at a randomly selected CpG-rich island of mouse. EMBO J. 6 (1987) 2773-2779.
Linton, J.P., Yen, J., Selby, E., Chen, Z., Chinsky, J.M., Lin, K., Kellems, R.E. and Crouse, G.F.: Dual bidirectional promoters at the mouse DHFR locus: cloning and characterization of two mRNA classes of the divergently transcribed rep-1 gene. Mol. Cell. Biol. 9 (1989) 3058-3072.
Lowndes, N.F., Paul, J., Wu, J. and Allan, M.: c-Ha-ras gene bidirectional promoter expressed in vitro: location and regulation. Mol. Cell. Biol. 9 (1989) 3758-3770.
Melton, D.: Strategies and mechanisms for the control of transcriptional initiation of mammalian protein-coding genes. J. Cell. Sci. 88 (1987) 267-270.
Neve, R.L., Perrone-Bizzozero, N.I., Finklestein, S., Zwiers, H., Bird, E., Kurnit, D.M. and Benowitz, L.I.: The neuronal growth-associated protein Gap-43 (B50-F1): neuronal specificity, developmental regulation and regional distribution of the human and rat mRNAs. Mol. Brain Res. 2 (1987) 177-183.
O'Hara, B.F., Bendotti, C., Reeves, R.H., Oster-Granite, M.L., Coyle, J.T. and Gearhart, J.D.: Genetic mapping and analysis of somatostatin in Snell dwarf mice. Mol. Brain Res. 4 (1988) 283-292.
Rappold, G., Stubbs, L., Labeit, S., Crkvenjakov, B. and Lehrach, H.: Identification of a testis-specific gene from the mouse t-complex next to a CpG-rich island. EMBO J. 6 (1987) 1975-1980.
Reeves, R.H., Gearhart, J.D., Hecht, N.B., Yelick, P., Johnson, P. and O'Brien, S.J.: Mapping of Prm-1 to human chromosome 16 and tight linkage of Prm-1 and Prm-2 on mouse chromosome 16. J. Hered. 80 (1989) 442-446.
Sargent, C.A., Dunham, I. and Campbell, R.D.: Identification of multiple HTF island-associated genes in the human major histocompatibility complex class III region. EMBO J. 8 (1989) 2305-2312.
Scott, C.L., Mushinski, J.F., Huppi, K., Weigert, M. and Potter, M.: Amplification of immunoglobulin λ constant genes in populations of wild mice. Nature 300 (1982) 757-760.
Somma, M.P., Pisano, C. and Lavia, P.: The housekeeping promoter from the mouse CpG-island HTF9 contains multiple protein-binding elements that are functionally redundant. Nucleic Acids Res. 19 (1991) in press.
Toniolo, D., Persico, M. and Alcalay, M.: A 'housekeeping' gene on the X chromosome encodes a protein similar to ubiquitin. Proc. Natl. Acad. Sci. USA 85 (1988) 851-855.
Wickens, M.: How the messenger got its tail: addition of poly(A) in the nucleus. Trends Biochem. Sci. 15 (1990) 277-281.
Williams, T., Yon, J., Huxley, C. and Fried, M.: The mouse surfeit locus contains a very tight cluster of four housekeeping genes that is conserved through evolution. Proc. Natl. Acad. Sci. USA 85 (1988) 3527-3530.