Gene, 128 (1993) 227-236 © 1993 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/93/$06.00
GENE 07138
The human erythropoietin-encoding gene contains a CAAT box, TATA boxes and other transcriptional regulatory elements in its 5’ flanking region
(Gene expression; transcription signals; growth factors; hormone; hematopoiesis)
Sylvia Lee-Huangᵃ, Jih-Jing Linᵇ, Hsiang-fu Kungᵇ, Philip Lin Huangᶜ, Leo Leeᵇ and Paul Lee Huangᶜ
ᵃDepartment of Biochemistry, New York University School of Medicine, New York, NY 10016, USA; ᵇLaboratory of Biochemical Physiology, National Cancer Institute-Frederick Cancer Research and Development Center, Frederick, MD 21701, USA. Tel. (301) 8463703; and ᶜDepartment of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA. Tel. (617) 726-2000
Received by M. Bagdasarian: 2 December 1992; Accepted: 12 January 1993; Received at publishers: 1 March 1993
SUMMARY
We have reported the cloning and expression of a human erythropoietin (hEp)-encoding cDNA [Lee-Huang, Proc. Natl. Acad. Sci. USA 81 (1984) 2708-2712]. Using this hEp cDNA as a probe, we isolated a 9.3-kb BamHI genomic Ep clone from a human leukocyte library soon thereafter. The size and restriction map of this clone are in agreement with restriction analysis of human genomic DNA probed with the hEp cDNA, demonstrating that this clone is representative of the single hEp gene. This clone is unique in that it extends beyond any reported hEp genomic clone by 3.9 kb on the 5’ side and by 1.8 kb on the 3’ side. The promoter function of the newly described 5’-flanking region has been demonstrated by the expression of biologically active hEp in transfected cells. We find that, despite reports to the contrary, hEp does contain classic canonical TATA boxes and a CAAT box. The 5’-flanking region also contains cytokine-responsive consensus sequences, tissue-specific and metal-responsive elements, CRE and GRE sites, and binding sites for transcription factors, including AP1, NF-κB and Sp1. These regulatory elements have not been found in the hEp genomic clones thus far reported. The identification of these elements and their precise localization in hEp should be useful in studying the regulation of hEp expression, as well as in gene therapy and physiologic modulation of this hormone.
INTRODUCTION
Correspondence to: Dr. S. Lee-Huang, Department of Biochemistry, New York University School of Medicine, New York, NY 10016, USA. Tel. (212) 2634135; Fax (212) 26343166.
Abbreviations: aa, amino acid(s); AIDS, acquired immunodeficiency syndrome; AP, activator protein; bp, base pair(s); cAMP, cyclic AMP; CRE, cAMP-recognition sequence; Ep, erythropoietin; hEp, human Ep; hEp, gene (DNA) encoding hEp; GM-CSF, granulocyte-macrophage colony-stimulating factor; GM-CSF, gene encoding GM-CSF; GRE, glucocorticoid-recognition sequence; HIV, human immunodeficiency virus; hsp, heat-shock protein(s); IL, interleukin; kb, kilobase or 1000 bp; MRE, metallo-responsive element(s); NF, nuclear factor; nt, nucleotide(s); RE, IL-responsive element(s); tsp, transcription start point(s).
Erythropoietin (Ep) is a 34-kDa glycoprotein hormone that serves as the prime regulator of mammalian red blood cell production. Expression of Ep is developmentally regulated and tissue specific. In the developing fetus, Ep is produced by the liver (Zanjani et al., 1981), but after birth, it is produced by the kidney (Jacobson et al., 1957). Anemia and hypoxic stress increase Ep production significantly over basal levels (Boundurant et al., 1986; Throling et al., 1986). Ep gene expression is also stimulated by cobalt chloride, hypoxia and interleukin-6 (IL-6) in human hepatoma cell lines (Goldberg et al., 1987; Faquin et al., 1992). Analysis of hEp expression in
transgenic mice reveals that sequences extending beyond the 5’-flanking region of the reported 3.6-kb fetal liver Ep gene are essential for tissue-specific and inducible expression (Semenza et al., 1989; 1990). Multiple transcription initiation sites and cis-acting regulatory regions were observed in transgenic animals as well as in human hepatoma cells (Semenza et al., 1989; 1990; 1991a). However, no corresponding transcription start signals or regulatory elements were identified to account for these observations. Despite the importance of this gene and recent advances in the field, remarkably little is known about the molecular mechanisms involved in its tissue-specific production, developmental regulation, and inducible expression. The hEp genomic clones that have been described thus far do not contain TATA boxes or CAAT boxes (Lin et al., 1985; Jacobs et al., 1985; Egrie et al., 1992). The lack of such elements was surprising and difficult to explain. Furthermore, the hEp gene lacks the initiator sequences that are normally present when the promoter lacks a classic TATA box (McKnight et al., 1982; 1987).
We have reported the isolation of a human Ep cDNA clone (Lee-Huang, 1984). Using this hEp cDNA as a probe, we isolated a 9.3-kb genomic clone from a human leukocyte genomic library soon thereafter. This hEp genomic clone extends in both directions beyond the Ep genes reported by others. In this paper, we report that the hEp gene does indeed contain classical canonical promoters in its extended 5’-flanking region. In addition, this region also contains cytokine consensus sequences, tissue-specific and metal-responsive elements (e.g., CRE, GRE), as well as binding sites for transcription factors, including AP1, NF-κB and Sp1. These regulatory elements have not been found in the hEp genomic clones reported by others (Lin et al., 1985; Jacobs et al., 1985; Egrie et al., 1992).
RESULTS AND DISCUSSION
(a) Isolation, characterization and genomic organization of the hEpSLH clone
Human Ep cDNA was radiolabelled with 32P by the nick-translation method (Rigby et al., 1977). The cDNA was used to screen a human leukocyte genomic library that we cloned in λgt11. Hybridizing phage were plaque purified and phage DNA was prepared using established methods. The 9.3-kb genomic clone hEpSLH was selected for further analysis.
(2) Characterization of the genomic clone
The BamHI insert of hEpSLH was subcloned into pUC19 for large-scale preparation. The clone was characterized by restriction analysis and blotting (Southern, 1975) using Ep cDNA as a probe. The sequence of the genomic DNA was determined by the dideoxy method (Sanger et al., 1977) using M13 mp18 and mp19 subclones.
(3) Genomic organization of the hEpSLH clone
Fig. 1 shows the results of restriction and Southern blot analysis of human genomic DNA and the hEpSLH clone using the coding region of the hEp cDNA as a probe. As seen in Fig. 1A, BamHI digestion of human leukocyte DNA results in a single hybridizing band of 9.3 kb (lane 1). This is the same size as the BamHI insert of hEpSLH (lane 2), indicating that hEpSLH represents the same single Ep gene as reported from the fetal liver. 32P-labelled HindIII fragments of λ DNA were used as size markers (lane 3). Figs. 1B and 1C are Southern blot and restriction digestion results of hEpSLH, respectively. Lanes 3 and 7 in both figures are the 9.3-kb BamHI insert of hEpSLH, the starting material of the restriction digestion. Lanes 1 and 4 are nearly complete and partial SacI digests. The 9.3-kb insert of hEpSLH has a unique SacI site, and complete digestion results in two fragments. One fragment, about 6.9 kb, contains the 5’-flanking region and the entire coding region of the clone. It is thus detected by the hEp cDNA probe (Fig. 1B and 1C, lanes 1 and 4). The other fragment, about 2.3 kb, consists mainly of the extended 3’-flanking region and does not hybridize to the hEp cDNA. The 9.3-kb hEpSLH insert has a unique HindIII site, and partial digestion with this enzyme (Fig. 1C, lane 2) produces two fragments in addition to the undigested 9.3-kb band. One 3.9-kb fragment contains the entire extended new 5’-flanking region of the clone. The other, a 5.4-kb fragment, contains the previously reported 3.6-kb portion of the hEp gene and an additional 1.8 kb of the extended 3’ region. Only the uncut 9.3-kb insert and the 5.4-kb fragment hybridize to the hEp cDNA probe (Fig. 1B, lane 2); the 3.9-kb band does not.
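The fragment sizes described above follow from simple bookkeeping on a linear insert with a single cut site. The following is a minimal sketch in Python, not part of the original analysis. The function names are hypothetical, the insert length is rounded to 9300 bp, and the HindIII site is assumed to lie 3892 bp from the 5’ BamHI end (the length reported in this paper for the extended 5’-flanking sequence).

```python
def digest(length_bp, cut_sites_bp):
    """Fragment lengths (5' to 3') after a complete digest of a linear DNA."""
    bounds = [0] + sorted(cut_sites_bp) + [length_bp]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def partial_digest(length_bp, cut_sites_bp):
    """All fragment lengths that can appear when each site is either cut or not."""
    bounds = [0] + sorted(cut_sites_bp) + [length_bp]
    sizes = {
        bounds[j] - bounds[i]
        for i in range(len(bounds))
        for j in range(i + 1, len(bounds))
    }
    return sorted(sizes, reverse=True)


# Assumed values for illustration only (see the paragraph above).
INSERT_BP = 9300        # approximate size of the BamHI insert ("9.3 kb")
HINDIII_SITE_BP = 3892  # assumed position of the unique HindIII site

complete = digest(INSERT_BP, [HINDIII_SITE_BP])
partial = partial_digest(INSERT_BP, [HINDIII_SITE_BP])
print(complete)  # two fragments, roughly 3.9 kb and 5.4 kb
print(partial)   # the uncut insert plus the two fragments
```

Under these assumptions the complete digest gives the 3.9-kb and 5.4-kb fragments, and the partial digest additionally retains the uncut 9.3-kb band, as seen in lane 2 of Fig. 1C.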
These results indicate that the hEpSLH clone extends beyond the fetal liver Ep clones by 3.9 kb at the 5’ end, and 1.8 kb at the 3’ end. These results were also confirmed by DNA sequence data. Lanes 5 and 6 are HincII and BstEII digests of hEpSLH, respectively, and lane 8 is unlabelled size marker, the HindIII fragments of λ DNA. Fig. 2 shows the restriction map of hEpSLH, and a schematic representation of its genomic organization. The extended 5’-flanking region reported here is directly continuous through the HindIII site with the 3.6-kb clone reported by others (Lin et al., 1985; Jacobs et al., 1985), as shown. We have determined the complete nt sequence of the hEpSLH clone. The coding region consists of five exons
and four introns, with intron/exon boundaries flanked by consensus donor and acceptor splice sites. This organization and the size of the exons in hEpSLH are identical to those reported for the fetal liver clones. In the region where they overlap, the sequence of hEpSLH is 97% identical to the fetal liver genomic clones. These results will be reported elsewhere. Over 99% identity was also found in the 5’-flanking region where the clones overlap.
Fig. 1. Southern hybridization and restriction endonuclease analysis of human leukocyte DNA and hEpSLH. Electrophoresis was carried out in 1% agarose using Tris·phosphate buffer, and the resolved DNA fragments were transferred to a nitrocellulose filter and hybridized with 32P-labeled hEp cDNA. The filter was washed at 65°C with 0.15 M NaCl/15 mM Na3·citrate pH 7.0 containing 1% SDS, and exposed to film. (A) Southern hybridization using 32P-labeled hEp cDNA as probe; lanes: 1, human genomic DNA digested with BamHI; 2, hEpSLH BamHI insert; 3, molecular weight marker, 32P-labeled λ DNA/HindIII digest. (B) Southern hybridization of restriction digests of hEpSLH insert DNA using 32P-labeled hEp cDNA as probe; lanes: 1, SacI digest; 2, HindIII partial digest; 3, uncut hEpSLH insert; 4, SacI partial digest; 5, HincII; 6, BstEII; 7, uncut hEpSLH insert; 8, unlabeled λ DNA/HindIII digest (see panel C). (C) Ethidium-bromide-stained gel, same samples as in panel B.
Fig. 2. Restriction map and genomic structure of hEpSLH. Restriction map and schematic representation of the genomic structure of the 9.3-kb hEpSLH genomic clone, and comparison with the 3.6-kb genomic clone reported by others (Lin et al., 1985; Jacobs et al., 1985). The positions of the 9.3-kb BamHI-BamHI insert and the 5.4-kb HindIII-BamHI fragment used in constructs listed in Table I are shown.
(b) Expression of biologically active erythropoietin
For expression of the hEpSLH gene, the 9.3-kb BamHI fragment and other constructs containing various regions of the gene were cloned in a eukaryotic expression vector, pSVL, and used to transfect COS-7 cells (ATCC1651) by the Ca·phosphate method (Wigler et al., 1977). Biologically active hEp was produced in COS-7 cells that had been transiently transfected with pSVL containing the genomic hEpSLH 9.3-kb BamHI-BamHI or 5.4-kb HindIII-BamHI fragments under the control of the SV40 late promoter. Ep production was measured by the in vivo exhypoxic polycythemic mice bioassay (Cotes, 1961; Lee-Huang, 1980). About 12-19 units/ml or 7-11 units/ml of hEp were detected in the 9.3-kb or 5.4-kb construct, respectively. When the SV40 late promoter was deleted, only 3-5 units/ml or <1 unit/ml was detected in the 9.3-kb or 5.4-kb construct, respectively (Table I).
TABLE I
Expression of hEp from hEpSLH constructs in COS-7 cells

No.  Constructᵃ             Size (kb)  Promoterᵇ  hEpᶜ production (units/ml)
1.   hEpSLH BamHI-BamHI     9.3        +          12-19
2.   hEpSLH BamHI-BamHI     9.3        -          3-5
3.   hEpSLH HindIII-BamHI   5.4        +          7-11
4.   hEpSLH HindIII-BamHI   5.4        -          <1

ᵃConstructs contain either the entire 9.3-kb BamHI-BamHI insert of hEpSLH (1 and 2) or a 5.4-kb HindIII-BamHI fragment located at the 3’ end of hEpSLH (3 and 4), as shown in Fig. 2.
ᵇThe fragments of hEpSLH were placed in the constructs in the presence (+) or absence (-) of the SV40 late promoter in pSVL.
ᶜhEp production was measured by the in vivo polycythemic mouse assay (Cotes, 1961; Lee-Huang, 1980).

TABLE II
A selected list of transcriptional regulatory elements and their positions in hEpSLH

Elementᵃ                 Locationᵇ (nt)    Sequence (5’ to 3’)
TATA box                 1954              TATAAA
                         2006              TATAAA
                         2304              TATAAA
CAAT box                 875               AGCCACT
Lymphokine consensus     1271              GGGGTTTCAC
                         1871              GAGGTTTCAC
NF-IL6                   304               TGGGAGA
                         319               TGGGGAA
                         432               TGGGGGA
                         442               TGGGGGA
                         508               TGAGGGA
                         601               TGGGAGA
                         1837              TGAGAGA
AP1                      2271              TGACTTCT
                         2909 (both dir)   CTGACTAA
AP2                      896               TCCCCCTCCC
                         2621              TCCCCCACCC
                         2785              TCGCCCAGGC
AP4                      3040              TCAGCTGCGG
CRE                      1437              AGACGTCA
GR-MT                    2015              TGTCCT
                         2646              TGTCCT
                         3371              TGTCCT
                         3847              TGTCCT
GR-Utero, GR/PR-MMTV     3295              TGTTCT
                         3841              AGAACA
MRE                      2232              TGCACAC
NF-κB                    1271              GGGGTTTCAC
                         3199              GGGAATCTC
Sp1 (GC box)             2390 (both dir)   CCCCGCCC
                         1969 (both dir)   GGCGGG

ᵃThe locations of the transcription regulatory elements listed are shown in Fig. 4.
ᵇThe nt positions, measured from the BamHI site at the 5’ end of hEpSLH as shown in Fig. 2, are numbered according to Fig. 3. Sites which are present in both 5’ to 3’ direction and 3’ to 5’ direction are listed as ‘both dir’.

(c) Identification of regulatory elements in the 5’-flanking region
(1) Nucleotide sequence
Fig. 3 shows the nt sequence of the extended 5’-flanking region from the BamHI site to the HindIII site of hEpSLH. A computer-aided homology search of the sequence of the 3892 bp in the extended 5’-flanking region against GenBank did not reveal significant homology with any published nt sequence. In addition to the CAAT and TATA boxes, many potential transcriptional regulatory elements were also identified in this region. Table II shows a select few of
these elements and their positions. A schematic map of these regulatory elements is presented in Fig. 4.
(2) Classical promoter elements: TATA boxes and CAAT box
Most eukaryotic promoters for RNA polymerase II contain the sequence TATAA (TATA box) located about 25 to 30 bp upstream from tsp and the sequence CCAAT (CAAT box) located 60 to 80 bp upstream from tsp (McKnight and Kingsbury, 1982). The TATA box is recognized by transcription factor TFIID, and the TFIID-promoter complex directs the assembly of an initiation complex with RNA polymerase II and other initiation
factors (McKnight and Tjian, 1987). The rate of transcription is specified by the interaction of DNA sequence-specific binding proteins and their recognition sites at the TATA box, at upstream activating sequences, and at initiator elements at the tsp itself. The human Ep genomic clones thus far reported do not contain classic TATA boxes or CAAT boxes (Lin et al., 1985; Jacobs et al., 1985). It has thus been suggested that the Ep gene may be similar to ‘housekeeping genes’ that also lack such elements. However, this has been difficult to reconcile with the fact that Ep is not a ‘housekeeping gene’, and its expression is regulated in a precise tissue-specific and developmental manner, and in response to specific stimuli such as hypoxia or cytokines. Furthermore, typical initiator sequences that are normally present when the promoter lacks a classic TATA box (McKnight et al., 1982; 1987) are absent in the hEp gene.
Fig. 4. Identification of the classical canonical CAAT box, TATA boxes and potential transcriptional regulatory elements of hEp. The nt sequence of hEpSLH was analyzed using the Genetics Computer Group programs (GCG sequence analysis software package, GCG, Inc., Madison, WI, USA). Sequence comparisons were carried out using BESTFIT and GAPSHOW programs. Localization of sequences corresponding to transcription factor binding sites was done using the FINDPATTERN program. A schematic representation of locations of the CAAT box, TATA boxes, and other selected transcriptional regulatory elements in the 5’-flanking region of hEpSLH is shown. Table II shows the exact nt positions of these elements.
We found that the hEpSLH clone does contain three TATA boxes with the canonical sequence TATAAA located at 1954, 2006, and 2304. In addition, three potential weak TATA boxes with the sequence TACAAA are found at nt positions 1754, 1778, and 3554. The identification of multiple promoters with both canonical and noncanonical TATA sequences in hEpSLH is in agreement with, and indeed provides an explanation for, observations of multiple tsp. Studies of Ep expression in transgenic animals and in the hepatoma cell lines Hep3B and HepG2 (Semenza et al., 1989; 1990; 1991; Imagawa et al., 1991) have mapped tsp to a putative site that is 219 nt upstream from the start codon. However, all of these studies, which use RNase protection or primer extension, detect larger transcripts that have been unexplained. RNase protection of transcripts of the human Ep gene in transgenic mice
showed protection up to the 5’ end of several probes used. Control experiments showed that these larger transcripts were clearly from the human transgene and not from mouse Ep mRNA. These transcripts were also seen in Hep3B cells, showing that they do not result simply from the location on a transgene. Our finding that the extended 5’-flanking region contains canonical TATA boxes provides an explanation for these observations. The number and size of the transcribed but non-translated exons from this extended 5’-flanking region are not yet known. It is possible that relative transcription from upstream promoters such as the ones we report here (resulting in longer RNA) and less well-defined downstream promoters (resulting in shorter RNA) is regulated differentially in different tissues and under conditions of basal expression or induction. This genomic organization is similar to that of the GM-CSF gene and the IL-3 gene, where transcription from a site close to the majority of the coding sequences is detected using RNA probes from this region, but alternative transcripts that arise very far upstream (some over 10 kb upstream) are also present (Stanley, 1985). These longer transcripts are associated with a cytokine consensus sequence, and may be related to coordinate expression of multiple cytokines.
There is a CAAT box present at nt position 875 of the hEpSLH clone, with the sequence 5’-AGCCACT. This is a high-affinity CP2-binding site that has also been found in other human genes such as fibrinogen and H-2Kb (Chodosh, 1988). No sites that bind to CP1 or NF1 (CTF) were found. This CAAT box is located further upstream from the TATA boxes than in most reported genes. It is not known whether it is able to function as a conventional CAAT box from this distance, or whether it serves a different role in hEp expression.
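The unusual spacing noted above can be made concrete with position arithmetic on the nt coordinates given in Table II. The following Python sketch is illustrative only: the helper name `offsets` is hypothetical, and the convention that a negative value means "upstream of the reference" is an assumption chosen to match the sign convention of Table III.

```python
# nt positions measured from the 5' BamHI site of hEpSLH (values from Table II)
TATA_BOXES = [1954, 2006, 2304]        # canonical TATAAA boxes
CAAT_BOX = 875                         # 5'-AGCCACT
LYMPHOKINE_CONSENSUS = [1271, 1871]    # GGGGTTTCAC and GAGGTTTCAC


def offsets(site, references):
    """Signed distance of `site` from each reference position (negative = upstream)."""
    return [site - ref for ref in references]


caat_to_tata = offsets(CAAT_BOX, TATA_BOXES)
print("CAAT box relative to TATA boxes:", caat_to_tata)

for position in LYMPHOKINE_CONSENSUS:
    print(position, "relative to TATA boxes:", offsets(position, TATA_BOXES))
```

With these coordinates the CAAT box lies more than 1 kb upstream of every canonical TATA box, compared with the 60 to 80 bp upstream of tsp that is typical for a CAAT box. The same helper reproduces the TATA-relative distances listed for human Ep in Table III.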
(3) Cytokine regulatory sequences: lymphokine consensus and IL-6-responsive elements (RE)
Interleukin-6 (IL-6) plays a central role in host defense mechanisms by regulating immune responses, hematopoiesis, and acute phase reactions (Kishimoto, 1990). Recently, it was reported that IL-6 also stimulates hypoxia-induced Ep production in Hep3B cells (Faquin et al., 1992). IL-6-responsive elements (RE) can be grouped into two types of consensus sequences (Heinrich et al., 1990; Hattori et al., 1990; Majello et al., 1987). One is a decanucleotide, 5’-GRGRTTYCAY, and the other is a heptanucleotide, 5’-TG($A. The promoter region of hEpSLH contains both sequence motifs. Interestingly, both of these motifs are also found in the IL-6 RE of C-reactive protein. These elements are located at both the proximal and distal regions of the promoter and they are involved in the coordinated regulation of gene expression; the proximal element confers tissue specificity and the distal element confers cytokine responsiveness.
The decanucleotide sequences have been found in the 5’-flanking regions of genes encoding the hematopoietic growth factors GM-CSF, IL-3 and IL-2. These elements are also known as lymphokine consensus elements and they are involved in the coordinate regulation of these lymphokines by each other and by themselves (Heinrich et al., 1990). The Ep gene contains two copies of this lymphokine consensus sequence, one located at nt position 1271 (5’-GGGGTTTCAC) and the other at nt 1871 (5’-GAGGTTTCAC) in hEpSLH. As with the other lymphokines, these consensus sequences are located within several hundred bases upstream from canonical TATA boxes, but are separated from the main protein-coding section of the gene by a significant distance (see Table III). These consensus sequences are thought to be related to basal expression and induced expression under coordinate control.
The heptanucleotide sequences are found in genes whose expression is regulated by IL-6, including acute phase hepatic proteins, C-reactive protein, haptoglobin, and hemopexin (Gauldie et al., 1987). A nuclear factor, NF-IL6, binds to these elements and mediates IL-6 induction (Poli and Cortese, 1989). Seven copies of the IL-6 RE were identified in hEpSLH. Six of these elements are clustered upstream at nt positions 304 (5’-TGGGAGA), 319 (5’-TGGGGAA), 432 (5’-TGGGGGA), 442 (5’-TGGGGGA), 508 (5’-TGAGGGA), and 601 (5’-TGGGAGA), and one is located further downstream at nt position 1837 (5’-TGAGAGA). The presence of multiple copies of these IL-6 RE at specific distal and proximal regions of the Ep gene may play an important role in tissue-specific and inducible coordinated expression of this hormone, as demonstrated by the upregulation of hypoxia-induced Ep production in Hep3B cells by IL-6 (Faquin et al., 1992).
(4) Binding sites for activator proteins AP1, AP2, and AP4
Two AP1-binding sites were found at nt positions 2271 (5’-TGACTTCT) and 2909 (5’-CTGACTAA) of hEpSLH, with the latter being a palindromic site. Three AP2 sites at nt 896 (5’-TCCCCCTCCC), 2621 (5’-TCCCCCACCC), and 2785 (5’-TCGCCCAGGC) and a single AP4-binding site at nt position 3040 (5’-TCAGCTGCGG) were found as well. The transcription factor AP1 is a heterodimer of the jun and fos proto-oncogene products, which dimerize by a leucine-zipper mechanism (Landschulz et al., 1988). AP1 mediates responsiveness to phorbol esters, but may be involved in basal expression as well (Curran and Franza, 1988). While AP1 is found in all cell types, AP2 is a tissue-specific factor, found in HeLa cells but not in hepatocytes (Masayoshi et al., 1987). It mediates response to phorbol esters and cAMP-dependent protein kinase (Cohen, 1985; Nishizuka, 1986). AP4 mediates tissue-specific and developmental regulation (Mermod et al., 1988). It is involved in pituitary cell type-specific expression of prolactin and growth hormone (Nelson et al., 1986). These multiple copies of different types of the AP-binding sites may play a coordinated role in the regulation of Ep production. It has been reported that phorbol ester, cAMP, and steroid hormones all stimulate the production of Ep (Rodgers et al., 1975).

TABLE III
Distance between lymphokine consensus sequences and TATA boxes and ATG start codon in various lymphokine-encoding genes
Lymphokine-encoding geneᵃ: Murine GM-CSF; Murine IL-3; Murine IL-2; Human IL-2; Human Ep
Consensus recognition sequenceᵇ: GAGATTCCAC; GAGATTCCAC; GAGGTTCCAT; GGGATTTCAC; GGGATTTCAC; GGGGTTTCAC; GAGGTTTCAC
Location relative to TATA box(es)ᶜ (nt): -78; -265; -84; -175; -172; -83, -135, -433; -683, -735, -1033
Location relative to ATGᵈ (kb): >10; ≥14; >14; >2; ≥2; >2; >2
ᵃSpecies of origin and the lymphokine-encoding gene.
ᵇConsensus recognition sequence, from 5’ to 3’ end. See Table II.
ᶜThe nt position of consensus sequence relative to TATA boxes of lymphokine-encoding gene. In the case of hEp, the locations relative to each of the three TATA boxes are listed.
ᵈDistance between consensus sequence and start codon ATG in lymphokine-encoding gene.

(5) cAMP-responsive elements (CRE)
cAMP mediates the expression of numerous eukaryotic genes. CRE enhancer sequences, 5’-ACGTCA, are responsible for the induction of many genes in response to increased intracellular cAMP concentration. cAMP brings about the phosphorylation of the transcription factors that bind to the CRE of the DNA and activate transcription. Protein kinase C, which is activated physiologically by diacylglycerols (DAGs) and by the tumor promoter phorbol myristate acetate (PMA), also known as tetradecanoyl phorbol acetate (TPA), also regulates the expression of certain genes. This again involves the phosphorylation of the transcription factor AP1. We have identified a CRE sequence, 5’-ACGTCA, in the promoter region of our hEpSLH at nt position 1438. The regulatory function of this element remains to be defined.
(6) Steroid hormone glucocorticoid-metallo-responsive elements (GRE)
Glucocorticoid-responsive elements (GRE) are in general palindromic, often consisting of an inverted repeat sequence separated by a 3-nt gap, for example, the sequence 5’-AGAACANNNTGTTCT. GRE also include elements responsive to androgens, metallocorticoids (GR-MT) and progestins (Ahe et al., 1985). The action of steroid hormones is mediated by a family of specific DNA-binding proteins that consist of an N-terminal region of variable length, followed by about 60 aa residues containing two zinc fingers, and a C-terminal region that binds the hormone (Yamamoto, 1985). Free receptor proteins are complexed with the heat-shock proteins (hsp) in the cytosol. When the hormone enters the cell it binds to the receptor and causes it to dissociate from the hsp. The hormone-bound receptor is then free to migrate to the nucleus, where it recognizes and binds to its specific responsive sequence via its zinc fingers. Binding of these factors to DNA sites can either stimulate or inhibit transcription. Four copies of the glucocorticoid-metallo-responsive element (GR-MT) 5’-TGTCCT are identified at nt positions 2105, 2646, 3371 and 3847 of the hEp gene. Furthermore, an interesting motif identified as the glucocorticoid-uteroglobin as well as the glucocorticoid/PR-MMTV-responsive element, 5’-TGTTCT, is identified at nt position 3295. Steroid hormones stimulate Ep production and they have been used widely in clinical treatment of anemia and hemopoietic disorders (Alexanian, 1966; Alexanian et al., 1972). The identification of multiple copies of the GRE in the promoter region of hEp and their local assemblies with reference to other regulatory elements may provide further insight into the complex hormonal regulation of Ep expression. Studies on the modulation at GRE by members of the AP1 family, the cooperation of the protein kinase C and A phosphorylation pathways, and the cross-talk between divergent classes of transcription factors and hormone receptors will be exciting challenges in the field of Ep biology.
(7) NF-κB-binding sites
NF-κB is a member of the Rel-related family of transcription factors (Lenardo and Baltimore, 1990). Under basal conditions, these ubiquitous proteins are retained in the cytoplasm by an inhibitor, IκB. Upon the appropriate stimulus (phorbol esters, protein kinase C, eIF-2 kinase), the complex dissociates and the NF-κB subunits dimerize and translocate to the nucleus, where they recognize and bind to specific decanucleotide binding sites, and activate transcription. NF-κB binding is involved in the regulation of expression of IL-1, IL-2, IL-6, β-interferon, TNF-α, and other cytokines. It is also involved in the cytokine-mediated expression of liver proteins in response to inflammation, and may be utilized by viruses to express viral genes as well. Two NF-κB-binding sites are identified in hEpSLH, at nt positions 1271 (5’-GGGGTTTCAC) and 3199 (5’-GGGAATCTC). The 1271 site overlaps with the lymphokine consensus element. Similar overlaps are found in the genes encoding IL-2 and IL-6.
The identification of these binding sites may provide further insight into how this single transcription factor works to regulate gene expression in diverse cell types and under various conditions.
(8) Sp1-binding sites
Sp1 is a ubiquitous transcription factor that selectively binds to G + C-rich regions (GC boxes) with the sequence 5’-GGGCGG or its reversed orientation (Jones and Tjian, 1985). G + C-rich polynucleotide elements have been identified in multiple copies in promoter regions of several viruses including SV40, the herpes virus, and the AIDS retrovirus HIV-1, as well as in many cellular genes. Sp1 is involved in maintaining basal transcription level and its core sequence has been noted in the G + C-rich regions of many enhancers, such as the metal-responsive element (MRE), and hsp. Sp1 contains three zinc fingers at its C terminus. There are three Sp1-binding sites in hEp; two of these are located in the promoter region at nt positions 2390 (5’-CCCCGCCC) and 1969 (5’-GGCGGG) in hEpSLH, and one is located further downstream, 3’ to the HindIII site.
(d) Conclusions
(1) Southern blot analysis indicates that there is a single copy of the Ep gene in the human genome. The genomic organization of the hEpSLH clone and its nt sequence show that this clone represents the same gene as the fetal liver Ep clones that have been reported (Lin et al., 1985; Jacobs et al., 1985). However, hEpSLH extends beyond the previously reported genes by 3.9 kb in the 5’ direction, and 1.8 kb in the 3’ direction. The extended 5’-flanking sequences contain many potential transcriptional regulatory elements.
(2) The 5’-flanking region reported here is required for proper hEp expression, as demonstrated by transient expression in COS-7 cells.
(3) Despite reports to the contrary, we find that the hEp gene does contain classic canonical TATA boxes and a CAAT box. The newly described 5’-flanking region of the gene also contains consensus cytokine sequences, tissue-specific and metallo-responsive elements (CRE, GRE), and binding sites for transcription factors, including AP1, NF-κB, and Sp1. These regulatory elements have not been found in the Ep genomic clones thus far described (Lin et al., 1985; Jacobs et al., 1985; Egrie et al., 1992).
(4) The nt sequence of the extended 5’-flanking region from the BamHI site to the HindIII site of hEpSLH is a new and unreported sequence. A computer-aided homology search of the entire 3892 bp in the 5’-flanking region against GenBank did not reveal significant homology with any published DNA sequence.
(5) The presence of these regulatory elements provides a molecular basis for many observed features of tissue-specific, developmentally regulated and inducible expression of human erythropoietin. The identification and the precise localization of these regulatory elements should be useful in the modulation of human erythropoietin production under physiological and pathological conditions.
ACKNOWLEDGEMENTS
We thank David Liu and Andrew Rich for assistance in the sequencing of hEpSLH and Cheng Hsuan Lee for excellent technical assistance. S.L.-H. acknowledges partial support of her work by NIH grants HL21683 and HL30862. She also wishes to acknowledge the constant support and encouragement by Dr. Henry I. Huang during the course of this work and her convalescence from injury.
REFERENCES
Ahe, D. von der, Janich, S., Scheidereit, C., Renkawitz, R., Schutz, G. and Beato, M.: Glucocorticoid and progesterone receptors bind to the same sites in two hormonally regulated promoters. Nature 313 (1985) 706-711.
Alexanian, R.: Erythropoietin excretion in man following androgens. Blood 28 (1966) 1007.
Alexanian, R., Nadell, J. and Alfrey, C.: Oxymetholone treatment for the anemia of bone marrow failure. Blood 40 (1972) 353-365.
Boundurant, M.C. and Koury, M.J.: Anemia induces accumulation of erythropoietin mRNA in the kidney and liver. Mol. Cell Biol. 6 (1986) 2731-2733.
Chodosh, L.A., Baldwin, A.S., Carthew, R.W. and Sharp, P.A.: Human CCAAT-binding proteins have heterologous subunits. Cell 53 (1988) 11-24.
Cotes, P.M. and Bangham, D.R.: Bioassay of erythropoietin in mice made polycythemic by exposure to air at reduced pressure. Nature 191 (1961) 1065-1067.
Curran, T. and Franza, Jr., B.R.: Fos and Jun: the AP-1 connection. Cell 55 (1988) 395-397.
Egrie, J.C. and Goldwasser, E.: ‘Erythropoietin’. In: Aggarwal, B.B. and Gutterman, J.U. (Eds.), Human Cytokines, Blackwell Scientific Publication, Oxford, 1992, pp. 383-398.
Faquin, C.W., Schneider, T.J. and Goldberg, M.A.: Effect of inflammatory cytokines on hypoxia-induced erythropoietin production. Blood 79 (1992) 1987-1994.
Gauldie, J., Richards, C., Harnish, D., Lansdorp, P. and Baumann, H.: Interferon β2/B-cell stimulatory factor type 2 shares identity with monocyte-derived hepatocyte-stimulating factor and regulates the major acute phase protein response in liver cells. Proc. Natl. Acad. Sci. USA 84 (1987) 7251-7255.
Goldberg, M.A., Glass, G.A., Cunningham, J.M. and Bunn, H.F.: The regulated expression of erythropoietin by two human hepatoma cell lines. Proc. Natl. Acad. Sci. USA 84 (1987) 7972-7976.
Hattori, M., Abraham, L.J., Northemann, W. and Fey, G.H.: Acute-phase reaction induces a specific complex between hepatic nuclear proteins and the interleukin 6 response element of the rat α2-macroglobulin gene. Proc. Natl. Acad. Sci. USA 87 (1990) 2364-2368.
Heinrich, P.C., Castell, J.V. and Andus, T.: Interleukin-6 and the acute phase response. Biochem. J. 265 (1990) 621-636.
Imagawa, S., Goldberg, M.A., Doweiko, J. and Bunn, H.F.: Regulatory elements of the erythropoietin gene. Blood 77 (1991) 278-285.
Jacobs, K., Shoemaker, C., Rudersdorf, R., Neill, S.D., Kaufmann, R.J., Mufson, A., Seehra, J., Jones, S.S., Hewick, R., Fritsch, E.F., Kawakita, M., Shimizu, T. and Miyake, T.: Isolation and characterization of genomic and cDNA clones of human erythropoietin. Nature 313 (1985) 806-810.
Jacobson, L.O., Goldwasser, E., Fried, W. and Plzak, L.: Role of the kidney in erythropoiesis. Nature 179 (1957) 633-634.
Jones, K.A. and Tjian, R.: Sp1 binds to promoter sequences and activates herpes simplex virus ‘immediate-early’ gene transcription in vitro. Nature 317 (1985) 179-182.
Kishimoto, T.: The biology of interleukin-6. Blood 74 (1990) 1-10.
Landschulz, W.H., Johnson, P.F. and McKnight, S.L.: The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240 (1988) 1759-1764.
Lee-Huang, S.: A new preparative method for isolation of human urinary erythropoietin with hydrophobic interaction chromatography. Blood 56 (1980) 620-624.
Lee-Huang, S.: Cloning and expression of human erythropoietin cDNA in E. coli. Proc. Natl. Acad. Sci. USA 81 (1984) 2708-2712.
Lenardo, M.J. and Baltimore, D.: NF-κB: a pleiotropic mediator of inducible and tissue-specific gene control. Cell 58 (1990) 227-229.
Lin, F.-K., Suggs, S., Lin, C.-H., Browne, J.K., Smalling, R., Egrie, J.C., Chen, K.K., Fox, G.M., Martin, F., Stabinsky, Z., Badrawi, S.M., Lai, P.H. and Goldwasser, E.: Cloning and expression of human erythropoietin gene. Proc. Natl. Acad. Sci. USA 82 (1985) 7580-7584.
Majello, B., Arcone, R., Toniatti, C. and Ciliberto, G.: Constitutive and IL-6-induced nuclear factors that interact with the human C-reactive protein promoter. EMBO J. 9 (1990) 457-465.
Masayoshi, I., Chiu, R. and Karin, M.: Transcription factor AP-2 mediates induction by two different signal-transduction pathways: protein kinase C and cAMP. Cell 51 (1987) 251-260.
McKnight, S.L. and Kingsbury, R.: Transcriptional control signals of a eukaryotic protein-coding gene. Science 217 (1982) 316-324.
McKnight, S.L. and Tjian, R.: Transcriptional selectivity of viral genes in mammalian cells. Cell 46 (1987) 795-805.
Mermod, N., Williams, T.J. and Tjian, R.: Enhancer binding factors AP-4 and AP-1 act in concert to activate SV40 late transcription in vitro. Nature 332 (1988) 557-561.
Nelson, C., Crenshaw III, E.B., Franco, R., Lira, S.A., Albert, V.R., Evans, R.M. and Rosenfeld, M.G.: Discrete cis-active genomic sequences dictate the pituitary cell type-specific expression of rat prolactin and growth hormone genes. Nature 322 (1986) 557-562.
Nishizuka, Y.: Studies and perspectives of protein kinase C. Science 233 (1986) 305-310.
Poli, V. and Cortese, R.: Interleukin-6 induces a liver-specific nuclear protein that binds to the promoter of acute-phase genes. Proc. Natl. Acad. Sci. USA 86 (1989) 8202-8206.
Pugh, C.W., Tan, C.C., Jones, R.W. and Ratcliffe, P.J.: Functional analysis of an oxygen-related transcriptional enhancer lying 3’ to the mouse erythropoietin gene. Proc. Natl. Acad. Sci. USA 88 (1991) 10553-10557.
Rigby, P.W.J., Dieckmann, M., Rhodes, C. and Berg, P.: Labeling of deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113 (1977) 237-241.
Rodgers, G.M., Fisher, J.W. and George, W.J.: The role of renal adenosine 3’,5’-monophosphate in the control of erythropoietin production. Am. J. Med. 58 (1975) 31-38.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Semenza, G.L., Traystman, M.D., Gearhart, J.D. and Antonarakis, S.E.: Polycythemia in transgenic mice expressing the human erythropoietin gene. Proc. Natl. Acad. Sci. USA 86 (1989) 2301-2305.
Semenza, G.L., Dureza, R.C., Traystman, M.D., Gearhart, J.D. and Antonarakis, S.E.: Human erythropoietin gene expression in transgenic mice: multiple transcription initiation sites and cis-acting regulatory elements. Mol. Cell. Biol. 10 (1990) 930-938.
Semenza, G.L., Koury, S.T., Nejfelt, M.K., Gearhart, J.D. and Antonarakis, S.E.: Cell-type-specific and hypoxia-inducible expression of the human erythropoietin gene in transgenic mice. Proc. Natl. Acad. Sci. USA 88 (1991a) 8725-8729.
Semenza, G.L., Nejfelt, M.K., Chi, S.M. and Antonarakis, S.E.: Hypoxia-inducible nuclear factors bind to an enhancer element located 3’ to the human erythropoietin gene. Proc. Natl. Acad. Sci. USA 88 (1991b) 5680-5684.
Southern, E.: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98 (1975) 503-517.
Stanley, E., Metcalf, D., Sobieszczuk, P., Gough, N.M. and Dunn, A.R.: The structure and expression of the murine gene encoding granulocyte-macrophage colony stimulating factor: evidence for utilisation of alternative promoters. EMBO J. 4 (1985) 2569-2573.
Thorling, E.B. and Erslev, A.J.: The tissue tension of oxygen and its relation to hematocrit and erythropoiesis. Blood 31 (1968) 332-343.
Wigler, M., Silverstein, S., Lee, L.S., Pellicer, A., Cheng, Y.-C. and Axel, R.: Transfer of purified herpes virus thymidine kinase gene to cultured mouse cells. Cell 11 (1977) 223-232.
Yamamoto, K.R.: Steroid receptor regulated transcription of specific genes and gene networks. Annu. Rev. Genet. 19 (1985) 209-252.
Zanjani, E.D., Ascensao, J.L., McGlave, P.B., Banisadre, M. and Ash, R.C.: Studies on the liver to kidney switch of erythropoietin production. J. Clin. Invest. 67 (1981) 1183-1188.