Gene 211 (1998) 195–203
Structure of the Drosophila DNA topoisomerase I gene and expression of messages with different lengths in the 3′ untranslated region
Sheryl D. Brown 1, Claire X. Zhang, Alice D. Chen, Tao-shih Hsieh *
Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
Received 7 August 1997; received in revised form 6 November 1997; accepted 6 November 1997; Received by C.M. Kane
Abstract
The nucleotide sequence of the Drosophila DNA topoisomerase I gene (top1) has been determined. Structurally, top1 consists of eight exons and seven introns. The top1 coding region contains a new class of opa repeats, encoding clusters of serine residues instead of the glutamine repeats usually seen in Drosophila genes of the neurogenic loci. A unique feature of top1 is the developmental switch of its transcripts: a heterogeneous population of transcripts ranging from 3.8 to 4.2 kb is seen maximally at 0–2 h of embryogenesis, and a 5.2-kb transcript is maximal at 6–12 h of embryonic development. The transcripts expressed in the 0–2-h embryo have been shown to be maternal storage products specific to ovarian tissues. RACE analysis shows that whereas the 6–12-h transcripts have a single site for polyadenylation, there are at least 12 different sites for poly(A) addition to the 0–2-h transcripts. An additional intron specific for the maternal storage transcripts appears in some of the 0–2-h transcripts. No significant heterogeneity at the 5′ end of the top1 transcripts is seen. Sequence searches have revealed a number of regulatory sequences for potential translational control in the 3′ untranslated region. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Development; Genome; Polyadenylation sites; Structural analysis
* Corresponding author. Tel: +1 919 684 6501; Fax: +1 919 684 8885; e-mail: [email protected]
1 Present address: Department of Human Genetics, Merck Research Laboratories, West Point, PA 19486, USA.
Abbreviations: ARE, AU-rich element; CPE, cytoplasmic polyadenylation element; Dm, Drosophila melanogaster; Dv, Drosophila virilis; PCR, polymerase chain reaction; poly(A), polyadenylation; RACE, rapid amplification of cDNA ends; RP49, ribosomal protein 49 gene; top1, DNA topoisomerase I gene.
0378-1119/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII: S0378-1119(98)00119-X
1. Introduction
Type I DNA topoisomerases are involved in many aspects of DNA metabolism including replication, transcription, repair, and recombination [reviewed in Vosberg (1985), Hsieh (1993), Gupta et al. (1995) and Wang (1996)]. DNA topoisomerase I is not essential for the viability of the unicellular organisms bacteria and yeasts. Bacterial mutants lacking topoisomerase I grow slowly but are viable. However, they generally acquire compensatory mutations, often in gyrase genes, which can lower the superhelical density to physiological limits (DiNardo et al., 1982; Pruss et al., 1982). Experiments using mutants of Saccharomyces cerevisiae and Schizosaccharomyces pombe demonstrate that whereas
they have drastically decreased topoisomerase I activity (less than 1%), these cells do grow, albeit more slowly than wild-type cells (Thrash et al., 1984, 1985; Uemura and Yanagida, 1984; Uemura et al., 1987). In these instances, topoisomerase II can substitute for topoisomerase I. Topoisomerase I does, however, have biological functions distinct from those of topoisomerase II. Mutants of Saccharomyces cerevisiae have been isolated that require topoisomerase I expression for viability or normal growth (Sadoff et al., 1995). In particular, mutations have been isolated that are synthetically lethal with top1 but do not affect topoisomerase II. Whereas DNA topoisomerase I (top1) is dispensable in bacteria and yeast, it is essential for the development of multicellular organisms, including Drosophila melanogaster (Lee et al., 1993) and mouse (Morham et al., 1996). In Drosophila, the regulation of top1 protein expression correlates well with the DNA replication events during development. In mouse, embryos with homozygous null alleles of top1 die between the four- and 16-cell stages. All these observations suggest that DNA topoisomerase I plays crucial biological roles in higher eukaryotes. A cDNA encoding Drosophila DNA topoisomerase I
has previously been isolated and characterized (Hsieh et al., 1992). When a Drosophila cDNA library was screened for a gene specific for eukaryotic DNA topoisomerase I, the cDNA clones isolated fell into two groups. Sequence analysis of the two groups of clones, one 5.2 kb in length and the other 3.8 kb, shows that they have identical protein-coding regions but differ in their 3′ untranslated regions, 1955 versus 612 nucleotides, respectively. Examination of the expression of the top1 gene during the various stages of the Drosophila life cycle reveals two populations of top1 transcripts expressed differentially during development (Lee et al., 1993). A 5.2-kb transcript homogeneous in size is maximal at 6–12 h of embryonic development, a time of highest DNA synthesis. In contrast, a heterogeneous population of top1 transcripts ranging from 3.8 to 4.2 kb in length is maximal at 0–2 h of development and is seen again in adult flies. Recent studies have demonstrated the importance of the 3′ untranslated region in the regulation of gene expression, especially for maternal or early embryonic transcripts [reviewed in Richter (1996) and Wickens et al. (1996)]. Maternal transcripts are synthesized and stored in a translationally inactive form in the growing oocyte and are then activated during oocyte maturation, at fertilization, or in early embryogenesis. The 3′ untranslated region houses the cis-acting elements that are required for controlling polyadenylation, determining message stability, regulating translation initiation, and determining cytoplasmic localization of mRNA. The differing lengths of the 3′ untranslated regions of the two top1 cDNA species and the two populations of top1 transcripts differentially expressed during development are the key features that have prompted us to further analyze the structure and regulation of the gene encoding Drosophila DNA topoisomerase I.
2. Materials and methods
2.1. Cloning, nucleotide sequencing, and DNA sequence analysis
Molecular cloning was performed according to standard procedures (Sambrook et al., 1989). top1 genomic DNA fragments from the two isolated top1 genomic clones λgtop1-1 and λgtop1-2 (Lee et al., 1993) were cloned into the Bluescript vector (Stratagene Cloning Systems, La Jolla, CA). The fragments g1-1 and g1-2 of λgtop1-1 and the fragments g2-1, g2-2, and g2-3 of λgtop1-2 were also cloned into the Bluescript vector (see Fig. 1). Nested deletion series were generated by exonuclease III digestion. Regions of DNA covering the entire transcribed region of the top1 gene within the Drosophila genome were sequenced on both strands according to the Sanger dideoxynucleotide chain termination method, using T7 DNA polymerase (US Biochemical Corporation, Cleveland, OH) and also 5′ exo−-Taq DNA Polymerase (a gift from Dr David Tu, Pennsylvania State University). The nucleotide sequence data were compiled by the Pustell and MacVector DNA analysis programs (International Biotechnologies, New Haven, CT) and were deposited in GenBank under the Accession No. U80064. Sequence homology comparisons using BLAST were performed using the National Center for Biotechnology Information’s BLAST WWW Server.
2.2. Northern blot
Total RNA was isolated from 0–2-h and 6–12-h Drosophila embryos, and RNA blots were performed as previously described (Lee et al., 1993). Approximately 5 µg of total RNA samples were run on a 1.1% agarose gel containing formaldehyde and analyzed by blot hybridization, using 32P-nick-translated DNA probes made from the 5.2-kb top1 cDNA and also from the ribosomal protein gene RP49 (O’Connell and Rosbash, 1984), used as a control for the amount of total RNA loaded. Using T7 RNA polymerase, RNA size markers were made as in-vitro transcription products from linearized plasmid DNA of the top1 cDNAs and their subcloned fragments.
2.3. Rapid amplification of cDNA ends (RACE)
Both 3′ and 5′ RACE were performed with modifications (Frohman et al., 1988). The adapter primer 5′ CAGGATCCTCGAGAAGC 3′ and the adapter-dT17 primer were synthesized. Remaining oligonucleotides referred to in this section are numbered from λctop1-2 sequences. For 3′ RACE, two oligos representing nucleotides 2883–2900 and nucleotides 3145–3162 (oligos 16 and 53, respectively, in Fig. 3A) were used in separate PCR amplifications of the 0–2-h cDNA pool. An oligo of nucleotides 4278–4295 (oligo 27 in Fig. 5B) was used in amplification of the 6–12-h cDNA pool. For 5′ RACE, an oligo of nucleotides 369–351 was used in the 0–2-h reverse transcription reaction, and an oligo of nucleotides 507–490 was used in the 6–12-h reverse transcription reaction.
For 5′ RACE PCR amplification, an oligo representing nucleotides 288–269 was used for the 0–2-h cDNA pool, and an oligo of nucleotides 294–278 was used for the 6–12-h cDNA pool. Approximately 1 µg of total RNA from 0–2-h and 6–12-h Drosophila embryos was used in reverse transcription reactions using AMV reverse transcriptase with (dT)17 as the primer (Amersham, Arlington Heights, IL). The resulting cDNAs were PCR-amplified, digested with restriction enzymes, cloned into the Bluescript vector, and transformed into AG1 cells (Stratagene Cloning Systems, La Jolla, CA). Screening for transformants was done by colony hybridization
Fig. 1. A physical map showing the genomic structure of the top1 gene. The lines labeled λgtop1-1 and λgtop1-2 represent the two overlapping λ genomic DNA clones containing Drosophila top1 and the neighboring dah gene that were isolated previously. g1-1 and g1-2 are subclones of λgtop1-1; g2-1, g2-2, and g2-3 are subclones of λgtop1-2. The shaded bars labeled λctop1-2 and λctop1-6 represent the two full-length top1 cDNA clones that were previously isolated. The shaded areas represent the exons of the top1 and dah genes, and the darkly shaded areas within the lightly shaded areas represent their protein coding regions. Introns are depicted as the open boxes between shaded exons. The translational start codon AUG and stop codons UAA and UGA are indicated. Poly(A) addition sites are represented by An. The following restriction sites are indicated: SpeI, Sp; XbaI, X; PstI, P; EcoRI, R; NruI, N; HpaI, H; and BamHI, B.
methods (Sambrook et al., 1989). For screening 3′ RACE transformants, a labeled DNA probe specific for the EcoRI–BamHI fragment contained in p[top1] was used (see Fig. 3). 5′ RACE transformants were screened with a labeled DNA probe specific for the 5′ EcoRI fragment of λctop1-2.
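The primer coordinates quoted in Section 2.3 can be tabulated to make the length and orientation of each oligo explicit. The following is a minimal sketch: the dictionary labels and the helper name `describe` are illustrative, and reading a coordinate pair whose first number is larger than the second as an antisense primer is an assumption of this sketch rather than something stated in the text.

```python
# Primer coordinates as quoted in Section 2.3 (numbered on the top1 cDNA).
# ASSUMPTION: a pair written high-to-low (e.g. 369-351) denotes a primer
# complementary to the mRNA (antisense); low-to-high denotes a sense primer.
PRIMERS = {
    "3' RACE PCR, 0-2 h (oligo 16)": (2883, 2900),
    "3' RACE PCR, 0-2 h (oligo 53)": (3145, 3162),
    "3' RACE PCR, 6-12 h (oligo 27)": (4278, 4295),
    "5' RACE RT, 0-2 h": (369, 351),
    "5' RACE RT, 6-12 h": (507, 490),
    "5' RACE PCR, 0-2 h": (288, 269),
    "5' RACE PCR, 6-12 h": (294, 278),
}


def describe(start, end):
    """Return (length in nucleotides, inferred strand) for one primer."""
    length = abs(end - start) + 1  # coordinates are inclusive
    strand = "sense" if start <= end else "antisense"
    return length, strand


if __name__ == "__main__":
    for name, (start, end) in PRIMERS.items():
        length, strand = describe(start, end)
        print(f"{name}: {start}-{end}, {length} nt, {strand}")
```

Under this reading, the three 3′ RACE oligos are sense primers and the four 5′ RACE oligos are antisense primers, which is what amplification towards the respective cDNA ends would require.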
3. Results and Discussion
3.1. Structure of the Drosophila top1 gene
To study the structure and regulation of the gene encoding DNA topoisomerase I in Drosophila, we have set out to isolate the genomic locus of the top1 gene. Using the top1 cDNA as a probe, we have screened Drosophila genomic DNA libraries and have isolated several recombinant phages. Among them, λgtop1-1 and λgtop1-2 are two overlapping λ clones that span a region of over 20 kb of the Drosophila genome [Fig. 1; see also Lee et al. (1993)]. This cloned genomic region covers
the entire transcribed region of two Drosophila genes, top1 and dah. dah is a maternal-effect gene that regulates the cytoskeletal structure during early embryogenesis (Zhang et al., 1996). With EcoRI digestion and subcloning, λgtop1-1 has been subdivided into the fragments g1-1 and g1-2 and λgtop1-2 into the fragments g2-1, g2-2, and g2-3 (Fig. 1). Using these cloned DNA fragments as templates, a nucleotide sequence determination from SpeI to the last EcoRI site, 13 951 bp in total, has been made (Fig. 1). Comparison of the genomic DNA sequence reported here (GenBank Accession No. U80064) with the sequences of the top1 cDNA (Hsieh et al., 1992; GenBank Accession No. M74557) shows that the gene encoding Drosophila DNA topoisomerase I comprises eight exons and seven introns (shaded boxes and open boxes, respectively, in Fig. 1). Exon 8 is more precisely referred to as exon 8-short and exon 8-long, which correspond to the last exon of the top1 cDNAs λctop1-6 and λctop1-2, respectively. All of the top1 introns have the canonical GT and AG dinucleotides at their 5′ and 3′ ends, respectively (Keller and Noon, 1985; Mount et al.,
1992). The branch point consensus A, which is conserved as the 3′ splice signal in more than 80% of Drosophila introns, is seen in all of the top1 introns (Table 1). The locations of two introns are conserved between the fly and the human enzymes. Relative to the amino acid sequences of these homologous proteins, the Drosophila introns 4 and 7 are located at sites identical to those of the human introns 8 and 20, respectively. These sites are found within regions of highly conserved amino acid sequences. It is possible that these intron positions were present prior to the divergence between the two species (Marchionni and Gilbert, 1986). However, they do not demarcate the protein structural domains of human topoisomerase I, as revealed by limited proteolysis and sequence comparisons (Stewart et al., 1996a,b). No introns have been detected in the topoisomerase I genes isolated from Saccharomyces cerevisiae (Thrash et al., 1985) and the human malarial parasite Plasmodium falciparum (Tosh and Kilbey, 1995). Schizosaccharomyces pombe contains two small introns in the 5′ part of its topoisomerase I gene (Uemura et al., 1987). We have confirmed by genomic Southern experiments that the cloned genomic regions have not undergone any rearrangement or alteration (data not shown). Using hybridization probes made from g1-2, g2-2, and g2-3, one or two major fragments of Drosophila genomic DNA have been specifically recognized after single restriction digestions. Interestingly, when g2-1 is used as a probe, a smear of hybridization and many other distinct bands are detected in addition. Further genomic Southern experiments along with sequence searches in the Drosophila database have revealed that specific sequence repeats in the top1 gene are homologous to repetitive elements seen in several genes from Drosophila melanogaster and Drosophila virilis (Table 2).
Several stretches of AGC repeats encoding serine residues at the top1 amino terminus are homologous to CAG repeats encoding glutamines in some of the genes of the neurogenic loci, including mastermind and Notch.
These glutamine stretches represent a family of the repeated motif termed opa, (CAX)n, where X is G or A (Table 2) (Wharton et al., 1985). At least 12% of known Drosophila melanogaster proteins contain multiple long homopeptides, predominantly glutamine runs, and a large majority of these proteins are essential developmental proteins, which primarily have roles in central nervous system development (Karlin and Burge, 1996).
3.2. Developmental expression and tissue specificity of the top1 transcripts
Earlier studies have determined the developmental expression of the top1 transcripts (Lee et al., 1993). We have previously shown that at 0–2 h of embryogenesis, the expression of a population of top1 transcripts ranging in size from 3.8 to 4.2 kb is maximal (Fig. 2, lane 1). Also, a 5.2-kb top1 transcript is maximal at 6–12 h of development (Fig. 2, lane 2). Comparable expression of both species of top1 transcripts has been seen in adult flies. To further define the expression of these transcripts, we have since determined the tissue specificity of the top1 transcripts using λctop1-2 as a probe in the Northern analysis of adult female and male flies, their ovaries and testes, respectively, and their remaining carcasses (Fig. 2). The 3.8- to 4.2-kb top1 transcripts are specific only to the ovaries of adult female flies (Fig. 2, lane 4), thus confirming the maternal storage and expression of these messages. However, all samples analyzed, including the ovaries, possess the 5.2-kb top1 transcript (Fig. 2, lanes 3–8). These data demonstrate that whereas the heterogeneous population of top1 transcripts is ovarian tissue-specific, the homogeneous 5.2-kb top1 transcript is present in all adult fly tissues. The heterogeneity of Drosophila top1 transcripts appears to be unique among the developmentally regulated transcripts, which are usually characterized by only several, no more than seven, distinct species (Edwalds-Gilbert et al., 1997).
In Xenopus laevis, two distinct forms of DNA topoisomerase I have been found: a 165-kDa form specific to oocytes and a 110-kDa form
Table 1
Sequences at the splice junctions of the Drosophila top1 gene

Intron      Size (NT)   5′ end               3′ splice variant           3′ end
Consensus               (C/A)AG|GT(A/G)AGT   (c/T)T(A/G)A(C/T)N15–26     (T/C)≥11N(C/T)AG|(G/A)
Intron 1    1342        AAT|GTAAGT           CTGATN35                    CTCTCTCGGCCGCAG|A
Intron 2     489        TCG|GTACGT           CTAATN8                     CCTTTTACATTTTAG|C
Intron 3      71        AAG|GTAGGC           ATAATN6                     AATTTTTTTTGTCAG|G
Intron 4    1095        ATG|GTGAGT           CTAACN6                     CATCCTACATTCCAG|G
Intron 5      74        AAG|GTGCGT           CTTACN14                    TGGCCAACGCAATAG|G
Intron 6      61        CAT|GTAAGT           TTAATN3                     TATAAACCAATGCAG|T
Intron 7      81        TTG|GTGAGT           CTAATN12                    ATCATCCATTTGCAG|G
Intron 8     385        AAG|GTACGC           CTAACN32                    TCCAATATATGTCAG|T
Consensus sequences for Drosophila splice junctions are from Keller and Noon (1985) and are in good agreement with sequences described in Mount et al. (1992).
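The statement that all top1 introns carry the canonical GT and AG dinucleotides can be checked directly against the intron-side sequences listed in Table 1. The sketch below is only an illustration of that check; the dictionary `INTRONS` and the function name `has_canonical_ends` are illustrative names, and the sequences are transcribed from the table with the exon-side nucleotides omitted.

```python
# Intron-side boundary sequences transcribed from Table 1:
# (first six intron nucleotides, last fifteen intron nucleotides).
INTRONS = {
    1: ("GTAAGT", "CTCTCTCGGCCGCAG"),
    2: ("GTACGT", "CCTTTTACATTTTAG"),
    3: ("GTAGGC", "AATTTTTTTTGTCAG"),
    4: ("GTGAGT", "CATCCTACATTCCAG"),
    5: ("GTGCGT", "TGGCCAACGCAATAG"),
    6: ("GTAAGT", "TATAAACCAATGCAG"),
    7: ("GTGAGT", "ATCATCCATTTGCAG"),
    8: ("GTACGC", "TCCAATATATGTCAG"),
}


def has_canonical_ends(five_prime, three_prime):
    """True if the intron starts with GT and ends with AG."""
    return five_prime.startswith("GT") and three_prime.endswith("AG")


# One boolean per intron number.
results = {n: has_canonical_ends(*ends) for n, ends in INTRONS.items()}

if __name__ == "__main__":
    for n, ok in results.items():
        print(f"intron {n}: GT...AG = {ok}")
```

All eight entries, including the additional intron 8 found in some maternal transcripts, pass this check.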
Table 2
Homology of top1 sequences to repetitive elements

1   Dm top1 cDNA            375   SSSSSSS
                                  CAGCAGCAGCAGCAGCAGCAGCAA
    Dm rutabaga ADC        4215   CAGCAGCAGCAGCAGCAGCAGCAA
                                  QQQQQQQ

2   Dm top1 cDNA            378   SSSSSS
                                  CAGCAGCAGCAGCAGCAGCA
    Dm 75B mRNA             605   CAGCAGCAGCAGCAGCAGCA
                                  QQQQQQ

    Dm top1 cDNA            622   SSSS
                                  AGCAGCAGCAGCAA
    Dm 75B mRNA            1298   AGCAGCAGCAGCAA
                                  SSSS

3   Dm top1 cDNA            378   SSSSSS
                                  CAGCAGCAGCAGCAGCAGCAA
    Dv mastermind gene      439   CAGCAGCAGCAGCAGCAGCAA
                                  QQQQQQQ

    Dm top1 cDNA            621   SSSS
                                  gAGCAGCAGCAGCAa
    Dv mastermind gene     3112   cAGCAGCAGCAGCAg
                                  QQQQQ

    Dm top1 cDNA            764   SSS
                                  GCAGCAGCAGCAG
    Dv mastermind gene     3111   GCAGCAGCAGCAG
                                  QQQQ

4   Dm top1 cDNA            375   SSSSSSS
                                  CAGCAGCAGCAGCAGCAGCAGCAA
    Dm mastermind gene      961   CAGCAGCAGCAGCAGCAGCAGCAA
                                  QQQQQQQQ

    Dm top1 cDNA            765   SSSS
                                  GCAGCAGCAGCAGt
    Dm mastermind gene      969   GCAGCAGCAGCAGc
                                  QQQQ

5   Dm top1 cDNA            375   SSSSSSS
                                  CAGCAgCAGCAGCAGCAGCAGCA
    Dm opa repeat           101   CAGCAaCAGCAGCAGCAGCAGCA

Specific sequences of the top1 gene are homologous to repetitive elements seen in several genes of Drosophila melanogaster, Dm, and Drosophila virilis, Dv. Triplet codons AGC and AGT encode serine (S). Triplet codons CAG and CAA encode glutamine (Q) residues. An S or Q above or below a nucleotide denotes the beginning of that triplet encoded. Homologous nucleotides are indicated in capital letters; non-homologous nucleotides are not capitalized. The beginnings of each sequence shown are numbered according to the nucleotide sequences from their GenBank (gb) or EMBL (emb) Accession Numbers. Dm top1 cDNA, gb-M74557; (1) Dm rutabaga adenylyl cyclase, gb-M81887; (2) Dm 75B, a hypothetical protein encoded by mRNA from inducible puff 75B, emb-X15586; (3) Dv mastermind gene, gb-M92914; (4) Dm mastermind gene, emb-X52451; (5) Dm opa repetitive element from the Notch locus, gb-M12175.
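The relationship described in the Table 2 legend, in which the same nucleotide repeat encodes serine in one reading frame and glutamine in another, can be illustrated with a short sketch. The codon table below contains only the four codons named in the legend; the function name `translate` and the use of "?" for any other codon are choices made for this illustration.

```python
# Only the codons named in the Table 2 legend; everything else maps to "?".
CODONS = {"AGC": "S", "AGT": "S", "CAG": "Q", "CAA": "Q"}


def translate(seq, frame):
    """Translate seq starting at the 0-based offset `frame`.

    Incomplete trailing codons are ignored.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(CODONS.get(codon, "?") for codon in codons)


# First aligned block of Table 2 (identical in top1 and rutabaga).
repeat = "CAGCAGCAGCAGCAGCAGCAGCAA"

if __name__ == "__main__":
    print("frame 0:", translate(repeat, 0))  # CAG/CAA codons
    print("frame 1:", translate(repeat, 1))  # AGC codons
```

Read from the first nucleotide, the block is a run of CAG and CAA codons (glutamine); shifted by one nucleotide, the same block is a run of AGC codons (serine), which is the frame indicated for top1 in the table.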
specific to somatic cells (Richard and Bogenhagen, 1991). Experiments have determined that a large amount of the somatic topoisomerase I message is maternally stored in oocytes, yet is translationally dormant there (Pandit et al., 1996).
3.3. Characterization of the maximally expressed top1 transcripts
To determine the structural differences between these developmentally regulated top1 transcripts, the procedure termed rapid amplification of cDNA ends (RACE) (Frohman et al., 1988) has been used to characterize the top1 RNA from the developmental stages of the Drosophila life cycle which have maximal top1 gene expression, during 0–2 and
6–12 h of embryogenesis. Both the 3′ and 5′ ends of these RNAs have been examined, using reverse transcription and the polymerase chain reaction to synthesize the sequences at both the 3′ and 5′ ends. The resulting amplification products have been analyzed by molecular cloning and nucleotide sequencing. 3′ RACE experiments have allowed mapping of the polyadenylation sites for the top1 transcripts maximally expressed during Drosophila development. Thirty-seven different 3′ end amplification products reveal at least 12 different polyadenylation sites for the 0–2-h top1 transcripts (Fig. 3A). Poly(A) is added at sites ranging from nucleotides 3642 to 4251 of the cDNA clone of λctop1-2. Two of the amplification products are polyadenylated at the same site as that determined for the cDNA clone of λctop1-6. Also, an additional intron, 385 bp in size,
Fig. 2. Developmental expression and tissue specificity of the top1 transcripts. Ovaries and testes were dissected from adult flies. Total RNA was extracted from 0–2-h Drosophila embryos (lane 1), 6–12-h Drosophila embryos (lane 2), whole female flies (lane 3), ovaries (lane 4), remaining female heads and bodies after ovary dissection (lane 5), whole male flies (lane 6), testes (lane 7), and remaining male heads and bodies after testis dissection (lane 8). RNAs were separated on a 1.1% agarose gel containing formaldehyde, transferred to nitrocellulose, and hybridized to probes made from the 5.2-kb top1 cDNA (upper panel) and RP49 as a control for the amount of total RNA loaded (lower panel). The sizes of RNA markers (lanes M) made from in-vitro transcription products from linearized plasmid DNA of the top1 cDNAs are indicated.
is spliced out in the 3′ untranslated region of some maternal transcripts of the top1 gene (marked with asterisks on lines in Fig. 3A). Like the other seven top1 introns, the newly determined eighth intron conforms to the consensus sequence for the splicing of Drosophila introns (Table 1). The multiple sites of polyadenylation and the alternative splicing of this additional intron account for the molecular heterogeneity of the 0–2-h top1 transcripts. Experiments have also been performed in which 0–2-h total RNA has been incubated with and without oligo(dT) prior to RNase H treatment. The results show that the poly(A) tract length of these transcripts, less than 100 adenosine residues, does not significantly contribute to this heterogeneity in length (data not shown). The exact mechanism for the multiple poly(A) addition sites is not known but may be due to a reduced specificity of polyadenylation and cleavage during oogenesis. In marked contrast, four different 6–12-h 3′ end amplification products confirm a precise polyadenylation site for the 6–12-h top1 transcripts. All show that poly(A) is added just one nucleotide downstream from nucleotide 5132, where poly(A) is added to the cDNA clone of λctop1-2 (Fig. 3B). This downstream nucleotide, T, is consistent with the genomic sequence. It is likely that this T was lost during the initial cDNA library construction and that during the
cDNA screening process, the clone of λctop1-2 lacking this nucleotide was selected. The single poly(A) addition site for the 6–12-h transcripts is also consistent with previous Northern data, showing a single zygotic transcript at this developmental stage [Fig. 2, lane 2; see also Lee et al. (1993)]. 5′ RACE has been performed for the 0–2-h and the 6–12-h top1 transcripts. Fig. 4 depicts the results of these experiments, investigating the initiation of transcription for the top1 gene. Of the 14 amplification products generated for the 0–2-h transcripts (depicted as open circles in Fig. 4), their 5′ ends fall within six different sites, ranging from 25 to 67 nucleotides upstream of the first exon as observed in the cloned cDNA of λctop1-2 or λctop1-6. Results of the four 5′ end amplification products determined for the 6–12-h top1 transcripts are shown as the closed circles in Fig. 4. One is initiated 36 bp upstream of cDNA exon 1, whereas the other three are initiated 40 nucleotides upstream of the first cDNA exon. Only the site ATCAGCC beginning 67 bp upstream of cDNA exon 1 conforms best to the consensus for Drosophila transcription start sites, ATCA(G/T)T(C/T) (Snyder et al., 1982; Hultmark et al., 1986). Therefore, it is possible that transcription actually starts near this site, whereas the other 5′ ends observed in the cDNA and RT/PCR clones are generated because of the stalling of reverse transcriptase near the 5′ ends of the mRNA. Sequence analysis of the top1 gene shows that its upstream promoter region lacks TATA and CCAAT boxes. It is possible that top1 mRNA initiates at heterogeneous sites within a narrowly defined region. Cellular housekeeping genes, generally lacking upstream regulatory elements like TATA and CCAAT boxes, often initiate transcription at multiple, closely spaced sites (Stout and Caskey, 1985).
Therefore, the 5′ RACE results demonstrate that both the 0–2-h and 6–12-h transcripts are initiated in a narrowly defined region of sequence and that this limited amount of heterogeneity at the 5′ end does not account for the size variations observed in the 0–2-h message.
3.4. Examining regulatory sequences in the 3′ untranslated region of the top1 gene
The differentially expressed top1 transcripts that are characterized by the variation in the polyadenylation sites have prompted us to search for regulatory elements in the sequence of the 3′ untranslated region (Fig. 5). Cytoplasmic polyadenylation has been shown to be essential for activation of translationally dormant maternal mRNAs [reviewed in Wickens (1992), Curtis et al. (1995) and Richter (1996)]. During oocyte maturation, it requires two cis-acting elements, the canonical nuclear polyadenylation signal AAUAAA, usually located 10–30 nucleotides upstream of a poly(A) addition site,
Fig. 3. Analysis of 3′ end amplification products. The exons of the top1 mRNA are represented as boxes beginning with a portion of exon 5. Downward arrowed lines show the translational stop codon UAA and the polyadenylation sites, An, for the two top1 cDNAs. Their positions are marked in (B). Positions of top1 oligos used in PCR amplifications are indicated as flags pointing to the right. (A) Zero- to 2-h 3′ end amplification products. Thirty-seven different 3′ end amplification products are depicted at 12 different sites of polyadenylation with the length of each line proportional to the number of identified products. Representation of a single independent amplification product is shown. Splicing of intron 8 (shaded region within last exon) is seen in two amplification products and is represented as the symbol for an independent amplification product with an asterisk. (B) Six- to 12-h 3′ end amplification products. Four different 3′ end amplification products are depicted at a single site of polyadenylation, just one nucleotide downstream of the polyadenylation site for λctop1-2.
Fig. 4. Analysis of 5′ end amplification products. The exons of the top1 mRNA are represented as boxes. The translational start codon AUG and stop codon UAA are shown. The polyadenylation sites, An, for the two top1 cDNAs are shown. A 0–2-h independent amplification product is represented as an open circle; a 6–12-h independent amplification product is shown as a closed circle.
and a U-rich cytoplasmic polyadenylation element (CPE) minimally defined as U4AU (Paris and Richter, 1990). In the top1 3′ untranslated region, there is only one perfect match for the polyadenylation/cleavage sequence AAUAAA (Fig. 5B, shown as the line with the asterisk), located 240 nucleotides upstream of the poly(A) addition site for the 5.2-kb message. However, if single base substitutions are allowed, these sequences can be found near the poly(A) addition sites for the 3.8–4.2-kb and 5.2-kb messages (Fig. 5B). Although these sequence variations alter the efficiency of polyadenylation, they can function as signals for polyadenylation (Sheets et al., 1990). The sequences U4AU, U5AU, and U5AAU function as cytoplasmic polyadenylation elements in several Xenopus RNAs during oocyte maturation [reviewed in Richter (1996)]. The locations of these U-rich sequences in the 3′ untranslated region of Drosophila top1 are noted in Fig. 5B (vertical lines seen for CPEs). Of the four CPEs depicted, the first is U4AU, the second and fourth are U5AU, and the third is U5AAU. AU-rich elements (AREs) that are found in the 3′ untranslated regions of mammalian proto-oncogene, lymphokine, and cytokine mRNAs confer message instability. Although the exact sequence of the ARE has not been completely defined, it is thought that the consensus
Fig. 5. Regulatory sequences in the 3′ untranslated region of the top1 gene. (A) Representation of the top1 mRNA. The shaded box represents the coding region of top1. Positions of translational start codon AUG, stop codon UAA and Poly(A), An, are indicated. (B) Schematic representation of cis-regulatory sequences within the 3′ untranslated region. The following polyadenylation signals, Poly(A) signals, are shown as a vertical line with the following symbols: AAUAAA, asterisk; AUUAAA, open circle; AGUAAA, closed box; CAUAAA, open diamond; UAUAAA, closed circle; AAUACA, open box; AAUAUA, line only. Cytoplasmic polyadenylation elements U5AU, U4AU, or U5AAU are indicated as vertical lines on the line labeled CPEs. An AU-rich element, AUUUA, is represented as a vertical line on the line labeled AREs. The block labeled as 0–2-h transcripts defines the region for the polyadenylation sites of the 0–2-h 3′ end amplification products. (C) Representation of highly stable RNA secondary structures within the 3′ untranslated region. Top: Stable RNA structures are represented as darkly shaded rectangles. All hairpin structures depicted have a predicted melting temperature above 75°C. Bottom: Structure 2, the nucleotide sequence and predicted secondary structure are shown. The numbering of this structure is from the λctop1-2 sequences.
may be multiple copies of the sequence AUUUA. A search for the pentamer AUUUA reveals that eight copies of this motif are present in the top1 3′ untranslated region, depicted in Fig. 5B as the vertical lines seen for AREs. Indeed, the top1 3′ untranslated region is highly AU-rich. Whereas the base composition of the top1 coding region is 46% AU, that of its 3′ untranslated region is 64% AU. The steady-state levels of message may also be affected by RNA secondary structures within the top1 3′ untranslated region. RNA folding algorithms predict at least five highly stable stem and loop structures in the top1 3′ untranslated region (Fig. 5C, analysis by mfold server; Michael Zuker, Washington University Medical School). In particular, there is one hairpin structure approximately 70 nucleotides in length,
that comprises primarily AU repeats in the stem plus three loops (Fig. 5C, structure 2). This structure is highly stable and has a predicted melting temperature of 80°C. It is possible that this secondary structure or others may have some effect on the stabilization of the top1 messages. The block labeled as 0–2-h transcripts in Fig. 5B defines the region for the multiple polyadenylation sites of the 0–2-h messages identified from the 3′ RACE amplification products. A significant number of the regulatory sequences and potential secondary structures that control polyadenylation and message stability are situated within or surrounding the region denoted as the 0–2-h top1 transcripts. However, it remains to be determined which sequences, if any, play a role in the regulation of the top1 maternal messages.
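The motif searches described in Section 3.4 (poly(A) signals allowing a single base substitution, the U-rich CPEs, the AUUUA pentamer, and AU content) can be sketched as simple string scans. This is a minimal illustration only: it is not the search procedure used for Fig. 5, the function names are invented for the example, and the short `toy` sequence is hypothetical and is not the top1 3′ untranslated region.

```python
import re

POLYA_SIGNAL = "AAUAAA"


def to_rna(seq):
    """Upper-case the sequence and write T as U."""
    return seq.upper().replace("T", "U")


def find_polya_signals(seq, max_mismatch=1):
    """Return (position, hexamer) for hexamers within max_mismatch
    substitutions of AAUAAA (0-based positions)."""
    rna = to_rna(seq)
    hits = []
    for i in range(len(rna) - 5):
        hexamer = rna[i:i + 6]
        mismatches = sum(a != b for a, b in zip(hexamer, POLYA_SIGNAL))
        if mismatches <= max_mismatch:
            hits.append((i, hexamer))
    return hits


def find_cpes(seq):
    """Return (position, motif) for U4AU, U5AU or U5AAU (non-overlapping)."""
    rna = to_rna(seq)
    return [(m.start(), m.group()) for m in re.finditer(r"U{5}AAU|U{4,5}AU", rna)]


def find_ares(seq):
    """Return start positions of every AUUUA pentamer, overlaps included."""
    rna = to_rna(seq)
    return [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]


def au_fraction(seq):
    """Fraction of A and U residues in the sequence."""
    rna = to_rna(seq)
    return sum(base in "AU" for base in rna) / len(rna)


# HYPOTHETICAL toy sequence for demonstration; not derived from top1.
toy = "GCAUUAAAGCCUUUUAUGGAUUUAUUUACCAAUAAAGG"

if __name__ == "__main__":
    print(find_polya_signals(toy))
    print(find_cpes(toy))
    print(find_ares(toy))
    print(round(au_fraction(toy), 3))
```

Counting overlapping AUUUA pentamers separately, as done here, is one possible convention; whether the eight copies reported for the top1 3′ untranslated region include overlapping occurrences is not stated in the text.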
Acknowledgement We thank Dr Maxwell P. Lee for his work in the early stages of this research. This work is supported by grant GM29006 from the National Institutes of Health.
References
Curtis, D., Lehmann, R., Zamore, P.D., 1995. Translational regulation in development. Cell 81, 171–178.
DiNardo, S., Voelkel, K.A., Sternglanz, R., Reynolds, A.E., Wright, A., 1982. Escherichia coli DNA topoisomerase I mutants have compensatory mutations in DNA gyrase genes. Cell 31, 43–51.
Edwalds-Gilbert, G., Veraldi, K.L., Milcarek, C., 1997. Alternative poly(A) site selection in complex transcription units: means to an end? Nucleic Acids Res. 25, 2547–2561.
Frohman, M.A., Dush, M.K., Martin, G.R., 1988. Rapid production of full-length cDNAs from rare transcripts: Amplification using a single gene-specific oligonucleotide primer. Proc. Natl. Acad. Sci. USA 85, 8998–9002.
Gupta, M., Fujimori, A., Pommier, Y., 1995. Eukaryotic DNA topoisomerases I. Biochim. Biophys. Acta 1262, 1–14.
Hsieh, T.-s., Brown, S.D., Huang, P., Fostel, J., 1992. Isolation and characterization of a gene encoding DNA topoisomerase I in Drosophila melanogaster. Nucleic Acids Res. 20, 6177–6182.
Hsieh, T.-s., 1993. DNA topoisomerases. In: Linn, S.M., Lloyd, R.S., Roberts, R.J. (Eds.), Nucleases, 2nd ed. Cold Spring Harbor Laboratory Press, Plainview, NY, pp. 209–233.
Hultmark, D., Klemenz, R., Gehring, W.J., 1986. Translational and transcriptional control elements in the untranslated leader of the heat shock gene hsp22. Cell 44, 429–438.
Karlin, S., Burge, C., 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. USA 93, 1560–1565.
Keller, E.B., Noon, W.A., 1985. Intron splicing: A conserved internal signal in introns of Drosophila pre-mRNAs. Nucleic Acids Res. 13, 4971–4981.
Lee, M.P., Brown, S.D., Chen, A., Hsieh, T.-s., 1993. DNA topoisomerase I is essential in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 90, 6656–6660.
Marchionni, M., Gilbert, W., 1986. The triose phosphate isomerase gene from maize: Introns antedate the plant–animal divergence. Cell 46, 131–141.
Morham, S.G., Kluckman, K.D., Voulomanos, N., Smithies, O., 1996. Targeted disruption of the mouse topoisomerase I gene by camptothecin selection. Mol. Cell. Biol. 16, 6804–6809.
Mount, S.M., Burks, C., Hertz, G., Stormo, G.D., White, O., Fields, C., 1992. Splicing signals in Drosophila: Intron size, information content, and consensus sequences. Nucleic Acids Res. 20, 4255–4262.
O’Connell, P., Rosbash, M., 1984. Sequence, structure, and codon preference of the Drosophila ribosomal protein 49 gene. Nucleic Acids Res. 12, 5495–5513.
Pandit, S.D., Richard, R.E., Sternglanz, R., Bogenhagen, D.F., 1996. Cloning and characterization of the gene for the somatic form of DNA topoisomerase I from Xenopus laevis. Nucleic Acids Res. 24, 3593–3600.
Paris, J., Richter, J.D., 1990. Maturation-specific polyadenylation and translational control: Diversity of cytoplasmic polyadenylation elements, influence of poly(A) tail size, and formation of stable polyadenylation complexes. Mol. Cell. Biol. 10, 5634–5645.
Pruss, G.J., Manes, S.H., Drlica, K., 1982. Escherichia coli DNA topoisomerase mutants: Increased supercoiling is corrected by mutations near gyrase genes. Cell 31, 35–42.
Richard, R.E., Bogenhagen, D.F., 1991. The 165-kDa DNA topoisomerase I from Xenopus laevis oocytes is a tissue-specific variant. Dev. Biol. 146, 4–11.
Richter, J.D., 1996. Dynamics of poly(A) addition and removal during development. In: Hershey, J.W.B., Mathews, M.B., Sonenberg, N. (Eds.), Translational Control. Cold Spring Harbor Laboratory Press, Plainview, NY, pp. 481–503.
Sadoff, B.U., Heath-Pagliuso, S., Castano, I.B., Zhu, Y., Kieff, F.S., Christman, M.F., 1995. Isolation of mutants of Saccharomyces cerevisiae requiring DNA topoisomerase I. Genetics 141, 465–479.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sheets, M.D., Ogg, S.C., Wickens, M.P., 1990. Point mutations in AAUAAA and the poly(A) addition site: Effects on the accuracy and efficiency of cleavage and polyadenylation in vitro. Nucleic Acids Res. 18, 5799–5805.
Snyder, M., Hunkapiller, M., Yuen, D., Silvert, D., Fristrom, J., Davidson, N., 1982. Cuticle protein genes of Drosophila: Structure, organization, and evolution of four clustered genes. Cell 29, 1027–1040.
Stewart, L., Ireton, G.C., Champoux, J.J., 1996a. The domain organization of human topoisomerase I. J. Biol. Chem. 271, 7602–7608.
Stewart, L., Ireton, G.C., Parker, L.H., Madden, K.R., Champoux, J.J., 1996b. Biochemical and biophysical analyses of recombinant forms of human topoisomerase I. J. Biol. Chem. 271, 7593–7601.
Stout, J.T., Caskey, C.T., 1985. HPRT: Gene structure, expression, and mutation. Annu. Rev. Genet. 19, 127–148.
Thrash, C., Bankier, A.T., Barrell, B.G., Sternglanz, R., 1985. Cloning, characterization, and sequence of the yeast DNA topoisomerase I gene. Proc. Natl. Acad. Sci. USA 82, 4374–4378.
Thrash, C., Voelkel, K., DiNardo, S., Sternglanz, R., 1984. Identification of Saccharomyces cerevisiae mutants deficient in DNA topoisomerase I activity. J. Biol. Chem. 259, 1375–1377.
Tosh, K., Kilbey, B., 1995. The gene encoding topoisomerase I from the human malaria parasite Plasmodium falciparum. Gene 163, 151–154.
Uemura, T., Yanagida, M., 1984. Isolation of type I and II DNA topoisomerase mutants from fission yeast: Single and double mutants show different phenotypes in cell growth and chromatin organization. EMBO J. 3, 1737–1744.
Uemura, T., Morino, K., Uzawa, A., Shiozaki, K., Yanagida, M., 1987. Cloning and sequencing of Schizosaccharomyces pombe DNA topoisomerase I gene, and effect of gene disruption. Nucleic Acids Res. 15, 9727–9739.
Vosberg, H.-P., 1985. DNA topoisomerases: Enzymes that control DNA conformations. Curr. Topics Microbiol. Immunol. 114, 19–102.
Wang, J.C., 1996. DNA topoisomerases. Annu. Rev. Biochem. 65, 635–692.
Wharton, K.A., Yedvobnick, B., Finnerty, V.G., Artavanis-Tsakonas, S., 1985. opa: A novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40, 55–62.
Wickens, M., 1992. Forward, backward, how much, when: Mechanisms of poly(A) addition and removal and their role in early development. Dev. Biol. 3, 399–412.
Wickens, M., Kimble, J., Strickland, S., 1996. Translational control of developmental decisions. In: Hershey, J.W.B., Mathews, M.B., Sonenberg, N. (Eds.), Translational Control. Cold Spring Harbor Laboratory Press, Plainview, NY, pp. 411–450.
Zhang, C.X., Lee, M.P., Chen, A.D., Brown, S.D., Hsieh, T.-s., 1996. Isolation and characterization of a Drosophila gene essential for early embryonic development and formation of cortical cleavage furrows. J. Cell Biol. 134, 923–934.