Comparative analysis of the RED1 and RED2 A-to-I RNA editing genes from mammals, pufferfish and zebrafish


Gene 250 (2000) 41–51 www.elsevier.com/locate/gene

D. Slavov a,b, M. Clark b, K. Gardiner a,*
a Eleanor Roosevelt Institute, 1899 Gaylord Street, Denver, CO 80206-1210, USA
b Max Planck Institute for Molecular Genetics, Berlin, Germany
Received 18 November 1999; received in revised form 27 March 2000; accepted 31 March 2000
Received by G. Bernardi

Abstract
One type of RNA editing involves the deamination of adenosine (A) residues to inosines (I) at specific sites in specific pre-mRNAs. These inosines are subsequently read as guanosines by the ribosome, with potentially significant consequences for protein sequence. In mammals, two such A-to-I RNA editases are RED1, which edits some serotonin and glutamate receptors, and RED2, with unidentified substrates. To study the evolutionary conservation among these editases, we have isolated homologous genes from the Japanese pufferfish, Fugu rubripes. Fugu has two genes homologous to RED1 that are similar in size and organization and that show a fivefold compaction relative to the human gene; they differ, however, in their base compositional features. The Fugu gene for RED2 is unusually large, spanning more than 50 kb; within the largest intron, there is evidence for a novel gene on the opposite strand. Because of these unusual features, the partial genomic structure was determined for the mouse RED2 gene. A partial cDNA for RED1 was also isolated from zebrafish. Comparisons between fish and between fish and mammals of the protein sequences show that the catalytic domains are highly conserved for each gene, while the RNA-binding domains vary within a single protein in their levels of conservation. Different levels of conservation among domains of different functional roles may reflect differences in editase substrate specificity and/or substrate sequence conservation. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: Fugu rubripes; Genomic organization; RED1; RED2; RNA editing; Zebrafish

Abbreviations: bp, base pair; kb, kilobase pair; RED1, RNA editase 1; RED2, RNA editase 2; RT-PCR, reverse transcription polymerase chain reaction.
* Corresponding author. Tel.: +1-303-336-5652; fax: +1-303-333-8423. E-mail address: [email protected] (K. Gardiner)

1. Introduction
RNA editing refers to a number of processes, excluding exon splicing, in which the nucleotide sequence of an initial RNA transcript is altered (reviewed in Smith et al., 1997). In higher eukaryotes, one type of RNA editing involves the deamination of adenosine (A) residues at specific sites in specific pre-mRNAs (reviewed in Bass, 1997). The resultant inosine (I) residues are subsequently read as guanosines by the ribosomes, with predictable consequences for the amino acid specification of the associated codon and with potentially significant consequences for the associated protein sequence and function. Currently, the known mammalian substrates for A-to-I editing include the pre-mRNAs for the ionotropic glutamate receptors, GluRsB-D and GluRs5 and 6 (reviewed in Maas et al., 1997), and the serotonin receptor, 5HT-2C (Burns et al., 1997). The extent of editing varies with the substrate and with the site within the substrate, as well as among different brain regions and during development (Lomeli et al., 1994; Paschen and Djuricic, 1995; Paschen et al., 1997). Editing at specific sites produces, in the glutamate receptors, decreased calcium permeability (reviewed in Seeburg, 1996) and, in the serotonin receptor, reduced G-protein coupling (Burns et al., 1997). Thus, A-to-I RNA editing is a process that can produce complex patterns of protein sequence and functional diversity.

Three RNA editases that can carry out A-to-I editing have been reported in mammals. DRADA and RED1 are known to edit sites in the glutamate receptors and in the serotonin receptor (Melcher et al., 1996a; Burns et al., 1997). The third editase, RED2, in spite of

0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00174-8


D. Slavov et al. / Gene 250 (2000) 41–51

significant sequence similarities with RED1 and brain-specific expression, does not edit the GluRB glutamate receptor and, so far, is of unknown substrate specificity (Melcher et al., 1996b). The expression levels of these editases and the levels of inosine detected in rat brain both suggest that there are numerous additional substrates for A-to-I editing (Paul and Bass, 1998).

Because of the potentially wide-reaching effects of RNA editing, further analysis of the associated editase genes is of interest. To this end, we have begun a comparative analysis of the protein sequences and the gene structural organization of editase genes from human, rodent, pufferfish and zebrafish. Protein sequences have been reported previously for human and rat RED1, and the intron/exon structure has been determined for the human gene (Villard et al., 1997). Human RED1 and DRADA, although showing regions of significant amino acid sequence similarity, are easily distinguished by structure. DRADA contains three RNA-binding domains, each split between two exons (Wang et al., 1995); RED1 contains only two RNA-binding domains, both contained within a single large exon (Villard et al., 1997). Thus, the DRADA and RED1 genes from other organisms should be distinguishable. For RED2, the only information previously available has been the protein sequence of the rat gene. The rat RED2 amino acid sequence is only 50% identical and 64% similar to the rat RED1, and therefore, RED2 and RED1 sequences from other organisms should also be distinguishable. Here, we discuss RED1 and RED2 genes; in the accompanying paper (Slavov et al., 2000a), we discuss DRADA.

The pufferfish, Fugu rubripes, was chosen for structural analysis of the editase genes. The compact Fugu genome, approximately one-eighth the size of that of mammals, reflects in part small intron sizes and thus facilitates structural determinations (Elgar et al., 1996).
In search of RED1 homologues, three editase genes were isolated from Fugu: two homologous to RED1 and one homologous to RED2. A partial cDNA for RED1 was also obtained from the zebrafish, Danio rerio. Because of unusual structural features found in the Fugu RED2 gene, the mouse RED2 gene structure was also partially characterized. Differences in organization and sequence characteristics of these editase genes may reflect differences in substrate specificity and/or differences in editing efficiency, both between editases and among organisms.

2. Methods
2.1. Probes
The human RED1 cDNA spanning approximately 1.5 kb of the 3′ coding sequence has been described previously (Villard et al., 1997). To obtain a probe for RED2, primers were designed within putative exons 1, 2, 4 and 10 of the rat cDNA sequence (Accession No. U74586; Melcher et al., 1996b) and used to amplify the complete mouse cDNA sequence by RT-PCR from mouse brain total RNA. The primer sequences were: rr2-1a, ATGGCATCTGTCCTGGGG and rr2-3b, CCTTTGGTCATGACGATTC (spanning nucleotides 326–1539 of the rat cDNA sequence), and primers rr2-4a, GGACACCAAACAAGCACAG and rr2-10b, GAAACTGATCTTGCTCTG (spanning nucleotides 1543–2552 in the rat sequence). In both cases, products of the anticipated size were obtained; these were cloned in the TA vector (Invitrogen) and completely sequenced. The 5′ and 3′ mouse RED2 RT-PCR products were used separately to screen the mouse cosmid library.
2.2. Screening of arrayed libraries
High-density filters of the following libraries were obtained from the Resource Center of the German Human Genome Project (Berlin, Germany) (Lehrach et al., 1990): the Fugu cosmid library 65 in Lawrist 4 constructed by G. Elgar (Elgar et al., 1995), the mouse cosmid library 121 in Lawrist 7 constructed by C. Burgtrof, A. Poch and M. Wiles, and the zebrafish cDNA library 524 (late somitogenesis) in pSport1 constructed by M. Clark. Libraries were initially screened with probes derived from the human RED1 and mouse RED2 cDNAs and subsequently with fragments derived from an appropriate Fugu cosmid. All hybridizations were carried out for 18 h at 42°C in a 40% formamide buffer (Sambrook et al., 1989), using probes labeled by random hexamer priming (Feinberg and Vogelstein, 1989). Filters were subsequently washed at 50°C in 0.5× SSC/0.1% SDS. Clones were verified by Southern hybridization to contain the appropriate probe.
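The "anticipated size" of the two RT-PCR products follows directly from the rat cDNA coordinates quoted in Section 2.1. The short sketch below only illustrates that arithmetic; it assumes the coordinates are 1-based and inclusive, and the names `PRIMER_PAIRS` and `amplicon_length` are illustrative, not part of the original work.

```python
# Rat cDNA coordinates spanned by each primer pair (Section 2.1).
# Assumption: coordinates are 1-based and inclusive of both end points.
PRIMER_PAIRS = {
    ("rr2-1a", "rr2-3b"): (326, 1539),
    ("rr2-4a", "rr2-10b"): (1543, 2552),
}


def amplicon_length(start, end):
    """Return the length in bp of a product spanning start..end inclusive."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


if __name__ == "__main__":
    for (forward, reverse), (start, end) in PRIMER_PAIRS.items():
        size = amplicon_length(start, end)
        print(f"{forward}/{reverse}: nucleotides {start}-{end}, {size} bp")
```

Under these assumptions the two products would be expected at roughly 1.2 kb and 1.0 kb in the rat sequence; the mouse products could differ slightly.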
Non-overlapping or partially overlapping cosmids chosen for further analysis were subcloned as partial MboI digests in pBluescript; 200–400 random subclones were arrayed and screened with appropriate cDNA probes to generate start sites for sequencing by primer walking.
2.3. DNA sequencing
Plasmids from cosmid subcloning and cDNA clones were purified by standard miniprep procedures (Sambrook et al., 1989), and completely sequenced using SK/KS vector primers (pBluescript) or T7/Sp6 primers (pSport), followed by primer walking. Cosmids were purified through CsCl and directly sequenced through the region of interest using primers designed to the corresponding plasmid subclone sequence (for Fugu RED1a, RED1b and RED2) or to the mouse RED2 RT-PCR product sequence. Sequencing reactions were carried out using dideoxy dye-terminators (FsTaq) and modified cycling conditions (Philibert et al., 1995). Reactions were electrophoresed on an ABI373a. Sequences have been deposited in GenBank: Fugu genomic sequences for RED1a (18 kb), AF124050; RED1b (11 kb), AF124049 and RED2 (51 kb), AF124051, and zebrafish cDNA sequence for RED1, AF124333.
2.4. Sequence analysis
Cosmid sequences were analyzed with four exon prediction programs using the Genotator software (Harris, 1997). The percentage GC and CpG dinucleotide frequency were determined using the Composition program of the GCG package (Wisconsin Computer Group). CpG islands were identified using the tool at Oak Ridge National Laboratory (http://avalon.epm.ornl.gov/Grail-bin/EmptyGrailForm). BLASTX searches (Altschul et al., 1997) were used to identify regions of homology to RNA editase proteins. Two Sequence BLAST searches (Tatusova and Madden, 1999) were used in calculation of pairwise domain identities and similarities. Multiple sequence alignments were obtained using ClustalW1.7 at the Baylor College of Medicine sequence analysis site (http://www.hgsc.bcm.tmc.edu/SearchLauncher).
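For readers who want to reproduce the base-composition measures named in Section 2.4, the sketch below computes percentage GC and a CpG observed/expected ratio for a DNA string. It is an illustration only: the observed/expected formula used here, (number of CpG × sequence length) / (number of C × number of G), is a commonly used definition and is an assumption, since the exact normalization applied by the GCG Composition program is not stated in the text. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def cpg_obs_exp(seq):
    """CpG observed/expected ratio.

    Assumed definition: (count of 'CG' * length) / (count of C * count of G).
    'CG' cannot overlap with itself, so str.count gives the exact number.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    toy = "ATGCGCGATATTACGGCAT"  # made-up sequence, not from the cosmids
    print(f"GC: {gc_percent(toy):.1f}%  CpG o/e: {cpg_obs_exp(toy):.2f}")
```

Applied to the deposited genomic sequences, a calculation of this kind should give values close to, but not necessarily identical with, those reported in the Results.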

3. Results and discussion
3.1. Gene identification
Twelve Fugu cosmids were verified to hybridize at low stringency to the probe for the human RED1 coding sequence. Based on the number and sizes of bands generated by digestion with EcoRI (data not shown), these cosmids were divided into four groups representing apparently non-overlapping sequences: ICRFc65M1953Q5 and ICRFc65M2478Q5; ICRFc65O1963Q5; ICRFc65E0511Q5; ICRFc65P0965Q5, ICRFc65H0640Q5, ICRFc65G0640Q5, ICRFc65J1131Q5, ICRFc65N2311Q5, ICRFc65D0121Q5, ICRFc65L0413Q5 and ICRFc65O0959Q5 (hereafter, abbreviated clone names are used). Clones M1953, O1963 and E0511 were chosen for a detailed analysis from the first three groups. Clones P0965 and H0640 from the fourth group were shown to contain DRADA sequences and are discussed in the accompanying paper (Slavov et al., in press). Rescreening of the cosmid library with probes from E0511 and O1963 identified cosmids I1330 and C2339, respectively. Partial MboI digests from each cosmid were subcloned into plasmids, and several clones that were again positive by hybridization were completely sequenced. Using these sequences, direct sequencing of cosmid DNA was carried out by primer walking. By these means, the following sequences were generated: 18 467 bp from cosmid M1953; 10 776 bp from O1963 and 51 000 bp from E0511 plus I1330. The exon prediction programs Grail2, Genscan, Genefinder and Xpound were used to locate coding exons within each of the four sequences. In each case, patterns of consistent exon prediction were seen (these data can be found at: http://www-eri.uchsc.edu). BLASTX analysis revealed homologies to mammalian RED1 in M1953 and O1963 and to rat RED2 in E0511 plus I1330. Thus, low-stringency hybridization identified a family of potential RNA editase genes. Amino acid comparisons and gene structural analyses are discussed separately for each homology.
3.2. RED1 protein conservation among fish and mammals
The Fugu genes in cosmids M1953 and O1963 were designated RED1a and RED1b, respectively. For each gene, the locations and sizes of coding exons were determined from the cosmid sequences by comparison with the sizes and amino acid sequences of the human RED1 exons. Homologies to coding exons 2–5 and 7–10 were easily discernible in both cosmids, supporting the suggestion that neither is a pseudogene. Coding exon 6 of the human gene is a translated AluJ sequence (Villard et al., 1997); as expected, it is not present in either Fugu sequence. In the Fugu RED1b cosmid, a putative first coding exon, with six of nine amino acids identical to the human first coding exon, is identified within an exon prediction made by all four programs. For the Fugu RED1a cosmid, identification of the first coding exon is ambiguous. Two exon prediction programs, Genscan and Grail, both predict the same locations for two exons ~2.3 kb upstream of exon 2. Both exons contain a nine-amino-acid open reading frame that starts with a methionine and that could splice correctly to exon 2. There is, however, no homology with the Fugu RED1b or the mammalian RED1 first coding exon.
Thus, it is not clear if one of these represents the bona fide exon 1, if both represent alternative first exons, or if neither is the correct protein start for RED1a. Screening of the zebrafish late somitogenesis cDNA library identified a single partial cDNA clone, ICRFp524A15139Q10, with a 4627 bp insert. Sequence analysis identified a 1098 bp open reading frame, followed by 3529 bp of 3′ untranslated region. BLASTX searches revealed a very high sequence similarity with the human and rat RED1 proteins, starting at amino acid 335 of the human sequence and continuing to the termination codon (Fig. 1). Although the clone is truncated at the amino terminal end before the RNA-binding domains, the identity is so high throughout the catalytic domain region (>85%) that this is without doubt a zebrafish homologue of the mammalian RED1 protein.
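The identity and similarity percentages quoted for the protein comparisons are derived from pairwise alignments. As a rough illustration of how such figures can be obtained from two already-aligned sequences, the sketch below counts identical and "similar" residue pairs. It is not the procedure used in the paper: BLAST scores similar residues ("positives") with a substitution matrix, whereas the residue groups in `SIMILAR_GROUPS` are a simplified, hypothetical stand-in, and gapped columns are skipped here by choice.

```python
# Hypothetical grouping of amino acids treated as "similar".
# Assumption for illustration only; BLAST uses a substitution matrix instead.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _is_similar(a, b):
    """True if both residues fall into the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(aligned_1, aligned_2):
    """Return (% identity, % similarity) over ungapped alignment columns.

    Both inputs must be rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    columns = identical = similar = 0
    for a, b in zip(aligned_1.upper(), aligned_2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        columns += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif _is_similar(a, b):
            similar += 1
    if columns == 0:
        return 0.0, 0.0
    return 100.0 * identical / columns, 100.0 * similar / columns


if __name__ == "__main__":
    # Made-up toy peptides, not taken from the RED1 or RED2 sequences.
    print(identity_similarity("MKLV", "MRLI"))
```

Because the grouping and gap handling are simplifications, values computed this way would only approximate those reported from the Two Sequence BLAST comparisons.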


Table 1
RED1 and RED2 protein domain homologies (a)

Comparison                     RBD1       RBD2       Catalytic
Mammalian vs. Fugu RED1a       95%/97%    82%/94%    78%/91%
Mammalian vs. Fugu RED1b       91%/96%    80%/89%    82%/92%
Mammalian vs. Fugu RED2        88%/94%    60%/78%    72%/84%
Fugu RED1b vs. Fugu RED1a      94%/99%    90%/96%    91%/96%
Fugu RED1b vs. zebrafish       N/A        N/A        88%/96%

(a) Percentages indicate the identity/similarity for the RNA-binding domains (RBD) and the ~170 amino acids of the catalytic domain surrounding the conserved cysteines that are indicated in Figs. 1 and 2.

Alignments of the RED1 protein sequences from human, rat, zebrafish and Fugu are shown in Fig. 1. As expected, the Fugu proteins are highly similar to each other, with an overall identity of 78% and similarity of 84%. In comparison with the human sequences, Fugu RED1a is overall 70% identical and 79% similar, and RED1b is 74% identical and 82% similar. As anticipated, the RNA-binding domains are the regions with highest conservation both between fish and between fish and mammals. As shown in Table 1, the first RNA-binding domains (RBD1) of RED1a and RED1b are, respectively, 95% and 91% identical to the human RBD1. Interestingly, the second domains, RBD2, are only 82 and 80% identical with the human. The catalytic domains average 78 and 82% identity, and 91 and 92% similarity between Fugu and mammals and, as is reasonable, show a slightly higher conservation between Fugu and zebrafish, at 88% identity and 96% similarity, when comparing Fugu RED1b to zebrafish. In all organisms, the cysteine residues believed to be involved in coordination of zinc ions, and thus required for catalytic function (Kim et al., 1994), are conserved.

3.3. Genomic features of the Fugu RED1 genes

Table 2 lists the sizes and locations of each exon and intron of the Fugu RED1a and RED1b genes. For comparison, exon and intron sizes are also given for the human gene. All splice sites conform to the consensus sequences (data can be found at http://www-eri.uchsc.edu). Exon sizes are almost identical among the three genes; this includes the large (>900 bp) second exon that contains the two RNA-binding domains. The relative sizes of corresponding introns in the two Fugu genes are variable; for example, intron 7 in RED1a is 872 bp versus 2500 bp in RED1b, while intron 9 in RED1a is 1097 bp versus 225 bp in RED1b. Introns average 5.5 kb for the human gene, and approximately 1 kb and 940 bp for Fugu RED1a and RED1b, respectively. Adding the approximately 2.1 kb coding regions, the human gene spans 52.5 kb versus 10.4 kb and 9.7 kb for the Fugu genes. This approximately fivefold difference is consistent with the eightfold difference in genomic size between Fugu and human, and with the typical compression seen previously in Fugu genes (Tassone et al., 1999).
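The exon sizes, intron sizes and gene span discussed above can all be derived from exon coordinates. The sketch below does this for the Fugu RED1a exon locations given in Table 2 (cosmid M1953). Two points are assumptions made for illustration: coordinates are treated as 1-based and inclusive, and the candidate first exon listed as "1b" is used as exon 1, although the text leaves the true first exon open.

```python
# Fugu RED1a exon coordinates (start, end) as listed in Table 2.
# Assumption: the candidate first exon "1b" is taken as exon 1.
RED1A_EXONS = [
    (3786, 3813),    # candidate exon 1b
    (6741, 7675),    # exon 2
    (9606, 9720),    # exon 3
    (10259, 10427),  # exon 4
    (10933, 11087),  # exon 5
    (11314, 11482),  # exon 7 (human exon 6 has no Fugu counterpart)
    (12347, 12528),  # exon 8
    (12724, 12902),  # exon 9
    (14001, 14177),  # exon 10
]


def exon_sizes(exons):
    """Exon lengths in nt for 1-based inclusive (start, end) pairs."""
    return [end - start + 1 for start, end in exons]


def intron_sizes(exons):
    """Lengths of the gaps between consecutive exons."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(exons, exons[1:])]


def gene_span(exons):
    """Distance from the first exon start to the last exon end, inclusive."""
    return exons[-1][1] - exons[0][0] + 1


if __name__ == "__main__":
    introns = intron_sizes(RED1A_EXONS)
    print("exon sizes:", exon_sizes(RED1A_EXONS))
    print("intron sizes:", introns)
    print("mean intron size:", sum(introns) / len(introns))
    print("gene span (nt):", gene_span(RED1A_EXONS))
```

With these coordinates the mean intron size comes out at about 1 kb and the span at about 10.4 kb, in line with the figures quoted for RED1a; choosing the other candidate first exon would change both numbers slightly.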

Table 2
Exon/intron sizes and locations for Fugu and human RED1 genes

Human RED1 (a)
Number   Exon size (nt)   Intron size (nt)
1        28               4048
2        935              3661
3        115              2137
4        169              609
5        149              960
6        120              328
7        169              19423
8        182              16096
9        179              1000
10       180              –

Fugu RED1A
Number   Exon size   Exon location (b)   Intron size
1a       28          3379–3406           3334
1b       28          3786–3813           2927
2        935         6741–7675           1930
3        115         9606–9720           538
4        169         10259–10427         505
5        155         10933–11087         226
N/A      (no counterpart of human exon 6)
7        169         11314–11482         864
8        182         12347–12528         195
9        179         12724–12902         1098
10       177         14001–14177         –

Fugu RED1B
Number   Exon size   Exon location (c)   Intron size
1        28          718–745             2360
2        926         3106–4031           701
3        115         4733–4847           374
4        169         5220–5390           128
5        152         5519–5670           517
N/A      (no counterpart of human exon 6)
7        169         6190–6358           2651
8        182         9010–9191           780
9        179         9972–10150          255
10       177         10406–10582         –

(a) Data from Villard et al. (1997) and Accession No. AJ239326.
(b) Within sequence of cosmid M1953 (Accession No. AF124050).
(c) Within sequence of cosmid O1963 (Accession No. AF124049).

Fig. 1. Putative amino acid sequences of RED1 proteins. Hs, human, Accession No. 2114493; Rn, rat, P51400; Dr, zebrafish, from cDNA clone ICRFp524I13A10; Fr, Fugu sequence derived from cosmid M1953 (RED1a) and cosmid O1963 (RED1b). Numbers on the left correspond to amino acids; locations of introns in human and Fugu sequences are indicated by arrowheads above and below the sequences, respectively; exons are numbered. Shaded boxes indicate the RNA-binding domains; catalytic domain cysteine residues are shown as shaded 'C's.


Although there are no dramatic differences in size or structure to distinguish RED1a and RED1b, the two genes do differ in two features: base composition and CpG dinucleotide frequency. The genomic sequence spanning RED1a is only 39.5% GC and the CpG frequency, observed/expected, is only 0.32. Both these numbers are low with respect to observations of the Fugu genome as a whole and in comparison with other documented genes (Elgar et al., 1995). The Fugu genome averages 45% GC with a CpG frequency of 0.6, typical of cold-blooded vertebrates. In contrast to RED1a, RED1b appears more typical of Fugu, at 47.1% GC with a CpG frequency of 0.64. Both Fugu genes have CpG islands located near the 5′ end of the coding regions. For RED1a, the island spans the vicinity of the putative first exons. For RED1b, there are three separate islands, two within intron 1, and one within exon 2. For RED1a, the stop codon is located at nucleotide 12902, and the first polyadenylation site is 3061 nucleotides downstream. This distance is similar to the 2.9 kb 3′ UTR of the shorter human transcript, and the 3.6 kb 3′ UTR of the zebrafish cDNA. A comparison of the Fugu putative 3′ UTR sequence with that of the zebrafish did not reveal any significant similarities or unusual features of base composition. This is in contrast to the 3′ UTRs of mammals, where a stretch of >200 nucleotides of near identity is present in the human, rat and mouse genes and segments of an unusually high GC level are seen downstream in the human and mouse genes (Villard et al., 1997; and unpublished observations). For RED1b, the cosmid insert ends immediately downstream of the stop codon, and therefore, no information is available on polyadenylation sites.
3.4. RED2 protein conservation between Fugu and mammals
Fugu cosmids E0511 and I1330 overlap and both contain sequences with homology to the rat RED2 protein.
Similar to the analysis of the RED1 cosmids, the preliminary locations and sizes of Fugu RED2 coding exons were determined from exon prediction programs and BLASTX homologies to rat RED2. To obtain mouse RED2 protein sequence, primers were designed to the rat sequence and used to amplify RED2 cDNA from mouse brain. The specificity of the RT-PCR products was verified by sequencing, which also provided data for the protein sequence of the mouse gene. Screening of the zebrafish cDNA library unfortunately failed to identify any positive clones. Comparisons of the mouse and rat protein sequences with the genomic sequences from Fugu cosmids E0511 and I1330 were used to refine exon locations of Fugu RED2 and to determine the amino acid sequence (lacking the first ~30 amino acids; see below), as shown in Fig. 2. As with the RED1 genes, the regions of highest conservation are the RNA-binding domains, but overall, the conservation is lower between Fugu and mammals for RED2 than it was for the RED1 genes. As shown in Table 1, while the Fugu RED2 RBD1 is 88% identical and 94% similar to that of mammals, RBD2 is only 60% identical and 78% similar. Conservation in the catalytic domain is also reduced, but catalytic site cysteine residues remain conserved. These reduced levels of conservation do not suggest that this gene represents yet another homologue of RED1 because similarities to the mammalian RED1s are even lower: overall 49% identity and 63% similarity to rat RED1 versus 58% identity and 70% similarity to rat RED2.
3.5. Genomic features of the Fugu RED2 gene
Table 3 lists the locations and sizes of all exons and introns as determined by a BLASTX analysis of exon predictions compared with the rat and mouse cDNA sequences. One clear feature is that the positions of introns in RED2 closely approximate their corresponding positions within the RED1 proteins. Thus, the equivalents of exons 2–5 and 7–10 were identified. As with Fugu RED1a, however, the location of a complete first coding exon was ambiguous. In sequence immediately 5′ to what would correspond to the position of amino acid 33 of the rat sequence and within a putative first exon predicted by both Grail and Genscan, there are no regions in the cosmid of discernible similarity or even regions that maintain the downstream open reading frame. These data suggest that there is an intron at this location in the Fugu gene producing exons 1a and 1b.
Approximately 4.6–5.0 kb upstream, there are a series of potential first exons, designated as such because they start with a methionine, that provide an approximately 30 amino acid open reading frame (to produce a protein of approximately the same size as the rat and mouse), and they splice correctly into the downstream open reading frame. Absent, however, is the stretch of six arginines seen in both the rat and mouse protein sequences. A second, more dramatic, feature of the RED2 gene

Fig. 2. Putative amino acid sequence of RED2 proteins. (a) Rn, rat, Accession No. U74586; Mm, mouse is from sequencing of RT-PCR products; Fugu is derived from sequencing overlapping cosmids E0511 and I1330. Other designations are as in Fig. 1; (b) Possible first exons 1a and 1b; each pair is of a similar size to the rat exon 1; exons 1a each start with a methionine and could correctly splice in frame to exon 1b; because none has homology to the mammalian sequence, exon 1a remains ambiguous.


Table 3
Exon/intron sizes and locations for Fugu and mouse RED2 genes (a)

Mouse RED2
Number   Exon size (nt)   Intron size (nt)
1A       N/A
1B       87               >12000
2        908              >23000
3        115              ~12000
4        169              ~4000
5        152              ~9000
6        N/A
7        169              ~2500
8        182              N/D
9        179              N/D
10       N/D              –

Fugu RED2
Number   Exon size   Location        Intron size (nt)
1Aa      96          2749–2844       5791
1Ab      100         2749–2848       5785
1Ac      99          3620–3718       4917
1Ad      91          3895–3985       4648
1Ae      86          3910–3985       4648
1Ba      85          8636–8720       476
1Bb      87          8634–8720       476
1Bc      85          8636–8720       476
1Bd      87          8634–8720       476
1Be      87          8634–8720       476
2        953         9197–10149      18622
3        115         28772–28886     5401
4        169         34288–34456     1735
5        146         36192–36337     1680
7        169         38018–38186     3508
8        182         41695–41876     3500
9        182         45476–45657     1141
10       177         46799–46975     –

(a) Fugu data refer to sequences from cosmids E0511 and I1330. Multiple entries for Fugu exons 1A and 1B reflect the ambiguous locations of these exons; homology is low, but all could splice correctly into exon 2. Mouse data are derived from cosmids J102131, H05400 and I21237; intron sizes were obtained by PCR and are approximate. N/D, not determined.

is intron size: introns average >5 kb with only one of nine introns less than 1 kb. The largest intron, intron 2, is >18 kb. This contrasts with the average Fugu intron that is typically ~125 bp (Elgar et al., 1995, 1996). An unusual intron size may indicate functional importance, which, in the case of intron 2, is further suggested by the prediction (by both Grail and Genscan) of three exons within the 18 kb (data not shown). Consistent exon prediction, especially by Grail and Genscan, is strong supporting evidence for the presence of a bona fide exon (Claverie, 1997; Slavov et al., 2000b). Consensus splice sites are also present, and transcription could produce an open reading frame of >150 amino acids, which could, of course, be larger if additional, unpredicted, exons are present in this putative novel gene. The genomic structure for a mammalian RED2 gene was not known. However, because of the unusual features of the Fugu gene, it was considered important to examine the genomic structure of the mouse RED2 at least in part. Mouse RED2 RT-PCR products were used to screen a mouse cosmid library, identifying clones MPMGc121I102131Q2, MPMGc121H05400Q2 and MPMGc121I21237Q2. Mouse cDNA sequence, coupled with information on the locations and sizes of exons predicted in the Fugu RED2 genomic sequence, was used to design primers for direct cosmid sequencing. This allowed determination of precise intron/exon boundaries for all exons except exons 1 and 10 for which no cosmids were identified. This also confirmed

the conservation of intron locations between the mouse and Fugu genes. The cDNA sequence was also used to design primers for genomic PCR in an attempt to determine intron sizes. This effort was only partially successful; genomic PCR resulted in products only for introns 4–7, listed in Table 3. Nevertheless, these could be estimated to a total of ~27 500 bp. The corresponding introns in the Fugu gene total 12 325 bp, suggesting that the mammalian RED2 gene is not of an extraordinary size. Unfortunately, gaps in the coverage of the mouse cosmid library prevented contig construction across the most interesting intron, the 18 kb intron 2, and thus its size can be estimated only as >23 kb. The 51 kb spanning the Fugu RED2 gene is 45.8% GC with a CpG frequency, observed/expected, of 0.56 and thus seems typical for Fugu. Three CpG islands are predicted, one surrounding the putative first exons 1a, and two within the 18 kb intron 2. Several polyadenylation signals are found that could provide for 3′ untranslated regions 1–3 kb in size. One last noteworthy feature of the Fugu RED2 gene structure concerns two additional exons, each of which has been predicted by Grail and Genscan plus Xpound. One of these, located on the reverse strand at approximately 50 kb, has homology to a Fugu reverse transcriptase. The other is located on the forward strand at approximately 31 kb, within the 5 kb intron 3. This could represent another exon of RED2, perhaps alternatively spliced and rarely used because it was not reported in the rat cDNA (Melcher et al., 1996b). This exon could be spliced in frame to exons 3 and 4, and therefore, would not be expected to create a truncated open reading frame. This possibility was explored in mouse using whole-brain RNA. RT-PCR was carried out using primers within exons 2 and 4, 2 and 5, 3 and 4 and 3 and 5. In each case, only a single band of the size predicted from the known cDNA sequence was seen, and no additional larger band was detected using hybridization. From these data, we conclude that, at least in mouse, it is unlikely that an additional exon is present in this location.

4. Discussion
4.1. Genomic features of editase genes from Fugu
Two homologues of the mammalian RED1 gene have been identified in Fugu. Neither is likely to be a pseudogene based on the presence of the correct number of exons with appropriate splice sites and open reading frames. While only a single RED1 gene has been reported for human and rodents and there is no evidence to suggest that a second exists (Villard et al., 1997), a more thorough analysis may yet identify one. The two Fugu RED1 genes differ in base compositional features. The GC level of RED1a is low, only 39%, and the CpG frequency is suppressed, at 0.32; in contrast, RED1b reflects typical values for Fugu genes with a 46% GC level and a CpG frequency of 0.61. The differences in percentage GC3 for the proteins, however, are less significant, at 65 and 73% for RED1a and RED1b, respectively. RED1a and RED1b do not differ significantly in size, both spanning approximately 10 kb. This represents a fivefold compaction relative to the human gene, again typical of previous comparisons between the corresponding genomes. For both Fugu RED1 genes, exon prediction using popular software is very good, and the high homologies to mammalian proteins for most regions make identification of exons straightforward. Only in RED1a and only for the first coding exon is a similarity to a mammalian protein not apparent. The RED2 gene, at >50 kb, is unusually large for a Fugu gene. It is composed of 10 coding exons; the locations for seven of these (exons 2–9 following the numbering of the RED1 gene) were demonstrated in the homologous mouse gene. It is of interest to note that the locations of introns in both the Fugu and the mouse RED2 genes are similar to their locations in the RED1 genes, with the exception of the RED2-specific intron found within the first coding exon. Thus, it is neither the number of introns nor some obligate feature of the protein sequence that increases the size of this gene.
Introns of the Fugu RED2 gene average >5 kb, and the largest is >18 kb. The functional relevance of this large intron is suggested by the presence of consistent exon predictions within it. Determination of the partial structure of the mouse RED2 gene shows that at least four of its introns are not enormous, ranging in size from 2.5 to ~12 kb, and averaging only twice the size of the corresponding Fugu introns. Unfortunately, gaps in the mouse cosmid library prevented contig construction across the intron corresponding to the 18 kb in Fugu, and thus, it only can be estimated at >23 kb. However, from the isochore location (Bernardi, 1995) of the mouse gene, estimated as H2 from the percentage GC3 of 64%, RED2 would not be expected to be enormous, because within H2 isochores, the gene density averages approximately 1 per 50 kb (Zoubak et al., 1996). The software exon prediction was also excellent for this Fugu gene, with only the first coding exon being unrecognizable. There are multiple possible first exons, fulfilling appropriate criteria, located within a few kb of the homologies to exon 2. One additional exon is predicted, within intron 3. This could be spliced in frame to exon 4, and thus, its use would not produce a truncated protein. RT-PCR from mouse brain using a variety of exon primers failed to detect any evidence of its use. However, these experiments are not conclusive because additional splicing possibilities, although more complex, can be envisioned, such as splicing from the novel exon to other exons (e.g. 8 or 9) further 3′ or to yet another novel exon. No such possibilities were tested in this work.
4.2. Protein sequence comparisons
Mammalian RED1 and DRADA editases have both been shown to effect the deamination of specific adenosine residues in some serotonin and glutamate receptor pre-mRNAs. Their substrate specificities, however, while overlapping, are not identical. RED1 can edit both the R/G (arginine to glycine) and the Q/R (glutamine to arginine) sites of the GluRB glutamate receptor, but DRADA can edit only the R/G site (Melcher et al., 1996a).
Five sites (labeled A–E) are edited within the serotonin receptor 5HT-2C. Of these, it has been shown that RED1 can edit sites A, C and D, while DRADA can edit only sites A and C (Burns et al., 1997). Glutamate receptors GluRC and D also each contain an R/G site, GluR5 contains a Q/R site, and GluR6 contains a Q/R, an I/V and a Y/C site. While enzyme specificities for these have not been ascertained, it is likely that RED1 and/or DRADA (or possibly RED2) play a role. RED1 (and DRADA) likely edits numerous other substrates in addition, based on two pieces of information: (1) RED1 is expressed at relatively high levels in all tissues and at all times that have been examined (Villard et al., 1997), and (2) inosine, the immediate product of adenosine deamination, is found in mRNA in all tissues examined at levels of one per 150 000 nucleotides (in muscle) to one per 17 000 nucleotides (in brain) (Paul and Bass, 1998). These last numbers imply
that, if only a single site on average is edited in any substrate, as many as one in 30 to one in three or four mRNA molecules may be edited (assuming an average mRNA size of ~5000 nucleotides). Thus, RNA editing has a major role to play in creating protein diversity from genomically encoded information.

Determining the features of editing enzymes that influence substrate selection and efficiency of editing will help to elucidate both the overall extent of editing and the roles played by each enzyme. A cross-species comparison of protein sequences can be informative in these efforts. In addition to the Fugu sequences, partial protein sequences were determined for the zebrafish homologue of RED1.

In all species, both RED1 and RED2 editase proteins contain two RNA-binding domains and a catalytic domain with putative zinc ion coordinating cysteine residues. For both proteins, these are the most highly conserved regions among organisms. There are, however, subtle variations in the levels of conservation among these regions, and there are some data to suggest that these differences may reflect variations in substrate selection or efficiency of editing. These possibilities are most easily seen in considering the DRADA proteins and data on the conservation among the three RNA-binding domains that are discussed in the accompanying paper (Slavov et al., 2000a).

While functional data are lacking for the RNA-binding domains of RED1 and RED2, there are several noteworthy implications from their conservation. For RED1, RBD1 is very similar (97%) between Fugu and mammals, and RBD2 is only slightly less so (94% similarity). Perhaps this suggests that glutamate and serotonin receptors and other substrates of RED1 remain highly conserved, at least within regions surrounding editing sites. Interestingly, for RED2, while RBD1 in Fugu remains very similar to that in mammals (94% similarity), RBD2 is considerably less conserved at only 78% similarity.
RED2 remains a most intriguing enzyme because of its apparent brain specificity of expression. The lower level of conservation overall, and in RBD2 in particular, may reflect the greater divergence between fish and mammals of RED2 brain-specific substrate pre-mRNAs. Outside the RNA-binding and catalytic domains, i.e. in the N-terminal and interdomain segments, the RED1 and RED2 proteins are both of limited extent (75 and 124 amino acids in RED1 and RED2, respectively, in both mammals and fish) and are highly conserved among organisms.
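
The estimate given earlier in this section, that between one in 30 and one in three or four mRNA molecules may be edited, follows from simple division. A minimal sketch of that arithmetic is shown here, using the inosine frequencies of Paul and Bass (1998) and the two assumptions stated in the text (a single edited site per substrate and an average mRNA size of ~5000 nucleotides); the function and variable names are illustrative only.

```python
# Rough check of the editing-frequency estimate discussed in the text.

# Inosine frequencies reported by Paul and Bass (1998):
# one inosine per N nucleotides of mRNA.
NT_PER_INOSINE = {"muscle": 150_000, "brain": 17_000}

# Assumption taken from the text: average mRNA size of ~5000 nucleotides.
AVG_MRNA_NT = 5_000


def mrnas_per_edited_molecule(nt_per_inosine, avg_mrna_nt=AVG_MRNA_NT):
    """Return the number of mRNA molecules per edited molecule,
    assuming a single edited site per edited molecule."""
    return nt_per_inosine / avg_mrna_nt


for tissue, nt in NT_PER_INOSINE.items():
    ratio = mrnas_per_edited_molecule(nt)
    print(f"{tissue}: about one edited mRNA in {ratio:.1f}")
```

Under these assumptions, the muscle figure gives one edited molecule in 30 and the brain figure one in 3.4, consistent with the range quoted above. If substrates carry more than one edited site on average, the fraction of edited molecules would be correspondingly lower.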

5. Conclusions

The isolation of A-to-I RNA editase genes has revealed that Fugu carries an extra copy of the RED1 gene and that the Fugu homologue of RED2 has unusual structural features. For all genes, a variation in the level of conservation of the different RNA-binding domains suggests a possible variation in the sequence conservation of substrates that may be specific to each enzyme and to each RNA-binding domain within each enzyme. Information on the DNA and protein sequences of these editases allows experiments to be designed to test domain and sequence specificities.

Acknowledgements

This is contribution #1745 from the Eleanor Roosevelt Institute. This work was supported by a grant from the National Cancer Institute to K.G. (CA78213).

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Bass, B.L., 1997. RNA editing and hypermutation by adenosine deamination. Trends Biochem. Sci. 22, 157–162.
Bernardi, G., 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29, 445–476.
Burns, C.M., Chu, H., Rueter, S.M., Hutchinson, L.K., Canton, H., Sanders-Bush, E., Emeson, R.B., 1997. Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 387, 303–308.
Claverie, J.-M., 1997. Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744.
Elgar, G., Rattray, F., Greystrong, J., Brenner, S., 1995. Genomic structure and nucleotide sequence of the p55 gene of the puffer fish, Fugu rubripes. Genomics 27, 442–446.
Elgar, G., Sandford, R., Aparicio, S., Macrae, A., Venkatesh, B., Brenner, S., 1996. Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet. 12, 145–150.
Feinberg, A.P., Vogelstein, B., 1989. Addendum to a technique for radiolabeling DNA to high specific activity. Anal. Biochem. 137, 266–267.
Harris, N.L., 1997. Genotator: a workbench for sequence annotation. Genome Res. 7, 754–761.
Kim, U., Wang, Y., Sanford, T., Zeng, Y., Nishikura, K., 1994. Molecular cloning of a cDNA for double stranded RNA adenosine deaminase. Proc. Natl. Acad. Sci. USA 91, 11457–11461.
Lehrach, H., Drmanac, R., Hoheisel, J., Larin, Z., Lennon, G., Monaco, A.P., Nizetic, V., Zehtner, G., Poustka, A., 1990. Hybridization fingerprinting in genome mapping and sequencing. In: Davies, K.E., Tilghman, S.M. (Eds.), Genome Analysis 1: Genetic and Physical Mapping. Cold Spring Harbor Laboratory Press, Boca Raton, FL.
Lomeli, H., Mosbacher, J., Melcher, V., Hoger, V., Geiger, J.R.P., Kuner, T., Monyer, H., Higuchi, M., Bach, A., Seeburg, P.H., 1994. Control of kinetic properties of AMPA receptor channels by nuclear RNA editing. Science 266, 1709–1713.
Maas, S., Melcher, T., Seeburg, P.H., 1997. Curr. Opin. Cell Biol. 9, 343–349.
Melcher, T., Maas, S., Herb, A., Sprengel, R., Seeburg, P.H., Higuchi, M., 1996a. A mammalian RNA editing enzyme. Nature 379, 460–464.
Melcher, T., Maas, S., Herb, A., Sprengel, R., Higuchi, M., Seeburg, P.H., 1996b. RED2, a brain-specific member of the RNA-specific adenosine deaminase family. J. Biol. Chem. 271, 31795–31798.
Paschen, W., Djuricic, B., 1995. Regional differences in the extent of RNA editing of the glutamate receptor subunits GluR2 and GluR6 in rat brain. J. Neurosci. Meth. 56, 21–29.
Paschen, W., Schmitt, J., Gissel, C., Dux, E., 1997. Developmental changes of RNA editing of glutamate receptor subunits GluR5 and GluR6: in vivo versus in vitro. Dev. Brain Res. 98, 271–280.
Paul, M.S., Bass, B.L., 1998. Inosine exists in mRNA at tissue-specific levels and is most abundant in brain mRNA. EMBO J. 17, 1120–1127.
Philibert, R., Hawkins, G., Damschroder-Williams, G., Stubblefield, B., Martin, B., Ginna, E., 1995. Direct sequencing of trinucleotide repeats from cosmid genomic DNA template. Anal. Biochem. 225, 372–374.
Sambrook, J., Fritsch, E.F., Maniatis, E., 1989. Molecular Cloning: A Laboratory Manual, second ed. Cold Spring Harbor Press, Cold Spring Harbor, NY.
Seeburg, P.H., 1996. The role of RNA editing in controlling glutamate receptor channel properties. J. Neurochem. 66, 1–6.
Slavov, D., Crnogorac-Jurčević, T., Clark, M., Gardiner, K., 2000a. Comparative analysis of the DRADA A-to-I RNA editing gene from mammals, pufferfish and zebrafish. Gene 250, 53–60.
Slavov, D., Hattori, M., Sakaki, Y., Rosenthal, A., Shimizu, N., Minoshima, S., Kudoh, J., Yaspo, M.L., Ramser, J., Reinhardt, R., Reimer, C., Clancy, K., Rynditch, A., Gardiner, K., 2000b. Criteria for gene identification and features of genome organization: analysis of 6.5 Mb of DNA sequence from human chromosome 21. Gene 247, 215–232.
Smith, H.C., Gott, J.M., Hanson, M.R., 1997. A guide to RNA editing. RNA 3, 1105–1123.
Tassone, F., Villard, L., Clancy, K., Gardiner, K., 1999. Structures, sequence characteristics, and synteny relationships of the transcription factor E4TF1, the splicing factor U2AF35 and the cystathionine beta synthetase genes from Fugu rubripes. Gene 326, 211–223.
Tatusova, T., Madden, T., 1999. BLAST 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174, 247–250.
Villard, L., Tassone, F., Haymowicz, M., Welborn, R., Gardiner, K., 1997. Map location, genomic organization and expression patterns of the human RED1 RNA editase. Somat. Cell Mol. Genet. 23, 135–145.
Wang, Y., Zeng, Y., Murray, J.M., Nishikura, K., 1995. Genomic organization and chromosomal location of the human dsRNA adenosine deaminase gene: the enzyme for glutamate-activated ion channel RNA editing. J. Mol. Biol. 254, 184–195.
Zoubak, S., Clay, O., Bernardi, G., 1996. The gene distribution of the human genome. Gene 174, 95–102.