J. Mol. Biol. (1989) 206, 261-280
DNA Sequences of Two Expressed Nuclear Genes for Human Mitochondrial ADP/ATP Translocase
Alison L. Cozens†, Michael J. Runswick and John E. Walker‡
Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K.
(Received 18 July 1988)
Mitochondrial ADP/ATP translocase is an abundant component of the inner membrane. It carries ATP from the matrix into the intermembrane space and transports ADP back. Clones coding for two different but related forms of the protein have been characterized from bovine cDNA libraries. The corresponding genes are referred to as T1 and T2, and they are expressed at different levels in bovine tissues. The bovine cDNAs have been used to isolate clones from a human genomic library that contain the human T1 and T2 genes. Two nucleotide sequences of 9756 and 8625 base-pairs have been determined; they contain the transcribed regions of the human T1 and T2 genes, which cover 4.2 and 5.9 kb of the human genome, respectively (1 kb = 10³ base-pairs). Both genes are split into four exons. The introns in each gene are at exactly equivalent locations and interrupt sequences coding for segments of the protein that are thought to be extramembranous loops linking transmembrane segments. The proteins encoded in the genes differ in 32 amino acids out of 297, and in common with other ADP/ATP translocases, neither has a processed mitochondrial import sequence. The human T1 and T2 genes are members of a larger gene family that includes a third expressed gene, T3, and also at least two spliced pseudogenes. Other studies have shown that T3 is expressed in liver and HeLa cells, and different levels of transcripts of T1 have been found in various tissues. A notable feature of the T1 and T2 genes that may influence their expression is that "CpG-rich islands" are associated with their 5' ends. That of the T2 gene contains numerous potential sites for binding the mammalian transcription factor Sp1, but no TATA or CCAAT sequences are evident near to its 5' end, although these latter features are associated with the human T1 gene. The two DNA sequences also contain many short interspersed repetitive sequences, including 11 Alu repeats, and a novel element about 236 base-pairs in length, which is repeated in a six-fold tandem array in intron B of the T2 gene.
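The "CpG-rich islands" mentioned in the summary are usually recognized numerically. As an illustrative sketch only (the paper does not state which criterion was applied), the following Python snippet computes the G+C content and the ratio of observed to expected CpG dinucleotides for a sequence window. The function names and the toy window are hypothetical and are not taken from the T1 or T2 sequences.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Return the observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        # No CpG dinucleotide is possible without both C and G.
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    window = "CGCGCGTATA"  # toy window, not part of the published sequences
    print(f"G+C content: {gc_content(window):.2f}")
    print(f"CpG obs/exp: {cpg_obs_exp(window):.2f}")
```

In practice such a calculation would be run over sliding windows along the 5' flanking region; the window size and thresholds are left open here because the source does not give them.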
† Present address: Department of Microbiology and Immunology, U.C.S.F., San Francisco, CA 94143, U.S.A.
‡ Author to whom correspondence should be sent.
© 1989 Academic Press Limited

1. Introduction
The inner membranes of mitochondria contain a number of proteins that are responsible for the transport of metabolites (LaNoue & Schoolwerth, 1979, 1984). The most abundant of these, indeed the most plentiful membrane protein in heart mitochondria (Schultheiss & Klingenberg, 1985), is the ADP/ATP translocase. It carries ATP from the mitochondrial matrix across the inner membrane and transports ADP back (Klingenberg, 1985a,b). The translocase isolated from bovine heart mitochondria is a protein of 297 amino acids and its primary structure has been determined by direct sequence analysis (Aquila et al., 1982). It is a nuclear gene product, and a segment of the protein sequence has been used to design an oligonucleotide probe to isolate cognate cDNA clones from a library derived from bovine heart and liver. Analysis of these clones has demonstrated the presence in this library of related but different sequences for the translocase, derived from two homologous expressed genes that have been named T1 and T2 (Walker et al., 1987; Powell et al., 1989). By hybridization experiments it has been demonstrated that one gene is expressed predominantly in heart tissue, whereas the expression of the second gene predominates in intestine (Powell et al., 1989). Other examples have emerged of differences in expression of homologous genes for mitochondrial proteins in different tissues within the same species. For example, it has been found that another intrinsic mitochondrial membrane protein, the dicyclohexylcarbodiimide reactive proteolipid subunit of ATP synthase, is the product of two genes in both the bovine (Gay & Walker, 1985) and
human genomes (Farrell & Nagley, 1987; M. R. Dyer & J. E. Walker, unpublished results), and that the bovine genes are expressed in different ratios in various tissues (Gay & Walker, 1985). Also, the α-subunit of the ATP synthase complex has different isoforms expressed in bovine heart and liver (Walker et al., 1989; Breen, 1988). Another example is provided by the electron transfer complex, cytochrome c oxidase. Immunological studies have shown the presence of tissue-specific isoforms of some of its subunits in rats (Kuhn-Nentwig & Kadenbach, 1985). Interest in these differences of expression between members of these gene families in various tissues has been increased by biochemical studies of tissue-specific mitochondrial myopathies in humans (Morgan-Hughes, 1986; Wallace, 1986; Clark et al., 1987; Capaldi, 1988). They have shown that often the malady is caused by a defect in an electron transport complex or, rarely, in ATP synthase, and that the defect is confined to the mitochondria of the diseased tissue; mitochondria in other tissues of the same individual function normally.
In order to be able to study the regulation of the expression of the T1 and T2 genes, we employed the two bovine cDNAs to isolate genomic clones of the human homologues, and the sequences of the genes and of their flanking sequences have been determined. These experiments show that the transcribed regions of the human T1 and T2 genes are distributed over about 4.2 and 5.9 kb† of DNA, respectively. They have a common structure; each is divided into four exons, the introns being located at precisely the same positions in the two genes. Both in humans and cows it is now apparent that the T1 and T2 genes belong to a larger gene family (Powell et al., 1989) that probably includes pseudogenes. In addition, a third human gene for ADP/ATP translocase, which we refer to as T3, has been shown to be expressed in HL60 cells (Battini et al., 1987). Expression of human T1, T2 and T3 has been found in liver (Houldsworth & Attardi, 1988), and different levels of transcripts of human T1 have been demonstrated in skeletal muscle, heart, kidney and HeLa cells (Neckelmann et al., 1987). Expression of T1 has also been demonstrated in Daudi cells (A. L. Cozens, unpublished results).
† Abbreviation used: kb, 10³ bases or base-pairs.

2. Materials and Methods
(a) Screening genomic libraries
The human genomic library λT5 was investigated. It consists of fragments from a partial Sau3A I digest of DNA from a T-cell primary tumour cloned into the BamHI site of λ2001 (Taylor et al., 1981). It was grown on Escherichia coli Q358, and 7.5 x 10' recombinants were screened by the plaque hybridization method (Benton & Davis, 1977). Duplicate filters were pre-hybridized for 2 h at 65°C in a solution containing 5 × Denhardt's solution (1 mg/ml each of polyvinyl pyrrolidone, bovine serum albumin (fraction V) and Ficoll), 6 × SSC (SSC is 0.15 M-NaCl, 0.015 M-sodium citrate), 0.5% (v/v) Sarkosyl and yeast RNA (0.1 mg/ml). Hybridization was carried out for 18 h at 65°C in the same solution containing 10% (w/v) dextran sulphate and "prime-cut" probe (Farrell et al., 1983). Filters were washed at 65°C with 6 × SSC and then 2 × SSC, and autoradiographed at -70°C for 24 to 72 h with preflashed film and an intensifying screen. Positive plaques that were present on both duplicate filters were rescreened under the same conditions. In the initial screen a mixture of 2 probes was used; one was derived from the bovine T1 cDNA (TaqI site to the polylinker, bases 922 to 1222), and the other from the bovine T2 cDNA (SacI site to the polylinker, bases 945 to 1370; see Powell et al. (1989) for sequences of bovine T1 and T2 cDNAs). Thirteen positively hybridizing recombinants were identified. They were plaque-purified and then rescreened separately with each of the same two probes. Four of them, λT1/1, λT1/4, λT1/8 and λT1/12, rescreened with the T1 probe and four, λT2/2, λT2/3, λT2/11 and λT2/13, with the T2 probe. In a second experiment, a further 5 x 10' recombinants were examined with a probe containing the coding region of the bovine T1 cDNA (bases 1 to 1042), which would not be expected to distinguish between clones derived from the T1 and T2 genes. An additional 5 positively hybridizing recombinants were obtained. One of them, λT1/16, rescreened with the T1-specific probe and 2 of them, λT2/14 and λT2/15, with a T2-specific probe (SacI site to polylinker, bases 945 to 1370). DNA was prepared from all 11 of the clones that hybridized to the T1 or T2 probes. Restriction analysis indicated that all 5 recombinants that hybridized to the T1-specific probe were overlapping.
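The wash and hybridization buffers are given as multiples of SSC. As a small worked sketch (the helper name and dictionary are hypothetical; only the 1 × SSC composition is taken from the text), the molar concentrations at each strength follow by simple scaling:

```python
# 1 x SSC composition as defined in the text (molar concentrations).
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold: float) -> dict:
    """Return the molar concentration of each solute in fold x SSC."""
    # Rounding removes floating-point noise such as 0.8999999999999999.
    return {solute: round(fold * molar, 6) for solute, molar in SSC_1X_MOLAR.items()}


if __name__ == "__main__":
    for fold in (6, 2):  # the two strengths used for hybridization and washing
        print(f"{fold} x SSC:", ssc_molarity(fold))
```

The step down from 6 × SSC to 2 × SSC lowers the salt concentration threefold, which raises the stringency of the second wash.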
They all contained BamHI fragments of 3.8 and 5.9 kb; the former hybridized to a probe derived from the 3' non-coding region of T1 (bases 922 to 1222) and the latter to a probe from the 5' end of the T1 cDNA (bases 1 to 238; see Fig. 1(a) and (b)). In recombinant λT2/14, a single XhoI fragment of 9 kb hybridized with the T2-specific probe (see Fig. 1(c)). Recombinants λT2/2, λT2/3 and λT2/11 all hybridized to a probe derived from the 3' end of the bovine T2 cDNA (bases 945 to 1370), but not to probes derived from the coding regions of either T1 or T2. However, recombinants λT2/13 and λT2/15 contained sequences that hybridized both to the probe derived from the 3' end of the bovine T2 cDNA and to probes derived from the coding region of the T1 cDNA, and so it seemed likely that these clones contained the T2 gene or related pseudogenes.

(b) Sub-cloning of DNA fragments
The 2 BamHI fragments of 5.9 and 3.8 kb derived from λT1/1 were cloned separately into the BamHI site of the plasmid vector pUC8 (Messing, 1983). The amplified fragments were excised from the vector and were purified by electrophoresis on high gelling temperature agarose. Similarly, the XhoI fragment (approx. 9.6 kb), which was later found to contain the T2 gene, was excised from λT2/14 and cloned into the SalI site of pUC18. Then the 3 sub-cloned pieces of DNA were excised from plasmids with BamHI (T1 fragments), and BamHI and HindIII (T2 fragment). This latter digestion generated a fragment of about 8.6 kb, smaller than expected. Later, after the sequence had been completed, it was evident that the human DNA in this 8.6 kb fragment was flanked by a BamHI site and by a sequence derived from the original XhoI site (see Fig. 3). Therefore, the original fragment of DNA present in recombinant λT2/14 contained an
Figure 1. Hybridization to probes for the T1 and T2 genes of human genomic DNA in recombinant λ phages. Digests with various restriction endonucleases were fractionated on 0.7% (w/v) high-gelling temperature agarose mini-gels (10 cm across × 6.5 cm long). Left-hand panel, λT1/1 hybridized with probe derived from 5' end of bovine T1 cDNA (bases 1 to 238; polylinker, BalI site); middle panel, λT1/1 hybridized with probe from 3' end of bovine T1 cDNA (bases 922 to 1222; TaqI site, polylinker); right-hand panel, λT2/14 hybridized with probe from 3' end of bovine T2 cDNA (bases 945 to 1370; SacI, polylinker). (For sequences of bovine T1 and T2 cDNAs see Powell et al. (1989).) In the left-hand and middle panels the DNA was digested with: lane (a) BamHI; (b) EcoRI; (c) HindIII; (d) KpnI; (e) NcoI; (f) SacI. In the right-hand panel the digests are as in the other two panels except that lane (e) is XhoI. "Prime cut" probes were employed. For hybridization conditions see Materials and Methods.
internal BamHI site relatively close to its 5' end, and a small region present in the XhoI fragment was lost during excision from pUC18. In preparation for sequence analysis the purified fragments were broken up by sonication, and each of the resulting mixtures was cloned separately into the SmaI site of M13mp8 or M13mp18 (Deininger, 1983). In the case of the T2 gene, end repair of fragments produced by sonication was carried out using mung-bean nuclease (12 units/µg of DNA; Stratagene, San Diego, CA, U.S.A.). This was found to be superior to the conventional method, which employs a mixture of the Klenow fragment of E. coli DNA polymerase and bacteriophage T4 DNA polymerase. Usually, the M13 clones were grown on E. coli TG1, but it was observed that some repetitive DNA sequences had a pronounced tendency both to delete and to rearrange, and in these cases the recA⁻ strain, E. coli TG2, was used instead. This was particularly necessary in M13 "clone turn around" experiments. In order to overlap the sequences of the 2 BamHI fragments obtained from λT1/1, a NcoI-HindIII fragment of 788 bases was excised from the phage DNA, and after end-repair it was cloned into the SmaI site of M13mp8.

(c) DNA sequence analysis
DNA sequences were determined by the modified chain termination method (Sanger et al., 1977; Biggin et al., 1983) using a random strategy (Bankier & Barrell, 1983). Many problematic regions in sequences were resolved by substituting for dGTP in the sequencing reaction mixtures with either deoxyinosine triphosphate (Mills & Kramer, 1979) or deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 1986). Frequently this was accompanied by the use of synthetic oligonucleotide primers in order that the problematical sequence could be brought close to the priming site. Other synthetic primers were used to extend existing sequences. In toto, 21 different synthetic oligonucleotides, each 17 bases long, were employed as primers in the sequencing of the T1 gene; none was required in the analysis of T2. The methods used for the compilation of DNA sequences into data bases and for analysis of the final sequences will be described in a forthcoming paper (M. R. Dyer & J. E. Walker).
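A random sequencing strategy relies on redundancy to close the sequence. As a rough sketch only, and under an idealized Poisson model of uniformly random clones that is an assumption of this example rather than a method described in the paper, the expected fraction of a target covered at a given average redundancy can be estimated as follows. The function names are hypothetical; the redundancy (7.3) and length (9756 base-pairs) are the values reported in this paper for the T1 region.

```python
import math


def expected_fraction_covered(redundancy: float) -> float:
    """Expected fraction of bases read at least once (Poisson model)."""
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


def expected_uncovered_bases(length_bp: int, redundancy: float) -> float:
    """Expected number of bases never read, for a target of length_bp."""
    return length_bp * math.exp(-redundancy)


if __name__ == "__main__":
    length_bp, redundancy = 9756, 7.3  # values reported for the T1 region
    print(f"fraction covered: {expected_fraction_covered(redundancy):.5f}")
    print(f"bases expected uncovered: {expected_uncovered_bases(length_bp, redundancy):.1f}")
```

The model ignores cloning bias and the requirement to read both strands, which is one reason why directed primers and "clone turn around" experiments are still needed to finish a sequence.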
3. Results and Discussion
(a) Cloning the human T1 and T2 genes
The cloning of expressed genes for the dicyclohexylcarbodiimide reactive subunit of ATP synthase was hampered considerably by the presence in the human and bovine genomes of numerous pseudogenes related to the expressed genes (M. R. Dyer & J. E. Walker, unpublished results; Dyer et al., 1989). It was apparent that human DNA also contains many sequences related to the coding sequences of the bovine T1 and T2 cDNAs for ADP/ATP translocase (Powell et al., 1989), and at the outset it seemed likely that some of these sequences could be pseudogenes. Southern hybridization experiments on digests of human DNA (data not shown) indicated the presence of only one T1-related sequence, and so there appear not to be any T1-related pseudogenes. However, other experiments showed the presence of several human sequences related to T2. As described in Materials and Methods, section (a), five related recombinants, λT1/1, λT1/4, λT1/8, λT1/12 and λT1/16, each contained two BamHI fragments, one of which hybridized to a probe derived from the 5' end of the bovine cDNA and the other to a probe from the 3' end, and fragments of the same sizes were detected by Southern hybridization to digests of human DNA. These
turned out to contain the entire T1 gene (see section (b), below). In the course of the same library screening experiments six independent recombinant λ phages were isolated that hybridized with the 3' end of the bovine T2 cDNA. Restriction digests of three of them, λT2/2, λT2/3 and λT2/11, although containing sequences that hybridize to the 3' end of the bovine T2 cDNA, did not hybridize to probes derived from the coding regions of either the bovine T1 or T2 cDNAs, and so none of them could contain the entire T2 gene. Recombinants λT2/2 and λT2/3 appear to be related, but are not identical in their restriction patterns (M. R. Dyer, unpublished results). In the case of λT2/11, preliminary DNA sequencing experiments have detected the presence in this recombinant of sequences closely related to the 3' non-coding region of the bovine T2 cDNA (unpublished results). It is possible that they represent 3' exons of T2 that could be involved in alternative splicing pathways to generate different forms of T2, but other explanations are also possible and further experimentation is required to clarify this matter. Two further recombinants, λT2/13 and λT2/15, were shown by DNA sequencing to contain spliced pseudogenes related to human T2 (unpublished work). The sixth recombinant with sequences related to the 3' end of the bovine T2 cDNA, λT2/14, contained the expressed gene. This was first indicated by restriction analysis experiments; four fragments in a SacI digest of its DNA hybridized with a probe covering bases 1 to 1042 of the T1 cDNA, and the restriction patterns were different from those of λT1/1. Moreover, a probe derived from the 3' end of the T2 cDNA did not hybridize with the smallest of the SacI fragments (0.9 kb) detected in the former experiment. So the 0.9 kb SacI fragment was cloned into M13mp8, and DNA sequence analysis revealed interrupted coding sequences for the translocase (corresponding to the exon III : intron C and intron C : exon IV boundaries), and ultimately the 0.9 kb fragment turned out to be bases 6140 to 7040 (Fig. 3). In order to have the entire gene present in a single piece of DNA, the 9 kb fragment present in the XhoI digest of λT2/14 was selected for sequence analysis (see Fig. 1).
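Fragment sizes quoted in kb can be checked against the base coordinates given in the text. The short sketch below (function and dictionary names are hypothetical) treats the coordinates as 1-based and inclusive, which is an assumption about the numbering convention, and compares the 0.9 kb SacI fragment with the NcoI-HindIII overlap fragment whose coordinates (bases 5739 to 6526 in Fig. 2) are given in section (b).

```python
def fragment_length(first_base: int, last_base: int) -> int:
    """Length of a fragment given 1-based, inclusive end coordinates."""
    if last_base < first_base:
        raise ValueError("last_base must not precede first_base")
    return last_base - first_base + 1


# Coordinates quoted in the text; the labels are for illustration only.
FRAGMENTS = {
    "SacI fragment of T2 (Fig. 3)": (6140, 7040),
    "NcoI-HindIII overlap fragment of T1 (Fig. 2)": (5739, 6526),
}

if __name__ == "__main__":
    for label, (first, last) in FRAGMENTS.items():
        size = fragment_length(first, last)
        print(f"{label}: {size} bases ({size / 1000:.1f} kb)")
```

Under this convention the SacI fragment is 901 bases, consistent with the 0.9 kb estimated from the gel, and the overlap fragment is 788 bases, as stated in Materials and Methods.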
(b) DNA sequencing of the human T1 and T2 genes
The sequences of the T1 and T2 genes and flanking regions (see Figs 2 and 3) were determined by sequencing the two BamHI fragments of 5.9 and 3.8 kb from λT1/1, and the 8.6 kb fragment excised from λT2/14. The overlap between the 5.9 and 3.8 kb BamHI fragments was established by sequencing an overlapping NcoI-HindIII fragment (bases 5739 to 6526 in Fig. 2). Both DNA sequences were established fully in both senses of the DNA. On average each base in the sequences presented in Figures 2 and 3 was determined 7.3 and 7.6 times, respectively. In order that the sequences could be established fully in both senses of the DNA, a number of problems of some difficulty had to be overcome. In the region containing the T1 gene these were mostly concentrated in the G+C-rich region extending from about nucleotide 3575 to 4409 (see Fig. 2). "Pile-ups" and "compressions" were commonplace in sequences covering this segment, but these difficulties were resolved unambiguously by the use in sequencing reactions of the triphosphates of deoxyinosine or deoxy-7-deazaguanosine, as described in Materials and Methods. No such problems were met in the analysis of the G+C-rich region of the T2 gene, but another problem presented itself elsewhere. It was observed that clones derived from nucleotides 4483 to 5774 (see Fig. 3), which contain six tandem repeated sequences, each about 236 nucleotides in length, had a pronounced tendency to rearrange and to delete. This was particularly evident when M13 clones in the positive sense of the DNA were turned around in order that a double-stranded sequence could be generated. Indeed, no particular difficulty was met in the positive strand analysis and an unambiguous sequence could be deduced. This difficulty with the negative sense clones was reduced, but not entirely avoided, by the use of a recA⁻ E. coli host, and the residual difficulties probably reflect the genetic instability of the particular strain employed, which reverts rapidly to recA⁺. Nonetheless, a number of clones characterized in the negative sense gave sequences in exact agreement with the positive sense sequence. These difficulties serve to emphasize the
Figure 2. DNA sequence of a segment of human DNA containing the T1 gene for mitochondrial ADP/ATP translocase. The nucleotide sequence is numbered, and the locations of exons I to IV and the protein sequences they encode are shown. Exon-intron boundaries are denoted by small arrows. Large boxes contain Alu repeats. The transcriptional start site at base 3900 has been determined experimentally (A. L. Cozens, unpublished results) and associated TATA and CCAAT boxes are indicated. The under- and overlined sequence is a potential signal for polyadenylation, and the actual site of addition of poly(A) in the transcript is indicated (Neckelmann et al., 1987). Restriction sites that were important in cloning and sequencing experiments are shown. The sequences of exons differ at a number of positions from published cDNA sequences of human T1. Houldsworth & Attardi (1988), in their partial sequence of pHAT14, which covers bases 3929 to 4118, report A at nucleotide 3944 rather than C, and T instead of G at base 3954. Eight residues are not present in the sequence of another clone of the T1 cDNA (Neckelmann et al., 1987). This is an almost complete clone and lacks only the first 2 bases after the transcriptional start site. These deletions in their sequence are at positions 3937 (G), 3979 (C), 4054 (G), 7627 (G), 7816 (A), 5719 (G), 5723 (C) and 5727 (C). At position 6463 they find C rather than the G in the sequence shown in this Figure. They also have one extra G residue inserted between nucleotides 4056 and 4057. Some of these differences are in coding sequences and lead to changes in the amino acid sequence of the T1 ADP/ATP translocase (see the text and the legend to Fig. 7).
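The differences listed in the legend are of three kinds: substitutions, deletions and a single insertion. As a minimal sketch of how such a list could be applied to a reference sequence to reconstruct a variant, the following function uses 1-based positions on the reference. The function name, the tuple layout and the toy sequence are hypothetical; no real positions from Figure 2 are used.

```python
def apply_differences(reference: str, diffs: list) -> str:
    """Apply (position, kind, base) differences to a reference sequence.

    Positions are 1-based on the reference. kind is "sub" (replace the
    base), "del" (remove the base) or "ins_after" (insert a base after
    the given position). For "del" the base field is ignored.
    """
    seq = list(reference)
    # Work from the highest position down so that earlier coordinates
    # are not shifted by deletions or insertions already applied.
    for pos, kind, base in sorted(diffs, key=lambda d: d[0], reverse=True):
        index = pos - 1
        if not 0 <= index < len(seq):
            raise IndexError(f"position {pos} lies outside the reference")
        if kind == "sub":
            seq[index] = base
        elif kind == "del":
            del seq[index]
        elif kind == "ins_after":
            seq.insert(index + 1, base)
        else:
            raise ValueError(f"unknown difference kind: {kind}")
    return "".join(seq)


if __name__ == "__main__":
    toy_reference = "ACGTACGT"  # toy sequence, not from Figure 2
    toy_diffs = [(2, "sub", "T"), (4, "del", None), (6, "ins_after", "G")]
    print(apply_differences(toy_reference, toy_diffs))
```

Applying the edits from the 3' end towards the 5' end keeps every position valid in reference coordinates, so the list can be written exactly as positions are quoted in a figure legend.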
[Fig. 2. Annotated nucleotide sequence of the human T1 gene region. Legible annotations: Alu repeats 1 and 2, the transcript start (base 3900), exons I to IV with their encoded protein sequences, NcoI and BamHI restriction sites, and the poly(A) site.]
[Fig. 3. Nucleotide sequence of the human T2 gene region, annotated with Alu repeats, T2 repeats 1 to 6, exons I to IV with their translations, and a poly(A) site. The sequence itself is not recoverable from the extracted text.]
Figure 3. DNA sequence of a segment of human DNA containing the T2 gene for mitochondrial ADP/ATP translocase. For meaning of symbols see the legend to Fig. 2, except that the boxed sequences contain not only Alu repeats, but also a second family of repetitive sequences called T2 repeats 1 to 6. Also shown by boxes are the positions in the 5’ region of the sequence of the hexanucleotide GGGCGG and its complement (overlined by arrows). The sequences of exons differ at a number of positions from the sequence of a partial human T2 cDNA clone, pHAT8, reported by Houldsworth & Attardi (1988). This covers the sequences in exons from nucleotides 1757 to 7389; at positions 4136, 4137, 4141 and 4230 they report G, A, G and T, respectively, whereas we find A, G, A and C. In addition, bases 4144 to 4146 are not present in their sequence. These differences in DNA sequence lead to changes in amino acid sequence of the T2 ADP/ATP translocase (see the text and the legend to Fig. 7).

kinds of problems that it may be anticipated will be encountered in any future endeavours to sequence the human genome.

(c) Gene structures

(i) Identification of exons

The four exons of the T1 and T2 genes were identified by comparison of the human genomic sequences with the corresponding bovine cDNA sequences (Powell et al., 1989). This was confirmed by translation of the genomic DNA sequences in all phases and comparison of deduced amino acid sequences with those obtained by translation of the bovine cDNAs. Consensus rules for splice sites, which predict conservation of the dinucleotides GT and AG, respectively, next to the 5’ and 3’ boundaries of introns, were also taken into consideration (Breathnach & Chambon, 1981). The classes and sizes of introns and their 3’ and 5’ boundaries are summarized in Table 1.

Table 1
Introns in human ADP/ATP translocase genes

Gene   Intron   Size (base-pairs)   Class   5’ boundary sequence    Residues
T1     A        1269                0       ctg.cag.GTGAGGACCG      L Q
T2     A        2171                0       ctg.cag.GTGGGGACGC      L Q
T1     B        508                 1       gcc.aag.gGTGAGAGAGG     A K
T2     B        1820                1       gcc.aag.gGTACGTGTGG     A K
T1     C        914                 1       aaa.ggg.gGTAAGCTTGT     K G
T2     C        519                 1       aaa.gga.gGTACTCGGGG     K G
Consensus sequence                          cagGTAAGT

The exact extent of the transcribed region of the T1 gene, but not of the T2 gene, is known (see Fig. 2). Primer extension analyses have shown that transcription of T1 starts at nucleotide 3900 (A. L. Cozens, unpublished work). This site is preceded by a canonical TATA sequence at positions 3872 to 3877, and the sequence CCAAT at nucleotides 3813 to 3817. Both of these sequences are often, but by no means invariably, associated with eukaryotic promoters. The former appears to fix the site of transcriptional initiation and is usually located 25 to 30 base-pairs upstream from this site. The latter is often found 40 to 100 nucleotides upstream from the transcriptional start site and appears to play a critical role in directing efficient transcription in a select class of mammalian promoters (Kadonaga et al., 1986). However, neither element is found in the sequences determined in the 5’ region of the human T2 gene, and initiation of its transcription has not yet been studied. It may be significant that the hexanucleotide GGGCGG, or its complement, occurs 15 times in this region, as this sequence has the potential to bind the transcription factor SP1, and so can enhance transcription by RNA polymerase II by 10 to 50-fold. Amongst these 15 examples of the hexanucleotide sequence only one, namely the one found at bases 1295 to 1300, conforms to the extended decanucleotide motif, 5’ (G/T)GGGCGG(G/A)(G/A)(C/T) 3’. The decanucleotide sequence, TGGGCGGGGC, found at bases 1294 to 1303 in the T2 gene sequence has been demonstrated in other genes to be a high-affinity site for binding SP1 (Kadonaga et al., 1986; see also section (f), below, for further discussion of this matter).
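The search for potential SP1 sites described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this work: the function names and the short test sequence are hypothetical, and the decanucleotide motif is written as the regular expression `[GT]GGGCGG[GA][GA][CT]`, which is an assumed transcription of the motif quoted in the text. Positions are reported 1-based, as in the Figures.

```python
import re

HEXANUCLEOTIDE = "GGGCGG"
# Assumed transcription of the extended motif (G/T)GGGCGG(G/A)(G/A)(C/T);
# the lookahead allows overlapping matches to be reported.
DECANUCLEOTIDE = re.compile(r"(?=([GT]GGGCGG[GA][GA][CT]))")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hexanucleotides(seq):
    """List (1-based start, strand) for GGGCGG ('+') and its complement ('-')."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", HEXANUCLEOTIDE),
                          ("-", reverse_complement(HEXANUCLEOTIDE))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def find_decanucleotides(seq):
    """List (1-based start, matched sequence) for the extended motif."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in DECANUCLEOTIDE.finditer(seq)]


# Hypothetical toy sequence, not taken from the T2 gene.
toy = "AAATTGGGCGGGGCTTACCGCCCAA"
hexa_hits = find_hexanucleotides(toy)
deca_hits = find_decanucleotides(toy)
```

In the toy sequence the decanucleotide begins one base before the hexanucleotide it contains, which is the same relationship as that between bases 1294 to 1303 and 1295 to 1300 in the T2 sequence.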
Table 1 (continued)

Gene   Intron   3’ boundary sequence    Residues
T1     A        CCTCCACCAG.gtc.cag      V Q
T2     A        CGTCCCCCAG.gtc.cag      V Q
T1     B        CTGTCCACAG.gg.atg       G M
T2     B        TCCCGCGCAG.gg.atg       G M
T1     C        GTTTCCACAG.cc.gat       A D
T2     C        GTCGTTGCAG.ct.gac       A D
Consensus sequence   -CAGg

Absence of a TATA box and presence of multiple SP1 binding sites is usually accompanied by multiple 5’ ends in transcripts, emphasizing the important role played by the TATA sequence in determining the site of transcriptional initiation (Martini et al., 1986).

The sequence of the human T1 cDNA (Neckelmann et al., 1987) shows that polyadenylation of the T1 transcript occurs after base 7896, which is 15 bases to the 3’ side of a canonical polyadenylation signal. The sequence of the human T2 cDNA (clone pHAT8; Houldsworth & Attardi, 1988) terminates at a position equivalent to base 7380 and is followed by a run of A residues, although no polyadenylation signal is associated with it. However, a run of A residues is also found at this position in the gene sequence, and so it is unclear from the published data whether this is an authentic polyadenylation site or not. On the basis of the human gene sequence it appears likely that polyadenylation will occur 14 or 15 bases after the canonical polyadenylation signal (nucleotides 7487 to 7492) as shown in Figure 3, particularly as this has been shown to be the site of addition of poly(A) in the homologous bovine T2 gene (Powell et al., 1989).

[Figure 4, panels (a) and (b): aligned cDNA and deduced protein sequences of human and bovine T1 and T2. The alignments are not recoverable from the extracted text.]

Figure 4. Comparison of cDNA sequences of bovine and human T1 and T2 ADP/ATP translocases. The alignments were made with the computer program NUCALN (Wilbur & Lipman, 1983). In (a) human T1 is compared with bovine T1. Two alternative polyadenylation signals in the bovine mRNA, 1 and 2 (Rasmussen & Wohlrab, 1986; Powell et al., 1989), and the [illegible] site used in preparation of a T1-specific probe are shown. In human T1 the 3’ nucleotide in the mRNA is polyadenylated (Neckelmann et al., 1987) and the boxed region is the polyadenylation signal. In (b), human T2 is compared with bovine T2. The meaning of various symbols is as follows: the arrow indicates the polyadenylation site proposed by Houldsworth & Attardi (1988); the boxed region is the polyadenylation signal proposed in this paper and the sequence is extended to the nucleotide presumed to correspond to the polyadenylation site in the mRNA. The bovine T2 polyadenylation site is also boxed. The sequences of bovine T1 and T2 are taken from Powell et al. (1989), and the human sequences are deduced from the gene sequences. These human sequences differ from those described by Neckelmann et al. (1987) and Houldsworth & Attardi (1988) as described in the legends to Figs 2 and 3, and the bovine T1 cDNA sequence differs from the partial sequence of Rasmussen & Wohlrab (1986) as described by Powell et al. (1989).

Comparisons of the cDNA (or mRNA) sequences deduced from the genomic sequences of human T1 and T2 with their bovine cDNA counterparts illustrate different degrees of conservation in different regions of the sequences (see Fig. 4). In both pairs of sequences conservation is greatest in the coding regions, where, for example, the human and bovine T1 cDNAs differ in only 65 out of 894 nucleotides. These differences give rise to 12 changes in protein sequences between the human and bovine T1 translocases. A similarly high degree of conservation is found in the coding regions of the T2 cDNAs and also in the protein sequences that they encode (see Fig. 5). In contrast, the 5’, and particularly the 3’, non-coding regions are less well conserved, although a strong relationship between the T1 homologues on the one hand, and the T2 homologues on the other, is still clearly evident. However, the 3’ non-coding regions of human T1 and T2 cDNAs are not related, as was also noted earlier with the bovine counterparts (Powell et al., 1989) and also with the P1 and P2 cDNAs encoding the dicyclohexylcarbodiimide reactive proteolipid subunit of ATP synthase (Gay & Walker, 1985).
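The degrees of conservation quoted above can be restated as percentage identities. The following Python sketch is illustrative only; the function names are hypothetical, the difference counts are those given in the text and the summary (65 of 894 nucleotides and 12 of 297 amino acids for human versus bovine T1; 32 of 297 amino acids for human T1 versus T2), and the short strings used to demonstrate the counting function are invented.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(differences, length):
    """Convert a count of non-identical residues into a percentage identity."""
    return 100.0 * (1.0 - differences / length)


# Difference counts as quoted in the text (not recomputed from the sequences).
t1_coding_identity = percent_identity(65, 894)    # human vs bovine T1, nucleotides
t1_protein_identity = percent_identity(12, 297)   # human vs bovine T1, amino acids
t1_t2_protein_identity = percent_identity(32, 297)  # human T1 vs human T2

# Invented toy alignment showing how such counts could be obtained.
toy_differences = count_differences("ACGTAC", "ACCTAA")
```

On these figures the human and bovine T1 coding sequences are about 92.7% identical and the encoded proteins about 96% identical, whereas the human T1 and T2 proteins are about 89% identical.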
(ii) Exon-intron boundaries

The exons of human T1 and T2 genes are spread over 4.2 and 5.9 kb of DNA, respectively (see Fig. 6), and the three introns A, B and C interrupt the coding sequences at exactly the same positions in the two genes (see Table 1 for a summary of boundaries). The translocase protein has been proposed to be folded into six transmembrane α-helices linked by loops outside the lipid bilayer (Saraste & Walker, 1982) and all three introns in both genes break the coding sequences at points corresponding to loops 1, 4 and 5 (see Fig. 7 and section (g) for further discussion of this point).
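The boundary sequences of Table 1 lend themselves to a simple consistency check, sketched here in Python. The sketch rests on stated assumptions: that lower-case letters in Table 1 denote exon sequence, upper-case letters intron sequence, and dots codon boundaries; that the class of an intron is the number of nucleotides of a split codon lying 5’ to it; and that the boundary strings below are a correct reading of the table (some characters were restored from a damaged copy). The function names are hypothetical.

```python
# (gene, intron, 5' boundary, 3' boundary) as read from Table 1; assumed
# notation: lower case = exon, upper case = intron, dots = codon boundaries.
BOUNDARIES = [
    ("T1", "A", "ctg.cag.GTGAGGACCG", "CCTCCACCAG.gtc.cag"),
    ("T2", "A", "ctg.cag.GTGGGGACGC", "CGTCCCCCAG.gtc.cag"),
    ("T1", "B", "gcc.aag.gGTGAGAGAGG", "CTGTCCACAG.gg.atg"),
    ("T2", "B", "gcc.aag.gGTACGTGTGG", "TCCCGCGCAG.gg.atg"),
    ("T1", "C", "aaa.ggg.gGTAAGCTTGT", "GTTTCCACAG.cc.gat"),
    ("T2", "C", "aaa.gga.gGTACTCGGGG", "GTCGTTGCAG.ct.gac"),
]


def intron_class(five_prime):
    """Number of exon nucleotides between the last codon boundary and the intron."""
    tail = five_prime.rsplit(".", 1)[-1]
    return sum(1 for ch in tail if ch.islower())


def obeys_gt_ag_rule(five_prime, three_prime):
    """True if the intron starts with GT and ends with AG."""
    donor = "".join(ch for ch in five_prime if ch.isupper())
    acceptor = "".join(ch for ch in three_prime if ch.isupper())
    return donor.startswith("GT") and acceptor.endswith("AG")


def codon_is_restored(five_prime, three_prime):
    """True if the split codon is completed by the first exon fragment after the intron."""
    first_fragment = three_prime.split(".")[1]
    return (intron_class(five_prime) + len(first_fragment)) % 3 == 0


classes = [intron_class(five) for _, _, five, _ in BOUNDARIES]
gt_ag = [obeys_gt_ag_rule(five, three) for _, _, five, three in BOUNDARIES]
in_frame = [codon_is_restored(five, three) for _, _, five, three in BOUNDARIES]
```

Under these assumptions the computed classes reproduce the Class column of Table 1 (0 for intron A, 1 for introns B and C in both genes), every intron obeys the GT and AG rule, and the reading frame is restored across each boundary.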
[Figure 5, panels (a) to (c): matrices of pairwise comparisons among human T1, T2 and T3 and bovine T1 and T2. The individual values are not recoverable from the extracted text.]

Figure 5. Binary comparisons of sequences of cDNAs and protein sequences of mammalian ADP/ATP translocases. (a) Coding sequences; (b) 3’ non-coding sequences of cDNAs; (c) protein sequences. In (a) and (c) in each case all the sequences are the same length (891 nucleotides and 297 amino acids, respectively) and the numbers refer to non-identical residues in the pairs of sequences. In (b) the sequences, which differ in length, were aligned with the computer program NUCALN (Wilbur & Lipman, 1983), and identities counted. The number of differences is expressed as a fraction of the length of the region over which they are aligned.

(d) Amino acid sequences of the human ADP/ATP translocases T1 and T2
The alignment of the deduced protein sequences of the human T1 and T2 ADP/ATP translocases with their bovine counterparts shows that they are closely related in sequence, and that they are all 297 amino acids long (see Fig. 4). (For an account of differences in sequences of T1 and T2 proteins between those predicted from gene sequences and those derived from the cDNA sequences of Neckelmann et al. (1987) and Houldsworth & Attardi (1988) see legend to this Figure.) None appears to have a processed import sequence and none has been found associated with ADP/ATP translocases in Neurospora crassa (Arends & Sebald, 1984), Saccharomyces cerevisiae (Adrian et al., 1986), Zea mays (Baker & Leaver, 1985) and the T1 form of the bovine protein (Powell et al., 1989). In common with these homologues only the initiator methionine is removed by post-translational proteolysis. Thus, the translocases belong to a relatively rare class of nuclear-coded mitochondrial protein in which the import sequence is present in the mature protein assembled into the mitochondrial inner membrane. In the S. cerevisiae protein this sequence is found within amino acids 1 to 115 (Adrian et al., 1986). In the bovine T1 protein the amino-terminal serine residue of the mature protein is N-acetylated, and a second post-translational modification was detected at lysine 51, which is trimethylated (Klingenberg, 198%~). This latter residue is conserved in all mammalian translocases.

[Figure 6: diagram of the exon and intron organization of the human T1 and T2 genes. The diagram is not recoverable from the extracted text.]

Figure 6. Structures of the human T1 and T2 genes encoding mitochondrial ADP/ATP translocase. In each gene, exons I to IV and introns A to C are shown as filled boxes and continuous lines, respectively. The sizes of exons and introns are given in base-pairs, those of exons I and IV in the T2 gene being approximate as the transcriptional start and polyadenylation site are not known.

(e) Repetitive DNA sequences

Human DNA contains two types of middle repetitive DNA sequence, the long and short interspersed sequences (LINES and SINES; Kao, 1985; Rogers, 1985; Weiner et al., 1986; Hutchison et al., 1989). No examples of the former have been detected in the sequences described in this paper, but the T1 and T2 genes and flanking regions contain two different examples of SINES. These repetitive sequences were detected in two ways, firstly by comparison of DNA sequences with the EMBL data base using the computer programme FASTN (Lipman & Pearson, 1985), and, secondly, in the case of the T2 sequence by comparison of the DNA sequence with itself (Fig. 8). This Figure illustrates dramatically the presence of repetitive elements in introns A and B of the T2 gene. The former are six Alu repeats and the latter arise from six tandem repeats of about 236 nucleotides and are referred to as T2 repeats. A seventh and eighth Alu repeat were detected in the 5’ and 3’-flanking regions of the T2 gene, and three others were found in the T1 sequence (see Figs 2, 3 and 9). The T2
Figure 7. Protein sequences of mitochondrial ADP/ATP translocases. The sequences are numbered from the amino-terminal residue of the mature bovine T1 protein. The initiator methionine that precedes it is residue -1. Stars indicate residues conserved in all species and segments I to VI are hydrophobic regions that are proposed to be folded into transmembrane α-helices (Saraste & Walker, 1982). A, B and C indicate the positions of introns in the corresponding regions of the human genes. The protein sequences are taken from the following sources: human T1 and T2, this work; bovine T1, Aquila et al. (1982) and Powell et al. (1989); bovine T2, Powell et al. (1989); N. crassa, Arends & Sebald (1984); S. cerevisiae, Adrian et al. (1986); Z. mays, Baker & Leaver (1985). The sequences for human T1 and T2, which are deduced from the gene sequences, differ from those predicted by cDNA sequences as follows. The cDNA sequence of human T1 (Neckelmann et al., 1987) codes for Ala instead of Gly at position 15, for the sequence Arg-Arg at positions 149-150 rather than Lys-Gly, Ala151 being deleted, and for Leu instead of Val230. The partial cDNA sequence of human T2 in the clone pHAT8 (Houldsworth & Attardi, 1988) codes for the sequence Arg-His-Ala rather than Lys-His-Thr at positions 105 to 107, and Gln108 is deleted (see also the legends to Figs 2 and 3 for an account of differences at the nucleotide level).
[Alignment not reproduced. Rows: human T1, bovine T1, human T2, bovine T2, human T3, N. crassa, S. cerevisiae and Z. mays; hydrophobic segments I to VI and intron positions A, B and C are marked.]
repeats are highly conserved, 154 nucleotides being identical in the five longest sequences (see Fig. 10). They have no associated poly(A) sequences such as are found in Alu sequences, and there is no indication that they are transcribed. They cannot be folded into a tRNA-like structure (Daniels & Deininger, 1985). Apparently this type of repeat has not been detected previously, and it is not known if it is found elsewhere in the human genome.
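The self-comparison that revealed the Alu and T2 repeats (Fig. 8) can be illustrated with a small dot-matrix sketch. The Python below is only an approximation of what a program such as DIAGON does: it simply counts identical bases within a window along each diagonal, which may differ from the scoring that DIAGON applies. The function name `self_dotplot`, its parameters and their default values are hypothetical and are not taken from the paper (the legend to Figure 8 reports a window length of 231 and a score of 100).

```python
def self_dotplot(seq, window=11, min_score=9):
    """Compare a DNA sequence with itself, diagonal by diagonal.

    Returns a list of (i, j) pairs, with i < j, giving the start positions
    of two windows of length `window` that share at least `min_score`
    identical bases. The trivial main diagonal (i == j) is skipped.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Each offset is one diagonal of the dot matrix (j - i == offset).
    for offset in range(1, n - window + 1):
        length = n - offset  # number of cells on this diagonal
        matches = [seq[k] == seq[k + offset] for k in range(length)]
        score = sum(matches[:window])  # score of the first window
        k = 0
        while True:
            if score >= min_score:
                hits.append((k, k + offset))
            if k + window >= length:
                break
            # Slide the window one base along the diagonal.
            score += matches[k + window] - matches[k]
            k += 1
    return hits


if __name__ == "__main__":
    # Toy input: three tandem copies of a 10-base unit (not real T2 sequence).
    toy = "ACGGTCATTG" * 3
    for i, j in self_dotplot(toy, window=10, min_score=10):
        print(i, j)
```

Tandem repeats show up as runs of hits on diagonals whose offsets are multiples of the repeat length. The sketch is quadratic in sequence length, so a sequence the size of the T2 region (8625 base-pairs) is feasible in pure Python but slow.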
Figure 8. Comparison of the DNA sequence of the human T2 gene and flanking regions with itself. The computer program DIAGON (Staden, 1982) was used. A window length of 231 and score of 100 were employed in the calculation. I to IV indicate the positions of exons. The related sequences in intron A (between exons I and II) are Alu repeats (Fig. 9) and those in intron B (between exons II and III) are referred to as T2 repeats (Fig. 10).
[Dot-matrix plot not reproduced. Axis label: T2 gene (kb), 0 to 8.]

Figure 9. Summary of human Alu sequences in and around the T1 and T2 genes. The sequences have been aligned with the consensus human Alu sequence (Deininger et al., 1981). Insertions, denoted by dashes, have been introduced to improve alignment. Completely conserved residues are indicated with a star. The A and B blocks (underlined) refer to conserved elements in the split promoter (Fowlkes & Shenk, 1980; Fuhrman et al., 1981; Paolella et al., 1983; Perez-Stable et al., 1984; Rogers, 1985). Three of the Alu sequences shown in the Figure are incomplete as they flank the experimentally determined sequences; these are T2, 1 to 215; T2, 8625 to 8426; and T1, 1 to 114.
[Alignment not reproduced. Sequences shown: T1, 1-114, 2189-1900 (-) and 8317-8618; T2, 1-215, 2398-2639, 2640-2842, 2844-3066, 3352-3069 (-), 3353-3647, 3649-3858 and 8625-8426 (-).]

Figure 10. Summary of 6 novel repeats found in intron B of the human T2 gene. Dashes have been introduced to improve alignments. The stars indicate conserved bases.
[Alignment not reproduced. Sequences shown: T2, 4483-4715, 4716-4950, 4951-5169, 5170-5402, 5403-5636 and 5637-5774.]

(f) Expression of the human T1 and T2 genes

A major reason for carrying out this detailed analysis of human T1 and T2 genes is the wide difference in relative levels of expression of the bovine homologues in various tissues (Powell et al., 1989; see Introduction). Equivalent studies of the human genes have not yet been carried out, although it has been shown that T1, T2 and T3 are expressed in liver (Houldsworth & Attardi, 1988), and different levels of transcripts of T1 have been found in various human tissues (Neckelmann et al., 1987). Earlier, immunological studies of the protein isolated from heart, liver and kidney provided evidence for organ-specific determinants, with partial identity of proteins from the various sources (Schultheiss & Klingenberg, 1984, 1985). Therefore, it is not unreasonable to assume that the human genes will be expressed differently in various tissues. A similar phenomenon has been demonstrated also with the bovine P1 and P2 genes encoding different precursors of the dicyclohexylcarbodiimide-reactive subunit of mitochondrial ATP synthase (Gay & Walker, 1985), and with heart (Walker et al., 1989) and liver (Breen, 1988) isoforms of the α-subunit of the same enzyme.

The 5’ regions of the T1 and T2 genes have a high preponderance of the dinucleotide CpG (see Fig. 11), and CpG-rich islands appear to be associated with the 5’ ends of both genes. That in the T1 gene covers about 1 kb and extends into the region to the 5’ side of the site of transcriptional initiation. The CpG-rich sequence in the T2 gene is more extensive. It covers about 1.5 kb and also probably includes regions involved in initiation of transcription. In CpG islands it is thought that cytosine in CpG is not methylated, whereas it mostly is in CpG elsewhere, but this remains to be demonstrated in the present cases. There are approximately 30,000 such islands in mammalian genomes, and whilst some of them appear to be associated with housekeeping genes that are expressed in all tissues, others are associated with genes that are expressed only in specific cell types (Bird, 1986, 1987). It has been suggested that tissue-specific genes without CpG islands would be unavailable to ubiquitous transcription factors in tissues where they are methylated, and so this could contribute to transcriptional repression and lack of expression. In contrast, tissue-specific genes with CpG islands would be available continuously to transcription factors, and their expression could be prevented in non-expressing tissues by trans-acting repressors (Bird, 1987).

The human T2 gene has a number of features near to its 5’ end that suggest that it might be a “housekeeping” gene expressed in all tissues. Firstly, its promoter is G+C-rich and contains multiple sites with the potential to bind transcription factor SP1. In addition, it evidently lacks both TATA and CCAAT boxes (Melton, 1987). In contrast, the human T1 gene has features that have been associated with both housekeeping and tissue-specific genes. For example, it has TATA and CCAAT promoter elements, and both are associated with tissue-specific genes and some housekeeping genes also. It has been proposed that the TATA boxes associated with housekeeping genes are part of a more extensive conserved element which extends in a 3’ direction up to the cap site (Martini et al., 1986). In the human T1 gene 17 out of 30 nucleotides (or 57%) are conserved over this region, which is rather low in comparison with the levels of
conservation observed in most other tissue-specific genes in this category that have been investigated (Martini et al., 1986), except that the human muscle carbonic anhydrase gene is conserved in only 48% of nucleotides between its TATA box and cap site (Lloyd et al., 1987). A second feature that has been proposed to be characteristic of a housekeeping gene is the presence of a CpG island near to its 5’ end, but such islands have also been found in tissue-specific genes. So it is unclear from DNA sequence alone to which category of gene human T1 belongs, and detailed studies are required of the methylation states of the CpG islands and of the transcription and expression of the human T1 and T2 genes in various tissues.

(g) Number of genes for mammalian ADP/ATP translocase

Hybridization experiments conducted on restriction digests of human and bovine genomic DNA have shown the presence of numerous different sequences related to the cDNAs of bovine T1 and T2 ADP/ATP translocases. For example, between 12 and 15 sequences could be detected by hybridization of restriction digests of human DNA with the coding region of the bovine T1 cDNA (Powell et al., 1989). So far, three expressed human genes have been detected. Two of them are the T1 and T2 genes described in this paper, for which corresponding cDNA clones also have been described (Neckelmann et al., 1987; Houldsworth & Attardi, 1988). Expression of a third related, but different, gene has been detected in HL60 cells that have been growth-stimulated (Battini et al., 1987), and also in liver and HeLa cells (Houldsworth & Attardi, 1988). In addition, in the course of our studies we have partially characterized two spliced pseudogenes related to T2 (unpublished observations) and there can be little doubt that there are others. It would be of particular interest to know whether there are additional expressed genes for ADP/ATP translocase that lie undetected in the human genome.

Figure 11. Distribution of the dinucleotide CpG in the 5’ regions of the human T1 and T2 genes. The horizontal lines above the distributions of dinucleotides indicate non-coding regions, and the filled boxes the positions of exons. The calculation was made with the computer program ANALYSEQ (Staden, 1985).
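The CpG distribution summarised in Figure 11 was calculated with ANALYSEQ. As a rough illustration of that kind of calculation, the sketch below slides a window along a sequence and reports, for each window, the number of CpG dinucleotides, the G+C fraction and an observed/expected CpG ratio. The function name `cpg_profile`, the window and step sizes and the observed/expected formula are assumptions introduced here for illustration; none of them is taken from the paper or from ANALYSEQ.

```python
def cpg_profile(seq, window=200, step=50):
    """Sliding-window CpG statistics for a DNA sequence.

    Returns a list of tuples (start, cpg_count, gc_fraction, obs_exp), where
    obs_exp = cpg_count * window_length / (C_count * G_count), or 0.0 when
    the window contains no C or no G.
    """
    seq = seq.upper()
    if not seq:
        return []
    rows = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        win = seq[start:start + window]
        c = win.count("C")
        g = win.count("G")
        cpg = win.count("CG")  # "CG" cannot overlap itself, so this is exact
        obs_exp = cpg * len(win) / (c * g) if c and g else 0.0
        rows.append((start, cpg, (c + g) / len(win), obs_exp))
    return rows


if __name__ == "__main__":
    # Toy input (not real T1 or T2 sequence): a CpG-rich stretch, then an A+T-rich one.
    toy = "CG" * 10 + "AT" * 10
    for row in cpg_profile(toy, window=20, step=20):
        print(row)
```

Plotting `cpg_count` or `obs_exp` against `start` gives a profile in which a CpG-rich island would appear as a sustained run of high values near the 5’ end of a gene.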
(h) Evolution of genes for mitochondrial ADP/ATP translocase

From studies of the sequences of other metabolite transport proteins in the inner membrane of mitochondria it is known that the ADP/ATP translocase belongs to a wider multigene family. This embraces at least the uncoupling protein from mitochondria of brown fat, which is a proton transporter (Aquila et al., 1985, 1987), and the phosphate carrier (Runswick et al., 1987), and preliminary results strongly suggest that the α-ketoglutarate/malate carrier also belongs to this family (M. J. Runswick, F. Bisaccia, F. Palmieri & J. E. Walker, unpublished results). These studies have shown that these proteins are homologous throughout much of their sequences, and that the uncoupling protein and the phosphate carrier have a threefold internal repeat of about 100 amino acids, as was detected first in the sequence of bovine ADP/ATP translocase (Saraste & Walker, 1982). This is also present in the human homologues discussed in this paper. Therefore, four major steps can be detected in the evolution of the ADP/ATP translocase:

Step 1. Formation of a primordial carrier with three tandemly repeated domains from a single domain of about 100 amino acids by two gene duplication events. As in the present-day proteins, this elemental domain probably consisted of two transmembrane α-helical segments linked by an extramembranous loop (Saraste & Walker, 1982).

Step 2. Divergence of the primordial carrier, and evolution of carrier specificity.

Step 3. Expansion of the ADP/ATP translocase expressed gene family before the emergence of man and cow.

Step 4. After the divergence of man and cow, formation of pseudogenes by reverse transcription of mRNA and retroposition in the human and bovine genomes.

A number of comments can be made concerning this proposal. Firstly, carrier proteins belonging to the family to which the ADP/ATP translocase belongs have so far been detected only in mitochondria. Chloroplasts have carriers with the same biochemical functions as some mitochondrial carriers, examples being the adenine nucleotide and phosphate carriers, but as yet there are no sequence data. Other carriers in bacteria and plasma membranes of eukaryotes form a separate family (Maiden et al., 1987) and are not evidently related to the mitochondrial carriers. Therefore, the origin of the primordial carrier in Step 1 is particularly obscure, although the possibility cannot be excluded that it was introduced into the proto-eukaryotic cell by endosymbiosis (assuming that this explains the origin of mitochondria; see Gray & Doolittle, 1982).
The positions of intron B in the contemporary translocase genes can be taken as relics of earlier gene duplication events that presumably gave rise to the three tandem repeats (Gilbert, 1978; Blake, 1978; Traut, 1988). Equally, the positions of introns A and C could be taken as evidence that the 100 amino acid repeat itself evolved by an earlier duplication of a small domain containing one transmembrane segment, but the present-day sequences do not support this suggestion.

Secondly, at present it is not clear how many of the 13 or so mitochondrial carriers that have been detected (LaNoue & Schoolwerth, 1979) belong to the family discussed here. The ADP/ATP, phosphate and α-ketoglutarate/malate (Bisaccia & Palmieri, 1984) carriers and the uncoupling protein from brown fat mitochondria (Aquila et al., 1985; Runswick et al., 1987) are all related in sequence. They have similar sizes and each comprises about 306 amino acids. The dicarboxylate carrier has also been purified and reconstituted and appears to belong to this family in so far as its molecular weight is about 28,000 (Bisaccia et al., 1988). However, two other carriers do not conform to these general properties; the aspartate/glutamate carrier has an estimated molecular weight of about 68,000 (Krämer et al., 1986) and there is evidence that the pyruvate carrier contains a component of about 15,000 (Halestrap, 1975). No other mitochondrial metabolite transport proteins have been purified to date, and assessment of the extent of this gene family awaits sequence data on other carriers. Aquila et al. (1985), in a discussion of the evolution of the carriers, have proposed a hierarchy of metabolite carriers in which the uncoupling protein belongs to a subgroup of H+-anion co-transporters. They propose that the uncoupling protein is not an elementary H+ carrier from which others derive, but rather that it is a degenerate form of an H+-anion co-transporter. If this proposal is correct, then it adds detail to Step 2 of the proposed evolutionary sequence.

Thirdly, Neckelmann et al. (1987) have calculated that the human T1 and T2 genes diverged in the Pennsylvanian period (310 to 270 million years before the present) or in the Permian period (270 to 225 million years before the present). This calculation is based upon consideration of replacement and synonymous substitutions, but it should be borne in mind that incorrect sequence data were employed (see the legends to Figs 2 and 7). Divergence of man and cows is believed to have occurred at a later date, 80 million years ago (Li et al., 1985).

We are grateful to Dr R. Baserga for communicating to us unpublished information concerning the human T3 gene, and to Dr T. H. Rabbitts for providing us with a sample of the human genomic library AT5. We thank Mr T. Hercus for his assistance in sequencing the T2 gene. A.L.C. was supported by an M.R.C. Research Training Fellowship.

References

Adrian, G. S., McCammon, M. T., Montgomery, D. L. & Douglas, M. G. (1986). Mol. Cell. Biol. 6, 626-634.
Aquila, H., Misra, D., Eulitz, M. & Klingenberg, M. (1982). Hoppe-Seyler’s Z. Physiol. Chem. 363, 345-349.
Aquila, H., Link, T. A. & Klingenberg, M. (1985). EMBO J. 4, 2369-2376.
Aquila, H., Link, T. A. & Klingenberg, M. (1987). FEBS Letters, 212, 1-9.
Arends, H. & Sebald, W. (1984). EMBO J. 3, 377-382.
Baker, A. & Leaver, C. J. (1985). Nucl. Acids Res. 13, 5857-5867.
Bankier, A. T. & Barrell, B. G. (1983). In Techniques in Nucleic Acid Biochemistry (Flavell, R. A., ed.), vol. B508, pp. 1-34, Elsevier, County Clare, Ireland and New York.
Battini, R., Ferrari, S., Kaczmarek, L., Calabretta, B., Chen, S. T. & Baserga, R. (1987). J. Biol. Chem. 262, 4355-4359.
Benton, W. D. & Davis, R. W. (1977). Science, 196, 180-182.
Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3963-3965.
Bird, A. P. (1986). Nature (London), 321, 209-213.
Bird, A. P. (1987). Trends Genet. 3, 342-346.
Bisaccia, F. & Palmieri, F. (1984). Biochim. Biophys. Acta, 766, 386-394.
Bisaccia, F., Indiveri, C. & Palmieri, F. (1988). Biochim. Biophys. Acta, 933, 229-240.
Blake, C. C. F. (1978). Nature (London), 273, 267.
Breathnach, R. & Chambon, P. (1981). Annu. Rev. Biochem. 50, 349-383.
Breen, G. A. M. (1988). Biochem. Biophys. Res. Commun. 152, 264-269.
Capaldi, R. A. (1988). Trends Biochem. Sci. 13, 144-148.
Clark, J. B., Hayes, D. J., Byrne, E. & Morgan-Hughes, J. A. (1987). Biochem. Soc. Trans. 15, 626-627.
Daniels, G. R. & Deininger, P. L. (1985). Nature (London), 317, 819-822.
Deininger, P. L. (1983). Anal. Biochem. 129, 216-223.
Deininger, P. L., Jolly, D. J., Rubin, C. M., Friedmann, T. & Schmid, C. W. (1981). J. Mol. Biol. 151, 17-33.
Dyer, M. R., Gay, N. J. & Walker, J. E. (1989). Biochem. J. in the press.
Farrell, L. B. & Nagley, P. (1987). Biochem. Biophys. Res. Commun. 144, 1257-1264.
Farrell, P. J., Deininger, P. L., Bankier, A. & Barrell, B. G. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 1565-1569.
Fowlkes, D. M. & Shenk, T. (1980). Cell, 22, 405-413.
Fuhrman, S., Deininger, P. L., LaPorte, P., Friedmann, T. & Geiduschek, E. P. (1981). Nucl. Acids Res. 9, 6439-6456.
Gay, N. J. & Walker, J. E. (1985). EMBO J. 4, 3519-3524.
Gilbert, W. (1978). Nature (London), 271, 501.
Gray, M. W. & Doolittle, W. F. (1982). Microbiol. Rev. 46, 1-42.
Halestrap, A. (1975). Biochem. J. 172, 377-387.
Houldsworth, J. & Attardi, G. (1988). Proc. Nat. Acad. Sci., U.S.A. 85, 377-381.
Hutchison, C. A., III, Hardies, S. C., Loeb, D. D., Sheshe, W. R. & Edgell, M. H. (1989). In Mobile DNA (Berg, D. E. & Howe, M. H., eds), A. S. M. Press, Washington DC, in the press.
Kadonaga, J. T., Jones, K. A. & Tjian, R. (1986). Trends Biochem. Sci. 11, 20-23.
Kao, F. T. (1985). Int. Rev. Cytol. 96, 51-88.
Klingenberg, M. (1985a). In The Enzymes of Biological Membranes (Martonosi, A. N., ed.), vol. 4, pp. 511-553, Plenum Publishing Corporation, New York.
Klingenberg, M. (1985b). Annal. N.Y. Acad. Sci. 456, 279-288.
Krämer, R., Kürzinger, G. & Heberger, C. (1986). Arch. Biochem. Biophys. 251, 166-174.
Kuhn-Nentwig, L. & Kadenbach, B. (1985). Eur. J. Biochem. 149, 147-158.
LaNoue, K. F. & Schoolwerth, A. C. (1979). Annu. Rev. Biochem. 48, 871-922.
LaNoue, K. F. & Schoolwerth, A. C. (1984). In Bioenergetics (Ernster, L., ed.), pp. 221-268, Elsevier Science Publishers B.V., Amsterdam.
Li, W.-H., Luo, C.-C. & Wu, C.-I. (1985). In Molecular Evolutionary Genetics (MacIntyre, R. J., ed.), pp. 1-94, Plenum Press, New York.
Lipman, D. J. & Pearson, W. R. (1985). Science, 227, 1435-1441.
Lloyd, J., Brownson, C., Tweedie, S., Charlton, J. & Edwards, Y. H. (1987). Genes Develop. 1, 594-602.
Maiden, M. C. J., Davis, E. O., Baldwin, S. A., Moore, D. C. M. & Henderson, P. J. F. (1987). Nature (London), 325, 641-643.
Martini, G., Toniolo, D., Vulliamy, T., Luzzatto, L., Dono, R., Viglietto, G., Paonessa, G., D’Urso, M. D. & Persico, M. G. (1986). EMBO J. 5, 1849-1855.
Melton, D. W. (1987). In Oxford Surveys on Eukaryotic Genes (Maclean, N., ed.), vol. 4, pp. 34-76, Oxford University Press, Oxford, U.K.
Messing, J. (1983). Methods Enzymol. 101, 20-78.
Mills, D. R. & Kramer, F. R. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 2232-2235.
Mizusawa, S., Nishimura, S. & Seela, F. (1986). Nucl. Acids Res. 14, 1319-1324.
Morgan-Hughes, J. A. (1986). Trends Neurosci. 9, 15-19.
Neckelmann, N., Li, K., Wade, R. P., Shuster, R. & Wallace, D. C. (1987). Proc. Nat. Acad. Sci., U.S.A. 84, 7580-7584.
Paolella, G., Lucero, M. A., Murphy, M. H. & Baralle, F. E. (1983). EMBO J. 2, 691-696.
Perez-Stable, C., Ayres, T. & Shen, C. K. J. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 5291-5295.
Powell, S. J., Medd, S. M., Runswick, M. J. & Walker, J. E. (1989). Biochemistry, in the press.
Rasmussen, U. B. & Wohlrab, H. (1986). Biochem. Biophys. Res. Commun. 138, 850-857.
Rogers, J. H. (1985). Int. Rev. Cytol. 93, 187-279.
Runswick, M. J., Powell, S. J., Nyren, P. & Walker, J. E. (1987). EMBO J. 6, 1367-1373.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5463-5467.
Saraste, M. & Walker, J. E. (1982). FEBS Letters, 144, 250-254.
Schultheiss, H. P. & Klingenberg, M. (1984). Eur. J. Biochem. 143, 599-605.
Schultheiss, H. P. & Klingenberg, M. (1985). Arch. Biochem. Biophys. 239, 273-279.
Staden, R. (1982). Nucl. Acids Res. 10, 2951-2961.
Staden, R. (1985). In Genetic Engineering: Principles and Methods (Setlow, J. K. & Hollaender, A., eds), vol. 7, pp. 67-114, Plenum Publishing Corporation, New York and London.
Taylor, A. M. R., Oxford, J. M. & Metcalfe, J. A. (1981). Int. J. Cancer, 27, 311-319.
Traut, T. W. (1988). Proc. Nat. Acad. Sci., U.S.A. 85, 2944-2948.
Walker, J. E., Cozens, A. L., Dyer, M. R., Fearnley, I. M., Powell, S. J. & Runswick, M. J. (1987). Chemica Scripta, 27B, 97-105.
Walker, J. E., Powell, S. J., Viñas, O. & Runswick, M. J. (1989). Biochemistry, in the press.
Wallace, D. C. (1986). Hospital Practice, 77-92.
Weiner, A. M., Deininger, P. L. & Efstratiadis, A. (1986). Annu. Rev. Biochem. 55, 631-661.
Wilbur, W. J. & Lipman, D. J. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 726-730.

Edited by P. Chambon