ATP translocase


J. Mol. Biol. (1989) 206, 261-280

DNA Sequences of Two Expressed Nuclear Genes for Human Mitochondrial ADP/ATP Translocase

Alison L. Cozens†, Michael J. Runswick and John E. Walker‡

Medical Research Council Laboratory of Molecular Biology
Hills Road, Cambridge CB2 2QH, U.K.

(Received 18 July 1988)

Mitochondrial ADP/ATP translocase is an abundant component of the inner membrane. It carries ATP from the matrix into the intermembrane space and transports ADP back. Clones coding for two different but related forms of the protein have been characterized from bovine cDNA libraries. The corresponding genes are referred to as T1 and T2 and they are expressed at different levels in bovine tissues. The bovine cDNAs have been used to isolate clones from a human genomic library that contain the human T1 and T2 genes. Two nucleotide sequences of 9756 and 8625 base-pairs have been determined and they contain the transcribed regions of the human T1 and T2 genes, which cover 4.2 and 5.9 kb of the human genome, respectively (1 kb = 10³ base-pairs). Both genes are split into four exons. The introns in each gene are at exactly equivalent locations and interrupt sequences coding for segments of the protein that are thought to be extramembranous loops linking transmembrane segments. The proteins encoded in the genes differ in 32 amino acids out of 297, and in common with other ADP/ATP translocases, neither has a processed mitochondrial import sequence. The human T1 and T2 genes are members of a larger gene family that includes a third expressed gene, T3, and also at least two spliced pseudogenes. Other studies have shown that T3 is expressed in liver and HeLa cells, and different levels of transcripts of T1 have been found in various tissues. A notable feature of the T1 and T2 genes, that may influence their expression, is that “CpG-rich islands” are associated with their 5′ ends. That of the T2 gene contains numerous potential sites for binding the mammalian transcription factor Sp1, but no TATA or CCAAT sequences are evident near to its 5′ end, although these latter features are associated with the human T1 gene. The two DNA sequences also contain many short interspersed repetitive sequences including 11 Alu repeats, and a novel element about 236 base-pairs in length, which is repeated in a six-fold tandem array in intron B of the T2 gene.
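The “CpG-rich islands” mentioned above can be made concrete with two simple statistics on a stretch of sequence: its G+C content and the ratio of observed to expected CpG dinucleotides. The following is a minimal sketch only. The function names, the toy input sequences and the thresholds (G+C above 50%, observed/expected above 0.6) are assumptions chosen for illustration and are not taken from this paper.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # CpGs expected from base composition
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def looks_cpg_rich(seq, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a single window of sequence."""
    gc_fraction, ratio = cpg_stats(seq)
    return gc_fraction > min_gc and ratio > min_ratio
```

In practice such a test would be applied to sliding windows of a few hundred bases along the 5′ flanking region, so that the result does not depend on a single arbitrary window.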

1. Introduction

The inner membranes of mitochondria contain a number of proteins that are responsible for the transport of metabolites (LaNoue & Schoolwerth, 1979, 1984). The most abundant of these, indeed the most plentiful membrane protein in heart mitochondria (Schultheiss & Klingenberg, 1985), is the ADP/ATP translocase. It carries ATP from the mitochondrial matrix across the inner membrane and transports ADP back (Klingenberg, 1985a,b). The translocase isolated from bovine heart mitochondria is a protein of 297 amino acids and its primary structure has been determined by direct sequence analysis (Aquila et al., 1982). It is a nuclear gene product, and a segment of the protein sequence has been used to design an oligonucleotide probe to isolate cognate cDNA clones from a library derived from bovine heart and liver. Analysis of these clones has demonstrated the presence in this library of related but different sequences for the translocase, derived from two homologous expressed genes that have been named T1 and T2 (Walker et al., 1987; Powell et al., 1989). By hybridization experiments it has been demonstrated that one gene is expressed predominantly in heart tissue, whereas the expression of the second gene predominates in intestine (Powell et al., 1989).

Other examples have emerged of differences in expression of homologous genes for mitochondrial proteins in different tissues within the same species. For example, it has been found that another intrinsic mitochondrial membrane protein, the dicyclohexylcarbodiimide reactive proteolipid subunit of ATP synthase, is the product of two genes in both the bovine (Gay & Walker, 1985) and human genomes (Farrell & Nagley, 1987; M. R. Dyer & J. E. Walker, unpublished results), and that the bovine genes are expressed in different ratios in various tissues (Gay & Walker, 1985). Also, the α-subunit of the ATP synthase complex has different isoforms expressed in bovine heart and liver (Walker et al., 1989; Breen, 1988). Another example is provided by the electron transfer complex, cytochrome c oxidase. Immunological studies have shown the presence of tissue-specific isoforms of some of its subunits in rats (Kuhn-Nentwig & Kadenbach, 1985).

Interest in these differences of expression between members of these gene families in various tissues has been increased by biochemical studies of tissue-specific mitochondrial myopathies in humans (Morgan-Hughes, 1986; Wallace, 1986; Clark et al., 1987; Capaldi, 1988). They have shown that often the malady is caused by a defect in an electron transport complex, or rarely, in ATP synthase, and that the defect is confined to the mitochondria of the diseased tissue; mitochondria in other tissues of the same individual function normally.

In order to be able to study the regulation of the expression of the T1 and T2 genes, we employed the two bovine cDNAs to isolate genomic clones of the human homologues, and the sequences of the genes and of their flanking sequences have been determined. These experiments show that the transcribed regions of the human T1 and T2 genes are distributed over about 4.2 and 5.9 kb of DNA, respectively. They have a common structure; each is divided into four exons, the introns being located at precisely the same positions in the two genes. Both in humans and cows it is now apparent that the T1 and T2 genes belong to a larger gene family (Powell et al., 1989) that probably includes pseudogenes. In addition, a third human gene for ADP/ATP translocase, which we refer to as T3, has been shown to be expressed in HL60 cells (Battini et al., 1987). Expression of human T1, T2 and T3 has been found in liver (Houldsworth & Attardi, 1988), and different levels of transcripts of human T1 have been demonstrated in skeletal muscle, heart, kidney and HeLa cells (Neckelmann et al., 1987). Expression of T1 has also been demonstrated in Daudi cells (A. L. Cozens, unpublished results).

† Present address: Department of Microbiology and Immunology, U.C.S.F., San Francisco, CA 94143, U.S.A.
‡ Author to whom correspondence should be sent.

© 1989 Academic Press Limited

Abbreviation used: kb, 10³ bases or base-pairs.

2. Materials and Methods

(a) Screening genomic libraries

The human genomic library λT5 was investigated. It consists of fragments from a partial Sau3AI digest of DNA from a T-cell primary tumour cloned into the BamHI site of λ2001 (Taylor et al., 1981). It was grown on Escherichia coli Q358, and 7.5 x 10’ recombinants were screened by the plaque hybridization method (Benton & Davis, 1977). Duplicate filters were pre-hybridized for 2 h at 65°C in a solution containing 5 x Denhardt’s solution (1 mg/ml each of polyvinyl pyrrolidone, bovine serum albumin (fraction V) and Ficoll), 6 x SSC (SSC is 0.15 M-NaCl, 0.015 M-sodium citrate), 0.5% (v/v) Sarkosyl and yeast RNA (0.1 mg/ml). Hybridization was carried out for 18 h at 65°C in the same solution containing 10% (w/v) dextran sulphate and “prime-cut” probe (Farrell et al., 1983). Filters were washed at 65°C with 6 x SSC, then 2 x SSC, and autoradiographed at -70°C for 24 to 72 h with preflashed film and an intensifying screen. Positive plaques which were present on both duplicate filters were rescreened under the same conditions.

In the initial screen a mixture of 2 probes was used; one was derived from the bovine T1 cDNA (TaqI site to the polylinker, bases 922 to 1222), and the other from the bovine T2 cDNA (SacI site to the polylinker, bases 945 to 1370; see Powell et al. (1989) for sequences of bovine T1 and T2 cDNAs). Thirteen positively hybridizing recombinants were identified. They were plaque-purified and then rescreened separately with each of the same two probes. Four of them, λT1/1, λT1/4, λT1/8 and λT1/12, rescreened with the T1 probe and four, λT2/2, λT2/3, λT2/11 and λT2/13, with the T2 probe. In a second experiment, a further 5 x 10’ recombinants were examined with a probe containing the coding region of the bovine T1 cDNA (bases 1 to 1042), which would not be expected to distinguish between clones derived from the T1 and T2 genes. An additional 5 positively hybridizing recombinants were obtained. One of them, λT1/16, rescreened with the T1-specific probe and 2 of them, λT2/14 and λT2/15, with a T2-specific probe (SacI site to polylinker, bases 945 to 1370).

DNA was prepared from all 11 of the clones that hybridized to the T1 or T2 probes. Restriction analysis indicated that all 5 recombinants which hybridized to the T1-specific probe were overlapping.
They all contained BamHI fragments of 3.8 and 5.9 kb; the former hybridized to a probe derived from the 3′ non-coding region of T1 (bases 922 to 1222) and the latter to a probe from the 5′ end of the T1 cDNA (bases 1 to 238; see Fig. 1(a) and (b)). In recombinant λT2/14, a single XhoI fragment of 9 kb hybridized with the T2-specific probe (see Fig. 1(c)). Recombinants λT2/2, λT2/3 and λT2/11 all hybridized to a probe derived from the 3′ end of the bovine T2 cDNA (bases 945 to 1370), but not to probes derived from the coding regions of either T1 or T2. However, recombinants λT2/13 and λT2/15 contained sequences which hybridized both to the probe derived from the 3′ end of the bovine T2 cDNA and to probes derived from the coding region of the T1 cDNA, and so it seemed likely that these clones contained the T2 gene or related pseudogenes.

Figure 1. Hybridization to probes for the T1 and T2 genes of human genomic DNA in recombinant λ phages. Digests with various restriction endonucleases were fractionated on 0.7% (w/v) high-gelling temperature agarose mini-gels (10 cm across x 6.5 cm long). Left-hand panel, λT1/1 hybridized with probe derived from 5′ end of bovine T1 cDNA (bases 1 to 238; polylinker, BalI site); middle panel, λT1/1 hybridized with probe from 3′ end of bovine T1 cDNA (bases 922 to 1222; TaqI site, polylinker); right-hand panel, λT2/14 hybridized with probe from 3′ end of bovine T2 cDNA (bases 945 to 1370; SacI, polylinker). (For sequences of bovine T1 and T2 cDNAs see Powell et al. (1989).) In the left-hand and middle panels the DNA was digested with: lane (a) BamHI; (b) EcoRI; (c) HindIII; (d) KpnI; (e) NcoI; (f) SacI. In the right-hand panel the digests are as in the other two panels except that lane (e) is XhoI. “Prime cut” probes were employed. For hybridization conditions see Materials and Methods.

(b) Sub-cloning of DNA fragments

The 2 BamHI fragments of 5.9 and 3.8 kb derived from λT1/1 were cloned separately into the BamHI site of the plasmid vector pUC8 (Messing, 1983). The amplified fragments were excised from the vector and were purified by electrophoresis on high gelling temperature agarose. Similarly, the XhoI fragment (approx. 9.6 kb), which was later found to contain the T2 gene, was excised from λT2/14 and cloned into the SalI site of pUC18. Then the 3 sub-cloned pieces of DNA were excised from plasmids with BamHI (T1 fragments), and BamHI and HindIII (T2 fragment). This latter digestion generated a fragment of about 8.6 kb, smaller than expected. Later, after the sequence had been completed, it was evident that the human DNA in this 8.6 kb fragment was flanked by a BamHI site and by a sequence derived from the original XhoI site (see Fig. 3). Therefore, the original fragment of DNA present in recombinant λT2/14 contained an internal BamHI site relatively close to its 5′ end, and a small region present in the XhoI fragment was lost during excision from pUC18.

In preparation for sequence analysis the purified fragments were broken up by sonication, and each of the resulting mixtures was cloned separately into the SmaI site of M13mp8 or M13mp18 (Deininger, 1983). In the case of the T2 gene, end repair of fragments produced by sonication was carried out using mung-bean nuclease (12 units/µg of DNA; Stratagene, San Diego, CA, U.S.A.). This was found to be superior to the conventional method which employs a mixture of the Klenow fragment of E. coli DNA polymerase and bacteriophage T4 DNA polymerase. Usually, the M13 clones were grown on E. coli TG1, but it was observed that some repetitive DNA sequences had a pronounced tendency both to delete and to rearrange, and in these cases the recA⁻ strain, E. coli TG2, was used instead. This was particularly necessary in M13 “clone turn around” experiments. In order to overlap the sequences of the 2 BamHI fragments obtained from λT1/1, a NcoI-HindIII fragment of 788 bases was excised from the phage DNA, and after end-repair it was cloned into the SmaI site of M13mp8.

(c) DNA sequence analysis

DNA sequences were determined by the modified chain termination method (Sanger et al., 1977; Biggin et al., 1983) using a random strategy (Bankier & Barrell, 1983). Many problematic regions in sequences were resolved by substituting for dGTP in the sequencing reaction mixtures with either deoxyinosine triphosphate (Mills & Kramer, 1979) or deoxy-7-deazaguanosine triphosphate (Mizusawa et al., 1986). Frequently this was accompanied by the use of synthetic oligonucleotide primers in order that the problematical sequence could be brought close to the priming site. Other synthetic primers were used to extend existing sequences. In toto, 21 different synthetic oligonucleotides, each 17 bases long, were employed as primers in the sequencing of the T1 gene; none was required in the analysis of T2. The methods used for the compilation of DNA sequences into data bases and for analysis of the final sequences will be described in a forthcoming paper (M. R. Dyer & J. E. Walker).
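The sub-cloning steps above depend on where restriction sites fall within a cloned insert. As a rough illustration, the sketch below lists the fragment lengths expected when a linear DNA sequence is digested to completion. The recognition sequences used are the standard ones for BamHI (G^GATCC) and HindIII (A^AGCTT); the function name, the enzyme table and the toy sequence in the usage note are hypothetical, and real fragment sizes would have to be computed from the actual clone sequences.

```python
# Recognition site and cut offset (bases from the start of the site).
SITES = {"BamHI": ("GGATCC", 1), "HindIII": ("AAGCTT", 1)}


def digest(seq, enzymes):
    """Return fragment lengths, in order, for a complete digest of a linear sequence."""
    s = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        start = s.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = s.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(s)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For a toy 24-base sequence such as `"AAAAGGATCCTTTTTTAAGCTTCC"`, `digest(seq, ["BamHI"])` gives two fragments and adding HindIII gives three. An internal site, like the BamHI site near the 5′ end of the XhoI fragment, shows up in the same way as an extra, shorter fragment.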

3. Results and Discussion

(a) Cloning the human T1 and T2 genes

The cloning of expressed genes for the dicyclohexylcarbodiimide reactive subunit of ATP synthase was hampered considerably by the presence in the human and bovine genomes of numerous pseudogenes related to the expressed genes (M. R. Dyer & J. E. Walker, unpublished results; Dyer et al., 1989). It was apparent that human DNA also contains many sequences related to the coding sequences of the bovine T1 and T2 cDNAs for ADP/ATP translocase (Powell et al., 1989), and at the outset it seemed likely that some of these sequences could be pseudogenes. Southern hybridization experiments on digests of human DNA (data not shown) indicated the presence of only one T1-related sequence, and so there appear not to be any T1-related pseudogenes. However, other experiments showed the presence of several human sequences related to T2.

As described in Materials and Methods, section (a), five related recombinants, λT1/1, λT1/4, λT1/8, λT1/12 and λT1/16, each contained two BamHI fragments, one of which hybridized to a probe derived from the 5′ end of the bovine cDNA and the other to a probe from the 3′ end, and fragments of the same sizes were detected by Southern hybridization to digests of human DNA. These turned out to contain the entire T1 gene (see section (b), below).

In the course of the same library screening experiments six independent recombinant λ phages were isolated that hybridized with the 3′ end of the bovine T2 cDNA. Restriction digests of three of them, λT2/2, λT2/3 and λT2/11, although containing sequences that hybridize to the 3′ end of the bovine T2 cDNA, did not hybridize to probes derived from the coding regions of either the bovine T1 or T2 cDNAs, and so none of them could contain the entire T2 gene. Recombinants λT2/2 and λT2/3 appear to be related, but are not identical in their restriction patterns (M. R. Dyer, unpublished results). In the case of λT2/11, preliminary DNA sequencing experiments have detected the presence in this recombinant of sequences closely related to the 3′ non-coding region of the bovine T2 cDNA (unpublished results). It is possible that they represent 3′ exons of T2 that could be involved in alternative splicing pathways to generate different forms of T2, but other explanations are also possible and further experimentation is required to clarify this matter. Two further recombinants, λT2/13 and λT2/15, were shown by DNA sequencing to contain spliced pseudogenes related to human T2 (unpublished work).

The sixth recombinant with sequences related to the 3′ end of the bovine T2 cDNA, λT2/14, contained the expressed gene. This was first indicated by restriction analysis experiments; four fragments in a SacI digest of its DNA hybridized with a probe covering bases 1 to 1042 of the T1 cDNA, and the restriction patterns were different from those of λT1/1. Moreover, a probe derived from the 3′ end of the T2 cDNA did not hybridize with the smallest of the SacI fragments (0.9 kb) detected in the former experiment. So the 0.9 kb SacI fragment was cloned into M13mp8, and DNA sequence analysis revealed interrupted coding sequences for the translocase (corresponding to the exon III : intron C and intron C : exon IV boundaries), and ultimately the 0.9 kb fragment turned out to be bases 6140 to 7040 (Fig. 3). In order to have the entire gene present in a single piece of DNA, the 9 kb fragment present in the XhoI digest of λT2/14 was selected for sequence analysis (see Fig. 1).
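The way the six T2-related recombinants were sorted can be summarized as a small decision table. The sketch below encodes the hybridization results reported above; the dictionary layout, the function names and the category labels are illustrative choices only. Hybridization alone cannot separate the expressed gene from the spliced pseudogenes, so the final assignments for λT2/13, λT2/14 and λT2/15 rest on the restriction and sequencing evidence described in the text.

```python
# (signal with T2 3' end probe, signal with coding-region probe), as reported in the text
RESULTS = {
    "λT2/2": (True, False),
    "λT2/3": (True, False),
    "λT2/11": (True, False),
    "λT2/13": (True, True),
    "λT2/14": (True, True),
    "λT2/15": (True, True),
}


def classify(three_prime, coding):
    """Map a pair of hybridization outcomes to a hypothetical category label."""
    if three_prime and coding:
        return "candidate gene or pseudogene"
    if three_prime:
        return "3' sequences only"
    return "no T2-related signal"


def candidates(results):
    """Names of clones that could contain a complete gene or a pseudogene."""
    return sorted(
        name
        for name, (three_prime, coding) in results.items()
        if classify(three_prime, coding) == "candidate gene or pseudogene"
    )
```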

(b) DNA sequencing of the human T1 and T2 genes

The sequences of the T1 and T2 genes and flanking regions (see Figs 2 and 3) were determined by sequencing the two BamHI fragments of 5.9 and 3.8 kb from λT1/1, and the 8.6 kb fragment excised from λT2/14. The overlap between the 5.9 and 3.8 kb BamHI fragments was established by sequencing an overlapping NcoI-HindIII fragment (bases 5739 to 6526 in Fig. 2). Both DNA sequences were established fully in both senses of the DNA. On average each base in the sequences presented in Figures 2 and 3 was determined 7.3 and 7.6 times, respectively.

In order that the sequences could be established fully in both senses of the DNA a number of problems of some difficulty had to be overcome. In the region containing the T1 gene these were mostly concentrated in the G+C-rich region extending from about nucleotide 3575 to 4409 (see Fig. 2). “Pile-ups” and “compressions” were commonplace in sequences covering this segment, but these difficulties were resolved unambiguously by the use in sequencing reactions of the triphosphates of deoxyinosine or deoxy-7-deazaguanosine as described in Materials and Methods. No such problems were met in the analysis of the G+C-rich region of the T2 gene, but another problem presented itself elsewhere. It was observed that clones derived from nucleotides 4483 to 5774 (see Fig. 3), which contains six tandem repeated sequences each being about 236 nucleotides in length, had a pronounced tendency to rearrange and to delete. This was particularly evident when M13 clones in the positive sense of the DNA were turned around in order that a double-stranded sequence could be generated. Indeed, no particular difficulty was met in the positive strand analysis and an unambiguous sequence could be deduced. This difficulty with the negative sense clones was reduced, but not entirely avoided, by the use of a recA⁻ E. coli host, and the residual difficulties probably reflect the genetic instability of the particular strain employed, which reverts rapidly to recA⁺. Nonetheless, a number of clones characterized in the negative sense gave sequences in exact agreement with the positive sense sequence. These difficulties serve to emphasize the

Figure 2. DNA sequence of a segment of human DNA containing the T1 gene for mitochondrial ADP/ATP translocase. The nucleotide sequence is numbered, and the locations of exons I to IV and the protein sequences they encode are shown. Exon-intron boundaries are denoted by small arrows. Large boxes contain Alu repeats. The transcriptional start site at base 3900 has been determined experimentally (A. L. Cozens, unpublished results) and associated TATA and CCAAT boxes are indicated. The under- and overlined sequence is a potential signal for polyadenylation and the actual site of addition of poly(A) in the transcript is indicated (Neckelmann et al., 1987). Restriction sites that were important in cloning and sequencing experiments are shown. The sequences of exons differ at a number of positions from published cDNA sequences of human T1. Houldsworth & Attardi (1988) in their partial sequence of pHAT14, which covers bases 3929 to 4118, report A at nucleotide 3944 rather than C, and T instead of G at base 3954. Eight residues are not present in the sequence of another clone of the T1 cDNA (Neckelmann et al., 1987). This is an almost complete clone and lacks only the first 2 bases after the transcriptional start site. These deletions in their sequence are at positions 3937 (G), 3979 (C), 4054 (G), 7627 (G), 7816 (A), 5719 (G), 5723 (C) and 5727 (C). At position 6463 they find C rather than the G in the sequence shown in this Figure. They also have one extra G residue inserted between nucleotides 4056 and 4057. Some of these differences are in coding sequences and lead to changes in the amino acid sequence of the T1 ADP/ATP translocase (see the text and the legend to Fig. 7).
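Several of the sizes quoted in section (b) and in the cloning experiments follow from simple arithmetic on the base numbering of Figs 2 and 3. The sketch below checks the inclusive lengths of two intervals given in the text and converts the reported average redundancy into an approximate total number of bases read. The helper names are hypothetical, and the totals are derived estimates, not values stated by the authors.

```python
def span(first, last):
    """Inclusive length of an interval of numbered bases."""
    return last - first + 1


def total_bases_read(length_bp, redundancy):
    """Approximate number of bases read, given the average times each base was determined."""
    return round(length_bp * redundancy)


# NcoI-HindIII overlap fragment in the T1 sequence (bases 5739 to 6526)
overlap_t1 = span(5739, 6526)

# Small SacI fragment in the T2 sequence (bases 6140 to 7040), described as 0.9 kb
sac_fragment_t2 = span(6140, 7040)

# Sequence lengths from the summary, with the reported redundancies of 7.3 and 7.6
reads_t1 = total_bases_read(9756, 7.3)
reads_t2 = total_bases_read(8625, 7.6)
```

The first interval reproduces the 788 bases given for the NcoI-HindIII fragment in Materials and Methods, and the second comes to 901 bases, consistent with a fragment described as 0.9 kb.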

[Fig. 2: numbered nucleotide sequence of the human DNA segment containing the T1 gene (9756 base-pairs), with the protein sequences encoded by exons I to IV written above the DNA sequence. Annotated features include Alu repeats, the transcript start, the poly(A) site, and BamHI and NcoI restriction sites.]

268

[Fig. 3: nucleotide sequence listing for the human T2 gene region, with exons I to IV, Alu repeats, T2 repeats 1 to 6, GGGCGG hexanucleotides and the poly(A) site marked.]

Figure 3. DNA sequence of a segment of human DNA containing the T2 gene for mitochondrial ADP/ATP translocase. For meaning of symbols see the legend to Fig. 2, except that the boxed sequences contain not only Alu repeats, but also a second family of repetitive sequences called T2 repeats 1 to 6. Also shown by boxes are the positions in the 5' region of the sequence of the hexanucleotide GGGCGG and its complement (overlined by arrows). The sequences of exons differ at a number of positions from the sequence of a partial human T2 cDNA clone, pHAT8, reported by Houldsworth & Attardi (1988). This covers the sequences in exons from nucleotides 1757 to 7389; at positions 4136, 4137, 4141 and 4230 they report G, A, G and T, respectively, whereas we find A, G, A and C. In addition, bases 4144 to 4146 are not present in their sequence. These differences in DNA sequence lead to changes in amino acid sequence of the T2 ADP/ATP translocase (see the text and the legend to Fig. 7).

kinds of problems that it may be anticipated will be encountered in any future endeavours to sequence the human genome.

(c) Gene structures

(i) Identification of exons

The four exons of the T1 and T2 genes were identified by comparison of the human genomic sequences with the corresponding bovine cDNA sequences (Powell et al., 1989). This was confirmed by translation of the genomic DNA sequences in all phases and comparison of deduced amino acid sequences with those obtained by translation of the bovine cDNAs. Consensus rules for splice sites, which predict conservation of the dinucleotides GT and AG, respectively, next to the 5' and 3' boundaries of introns, also were taken into consideration (Breathnach & Chambon, 1981). The classes and sizes of introns and their 3' and 5' boundaries are summarized in Table 1.

Table 1. Introns in human ADP/ATP translocase genes

Gene   Intron   Size (base-pairs)   Class   5' Boundary             Encoded residues
T1     A        1269                0       ctg.cag.GTGAGGACCG      L Q
T2     A        2171                0       ctg.cag.GTGGGGACGC      L Q
T1     B        508                 1       gcc.aag.gGTGAGAGAGG     A K
T2     B        1820                1       gcc.aag.gGTACGTGTGG     A K
T1     C        914                 1       aaa.ggg.gGTAAGCTTGT     K G
T2     C        519                 1       aaa.gga.gGTACTCGGGG     K G
Consensus sequence                          cagGTAAGT

The exact extent of the transcribed region of the T1 gene, but not of the T2 gene, is known (see Fig. 2). Primer extension analyses have shown that transcription of T1 starts at nucleotide 3900 (A. L. Cozens, unpublished work). This site is preceded by a canonical TATA sequence at positions 3872 to 3877, and the sequence CCAAT at nucleotides 3813 to 3817. Both of these sequences are often, but by no means invariably, associated with eukaryotic promoters. The former appears to fix the site of transcriptional initiation and is usually located 25 to 30 base-pairs upstream from this site. The latter is often found 40 to 100 nucleotides upstream from the transcriptional start site and appears to play a critical role in directing efficient transcription in a select class of mammalian promoters (Kadonaga et al., 1986). However, neither element is found in the sequences determined in the 5' region of the human T2 gene, and initiation of its transcription has not yet been studied. It may be significant that the hexanucleotide GGGCGG, or its complement, occurs 15 times in this region, as this sequence has the potential to bind the transcription factor SP1, and so can enhance transcription by RNA polymerase II by 10- to 50-fold. Amongst these 15 examples of the hexanucleotide sequence only one, namely the one found at bases 1295 to 1300, conforms to the extended decanucleotide motif, 5' (G/T)GGGCGG(G/A)(G/A)(C/T) 3'. The decanucleotide sequence, TGGGCGGGGC, found at bases 1294 to 1303 in the T2 gene sequence has been demonstrated in other genes to be a high-affinity site for binding SP1 (Kadonaga et al., 1986; see also section (f), below, for further discussion of this matter).
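
The search for potential SP1 sites described above can be illustrated with a short sketch. It is not the procedure used for this paper; it simply scans one strand of a sequence for the hexanucleotide GGGCGG, for CCGCCC (taken here to be how the "complement" reads on the same strand, which is an assumption), and for the extended decanucleotide motif (G/T)GGGCGG(G/A)(G/A)(C/T). The function names and the toy sequence are hypothetical.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

HEXAMER = "GGGCGG"
# Extended motif (G/T)GGGCGG(G/A)(G/A)(C/T); the lookahead allows overlapping hits.
DECAMER = re.compile(r"(?=([GT]GGGCGG[GA][GA][CT]))")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hexamers(seq: str):
    """Return sorted (1-based start, strand) pairs for GGGCGG and CCGCCC."""
    hits = []
    for strand, motif in (("+", HEXAMER), ("-", reverse_complement(HEXAMER))):
        pattern = re.compile("(?=" + motif + ")")
        hits.extend((m.start() + 1, strand) for m in pattern.finditer(seq))
    return sorted(hits)


def find_decamers(seq: str):
    """Return (1-based start, matched decanucleotide) for the extended motif."""
    return [(m.start() + 1, m.group(1)) for m in DECAMER.finditer(seq)]


# Toy sequence (hypothetical): the decanucleotide quoted in the text,
# followed by one copy of the hexanucleotide on the opposite strand.
toy = "AAT" + "TGGGCGGGGC" + "TTT" + "CCGCCC" + "AA"
print(find_hexamers(toy))   # [(5, '+'), (17, '-')]
print(find_decamers(toy))   # [(4, 'TGGGCGGGGC')]
```

Coordinates are 1-based to match the numbering used for the gene sequences; the positions reported for a real sequence depend on exactly which region is scanned.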

Table 1 (continued). 3' boundaries of introns

Gene   Intron   3' Boundary            Encoded residues
T1     A        CCTCCACCAG.gtc.cag     V Q
T2     A        CGTCCCCCAG.gtc.cag     V Q
T1     B        CTGTCCACAG.gg.atg      G M
T2     B        TCCCGCGCAG.gg.atg      G M
T1     C        GTTTCCACAG.cc.gat      A D
T2     C        GTCGTTGCAG.ct.gac      A D
Consensus sequence   -CAGg

Absence of a TATA box and presence of multiple SP1 binding sites is usually accompanied by multiple 5' ends in transcripts, emphasizing the important role played by the TATA sequence in determining the site of transcriptional initiation (Martini et al., 1986).

The sequence of the human T1 cDNA (Neckelmann et al., 1987) shows that polyadenylation of the T1 transcript occurs after base 7896, which is 15 bases to the 3' side of a canonical polyadenylation signal. The sequence of the human T2 cDNA (clone pHAT8; Houldsworth & Attardi, 1988) terminates at a position equivalent to base 7380 and is followed by the sequence A,, although no polyadenylation signal is associated with it. However, in the gene sequence A, is found at this position, and so it is unclear from the published data whether this is an authentic polyadenylation site or not. On the basis of the human gene sequence it appears likely that polyadenylation will occur 14 or 15 bases after the canonical polyadenylation signal (nucleotides 7487 to 7492) as shown in Figure 3, particularly as this has been shown to be the site of addition of poly(A) in the homologous bovine T2 gene (Powell et al., 1989).

[Figure 4, panels (a) and (b): sequence alignments.]

Figure 4. Comparison of cDNA sequences of bovine and human T1 and T2 ADP/ATP translocases. The alignments were made with the computer program NUCALN (Wilbur & Lipman, 1983). In (a) human T1 is compared with bovine T1. Two alternative polyadenylation signals in the bovine mRNA, 1 and 2 (Rasmussen & Wohlrab, 1986; Powell et al., 1989), and the TaqI site used in preparation of a T1-specific probe are shown. In human T1 the 3' nucleotide in the mRNA is polyadenylated (Neckelmann et al., 1987) and the boxed region is the polyadenylation signal. In (b), human T2 is compared with bovine T2. The meaning of various symbols is as follows: the arrow indicates the polyadenylation site proposed by Houldsworth & Attardi (1988); the boxed region is the polyadenylation signal proposed in this paper and the sequence is extended to the nucleotide presumed to correspond to the polyadenylation site in the mRNA. The bovine T2 polyadenylation site is also boxed. The sequences of bovine T1 and T2 are taken from Powell et al. (1989), and the human sequences are deduced from the gene sequences. These human sequences differ from those described by Neckelmann et al. (1987) and Houldsworth & Attardi (1988) as described in the legends to Figs 2 and 3, and the bovine T1 cDNA sequence differs from the partial sequence of Rasmussen & Wohlrab (1986) as described by Powell et al. (1989).

Comparisons of the cDNA (or mRNA) sequences deduced from the genomic sequences of human T1 and T2 with their bovine cDNA counterparts illustrate different degrees of conservation in different regions of the sequences (see Fig. 4). In both pairs of sequences conservation is greatest in the coding regions, where, for example, the human and bovine T1 cDNAs differ in only 65 out of 894 nucleotides. These differences give rise to 12 changes in protein sequences between the human and bovine T1 translocases. A similarly high degree of conservation is found in the coding regions of the T2 cDNAs and also in the protein sequences that they encode (see Fig. 5). In contrast, the 5', and particularly the 3', non-coding regions are less well conserved, although a strong relationship between the T1 homologues on the one hand and the T2 homologues on the other is still clearly evident. However, the 3' non-coding regions of human T1 and T2 cDNAs are not related, as also was noted earlier with the bovine counterparts (Powell et al., 1989) and also with the P1 and P2 cDNAs encoding the dicyclohexylcarbodiimide-reactive proteolipid subunit of ATP synthase (Gay & Walker, 1985).
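
The conservation figures quoted above are counts of non-identical positions between two sequences of equal length. As a minimal sketch (the helper names are hypothetical, and the short strings are toy data rather than real translocase sequences), the count and the corresponding percentage can be computed as follows, using the 65 of 894 nucleotides and 12 of 297 amino acids reported for the human and bovine T1 sequences:

```python
def count_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (align them first)")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_difference(n_diff: int, length: int) -> float:
    """Express a number of differences as a percentage of the compared length."""
    return 100.0 * n_diff / length


# Toy example (hypothetical strings): two positions differ.
print(count_differences("ACGTAC", "ACCTAA"))        # 2

# Figures quoted in the text for human versus bovine T1.
print(round(percent_difference(65, 894), 1))        # 7.3 (coding nucleotides)
print(round(percent_difference(12, 297), 1))        # 4.0 (amino acids)
```

Sequences of unequal length, such as the 3' non-coding regions, would first need to be aligned; the sketch deliberately refuses them rather than guessing an alignment.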

(ii) Exon-intron boundaries

The exons of human T1 and T2 genes are spread over 4.2 and 5.9 kb of DNA, respectively (see Fig. 6), and the three introns A, B and C interrupt the coding sequences at exactly the same positions in the two genes (see Table 1 for a summary of boundaries). The translocase protein has been proposed to be folded into six transmembrane α-helices linked by loops outside the lipid bilayer (Saraste & Walker, 1982) and all three introns in both genes break the coding sequences at points corresponding to loops 1, 4 and 5 (see Fig. 7 and section (g) for further discussion of this point).
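
The intron classes and the GT/AG rule summarized in Table 1 can be checked mechanically. The sketch below is an illustration only: the boundary strings are transcribed from Table 1 as repaired from the scanned text (so individual bases should be treated as assumptions), exon bases are lower case with dots between codons, intron bases are upper case, and the function names are hypothetical. The class of an intron is taken as the number of bases of the interrupted codon that lie upstream of the intron.

```python
# (gene, intron) -> (5' boundary, 3' boundary), transcribed from Table 1.
TABLE_1 = {
    ("T1", "A"): ("ctg.cag.GTGAGGACCG", "CCTCCACCAG.gtc.cag"),
    ("T2", "A"): ("ctg.cag.GTGGGGACGC", "CGTCCCCCAG.gtc.cag"),
    ("T1", "B"): ("gcc.aag.gGTGAGAGAGG", "CTGTCCACAG.gg.atg"),
    ("T2", "B"): ("gcc.aag.gGTACGTGTGG", "TCCCGCGCAG.gg.atg"),
    ("T1", "C"): ("aaa.ggg.gGTAAGCTTGT", "GTTTCCACAG.cc.gat"),
    ("T2", "C"): ("aaa.gga.gGTACTCGGGG", "GTCGTTGCAG.ct.gac"),
}


def intron_class(five_prime: str) -> int:
    """Number of exon (lower-case) bases of the split codon before the intron."""
    last_segment = five_prime.split(".")[-1]
    return sum(1 for ch in last_segment if ch.islower())


def intron_start(five_prime: str) -> str:
    """First two intron (upper-case) bases at the 5' boundary."""
    return "".join(ch for ch in five_prime if ch.isupper())[:2]


def intron_end(three_prime: str) -> str:
    """Last two intron (upper-case) bases at the 3' boundary."""
    return "".join(ch for ch in three_prime if ch.isupper())[-2:]


def split_codon(five_prime: str, three_prime: str) -> str:
    """Codon formed by the exon bases flanking the intron."""
    tail = "".join(ch for ch in five_prime.split(".")[-1] if ch.islower())
    head = three_prime.split(".")[1]
    return tail + head


for (gene, intron), (five, three) in TABLE_1.items():
    print(gene, intron, intron_class(five), intron_start(five),
          intron_end(three), split_codon(five, three))
```

With these strings, intron A is class 0 and introns B and C are class 1 in both genes, and every intron begins with GT and ends with AG, in agreement with the table.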

Figure 5. Binary comparisons of sequences of cDNAs and protein sequences of mammalian ADP/ATP translocases. (a) Coding sequences; (b) 3' non-coding sequences of cDNAs; (c) protein sequences. In (a) and (c) all the sequences are the same length (891 nucleotides and 297 amino acids, respectively) and the numbers refer to non-identical residues in the pairs of sequences. In (b) the sequences, which differ in length, were aligned with the computer program NUCALN (Wilbur & Lipman, 1983), and identities counted. The number of differences is expressed as a fraction of the length of the region over which they are aligned.

(c) Protein sequences: number of non-identical residues

          Hum T1   Hum T2   Hum T3   Bov T1
Hum T2    32
Hum T3    30       23
Bov T1    12       32       30
Bov T2    31       7        22       33

(d) Amino acid sequences of the human ADP/ATP translocases T1 and T2

The alignment of the deduced protein sequences of the human T1 and T2 ADP/ATP translocases with their bovine counterparts shows that they are closely related in sequence, and that they are all 297 amino acids long (see Fig. 7). (For an account of differences in sequences of T1 and T2 proteins between those predicted from gene sequences and those derived from the cDNA sequences of Neckelmann et al. (1987) and Houldsworth & Attardi (1988) see legend to this Figure.) None appears to have a processed import sequence, and none has been found associated with ADP/ATP translocases in Neurospora crassa (Arends & Sebald, 1984), Saccharomyces cerevisiae (Adrian et al., 1986), Zea mays (Baker & Leaver, 1985) or the T1 form of the bovine protein (Powell et al., 1989). In common with these homologues, only the initiator methionine is removed by post-translational proteolysis. Thus, the translocases belong to a relatively rare class of nuclear-coded mitochondrial protein in which the import sequence is present in the mature protein assembled into the mitochondrial inner membrane. In the S. cerevisiae protein this sequence is found within amino acids 1 to 115 (Adrian et al., 1986). In the bovine T1 protein the amino-terminal serine residue of the mature protein is N-acetylated, and a second post-translational modification was detected at lysine 51, which is trimethylated (Klingenberg, 1985a). This latter residue is conserved in all mammalian translocases.

(e) Repetitive DNA sequences

Human DNA contains two types of middle repetitive DNA sequence, the long and short interspersed sequences (LINES and SINES; Kao, 1985; Rogers, 1985; Weiner et al., 1986; Hutchison et al., 1989). No examples of the former have been detected in the sequences described in this paper,

Figure 6. Structures of the human T1 and T2 genes encoding mitochondrial ADP/ATP translocase. In each gene, exons I to IV and introns A to C are shown as filled boxes and continuous lines, respectively. The sizes of exons and introns are given in base-pairs, those of exons I and IV in the T2 gene being approximate as the transcriptional start and polyadenylation site are not known.

but the T1 and T2 genes and flanking regions contain two different examples of SINES. These repetitive sequences were detected in two ways: firstly, by comparison of DNA sequences with the EMBL data base using the computer program FASTN (Lipman & Pearson, 1985), and, secondly, in the case of the T2 sequence, by comparison of the DNA sequence with itself (Fig. 8). This Figure illustrates dramatically the presence of repetitive elements in introns A and B of the T2 gene. The former are six Alu repeats and the latter arise from six tandem repeats of about 236 nucleotides and are referred to as T2 repeats. A seventh and eighth Alu repeat were detected in the 5’ and 3’-flanking regions of the T2 gene, and three others were found in the T1 sequence (see Figs 2, 3 and 9). The T2

Figure 7. Protein sequences of mitochondrial ADP/ATP translocases. The sequences are numbered from the amino-terminal residue of the mature bovine T1 protein. The initiator methionine that precedes it is residue -1. Stars indicate residues conserved in all species and segments I to VI are hydrophobic regions that are proposed to be folded into transmembrane α-helices (Saraste & Walker, 1982). A, B and C indicate the positions of introns in the corresponding regions of the human genes. The protein sequences are taken from the following sources: human T1 and T2, this work; bovine T1, Aquila et al. (1982); Powell et al. (1989); bovine T2, Powell et al. (1989); N. crassa, Arends & Sebald (1984); S. cerevisiae, Adrian et al. (1986); Z. mays, Baker & Leaver (1985). The sequences for human T1 and T2, which are deduced from the gene sequences, differ from those predicted by cDNA sequences as follows. The cDNA sequence of human T1 (Neckelmann et al., 1987) codes for Ala instead of Gly at position 15, for the sequence Arg-Arg at positions 149-150 rather than Lys-Gly, Ala151 being deleted, and for Leu instead of Val230. The partial cDNA sequence of human T2 in the clone pHAT8 (Houldsworth & Attardi, 1988) codes for the sequence Arg-His-Ala rather than Lys-His-Thr at positions 105 to 107, and Gln108 is deleted (see also the legends to Figs 2 and 3 for an account of differences at the nucleotide level).


repeats are highly conserved, 154 nucleotides being identical in the five longest sequences (see Fig. 10). They have no associated poly(A) sequences such as are found in Alu sequences, and there is no indication that they are transcribed. They cannot be folded into a tRNA-like structure (Daniels & Deininger, 1985). Apparently this type of repeat has not been detected previously, and it is not known if it is found elsewhere in the human genome.
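The idea behind comparing a DNA sequence with itself, as was done to reveal the Alu and T2 repeats, can be illustrated with a minimal dot-plot sketch. This is not the DIAGON program: the function name, the simple identity score and the toy parameters are assumptions chosen for illustration, and the quadratic loop is only practical for short sequences.

```python
def self_dot_plot(seq: str, window: int, min_score: int) -> list:
    """Return (i, j) pairs, i < j, whose windows share >= min_score identities.

    For every pair of start positions the two windows of length `window`
    are compared base by base; a point is recorded when the number of
    identical positions reaches the threshold. Tandem repeats show up as
    runs of points at a constant offset j - i.
    """
    seq = seq.upper()
    last_start = len(seq) - window
    points = []
    for i in range(last_start + 1):
        for j in range(i + 1, last_start + 1):
            score = sum(1 for k in range(window) if seq[i + k] == seq[j + k])
            if score >= min_score:
                points.append((i, j))
    return points


# Hypothetical toy sequence: three tandem copies of a 6-nucleotide unit.
toy_sequence = "ACGTAC" * 3
toy_points = self_dot_plot(toy_sequence, window=6, min_score=6)
toy_offsets = sorted({j - i for i, j in toy_points})
```

In the toy case the recorded points lie on diagonals whose offsets are multiples of the repeat length, which is the same pattern that marks tandem repeats in a self-comparison plot.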

Figure 8. Comparison of the DNA sequence of the human T2 gene and flanking regions with itself. The computer program DIAGON (Staden, 1982) was used. A window length of 231 and score of 100 were employed in the calculation. I to IV indicate the positions of exons. The related sequences in intron A (between exons I and II) are Alu repeats (Fig. 9) and those in intron B (between exons II and III) are referred to as T2 repeats (Fig. 10).

(f) Expression of the human T1 and T2 genes

A major reason for carrying out this detailed analysis of human T1 and T2 genes is the wide difference in relative levels of expression of the bovine homologues in various tissues (Powell et al., 1989; see Introduction). Equivalent studies of the human genes have not yet been carried out, although it has been shown that T1, T2 and T3 are expressed in liver (Houldsworth & Attardi, 1988), and different levels of transcripts of T1 have been found in various human tissues (Neckelmann et al., 1987). Earlier, immunological studies of the protein isolated from heart, liver and kidney provided evidence for organ-specific determinants, with partial identity of proteins from the various sources (Schultheiss & Klingenberg, 1984, 1985). Therefore, it is not unreasonable to assume that the human

Figure 9. Summary of human Alu sequences in and around the T1 and T2 genes. The sequences have been aligned with the consensus human Alu sequence (Deininger et al., 1981). Insertions, denoted by dashes, have been introduced to improve alignment. Completely conserved residues are indicated with a star. The A and B blocks (underlined) refer to conserved elements in the split promoter (Fowlkes & Shenk, 1980; Fuhrman et al., 1981; Paolella et al., 1983; Perez-Stable et al., 1984; Rogers, 1985); the consensus sequences shown for them are GTGGYNNRGTGG (A block) and GGGTTCGAANCC (B block). Three of the Alu sequences shown in the Figure are incomplete as they flank the experimentally determined sequences; these are T2, 1 to 215; T2, 8625 to 8426; and T1, 1 to 114. The Alu sequences aligned in the Figure are: T1, 1-114, 2189-1900 (-) and 8317-8618; T2, 1-215, 2398-2639, 2640-2842, 2844-3066, 3352-3069 (-), 3353-3647, 3649-3858 and 8625-8426 (-).
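Degenerate consensus elements such as the A and B blocks of the Alu split promoter can be located in a sequence by expanding the ambiguity codes into a regular expression. The sketch below is illustrative only: the function names and the test sequences are hypothetical, and the two consensus strings are taken as they appear in the Figure.

```python
import re

# Standard IUPAC nucleotide ambiguity codes used in the consensus strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def consensus_to_pattern(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_consensus(seq: str, consensus: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = consensus_to_pattern(consensus)
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]


A_BLOCK = "GTGGYNNRGTGG"
B_BLOCK = "GGGTTCGAANCC"

# Hypothetical test sequences, not taken from the T1 or T2 genes.
a_hits = find_consensus("TTGTGGCTCAGTGGAA", A_BLOCK)
b_hits = find_consensus("AAGGGTTCGAATCCTT", B_BLOCK)
```

Exact matching to the consensus is a strict criterion; diverged Alu copies would usually need a search that tolerates mismatches.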

Figure 10. Summary of 6 novel repeats found in intron B of the human T2 gene. Dashes have been introduced to improve alignments. The stars indicate conserved bases. The repeats aligned in the Figure are at positions 4483-4715, 4716-4950, 4951-5169, 5170-5402, 5403-5636 and 5637-5774 of the T2 sequence.

genes will be expressed differently in various tissues. A similar phenomenon has been demonstrated also with the bovine P1 and P2 genes encoding different precursors of the dicyclohexylcarbodiimide-reactive subunit of mitochondrial ATP synthase (Gay & Walker, 1985), and with heart (Walker et al., 1989) and liver (Breen, 1988) isoforms of the α-subunit of the same enzyme. The 5’ regions of the T1 and T2 genes have a high preponderance of the dinucleotide CpG (see Fig. 11), and CpG-rich islands seem to be associated with the 5’ ends of both genes. That in the T1 gene covers about 1 kb and extends into the region to the 5’ side of the site of transcriptional initiation. The CpG-rich sequence in the T2 gene is more extensive. It covers about 1.5 kb and also probably includes regions involved in initiation of transcription. In CpG islands it is thought that cytosine in CpG is not methylated, whereas it mostly is in CpG elsewhere, but this remains to be demonstrated in the present cases. There are approximately 30,000 such islands in mammalian genomes, and whilst some of them appear to be associated with housekeeping genes that are expressed in all tissues, others are associated with genes that are expressed only in specific cell types (Bird, 1986, 1987). It has been suggested that tissue-specific genes without CpG

islands would be unavailable to ubiquitous transcription factors in tissues where they are methylated, and so this could contribute to transcriptional repression and lack of expression. In contrast, tissue-specific genes with CpG islands would be available continuously to transcription factors, and their expression could be prevented in non-expressing tissues by trans-acting repressors (Bird, 1987). The human T2 gene has a number of features near to its 5’ end that suggest that it might be a “housekeeping” gene expressed in all tissues. Firstly, its promoter is G+C-rich and contains multiple sites with the potential to bind transcription factor SP1. In addition, it evidently lacks both TATA and CCAAT boxes (Melton, 1987). In contrast, the human T1 gene has features that have been associated with both housekeeping and tissue-specific genes. For example, it has TATA and CCAAT promoter elements, and both are associated with tissue-specific genes and also with some housekeeping genes. It has been proposed that the TATA boxes associated with housekeeping genes are part of a more extensive conserved element which extends in a 3’ direction up to the cap site (Martini et al., 1986). In the human T1 gene 17 out of 30 nucleotides (or 57%) are conserved over this region, which is rather low in comparison with the levels of

Figure 11. Distribution of the dinucleotide CpG in the 5’ regions of the human T1 and T2 genes. The horizontal lines above the distributions of dinucleotides indicate non-coding regions, and the filled boxes the positions of exons. The calculation was made with the computer program ANALYSEQ (Staden, 1985).
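A distribution of CpG dinucleotides of the kind plotted in this Figure can be approximated with a short script. The sketch below is not ANALYSEQ: the function names, the windowing scheme and the toy sequences are assumptions, and the observed/expected ratio is a commonly used measure of CpG enrichment that is introduced here for illustration and is not taken from the paper.

```python
def cpg_positions(seq: str) -> list:
    """0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def cpg_profile(seq: str, window: int, step: int) -> list:
    """(window start, number of CpG wholly inside the window) along the sequence."""
    s = seq.upper()
    return [
        (start, s[start:start + window].count("CG"))
        for start in range(0, len(s) - window + 1, step)
    ]


def cpg_observed_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from base composition.

    Expected = (number of C * number of G) / length; returns 0.0 when the
    sequence contains no C or no G.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


# Hypothetical toy sequence, not part of the T1 or T2 genes.
toy = "ACGCGTAAAA"
toy_profile = cpg_profile(toy, window=5, step=5)
```

Applied to a real 5’ region, a window of a few hundred nucleotides would be a more sensible choice than the toy value used here.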


conservation observed in most other tissue-specific genes in this category that have been investigated (Martini et al., 1986), except that the human muscle carbonic anhydrase gene is conserved in only 48% of nucleotides between its TATA box and cap site (Lloyd et al., 1987). A second feature that has been proposed to be characteristic of a housekeeping gene is the presence of a CpG island near to its 5’ end, but such islands have also been found in tissue-specific genes. So it is unclear from DNA sequence alone to which category of gene human T1 belongs, and detailed studies are required of the methylation states of the CpG islands and of the transcription and expression of the human T1 and T2 genes in various tissues.

(g) Number of genes for mammalian ADP/ATP translocase

Hybridization experiments conducted on restriction digests of human and bovine genomic DNA have shown the presence of numerous different sequences related to the cDNAs of bovine T1 and T2 ADP/ATP translocases. For example, between 12 and 15 sequences could be detected by hybridization of restriction digests of human DNA with the coding region of the bovine T1 cDNA (Powell et al., 1989). So far, three expressed human genes have been detected. Two of them are the T1 and T2 genes described in this paper, for which corresponding cDNA clones also have been described (Neckelmann et al., 1987; Houldsworth & Attardi, 1988). Expression of a third related, but different, gene has been detected in HL60 cells that have been growth-stimulated (Battini et al., 1987), and also in liver and HeLa cells (Houldsworth & Attardi, 1988). In addition, in the course of our studies we have partially characterized two spliced pseudogenes related to T2 (unpublished observations) and there can be little doubt that there are others. It would be of particular interest to know whether there are additional expressed genes for ADP/ATP translocase that lie undetected in the human genome.

(h) Evolution of genes for mitochondrial ADP/ATP translocase

From studies of the sequences of other metabolite transport proteins in the inner membrane of mitochondria it is known that the ADP/ATP translocase belongs to a wider multigene family. This embraces at least the uncoupling protein from mitochondria of brown fat, which is a proton transporter (Aquila et al., 1985, 1987), and the phosphate carrier (Runswick et al., 1987), and preliminary results strongly suggest that the α-ketoglutarate/malate carrier also belongs to this family (M. J. Runswick, F. Bisaccia, F. Palmieri & J. E. Walker, unpublished results). These studies have shown that these proteins are homologous throughout much of their sequences, and that the uncoupling protein and the phosphate carrier have a threefold internal repeat of about 100 amino acids, as was detected first in the sequence of bovine ADP/ATP translocase (Saraste & Walker, 1982). This is also present in the human homologues discussed in this paper. Therefore, four major steps can be detected in the evolution of the ADP/ATP translocase:

Step 1. Formation of a primordial carrier with three tandemly repeated domains from a single domain of about 100 amino acids by two gene duplication events. As in the present-day proteins, this elemental domain probably consisted of two transmembrane α-helical segments linked by an extramembranous loop (Saraste & Walker, 1982).

Step 2. Divergence of the primordial carrier, and evolution of carrier specificity.

Step 3. Expansion of the ADP/ATP translocase expressed gene family before the emergence of man and cow.

Step 4. After the divergence of man and cow, formation of pseudogenes by reverse transcription of mRNA and retroposition in the human and bovine genomes.

A number of comments can be made concerning this proposal. Firstly, carrier proteins belonging to the family to which the ADP/ATP translocase belongs have so far been detected only in mitochondria. Chloroplasts have carriers with the same biochemical functions as some mitochondrial carriers, examples being the adenine nucleotide and phosphate carriers, but as yet there are no sequence data. Other carriers in bacteria and plasma membranes of eukaryotes form a separate family (Maiden et al., 1987) and are not evidently related to the mitochondrial carriers. Therefore, the origin of the primordial carrier in Step 1 is particularly obscure, although the possibility cannot be excluded that it was introduced into the protoeukaryotic cell by endosymbiosis (assuming that this explains the origin of mitochondria; see Gray & Doolittle, 1982).
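The threefold internal repeat invoked in Step 1 can be explored crudely by cutting a sequence into three consecutive segments and scoring their pairwise identity. The sketch below is only a first approximation under stated assumptions: the function names and the toy sequence are hypothetical, the segments are taken to be of equal length and compared without gaps, and a real analysis would use a proper alignment with a substitution matrix.

```python
def split_into_segments(seq: str, n_segments: int = 3) -> list:
    """Cut a sequence into n consecutive segments of equal length.

    Residues left over at the carboxyl end are ignored (an assumption).
    """
    size = len(seq) // n_segments
    return [seq[i * size:(i + 1) * size] for i in range(n_segments)]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues in an ungapped comparison."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / length


def repeat_identities(seq: str, n_segments: int = 3) -> dict:
    """Pairwise identity between segments, keyed by 1-based segment numbers."""
    parts = split_into_segments(seq, n_segments)
    return {
        (i + 1, j + 1): identity(parts[i], parts[j])
        for i in range(n_segments)
        for j in range(i + 1, n_segments)
    }


# Hypothetical toy protein built from three diverged copies of one unit.
toy_protein = "MKTAVAPIER" + "MKTAVAPIEK" + "MKTCVAPIER"
toy_scores = repeat_identities(toy_protein)
```

In present-day carriers the repeats are much more diverged than in this toy case, so ungapped identity would understate the relationship.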
The positions of intron B in the contemporary translocase genes can be taken as relics of earlier gene duplication events that presumably gave rise to the three tandem repeats (Gilbert, 1978; Blake, 1978; Traut, 1988). Equally, the positions of introns A and C could be taken as evidence that the 100 amino acid repeat itself evolved by an earlier duplication of a small domain containing one transmembrane segment, but the present-day sequences do not support this suggestion.

Secondly, at present it is not clear how many of the 13 or so mitochondrial carriers that have been detected (LaNoue & Schoolwerth, 1979) belong to the family discussed here. The ADP/ATP, phosphate and α-ketoglutarate/malate (Bisaccia & Palmieri, 1984) carriers and the uncoupling protein from brown fat mitochondria (Aquila et al., 1985; Runswick et al., 1987) are all related in sequence. They have similar sizes and each comprises about 306 amino acids. The dicarboxylate carrier has also been purified and reconstituted and appears to belong to this family in so far as its molecular weight is about 28,000 (Bisaccia et al., 1988). However, two other carriers do not conform to these general properties; the aspartate/glutamate carrier has an estimated molecular weight of about 68,000 (Krämer et al., 1986) and there is evidence that the pyruvate carrier contains a component of about 15,000 (Halestrap, 1975). No other mitochondrial metabolite transport proteins have been purified to date, and assessment of the extent of this gene family awaits sequence data on other carriers. Aquila et al. (1985), in a discussion of the evolution of the carriers, have proposed a hierarchy of metabolite carriers in which the uncoupling protein belongs to a subgroup of H+-anion co-transporters. They propose that the uncoupling protein is not an elementary H+ carrier from which others derive, but rather that it is a degenerate form of an H+-anion co-transporter. If this proposal is correct, then it adds detail to Step 2 of the proposed evolutionary sequence.

Thirdly, Neckelmann et al. (1987) have calculated that the human T1 and T2 genes diverged in the Pennsylvanian period (310 to 270 million years before the present) or in the Permian period (270 to 225 million years before the present). This calculation is based upon consideration of replacement and synonymous substitutions, but it should be borne in mind that incorrect sequence data were employed (see the legends to Figs 2 and 7). Divergence of man and cows is believed to have occurred at a later date, 80 million years ago (Li et al., 1985).

We are grateful to Dr R. Baserga for communicating to us unpublished information concerning the human T3 gene, and to Dr T. H. Rabbitts for providing us with a sample of the human genomic library AT5. We thank Mr T. Hercus for his assistance in sequencing the T2 gene. A.L.C. was supported by an M.R.C. Research Training Fellowship.

References

Adrian, G. S., McCammon, M. T., Montgomery, D. L. & Douglas, M. G. (1986). Mol. Cell. Biol. 6, 626-634.
Aquila, H., Misra, D., Eulitz, M. & Klingenberg, M. (1982). Hoppe-Seyler’s Z. Physiol. Chem. 363, 345-349.
Aquila, H., Link, T. A. & Klingenberg, M. (1985). EMBO J. 4, 2369-2376.
Aquila, H., Link, T. A. & Klingenberg, M. (1987). FEBS Letters, 212, 1-9.
Arends, H. & Sebald, W. (1984). EMBO J. 3, 377-382.
Baker, A. & Leaver, C. J. (1985). Nucl. Acids Res. 13, 5857-5867.
Bankier, A. T. & Barrell, B. G. (1983). In Techniques in Nucleic Acid Biochemistry (Flavell, R. A., ed.), vol. B508, pp. 1-34, Elsevier, County Clare, Ireland and New York.
Battini, R., Ferrari, S., Kaczmarek, L., Calabretta, B., Chen, S. T. & Baserga, R. (1987). J. Biol. Chem. 262, 4355-4359.
Benton, W. D. & Davis, R. W. (1977). Science, 196, 180-182.


Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3963-3965.
Bird, A. P. (1986). Nature (London), 321, 209-213.
Bird, A. P. (1987). Trends Genet. 3, 342-346.
Bisaccia, F. & Palmieri, F. (1984). Biochim. Biophys. Acta, 766, 386-394.
Bisaccia, F., Individeri, C. & Palmieri, F. (1988). Biochim. Biophys. Acta, 933, 229-240.
Blake, C. C. F. (1978). Nature (London), 273, 267.
Breathnach, R. & Chambon, P. (1981). Annu. Rev. Biochem. 50, 349-383.
Breen, G. A. M. (1988). Biochem. Biophys. Res. Commun. 152, 264-269.
Capaldi, R. A. (1988). Trends Biochem. Sci. 13, 144-148.
Clark, J. B., Hayes, D. J., Byrne, E. & Morgan-Hughes, J. A. (1987). Biochem. Soc. Trans. 15, 626-627.
Daniels, G. R. & Deininger, P. L. (1985). Nature (London), 317, 819-822.
Deininger, P. L. (1983). Anal. Biochem. 129, 216-223.
Deininger, P. L., Jolly, D. J., Rubin, C. M., Freidmann, T. & Schmid, C. W. (1981). J. Mol. Biol. 151, 17-33.
Dyer, M. R., Gay, N. J. & Walker, J. E. (1989). Biochem. J. in the press.
Farrell, L. B. & Nagley, P. (1987). Biochem. Biophys. Res. Commun. 144, 1257-1264.
Farrell, P. J., Deininger, P. L., Bankier, A. & Barrell, B. G. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 1565-1569.
Fowlkes, D. M. & Shenk, T. (1980). Cell, 22, 405-413.
Fuhrman, S., Deininger, P. L., LaPorte, P., Freidmann, T. & Geiduschek, E. P. (1981). Nucl. Acids Res. 9, 6439-6456.
Gay, N. J. & Walker, J. E. (1985). EMBO J. 4, 3519-3524.
Gilbert, W. (1978). Nature (London), 271, 501.
Gray, M. W. & Doolittle, W. F. (1982). Microbiol. Rev. 46, 1-42.
Halestrap, A. (1975). Biochem. J. 172, 377-387.
Houldsworth, J. & Attardi, G. (1988). Proc. Nat. Acad. Sci., U.S.A. 85, 377-381.
Hutchison, C. A., III, Hardies, S. C., Loeb, D. D., Sheshe, W. R. & Edgell, M. H. (1989). In Mobile DNA (Berg, D. E. & Howe, M. H., eds), A. S. M. Press, Washington DC, in the press.
Kadonaga, J. T., Jones, K. A. & Tjian, R. (1986). Trends Biochem. Sci. 11, 20-23.
Kao, F. T. (1985). Inter. Rev. Cytol. 96, 51-88.
Klingenberg, M. (1985a). In The Enzymes of Biological Membranes (Martonosi, A. N., ed.), vol. 4, pp. 511-553, Plenum Publishing Corporation, New York.
Klingenberg, M. (1985b). Annal. N.Y. Acad. Sci. 456, 279-288.
Krämer, R., Kürzinger, G. & Heberger, C. (1986). Arch. Biochem. Biophys. 251, 166-174.
Kuhn-Nentwig, L. & Kadenbach, B. (1985). Eur. J. Biochem. 149, 147-158.
LaNoue, K. F. & Schoolwerth, A. C. (1979). Annu. Rev. Biochem. 48, 871-922.

LaNoue, K. F. & Schoolwerth, A. C. (1984). In Bioenergetics (Ernster, L., ed.), pp. 221-268, Elsevier Science Publishers B.V., Amsterdam.
Li, W.-H., Luo, C.-C. & Wu, C.-I. (1985). In Molecular Evolutionary Genetics (MacIntyre, R. J., ed.), pp. 1-94, Plenum Press, New York.
Lipman, D. J. & Pearson, W. R. (1985). Science, 227, 1435-1441.
Lloyd, J., Brownson, C., Tweedie, S., Charlton, J. & Edwards, Y. H. (1987). Genes Develop. 1, 594-602.
Maiden, M. C. J., Davis, E. O., Baldwin, S. A., Moore, D. C. M. & Henderson, P. J. F. (1987). Nature (London), 325, 641-643.
Martini, G., Toniolo, D., Vulliamy, T., Luzzatto, L., Dono, R., Viglietto, G., Paonessa, G., D’Urso, M. D. & Persico, M. G. (1986). EMBO J. 5, 1849-1855.
Melton, D. W. (1987). In Oxford Surveys on Eukaryotic Genes (Maclean, N., ed.), vol. 4, pp. 34-76, Oxford University Press, Oxford, U.K.
Messing, J. (1983). Methods Enzymol. 101, 20-78.
Mills, D. R. & Kramer, F. R. (1979). Proc. Nat. Acad. Sci., U.S.A. 76, 2232-2235.
Mizusawa, S., Nishimura, S. & Seela, F. (1986). Nucl. Acids Res. 14, 1319-1324.
Morgan-Hughes, J. A. (1986). Trends Neurosci. 9, 15-19.
Neckelmann, N., Li, K., Wade, R. P., Shuster, R. & Wallace, D. C. (1987). Proc. Nat. Acad. Sci., U.S.A. 84, 7580-7584.
Paolella, G., Lucero, M. A., Murphy, M. H. & Baralle, F. E. (1983). EMBO J. 2, 691-696.
Perez-Stable, C., Ayres, T. & Shen, C. K. J. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 5291-5295.
Powell, S. J., Medd, S. M., Runswick, M. J. & Walker, J. E. (1989). Biochemistry, in the press.
Rasmussen, U. B. & Wohlrab, H. (1986). Biochem. Biophys. Res. Commun. 138, 850-857.
Rogers, J. H. (1985). Int. Rev. Cytol. 93, 187-279.
Runswick, M. J., Powell, S. J., Nyren, P. & Walker, J. E. (1987). EMBO J. 6, 1367-1373.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 5463-5467.
Saraste, M. & Walker, J. E. (1982). FEBS Letters, 144, 250-254.
Schultheiss, H. P. & Klingenberg, M. (1984). Eur. J. Biochem. 143, 599-605.
Schultheiss, H. P. & Klingenberg, M. (1985). Arch. Biochem. Biophys. 239, 273-279.
Staden, R. (1982). Nucl. Acids Res. 10, 2951-2961.
Staden, R. (1985). In Genetic Engineering: Principles and Methods (Setlow, J. K. & Hollaender, A., eds), vol. 7, pp. 67-114, Plenum Publishing Corporation, New York and London.
Taylor, A. M. R., Oxford, J. M. & Metcalfe, J. A. (1981). Int. J. Cancer, 27, 311-319.
Traut, T. W. (1988). Proc. Nat. Acad. Sci., U.S.A. 85, 2944-2948.
Walker, J. E., Cozens, A. L., Dyer, M. R., Fearnley, I. M., Powell, S. J. & Runswick, M. J. (1987). Chemica Scripta, 27B, 97-105.
Walker, J. E., Powell, S. J., Viñas, O. & Runswick, M. J. (1989). Biochemistry, in the press.
Wallace, D. C. (1986). Hospital Practice, 77-92.
Weiner, A. M., Deininger, P. L. & Efstratiadis, A. (1986). Annu. Rev. Biochem. 55, 631-661.
Wilbur, W. J. & Lipman, D. J. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 726-730.

Edited by P. Chambon