The complete cDNA sequence for dihydrolipoyl transacylase (E2) of human branched-chain α-keto acid dehydrogenase complex


Biochimica et Biophysica Acta, 1132 (1992) 319-321 © 1992 Elsevier Science Publishers B.V. All rights reserved 0167-4781/92/$05.00 BBAEXP 90406 319 ...



Short Sequence-Paper

Kim S. Lau a, Jacinta L. Chuang a, W. Joseph Herring c, Dean J. Danner c, Rody P. Cox b and David T. Chuang a

Departments of a Biochemistry and b Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX (USA) and c Division of Medical Genetics, Department of Pediatrics, Emory University School of Medicine, Atlanta, GA (USA)

(Received 14 July 1992)

Key words: Branched-chain α-keto acid dehydrogenase complex; Dihydrolipoyl transacylase; Alu repetitive sequence; (Human)

We have determined the complete nucleotide sequence for the cDNA encoding human dihydrolipoyl transacylase (E2) using the rapid amplification of cDNA ends (RACE) procedure. The full-length E2 cDNA is 3535 nucleotides in length. The coding region spans 1446 bp and the 3'-noncoding region spans 2074 bp. The latter contains three Alu repetitive sequences and two transcription termination sites.

Correspondence to: D.T. Chuang, Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75235-9038, USA. The sequence data reported in this paper have been submitted to the EMBL Data Bank (Heidelberg) under the accession number X66785. Abbreviations: E2, dihydrolipoyl transacylase; BCKAD, branched-chain α-keto acid dehydrogenase; RACE, rapid amplification of cDNA ends; TS, termination site.

The mammalian branched-chain α-keto acid dehydrogenase (BCKAD) complex catalyzes the oxidative decarboxylation of the branched-chain α-keto acids derived from leucine, isoleucine and valine. The BCKAD complex is organized around a cubic core of 24 dihydrolipoyl transacylase (E2) subunits, to which a decarboxylase (E1), a dehydrogenase (E3), a specific kinase and a specific phosphatase are attached through noncovalent interactions [1]. We and others have previously isolated and characterized human E2 cDNA clones encoding the E2 pre-polypeptide [2-4]. However, the complete cDNA sequence for the human E2 mRNA has not been determined. This information is essential for studying mutations in the E2 subunit that produce maple syrup urine disease [5]. The availability of the complete human E2 cDNA sequence may also contribute to understanding the pathogenesis of primary biliary cirrhosis, where the human E2 protein is a major target for autoantibodies [6]. In the present study, we have extended the published human E2 sequences [3,4] by additional cDNA cloning.

The 5'- and 3'-terminal sequences of the human E2 cDNA were determined using the rapid amplification of cDNA ends (RACE) method [7] with human kidney poly(A)+ RNA as template. For the 5'-RACE protocol, an internal antisense primer, 5'-CCCACATAGGCAATATCGTCTAGAT-3' (bases 385 to 409), was used for reverse transcription of human kidney poly(A)+ RNA. dA nucleotides were added to the newly synthesized DNA strand using terminal transferase. The first round of PCR amplification was performed with oligo(dT)14 as a sense primer and the above internal oligonucleotide (bases 385 to 409) as an antisense primer. For the second round of amplification, the adaptor-(dT)17 primer, 5'-GACTCGAGTCGACATCGA(T)17-3' (sense), and the internal primer for bases 59 to 76, 5'-CGAACACAAATCAGCTTC-3' (antisense), were utilized. For the third round of amplification, the primers were the adaptor primer without oligo(dT)17 (sense) and the same primer for bases 59 to 76, 5'-CGAACACAAATCAGCTTC-3' (antisense). The amplified products were subcloned into the pCR vector using the TA Cloning System™ (Invitrogen, San Diego, CA) and screened with an internal oligonucleotide probe, 5'-CGCTGCAGTCCGTATGCTGA-3' (bases 17 to 36). Positive clones were sequenced by the dideoxy chain termination method. To determine the 3'-terminal sequence, the adaptor-(dT)17 primer (5'-GACTCGAGTCGACATCGA(T)17-3') was used for reverse transcription of human kidney poly(A)+ RNA to generate the first cDNA strand. The

1 ATTTCCGGGGTAAG ATG GCT GCA GTC CGT ATG CTG AGA ACC TGG AGC AGG AAT GCG GGG AAG CTG ATT TGT GTT CGC TAT TTT CAA ACA TGT GGT AAT GTT CAT GTT TTG AAG CCA AAT TAT GTG TGT TTC TT7
-61 Met Ala Ala Val Arg Met Leu Arg Thr Trp Ser Arg Asn Ala Gly Lys Leu Ile Cys Val Arg Tyr Phe Gln Thr Cys Gly Asn Val His Val Leu Lys Pro Asn Tyr Val Cys Phe Phe

136 GGT TAT CC~ TCA TT( AAG TAT AGT CAT CCA CAT CAC TTC CTG AAA ACA ACT GCT GCT CTC CGT GGA CAG GTT GTT CAG TTC AAG CTC TCA GAC ATT GGA GAA GGG ATT AGA GAA GTA ACT GTT AAA GAA TGG
-21 Gly Tyr Pro Ser Phe Lys Tyr Ser His Pro His His Phe Leu Lys Thr Thr Ala Ala Leu Arg Gly Gln Val Val Gln Phe Lys Leu Ser Asp Ile Gly Glu Gly Ile Arg Glu Val Thr Val Lys Glu Trp

268 TAT GTA AAA GAA GGA GAT ACA GTG TCT CAG TTT GAT AGC ATC TGT GAA GTT CAA AGT GAT AAA GCT TCT GTT ACC ATC ACT AGT CGT TAT GAT GGA GTC ATT AAA AAA CTC TAT TAT AAT CTA GAC GAT ATT
24 Tyr Val Lys Glu Gly Asp Thr Val Ser Gln Phe Asp Ser Ile Cys Glu Val Gln Ser Asp Lys Ala Ser Val Thr Ile Thr Ser Arg Tyr Asp Gly Val Ile Lys Lys Leu Tyr Tyr Asn Leu Asp Asp Ile

400 GCC TAT GTG GGG AAG CCA TTA GTA GA( ATA GAA ACG GAA GCT TTA AAA GAT TCA GAA GAA GAT GTT GTT GAA ACT CCT GCA GTG TCT CAT GAT GAA CAT ACA CAC CAA GAG ATA AAG GGC CGA AAA ACA CTG
68 Ala Tyr Val Gly Lys Pro Leu Val Asp Ile Glu Thr Glu Ala Leu Lys Asp Ser Glu Glu Asp Val Val Glu Thr Pro Ala Val Ser His Asp Glu His Thr His Gln Glu Ile Lys Gly Arg Lys Thr Leu

532 GCA ACT CCT GCA GTT CGC CGT CTG GCA ATG GAA AAC AAT ATT AAG CTG AGT GAA GTT GTT GGC TCA GGA AAA GAT GGC AGA ATA CTT AAA GAA GAT ATC CTC AAC TAT TTG GAA AAG CAG ACA GGA GCT ATA
112 Ala Thr Pro Ala Val Arg Arg Leu Ala Met Glu Asn Asn Ile Lys Leu Ser Glu Val Val Gly Ser Gly Lys Asp Gly Arg Ile Leu Lys Glu Asp Ile Leu Asn Tyr Leu Glu Lys Gln Thr Gly Ala Ile

664 TTG CCT CCT TCA CCC AAA GTT GAA ATT ATG CCA CCT CCA CCA AAG CCA AAA GAC ATG ACT GTT CCT ATA CTA GTA TCA AAA CCT CCG GTA TTC ACA GGC AAA GAC AAA ACA GAA CCC ATA AAA GGC TTT CAA
156 Leu Pro Pro Ser Pro Lys Val Glu Ile Met Pro Pro Pro Pro Lys Pro Lys Asp Met Thr Val Pro Ile Leu Val Ser Lys Pro Pro Val Phe Thr Gly Lys Asp Lys Thr Glu Pro Ile Lys Gly Phe Gln

796 AAA GCA ATG GTC AAG ACT ATG TCT GCA GCC CTG AAG ATA CCT CAT TTT GGT TAT TGT GAT GAG ATT GA( CTT ACT GAA CTG GTT AAG CTC CGA GAA GAA TTA AAA CCC ATT GCA TTT GCT CGT GGA ATT AAA
200 Lys Ala Met Val Lys Thr Met Ser Ala Ala Leu Lys Ile Pro His Phe Gly Tyr Cys Asp Glu Ile Asp Leu Thr Glu Leu Val Lys Leu Arg Glu Glu Leu Lys Pro Ile Ala Phe Ala Arg Gly Ile Lys

928 CTC TCC TTT ATG CCT TTC TTC TTA AAG GCT GCT TCC TTG GGA TTA CTA CAG TTT CCT ATC CTT AAC GCT TCT GTG GAT GAA AAC TGC CAG AAT ATA ACA TAT AAG GCT TCT CAT AAC ATT GGG ATA GCA ATG
244 Leu Ser Phe Met Pro Phe Phe Leu Lys Ala Ala Ser Leu Gly Leu Leu Gln Phe Pro Ile Leu Asn Ala Ser Val Asp Glu Asn Cys Gln Asn Ile Thr Tyr Lys Ala Ser His Asn Ile Gly Ile Ala Met

1060 GAT ACT GAG CAG GGT TTG ATT GTC CCT AAT GTG AAA AAT GTT CAG ATC TGC TCT ATA TTT GAC ATC GCC ACT GAA CTG AAC CGC CTC CAG AAA TTG GGC TCT GTG GGT CAG CTC AGC ACC ACT GAT CTT ACA
288 Asp Thr Glu Gln Gly Leu Ile Val Pro Asn Val Lys Asn Val Gln Ile Cys Ser Ile Phe Asp Ile Ala Thr Glu Leu Asn Arg Leu Gln Lys Leu Gly Ser Val Gly Gln Leu Ser Thr Thr Asp Leu Thr

1192 GGA GGA ACA TTT ACT CTT TCC AAC ATT GGA TCA ATT GGT GGT ACC TTT GCC AAA CCA GTG ATA ATG CCA CCT GAA GTA GCC ATT GGG GCC CTT GGA TCA ATT AAG GCC ATT CCC CGA TTT AAC CAG AAA GGA
332 Gly Gly Thr Phe Thr Leu Ser Asn Ile Gly Ser Ile Gly Gly Thr Phe Ala Lys Pro Val Ile Met Pro Pro Glu Val Ala Ile Gly Ala Leu Gly Ser Ile Lys Ala Ile Pro Arg Phe Asn Gln Lys Gly

1324 GAA GTA TAT AAG GCA CAG ATA ATG AAT GTG AGC TGG TCA GCT GAT CAC AGA GTT ATT GAT GGT GCT ACA ATG TCA CGC TTC TCC AAT TTG TGG AAA TCC TAT TTA GAA AAC CCA GCT TTT ATG CTA CTA GAT
376 Glu Val Tyr Lys Ala Gln Ile Met Asn Val Ser Trp Ser Ala Asp His Arg Val Ile Asp Gly Ala Thr Met Ser Arg Phe Ser Asn Leu Trp Lys Ser Tyr Leu Glu Asn Pro Ala Phe Met Leu Leu Asp

1456 CTG AAA TGA AGACTGATAAGACATTCTTGAACTTTTTGAGCTTCCAAAGAGTATGTAAACCCTAGCTGTGCCAGCACATGTTCATCTTTACAATTTATATTGTAAACGATTTGTATCGTATGATTAAGGAT~TAAGGCACAATATTTGTCACTGTTCTATTAGACTTTTTACTG
420 Leu Lys ***

1630 AAAATGAATAATGGTGTAATGGTT~T~TGGGGCTGTCA~ATTTTATAGGT~AGAGTGTGACTTCTTAATATGGTGCTGATGTTTTTGTGT~AATGG~TTGAAACTGGCAAGATTAACAAAATTAGGCCGGGCATGGTGGCTCACGCCTGTAATCCAGCACTT~GGGAGG~C~AGGT

Alu-1
1807 GGGG~GATCA~CTGAGGTTAGAAGTTTGAGAC~AG~TGGCCAACATGGTGAAA~CTGGCCTCTA~TAAAAAATACAAAATTGA~GGGTGTGGT6GfGGGTACCG~TA~TTGGGAGG~TGAGGCAGGAGAATCGCTTGAACCTGGGAGGTGGAGGTTG~AGTGAGCTGAGATCGT

1984 G~TATTGCACTC~AGCCT~GG~GA~A~AG~AAGACGC~ATCT~AAAAACAm~AAAAAAACAAAATT~AT~TTA~TAAAAGACAGGTAG~CATATACA~ACAGTATATGCC~TATTTTTTTTAACTGA~T~TTAATGAAACTTTAATTTTA~TTAATTAAGAAATG~AATTTATATACA

2161 AAATATTTTC~ATTTC~GTTATTATG~TAATTGTTGTATGAAATAAGTGCAATTATACTTCTCTTTTGAGATAT~CAAGAGTATATT~TTG~T~TGTATAGAGAATAT~ATCTGATAGTGTCTTATTTATATTAATTAATGTCTTTGAAAAGGGAAAAGTATAAA~TGGC~TTAAAA

2338 TTGTCCAATTATAGTTTTATAACCAGTCTAT~AAAGGTGTTTGTTTAAAATGGATATAGTTTTAGATTTGTGGTAA~GCTTTGGTATTTT~TT~G~GAAGA~CTTCAC~TTTG~AAACTTCCCTCATGTAAGGAAGGTACTTTAAATGTAG~AG~ACTGACATTT~TTTTTTTAAA

2515 AAAAA~TTGAGAAGT~TA~TCCTTTTAACTTTTTTGGTCTT~AG~TAAAAAATAGGATAAGAAATTAAGGT~ATTCCATTCT~CATATCCTGGGTAAGAATGTAAATAAGAGGAGAAGGAAGAGTCTAATAGTAATTATGGATATAAAAAATAAGAAATTTTGTATAGAAATGAA

2692 GGTTT~ATAATGA~CATTTTGTTAAAGG~CTA~TTTAATCAGAAATAG~AA~GAGAT~AATGTAT~AA~ATTT~AATTTGCATTCGGAAATCCATGTTGTTTCTAATAT~G~C~AGTTGAAAACTGTATGCCAAAATTAGTTGT~TAAGTGAAGTTTTGTGA~AGAAAAAAGGTTG

PolyA signal, PolyA signal, TS I, Alu-2
2869 TTTTAATATCTACTTGGTT~TTCTCAAAATGGAAATAATTTTAAAATCAGGAAA~AATAAATCAGCCAGGTGTGATGA~TTGTAACTGTAATCCCAGTTATAGGGGAGGCTGAA~CAGGAGGATCACTTGAGG~CAGGAGTTTGA~AC~AGC~T~GGCAACATAGTGAGATCC~ATC

3046 TCAAAAAACATTATTTTTAAAATTAGCCTGGTGGCTCACGC~TGTAATCCCAGCA~TTTGGGAGGCCGAGGTGGCCAGAT~A~AGGTCAGGAGTTCGAGACCACC~TGGCCAACATGGTGAAACCCCATCTCTACAGTTTT~TAAAAATA~AAAAATTAC~TGGGC~TGGTGCA

Alu-3
3223 CAGGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATTGCTTGAGC CCAAGAGGTGGAGGTTACAGTGAGCAGAGATCACACCACTGCACTCCAGCCTGGGTGGCAGAGCAACACTTCGTCTCAGAAAAAAAAAAAAAAACCAAAAACCAAAAAGCCAAGTGTGGTG GTG

TS II
3400 TGCACCTATAGTCCCAGCTACTCAGGAAGCTGAGAC AAGAGGATCAATTGAGC C~AGGAGTTCAAAG~TGTAGTGAG~TGTCATTGTG~ACTATC~T~AGTATGGGTGA~AGAGTGAGA~TGGT~TCTAAAAAT

Fig. 1. The complete human E2 cDNA sequence. The entire nucleotide sequence is 3535 bp in length. The coding region (base 15 to base 1461) specifies 482 amino acids. The region between residues (-61) and (-1) is the mitochondrial targeting sequence. The amino-terminal residue is Gly-1 (underlined), as determined by peptide sequencing of natural human E2 [2]. The 3'-noncoding region (2074 bp) contains three Alu repeats (Alu-1, Alu-2 and Alu-3). Two transcription termination sites (TS I and TS II) are present. TS I is preceded by two consensus polyadenylation signals (underlined).

first round of PCR amplification was carried out with the adaptor-(dT)17 primer as an antisense primer and the primer for bases 1711-1734, 5'-TTTTTGTGTCAATGGCTTGA-3', as a sense primer. The second round of PCR amplification was performed using the adaptor primer 5'-GACTCGAGTCGACATCGA-3' (antisense) and the internal primer 5'-GAAGACCTTCACCTTTGCAA-3', corresponding to bases 2433-2452 (sense). Products were subcloned into the pCR vector and sequenced.

The complete nucleotide sequence for the human E2 cDNA contains 3535 bases (Fig. 1). RACE products from the 5' end of the E2 cDNA have enabled us to identify the ATG initiation codon at positions 15-17. At positions -3 and +4, consensus A and G bases, respectively, are found, in accordance with the optimal sequence for initiation of translation [8]. Human E2 cDNA thus encodes a pre-polypeptide of 482 amino acids, including a 61-residue leader peptide. The leader peptide is the same length as that previously reported for the bovine E2 pre-polypeptide [2].

The 3'-untranslated region (2074 bp) is unusually long compared to the coding region (1446 bp). Sequence analysis of the 3'-RACE products indicates the presence of multiple transcription termination sites, which produce E2 mRNAs of different sizes ranging from 2828 to 3535 bp. Of a total of eight RACE products cloned and sequenced, one terminated at position 2828, one at position 2879, two at position 2941, two at position 3176 and two at position 3535. Two polyadenylation signals of the type AATAAA are present upstream of the termination site at position 2941, which is designated transcription termination site I (TS I). The terminal nucleotide (position 3535) of the longest mRNA is assigned transcription termination site II (TS II). The previously reported sizes of human liver E2 mRNA of 3.5 kb and 2.5 kb [3] agree well with the sizes of the longest (3535 bp) mRNA and the 2941 bp mRNA species, respectively, deduced from the present study.
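The 5'-end assignments described above can be checked against the first line of Fig. 1. The short Python sketch below is an illustration only and was not part of the original analysis. The string `CDNA_5PRIME` is a transcription of bases 1 to 77 from Fig. 1 (an assumption of this sketch, not the EMBL entry itself), and the helper names `segment` and `reverse_complement` are hypothetical. It tests that bases 15 to 17 read ATG, that an A and a G occupy positions -3 and +4 relative to the A of the ATG, and that the antisense primer for bases 59 to 76 is the reverse complement of that stretch of the cDNA.

```python
# Bases 1-77 of the human E2 cDNA, transcribed from the first line of Fig. 1.
# Assumption: this transcription is used for illustration only.
CDNA_5PRIME = (
    "ATTTCCGGGGTAAG"                      # bases 1-14, 5'-untranslated
    "ATGGCTGCAGTCCGTATGCTGAGAACCTGGAGC"   # bases 15-47, codons 1-11
    "AGGAATGCGGGGAAGCTGATTTGTGTTCGC"      # bases 48-77, codons 12-21
)

# Antisense primer for bases 59 to 76, written 5'->3' as given in the text.
ANTISENSE_PRIMER_59_76 = "CGAACACAAATCAGCTTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def segment(seq, start, end):
    """Return bases start..end of seq using 1-based, inclusive numbering."""
    return seq[start - 1:end]


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


start_codon = segment(CDNA_5PRIME, 15, 17)
minus_3 = segment(CDNA_5PRIME, 12, 12)   # -3, counting the A of ATG (base 15) as +1
plus_4 = segment(CDNA_5PRIME, 18, 18)    # +4, the base immediately after the ATG
primer_site = segment(CDNA_5PRIME, 59, 76)

print("start codon (15-17):", start_codon)
print("base at -3:", minus_3, "| base at +4:", plus_4)
print("sense strand, bases 59-76:", primer_site)
print("primer is reverse complement of bases 59-76:",
      reverse_complement(primer_site) == ANTISENSE_PRIMER_59_76)
```

The 1-based, inclusive `segment` helper mirrors the base numbering used in the text, so the positions can be read directly from the paper without converting to zero-based indices.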
The lack of polyadenylation signal(s) immediately upstream of position 3535 may account for the apparent array of termination sites observed among the transcripts of various lengths. Of interest is the presence within the 3'-noncoding region of the human E2 mRNA of three Alu repetitive sequences (Fig. 1, underlined) ranging in size from 202 to 245 bases. Alu repeats are found in the 3'-untranslated regions of other mammalian transcripts, for example, the human low density lipoprotein receptor mRNA with three Alu repetitive sequences [9]. Two distinct families of Alu sequences are reported: the Alu-J and the Alu-S families [10-12]. Analysis of the Alu repeats in human E2 cDNA indicates that Alu-1

[Fig. 2 table. Panel A, "Families of Alu-J and Alu-S sequences": bases of Alu-J, Alu-S, Alu-1, Alu-2 and Alu-3 at diagnostic positions 57, 63, 65, 70, 71, 94, 101, 106, 163, 194, 204, 208, 220, 233 and 275. Panel B, "Subfamilies of Alu-S (Sa, Sb and Sc) sequences": bases of Alu-Sa, Alu-Sb, Alu-Sc, Alu-1 and Alu-3 at diagnostic positions 65, 66, 78, 88, 95, 100, 153, 163, 197, 200 and 219. The individual table entries are not legible in this copy.]

Fig. 2. Classification of the three Alu sequences in human E2 cDNA. Panel A shows the Alu-J and Alu-S families of Alu sequences according to Jurka and Smith [10] and Raisonnier [11], and the three repetitive sequences Alu-1 to Alu-3 present in the human E2 cDNA. The numbers refer to diagnostic positions in Alu-J and Alu-S. The Alu-2 sequence of human E2 cDNA belongs to the Alu-J family. Panel B shows the subfamilies of Alu-S (Sa, Sb and Sc) and Alu-1 and Alu-3 of human E2 cDNA. The latter two sequences are assigned to the Alu-Sa subfamily.

and Alu-3 belong to the Alu-S family, and the Alu-2 sequence conforms to the sequence for Alu-J (Fig. 2 and Refs. 10 and 11). The significance of these repeats in the 3'-untranslated region of this cDNA remains to be elucidated.

This work was supported by Grants DK26758, DK38320 and DK37373 from the National Institutes of Health, Grants 90G-093 and 92R-093 from the American Heart Association, Texas Affiliate, and Grant 1-1149 from the March of Dimes Birth Defects Foundation.

References

1 Pettit, F.H., Yeaman, S.J. and Reed, L.J. (1978) Proc. Natl. Acad. Sci. USA 75, 4881-4885.
2 Lau, K.S., Griffin, T.A., Hu, C.-W.C. and Chuang, D.T. (1988) Biochemistry 27, 1972-1981.
3 Danner, D.J., Litwer, S., Herring, W.J. and Pruckler, J. (1989) J. Biol. Chem. 264, 7742-7746.
4 Nobukuni, Y., Mitsubuchi, H., Endo, F. and Matsuda, I. (1989) Biochem. Biophys. Res. Commun. 161, 1035-1041.
5 Danner, D.J., Armstrong, N., Heffelfinger, S.C., Sewell, E.T., Priest, J.H. and Elsas, L.J. (1985) J. Clin. Invest. 75, 858-860.
6 Gershwin, M.E. and Mackay, I.R. (1991) Gastroenterology 100, 822-833.
7 Frohman, M.A., Dush, M.K. and Martin, G.R. (1988) Proc. Natl. Acad. Sci. USA 85, 8993-9002.
8 Kozak, M. (1987) J. Mol. Biol. 197, 947-950.
9 Lehrman, M.A., Schneider, W.J., Sudhof, T.C., Brown, M.S., Goldstein, J.L. and Russell, D.W. (1985) Science 227, 140-146.
10 Jurka, J. and Smith, T. (1988) Proc. Natl. Acad. Sci. USA 85, 4775-4778.
11 Raisonnier, A. (1991) J. Mol. Evol. 32, 211-219.
12 Britten, R.J., Baron, W.F., Stout, D.B. and Davidson, E.H. (1988) Proc. Natl. Acad. Sci. USA 85, 4770-4774.