Biochimica et Biophysica Acta, 1131 (1992) 83-90 © 1992 Elsevier Science Publishers B.V. All rights reserved 0167-4781/92/$05.00
BBAEXP 92376
Isolation and characterisation of the mouse pyruvate dehydrogenase E1α genes
James Fitzgerald, Wendy M. Hutchison and Hans-Henrik M. Dahl
The Murdoch Institute for Research into Birth Defects, Royal Children's Hospital, Melbourne (Australia)
(Received 16 September 1991) (Revised manuscript 15 January 1992)
Key words: Pyruvate dehydrogenase gene; Nucleic acid sequence; (Mouse)
We have characterized two mouse genes that code for the E1α subunit of pyruvate dehydrogenase (PDH), Pdha-1 and Pdha-2. The coding regions show a high degree of homology with each other and with the human PDH genes, PDHA1 and PDHA2. Conserved regions include mitochondrial import sequences, phosphorylation sites and a putative TPP binding site. The PDH genes have an analogous chromosomal arrangement to PGK genes in that two isoforms code for a functionally and structurally similar product. Pdha-1 codes for a somatic isoform and maps to the X-chromosome. Pdha-2 is located on an autosome, is intronless and is expressed only in spermatogenic cells. Comparison of human and mouse PDH and PGK gene sequences shows that the somatic sequences are more conserved than the testis-specific isoforms, and that the mouse PDH E1α genes have experienced a faster rate of DNA change than their human counterparts.
Introduction
The pyruvate dehydrogenase complex (PDH) is central in aerobic energy metabolism [1]. The seven subunits of this nuclear encoded enzyme complex are transported into the mitochondrion where they catalyse the conversion of pyruvate to acetyl CoA via an oxidative decarboxylation step for entry into the citric acid cycle. The E1 enzyme is a heterotetramer of two α and two β subunits designated PDH E1α and PDH E1β [2]. The main mechanism of regulation of PDH activity is the phosphorylation/dephosphorylation of three serine residues in the E1α subunit [3]. Most patients with PDH deficiency appear to have defects in the E1α subunit [4,5]. However, the clinical presentation of patients with PDH E1α deficiency is highly variable,
The Pdha-2 locus has been designated Pdhal by the Mouse Nomenclature Committee. DNA sequence data from this article have been deposited with the GenBank Data Libraries (Accession No. M76727 (Pdha-1) and M76728 (Pdha-2)). The DNA sequence from the promoter region of the human testis-specific PDH E1α gene has also been deposited with the GenBank Data Libraries under the accession number M86808. Correspondence: H.-H.M. Dahl, The Murdoch Institute, Royal Children's Hospital, Flemington Rd., Parkville, Melbourne, Victoria, Australia 3052.
ranging from lethal lactic acidosis in the neonatal period to mild lactic acidemia with major brain abnormalities [6]. In situ hybridization of a human PDH E1α cDNA clone to human metaphase chromosomes revealed the presence of two loci [7]. PDHA1 was mapped to the X-chromosome (Xp22.1-22.2) and was found to be expressed in somatic tissues [7]. This gene contains 10 introns and spans approx. 17 kb [8]. The second locus, PDHA2, was localised to chromosome 4 (4q22-4q23) with expression limited to the testis [9]. This autosomal, testis-specific isoform completely lacks introns and possesses characteristics of a functional processed gene. Chromosome mapping in mouse has also revealed the presence of two E1α loci; one maps to the X-chromosome and the other to chromosome 19 [10]. The X-chromosome location of the somatic PDH E1α gene in humans, and the dependency of the brain on aerobic glucose oxidation, are important factors when explaining the highly variable clinical presentation of patients with PDH deficiency. Further understanding of the clinical and biological effects of PDH E1α deficiency is hampered by the lack of a suitable animal model and problems with obtaining early human fetal material. In addition, studies on PDH E1α gene regulation and expression in spermatogenic cells require the use of animal tissues. We have therefore initiated a study of PDH E1α expression in the mouse
TABLE I
Sequences of oligonucleotide primers

Primer            Sequence (5'-3')
λgt10-1           AGCAAGTTCAGCCTGGTTAA
λgt10-2           CTTATGAGTATTTCTTCCAG
λgt11-A           TCCTGGAGCCCGTCAGTATG
λgt11-B           ACTGGTAATGGTAGCGACCG
mouse testis D    TGCACATAAGTGGCTCAAGT
mouse testis E    GTTCTTGCCGTACATGTGCA
mouse testis F    TATAAAGATAGATTCTGAG
mouse testis H    GAGAGCTGGCTTTTGGACCA
mouse testis J    GACATAGCCTACTGTGCAGA
mouse testis L    GCCGGCCCTTCCTCCTTCT
and report here the isolation and characterisation of clones coding for the two mouse PDH E1α isoforms.

Materials and Methods
All DNA manipulations were carried out according to Sambrook et al. [11] unless otherwise stated. Oligonucleotide primer sequences are listed in Table I.
DNA probes
Probe PDHA1:lc is a 1464 bp cDNA probe from the human X-chromosome located PDHA1 locus. It contains 78 bp of the 5' untranslated region, the complete coding and 3' untranslated regions and part of the poly(A) tail [9]. ml PDH 10:1 is a 77 bp liver-specific EcoRV-SacI restriction fragment from the mouse liver cDNA clone ml E1α 10 (nucleotides 886-962 in Fig. 1). mt PDH 1.2a is a 214 bp testis-specific PCR fragment (using mouse testis primers J and L) that codes for a sequence in the 5' untranslated region (nucleotides -434 to -221 in Fig. 2). Probes were labelled with a Random Primed DNA Labelling Kit (Boehringer-Mannheim). All manipulations of DNA sequence data were performed using MELBDBSYS (Walter and Eliza Hall Institute, Melbourne, Australia).

Isolation of mouse somatic PDH E1α cDNA (Pdha-1)
Mouse liver E1α ZAP clone 2.1 was isolated from a mouse liver cDNA library (Stratagene, catalogue No. 935302) in λZAP. Approx. 130000 plaques (on Escherichia coli LE 392) were screened using the human PDHA1:lc cDNA probe [9] and the filters washed to a stringency of 0.5 × SSC (at 65°C). A 3 kb cDNA clone was isolated and was subsequently found to contain the entire Pdha-1 coding region plus 100 bp of the 5' untranslated and 1.6 kb of 3' untranslated regions. The fragment was subcloned into the EcoRI site of M13mp19 and sequenced using a Promega T7 sequencing kit. A partial clone (ml E1α 10) was also isolated from a λgt10 library (Clonetech, ML 1017a), amplified in a PCR reaction using λgt10 primers, subcloned into M13mp10 and sequenced.
Isolation of mouse testis-specific PDH E1α cDNA (Pdha-2)
To detect the autosomal E1α isoform, an adult mouse testis λgt10 cDNA library (Clonetech, ML1003a) was plated on E. coli LE 392 cells. Approx. 240000 plaques were probed with PDHA1:lc and washed to a stringency of 0.5 × SSC (at 65°C). Two positive clones were detected and the inserts were isolated by using λgt10 primers to PCR amplify the clones directly from the λ supernatants. The PCR fragments were then gel purified, digested with EcoRI, ligated into M13mp19 and sequenced. Both clones were found to be identical, containing 650 bp of coding region and 350 bp of 3' untranslated region. They were named mt E1α 2.4. A 5' cDNA clone (mt E1α 3) was isolated by PCR amplification from a λgt11 adult mouse testis cDNA library (kindly donated by Dr. E.M. Eddy) by amplifying with λgt11 primer A and mouse testis primer E. Upon sequencing with mouse testis primer H the clone was found to contain 478 bp of coding region and 65 bp of the 5' noncoding region. A λ EMBL genomic library plated on E. coli NM538 cells was screened with a testis cDNA clone, mt E1α 2.4. Of 500000 plaques screened, two positive clones were detected. The autosomal E1α gene was contained on a 3.2 kb XbaI fragment. This XbaI fragment was subcloned into M13mp19 and named mt E1α 2.3. It was found to contain the complete coding region and flanking 3' and 5' untranslated regions.
Isolation and characterisation of Pdha-2 cDNA clones with poly(A) tails
Mouse testis cDNA clones 2.4 and 3 did not contain poly(A) tails. In order to define the 5' ends of PDH E1α mRNAs we characterized the 5' ends of additional Pdha-2 cDNA clones. Screening of the adult mouse testis λgt11 cDNA library (Dr. E.M. Eddy) with PDHA1:lc led to the isolation of an additional clone with a poly(A) tail. Asymmetric PCR was performed directly on this λ cDNA clone using primers λgt11-A and λgt11-B (in excess) according to Saiki et al. [12]. The PCR product was then sequenced with mouse testis primer D following the Sequenase protocol for single strand templates (USB). Another cDNA containing a longer 5' untranslated region was identified using a Rapid Amplification of cDNA ends (RACE) protocol [13]. The PCR product was sequenced according to Bachmann et al. [14] with mouse testis primer F.
Fig. 1. Nucleotide and amino acid sequence of the mouse liver PDH E1α (Pdha-1) cDNA sequence. A putative polyadenylation signal, AAATTAA, is underlined. [Sequence listing not reproduced here; the sequence is deposited under GenBank Accession No. M76727.]

Chromosome assignment
Genomic DNA was isolated from male and female mice according to Sambrook et al. [11] except the proteinase K step was omitted. The DNA was digested with restriction enzyme BglI, electrophoresed on a 0.7% agarose gel, blotted and probed with either ml PDH 10:1, a mouse liver-specific sequence, or mt PDH 1.2a, a mouse testis-specific sequence.

Results
Mouse liver E1α isoform
Sequence analysis of mouse liver E1α cDNA clones 10 and ZAP clone 2.1 revealed that the somatic isoform of E1α has a coding region of 1170 nucleotides which encodes a protein of 390 amino acids (Fig. 1). ml ZAP clone 2.1 is a full length cDNA clone with an approx. 120 bp 5' untranslated region and a 1.6 kb 3' untranslated region terminating with a poly(A) tail. ml E1α clone 10 contains the 3' end of the coding region and a 143 nucleotide 3' untranslated region followed by a poly(A) tail (nucleotides 776-1446, Fig. 1).
Mouse testis-specific E1α isoform
Upon sequencing, the mouse testis E1α cDNA clones 2.4 and 3 were found to be similar, but not identical, to the liver cDNA clones. The Pdha-2 sequence contains an open reading frame of 1173 nucleotides, thus encoding a 391 amino acid protein (Fig. 2). Clone 2.4 spans nucleotides 744-1587 and clone 3 nucleotides 1-542 (Fig. 2). Clone 2.4 ends five nucleotides downstream from the second polyadenylation signal. Neither of these clones contains a poly(A) tail. The λgt11 adult mouse testis library was rescreened and a positive Pdha-2 cDNA clone isolated, which was found to be polyadenylated at nucleotide 1347. The data from cDNA clone 2.4 suggested that a second polyadenylation signal may be used. To investigate this possibility the RACE protocol was employed. A cDNA clone was identified which was found to contain a poly(A) tail 17 nucleotides 3' to the polyadenylation signal at nucleotide 1578 (Fig. 2). Therefore, two types of cDNA clones were identified that differ in the length of 3' untranslated region. The shorter transcript includes a 3' untranslated region of 108 nucleotides with 10 bases separating an AU-rich sequence (AAUAAAUAAA) and the start of polyadenylation.
TABLE II
Amino acid homology for human and mouse PDH E1α genes: human somatic (PDHA1), human testis-specific (PDHA2), mouse somatic (Pdha-1) and mouse testis-specific (Pdha-2)

Comparison      Amino acids compared   Amino acid changes   Amino acid homology (%)   Nucleotide homology (%)
PDHA1/Pdha-1    390                    7                    98.2                      88.6
PDHA1/PDHA2     387                    53                   86.3                      85.5
PDHA1/Pdha-2    390                    88                   77.4                      76.5
PDHA2/Pdha-1    387                    52                   86.5                      81.4
Pdha-1/Pdha-2   390                    91                   76.6                      75.5
PDHA2/Pdha-2    388                    97                   75.0                      77.8
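The amino acid homology column of Table II follows from the two count columns as (compared - changes) / compared. The short Python sketch below recomputes it as a consistency check. The function name and data layout are illustrative assumptions, not part of the original analysis, and small last-digit differences from the published values presumably reflect rounding.

```python
# Counts and published percentages copied from Table II:
# (amino acids compared, amino acid changes, published homology in %).
TABLE_II = {
    "PDHA1/Pdha-1": (390, 7, 98.2),
    "PDHA1/PDHA2": (387, 53, 86.3),
    "PDHA1/Pdha-2": (390, 88, 77.4),
    "PDHA2/Pdha-1": (387, 52, 86.5),
    "Pdha-1/Pdha-2": (390, 91, 76.6),
    "PDHA2/Pdha-2": (388, 97, 75.0),
}


def percent_homology(compared: int, changes: int) -> float:
    """Percentage of compared positions that are unchanged."""
    return 100.0 * (compared - changes) / compared


if __name__ == "__main__":
    for pair, (compared, changes, published) in TABLE_II.items():
        computed = percent_homology(compared, changes)
        print(f"{pair:14s} computed {computed:6.2f}  published {published:5.1f}")
```

The recomputed values agree with the published column to within 0.1 percentage points.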
Two polyadenylation signals are encoded in this sequence. The longer transcript contains 360 bases of 3' untranslated region and 17 bases separate the polyadenylation signal (AAUAAA) and the start of polyadenylation.
The human testis-specific PDH E1α gene is an intronless autosomal gene. To ascertain if an analogous situation exists in the mouse we cloned and sequenced a genomic DNA fragment (mt E1α 2.3) containing the mouse testis-specific gene. The genomic sequence was identical to the cDNA sequence from clones 2.4 and 3. We therefore conclude that Pdha-2 is an intronless gene. Sequence generated in the promoter region (607 nucleotides sequenced from the genomic clone mt E1α 2.3) identified an Sp1 binding site at -81 (Fig. 2).

Fig. 2. Nucleotide and amino acid sequence of mouse testis PDH E1α (Pdha-2) genomic sequence. The promoter region contains an Sp1 binding site, (GT)(GA)GGCG(GT)(GA)(GA)(CT), which is underlined. In the 3' untranslated region two polyadenylation signals, AATAAA, are underlined. Polyadenylation of mRNA starts at bases marked with an asterisk (*). Sequence recognition element CAYUG is underlined with (~~~). A 'U-rich' sequence is underlined with a broken line (- - -). Two direct repeats are marked (###) and (@@@). Sequence in bold represents a promoter sequence that shows homology with a pgk-2 promoter sequence. [Sequence listing not reproduced here; the sequence is deposited under GenBank Accession No. M76728.]

1-50
PDHA1    MRKMLAAVSRVLSGASQKPASRVLVASRNFANDATFEIKKCDLHRLEEGP
Pdha-1   MRKMLAAVSRVLAGSAQKPASRVLVASRNFANDATFEIKKCDLHRLEEGP
PDHA2    MLAAFISRVLRRVAQKSARRVLVASRNSSNDATFEIKKCDLYLLEEGP
Pdha-2   MRKMLTAVLSHVFSGMVQKPALRGLLSSLKFSNDATCDIKKCDLYRLEEGP

51-100
PDHA1    PVTTVLTREDGLKYYRMMQTVRRMELKADQLYKQKIIRGFCHLCDGQEAC
Pdha-1   PVTTVLTREDGLKYYRMMQTVRRMELKADQLYKQKIIRGFCHLCDGQEAC
PDHA2    PVTTVLTRAEGLKYYRMMLTVRRMELKADQLYKQKFIRGFCHLCDGQEAC
Pdha-2   PTSTVLTRAEALKYYRTMQVIRRMELKADQLYKQKFIRGFCHLCDGQEAC

101-150
PDHA1    CVGLEAGINPTDHLITAYRAHGFTFTRGLSVREILAELTGRKGGCAKGKG
Pdha-1   CVGLEAGINPTDHLITAYRAHGFTFTRGLPVRAILAELTGRRGGCAKGKG
PDHA2    CVGLEAGINPSDHVITSYRAHGVCYTRGLSVRSILAELTGRRGGCAKGKG
Pdha-2   CVGLEAGINPTDHVITSYRAHGFCYTRGLSVKSILAELTGRKGGCAKGKG

151-200
PDHA1    GSMHMYAKNFYGGNGIVGAQVPLGAGIALACKYNGKDEVCLTLYGDGAAN
Pdha-1   GSMHMYAKNFYGGNGIVGAQVPLGAGIALACKYNGKDEVCLTLYGDGAAN
PDHA2    GSMHMYTKNFYGGNGIVGAQGPLGAGIALACKYKGNDEICLTLYGDGAAN
Pdha-2   GSMHMYGKNFYGGNGIVGAQVPLGAGVAFACKYLKNGQVCLALYGDGAAN

201-250
PDHA1    QGQIFEAYNMAALWKLPCIFICENNRYGMGTSVERAAASTDYYKRGDFIP
Pdha-1   QGQIFEAYNMAALWKLPCIFICENNRYGMGTSVERAAASTDYYKRGDFIP
PDHA2    QGQIAEAFNMAALWKLPCVFICENNLYGMGTSTERAAASPDYYKRGNFIP
Pdha-2   QGQVFEAYNMSALWKLPCVFICENNLYGMGTSNERSAASTDYHKKGFIIP

251-300
PDHA1    GLRVDGMDILCVREATRFAAAYCRSGKGPILMELQTYRYHGHSMSDPGVS
Pdha-1   GLRVDGMDILCVREATKFAAAYCRSGKGPILMELQTYRYHGHSMSDPGVS
PDHA2    GLKVDGMDVLCVREATKFAANYCRSGKGPILMELQTYRYHGHSMSDPGVS
Pdha-2   GLRVNGMDILCVREATKFAADHCRSGKGPIVMELQTYRYHGHSMSDPGIS

301-350
PDHA1    YRTREEIQEVRSKSDPIMLLKDRMVNSNLASVEELKEIDVEVRKEIEDAA
Pdha-1   YRTREEIQEVRSKSDPIMLLKDRMVNSNLASVEELKEIDVEVRKEIEDAA
PDHA2    YRTREEIQEVRSKRDPIIILQDRMVNSKLATVEELKEIGAEVRKEIDDAA
Pdha-2   YRSREEVHNVRSKSDPIMLLRERIISNNLSNIEELKEIDADVKKEVEDAA

351-390
PDHA1    QFATADPEPPLEELGYHIYSSDPPFEVRGANQWIKFKSVS.
Pdha-1   QFATADPEPPLEELGYHIYSSDPPFEVRGANQWIKFKSVS.
PDHA2    QFATTDPEPHLEELGHHIYSSDSSFEVRGANPWIKFKSVS.
Pdha-2   QFATTDPEPAVEDIANYLYHQDPPFEVRGAHKWLKYKSHS.

Fig. 3. Amino acid sequence alignment of human somatic (PDHA1), human testis-specific (PDHA2), mouse somatic (Pdha-1) and mouse testis-specific (Pdha-2) E1α sequences. P1, P2 and P3 are phosphorylation sites. (**) marks the TPP binding site. Mitochondrial import sequence is in bold type. [Site markers, bold type and the alignment gaps in the first block are not recoverable in this text version; the first block is shown without gaps.]
Comparison of human and mouse PDH E1α genes
Table II summarises the comparison of the human and mouse PDH E1α nucleotide and protein sequences. Coding regions show between 75.5% and 88.6% similarity at the nucleotide level (Table II). However, the nucleotide sequences of noncoding regions of the four E1α sequences show no significant homology except for a 28 bp region in the 5' untranslated region of the human (PDHA2) and mouse (Pdha-2) testis-specific genes (Fig. 2). These regions share a high degree of homology (78%) and are located 260 and 259 nucleotides upstream of the translational start in PDHA2 and Pdha-2, respectively. The primary amino acid sequences of the mouse PDH E1α proteins are similar to those found in the homologous human proteins (Fig. 3). The difference in amino acid composition varies from 2% to 25%. Most of the changes are conservative but it is clear that these changes are not random as more are observed towards the C-terminus. It is also evident that the sequences near the three phosphorylation sites are highly conserved, as is the region near the proposed thiamine pyrophosphate binding site at Trp-214-Lys-215 (Fig. 3). The precursors to the human liver and testis PDH E1α proteins have typical mitochondrial import sequences [9,15]. The N-termini of the two mouse PDH E1α proteins are also typical mitochondrial import sequences, characterized by the ability to form an amphiphilic α-helix, containing several basic, but no acidic amino acids [16]. Although the import sequence for the mouse testis-specific PDH E1α conforms to these constraints, the primary amino acid sequence has diverged from the other E1α human and mouse import sequences. If the mitochondrial import sequence is cleaved at the same site in human and mouse, the Pdha-1 and Pdha-2 genes code for mature proteins of 361 amino acids with molecular weights of 40 135 and 40 085, respectively.
PDHA1, Pdha-1 and Pdha-2 all have two possible initiator methionine residues in the mitochondrial import sequence. It is not known which methionine acts as the translational start point. Sequence analysis of rat and porcine PDH E1α cDNA clones indicates a high degree of homology with the human PDHA1 sequence [17,18]. Rat and porcine mature protein sequences are calculated to be 99% and 98% homologous to mouse Pdha-1 mature protein sequences, respectively, while the nucleotide sequences are 95% and 88% homologous.
Chromosome assignment
In situ hybridization has shown the presence of two PDH E1α loci in mice, one on the X-chromosome and one on chromosome 19 [10]. In order to assign the cloned genes to these loci we performed Southern blot analysis. Fig. 4 shows that a band in the lane containing genomic DNA from a female mouse is approximately twice as intense as the corresponding band in the lane containing male mouse DNA when probed with the mouse liver-specific probe. This indicates that the liver sequence is located on the X-chromosome. Furthermore, bands in lanes containing male and female mouse genomic DNA, when probed with a mouse testis-specific probe, are of approximately equal intensity. This indicates an autosomal location for the testis-specific sequence.

Fig. 4. Chromosome assignment of Pdha-1 and Pdha-2. Male (lanes 1 and 3) and female (lanes 2 and 4) mouse DNA was digested with BglI and blotted. The blots were probed with the Pdha-1 specific probe ml PDH 10:1 (lanes 1 and 2) or the Pdha-2 specific probe mt PDH 1.2a (lanes 3 and 4).

Discussion
We have shown that two isoforms of the mouse PDH E1α subunit exist. One of these, Pdha-1, is expressed in somatic tissues, whereas the other, Pdha-2, appears to be specifically expressed in spermatogenic cells, as it is not detected in kidney, liver, heart or brain [19,20]. We report here the isolation and sequence analysis of cDNA and genomic clones coding for the somatic and testis-specific isoforms of mouse PDH E1α. DNA sequence analysis of cDNA clones has indicated that the somatic form of mouse PDH E1α is coded for by at least two mRNAs. We have isolated two Pdha-1 cDNA clones that differ in length of 3' untranslated region. One clone has a 3' untranslated region of 143 bp and the other approx. 1.6 kb. The reason for the existence of the two different Pdha-1 mRNAs is not known, but was also noted in the equivalent human PDH E1α gene [15]. Two Pdha-2 transcripts were detected by cDNA sequence analysis. These also vary in size of 3' untranslated region. One is 108 nucleotides and the other 360 nucleotides in length. The presence of two mRNAs agrees with RNA blot analysis of Pdha-2 expression in spermatogenic cells [19]. However, the expression pattern and possible function of the two Pdha-2 mRNAs must await further analysis. Each of the Pdha-1 and Pdha-2 cDNA clones characterized contains a polyadenylation signal that immediately precedes a poly(A) tail. We assume, therefore, that differences in transcript length are due to the utilization of alternative polyadenylation signals. The consensus recognition sequence element AAUAAA (polyadenylation signal) is necessary but not sufficient for polyadenylation. A large proportion of transcripts that undergo polyadenylation contain a 'GU'- or 'U'-rich element in addition to the polyadenylation signal. Mutation studies have shown that these elements play a role in efficient polyadenylation in many genes when present less than 50 nucleotides downstream from the AAUAAA (for review refer to Ref. 21). In Pdha-2, a 'U'-rich element is observed 25-48 nucleotides downstream from the second polyadenylation signal (78% of the nucleotides are uridines). This sequence therefore may be involved in polyadenylation in concert with the second, distal AAUAAA in Pdha-2. In a survey of 61 vertebrate sequences, Berget [22] observed another sequence element, CAYUG, which often appears between the polyadenylation signal and the start of polyadenylation. In Pdha-2 the sequence CATTG is present downstream from AAUAAA and upstream from the start of polyadenylation (Fig. 2). This consensus recognition sequence is complementary to regions within the small nuclear RNA U4, suggesting that U4 small nuclear ribonucleoproteins may mediate polyadenylation in a manner similar to the role of U1 snRNPs in splicing [22].
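The search described above, for an AAUAAA signal followed within 50 nucleotides by a U-rich stretch, can be sketched in a few lines of Python. This is an illustrative sketch only and not the procedure used in this study: the function names, the 20-nucleotide window and the 70% threshold are assumptions, and the second test sequence is a made-up toy example. Sequences are written as DNA, so U appears as T.

```python
def find_signals(seq: str, signal: str = "AATAAA") -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) signal."""
    seq = seq.upper()
    hits = []
    start = seq.find(signal)
    while start != -1:
        hits.append(start)
        start = seq.find(signal, start + 1)
    return hits


def u_rich_downstream(seq, signal_start, signal_len=6,
                      max_dist=50, window=20, threshold=0.7):
    """Find the first T-rich window lying within max_dist nt after a signal.

    Returns (offset from end of signal, T fraction) or None if no window of
    the given size reaches the threshold. Window size and threshold are
    arbitrary illustrative choices.
    """
    seq = seq.upper()
    first = signal_start + signal_len
    last = min(first + max_dist, len(seq)) - window
    for pos in range(first, last + 1):
        fraction = seq[pos:pos + window].count("T") / window
        if fraction >= threshold:
            return pos - first, fraction
    return None


if __name__ == "__main__":
    # The AU-rich element of the shorter Pdha-2 transcript contains two
    # overlapping copies of the signal.
    print(find_signals("GGAATAAATAAAACG"))
    # Hypothetical toy sequence: one signal followed by a T-rich stretch.
    toy = "GGCC" + "AATAAA" + "CGCGACGTCAGC" + "TTTTGTTTTCTTTTATTTTT" + "GGCC"
    for position in find_signals(toy):
        print(position, u_rich_downstream(toy, position))
```

A sliding window is used here only because it is simple; the position and composition of real downstream elements vary between genes (Ref. 21).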
We have isolated a 3.2 kb genomic DNA fragment and confirmed by DNA sequence analysis that it contains the testis-specific Pdha-2 gene. This revealed that the Pdha-2 gene is an intronless gene, similar to the human PDHA2 and PGK-2 genes. Furthermore, DNA sequencing has enabled us to study the regions flanking the coding sequences in some detail. It has been suggested that these testis-specific genes are functional retroposons, based on the presence of direct repeats flanking the reverse transcriptase-processed mRNA [23]. Two possible direct repeat structures have been detected in the mouse testis-specific gene of PDH E1α (Fig. 2). One sequence starts at nucleotide 1351 (5'-GTCTTATAAACCTTTAT-3') with the similar sequence at nucleotide -335 (5'-CTCCTACAAACCTCAAT-3'). The other repeats are at nucleotides 1600 (5'-GTAGATTTTTGATTT-3') and -208 (5'-GGACTTTTTTGAGTT-3'). The most 3' repeat in each set immediately follows the possible remnants of a poly(A) tail, the location of which is in
agreement with the cDNA data. We think it unlikely that such repeats are conserved during evolution if they have no function. Furthermore, creation of the intronless processed gene must have happened in a common ancestor. If they have been conserved due to a strong selection pressure, one might also expect homology between the repeats found in the human and mouse testis-specific PDH E1α genes. However, no obvious similarities were detected. The relevance and possible function of these putative repeats are therefore not known.

We have compared the promoter regions of the testis-specific mouse and human PDH E1α genes with that of PGK-2 and with their somatic isoforms. The promoter sequences of the human PDHA1 and PGK-1 genes are highly G + C rich, as are those of many other housekeeping genes [24]. However, the G + C contents of the testis-specific PDHA2, Pdha-2 and PGK-2 promoters are significantly lower, which might suggest that differences in G + C content affect the expression of these genes [24]. The testis-specific promoters have regions that are similar to the Sp1 consensus sequence [25]. In the mouse Pdha-2 gene this sequence is found at nucleotide -81 and has the sequence 5'-GTGGGCGTGGC-3'. Other similarities between the testis-specific promoters include the Pdha-2 sequence 5'-AGTTGTACAGGGAAAAGACACTCAGTT-3' at nucleotide -197. A similar sequence is found in the human PDHA2 gene at nucleotide 260 (5'-GTTTAACAGGGAAAAGGGACTCAGAT-3') and in the human PGK-2 gene at nucleotide -811 (5'-AGTTGTACAGGTAGGAAAGCA-3'). Analysis of the pgk-2 promoter in transgenic mice [26] showed that deletion of the region -1400 to -515 reduced expression by approx. 50%. However, the authors conclude that no cis-acting elements which affect expression level and tissue specificity are present in this region.

TABLE III
Percentage corrected divergence of PDH E1α and PGK gene sequences
Calculations according to Perler et al. [27]. h-1 is the human somatic isoform, h-2 the human testis isoform, m-1 the mouse somatic isoform and m-2 the mouse testis isoform. PGK, phosphoglycerate kinase. Replacement changes are nucleotide substitutions that alter a codon so that an amino acid is replaced. Silent changes are nucleotide substitutions that change a codon to a synonymous codon and therefore do not change the amino acid.

                          h-1/m-1  h-1/h-2  m-1/h-2  m-2/h-2  h-1/m-2  m-2/m-1
% Replacement changes
  PDH                        1.08     7.53     6.98    15.43    14.25    14.06
  PGK                        1.25     6.44     7.54     7.75    10.25    10.54
% Silent changes
  PDH                        76.4     79.2     96.1     83.4    102.8    120.8
  PGK                        32.2     62.1     72.5     65.2     96.4    103.8

Specific probes for the testis and liver isoforms of PDH were generated and used to probe Southern blot filters with male and female mouse DNA. These results, combined with the in situ hybridization data [10], show that the Pdha-1 gene is located on the X chromosome, region F3-F4, and the Pdha-2 gene on chromosome 19, band B. This is analogous to the situation in humans, where PDHA1 is on the X chromosome and PDHA2 is on an autosome (chromosome 4) [7-9].

The molecular evolution of PDH E1α genes can be examined by calculating the relative amounts of replacement and silent changes. Replacement changes are nucleotide substitutions that result in an amino acid substitution. Silent changes are nucleotide substitutions that result in a change to a synonymous codon. The resulting figures are corrected for multiple substitutions within a codon [27]. Table III shows that mouse and human somatic nucleotide sequences differ by about 1% at replacement sites, whereas mouse and human testis sequences differ by about 15%. Since the same time has presumably elapsed for both pairs of sequences to diverge, the somatic isoforms are more conserved than the testis-specific isoforms. This indicates that the somatic sequences can tolerate relatively few amino acid changes, presumably because of stringent selective pressures. A similar pattern was observed with the human and mouse PGK genes (sequences from Refs. 28 and 29) (Table III). Why testis sequences are much more divergent is not understood. It is well established that rodents and humans originated from a common ancestor. The relative rate of DNA change for PDH E1α in these two species can be determined by comparing sequence changes between E1α isoforms within each species.
A comparison of human liver and testis sequences gives a corrected silent-site divergence of 79%, whereas the same comparison in the mouse gives 121% (Table III); values above 100% arise because the figures are corrected for multiple substitutions [27]. Silent changes are largely free of selection and reflect the rate of DNA change. Therefore, the rate of DNA change is higher for the mouse PDH E1α genes. This is also true for human and mouse PGK genes. Britten [30] has examined selectively neutral DNA sequence differences to determine the rate of DNA change between taxonomic groups. He observes a faster rate of change in rodent sequences relative to higher primate sequences, and suggests that variation and selection of biochemical mechanisms such as DNA repair and replication probably account for this difference in rate of DNA sequence change.

The expression pattern of PDH E1α in mice appears to be similar to that in humans. The isolation and characterisation of the mouse PDH E1α genes will enable us to study the expression and regulation of this subunit in detail, especially in spermatogenic cells and in the fetal brain. The latter is of importance for understanding the clinical effects of PDH E1α deficiency and for developing a mouse model for this disorder.
Acknowledgements
We thank D. Kirby and H. Vogel for synthesising the oligonucleotides and Dr. R. Iannello for stimulating discussions and help in preparing this manuscript. This project is partly supported by a block grant from the National Health and Medical Research Council of Australia.
References
1 Reed, L.J. (1974) Acc. Chem. Res. 7, 40-46.
2 Barrera, C.R., Namihira, G., Hamilton, L., Munk, P., Eley, M.H., Linn, T.C. and Reed, L.J. (1972) Arch. Biochem. Biophys. 148, 343-358.
3 Linn, T.C., Pettit, F.H. and Reed, L.J. (1969) Proc. Natl. Acad. Sci. USA 62, 234-241.
4 Brown, G.K., Scholem, R.D., Hunt, S.M., Harrison, J.R. and Pollard, A.C. (1987) J. Inher. Metab. Dis. 10, 359-366.
5 McKay, N., Petrova-Benedict, R., Thorne, J., Bergen, B., Wilson, W. and Robinson, R. (1986) Eur. J. Pediatr. 144, 445-450.
6 Brown, G.K., Haan, E.A., Kirby, D.M., Scholem, R.D., Wraith, J.E., Rogers, J.G. and Danks, D.M. (1988) Eur. J. Pediatr. 147, 10-14.
7 Brown, R.M., Dahl, H.-H.M. and Brown, G.K. (1989) Genomics 4, 174-181.
8 Maragos, C., Hutchison, W.M., Hayasaka, K., Brown, G.K. and Dahl, H.-H.M. (1989) J. Biol. Chem. 264(21), 12294-12298.
9 Dahl, H.-H.M., Brown, R.M., Hutchison, W.M. and Brown, G.K. (1990) Genomics 8, 225-232.
10 Brown, R.M., Dahl, H.-H.M. and Brown, G.K. (1990) Somatic Cell. Mol. Genet. 16, 487-492.
11 Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular cloning: A laboratory manual, 2nd Edn., Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
12 Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B. and Erlich, H.A. (1988) Science 239, 487-491.
13 Frohman, M.A., Dush, M.K. and Martin, G.R. (1988) Proc. Natl. Acad. Sci. USA 85, 8998-9002.
14 Bachmann, B., Luke, W. and Hunsmann, G. (1990) Nucleic Acids Res. 18, 1309.
15 Dahl, H.-H.M., Hunt, S.M., Hutchison, W.M., Maragos, C. and Brown, G.K. (1987) J. Biol. Chem. 262, 7398-7403.
16 Von Heijne, G. (1986) EMBO J. 5(6), 1335-1342.
17 Matuda, S., Nakano, K., Ohta, S., Saheki, T., Kawanishi, Y. and Miyata, T. (1991) Biochim. Biophys. Acta 1089, 1-7.
18 Sermon, K., De Meirleir, L., Elpers, I., Lissens, W. and Liebaers, I. (1990) Nucleic Acids Res. 18, 4925.
19 Iannello, R.C. and Dahl, H.-H.M. (1992) Biol. Reprod., in press.
20 Takakubo, F. and Dahl, H.-H.M. (1992) Exp. Cell Res. 199, 39-49.
21 Manley, J.L. (1988) Biochim. Biophys. Acta 950, 1-12.
22 Berget, S.M. (1984) Nature 309, 179-182.
23 McCarrey, J.R. and Thomas, K. (1987) Nature 326, 501-505.
24 McCarrey, J.R. (1987) Gene 61, 291-298.
25 Briggs, M.R., Kadonaga, J.T., Bell, S.P. and Tjian, R. (1986) Science 234, 47-52.
26 Robinson, M.O., McCarrey, J.R. and Simon, M.I. (1989) Proc. Natl. Acad. Sci. USA 86, 8437-8441.
27 Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R. and Dodgson, J. (1980) Cell 20, 555-566.
28 Boer, P.H., Adra, C.N., Lau, Y.-F. and McBurney, M.W. (1987) Mol. Cell. Biol. 7, 3107-3112.
29 Mori, N., Singer, S.J., Lee, C. and Riggs, A.D. (1986) Gene 45, 275-280.
30 Britten, R.J. (1986) Science 231, 1393-1398.