Vol. 176, No. 2, 1991
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS Pages 711-716
April 30, 1991
THE PRIMARY STRUCTURE OF HUMAN H-PROTEIN OF THE GLYCINE CLEAVAGE SYSTEM DEDUCED BY cDNA CLONING
Kazuko Fujiwara, Kazuko Okamura-Ikeda, Kiyoshi Hayasaka*, and Yutaro Motokawa
Institute for Enzyme Research, the University of Tokushima, Tokushima 770, Japan
Received March 14, 1991
SUMMARY: A full-length cDNA encoding the human H-protein of the glycine cleavage system has been isolated from a λgt11 human fetal liver cDNA library. The cDNA insert was 1091 base pairs with an open reading frame of 519 base pairs which encoded a 125-amino acid mature human H-protein with a 48-amino acid presequence. Human H-protein is 97%, 86%, and 46% identical to the bovine, chicken, and pea H-protein, respectively. © 1991 Academic Press, Inc.
The glycine cleavage system is a multienzyme system located in mitochondria, and is responsible for the degradation of glycine. The system is composed of three enzymes, designated as P-protein, T-protein, and L-protein, and a carrier protein, H-protein (1-6). H-protein is a heat-stable small protein with a covalently attached lipoic acid prosthetic group which interacts with the three enzymes during catalysis. The chemically determined amino acid sequence of chicken H-protein revealed that it consists of 125 amino acids with a lipoic acid prosthetic group at lysine 59 (7). Recently, we cloned and sequenced a cDNA encoding bovine H-protein (8).

The activity of the glycine cleavage system is deficient in patients with non-ketotic hyperglycinemia. The etiology of this disease is heterogeneous as the system is encoded by at least four structural genes. Patients have been reported with a defect in P-protein, H-protein, or T-protein (9, 10), but the
*Permanent address: Department of Pediatrics, Akita University School of Medicine, Honmichi 1-chome, Akita 010, Japan.
The abbreviations used are: SDS, sodium dodecyl sulfate; bp, base pairs.
0006-291X/91 $1.50 Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
molecular mechanisms underlying the deficiency are mostly unknown. To understand the molecular deficiency of the enzyme in non-ketotic hyperglycinemia, it is desirable to have cDNAs of all the components of the human glycine cleavage system. In the present study, we report the cloning of the human liver H-protein and its deduced primary structure.

MATERIALS AND METHODS
Materials
Restriction nucleases and DNA-modifying enzymes were purchased from Toyobo (Tokyo, Japan), Nippon Gene (Tokyo), and Takara Shuzo (Kyoto, Japan). Radiolabeled nucleotides were obtained from New England Nuclear (Boston, MA). Oligonucleotide primers were synthesized on an Applied Biosystems 381A DNA synthesizer.

Screening of cDNA Library
A human fetal liver cDNA library in λgt11 (Clontech) was screened with a 32P-labeled EcoRI fragment encoding the mature bovine liver H-protein (BH5A (8)). The bovine cDNA probe was hybridized to the human cDNA library at 65 °C in 6 x SSC (1 x SSC = 150 mM NaCl, 15 mM sodium citrate, pH 7.0), 5 x Denhardt's solution, 0.5% SDS, 10% dextran sulfate, 160 µg/ml salmon testis DNA, and 7 x 10^5 cpm/ml of probe. The filters were washed in 1 x SSC and 0.1% SDS at 65 °C for 30 min before autoradiography. Twenty-five putative positive clones were isolated from 6 x 10^5 plaques. They were classified into four groups according to their cDNA sizes and restriction patterns and analyzed.

DNA Sequencing
EcoRI-excised DNA inserts or KpnI-SacI-excised DNAs from λgt11 DNA containing the insert were subcloned into a pGEM-3Z cloning vector (Promega) and sequenced by the dideoxy chain termination procedure (11) using the GemSeq Klenow system (Promega).
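The salt concentrations implied by the SSC definition above can be checked with a short calculation. This is only an illustrative sketch: the names `SSC_1X_MM` and `ssc` are hypothetical helpers introduced here, and the only inputs are the 1 x SSC composition and the 6 x and 1 x strengths stated in the text.

```python
# Composition of 1 x SSC as defined in the Methods (concentrations in mM).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc(fold):
    """Return the component concentrations (mM) of an n-fold SSC solution."""
    return {name: fold * conc for name, conc in SSC_1X_MM.items()}


hybridization = ssc(6)  # 6 x SSC, used during hybridization at 65 °C
wash = ssc(1)           # 1 x SSC, used for the 30-min wash at 65 °C

for label, buffer in (("hybridization", hybridization), ("wash", wash)):
    parts = ", ".join(f"{conc:g} mM {name}" for name, conc in buffer.items())
    print(f"{label}: {parts}")
```

Under these assumptions the hybridization step is carried out at six times the salt concentration of the wash, so the wash is the more stringent of the two steps.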
RESULTS
Screening bovine HHI,
probe
that fetal
the
glycine
liver
liver
strands
shown
bp
of
site
complete was
chosen
is
higher
(12).
The to t h e
in Fig.
and
2.
sequencing
of
to the
in
the
strategy
outlined
in Fig.
the
deduced
cDNA
at the
3' end,
There
is
25-543)
the
primary
consists
an
I.
H-protein
sequence
bp,
the
internal
a
sequence (13)
and
including
of of
As a m i n o
amino
519 173
acids
determined the
5
EcoRI
frame
protein
sequence.
in
on b o t h
1091
encoding
of
than
The deoxy-
acid
reading
as
activity
determined
amino
with
in the
of
the
H-protein.
liver
open
NH2-terminal
human
the
fetal
was
the f i r s t A T G
identical
because
with
designated
for
sequence
(nucleotides
following
library
clone,
primary
The
tail
cDNA
region
system
786.
is
coding
library
nucleotide
residues
liver
of a c D N A
cDNA
sequence
nucleotides
direct
the
fetal
isolation
the p o l y ( A )
at
49-60
human
the
DISCUSSION
cleavage
according
nucleotide are
the
to
harbored
The
adult
of
led
AND
by acid
Fig. 1. Restriction endonuclease map and strategy for nucleotide sequence analysis of human H-protein cDNA. The open box represents the coding region, and the solid lines represent the untranslated regions. The horizontal arrows indicate the direction and extent of sequence analysis. Closed circles indicate the use of synthetic primers. The scale at the bottom is in nucleotide base pairs.
sequence bovine that
including
H-protein
this
the
125 a m i n o
H-protein
acids,
with
pair
of
the
H-protein and
the
561
The
processing of
isolated
a portion
bovine
coding
cDNA
region
Moreover,
alter
This in
of
3'
13
(8)
the
and
48 of
of
the
13,812
contains
the
bp A
present
for
AATAAA
is
normally
was
cDNA
latter
suggests
543
sequences.
AATAAA
signal
Da.
bovine
sequence
apparently (14).
that
we
used
The
short
have
only
region. H-protein
showed
in the
region that
leading
to
sequence homology
encoding
encodes
the
the
silent
39%
mature
untranslated
with
regions
different
that
90%
of
in the
protein.
protein,
at
mutations,
in t h e
with is
mature
predominantly
agreement
among
the
of the p r o t e i n s .
is o n l y
in g e n e r a l
cDNA
that
occurs
structures
5'
and
region
region
divergence
of
HHI
of
the
of
is a p o l y p e p t i d e mass
signal case
believe
a presequence
frame,
the
we
the
NH2-terminus
5'-untranslated
between
bp
sequences
codons
is
in
with
(8),
the
protein
polyadenylation
of this
of h o m o l o g y
finding
that
and
95%
the p r i m a r y
sequence
is
clone
and
in
the degree
as
of the h u m a n
within
divergence position
1065
identity
cDNA
molecular
of
distance
its
serine-49,
reading
bp
5.'-untranslated
Comparison the
24
90%
Since
polyadenylation
tail
stretch
open
and
and
(8).
poly(A)
for R N A
is
a calculated
consensus
at p o s i t i o n s
from
The m a t u r e
to the
3'-untranslated
has
H-protein.
is p r e d i c t e d .
In a d d i t i o n of
deduced
is h u m a n
human
acids
presequence
sequence
protein
mature
amino
the
the
the
third
which
do
not
On the o t h e r hand,
3' u n t r a n s l a t e d other have
species
region.
eucaryotic higher than
genes
rates
the
of
coding
regions. The distribution H-protein
examined
by
of h y d r o p h o b i c the
method
of
residues Kyte
and
in the m a t u r e Doolittle
human
(15)
and
   1  GGG CGG GCC CGC ACC CCT GCG AAC ATG GCG CTG CGA GTG GTG CGG AGC    48
                                      Met Ala Leu Arg Val Val Arg Ser     8

  49  GTG CGG GCC CTG CTC TGC ACC CTG CGC GCG GTC CCG TTA CCC GCC GCG    96
   9  Val Arg Ala Leu Leu Cys Thr Leu Arg Ala Val Pro Leu Pro Ala Ala    24

  97  CCC TGC CCG CCG AGG CCC TGG CAG CTG GGG GTG GGC GCC GTC CGT ACG   144
  25  Pro Cys Pro Pro Arg Pro Trp Gln Leu Gly Val Gly Ala Val Arg Thr    40

 145  CTG CGC ACT GGA CCC GCT CTG CTC TCG GTG CGT AAA TTC ACA GAG AAA   192
  41  Leu Arg Thr Gly Pro Ala Leu Leu Ser Val Arg Lys Phe Thr Glu Lys    56
                                     ↑

 193  CAC GAA TGG GTA ACA ACA GAA AAT GGC ATT GGA ACA GTG GGA ATC AGC   240
  57  His Glu Trp Val Thr Thr Glu Asn Gly Ile Gly Thr Val Gly Ile Ser    72

 241  AAT TTT GCA CAG GAA GCG TTG GGA GAT GTT GTT TAT TGT AGT CTC CCT   288
  73  Asn Phe Ala Gln Glu Ala Leu Gly Asp Val Val Tyr Cys Ser Leu Pro    88

 289  GAA GTT GGG ACA AAA TTG AAC AAA CAA GAT GAG TTT GGT GCT TTG GAA   336
  89  Glu Val Gly Thr Lys Leu Asn Lys Gln Asp Glu Phe Gly Ala Leu Glu   104

 337  AGT GTG AAA GCT GCT AGT GAA CTC TAT TCT CCT TTA TCA GGA GAA GTA   384
 105  Ser Val Lys Ala Ala Ser Glu Leu Tyr Ser Pro Leu Ser Gly Glu Val   120
               *

 385  ACT GAA ATT AAT GAA GCT CTT GCA GAA AAT CCA GGA CTT GTA AAC AAA   432
 121  Thr Glu Ile Asn Glu Ala Leu Ala Glu Asn Pro Gly Leu Val Asn Lys   136

 433  TCT TGT TAT GAA GAT GGT TGG CTG ATC AAG ATG ACA CTG AGT AAC CCT   480
 137  Ser Cys Tyr Glu Asp Gly Trp Leu Ile Lys Met Thr Leu Ser Asn Pro   152

 481  TCA GAA CTA GAT GAA CTT ATG AGT GAA GAA GCA TAT GAG AAA TAC ATA   528
 153  Ser Glu Leu Asp Glu Leu Met Ser Glu Glu Ala Tyr Glu Lys Tyr Ile   168

 529  AAA TCT ATT GAG GAG TGA AAA TGG AAC TCC TAA ATA AAC TAG TAT GAA   576
 169  Lys Ser Ile Glu Glu                                               173
577 625 673 721 769 817 865 913 961 1009 1057
ATA TTA TTA AGA CTA GTT TAT ATA GTT ATA TGC
ACG CAA GAA TAG ACA CTG TAA ATA GGC TCT ATA AAA ATA ATA CTT AAT GCT TGT CAC TGC AGC AAA
GCC AGC AGA GTT GTC TTA AAT AAA CTT TTA GTA I-[A CCG ATG CTA ATG AAA GAA AAT GCC CTT TAA TAT GCG TCT TTT TCA CAA AGT GTT CAG AAT TCA TGA AAT ATT ACA TAA TTC AAA GAT AAC TTG TAA CTT GCA TGT ATC CAT GAT CTT TCC ATT GGA AAT AAC ACA GTG TCA GAT GAG GAA CAC ATT TGC TGG TGC TAT TTT TAT ATA ATA AAA TAC TTC TTC GTT
TAG TGG TGG ATA GAA GAC 624 GGG AAA AAA AAA CTA CTG 672 TAA CTT TCT AAT GAT TAT 720 TAT CCT ATG ATT TTT AGA 768 TAT CCA TGG TAA AAA CTA 816 ATT GTT ATT CTT AAG CCT 864 ACC TGG ATT TGG GAT GAA 912 TGG AAG TGA AGA GGT TTT 960 CAC TAT CTT AAT TTT GCG 1008 ACA GTG AAG CAA CAG CTT 1056 AAA AA
Fig. 2. Nucleotide sequence of human H-protein cDNA. The complete sequence of clone HHI and its translation into the human H-protein are presented in the 5' to 3' direction. The asterisk indicates the residue involved in lipoic acid attachment. The arrow indicates the site where the presequence is predicted to be cleaved. The polyadenylation signals are underlined.
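The reading frame shown in Fig. 2 can be checked by translating the 5' part of the sequence. The following is a minimal sketch, not the authors' procedure: it assumes the standard genetic code, uses the first 576 nucleotides as transcribed from Fig. 2, and takes the 48-residue presequence length from the Summary. The names `CDNA_5PRIME`, `CODON_TABLE`, and `translate_from_first_atg` are hypothetical helpers introduced for illustration.

```python
# First 576 nucleotides of the cDNA, transcribed from Fig. 2
# (whitespace is cosmetic and removed by split/join).
CDNA_5PRIME = "".join("""
GGG CGG GCC CGC ACC CCT GCG AAC ATG GCG CTG CGA GTG GTG CGG AGC
GTG CGG GCC CTG CTC TGC ACC CTG CGC GCG GTC CCG TTA CCC GCC GCG
CCC TGC CCG CCG AGG CCC TGG CAG CTG GGG GTG GGC GCC GTC CGT ACG
CTG CGC ACT GGA CCC GCT CTG CTC TCG GTG CGT AAA TTC ACA GAG AAA
CAC GAA TGG GTA ACA ACA GAA AAT GGC ATT GGA ACA GTG GGA ATC AGC
AAT TTT GCA CAG GAA GCG TTG GGA GAT GTT GTT TAT TGT AGT CTC CCT
GAA GTT GGG ACA AAA TTG AAC AAA CAA GAT GAG TTT GGT GCT TTG GAA
AGT GTG AAA GCT GCT AGT GAA CTC TAT TCT CCT TTA TCA GGA GAA GTA
ACT GAA ATT AAT GAA GCT CTT GCA GAA AAT CCA GGA CTT GTA AAC AAA
TCT TGT TAT GAA GAT GGT TGG CTG ATC AAG ATG ACA CTG AGT AAC CCT
TCA GAA CTA GAT GAA CTT ATG AGT GAA GAA GCA TAT GAG AAA TAC ATA
AAA TCT ATT GAG GAG TGA AAA TGG AAC TCC TAA ATA AAC TAG TAT GAA
""".split())

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(seq):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns (0-based start index, 0-based index of the stop codon, protein).
    """
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG found")
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            return start, pos, "".join(protein)
        protein.append(residue)
    raise ValueError("no in-frame stop codon found")


start, stop, precursor = translate_from_first_atg(CDNA_5PRIME)
PRESEQUENCE_LENGTH = 48  # taken from the Summary, not derived here
mature = precursor[PRESEQUENCE_LENGTH:]

# In 1-based numbering the coding nucleotides run from start + 1 to stop.
print("open reading frame:", start + 1, "-", stop, f"({stop - start} bp)")
print("precursor length:", len(precursor))
print("mature length:", len(mature))
print("mature residue 1:", mature[0], "| mature residue 59:", mature[58])
```

Note that the cleavage site itself is not computed; the sketch only confirms that a 48-residue presequence followed by a 125-residue mature chain accounts for the whole open reading frame.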
the
secondary
Fasman
(16)
chicken
structure
are
H-protein
estimated
to
pWbsthetic
group
predicted
essentially (7).
be
38%
The and
is a g a i n
The
predicted
with
the
reported
(17,
18)
mature
the
found
mature sequences
H-protein.
contents
27%,
by
same
the
as of
method
those s-helix
respectively. to r e s i d e
human
bovine
The
human
(8),
and
for
the
and
B-sheet
were
lipoic
acid
sequence chicken
sequence
Chou
The
in a h e l i c a l
H-protein
for
of
reported
has
region.
was (7), 97%
compared and
pea
identity
human
-40 -30 -20 -10 -1 MALRVVRSVRALLCTLRAVPLPAAPCPPRPWQLGVGAVRTLRTGPALL
bovine
~AL~A~AAVGGL~AISAbSAS~LS~5GGLRA6A~EL~6PALL
chicken
::::
:
:
:
MALRMWASSTANALKLSSSS. . . . . . . . . .
pea 1
10
20
RLHLSPTFSISRCFSNVL +
30
40
50
human
SVRKFTEKHEWVTTENGIGTVGISNFAQEALGDVVYCSLPEVGTKLNKQD
bovine
SVRKFTEKHEWVTTENGVGTVGISNFAQEALGDVVYCSLPEVGTKLNLQE : :::: :::: ::::::::::::::::::::::::::::::::::: :
chicken
SARKFTDKHEWISVENGIGTVGISNFAQEALGDVVYCSLPEIGTKLNKDD
pea
DGLKYAPSHEWVKHEGSVATIGITDHAQDHLGEVVFVELPEPGVSVTKGK
human
60 70 80 90 100 EFGALESVKAASELYSPLSGEVTEINEALAENPGLVNKSCYEDGWLIKMT :::::::::::::::::::::::::: :::::::::::::::::::::::
bovine
EFGALESVKAASELYSPLSGEVTEINKALAENPGLVNKSCYEDGWLIKMT
chicken
EFGALESVKAASELYSPLTGEVTDINAALADNPGLVNKSCYQDGWLIKMT
pea
GFGAVESVKATSDVNSPISGEVIEVNTGLTGKPGLINSSPYEDGWMIKIK
human
110 120 LSNPSELDELMSEEAYEKYIKSIEE
bovine
FSNPSELDELMSEEAYEKYIKSIEE
chicken
VEKPAELDELMSEDAYEKYIKSIED
pea
PTSPDELESLLGAKEYTKFCEEEDAAH
::::::::::::::::: :::::::::::::::::::::::::::::::
:::::::::::::::::::::::: :
:::::::::::::::::::
Fig. 3. Comparison of amino acid sequences of human, bovine (Ref. 8), chicken (Ref. 7), and pea (Refs. 17, 18) H-proteins. Amino acid residues are numbered beginning with the NH2-terminal serine of the mature human H-protein. The lysine residue involved in lipoic acid attachment is marked by an asterisk. Amino acid residues identical with human H-protein are marked with double dots. The arrow indicates the NH2-terminus of pea H-protein.
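The identity figures quoted for the mature proteins can be reproduced directly from an alignment such as Fig. 3. The sketch below is illustrative only and covers just the human and chicken sequences, which align without gaps over all 125 residues as transcribed from the figure; the pea sequence needs gap handling that is not attempted here. The names `HUMAN`, `CHICKEN`, and `percent_identity` are hypothetical helpers introduced for this example.

```python
# Mature H-protein sequences transcribed from Fig. 3 (residues 1-125).
HUMAN = (
    "SVRKFTEKHEWVTTENGIGTVGISNFAQEALGDVVYCSLPEVGTKLNKQD"
    "EFGALESVKAASELYSPLSGEVTEINEALAENPGLVNKSCYEDGWLIKMT"
    "LSNPSELDELMSEEAYEKYIKSIEE"
)
CHICKEN = (
    "SARKFTDKHEWISVENGIGTVGISNFAQEALGDVVYCSLPEIGTKLNKDD"
    "EFGALESVKAASELYSPLTGEVTDINAALADNPGLVNKSCYQDGWLIKMT"
    "VEKPAELDELMSEDAYEKYIKSIED"
)


def percent_identity(a, b):
    """Percentage of identical positions between two gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


identity = percent_identity(HUMAN, CHICKEN)
print(f"human vs chicken: {identity:.1f}% identical over {len(HUMAN)} residues")

# Residue 59 (index 58) is the lysine that carries the lipoic acid.
print("residue 59:", HUMAN[58], CHICKEN[58])
```

With these transcribed sequences, 107 of 125 positions match, which rounds to the 86% identity reported for the chicken protein.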
with
the
sequence, (Fig.
bovine while
3).
revealed
sequence 46%
attachment
the
site
similarity.
The
acid
to
of
the
protein
intermediate
during
the
of
human
presequence. of
the
both
acid
seems
presequence
the
the
conserved site
Sequences prepeptides
amino
acid
chicken
the
pea
sequence
and
the
is
NH 2- a n d well
pea
H-protein
including
significant sequence for
the
degradation.
H-protein
were
for
the
portion
to be e s s e n t i a l and/or
with
with
H-protein
middle exhibits
glycine
of
identity
observed
animal
in
lipoic
86%
was
the
region
of
lipoate-attachment lipoic
identity
Alignment that
and
68.8%
sequence
around
the
binding
transfer Identity with
the
COOH-terminal
conserved.
the
the of
of
the
of
the
bovine portions
Although
the
presequence bovine
of
the
and human
pea
H-protein
H-protein,
shorter
seven out of
half
the NH2-terminal
sequence of human and bovine
result
may
indicate
that
the pea
the
is important
presequence
conserved
than
sixteen
the NH2-terminal
these prepeptides
of
is
is
those amino
of
the
acids
of
homologous
with
presequences.
The
NH2-terminal
for the transport
sequence
of
of H-protein
to
the proper site in the mitochondria.
REFERENCES
1. Fujiwara, K., Okamura, K., and Motokawa, Y. (1979) Arch. Biochem. Biophys. 197, 454-462.
2. Hiraga, K., and Kikuchi, G. (1980) J. Biol. Chem. 255, 11664-11670.
3. Okamura-Ikeda, K., Fujiwara, K., and Motokawa, Y. (1982) J. Biol. Chem. 257, 135-139.
4. Fujiwara, K., and Motokawa, Y. (1983) J. Biol. Chem. 258, 8156-8162.
5. Fujiwara, K., Okamura-Ikeda, K., and Motokawa, Y. (1984) J. Biol. Chem. 259, 10664-10668.
6. Okamura-Ikeda, K., Fujiwara, K., and Motokawa, Y. (1987) J. Biol. Chem. 262, 6746-6749.
7. Fujiwara, K., Okamura-Ikeda, K., and Motokawa, Y. (1986) J. Biol. Chem. 261, 8836-8841.
8. Fujiwara, K., Okamura-Ikeda, K., and Motokawa, Y. (1990) J. Biol. Chem. 265, 17463-17467.
9. Hiraga, K., Kochi, H., Hayasaka, K., Kikuchi, G., and Nyhan, W. L. (1981) J. Clin. Invest. 68, 525-534.
10. Hayasaka, K., Tada, K., Kikuchi, G., Winter, S., and Nyhan, W. L. (1983) Pediatr. Res. 17, 967-970.
11. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467.
12. Hayasaka, K., Tada, K., Fueki, N., Takahashi, I., Igarashi, A., Takabayashi, T., and Baumgartner, R. (1987) J. Pediatrics 110, 124-126.
13. Hiraga, K., Kure, S., Yamamoto, M., Ishiguro, Y., and Suzuki, T. (1988) Biochem. Biophys. Res. Commun. 151, 758-762.
14. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature (London) 263, 211-214.
15. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
16. Chou, P. Y., and Fasman, G. D. (1978) Adv. Enzymol. Relat. Areas Mol. Biol. 47, 45-148.
17. Kim, Y., and Oliver, D. J. (1990) J. Biol. Chem. 265, 848-853.
18. Macherel, D., Lebrun, M., Gagnon, J., Neuburger, M., and Douce, R. (1990) Biochem. J. 268, 783-789.