Vol. 47, No. 4,1972
BIOCHEMICAL
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
THE ORIGIN OF THE GENETIC MATERIAL
IN THE ABNORMALLY
LONG HUMAN HEMOGLOBIN CI AND B CHAINS
Lois T. Hunt and Margaret 0. Dayhoff National Biomedical Research Foundation Georgetown University Medical Center 3900 Reservoir Road, N.W. Washington, D.C. 20007
Received April 6, 1972 SUMMARY The additional carboxy-terminal segment of the abnormal human Hemoglobin Constant Spring (Hb CS) C( chain and positions 68-98 of the normal human a chain have nine identical residues; this was the maximum number revealed by computer search of 518 known protein sequences. Searches were also made with Jl-residue segments having randomized sequences of an average composition. Because the probability is less than 1% that the similarity is due to chance alone, the results favor a comnon genetic origin for the Hb CS CYchain additional segment and the region 68-98 of the normal a chain; it may also be true that the additional segment of Hemoglobin Tak B chain arose in a similar manner. Recently
two variant
human hemoglobins
described;
Clegg,
Hemoglobin
Constant
and Flatz,
Kinderlerer,
Kilmartin
Hemoglobin
Tak (Hb Tak)
has 10 additional
normal
messenger
greater
than
Weatherall Spring
that
elongated
terminates
the
needed
the abnormal
chain
scribed
mRNA but
the
genetic
material
and Lehmann2
to code for
then
reflect
codes
chains
@ 1972, by Academic
Press, Inc.
into
the extended 699
Copyright
that
the
B chain
residues.
The
Therefore, in the
additional
structure protein. region
residues, of
to have a length
a mutation the
chromosome
of
carboxy-terminal
are believed
from
have been
the c1 chain
proteins.
protein;
translated for
chains
carboxy-terminal
form may result
not usually which
that
have found
the actual
of mRNA into
would
have found
the hemoglobin
protein
translation
elongated
(Hb CS) has 31 additional
RNAs for
abnormally
into
and Milnerl
having
an codon which
residues
which
is
tran-
Alternatively, may have been
in
Globin -Gastropod Mollusc Hemoglobin CYchain - Kangaroo Hemoglobin CYchain - Llama Hemoglobin ~1 chain - Irus Macaque Hemoglobin ~1 chain - Rhesus Macaque Histone IIb2 -Bovine Immunoglobulin K chain -Human BAT Imnunoglobulin K chain -Human CUM Imnunoglobulin K chain -Human TEN Lipotropin B - Pig
495
EIGHT IDENTICAL RESIDUES
a chain - Human ~1 chain - Mouse Enzyme - Myxobacter
- Human CS
Hemoglobin Hemoglobin Trypsin-like
a chain
to the
1:':
68
142
1:':
98
173
POSITION IN SEQUENCE
Spring
(CS)
31 Residues
SEQUENCE
of the ~1 Chain
GFVASVAAPPAGADAWTKLFGLIIDALKAAG QAVEHIDDLPGTLSKLSDLHAHKLRVDPVNF K(A.A,A.H.L.B.B.L.P.S,A.L.S.B.L.S.B.L.H.A.H)K.L L.A.V.G.H.V.D.D.M.P.Q.A.L.S.A.L.S.D.L.H.A.H)K.L LAVGHVDDMPNALSALSDLHAHKLRVDPVNF YNKRSTITSREIQTAVRLLLPGELAKHAVSE QSPLSLPVTPGEPASISCRSSQSLLHSBGBB QTPLSLPVTPGEPASISCRSSQSLLDSGDGN QSPLSLPVTPGEPASISCRSSQH(G,B)SFLNWY PEPARDPEAPAEGAAARAELEYGLVAEAQAA
NAVAHVDDMPNALSALSDLHAHKLRVDPVNF NAG(A.H.L)DDLPGALSALSDLHAHKLRVDPVNF GNVQSNGNNCGIPASQRSSLFERLQPILSQY
QAGASVAVPPARWASQRALLPHSLRPFLVFE
Constant
Carboxy-terminal
of Human Hemoglobin
Similar
NINE IDENTICAL RESIDUES
Hemoglobin
PROTEIN
Segments
TABLE 1
R(V.B.P.V.B.F. R(V.D.P.V.N.F.
F
s 5
"0
$ s d F r z m F 5
s 5 0
P
d.5 ?
Vol. 47, No. 4, 1972
introduced
through
of the extra authors1$2 portion ties,
recent
segments noted
that
we have
48,292
in the
residues,
these
Let
searches
us consider
The highest
first
number
surprising
in the
structural
genes
similarity
occurring
theoretical with
segment
of this
are
hypothesizes score
which
chance
in
we might 100 that
of
with
a total
from
the C( chain
of
segment
of HB CS. was 9.
occurred
was derived
from the
of the
a computer occur
probability
in two ways.
About
in the collected
of human hemoglobin.
is one chance
have come from segments
chain,
which
re-
is
32,500
data,
and 111
Thus,
if
in 293 that
this
~1
of this
The first
simulation.
It
in the
of the HB CS c1 chain
three
of
The results
material
there
one of the
all
the genetic
CI chain
would
similari-
with
families.
segments
68-98),
involves
find
any
the Atlas of protein
the 31-residue
can be derived
no relationship,
to discover
from
segment
Some estimate
in the
resemble
1 and 2.
for
of 31 residues
found
the
unless
family.
by chance
a length
such segments
derived
extra
not
518 sequences,
high-scoring
(positions
and the second
segments
found
the origin
protein,
segments
protein
in Tables
one of the three
of human hemoglobin
presented
include
the additional
of identities
that
Data Deck"
154 different
are shown
did
In an effort
19?Z3. The data from
segment
of the additional
Sequence
case,
For each abnormal
sequence.
"Protein
In either
of the extra
the sequences
derived
computer
chain
hemoglobin
compared
aberrations.
be of interest.
the sequence
and Structure
Sequence
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
chromosomal
would
of any known
the sequences
is
BIOCHEMICAL
a single
and about
we actually
one high one
found
would
be
from it. In order
to compare
of the search composition
shown
experimentally,
in Table
was average
for
had 3 each of Ser and Ala; Lys;
and 1 each of
ated
12 sequences
collection
for
Ile, with
similar
1 with
a null
the protein
Gln,
Arg,
composition
segments.
There
case,
Glu,
in Pro,
Phe, Tyr, by lottery were
701
simulation,
we constructed
families
2 each of Asp,
Asn, this
by computer
five
the data Thr,
Cys, His,
results
segments collection:
Gly,
Val,
and Met.
and searched segments
the
whose they
Leu,
and
We gener-
the data
or an average
of
Vol. 47, No. 4, 1972
BIOCHEMICAL
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
TABLE 2 Segments
Similar
to the
Carboxy-terminal
10 Residues
of Human Hemoglobin
FIVE IDENTICAL RESIDUES
0.4
8 chain
POSITION IN SEQUENCE
-Human
TAK
Elastase - Pig Hemoglobin a chain - Bovine Hemoglobin aA chain - Goat Hemoglobin sB chain - Bovine Hemoglobin B chain - Dog Hemoglobin B chain -Brown Lemur Hemoglobin 8 chain - Rhesus Macaque Hemoglobin y chain -Human Lysozyme -Bacteriophage T4 Growth Hormone -Human Coat Protein - Bacteriophage FR
sequences
was from of 8.
per search
a hemoglobin
with
Among the 73 protein
12 searches,
a score
chain.
There
were
families
with
a hemoglobin
hemoglobin
a chain
segment
with
a score
ted by the
product
of the
chance
that
chance
that
have
of 9 and the a high
score:
The chance score
high
segments
is having
The chance 9 would
that
then
For this
0.4
family
=
a segment
somewhat
less;
from only
be 0.016 value
x & 0.002
from
T K L L A(N,S,L)F
119 98 98 102 103 103 103 103 59
128 107 107 111 112 112 112 112 68 102 129
TILANNSPCY F K L L.S H S L L.V F)K L L.S H S L L.V F K(L.L.G.N.V.L.V.V. FKLLGNVLVC F)K(L.L.G.B.S,L.B,S, FKLLGNVLVC FKLLGNVLVT TKDEAEKLFN RSVFANSLVY TAIAANSGIY
1;:
None of those
sequences
appeared
search
with
a score
of 8 or 9 represented
three
of 9 would
times.
can then
hemoglobin
the
that
a
be estima-
of any segment
the
in
The chance
appear
the sequence from
per
found
will
have
a family
a
will
0.016. the human hemoglobin
five
7, 8, and 9 identities a segment
156
scores
a segment
x k
that
7.7
SEQUENCE
147
of 9 identities.
the
score
B Chain
TAK
PROTEIN Hemoglobin
of the
a chain
itself
would
occur
among the 47 hemoglobin
a
(high
scores)
segments.
human hemoglobin
a
with chain
the would
random have
chain
a score
2. 0.002. we derived
an estimate
702
of the standard
deviation,
of
Y
Vol. 47, No. 4, 1972
which
is
If
?0.002.*
segment with
BIOCHEMICAL
the
distribution
from human hemoglobin
a confidence
level
From either ment,
point
the probability
segment
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
is
C( chain
greater
than
of view,
the
is
of HB CS a chain
low,
only
normal,
having
then
a score
the probability
of 9 is
less
of a than
l%,
99%. theoretical about
one or the computer
1X, that
to the human hemoglobin
the similarity cx chain
is
experi-
of the extra due to chance
alone. Although
a probability
conclusion,
nevertheless,
proximity that
the similar
probable
that
additional several
the
divergence If
faster
is so, in
the
segment
chains
are prominent
in
for
segments
75 million
mutations functional
similar
is
case for
its
the data
origin are
part
ago.
of the a chain.
2.
segment,
from a portion
not inconsistent
sur-
as the
years
to the additional
short
Because
considerably
of Hb Tak are shown in Table For this
and the
we would back
to
very
origin.
as far
possibly
in the
the results.
However,
make it
as similar,
occurred
of point
than
B chain
to make a strong chain.
each other,
segment
of the
almost
segment
of acceptance
of a search
terminal
hemoglobin
rate
segment
68 to 98 of the CL chain
are
a
and the
the extra
had a common evolutionary
a chains
from
for
establish
aberration
of human hemoglobin
residues
of the extra
species
the additional
The results
possible
mammalian
does not
of genetic
material
of the ~1 chain including
appearance
of these
this
genetic
of HB CS cx chain
of the other that
of the
region
the segment
segment
by itself
the known mechanisms
on the chromosome
for
mise
of such magnitude
it
carboxyHemoglobin is
not
of the
functional
such
an origin.
with
* Theoretical estimates of the standard deviations of the randm variables involved here are given by a, where N counts are observed. The standard deviation of the product of three functions was estimated as follcns: Let 6 = the standard deviation of the nth variable divided by its mean; then !he fractional standard deviation of the product of three independent distributions is given by
“ijk
=
6? + s? + 62 -I-s? 6? + 62ik 62 + s? 62 Jk J
k
1J
703
Vol. 47, No. 4, 1972
The carboxyl other.
Their
events.
This
duplication leading
BIOCHEMICAL
ends of the
origins is
may well
two abnormal
to c1 and B chains
chains
RESEARCH COMMUNICATIONS
are not
have been two different
to be expected
at the time
AND BIOPHYSICAL
if
the additional
of the mammalian
radiation,
very
series
a chain long
similar of genetic
segment after
to each
arose
from
the duplication
of hemoglobin.
ACKNOWLEDGMENTS This the National
investigation Institutes
was supported
by Grants
GM-08710
and RR-05681
from
of Health.
REFERENCES 1. 2. 3.
Clegg, Flatz, 1: 732 Atlas press,
J.B., Weatherall, D.J. and Milner, P.F., Nature, 234: 337 (1971) G., Kinderlerer, J.L., Kilmartin, J.V. and LehmancH., Lancet, (1971) of Protein Sequence and Structure 1972, ed. Dayhoff, M.O., 5: in National Biomedical Research Foundation, Washington, D.C. (i972)
704
a