Vol. 54, No. 3, 1973
BIOCHEMICAL
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
HUMAN a-CHAIN GLOBIN MESSENGER: PREDICTION OF A NUCLEOTIDE SEQUENCE Brian Department Received
Laux,
Don Dennis,
of Chemistry, August
and Harold
University
B. White,
of Delaware,
III* Newark 19711
3, 1973
Summary: All nucleotide sequences consistent with the amino acid sequence of the a-chain of human hemoglobin were tested for their complementarity with a known 26-nucleotide sequence from the a-chain messenger. The region with the highest pairing potential is immediately adjacent to the known nucleotide sequence. The existence of this potential hairpin loop requires the specification of nucleotides in at least 6 degenerate positions.
Secondary
structure
is
and rRNA2.
The degeneracy
possibility
for
we113-6.
Certainly
structure
is very
messengers
a characteristic of the genetic
extensive
secondary
for
the MS2 coat 7 evident . Whether
are favored
feature
generally
of
code allows
structure
tRNA's1
the
in mRNA's as
protein
messenger,
secondary
such base-paired
by natural
selection
is yet
to be determined. Double-standed
RNA inhibits
reticulocytes
by binding
This
that
suggests
region a-chain is
coding dictions
to initiation normally
factor
binds
The hyperchromicity
synthesis
to
a
in rabbit
3(IF-3j8. double-standed
exhibited
by rabbit
globin
messenger indicates that secondary structure 9 two possible in this molecule . We have predicted
present
hairpin
*
in mRNA.
IF-3
protein
loops for
for
the portion
the C-terminal
were made possible
To whom inquiries
should
of
region
the human a-chain 5. of the protein
by the amino acid be addressed.
messenger These pre-
sequence
of
the
BIOCHEMICAL
Vol. 54, No. 3,1973
chain-termination
variant,
discovery
of
permitted
the unambiguous
tides
in
This
this
third
structures5,
This maximal
base-pairing
coding
for
loop
for
All
the amino acid
using
a pattern
base-pairing
stable
secondary Several
This
with
agreement
which
the
with
a
provided
26-nucleotide all
sequence
regions
nucleotide
selection
recognition
of this
of
of
the mRNA
sequences
sequence of human a-chain
possibilities 12 structure .
features
both
1).
deduced
possible
were considered.
computer
(Fig.
nucleo-
communication).
with
in striking
complementarity
with
personal
the structure
when the
protein.
compatible globin
hairpin
The recent 11 Wayne , has
of 26 consecutive
incompatible is
.
variant,
assignment
but
loop represents
was checked
deletion
is
10
Spring
(W. P. Winter,
sequence
potential
Constant
frame-shift
region
nucleotide
predicted
only
the
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
was performed algorithm
which
would
proposed
by
which
retains
contribute
loop
should
to
be
noted: a.
The region with
that
the
provides
26-nucleotide
maximum complementarity sequence
the mRNA immediately
adjacent
quence.
loop which
The hairpin
intrinsically
more stable
complementary
segments remote
(The region codes for nucleotides
with
is
a segment of
to that
than
this
known secreates
a structure
is with
from each other.
the second-best
pairing
Leu 80 to Leu 86 and is
potential
located
156
away from the known 26-nucleotide
sequence.) b.
The size comparable
and stability to those
of
this
in the MS2
895
hairpin RNA'.
loop
is
We have
Vol. 54, No. 3,1973
BIOCHEMICAL
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
Fig. 1. Predicted hairpin loop for the C-terminal region of human cl-chain globin. Circles indicate known nucleotides. Boxes indicate nucleotides in degenerate positions which have been selected to maximize basepairing. Unspecified nucleotides are indicated by X. The amino acid sequence beyond the termination codon is that of the chain-termination variant, Constant Springlo. The nucleotide sequence from Tyr 140 to Val 147 was deduced by a comparison of the amino acid sequences of the Constant Spring variant and the frameThe shift deletion variant, Waynell. Wayne variant can be explained by a deletion of any one of the three nucleotides coding for Lys 139 or either of the last two nucleotides of Ser 138. In order to maximize base-pairing, we have assumed that the Wayne variant results from a deletion of G from the Lys 139 codon. (The other possible deletions for the Wayne variant require an A in this position.) The first nucleotide of the Leu 136 codon is likely to be C since the hemoglobin variant Bibba (Leu 136a+Pro)16 would then be oossibleith a single nucleotide substitution. &
calculated to be -17.6 ing
the
stabilization energy of this 13 kcal/mole or -10.8 kcal/molel*
upon the method of
calculation
896
used.
loop dependThe
BIOCHEMICAL
Vol. 54, No. 3, 1973
loop
AND BIOPHYSICAL RESEARCH COMMUNICATIONS
as shown in Fig.
there
are four
1 is
interior
25-nucleotide
loops but
pairs
no bulge
long: 14
loops
.
The codons are in 2:2 registry5. It start
is
at the third
at the the it
first
that
the hairpin
could
of Ala
130 and end
nucleotide
nucleotide
of Ala
145.
loop would be 19 nucleotides is
third
possible
that
nucleotide
nucleotide hairpin there
of is
The loop
bility
at
of Ser 133 and end at the
first
codon.
more stable factors
While
the
the
longest
theoretically,
which
would
favor
these
loops.
in Fig.
1 permits
in degenerate
of the
tides
start
slightly
nucleotides
case
Likewise,
could
the termination
hairpin
In this
long.
the hairpin
may be other
shorter C.
conceivable
the prediction positions.
random occurrence
in the positions
The proba-
of
indicated
of 15
these
is
nucleo-
less
than
lQ-? d.
The proposed of the
a-chain
sequence It
loop
is
globin,
rather
may be that
contributes
If
will
secondary
loop
constant
is
region
whose amino acid 1.
among vertebrates
segment of a messenger of a particular
sequence. of human globin
confirm
our prediction
hairpin
a region
a stable
The sequencing 15
the C-terminal
to the retention
amino acid
progress
codes for
or refute
is confirmed, present.
structure
methods used in this
is
messengers our predicted
it
is highly
Furthermore, also
paper
important
it for
combined with
897
now in sequence.
likely
would
that
suggest
nonviral additional
mRNA.
a that The
selection
Vol. 54, No. 3,1973
rules
may then
BIOCHEMICAL
make it
from the many protein We thank their
Drs.
possible
to predict
sequences
availablei.
Jean S. White
comments and criticisms.
the University
AND BIOPHYSICAL
and William
RESEARCH COMMUNICATIONS
messenger sequences
P. Winter
This work was supported
for by
of Delaware.
REFERENCES 1. 2. 3. 4. 5. 6. 7. 8. 9. 10 . 11. 12. 13. 14. 15. 16.
Dayhoff, M. O., Atlas of Protein Sequence and Structure, Nationalmedical Research Foundation, Silver vol. 4. Spring, Md. 543 pp (1972). Fellner, P., Ehresmann, C., Stiegler, P., and Ebel, J.-P., Nature New Biology, 239, 1 (1972). -Adams, J. M., Jeppeson, P. G. N., Sanger, F., Barrell, B. C., Cold e. Harb. e. 34, 611 (1969). -Quant. Biol., Fitch, W. M. and Margoliash, E., Evol. Biol., =$, 67 (1970). -White, III, H. B., Laux, B. E., and Dennis, D., Science, 175, 1264 (1972). Mark, A. J. and Petruska, J. A., J. Mol. Biol., 72, 609 (1972). Min Jou, W., Haegeman, G., Ysebae&,x, and FieE, W., Nature, 237, 82 (1972). Natl. Acad. Sci. U.S., Kaempfer,. and Kaufman, J., PE. ---70, 1222 (1973). Eanni, A. M., Giglioni, B., Ottolenghi, S., and Comi, P., Nature New Biology, 240, 183 (1972). -Clegg, J. B., Weatherall, D. J., and Milnor, P. F., Nature, 234, 337 (1971). ad-Akhavan, M., Winter, W. P., Abramson, R. K., and Rucknagel, D. L., Blood, 40, 927 (1972). Fitch, W. M., J. Molec. Evolution, 1, 185 (1972). Sci U.S., Delisi, C. and-Crothers, D. M., 6. Natl. e. -. 68, 2682 (1971). %oco Jun, I., Uhlenbeck, 0. C., and Levine, M. D., Nature, 230, 362 (1972). zotta, C. A., Verma, I. M., McCaffery, R. P., and Forget, B. G., -Fed. Proc. 32, 455 (1973). Kleinauer, E. F., Reynolds, C. A., Duzy, A. M., Wilson, J. B., Moores, R. R., Berenson, M. P., Wright, C. S., Hnisnan, T. H. J., Biochem. Biophys. Acta, 154, 220 (1968). -x
898