The origin of the genetic material in the abnormally long human hemoglobin α and β chains


Vol. 47, No. 4, 1972
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS

THE ORIGIN OF THE GENETIC MATERIAL IN THE ABNORMALLY LONG HUMAN HEMOGLOBIN α AND β CHAINS

Lois T. Hunt and Margaret O. Dayhoff
National Biomedical Research Foundation
Georgetown University Medical Center
3900 Reservoir Road, N.W.
Washington, D.C. 20007

Received April 6, 1972

SUMMARY

The additional carboxy-terminal segment of the abnormal human Hemoglobin Constant Spring (Hb CS) α chain and positions 68-98 of the normal human α chain have nine identical residues; this was the maximum number revealed by computer search of 518 known protein sequences. Searches were also made with 31-residue segments having randomized sequences of an average composition. Because the probability is less than 1% that the similarity is due to chance alone, the results favor a common genetic origin for the Hb CS α chain additional segment and the region 68-98 of the normal α chain; it may also be true that the additional segment of Hemoglobin Tak β chain arose in a similar manner.

Recently two variant human hemoglobins having abnormally elongated chains have been described; Clegg, Weatherall and Milner (1) have found that Hemoglobin Constant Spring (Hb CS) has 31 additional carboxy-terminal residues in the α chain, and Flatz, Kinderlerer, Kilmartin and Lehmann (2) have found that Hemoglobin Tak (Hb Tak) has 10 additional carboxy-terminal residues in the β chain. The normal messenger RNAs for the hemoglobin chains are believed to have a length greater than that needed to code for the actual proteins. Therefore, the abnormal elongated form may result from a mutation in the codon which terminates the translation of mRNA into protein; the additional residues would then reflect the structure of genetic material which is transcribed into mRNA but not usually translated into protein.

Copyright © 1972, by Academic Press, Inc.

TABLE 1

Segments Similar to the Carboxy-terminal 31 Residues of the α Chain of Human Hemoglobin Constant Spring (CS)

PROTEIN                                   POSITION IN SEQUENCE   SEQUENCE

Hemoglobin α chain - Human CS             142-172                QAGASVAVPPARWASQRALLPHSLRPFLVFE

NINE IDENTICAL RESIDUES
Hemoglobin α chain - Human                68-98                  NAVAHVDDMPNALSALSDLHAHKLRVDPVNF
Hemoglobin α chain - Mouse                                       NAG(A.H.L)DDLPGALSALSDLHAHKLRVDPVNF
Trypsin-like Enzyme - Myxobacter 495                             GNVQSNGNNCGIPASQRSSLFERLQPILSQY

EIGHT IDENTICAL RESIDUES
Globin - Gastropod Mollusc                                       GFVASVAAPPAGADAWTKLFGLIIDALKAAG
Hemoglobin α chain - Kangaroo                                    QAVEHIDDLPGTLSKLSDLHAHKLRVDPVNF
Hemoglobin α chain - Llama                                       K(A.A,A.H.L.B.B.L.P.S,A.L.S.B.L.S.B.L.H.A.H)K.L R(V.B.P.V.B.F.
Hemoglobin α chain - Irus Macaque                                L.A.V.G.H.V.D.D.M.P.Q.A.L.S.A.L.S.D.L.H.A.H)K.L R(V.D.P.V.N.F.
Hemoglobin α chain - Rhesus Macaque                              LAVGHVDDMPNALSALSDLHAHKLRVDPVNF
Histone IIb2 - Bovine                                            YNKRSTITSREIQTAVRLLLPGELAKHAVSE
Immunoglobulin κ chain - Human BAT                               QSPLSLPVTPGEPASISCRSSQSLLHSBGBB
Immunoglobulin κ chain - Human CUM                               QTPLSLPVTPGEPASISCRSSQSLLDSGDGN
Immunoglobulin κ chain - Human TEN                               QSPLSLPVTPGEPASISCRSSQH(G,B)SFLNWY
Lipotropin β - Pig                                               PEPARDPEAPAEGAAARAELEYGLVAEAQAA
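The identity counts in Table 1 can be checked by position-by-position comparison. The following is a minimal sketch, not the authors' program: the function names `count_identities` and `best_window` are hypothetical, only the fully sequenced rows of Table 1 are used, and the two flanking residues added to the toy target are invented for illustration.

```python
def count_identities(a: str, b: str) -> int:
    """Count positions at which two equal-length segments have the same residue."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(x == y for x, y in zip(a, b))


def best_window(query: str, sequence: str) -> tuple[int, int]:
    """Slide the query along a longer sequence.

    Returns (highest identity count, 1-based start of that window).
    """
    k = len(query)
    best_score, best_start = -1, 0
    for start in range(len(sequence) - k + 1):
        score = count_identities(query, sequence[start:start + k])
        if score > best_score:
            best_score, best_start = score, start + 1
    return best_score, best_start


# Sequences copied from Table 1.
HB_CS_EXTRA = "QAGASVAVPPARWASQRALLPHSLRPFLVFE"
HUMAN_ALPHA_68_98 = "NAVAHVDDMPNALSALSDLHAHKLRVDPVNF"
MYXOBACTER_ENZYME = "GNVQSNGNNCGIPASQRSSLFERLQPILSQY"
KANGAROO_ALPHA = "QAVEHIDDLPGTLSKLSDLHAHKLRVDPVNF"
RHESUS_ALPHA = "LAVGHVDDMPNALSALSDLHAHKLRVDPVNF"

# Toy target: the human segment with two made-up flanking residues on each side.
TOY_TARGET = "GG" + HUMAN_ALPHA_68_98 + "KK"

if __name__ == "__main__":
    for name, seg in [("human", HUMAN_ALPHA_68_98),
                      ("myxobacter", MYXOBACTER_ENZYME),
                      ("kangaroo", KANGAROO_ALPHA),
                      ("rhesus", RHESUS_ALPHA)]:
        print(name, count_identities(HB_CS_EXTRA, seg))
    print(best_window(HB_CS_EXTRA, TOY_TARGET))
```

A full search in this style would apply `best_window` to every sequence in the data collection and keep the highest-scoring segments.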

Alternatively, the genetic material which codes for the extended region of the chains may have been introduced through recent chromosomal aberrations. In either case, the origin of the extra genetic material would be of interest.

In an effort to discover the origin of the additional segments, we have compared by computer the sequence of the extra portion of each abnormal hemoglobin chain with the sequences in the "Protein Sequence Data Deck" derived from the Atlas of Protein Sequence and Structure 1972 (3). The data include 518 sequences, with a total of 48,292 residues, from 154 different protein families. The authors (1,2) noted that the extra segments did not resemble the sequence of any known protein. If one hypothesizes that the genetic material of the extra segment was derived from one of the structural genes of a protein family in the collected data, we might find a surprising similarity with a segment of this family. The segments found in these searches are presented in Tables 1 and 2.

Let us consider first the 31-residue additional segment of the Hb CS α chain. The highest number of identities occurring was 9; this score occurred for three segments, one of which is a segment of the α chain of human hemoglobin (positions 68-98). Some estimate of the probability that this similarity occurred by chance can be derived in two ways. The first is theoretical and the second involves a computer simulation.

About 32,500 segments of a length of 31 residues can be derived from the collected data, and 111 such segments have come from the α chain of human hemoglobin. Thus, if there is no relationship, there is one chance in 293 that a single high-scoring segment would be from this chain, and about one chance in 100 that one of the three high-scoring segments we actually found would be from it.

In order to compare the theoretical results experimentally for a null case, we constructed by computer simulation segments whose composition was average for the protein data collection: they had 3 each of Ser and Ala, and 2 or 1 each of Asp, Asn, Arg, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Tyr, and Val. We generated 12 sequences by lottery with this composition and searched the data collection for similar segments.
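The theoretical estimate can be reproduced as simple arithmetic. This is a sketch under stated assumptions: the 141-residue length of the normal human α chain is supplied here (it is consistent with the extra segment beginning at position 142), the helper name `n_segments` is hypothetical, and multiplying the single-segment chance by three is used as a close approximation for "one of three".

```python
def n_segments(chain_length: int, k: int) -> int:
    """Number of contiguous k-residue segments in a chain of the given length."""
    return max(chain_length - k + 1, 0)


TOTAL_SEGMENTS = 32_500      # 31-residue segments in the collected data (from the text)
SEGMENT_LENGTH = 31
HUMAN_ALPHA_LENGTH = 141     # assumption: length of the normal human alpha chain
HIGH_SCORING_FOUND = 3       # segments with nine identities in Table 1

human_alpha_segments = n_segments(HUMAN_ALPHA_LENGTH, SEGMENT_LENGTH)

# Chance that one unrelated high-scoring segment happens to lie in the human alpha chain.
p_single = human_alpha_segments / TOTAL_SEGMENTS

# Approximate chance that at least one of the three high-scoring segments does.
p_one_of_three = HIGH_SCORING_FOUND * p_single

if __name__ == "__main__":
    print(human_alpha_segments)        # segments contributed by the human alpha chain
    print(round(1 / p_single))         # "one chance in N" for a single segment
    print(round(1 / p_one_of_three))   # "one chance in N" for one of three
```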

TABLE 2

Segments Similar to the Carboxy-terminal 10 Residues of the β Chain of Human Hemoglobin Tak

PROTEIN                                POSITION IN SEQUENCE   SEQUENCE

Hemoglobin β chain - Human Tak         147-156                T K L L A(N,S,L)F Y

FIVE IDENTICAL RESIDUES
Elastase - Pig                         119-128                TILANNSPCY
Hemoglobin α chain - Bovine            98-107                 F K L L.S H S L L.V
Hemoglobin αA chain - Goat             98-107                 F)K L L.S H S L L.V
Hemoglobin βB chain - Bovine           102-111                F K(L.L.G.N.V.L.V.V.
Hemoglobin β chain - Dog               103-112                FKLLGNVLVC
Hemoglobin β chain - Brown Lemur       103-112                F)K(L.L.G.B.S,L.B,S,
Hemoglobin β chain - Rhesus Macaque    103-112                FKLLGNVLVC
Hemoglobin γ chain - Human             103-112                FKLLGNVLVT
Lysozyme - Bacteriophage T4            59-68                  TKDEAEKLFN
Growth Hormone - Human                 93-102                 RSVFANSLVY
Coat Protein - Bacteriophage FR        120-129                TAIAANSGIY

In the 12 searches, there were five segments with a score of 8 or 9 (high scores), or an average of 0.4 per search. None of those sequences was from a hemoglobin chain. Among the 73 protein segments found with scores of 7, 8, and 9 identities, a hemoglobin family was represented three times. The chance that a segment from the human hemoglobin α chain with a score of 9 would appear in a random search can then be estimated by the product of the chance that a search will have a high score, the chance that any high-scoring segment found will be from a hemoglobin family, and the chance that a segment from a hemoglobin family will be from the human hemoglobin α chain itself. The chance that a search will have a high-scoring segment from a hemoglobin family is 0.4 × 3/73 = 0.016, and the chance that this segment would be from the human hemoglobin α chain itself is then 0.016 × 1/7.7 = 0.002. For a score of 9 only, this value would be somewhat less.
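The product estimate for the simulation can be checked numerically. This is a rough sketch, assuming the three factors are 0.4 high scores per search, 3 hemoglobin appearances in 73, and 1 in 7.7 for the human α chain itself; the function name `chance_product` is hypothetical, and treating the factors as independent is a simplification.

```python
import math


def chance_product(*factors: float) -> float:
    """Multiply independent chances together."""
    return math.prod(factors)


HIGH_SCORES_PER_SEARCH = 0.4     # five scores of 8 or 9 in 12 searches, rounded
HEMOGLOBIN_FRACTION = 3 / 73     # assumed reading: hemoglobin represented three times
HUMAN_ALPHA_FRACTION = 1 / 7.7   # assumed reading of the third factor

# Chance that a search yields a high-scoring segment from a hemoglobin family.
p_hemoglobin = chance_product(HIGH_SCORES_PER_SEARCH, HEMOGLOBIN_FRACTION)

# Chance that it is, in addition, from the human hemoglobin alpha chain.
p_human_alpha = chance_product(p_hemoglobin, HUMAN_ALPHA_FRACTION)

if __name__ == "__main__":
    print(round(p_hemoglobin, 3))
    print(round(p_human_alpha, 3))
```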

The estimated standard deviation of this value is ±0.002.* If the distribution is normal, then the probability of a segment from the human hemoglobin α chain having a score of 9 is less than 1%, with a confidence level greater than 99%.

From either point of view, the theoretical one or the computer experiment, the probability is low, only about 1%, that the similarity of the extra segment of the Hb CS α chain to the human hemoglobin α chain is due to chance alone. Although a probability of such magnitude does not by itself establish this conclusion, nevertheless, the known mechanisms of genetic aberration and the possible proximity on the chromosome of the genetic material for the two segments make it probable that the extra segment of the Hb CS α chain and the segment 68 to 98 of the α chain had a common evolutionary origin.

Because segments from the α chains of several mammalian species are almost as similar to the additional segment as the segment of the human hemoglobin α chain, we would surmise that the divergence of these segments from each other occurred possibly as far back as 75 million years ago. If this is so, the rate of acceptance of point mutations in the additional segment is considerably faster than in the functional part of the α chain.

The results of a search for segments similar to the additional carboxy-terminal segment of the Hemoglobin Tak β chain are shown in Table 2. For this very short segment, the data are not sufficient to make a strong case for its origin. However, the results are not inconsistent with an origin of the extra segment from a portion of the β chain.

* Theoretical estimates of the standard deviations of the random variables involved here are given by √N, where N counts are observed. The standard deviation of the product of three functions was estimated as follows: Let δ_n = the standard deviation of the nth variable divided by its mean; then the fractional standard deviation of the product of three independent distributions is given by

$$\delta_{ijk}^{2} = \delta_i^{2} + \delta_j^{2} + \delta_k^{2} + \delta_i^{2}\delta_j^{2} + \delta_i^{2}\delta_k^{2} + \delta_j^{2}\delta_k^{2}$$
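The footnote's error estimate and the confidence statement can be sketched as follows. This is illustrative only: the counts 5 and 3 are taken from the simulation results, while the third count of 6 is a hypothetical value chosen for the example and is not stated in the text; the use of a one-sided normal bound is likewise an assumption about how the 99% confidence level was applied.

```python
import math
from statistics import NormalDist


def fractional_sd_from_count(n: int) -> float:
    """Fractional standard deviation of an observed count N, i.e. sqrt(N) / N."""
    return 1.0 / math.sqrt(n)


def fractional_sd_of_product(d_i: float, d_j: float, d_k: float) -> float:
    """Fractional standard deviation of a product of three independent variables,
    using the six-term expression given in the footnote."""
    a, b, c = d_i ** 2, d_j ** 2, d_k ** 2
    return math.sqrt(a + b + c + a * b + a * c + b * c)


# 5 and 3 come from the text; 6 is a hypothetical count for the third factor.
COUNTS = (5, 3, 6)
deltas = [fractional_sd_from_count(n) for n in COUNTS]
fractional_sd = fractional_sd_of_product(*deltas)

ESTIMATE = 0.002   # chance estimated from the simulation
SD = 0.002         # standard deviation quoted in the text

# One-sided 99% upper bound under a normal distribution (assumed interpretation).
z_99 = NormalDist().inv_cdf(0.99)
upper_bound = ESTIMATE + z_99 * SD

if __name__ == "__main__":
    print(round(fractional_sd, 3))
    print(upper_bound < 0.01)
```

With these inputs the fractional standard deviation is close to 1, which is in line with a standard deviation about as large as the 0.002 estimate itself.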

The carboxyl ends of the two abnormal chains are not very similar to each other. Their origins may well have been two different series of genetic events. This is to be expected if the additional α chain segment arose at the time of the mammalian radiation, long after the duplication leading to the α and β chains of hemoglobin.

ACKNOWLEDGMENTS

This investigation was supported by Grants GM-08710 and RR-05681 from the National Institutes of Health.

REFERENCES

1. Clegg, J.B., Weatherall, D.J. and Milner, P.F., Nature, 234: 337 (1971)
2. Flatz, G., Kinderlerer, J.L., Kilmartin, J.V. and Lehmann, H., Lancet, 1: 732 (1971)
3. Atlas of Protein Sequence and Structure 1972, ed. Dayhoff, M.O., 5: in press, National Biomedical Research Foundation, Washington, D.C. (1972)