Analogies between hemoglobin and immunoglobulin evolution


Immunochemistry, 1975, Vol. 12, pp. 495-498. Pergamon Press. Printed in Great Britain.

MORRIS GOODMAN

Department of Anatomy, Wayne State University School of Medicine, Detroit, Michigan 48201, U.S.A.

(Received 13 September 1974)

Abstract: Events in hemoglobin evolution are explored on phylogenetic trees constructed for α and β hemoglobin polypeptide chain amino acid sequences by the maximum parsimony method. They show that a germline gene duplication-mutation process of evolution is not unique to immunoglobulins. The genealogy found by the maximum parsimony method for known V-region amino acid sequences from heavy and light immunoglobulin chains is best interpreted from the standpoint of the multigene germline theory of antibody diversity.

INTRODUCTION

In recent years effective computing procedures have been developed to extract evolutionary information from the amino acid sequences of proteins. Procedures based on the maximum parsimony method (Moore et al., 1973) are proving especially useful for deducing phylogenetic history from the sequence differences among present day polypeptide chains. A necessary prerequisite for computing a genealogy by this method is that the protein family be well represented by amino acid sequence data. Such is the case for the globin family. Amino acid sequences are known for over one hundred globin polypeptide chains from a wide range of organisms, and considerable progress has been made in reconstructing their genealogy (Goodman et al., 1974; Goodman et al., 1975). In this paper, I should like to review briefly some of the findings on the globin family with the aim of providing analogies for understanding immunoglobulin evolution.

The history of present day globins traces back about a billion years to a primordial globin gene which existed in the common ancestor of plants and animals. The human genome contains a minimum of eight or nine non-allelic globin genes: one for the monomeric protein myoglobin and others for the polypeptide chains of tetrameric hemoglobins, namely the ε and ζ chains of very early embryonic hemoglobins, the γ chains of fetal hemoglobin (F), the β chains of the major adult hemoglobin (A), the δ chains of the minor adult hemoglobin (A2), and the α chains of these F, A, and A2 hemoglobins. The formulas for F, A, and A2 are α₂γ₂, α₂β₂, and α₂δ₂. The ε and ζ chains are not as well understood as the α, β, γ, and δ chains, although the embryonic hemoglobins in which ε chains occur do have the formulas ε₄ and α₂ε₂. There are at least two different γ chains (Gγ with glycine at residue position 136 and Aγ with alanine at this position) coded for by non-allelic genes (Huisman, 1972), and there is evidence that in some individuals, if not throughout the whole human population, two non-allelic loci for α chains exist (Brimhall et al., 1974). These two α loci appear to have been produced by a recent gene duplication since they code for identical α chains.

The globin phylogenetic tree (Goodman et al., 1974; Goodman et al., 1975) traces the separation of myoglobin from the α, β, γ, and δ evolutionary precursor to a gene duplication which occurred, it seems, in the first vertebrates perhaps about a half billion years ago. Within the next 10⁸ yr in the descending hemoglobin gene line of this globin tree, another gene duplication in the jawed vertebrates, in a common ancestor of bony fish and human beings, separated the α branch from the β branch. Later in descent, in the early therian mammals in the range of 150 × 10⁶ yr ago, a β gene duplication produced the separate γ locus. About 40 or 50 × 10⁶ yr ago in the primitive Anthropoidea yet another β gene duplication produced the ancestral locus from which human and other hominoid δ genes descended. Point mutations accumulated in these diverging gene lines, leading to the sequence differences observed today among the various globins.

Immunoglobulin chains, like hemoglobin chains, also evolve through gene duplications and point mutations. Indeed, it is generally agreed that in human beings there are at the very least twenty-four non-allelic germline genes coding for 'constant' or C-region and 'variable' or V-region portions of immunoglobulin chains: fourteen C genes for the different classes, subclasses, and isotypes of these polypeptide chains and ten more genes for the different subgroups of the V-region portions of the chains (Putnam, 1972). Here the agreement stops. Advocates of the multigene germline theory of antibody diversity would attribute each idiotype (a V-region which differs in sequence structure from other V-regions in its subgroup) to a different germline gene. Thus according to this theory hundreds and even thousands of V genes are present in the germline. Opponents of the theory believe that there are special somatic mutational or recombinational mechanisms which produce the diversity of V gene sequences coding for antibodies during the lifetime of a vertebrate organism. Clearly, no matter how you look at it, the immunoglobulin model of protein evolution is far more complicated than the hemoglobin model. Nevertheless, globin evolution is complex enough to make me feel confident that the computing procedures which are proving capable of revealing the genealogy of the globin family will be capable of doing this for the immunoglobulin family.

Fig. 1. Phylogenetic tree of 31 amniote (chicken and mammalian) α hemoglobin chain sequences constructed by the maximum parsimony method. Augmented link lengths greater than originally observed lengths are the italicized numbers. These were calculated by a computer algorithm, MNAUG, which minimally corrects for the missing mutations due to missing intermediate ancestors (see Goodman et al., 1974 for explanation of the algorithm). The ordinate scale is in millions of years and is based on traditional views concerning the branching times of the species represented by the α chain sequences.

PHYLOGENY OF α AND β HEMOGLOBIN GENES

Figures 1 and 2 show, respectively, phylogenetic trees constructed by the maximum parsimony method for 31 amniote (chicken and mammal) α and 51 tetrapod (frog, chicken, and mammal) β hemoglobin chain sequences. Each tree was found after an extensive search for that genealogical branching arrangement which would require fewer mutations than any other branching arrangement to account for the descent of the sequences. This parsimony of mutations criterion ensures that, with respect to the structural similarities among aligned sequenced protein chains, the largest number of them are accounted for by inheritance from common ancestors and the smallest number by parallel or reverse mutations.

Fig. 2. Phylogenetic tree of 51 tetrapod (frog, chicken, and mammalian) β-type hemoglobin chain sequences. Link lengths augmented to compensate for missing mutations are the italicized numbers. The ordinate scale is in millions of years and depicts, from traditional fossil-based evidence, the times of branching of the species represented by the β chain sequences.

In depicting gene phylogenies, Figs. 1 and 2 also depict species phylogenies. Not only are the branching arrangements remarkably similar for the species common to both trees, but they agree closely with the phylogeny deduced from traditional fossil and neomorphological evidence. Indeed, because of this, it was possible to use a time scale for the ordinates of Figs. 1 and 2 based on traditional views concerning the ancestral branching times of the species represented by the α and β sequences. The fact that these computed globin genealogies are validated by the traditional evidence on phylogeny can give us more confidence, when dealing with immunoglobulin amino acid sequences (which presently represent many genes, but few species), that the maximum parsimony method will be able to capture their actual genealogical relationships.

A criticism raised against the multigene germline theory of antibody diversity is that V-region sequences of a particular subgroup type possess 'species-specific' residues when the sequences compared are from such animals as rabbit, mouse, and man. Somehow this was thought to contradict the theory. Figures 1 and 2 show that there is no contradiction at all, and in fact 'species-specific' residues are exactly what would be predicted from a germline gene duplication-mutation process. All that is required is that gene duplications continue to occur after the separation of the mammalian lineages being compared.
(It matters not if there are only relatively few such duplications, as in the globin family, or relatively many, as in the immunoglobulin family; the principle itself holds.) Take the 3α hemoglobin chain locus which is expressed in some gorillas and chimpanzees (Boyer et al., 1973). It is non-allelic to the predominant active α locus of hominoids. Yet the sequences coded for by the two loci are genealogically closer to each other than to any non-hominoid αs. The gene duplication producing the separate 3α locus apparently occurred, as depicted in Fig. 1, in the early hominoids. Similarly, an α gene duplication in a caprine common ancestor of goats and sheep is responsible for the two non-allelic α chains found in present day goats. Understandably these two α chains share caprine 'species-specific' residues.

Non-allelic, closely linked loci on the same chromosome code for the γ, β, and δ sequences of human F, A, and A2 hemoglobins. Figure 2 depicts the β-γ duplication as occurring in the early therian mammals after the separation of the prototherian mammals (echidna) but before the divergence of metatherians (marsupials) and eutherians (placentals). The linkage between β and γ loci apparently has been maintained for the past 150 or so million years, but no derived primate-specific residues are shared between human β and γ chains. On the other hand, β and δ chains in the Anthropoidea do share 'species-specific' residues (better called phylogenetically related residues). Moreover, it appears that the δ loci of hominoids and ceboids arose from independent β gene duplications, one in the early catarrhine primates and the other in the early platyrrhine or ceboid primates, because the δ chains of the hominoid branch of the catarrhines appear to be genealogically closer to catarrhine βs than to ceboid δs and, similarly, ceboid δs appear to be closer to ceboid βs than to hominoid βs (Fig. 2).

Independent β gene duplications also occurred in the artiodactyls. A duplication in the early bovids produced the separate locus coding for the β-type chain of bovid fetal hemoglobins, and another, later duplication in the early caprines produced the closely linked loci coding for the βA and βC chains found in both sheep and goats. Not surprisingly, bovid fetal and adult β sequences share bovid-specific residues, and the caprine βA and βC sequences further share caprine-specific residues.
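The reasoning used above for the δ loci (choose the branching arrangement that needs the fewest mutations) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the counting rule is Fitch's textbook small-parsimony procedure, swapped in here for simplicity, and not the computing procedures of Moore et al. (1973); the four eight-residue sequences, the names `toy_chains`, `independent_duplications`, and `single_duplication`, and the helper functions are all hypothetical and are not real globin data.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral residues, minimum substitutions) for one aligned site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The two descendants can share a residue: no extra substitution needed.
        return shared, left_cost + right_cost
    # No residue in common: at least one substitution on this pair of links.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the per-site minimum substitution counts over an alignment."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        site_states = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, site_states)[1]
    return total


# Hypothetical toy alignment (NOT real sequences): sites 1-2 carry residues
# shared within hominoids, sites 3-4 residues shared within ceboids, and
# site 5 a residue shared by the two delta chains.
toy_chains = {
    "hominoid_beta":  "GHSPADKT",
    "hominoid_delta": "GHSPNDKT",
    "ceboid_beta":    "VLTQADKT",
    "ceboid_delta":   "VLTQNDKT",
}

# Arrangement 1: a separate beta duplication in each lineage.
independent_duplications = (("hominoid_beta", "hominoid_delta"),
                            ("ceboid_beta", "ceboid_delta"))
# Arrangement 2: one duplication before the lineages separated.
single_duplication = (("hominoid_beta", "ceboid_beta"),
                      ("hominoid_delta", "ceboid_delta"))

print(parsimony_length(independent_duplications, toy_chains))  # 6
print(parsimony_length(single_duplication, toy_chains))        # 9
```

With these invented sequences the arrangement with independent duplications requires fewer substitutions, so the lineage-shared ('species-specific') residues are accounted for by common ancestry and only the single δ-shared site is counted as a parallel change. A real analysis would also have to search over many branching arrangements rather than compare two chosen by hand.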
The caprine region of the β tree in Fig. 2 is of further interest in showing that multiple mutational differences can accumulate between allelic genes. Sheep βA and βB sequences are coded for by allelic genes which differ by seven point mutations. Moreover, this accumulation of differences began in a caprine lineage ancestral to both sheep and goats, because sheep βA is genealogically closer to goat βA than to sheep βB. This prolonged accumulation of mutational differences between allelic genes is analogous to differences among certain rabbit V-region allotypes coded for, apparently, by allelic genes in the regular Mendelian manner. I shall not attempt to go into the details of the V-region allotype phenomena, which are thought to argue against the multigene germline theory of antibody diversity (see Smith et al., 1971, for a rebuttal), but I would like to point out that if antibody diversity is to arise from a germline gene duplication-mutation process, then extensive allelic as well as non-allelic differences between V-region genes are to be expected.

GENEALOGICAL TREE OF V-REGION SEQUENCES

Amino acid sequences are known for complete V-region portions of about forty-five different immunoglobulin chains. Mr. Ken Garst, a graduate student in my laboratory, has been analyzing* these sequences by the maximum parsimony procedures which are proving so successful in reconstructing globin phylogeny.
He has been searching for that genealogical branching arrangement for the V-region sequences which would require fewer mutations to account for their descent than any other branching arrangement. The tree topology of lowest mutational length found so far is shown in Fig. 3. The genealogy shown in this tree agrees in main features with those reported by Dayhoff (1972) and by Smith et al. (1971). However, by combining in the same data set VH, Vλ, and Vκ sequences aligned against one another, it has been possible to obtain a clearer picture of the genealogical position of the mouse Vλ sequences in relation to human Vλ and rabbit, mouse, and human Vκ sequences.

* A more detailed account of this work will be prepared at a later date.

Fig. 3. The genealogical branching arrangement found by the maximum parsimony method for complete V-region sequences of 45 light and heavy immunoglobulin chains.

As may be noted in Fig. 3, the mouse Vλ branch descends from the most ancient splitting among light chain V-region sequences. Human Vλ sequences are genealogically closer to Vκ sequences, whether from rabbit, mouse, or man, than to the branch of mouse Vλ sequences. Yet the mouse Vλ genes are in the same genetic linkage system as mouse Cλ genes, and mouse and human Cλ sequences are considered (Dayhoff, 1972) to be genealogically closer to each other than to human and mouse Cκ sequences (this is confirmed in our preliminary tree results with C-region sequences). What this could mean is that the lambda linkage system is phylogenetically older than the kappa linkage system. We may suppose that the kappa system arose from the lambda by chromosome duplication and later in its evolution had its more ancient V-region loci replaced by new loci from fresh gene duplications, whereas in the lambda system a lone lineage of V-region genes predating the lambda-kappa system divergence left a few survivors in present day mice.

The divergence of kappa and lambda systems clearly occurred long before the ancestral stock of eutherian mammals radiated into different orders, because the origins of the human Vκ subgroups VκI, VκII, and VκIII predate, as shown in Fig. 3, this eutherian radiation. No wonder so-called 'species-specific' residues are found in some sets of rabbit, mouse, and human Vκ sequences. Even within a V-region subgroup a branching arrangement is observed whereby the sequences are sorted into clusters containing more closely related sequences. This phenomenon of an ordered genealogy has been discussed by Smith et al. (1971).
Since each sequence comes from a different individual, it seems extremely improbable that a somatic mutation or recombination mechanism could so act in each individual on the somatic descendants of that supposed germline gene common to all individuals as to produce by parallel changes the appearance of an ordered genealogy. It is the multigene germline theory of V-region diversity which offers the rational explanation for the phenomenon.

REFERENCES

Boyer S. H., Noyes A. N., Boyer M. L. and Marr K. (1973) J. biol. Chem. 248, 992.
Brimhall B., Duerst M., Hollan S. R., Stenzel P., Szelenyi J. and Jones R. T. (1974) Biochim. biophys. Acta 336, 344.
Dayhoff M. O. (1972) Atlas of Protein Sequence and Structure, Vol. 5. National Biomedical Research Foundation, Silver Spring, Md.
Goodman M., Moore G. W., Barnabas J. and Matsuda G. (1974) J. molec. Evolution 3, 1.
Goodman M., Moore G. W. and Matsuda G. (1975) Nature, Lond. 253, 603.
Huisman T. H. J. (1972) Adv. clin. Chem. 15, 149.
Moore G. W., Barnabas J. and Goodman M. (1973) J. theoret. Biol. 38, 457.
Putnam F. W. (1972) J. Human Evolution 1, 591.
Smith G. P., Hood L. and Fitch W. M. (1971) Ann. Rev. Biochem. 40, 969.