THE EVOLUTION OF HAEMATOPOIETIC CYTOKINE/RECEPTOR COMPLEXES

Denis C. Shields,1,2 Dawn L. Harmon,1,2,3 Fatima Nunez,1,2,3 Alexander S. Whitehead1,2

The evolutionary expansion of the haematopoietic cytokines and their receptors is characterized by the duplication of both cytokines and receptors. A systematic analysis of primary sequence homology indicates that receptors for gp130-associated cytokines group into signal transducing and non-signal transducing receptors. This observation is consistent with the evolution of the interleukins 6, 11 and 12, granulocyte colony stimulating factor (G-CSF), leukemia inhibitory factor (LIF), oncostatin M, and the ciliary neurotrophic factor complexes from a common ancestral complex which included a homodimer of gp130-like signalling receptors and an interleukin 6 receptor-like non-signalling receptor. Alterations in the components of the complex are proposed to have arisen by receptor duplication and divergence to allow signal transduction via a LIF receptor/gp130 heterodimer, and loss of the non-signalling receptor component in the G-CSF and the LIF lineage. The short-chain haematopoietins and their receptors do not group clearly, although interleukins 4 and 13 grouped together, as did 2 and 10. Internal duplication of the ligand-binding domain appears to have occurred independently in three separate lineages. These observations have implications for the classification of cytokines and receptors, and for the modelling by homology of their structures and interactions.

© 1995 Academic Press Limited

From the 1Department of Genetics, 2Biotechnology Institute, and 3National Pharmaceutical Biotechnology Centre, Trinity College, Dublin 2, Ireland
Correspondence to: Dr Denis Shields
Received 30 January 1995; accepted for publication 3 May 1995
© 1995 Academic Press Limited 1043-4666/95/070679110 $12.00/0
KEY WORDS: cytokine/evolution/haematopoietin/interleukin 6/receptor
CYTOKINE, Vol. 7, No. 7 (October), 1995: pp 679–688

Cytokines are signal molecules which are involved in a range of physiological processes including haematopoiesis, cell differentiation and the initiation, maintenance and modulation of host defence mechanisms. They are usually secreted by one cell type to bind a signal transducing receptor complex on a target cell type, thereby eliciting responses such as cell activation, proliferation, maturation, differentiation, adhesion, migration, and the increased synthesis of acute phase proteins. In addition, cytokines may induce the production of other cytokines and their antagonists and receptors, thereby contributing to the genesis of a cytokine signalling ‘network’ during host defence. Individual cytokines may have multiple functions, some of which are unique and some of which are shared with other cytokines.1 The consequent complexity of the cytokine ‘network’ reflects a requirement for particular combinations of cytokines to meet different physiological challenges.

Seven cytokine families with distinct tertiary structures have been identified.2 The haematopoietins, which have a distinctive core 4-α-helix bundle, were first grouped together because most of their receptors have an extracellular haematopoietin-binding domain that contains a conserved WSXWS motif and shared cysteine residues.3 These cytokines have been sub-classified into two groups according to the length of their core helices:2 a short-chain group where in some cases the cytokine functions as a dimer [granulocyte-macrophage colony stimulating factor (GM-CSF), interleukins (IL) 2, 3, 4, 5, 7, 9, 10, 13] and a long-chain group, where the cytokine is only known to act monomerically [growth hormone (GH), prolactin (PRL), thrombopoietin (THPO), erythropoietin (EPO), IL-6, IL-11, ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), oncostatin M (OSM), IL-12, and myelomonocytic growth factor (MGF)].

The current model for signal transduction requires the formation of a homodimer or heterodimer of two receptor molecules with long cytoplasmic domains.4 Other receptor molecules lacking such a domain may also be present in certain complexes, so that as many as three distinct receptor molecules may be involved in, and required for, signal transduction.5

The evolution of the genes encoding the current spectrum of cytokines and receptor complexes involved multiple duplications from a smaller set of genes, followed by divergence of sequence and product function. A receptor component may be common to more than one complex, indicating that the generation of new complexes has been achieved by altering only some of the components of the complex. Thus IL-6, IL-11, CNTF, OSM and LIF share the gp130 receptor;5 IL-2, IL-4, IL-7 and IL-13 share the γc receptor,6 and IL-3, IL-5 and GM-CSF complexes share the KH97 receptor.5 A recent example of the continuation of such evolutionary processes in current mammalian lineages is the presence in mouse of two signal transducing subunits related to human KH97, one of which is used solely by IL-3.7 The functional redundancy and pleiotropy of cytokines in higher mammals8 is consistent with their evolution from counterparts in simpler organisms with fewer signalling complexes. To characterize this evolutionary process, we have carried out a systematic analysis of sequence similarities among cytokines and their receptors, and developed an appropriate evolutionary model.

RESULTS AND DISCUSSION

The conservation of key features in short and long-chain haematopoietin structures supports the hypothesis that their structural similarities reflect evolution from a common ancestor rather than convergence to an advantageous structure.2 Thus the diversity of haematopoietin complexes probably arose through duplication of all or some of their elements combined with changes in the composition of particular complexes. Two approaches were taken to characterizing similarities among the cytokines and their receptors. The first, profile searching, identifies similarities which provide useful overall groupings. However, these do not definitively specify which molecules are most closely related, as a protein whose sequence remains relatively unchanged due to a slow rate of evolution will appear to be more closely related to many molecules than will a rapidly evolving protein. Therefore, sequences were multiply aligned, and their phylogeny inferred by constructing trees using appropriate sequences as outgroups.

Cytokines

For each cytokine profile searched, Table 1 lists in order of matching score all the sequences in the augmented SwissProt 29 database which had a Z-score of 5.0 or more. On the basis of these searches, the cytokines group into the following approximate families: (1) The IL-6 group: G-CSF, MGF, IL-6, IL-11, OSM, CNTF, LIF, (IL-12A?); (2) EPO and thrombopoietin (THPO); (3) GH and PRL and related proteins; and (4) IL-2 and IL-10. Matches which cut across these groups (e.g. the identification of PRL by the IL-11 profile) were observed, some of which extend beyond the haematopoietins (e.g. matches of interferon by the IL-3 profile). While some other members of the short-chain haematopoietins recognized each other, the matches were weak and inconsistent. For the IL-10 search, the mean Z-score over IL-2 species was higher than the mean score obtained for any other cytokine, and similarly the IL-2 profile identified IL-10 as the nearest cytokine. In the same way IL-4 and IL-13 recognized each other, although the former showed some similarity to IL-7 and the latter some similarity to IL-3 (Table 1). Other groupings were weaker.

The IL-6 grouping did not emerge clearly from the initial searches: in particular, IL-12A was only identified by G-CSF. However, the initial grouping, including IL-12A, was confirmed by incrementally adding new proteins to an expanding family. Starting with IL-6, the profile progressively identified as most similar: MGF, IL-11, CNTF, OSM, LIF, and finally IL-12A. A combined profile of EPO and THPO failed to identify any further family members; neither did a combined profile of IL-2 and IL-10 clearly identify any closely related cytokine.

To construct a tree of the IL-6 group of cytokines, growth hormone (GH) was chosen as a more distantly related signal molecule to act as an outgroup to root the tree. The alignment, based partly on the known tertiary structures of G-CSF, LIF and GH, was restricted to the four α-helices (Fig. 1). Although this is likely to minimize errors of alignment, it includes only 109 residues, none of which are completely conserved across all sequences. Some of the IL-6 and many of the GH sequences were excluded, to eliminate pairwise distances which were too great for Kimura’s correction for multiple hits. The inferred tree topology is only weakly supported at most branch-points (Fig. 2). The topology is not inconsistent with previous classification of these cytokines9 into an IL-6 subgroup (IL-6, G-CSF, MGF and IL-11) and a LIF subgroup (LIF, OSM, IL-12A). However, the positioning of IL-11 in particular is weakly supported. IL-12A was omitted from this tree, as it was not possible to reliably align it (when included it was placed outside the rest of the gp130 group; not shown).

Receptors

Receptor (R) profile search results are presented in Table 2. Groupings were less clear than for cytokines; certain receptors such as PRLR were identified by many profiles, suggesting that they are more conserved. The following approximate families were suggested: (1) CNTFR, IL-11R, IL-6R [IL-12 cytokine receptor-like moiety (IL-12B)] (2) GP130, G-CSFR


TABLE 1. Proteins identified by profile searches with cytokines (Z-score of 5.0 or greater). Proteins used in the profile are given in italics.

Interleukin 2 IL2_SHEEP IL2_MOUSE

IL2_BOVIN ––

IL2_PIG *IL10_CEY

IL2_CAPHI IL2_HUMAN IL10_HUMAN *IL10_MAC

IL2_GIBBON

IL2_RAT

*IL2_CAT

Interleukin 3 IL3_GIBBON IL3_RAT

IL3_HYLLA TP13_BOVIN

IL3_HUMAN TP12_BOVIN

IL3_MACMU *IL3_CJ INA2_MOUSE

IL3_SHEEP

*IL3_COW

IL3_MOUSE

IL4_HUMAN IGF2_HUMAN

*IL4_CEU IL7_MOUSE

IL4_BOVIN *IL4_MAC IL13_HUMAN

IL4_SHEEP

IL4_RAT

IL4_MOUSE

Interleukin 5 IL5_HUMAN

*IL5_CEY

*IL5_RAT

IL5_MOUSE

Interleukin 6 IL6_PIG MGF_CHICK –

IL6_BOVIN CSF3_MOUSE *IL11_MOUSE

IL6_SHEEP CSF3_CANFA IL11_MACFA

IL6_HUMAN *IL6_MAC *CSF3_SHEEP CSF3_BOVIN ––––––

*IL6_CEY CSF3_HUMAN

IL6_MOUSE –

IL6_RAT IL11_HUMAN

Interleukin 7 *IL7_SHEEP

IL7_BOVIN

IL7_HUMAN

IL7_MOUSE

––––

Interleukin 9 IL9_MOUSE

IL9_HUMAN

–– – – – – – – – – –

Interleukin 10 *IL10_MAC *IL10_PIG

*IL10_CEY BCRF_EBV

*IL10_SHEEP IL2_PIG

*IL10_COW

IL10_HUMAN

IL10_RAT

*IL10_CEY

IL10_MOUSE

Interleukin 11 IL11_HUMAN – IL6_PIG

IL11_MACFA CSF3_HUMAN IL6_MOUSE

*IL11_MOUSE CNTF_CHICK

MGF_CHICK CSF3_CANFA

CSF3_BOVIN –

PRL_HUMAN *CSF3_SHEEP

IL6_BOVIN – –– –

IL6_SHEEP *CSF2_RAT

Interleukin 12, alpha subunit I12A_HUMAN *I12A_MOUSE



SOMV_HUMAN ––––

Interleukin 13 IL13_MOUSE

Interleukin 4 IL4_PIG –

*IL13_RAT

––––

INS_CERAE

*IL3_COW

IL13_HUMAN

––

Myelomonocytic growth factor (MGF) MGF_CHICK CSF3_CANFA IL6_PIG IL6_HUMAN IL6_RAT –

CSF3_BOVIN IL6_MOUSE PRL_HYPNO

*CSF3_SHEEP CSF3_HUMAN CSF3_MOUSE IL6_BOVIN IL6_SHEEP IL11_HUMAN *I12A_MOUSE –

*IL6_CEY IL11_MACFA

*IL6_MAC *IL11_MOUSE

Ciliary neurotrophic factor (CNTF) CNTF_HUMAN CNTF_RABIT ONCM_HUMAN MGF_CHICK

CNTF_RAT CSF3_BOVIN

CNTF_CHICK CSF3_HUMAN IL11_HUMAN CSF3_CANFA *CSF3_SHEEP – –

IL11_MACFA *IL11_MOUSE

PRL_PROAT –

Leukemia inhibitory factor (LIF) LIF_RAT LIF_MOUSE

LIF_HUMAN

––

Oncostatin M (OSM) ONCM_HUMAN CNTF_CHICK

CSF3_BOVIN

––

Granulocyte-macrophage colony-stimulating factor (GM-CSF) CSF2_SHEEP *CSF2_PIG CSF2_BOVIN EPO_HUMAN

CSF2_HUMAN *CSF2_CANFA

CSF2_MOUSE

*CSF2_RAT

PRO1_PHYPO

Granulocyte colony-stimulating factor (G-CSF) CSF3_HUMAN *CSF3_SHEEP CSF3_CANFA IL6_SHEEP IL6_HUMAN *IL6_CEY CNTF_CHICK *I12A_MOUSE

CSF3_BOVIN *IL6_MAC

MGF_CHICK IL11_MACFA

IL6_PIG ONCM_HUMAN

IL6_BOVIN IL6_MOUSE

Prolactin (PRL) PRL_(21 species) PRO2_MOUSE

CSF3_MOUSE IL11_HUMAN

(74 other PRL/GH related proteins) PRO3_MOUSE INB2_BOVIN ––

PRO1_MOUSE

Growth hormone (GH) SOMA (41 species) PRO2_MOUSE PROR_MOUSE

(51 other PRL/GH related proteins) PRO3_MOUSE

PRO1_MOUSE

Erythropoietin (EPO) EPO_HUMAN *THPO_MOUSE

EPO_FELCA

EPO_MOUSE

EPO_SHEEP

EPO_RAT

EPO_CANFA

*THPO_HUMAN

– ––––––

EPO_FELCA EPO_MACFA

EPO_MOUSE ––––––

EPO_CANFA INS_CANFA



EPO_RAT

PROR_MOUSE

EPO_MACFA –––

Thrombopoietin (THPO) *THPO_MOUSE *THPO_HUMAN EPO_SHEEP EPO_HUMAN

*: Sequence obtained from GenBank 84; others from SwissProt 29. –: False positive (no identified relationship to cytokines).

(3) LIFRN (N-terminal domain), LIFRC (C-terminal domain) (4) KH97N, KH97C, mouse AIC2AN, AIC2AC (5) EPOR, MPLN, MPLC (6) GHR, PRLR (7) GMCR, IL-3Rα, IL-5Rα.

The tree constructed for the gp130 group was based on a mainly automatic alignment of the haematopoietin-binding domains (Fig. 3), and may include errors undetectable by eye. The IL-12R, IL-12B, and LIFR N-terminal domain were omitted from the tree because of their level of divergence and difficulties in alignment.


Figure 1. Alignment of the four α-helices of the gp130 cytokines with GH.

When they were added individually (not shown), LIFRN grouped with LIFRC, IL-12R grouped clearly with the signal-transducing receptors, and IL-12B was clearly grouped with the non-transducing receptors, but outside the other three. The main finding, that the receptors fall into signal-transducing and non-signal-transducing groups (Fig. 4), is supported by a bootstrap of 99%.

Co-evolution of gp130-group cytokines and receptors

The inferences concerning receptor and cytokine evolution can be combined into a single co-evolutionary model (Fig. 5). The approach used here was to develop a model which is consistent with both the phylogenies of cytokines and receptors under consideration here, and our current understanding of cytokine biology. Note that receptors which have not been fully characterized10,11 are shown for the OSM and IL-12 complexes in our model, and that it is likely that further as yet unidentified receptors exist.

Figure 2. Neighbour-joining tree of gp130-group cytokines with GH as an outgroup, based on the alignment in Figure 1, excluding positions with any gaps. The % bootstraps supporting each main branching are shown.
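The distance and resampling steps behind a tree of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "Kimura's correction for multiple hits" refers to Kimura's empirical protein formula d = -ln(1 - p - 0.2p^2), where p is the fraction of differing residues, and the function names and toy sequences are hypothetical. The formula has no real value once p exceeds roughly 0.85, which is one way pairwise distances can be "too great" for the correction.

```python
import math
import random


def strip_gapped_columns(alignment):
    """Keep only alignment columns in which no sequence has a gap ('-')."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def kimura_distance(seq_a, seq_b):
    """Kimura's empirical protein distance, d = -ln(1 - p - 0.2 * p**2).

    Returns None when the expression inside the logarithm is not positive,
    i.e. when the sequences are too divergent for the correction.
    """
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        return None
    return -math.log(inner)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


# Hypothetical toy alignment; real input would be the helix alignment of Fig. 1.
toy = {"A": "MKT-LLVA", "B": "MKTALLVA", "C": "MRT-LIVG"}
ungapped = strip_gapped_columns(toy)
distances = {
    (x, y): kimura_distance(ungapped[x], ungapped[y])
    for x in ungapped for y in ungapped if x < y
}
replicate = bootstrap_alignment(ungapped, random.Random(0))
```

Each bootstrap replicate would be passed through the same distance and tree-building steps, and the percentage of replicates recovering a given branching is the support value reported on the tree. The neighbour-joining step itself is omitted here.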

The model is intended to be parsimonious, employing as few changes in overall complex composition as possible, thereby postulating ancestral complexes for which there are present-day analogues. It is based in part on the assumption that existing functional complexes did not recruit additional receptor molecules from other complexes (e.g. the CNTF complex did not acquire an IL-6R-like receptor into a complex previously involving only gp130 and LIFR receptors). Many of the branchings in this tree require duplication and divergence of both a cytokine gene and receptor gene(s) to form a novel complex. Unless the ancestral cytokine and receptor genes were closely linked these would have occurred as separate events. Although it would be desirable to incorporate the order of duplication events, this is not possible, so the model portrays the simplest case, i.e. that of simultaneous duplication of cytokine and receptor. As the rates of amino acid substitution in ancestral cytokines can be very different from the rates of change observed between mammalian lineages,12 no assumptions about rate constancy were made.

For the gp130-group receptor tree, growth hormone/prolactin is assumed to form an outgroup. This is consistent with the immunoglobulin-like (Ig) domain being acquired prior to the divergence of the signalling and non-signalling receptors (Fig. 5), with the subsequent acquisition of fibronectin-like domains in the signal transducing receptors. While the complete composition of the IL-12 complex is uncertain,11 it is likely to have branched off at this early stage from the other complexes, and undergone substantial re-organization, including loss of the Ig domain, and incorporation of the non-signalling receptor component into the cytokine.13 The IL-11 complex is placed outside all other gp130 complexes, on the basis of the receptor tree, which places IL-11R outside CNTFR (Fig. 4). The positioning of IL-11 in the cytokine tree (Fig. 2) is inconsistent with this interpretation. However, this placement is only poorly supported by a bootstrap of 45%: the receptor branching suggests that it might in fact branch outside the other cytokines in this tree. Next, the divergence of the LIF subgroup involved the loss of two introns,9 and the for


TABLE 2. Proteins identified by profile searches with the haematopoietin-binding domain of receptors

CNTF receptor (CNTFR) PLR2_MOUSE PLR2_HUMAN *LIFRN_HUMAN

CNTR_HUMAN PLR1_MOUSE *GP130_HUMAN –

*CNTR_RAT IL6R_RAT *GP130_MOUSE *MPLN_MOUSE

*I11R_MOUSE PLR_CHICK *MPLC_VIRUS

IL6R_MOUSE I12B_HUMAN –––––

IL6R_HUMAN *MPLC_MOUSE *LIFRC_HUMAN

KH97 receptor (common to IL-3, IL-5, GM-CSF), N-terminal domain (KH97N) *CYRBN_HUMAN *CYRBN_MOUSE *IL3BN_MOUSE *IL3BC_MOUSE *CYRBC_MOUSE *MPLC_MOUSE CYRG_MOUSE CYRG_HUMAN *MPLN_MOUSE IL9R_MOUSE *MPLC_VIRUS PLR_RAT *CYRBC_HUMAN *MPLN_VIRUS GHRL_MOUSE *LIFRC_HUMAN PLR1_MOUSE

KH97 receptor (common to IL-3, IL-5, GM-CSF), C-terminal domain (KH97C) *IL3BC_MOUSE *CYRBC_MOUSE *CYRBC_HUMAN *IL3BN_MOUSE CYRG_HUMAN *MPLC_VIRUS EPOR_MOUSE EPOR_HUMAN *LIFRC_MOUSE PLR_RAT PLR1_MOUSE PLR2_MOUSE IL4R_MOUSE *LIFRN_HUMAN IL5R_MOUSE IL2B_MOUSE IL2B_HUMAN GP130_HUMAN – IL4R_HUMAN

γc receptor (common to IL-2, IL-4, IL-7, IL-9 and IL-13) CYRG_HUMAN CYRG_MOUSE GMCR_HUMAN *CYRBC_HUMAN IL4R_MOUSE *MPLN_MOUSE *MPL_VIRUS PLR_CHICK *MPL_MOUSE

PLR_RAT PLR2_RABIT GMCR_HUMAN

IL9R_HUMAN PLR2_MOUSE

*CYRBN_HUMAN GMCR_HUMAN *MPLN_VIRUS IL3A_MOUSE IL9R_MOUSE

*CYRBN_MOUSE IL3R_HUMAN *LIFRC_HUMAN *MPLN_MOUSE *LIFRN_MOUSE

*MPLC_MOUSE CYRG_MOUSE IL5R_HUMAN IL9R_HUMAN –––

*CYRBN_MOUSE *IL3BC_MOUSE CYRB_HUMAN *LIFRC_MOUSE

*MPLC_VIRUS PLR1_MOUSE *LIFRC_HUMAN –

*IL3BN_MOUSE PLR2_MOUSE IL2B_HUMAN *LIFRN_HUMAN

*MPLC_MOUSE PLR_RAT *MPLN_VIRUS IL5R_MOUSE

*CYRBN_HUMAN IL5R_HUMAN *CYRBC_MOUSE GHR_CHICK

EPO receptor (EPOR) EPOR_HUMAN EPOR_MOUSE *CYRBC_HUMAN PLR2_MOUSE GHRL_MOUSE PLR_CHICK CYRB_MOUSE GHR_PIG *CYRBN_HUMAN IL5R_HUMAN

*MPLN_MOUSE PLR1_MOUSE CYRB_HUMAN GHR_RABIT GMCR_HUMAN

*MPLN_VIRUS LIFRC_HUMAN PLR2_RABIT CYRG_HUMAN GHRH_MOUSE

*MPLC_VIRUS LIFRC_MOUSE *LIFRN_HUMAN IL2B_HUMAN ––

*MPLC_MOUSE PLR_RAT PLR2_HUMAN IL9R_HUMAN *CYRBN_MOUSE

*IL3BC_MOUSE *CYRBC_MOUSE *LIFRN_MOUSE IL7R_MOUSE ––––

G-CSF receptor (G-CSFR) *GCSR_HUMAN PLR2_HUMAN CNTR_HUMAN CYRG_MOUSE *MPLC_MOUSE

*GCSR_MOUSE PLR2_RABIT *CNTR_RAT *MPLN_MOUSE –

*GP130_MOUSE *LIFRC_HUMAN *IL3BC_MOUSE GHRL_MOUSE *IL3BN_MOUSE

*GP130_HUMAN PLR_CHICK *CYRBN_MOUSE *MPLC_VIRUS –

PLR_RAT IL6R_MOUSE *MPLN_VIRUS –

PLR2_MOUSE IL6R_RAT – *CYRBN_HUMAN

PLR1_MOUSE *LIFRC_MOUSE IL6R_HUMAN ––

GH receptor (GHR) GHRL_MOUSE PLR2_MOUSE *MPLC_MOUSE CNTR_HUMAN *CYRBN_HUMAN

GHR_RABIT PLR1_MOUSE *MPLN_MOUSE *CNTR_RAT IL6R_MOUSE

GHR_PIG PLR_RAT *I11R_MOUSE *CYRBN_MOUSE *MPL_MOUSE

GHR_HUMAN PLR_CHICK *IL3BC_MOUSE EPOR_HUMAN –

GHR_RAT PLR2_RABIT *MPLN_VIRUS CYRG_MOUSE

GHRH_MOUSE PLR2_HUMAN *IL3BN_MOUSE *MPL_VIRUS

GHR_CHICK *MPLC_VIRUS *LIFRC_MOUSE

GM-CSF receptor (GM-CSFR) GMCR_HUMAN *LIFRC_HUMAN IL9R_HUMAN IL9R_MOUSE

IL3R_HUMAN *IL3BN_MOUSE PLR1_MOUSE CNTR_HUMAN

IL5R_HUMAN CYRG_MOUSE PLR2_MOUSE

IL5R_MOUSE *LIFRC_MOUSE *IL3BC_MOUSE *LIFRN_HUMAN – EPOR_MOUSE

CYRG_HUMAN *LIFR_MOUSE ––

IL3A_MOUSE PLR_RAT *CYRBN_HUMAN

PLR_RAT PLR2_RABIT IL6R_MOUSE *LIFR_HUMAN CYRG_MOUSE *CYRBC_MOUSE

*GCSR_HUMAN PLR_CHICK *MPLC_VIRUS EPOR_MOUSE IL2B_MOUSE

*LIFRC_HUMAN *LIFRN_HUMAN GHRL_MOUSE *MPLN_MOUSE

gp130 receptor (common to IL-6, IL-11, G-CSF, LIF, OSM, CNTF) *GP130_HUMAN *GP130_MOUSE PLR2_MOUSE *GCSR_MOUSE *LIFRC_MOUSE *IL3BC_MOUSE CNTR_HUMAN *CNTR_RAT *I11R_MOUSE *CYRBN_MOUSE IL6R_RAT *MPLC_MOUSE SOMV_HUMAN IL2B_RAT *CYRBC_HUMAN CYRG_HUMAN – *IL3BN_MOUSE

IL-11 receptor (IL-11R) *I11R_MOUSE PLR1_MOUSE I12B_HUMAN *CYRBN_HUMAN

*CNTR_RAT *LIFRC_HUMAN *LIFRN_HUMAN

CNTR_HUMAN PLR_CHICK *MPLC_MOUSE

IL6R_MOUSE IL6R_HUMAN *LIFRC_MOUSE IL6R_RAT GHRL_MOUSE *LIFRN_MOUSE

PLR_RAT PLR2_RABIT *GP130_HUMAN

PLR2_MOUSE PLR2_HUMAN *MPLC_VIRUS

*CNTR_RAT *MPLC_MOUSE ––

*I11R_MOUSE –

IL6R_MOUSE PLR1_MOUSE

INB_HORSE PLR_RAT

*MPLN_VIRUS *LIFRN_HUMAN

*IL3BC_MOUSE IL2B_HUMAN

*MPLN_MOUSE *MPLC_VIRUS – –––– GHRL_MOUSE

*MPLN_VIRUS –

*CYRBC_MOUSE

IL2B_RAT *LIFRC_MOUSE *MPLC_VIRUS –– –– –

IL2B_HUMAN *IL3BC_MOUSE IL9R_HUMAN CYRBN_MOUSE *MPLN_VIRUS *IL3BN_MOUSE *CYRBC_HUMAN *MPLC_MOUSE CYRG_MOUSE

*CYRBC_MOUSE *LIFRC_HUMAN IL9R_MOUSE CYRG_HUMAN – EPOR_MOUSE

IL3R_HUMAN INS_ZAODH – – ––––

GMCR_HUMAN –

– – –– – – – – –– INS_KATPE

IL-12 cytokine β-subunit (receptor-like) I12B_HUMAN CNTR_HUMAN *MPLC_VIRUS *LIFRC_MOUSE – *IL3BC_MOUSE

IL-12 receptor *I12R_HUMAN *MPLC_MOUSE

IL-2 β receptor (IL-2Rβ) IL2B_MOUSE *CYRBN_HUMAN *MPLN_MOUSE –

IL-3 receptor (IL-3R) IL3A_MOUSE – – ––– – INS1_BATSP

IL-4 receptor (IL-4R) IL4R_MOUSE CYRG_MOUSE –– *GP130_HUMAN

PLR1_MOUSE IL6R_HUMAN PLR2_HUMAN *CYRBN_HUMAN *MPLN_VIRUS –––––

IL5R_HUMAN INS_HYSCR

IL6R_HUMAN PLR2_MOUSE

IL5R_MOUSE – – – – – – – – – – ––

IL4R_HUMAN *CYRBN_HUMAN *IL3BC_MOUSE *IL3BN_MOUSE *CYRBC_HUMAN CYRG_HUMAN *LIFRC_MOUSE *LIFRC_HUMAN *GP130_MOUSE *MPLC_VIRUS PLR_RAT IL2B_HUMAN –

INS_GADCA – –– –– –

*CYRBN_MOUSE *CYRBC_MOUSE –– *MPLC_MOUSE IL9R_MOUSE –


TABLE 2. Continued

IL-5 receptor (IL-5R) IL5R_MOUSE CYRG_MOUSE *CYRBN_MOUSE

IL5R_HUMAN GMCR_HUMAN *CYRBC_MOUSE *MPLN_MOUSE *CYRBC_HUMAN PLR2_MOUSE

IL3R_HUMAN *MPLC_VIRUS CYRG_HUMAN *IL3BN_MOUSE PLR1_MOUSE –––

*IL3BC_MOUSE PLR_RAT EPOR_MOUSE

*MPLC_MOUSE *CYRBN_HUMAN IL3A_MOUSE

IL6R_RAT PLR1_MOUSE PLR2_HUMAN CYRG_MOUSE

IL6R_HUMAN *GP130_MOUSE *LIFRN_HUMAN *MPLC_MOUSE

CNTR_HUMAN PLR2_RABIT *LIFRC_MOUSE *MPLN_VIRUS

*I11R_MOUSE *GP130_HUMAN I12B_HUMAN INS_MACFA

PLR_RAT *LIFRC_HUMAN GHRL_MOUSE INB1_BOVIN

IL-7 receptor (IL-7R) IL7R_MOUSE IL9R_MOUSE *CYRBN_M

IL7R_HUMAN *LIFRN_HUMAN EPOR_HUMAN

PLR_RAT *IL3BC_MOUSE –––

PLR2_MOUSE PLR1_MOUSE *CYRBC_HUMAN *LIFRC_MOUSE

*CYRBN_HUMAN *CYRBC_MOUSE *MPLN_VIRUS *MPLC_MOUSE

IL-9 receptor (IL-9R) IL9R_MOUSE IL2B_HUMAN IL2B_RAT EPOR_HUMAN

IL9R_HUMAN *MPLC_MOUSE *LIFRC_MOUSE PLR2_MOUSE

*IL3BN_MOUSE *CYRBC_MOUSE *LIFRC_HUMAN –

*CYRBN_HUMAN *LIFRN_HUMAN *CYRBC_HUMAN CYRB_MOUSE

*CYRBN_MOUSE IL2B_MOUSE *MPLN_MOUSE PLR1_MOUSE

*IL3BC_MOUSE *MPLN_VIRUS *LIFRN_MOUSE PLR_RAT

*MPLC_VIRUS CYRB_HUMAN GMCR_HUMAN

*MPLC_MOUSE



*LIFRC_MOUSE

IL-6 receptor (IL-6R) IL6R_MOUSE PLR2_MOUSE *GCSR_MOUSE GCSR_HUMAN –––

LIF receptor N-terminal domain (LIFRN) *LIFRN_MOUSE LIFRN_HUMAN

*LIFRC_HUMAN – –

*CNTR_RAT PLR_CHICK – INS_CERAE

LIF receptor C-terminal domain (LIFRC) *LIFRC_MOUSE *LIFRC_HUMAN *MPLC_VIRUS *LIFRN_HUMAN EPOR_MOUSE PLR_CHICK *LIFRN_MOUSE –

PLR_RAT *GP130_MOUSE –

PLR2_MOUSE PLR1_MOUSE *I11R_MOUSE *IL3BC_MOUSE *CYRBN_HUMAN *GP130_HUMAN EPOR_HUMAN – *MPLC_MOUSE

PRL receptor (PRLR) PLR2_MOUSE GHR_RABIT CNTR_HUMAN *GCSR_MOUSE *MPLN_VIRUS *IL3BN_MOUSE IL7R_MOUSE

PLR_RAT GHR_CHICK *MPLC_VIRUS IL6R_HUMAN *CYRBN_MOUSE CYRG_MOUSE

PLR2_RABIT GHR_PIG *GP130_HUMAN *GP130_MOUSE EPOR_HUMAN *CYRBC_MOUSE

PLR2_HUMAN GHR_HUMAN *LIFRC_MOUSE IL6R_MOUSE IL6R_RAT IL5R_MOUSE

PLR_CHICK GHRH_MOUSE *LIFRC_HUMAN EPOR_MOUSE *MPLN_MOUSE I12B_HUMAN

GHRL_MOUSE *I11R_MOUSE *MPLC_MOUSE *LIFRN_HUMAN *CYRBN_HUMAN IL9R_MOUSE

Thrombopoietin receptor (MPL) C-terminal domain (MPLC) *MPLC_VIRUS *MPLC_MOUSE *MPLN_VIRUS EPOR_HUMAN *IL3BC_MOUSE EPOR_MOUSE *LIFRC_MOUSE *IL3BN_MOUSE *CNTR_RAT *CYRBC_HUMAN GHR_RABIT CNTR_HUMAN

*MPLN_MOUSE CYRG_MOUSE *LIFRC_HUMAN PLR_CHICK

PLR_RAT PLR2_MOUSE *CYRBN_MOUSE CYRG_HUMAN GHR_PIG *LIFRN_MOUSE GHR_HUMAN PLR2_RABIT

PLR1_MOUSE *CYRBN_HUMAN GHRL_MOUSE IL9R_HUMAN

Thrombopoietin receptor (MPL) N-terminal domain (MPLN) *MPLN_MOUSE *MPLN_VIRUS *MPLC_VIRUS *CYRBC_HUMAN –– PLR2_MOUSE

EPOR_HUMAN *MPLC_MOUSE *CYRBN_HUMAN PLR1_MOUSE

PLR1_MOUSE GHR_RAT *CNTR_RAT *IL3BC_MOUSE *GCSR_HUMAN IL7R_HUMAN –

EPOR_MOUSE –––

GMCR_HUMAN PTPD_HUMAN –

*CYRBN_MOUSE

mation of a gp130-LIFR heterodimer; LIFR has also undergone an internal duplication of the haematopoietin-binding domain. The remaining events include the loss of the non-signalling receptor molecule in both the G-CSF and LIF lineages, and the replacement of the CNTF receptor cytoplasmic and trans-membrane domains by glycosylphosphatidylinositol linkage to the membrane.14

Co-evolution of small-chain cytokine complexes

When the small-chain receptors were aligned together and a tree drawn using PRLR as an outgroup, IL-3Rα and GM-CSFR were strongly supported as a group, and IL-5Rα was well supported as lying with this group. However, other branches of the tree were only poorly supported, so that no general pattern of short-chain cytokine receptor evolution could be inferred (tree not shown). While saturation of substitutions at variable sites and poor alignment may be the principal obstacles to inferring a definitive tree, a simultaneous radiation of receptors from a single root could also explain the indeterminacy of the major branchings. The internal duplication of the haematopoietin-binding domain in KH97 probably occurred following its divergence from all other cytokine receptors, assuming no convergent evolutionary pressures (Table 2).

The alignment of IL-2 and IL-10 reveals little similarity, except for the two cysteines involved in a disulphide bond in IL-2, which have counterparts in IL-10. It is possible that this similarity is fortuitous, as the IL-10 receptor belongs to the interferon family15 which is only distantly related to the haematopoietin family,3 while the IL-2 receptors belong to the haematopoietin family and the complement receptor family.16 Taking the cytokine and receptor evidence together, the groupings suggested are: IL-2/IL-10; IL-4/IL-13; and IL-3/GM-CSF/IL-5. This does not conflict with the previous classification9 into groups consisting of IL-2 on its own, IL-7 with IL-9, and a large group with IL-3, IL-4, IL-5, GM-CSF, IL-13.

Conclusions

Within the gp130 group, the receptors which lack the long cytoplasmic domains implicated in signal transduction group together, indicating that the addition of non-signalling receptors to the complex has only occurred once. A similar pattern is seen in the KH97 group, where the three non-signalling receptors group together. Based primarily on these observations, it is proposed that the ancestral complex of the gp130 group included IL-6R-like and gp130-like receptors. Evolution of the complexes is characterized by the generation of receptors with specialized sites of expression. Evolution of the G-CSF and the LIF complexes has apparently involved the loss of a non-signalling receptor.

The internal duplication of the haematopoietin-binding domain within receptors appears to have occurred independently on three occasions, in the MPL, LIFR and KH97 signal-transducing receptors. This represents an unusual convergence in domain shuffling which clearly has some adaptive advantage. It has been proposed that the additional LIFR domain binds a third receptor binding site on LIF,17 and it is possible that it thereby substitutes for the non-signalling receptor which we propose was lost at some stage in the evolution of the LIF complex.

The evolutionary models presented here have consequences for homology-based modelling of cytokine-receptor interactions. Their validity should become evident as the constituents of incompletely characterized complexes and complexes in other species become known. While the haematopoietin cytokine and receptor families share a common origin, the composition and function of the original ancestral complex are unknown. The identification of complexes in lower organisms, in which the requirements for cytokine-mediated signalling may have remained unchanged since the divergence of the ancestors of higher orders, may provide evidence of the nature of the ancestral complex.

Figure 3. Alignment of gp130-group cytokine receptors with PRLR. Conserved residues are shaded.

Figure 4. Neighbour-joining tree of gp130-group receptors with PRLR as an outgroup, based on the alignment in Figure 3, excluding positions with any gaps.


Figure 5. An evolutionary model for the gp130-group complexes.

MATERIALS AND METHODS

Sequences

Sequences were drawn from SwissProt 29. The naming convention of this database was followed in referring to individual sequences (see Tables 1 and 2 for a description of sequences). For clarity in the text and figures, GH replaces SOMA, KH97 replaces CYRB, CNTFR replaces CNTR, OSM replaces ONCM, G-CSFR replaces GCSR, AIC2A replaces IL3, IL-12A replaces I12A, IL-12B replaces I12B, PRLR replaces PLR, and γc replaces CYRG.

The following additional sequences were derived from GenBank 84 (accession numbers are given in parentheses): CNTR_RAT (S54212), CSF1_RAT (M84261), CSF2_CANFA (S49738), CSF2_PIG (D21074), CSF2_RAT (U00620), CSF3_SHEEP (L07939), GCSR_HUMAN (M59818), GCSR_MOUSE (M58288), GP130_HUMAN (M57230), GP130_MOUSE (M83336), I12A_MOUSE (M86672), I12R_HUMAN (U11767), IL10_CEY (L2630), IL10_COW (U00799), IL10_MAC (L26029), IL10_PIG (L20001), IL10_SHEEP (U11421), IL11_MOUSE (U03421), IL13_RAT (L26913), IL2_CAT (L25408), IL2_GIBON (M11144), IL3_CJ (X74877), IL3_COW (L31893), IL3_GIBON (M14744), IL4_CEU (L07081), IL4_MAC (L26027), IL5_CEY (L26033), IL5_RAT (X54419), IL6_CEY (L26032), IL6_MAC (L26028), IL7_SHEEP (U10089), IL8_GUINEA (L04986), LIFR_HUMAN (X61615), LIFR_MOUSE (D17444), MPL_MOUSE (X73677), MPL_VIRUS (M90102), THPO_HUMAN (L36052), THPO_MOUSE (L34169). Additionally, the sequence of IL11R was obtained from Hilton et al.18 A database for profile-searching was constructed by combining SwissProt 29 with the above additional sequences.

While the cytokine sequences consist of a single 4-α-helix domain and are all of approximately the same size, the receptors vary greatly in size and domain organization. The receptor analysis was therefore limited to the region of the haematopoietin-binding domain,3 i.e. the sequence between the first of the four conserved cysteines and the WSXWS motif, together with an additional eight residues to either side, so long as these octapeptides could be aligned reliably for sequences from different species (see Fig. 3). For the three receptors (MPL, LIFR and KH97) that have duplications of the haematopoietin-binding domain,5 each domain was treated separately and added to the database with the postscript N for the N-terminal domain, and C for the C-terminal domain (e.g. LIFRN and LIFRC).
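The domain delimitation described above can be illustrated with a short sketch. It is only an approximation under stated assumptions: the position of the first conserved cysteine is supplied by the caller (in practice it would come from an alignment), the function name `binding_domain` and the toy sequence are hypothetical, and the check that the flanking octapeptides align reliably across species is not modelled.

```python
import re

# The WSXWS motif: Trp-Ser-any residue-Trp-Ser, in one-letter code.
WSXWS = re.compile(r"WS[A-Z]WS")


def binding_domain(sequence, first_cys_index, flank=8):
    """Return the region from a given conserved cysteine through the first
    downstream WSXWS motif, extended by `flank` residues on either side.

    The extension is clipped at the sequence ends. Returns None when no
    WSXWS motif is found downstream of the cysteine.
    """
    if sequence[first_cys_index] != "C":
        raise ValueError("first_cys_index must point at a cysteine")
    match = WSXWS.search(sequence, first_cys_index)
    if match is None:
        return None
    start = max(0, first_cys_index - flank)
    end = min(len(sequence), match.end() + flank)
    return sequence[start:end]


# Hypothetical toy sequence: 10 residues, the cysteine, a spacer, the motif, a tail.
toy = "AAAAAAAAAA" + "C" + "GGGGG" + "WSEWS" + "TTTTTTTTTTTT"
domain = binding_domain(toy, first_cys_index=10)
```

For a receptor with a duplicated domain, the same function would be called twice with different cysteine positions, giving the N and C domains that are then treated as separate database entries.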

Sequence alignment

The alignments were carried out using CLUSTALW, which is an improved version of CLUSTALV,19 using the default options which are appropriate to distantly related proteins, and displayed using ALSCRIPT.20 In the initial analysis, the sequences of protein homologues from different species were aligned, making minor manual adjustments to correct obvious errors. Subsequent alignments between different (i.e. non-homologous) proteins were carried out similarly, using all available sequences from different species. In these analyses, however, the differences between individual proteins were often very great; manual correction of alignment errors was therefore more subjective and it is likely that some errors have not been corrected. Objective criteria could be applied to the accurate alignment of some of the distantly related proteins (G-CSF, LIF and GH) by using the published physical superimpositions of amino acid residues based on tertiary structure analyses.17 These alignments were confined to the four α-helices as the loop regions could not be superimposed reliably.2 The inclusion of GH provided an outgroup to infer clusterings among the gp130-interacting group. Additions of other sequences to this primary alignment were consistent with groupings published by others.16,17,19

Profile searching A profile is a matrix in which each position in a sequence alignment is allocated a weighted score for each of the 20 possible amino acids. The weight given to each amino acid at any given position is derived from a calculation of the probability of a substitution to that amino acid21 from all the actual amino acids present at that position in a conventional alignment.22 Databases can be efficiently searched using profiles, which we derived from the known sequences of either (i) a particular protein from different species or (ii) a family of ancestrally related proteins. PROFILEWEIGHT24 was used to compute profiles from alignments, and PROFILESEARCH23 to search the database with profiles. Profiles were initially generated from the alignment of sequences of homologous proteins from various species. The values chosen for parameters used in the profile construction and database searching were those which we had previously established as optimizing the ability of the interleukin 1 (IL-1α, IL-1β and IL-1 receptor antagonist) cytokines to detect each other, and also the ability of the IL-6, MGF and G-CSF cytokines to detect each other. Profile construction (using PROFILEWEIGHT): (1) A neutral matrix was used (including positive and negative values). This favours local similarities, whereas an all-positive matrix would favour more global similarities.24 (2) The BLOSUM62 matrix was used in preference to the PAM 250, BLOSUM30 or BLOSUM45 matrices. This defines the
expected probability of substitution between all pairs of amino acids, determined by analysis of a large database of aligned groups of proteins with a certain level of sequence similarity.21 (3) Equal weighting of all sequences in a profile would overemphasize closely related sequences. For this reason, sequences were weighted according to the branch length in a tree.22 (4) Exclusion or inclusion of regions in the alignment with gaps did not markedly alter the results. Regions where more than 50% of the sequences were absent in the alignment were excluded from the profile. Profile searching (using GCG PROFILESEARCH): (1) The options to correct for sequence length and composition were implemented. (2) Search results were not very sensitive to gap opening and gap lengthening penalties, so the program defaults (4.5 and 0.05) were used. The score for every sequence in the augmented SwissProt database was obtained and standardized to a Z-score,23 i.e. the score rescaled so that the distribution of all scores, if normal, has a mean of 0 and a variance of 1. All matches with a Z-score greater than the arbitrary cut-off of 5.0 were listed (Tables 1 and 2). The mean Z-score and the highest and lowest rank similarity were also assessed for families of interest. In order to increase the power to detect and group more distantly related sequences, initial groupings were combined into ‘core’ families, and the resulting profiles were used in further searches of the database to identify more distant members of the family.
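The standardization and cut-off step can be sketched in Python as follows. This is a generic Z-score calculation given for illustration only; it does not reproduce the internal normalization of GCG PROFILESEARCH. The function names and the toy scores are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Standardize raw database scores to mean 0 and variance 1.

    `raw_scores` maps a sequence name to its raw profile score.
    """
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return {name: (score - mu) / sigma for name, score in raw_scores.items()}

def matches_above_cutoff(raw_scores, cutoff=5.0):
    """List (name, Z-score) pairs with Z above `cutoff`, best match first."""
    z = z_scores(raw_scores)
    hits = [(name, value) for name, value in z.items() if value > cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Toy data: 100 background scores and one outlying candidate match.
toy_scores = {f"bg{i}": (9.0 if i % 2 == 0 else 11.0) for i in range(100)}
toy_scores["candidate"] = 60.0
hits = matches_above_cutoff(toy_scores)
```

With these toy scores only `candidate` exceeds the cut-off of 5.0. In a real search the background distribution would comprise every sequence in the database.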

Tree drawing The construction of evolutionary trees of the haematopoietin cytokine and receptor families presents a number of problems that are a consequence of the degree of overall sequence divergence being greater than that for which standard methods were developed. The substitution matrices and the corrections for multiple hits employed are each likely to be suboptimal, with a consequent loss of some information. However, the main barrier to correct inference of the true tree topology in highly diverged sequences is error in sequence alignment. The reliability of each branching can be inferred by bootstrapping, a resampling technique which estimates the percentage frequency with which a particular topology is formed.19 Thus, a bootstrap of 95% at a branch indicates that this branching topology is recovered in 95% of the resampled data sets. The method's assumption that all sites are independent is invalidated by extensive alignment errors; e.g. an incorrectly aligned sequence might be placed further away from its closest relatives because it is incorrectly inferred to be very different. The high bootstrap values presented here should therefore be interpreted with some caution, and only evaluated with reference to the sequence alignments (Figs 1 and 3). Trees were constructed with the CLUSTALW19 and PHYLIP packages25 using the neighbour-joining method, which allows unequal rates of substitution in different lineages.26 The Kimura correction for multiple substitutions was employed, and positions with gaps in any sequences were
excluded.19 A few sequences were omitted, either because the distances between individual pairs were greater than Kimura’s correction could allow for, or because the presence of the protein gave rise to a branch with a low bootstrap value, which is indicative of an unstable tree topology.
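The distance step can be illustrated with a short Python sketch. It assumes the empirical Kimura formula for protein distances, d = -ln(1 - p - 0.2p²), where p is the observed fraction of differing residues. That this exact form was applied at every level of divergence in the original analysis is an assumption of the sketch, and the function names are hypothetical. The formula is undefined once p exceeds roughly 0.85, which gives one concrete sense in which a pairwise distance can be greater than the correction allows for.

```python
import math

def ungapped_columns(alignment):
    """Indices of columns that contain no gap ('-') in any sequence."""
    length = len(alignment[0])
    return [i for i in range(length)
            if all(seq[i] != "-" for seq in alignment)]

def kimura_distance(seq_a, seq_b, columns):
    """Kimura-corrected distance between two aligned protein sequences.

    Only the supplied `columns` are compared. Returns None when the
    observed divergence is too great for the correction to be defined.
    """
    differences = sum(1 for i in columns if seq_a[i] != seq_b[i])
    p = differences / len(columns)
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        return None  # divergence beyond the range of the correction
    return -math.log(argument)

# Toy alignment (not real cytokine data); column 2 has a gap and is skipped.
toy_alignment = ["ACDEFGHIKL",
                 "ACDEFGHIKM",
                 "AC-EFGHIKL"]
cols = ungapped_columns(toy_alignment)
d_ab = kimura_distance(toy_alignment[0], toy_alignment[1], cols)
```

A matrix of such pairwise distances would then be passed to a neighbour-joining implementation. Pairs for which the function returns `None` correspond to the kind of sequence that had to be omitted.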

Acknowledgements We thank Toby Gibson for assistance with the PROFILEWEIGHT program. This work was supported by a grant from the Wellcome Trust (039618/Z/93/Z) and by project funding from BioResearch Ireland and Yamanouchi Research Institute (UK).

REFERENCES
1. Arai K, Lee F, Miyajima A, Miyatake S, Arai N, Yokota T (1990) Cytokines: coordinators of immune and inflammatory responses. Annu Rev Biochem 59:783–836.
2. Sprang SR, Bazan JF (1993) Cytokine structural taxonomy and mechanisms of receptor engagement. Curr Biol 3:815–827.
3. Bazan JF (1990) Structural design and molecular evolution of a cytokine receptor superfamily. Proc Natl Acad Sci USA 87:6934–6938.
4. De Vos AM, Ultsch M, Kossiakoff AA (1992) Human growth hormone and extracellular domain of its receptor: crystal structure of the complex. Science 255:306–312.
5. Cosman D (1993) The hematopoietin receptor superfamily. Cytokine 5:95–106.
6. Russell SM, Keegan AD, Harada N, Nakamura Y, Noguchi M, Leland P, Friedmann MC, Miyajima A, Puri RK, Paul WE, Leonard WJ (1993) Interleukin-2 receptor gamma chain: a functional component of the interleukin-4 receptor. Science 262:1880–1883.
7. Hara T, Miyajima A (1992) Two distinct functional high affinity receptors for mouse interleukin-3 (IL-3). EMBO J 11:1875–1884.
8. Kishimoto T, Taga T, Akira S (1994) Cytokine signal transduction. Cell 76:253–262.
9. Boulay J-L, Paul WE (1993) Hematopoietin sub-family classification based on size, gene organisation and sequence homology. Curr Biol 3:574–581.
10. Liu J, Moddrell B, Aruffo A, Scharnowske S, Shoyab M (1994) Interactions between oncostatin M and the IL-6 signal transducer, gp130. Cytokine 6:272–278.
11. Chua AO, Chizzonite R, Desai BP, Truitt TP, Nunes P, Minetti LJ, Warrier RR, Presky DH, Levine JF, Gately MK, Gubler U (1994) Expression cloning of a human IL-12 receptor component. J Immunol 153:128–136.
12. Wallis M (1994) Variable evolutionary rates in the molecular evolution of mammalian growth hormones. J Mol Evol 38:619–627.
13. Wolf SF, Temple PA, Kobayashi M, Young D, Dicig M, Lowe L, Dzialo R, Fitz L, Ferenz C, Azzoni L, Chan SH, Trinchieri G, Perussia B (1991) Cloning of cDNA for natural killer cell stimulatory factor, a heterodimeric cytokine with multiple biological effects on T and natural killer cells. J Immunol 146:3074–3081.
14. Davis S, Aldrich TH, Valenzuela DM, Wong V, Furth ME, Squinto SP, Yancopoulos GD (1991) The receptor for ciliary neurotrophic factor. Science 253:59–63.
15. Liu Y, Wei SH-Y, Ho ASY, Malefyt RDW, Moore KW (1994) Expression, cloning and characterisation of a human IL-10 receptor. J Immunol 152:1821–1829.
16. Nikaido T, Shimizu A, Ishida N, Uchiyama T, Yodoi J, Honjo T (1984) Molecular cloning of cDNA encoding human interleukin-2 receptor. Nature 311:631–635.
17. Robinson RC, Grey LM, Staunton D, Vanelecom H, Vernallis AB, Moreau J-F, Stuart DI, Heath JK, Jones EY (1994) The crystal structure and biological function of leukemia inhibitory factor: implications for receptor binding. Cell 77:1101–1116.
18. Hilton DJ, Hilton AA, Raicevic A, Rakar S, Harrison-Smith M, Gough NM, Begley CG, Metcalf D, Nicola NA, Willson TA (1994) Cloning of a murine IL-11 receptor alpha-chain; requirement for gp130 for high affinity binding and signal transduction. EMBO J 13:4765–4775.
19. Higgins DG, Bleasby AJ, Fuchs R (1992) CLUSTAL V: improved software for multiple sequence alignment. Comput Applic Biosci 8:189–191.
20. Barton GJ (1993) ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng 6:37–40.
21. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89:10915–10919.
22. Gribskov M, McLachlan AD, Eisenberg D (1987) Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 84:4355–4358.
23. Genetics Computer Group. Program manual for the GCG package, Version 8. University of Wisconsin, Madison, WI.
24. Thompson JD, Higgins DG, Gibson TJ (1994) Improved sensitivity of profile searches through use of sequence weights and gap excision. Comput Applic Biosci 10:19–29.
25. Felsenstein J (1989) PHYLIP - phylogeny inference package. Cladistics 5:164–166.
26. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.