Regulatory Peptides 155 (2009) 121–130
Molecular evolution of mammalian incretin hormone genes

David M. Irwin ⁎

Department of Laboratory Medicine and Pathobiology, Banting and Best Diabetes Centre, Faculty of Medicine, University of Toronto, 6207 Medical Sciences Building, 1 King's College Circle, Toronto, ON, Canada M5G 1A8
Article info

Article history: Received 10 November 2008; received in revised form 14 March 2009; accepted 5 April 2009; available online 15 April 2009

Keywords: Proglucagon; Glucose-dependent insulinotropic polypeptide (GIP); Glucagon; Glucagon-like peptide 1 (GLP-1); Glucagon-like peptide 2 (GLP-2)
Abstract

Incretin hormones are encoded by two different genes in the human genome, the proglucagon (GCG) and the glucose-dependent insulinotropic polypeptide (GIP) genes. To better understand the evolution of incretin hormones, as well as the potential for the evolution of species-specific functions for these peptides, we have identified and characterized the genes for these hormones from the genomes of 35 mammalian species, as well as from the genomes of a few non-mammalian vertebrates. Both proglucagon and GIP were found to be single-copy genes in mammals, and exist in stable genomic neighborhoods with conserved flanking gene order. The exon–intron structure of the genes has been conserved within mammals, although variation in the rate of protein sequence evolution for the peptide hormones was observed. Glucagon and GLP-1 sequences are largely invariant among mammals, except the glucagon sequences from hystricomorph rodents. Previous work has shown that the change in glucagon sequences in hystricomorph rodents is associated with the evolution of species-specific functions. GLP-2 sequences have evolved most rapidly, while GIP evolved at an intermediate rate, although both show punctuated rates with a common very rapid phase of evolution on the early mammalian lineage. These observations suggest that GIP and GLP-2 are more likely to have evolved species-specific functions in subgroups of mammals. © 2009 Elsevier B.V. All rights reserved.
⁎ Tel.: +1 416 978 0519. E-mail address: [email protected]. 0167-0115/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.regpep.2009.04.009

1. Introduction

The ingestion of food by mammals leads to the secretion of two incretin hormones, glucagon-like peptide-1 (GLP-1) and glucose-dependent insulinotropic polypeptide (GIP), by intestinal cells [1,2]. Incretin hormones are peptides produced by intestinal cells, in response to the intake of food, that potentiate the secretion of insulin from pancreatic beta-cells. GIP and GLP-1 appear to contribute roughly equally to the incretin effect, although GLP-1 appears to be more promising as a therapeutic agent to augment insulin production in diabetes [1,2]. While GLP-1 and GIP have overlapping actions on pancreatic beta-cells as incretin hormones, they each also have additional distinct physiological functions [1].

GLP-1 is produced by intestinal L-cells, and is one of three peptide hormones encoded by the proglucagon gene, the others being glucagon and glucagon-like peptide-2 (GLP-2) [3,4]. In addition to intestinal L-cells, the proglucagon gene is also expressed in alpha-cells of the endocrine pancreas and select neurons of the hypothalamus and brain stem [3,4]. In the pancreas, the major product of the proglucagon gene is glucagon, the counter-regulatory hormone to insulin [5], while in intestinal L-cells and neurons of the brain, the major products are GLP-1 and GLP-2. Additional proteolytic products of the proglucagon gene also likely have physiological functions [3]. The primary functions of GLP-2 are related to the maintenance of intestinal cells [1,6]. Proglucagon genes and proglucagon-derived peptide hormones have been characterized from a wide variety of vertebrate species [7]. While glucagon has similar roles in all vertebrate species examined, GLP-1 has differing roles in mammals and fish, and the function of GLP-2 has only been defined in mammals [1,8–10].

The major site of GIP gene expression is the intestinal K-cells, and GIP is the sole hormone produced by this gene [11,12]. The GIP hormone has been described in a few mammalian species [12,13], and genes that encode a GIP-like sequence have been identified in a limited number of diverse vertebrate species [12,14].

The proglucagon and GIP genes are both members of the small vertebrate gene family of secretin-like hormones [2,15]. Separate genes for proglucagon and GIP were likely produced during a whole genome duplication that occurred very early in vertebrate evolution [16]. Duplications of the glucagon-like sequences within the proglucagon gene occurred early in vertebrate evolution, prior to the diversification of the vertebrate classes [17,18], but likely after the divergence of the proglucagon and GIP genes [13,15,18]. Some differences in the intron–exon structure of the proglucagon gene are observed in vertebrates, with some genes encoding fewer and some encoding additional glucagon-
D.M. Irwin / Regulatory Peptides 155 (2009) 121–130
Table 1. Proglucagon and GIP genes and encoded peptide sequences from mammals, reptiles, and birds.
Columns: Species; Common name; Proglucagon (Source (a), GLU (d), GLP1 (d), GLP2 (d)); GIP (Source, GIP (d)).
Genome databases, mammals: Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta, Callithrix jacchus, Tarsius syrichta, Otolemur garnettii, Microcebus murinus, Tupaia belangeri, Mus musculus, Rattus norvegicus, Dipodomys ordii, Cavia porcellus, Spermophilus tridecemlineatus, Ochotona princeps, Oryctolagus cuniculus, Bos taurus, Tursiops truncatus, Sus scrofa, Vicugna pacos, Canis familiaris, Felis catus, Equus caballus, Myotis lucifugus, Pteropus vampyrus, Erinaceus europaeus, Sorex araneus, Dasypus novemcinctus, Choloepus hoffmanni, Loxodonta africana, Procavia capensis, Echinops telfairi, Monodelphis domestica, Ornithorhynchus anatinus
Genome databases, outgroups: Anolis carolinensis, Taeniopygia guttata, Gallus gallus
NCBI cDNA database, mammals: Mesocricetus auratus, Cricetulus sp., Octodon degus, Ovis aries
NCBI cDNA database, outgroups: Heloderma suspectum, Meleagris gallopavo
Human Chimpanzee Gorilla Orangutan Rhesus macaque White-tufted-ear marmoset Tarsier Bushbaby
Gene Gene Geneb Gene Gene Gene
Y Y P Y Y Y
Y Y Y Y Y Y
Y Y N Y Y Y
Gene Gene Geneb Geneb Gene Gene
Y Y Y Y Y Y
Geneb Gene
Y Y
P Y
Y Y
Gene Geneb
Y N
Mouse lemur
Geneb
Y
N
Y
Gene
Y
Tree shrew Mouse Rat
Gene Gene Gene
Y Y Y
Y Y Y
Y Y Y
Gene Gene Gene
Y Y Y
Kangaroo rat Guinea pig Squirrel
Gene Gene Gene
Y Y Y
Y Y Y
Y Y Y
Gene Geneb Geneb
Y Y p
Pika Rabbit
Geneb Geneb
Y N
N N
Y N
Geneb Gene
Y Y
Cow Bottlenose dolphin Pig
Gene Geneb
Y Y
Y Y
Y Y
Gene Gene
Y Y
Gene/ cDNA Gene Gene Geneb Gene Genec Geneb
Y
Y
Y
Geneb
Y
Y Y Y Y N Y
Y Y Y Y N P
Y Y N Y N Y
b
Gene Gene Geneb Gene Gene Geneb
N Y Y Y Y P
Geneb
Y
Y
N
Geneb
Y
b
P Y
Alpaca Dog Cat Horse Little brown bat Large flying fox bat Hedgehog Shrew Armadillo
b
Gene Geneb
Y Y
N N
Y Y
Gene Gene
Two-toed sloth
Geneb
Y
N
N
–
African elephant Rock hyrax Lesser hedgehog tenrec Opossum
Gene
b
b
N
Y
Y
N
Gene
b
Y
b
Y Y
Gene Geneb
Y Y
N Y
Y Y
Gene Gene
Gene
Y
Y
Y
Geneb
P
b
P
Duckbill platypus
Gene
b
N
Y
Y
Gene
Anole lizard
Geneb
Y
Y
N
Geneb
Y
Zebra finch
Gene
Y
Y
Y
Geneb
Y
Chicken
Gene
Y
Y
Y
Gene
Y
Golden hamster
cDNA
Y
Y
Y
–
–
Hamster Degu Sheep
– cDNA cDNAb
– Y Y
– Y Y
– Y Y
cDNA – –
Y – –
Gila monster
cDNA
Y
Y
Y
–
–
Turkey
cDNA
Y
Y
Y
–
–
Fig. 1. Schematic structure of the proglucagon and GIP genes and encoded peptides. The structures of the proglucagon (A) and glucose-dependent insulinotropic polypeptide (B) genes and coding sequences are shown. A schematic of the gene structure is shown below the mRNA sequences for each gene. Gene exons are shown as boxes, with introns and flanking sequences as thin lines. Exons, introns, coding, and untranslated sequences are not to scale. Protein coding sequences are shown as solid boxes, except the unique exon that is found between the glucagon and GLP-1 encoding exons in bird and reptilian proglucagon genes, which is shown as a shaded box. The protein coding sequences in the cDNAs are shown as shaded or solid boxes, with the solid boxes representing the mature hormones. Boxes are not to scale. The identities of the different hormones and proteolytically generated peptides are indicated above. Glu is glucagon, IP1 and IP2 are intervening peptides-1 and -2 respectively, Sig are signal peptides, GRPP is glicentin-related polypeptide, and N-term and C-term are the N-terminal and C-terminal propeptides released during processing of the proGIP protein to release GIP.
like peptides [see 7,19 for reviews]. Similarly, changes in the structure of the GIP gene have also been observed, and these changes may explain why the mammalian GIP hormone sequence is longer than the glucagon-like hormone sequences [14].

In addition to changes in gene structure, changes in the sequences of the hormones have been observed. Glucagon is rather well conserved [7], although the glucagon hormones of hystricomorph rodents have evolved more rapidly and this rapid change is associated with a reduction in the gluconeogenic activity of this hormone from these species [20,21]. Greater variability in the GLP-1 sequence has been observed, and GLP-1 has opposite roles in the physiological control of glucose metabolism in fish and mammals [8,10]. In contrast to mammals, where GLP-1 acts through the pancreatic islets to increase insulin action and thus glucose uptake from blood [1–3], in fish GLP-1 acts directly on the liver, causing the release of glucose into the blood, a physiological response similar to that of glucagon [8,10]. The change in GLP-1 function appears to be a consequence of GLP-1 binding to, and activating, an ortholog of the glucagon receptor in fish [19,22]. Fish genomes do not have an ortholog of the GLP-1 receptor, thus GLP-1 cannot act as an incretin hormone in these species [19,22]. While no change in the functions of GLP-2 or GIP has been clearly described, it needs to be emphasized that the physiological function of these peptides has only been characterized for a few mammalian species [3,6,9,11,12]. The sequences of both GLP-2 and GIP in non-mammalian vertebrates are considerably different from those of mammals, raising the possibility that they may have changed function and acquired new roles on the mammalian lineage [7,14].
Notes to Table 1
a Source indicates whether the sequence is from genomic (Gene) or cDNA sequences; see Tables S1 and S2 for source details.
b Partial gene or cDNA sequence.
c The bat proglucagon gene appears to be a pseudogene and does not encode intact hormone sequences (see text).
d Hormone sequence: Y, complete sequence; N, no sequence; P, partial sequence.

The recent availability of a large number of near-complete genome sequences has greatly expanded our ability to explore the evolution of genes,
including genes for mammalian hormones [e.g., see 23,24]. I have taken advantage of these new genome resources to examine the evolution of the proglucagon and GIP genes within mammals, focusing on the evolution of these hormones on the early mammalian lineage using the genomes of the opossum and platypus, descendants of the earliest diverging mammalian lineages.

2. Materials and methods

2.1. Genome sequence data

Genomic sequences encoding the proglucagon (GCG) and glucose-dependent insulinotropic polypeptide (GIP) genes were obtained from release 53 of the Ensembl and Pre-ensembl databases (www.ensembl.org) in March 2009, either by searching by gene name/symbol or by similarity searching using the tblastn algorithm with diverse proglucagon or GIP protein sequences as queries. Searches of the non-redundant and genome databases maintained at the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov) were used to complement the searches of the Ensembl database, and aided in the identification of complete gene or coding region sequences. Databases maintained by the
Washington University, St. Louis, Genome Sequencing Center (genome.wustl.edu/genome_group_index.cgi) and by the Baylor College of Medicine Human Genome Sequencing Center (www.hgsc.bcm.tmc.edu/blast.hgsc) were also examined for additional or complementary genomic sequences (most of these genome sequences are incorporated into the Ensembl or NCBI databases). Specifically, genomic sequences of the white-tufted-ear marmoset, Callithrix jacchus (Assembly 2.0.1), and zebra finch, Taeniopygia guttata (Assembly 3.2.4), from the Washington University, St. Louis, Genome Sequencing Center, and the bottlenose dolphin, Tursiops truncatus (April 19, 2007 Assembly), from the Baylor College of Medicine Human Genome Sequencing Center were searched by the tblastn algorithm for sequences similar to the GCG and GIP genes.

2.2. Alignment of sequences

Long genomic DNA sequences that included the proglucagon and GIP genes were aligned with MultiPipMaker (pipmaker.bx.psu.edu/pipmaker/) [25]. The human, mouse, dog, or chicken genes were used as guides for these alignments, and the locations of exons and coding regions for these genes were obtained from the Ensembl or NCBI databases. Repetitive elements in the human and mouse genomic sequences were identified using RepeatMasker (www.repeatmasker.org). The genomic alignments were used to refine predicted potential coding regions of the genes. The predicted protein sequences were then aligned with ClustalW (www.ebi.ac.uk/Tools/clustalw/index.html#) [26].

Fig. 2. Proglucagon-derived peptide sequences from diverse mammals, reptiles and birds. Alignment of glucagon (A), GLP-1 (B), and GLP-2 (C) peptide sequences predicted from diverse genome data and cDNAs. The human sequences are shown on top with sequences from other species shown below. Identities are indicated as dots and differences from the human sequence are shown in single letter amino acid code. Question marks (?) represent amino acid residues that were not predicted from the partial genomic or cDNA sequence.
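The dot notation used for the peptide alignments can be reproduced with a short script. The following is a minimal sketch in Python, assuming the sequences are already aligned and of equal length; the function names and the short toy sequences are hypothetical illustrations and are not data from this study.

```python
def dot_alignment(reference, others):
    """Render aligned sequences against a reference.

    Identities are shown as '.', differences as the residue itself,
    and unpredicted residues ('?') are passed through unchanged.
    """
    rendered = {}
    for name, seq in others.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")
        row = []
        for ref_res, res in zip(reference, seq):
            if res == "?":
                row.append("?")
            elif res == ref_res:
                row.append(".")
            else:
                row.append(res)
        rendered[name] = "".join(row)
    return rendered


def count_differences(reference, seq):
    """Count positions that differ from the reference, ignoring '?'."""
    return sum(1 for a, b in zip(reference, seq) if b != "?" and a != b)


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    reference = "HSQGTFTSDY"
    others = {
        "toy_species_1": "HSQGTFTSDY",
        "toy_species_2": "HSEGTFTNDY",
        "toy_partial": "HSQGT?????",
    }
    print("reference".ljust(15), reference)
    for name, row in dot_alignment(reference, others).items():
        print(name.ljust(15), row)
```

Counting differences while ignoring question marks avoids treating missing residues from partial gene sequences as substitutions.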
2.3. Phylogenetic analysis

The molecular evolution of the hormone sequences encoded by the proglucagon and GIP genes was examined using the programs PAUP [27] and MacClade [28]. Phylogenetic trees of the aligned glucagon,
GLP-1, GLP-2, and GIP hormone sequences were generated manually using MacClade following the generally accepted phylogeny of mammals and their relatives [29–32]. Amino acid substitutions were mapped to branches of the phylogenetic trees using parsimony with both MacClade and PAUP, using the PROTPARS matrix. The PROTPARS matrix assigns a genetic distance for each amino acid substitution that is equal to the minimum number of nucleotide substitutions necessary for the change, and is equivalent to the minimum number of amino acid substitutions necessary to explain the observed differences in sequences assuming that one nucleotide changes at a time. Rates of sequence evolution were calculated using generally accepted divergence dates for mammalian lineages [29–33].

3. Results and discussion

3.1. Proglucagon and GIP genes

To better understand the evolution of the proglucagon and GIP genes, and the peptide hormones that these genes encode, we have identified these genes from publicly available genomic databases from 35 diverse mammalian species, as well as closely related reptilian and avian outgroup species (Tables 1, S1 and S2). Searches of non-genomic databases identified cDNAs (or ESTs) for proglucagon from 5 additional species, three mammals (hamster, degu, and sheep), a reptile (gila monster) and a bird (turkey), as well as ESTs that filled a coding sequence gap in the incomplete pig proglucagon gene sequence, and a cDNA for GIP from the hamster (Tables 1, S1 and S2). The combination of genomic and cDNA/EST searches identified 26 full-length (or very near full-length, i.e., missing fewer than 5 amino acids) proglucagon coding sequences, 22 of which were mammalian (Table 1, Fig. S1), and 23 GIP sequences, 22 of which were mammalian (Table 1, Fig. S2). For the remaining species, only partial proglucagon or GIP genes could be predicted from the genomic databases due to either gaps in the sequences or incomplete/incorrect assemblies (see Tables S1 and S2). A GIP gene could not be identified in the two-toed sloth genome. Most of the incomplete/missing genes were from species that have a low depth of sequencing (i.e., 2×), or early draft (e.g., pre-ensembl), genome assemblies and likely reflect the incomplete nature of these genomes. A complete little brown bat proglucagon gene sequence could be predicted (Tables 1 and S1), but insertions and/or deletions were inferred in the alignments of the exons that encode glucagon, GLP-1 and GLP-2 (exons 3, 4 and 5 respectively, see Fig. 1A for a schematic structure of the proglucagon gene) that prevented the prediction of functional hormone sequences (Table 1). Whether this bat proglucagon gene is a pseudogene, or the genomic sequence contains errors due to assembly, is not known, but since the predicted hormone sequences may be non-functional they were not used for any of the other analyses. For several other species complete gene sequences (see Fig. 1 for structure
Fig. 3. Evolution of GLP-2. A phylogenetic tree representing the evolution of the GLP-2 sequences in diverse mammals and relatives. The topology of the tree is drawn in accord with the accepted relationships of these species [29–32], with branch lengths proportional to the number of amino acid substitutions inferred to have occurred on each lineage. Birds and Reptiles diverged from mammals more than 300 million years ago, while the platypus and opossum divergences and the placental mammal radiations occurred about 230, 180, and 100 million years ago, respectively [29–33].
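The substitution cost described for the PROTPARS matrix in Section 2.3, the minimum number of nucleotide changes needed to convert one amino acid into another, can be illustrated directly from the standard genetic code. The sketch below is a simplified approximation, assuming the standard genetic code and a direct codon-to-codon comparison; it is not the PROTPARS implementation itself, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS_BY_AA = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS):
    CODONS_BY_AA.setdefault(aa, []).append(codon)


def min_nt_changes(aa_from, aa_to):
    """Smallest number of nucleotide differences between any codon of
    aa_from and any codon of aa_to (assumption: direct comparison only)."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in CODONS_BY_AA[aa_from]
        for c2 in CODONS_BY_AA[aa_to]
    )


def sequence_cost(seq_a, seq_b):
    """Sum the per-site costs over an alignment, skipping '?' and '-'."""
    return sum(
        min_nt_changes(a, b)
        for a, b in zip(seq_a, seq_b)
        if a not in "?-" and b not in "?-"
    )


if __name__ == "__main__":
    # Glycine to glutamate needs one change (e.g., GGA -> GAA).
    print(min_nt_changes("G", "E"))
    # Aspartate to tryptophan needs three (GAT/GAC -> TGG).
    print(min_nt_changes("D", "W"))
```

Under this scheme a replacement such as glycine by glutamate counts as a single step, whereas a replacement that requires changes at all three codon positions counts as three.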
of the genes) could not be predicted from the genomic sequences, but some of the hormone sequences could be predicted (see Tables 1, S1, S2 and Figs. S1 and S2) and were included in the analyses below. None of the predicted hormone sequences differed from previously characterized known sequences [see 7,12].

To examine the tempo and mode of proglucagon-derived peptide and GIP evolution in mammals we extracted and aligned the predicted peptide hormone sequences from the genomic sequences. The genomic and cDNA sequence data allowed us to predict 34 glucagon, 30 GLP-1, 32 GLP-2, and 33 GIP hormone sequences from mammals as well as 3–5 from the outgroup species (Table 1, Figs. 2 and 3). All of the complete proglucagon-derived peptide sequences are predicted to have identical sizes as the locations of potential processing sites are conserved (Fig. S1). All of the mammalian GIP gene sequences that contained both exons 3 and 4 predict a GIP precursor (see Fig. 1B for gene structure) that could be processed to release a 42-amino acid long peptide hormone (Fig. S2). Bird GIP gene sequences did not predict a 42-amino acid peptide product (Figs. 3 and S2). Previously, it was shown that the chicken GIP gene predicted a 41-amino acid long GIP peptide, due to the removal of one extra basic residue [14]. Similarly, the anole lizard sequence also predicts a 41-amino acid long GIP peptide (Figs. 3 and S2). In contrast, the zebra finch GIP gene sequence could predict a 42-amino acid long peptide (Figs. 3 and S2).

3.2. Episodic evolution of proglucagon sequences

All glucagon hormone sequences are very well conserved except for those from hystricomorph rodents (guinea pig and degu), species
whose glucagon is known to have undergone an episode of rapid sequence evolution and change in biological activity [20,21]. Excluding hystricomorph rodents, the greatest difference observed between a pair of glucagon sequences is 4 amino acids, between the gorilla or shrew and the zebra finch sequences, and the variation in the glucagon sequences is observed at five positions, residues 11, 16, 24, 28, and 29 (Fig. 2A). When the amino acid substitutions are mapped onto the generally accepted phylogeny of mammals it is clear that most mammalian glucagon sequences have not changed since the divergence of birds and lizards from mammals (Fig. S3A), a divergence that occurred more than 300 million years ago [33]. Greater variability is observed in mammalian GLP-1 sequences, although almost all of this variability is confined to the platypus sequence (Fig. 2B). Excluding the platypus sequence, only one site, replacement of the C-terminal glycine with glutamate in the squirrel, was variable within mammals. The platypus differed from the other mammalian sequences at 11 positions in the mature peptide (7–37) and 1 position in the 6 amino acid long leader (Figs. 2B and S1). The avian GLP-1 sequences are identical to each other, while the reptilian sequences differ from each other at one position, residue 5 (Fig. 2B). The non-mammalian sequences, though, differ from the typical mammalian GLP-1 sequence at 4–5 positions (Fig. 2B). Phylogenetic analysis of the sequences shows that within mammals only two lineages incurred amino acid substitutions, the lineage leading to squirrel, with one substitution, and the lineage leading to platypus, where more than one third of the sequence (11 substitutions) has changed (Fig. S3B). The episodic evolution that has led to a large number of changes to the platypus GLP-1 sequence suggests that it
Fig. 4. Sequences of GIP from diverse mammals, reptiles and birds. Alignment of GIP peptide sequences predicted from diverse genome data and cDNAs. The human sequence is shown on top with sequences from other species shown below. Identities are indicated as dots and differences from the human sequence are shown in single letter amino acid code. Question marks (?) represent amino acid residues that could not be inferred from the genomic sequences as exon 4 was not identified in some species (see Table S2).
has either become non-functional or, like hystricomorph rodent glucagon, altered its function. For the remaining mammalian lineages, GLP-1 has not changed since prior to the divergence of platypus from the remaining mammals, more than 230 million years ago [29,33]. Previous analyses have shown that the GLP-2 sequence had evolved rapidly on the early mammalian lineage [16]. We have now re-examined the evolution of this peptide with a larger number of GLP-2 sequences, and with species that diverged at different time points in mammalian evolution. All placental mammalian GLP-2 sequences differ from reptilian or avian sequences at 14–18 residues (Fig. 2C). GLP-2 from the platypus, a species that diverged about 230 million years ago from other mammals [29,33], differs almost as much from placental mammals as reptilian and avian sequences do (Fig. 2C). In contrast, GLP-2 from the opossum, a species that diverged about 180 million years ago [29,33], shows 6 or fewer differences from the placental mammalian sequences (Fig. 2C). Phylogenetic analysis of the sequences shows that all of the mammalian GLP-2 sequences have accumulated nearly equal numbers of substitutions (Fig. 3), thus unlike GLP-1, there is no evidence that the platypus GLP-2 has become non-functional. The analysis, though, did show that the GLP-2 sequence has evolved in an episodic manner, with 9 amino acid substitutions occurring on the 50 million year-long lineage that separates the divergence of monotremes (platypus) from placental mammals (230 million years ago) from the divergence of marsupials (opossum) from placental mammals (180 million years ago), a rate of amino acid substitution much greater than seen subsequently within marsupial (2 substitutions in 180 million years) or placental (2–6 substitutions in 180 million years) mammals (Fig. 3).
The addition of the opossum and platypus sequences thus narrows the previously identified rapid evolutionary phase of GLP-2 evolution [7] to the common ancestral lineage of placental and marsupial lineages, after the divergence of the monotremes. The rapid episode of evolution of GLP-2 on this short lineage raises the possibility that it was adapting to a new function at this time, thus the role of GLP-2 in non-mammalian vertebrates and monotremes may be different from that of placental and marsupial mammals. Despite the slowdown in the rate of GLP-2 sequence evolution within marsupial and placental mammals, the rate of evolution of GLP-2 within these species is considerably greater than that observed for the glucagon or GLP-1 sequences (compare Fig. 2C to A and B), thus within mammals GLP-2 is under less sequence constraint than glucagon or GLP-1. In addition to the glucagon-like sequences, the proglucagon gene encodes several other peptides with potential or known biological functions [1–3]. The evolution of these peptides was also examined and all but intervening peptide-1 (IP-1) were found to show a large amount of sequence change but with little variation in rate within mammals (Figs. S1, S3C–G). Intriguingly, IP-1 (Fig. S3E) and oxyntomodulin (Fig. S3G) show little variation within mammals, although greater variation is observed in the non-mammalian sequences, which may be due to their greater length in these species (Fig. S1). Whether IP-1 is conserved because it is part of the oxyntomodulin sequence (with glucagon) [1–3], a hormone believed to be involved in appetite control [34], or because of constraints due to proglucagon processing [35] is not known.
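The contrast in GLP-2 substitution rates can be made explicit with simple arithmetic on the substitution counts and lineage durations quoted above. A minimal sketch follows; the function name is hypothetical, and rates are expressed as substitutions per million years for the whole peptide rather than per site.

```python
def substitution_rate(substitutions, million_years):
    """Amino acid substitutions per million years along a lineage."""
    if million_years <= 0:
        raise ValueError("lineage duration must be positive")
    return substitutions / million_years


if __name__ == "__main__":
    # Counts and durations as quoted in the text for GLP-2.
    early = substitution_rate(9, 50)             # early mammalian lineage
    marsupial = substitution_rate(2, 180)        # within marsupials
    placental_high = substitution_rate(6, 180)   # upper bound within placentals

    print(f"early lineage:   {early:.3f} per Myr")
    print(f"marsupial:       {marsupial:.3f} per Myr")
    print(f"placental (max): {placental_high:.3f} per Myr")
    print(f"early vs marsupial: {early / marsupial:.1f}-fold")
    print(f"early vs placental (max): {early / placental_high:.1f}-fold")
```

On these figures the early lineage accumulated substitutions roughly 16 times faster than the marsupial lineage and more than 5 times faster than the fastest placental lineage.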
3.3. Episodic evolution of GIP sequences

The putative GIP hormone sequence from chicken was previously shown to differ by about 50% from those of mammals [14]. The new sequences from anole lizard and zebra finch also show a similar level of divergence from the mammalian sequences (Fig. 4). Similar to GLP-2, the platypus GIP sequence differs considerably from that of most mammals while opossum and placental mammalian sequences show greater levels of similarity (Fig. 4), although due to the incomplete opossum sequence the identity at the C-terminus of GIP is uncertain (Fig. S2). Phylogenetic analysis of the GIP amino acid substitutions shows that GIP has evolved in an episodic fashion (Fig. 5), and has accumulated a large number of amino acid substitutions on the mammalian lineage after the divergence of the platypus but before the radiation of placental mammals, a radiation that occurred about 100 million years ago [29,33]. The incomplete opossum GIP sequence suggests that much of this rapid evolution occurred prior to the divergence of placental and marsupial mammals (Figs. 4 and 5). Again, a rapid rate of sequence evolution suggests that GIP was adapting to a new role in early mammalian evolution. Within placental mammals the GIP sequences are largely conserved, with most of the variation confined to the C-terminal extension between residues 30 and 42 (Fig. 4). GIP from placental mammals appears to be evolving at a rate intermediate between that seen for GLP-2 (Figs. 2C and 3) and the extremely well conserved glucagon and GLP-1 sequences (Figs. 2A and B). An examination of the signal peptide, as well as the N- and C-terminal peptides released during production of GIP, reveals that these sequences have considerable variation, and do not appear to be evolving in an episodic fashion (Figs. S2 and S3).

3.4. Evolution of proglucagon gene structure

The proglucagon gene was found as a single copy gene in all of the mammalian, reptilian and avian genomes examined (Tables 1 and S1), although it is found in multiple copies in some other vertebrates [7].
Fig. 5. Evolution of GIP. A phylogenetic tree representing the evolution of the GIP sequences in diverse mammals and relatives. The topology of the tree is drawn in accord with the accepted relationships of the species [29–32], with branch lengths proportional to the number of amino acid substitutions inferred to have occurred on each lineage. Since the C-terminal portions of some GIP sequences are unknown (see Fig. 4), the branch lengths of some lineages may be underestimated. Birds and reptiles diverged from mammals more than 300 million years ago, while the platypus and opossum divergences and the placental mammal radiations occurred about 230, 180, and 100 million years ago, respectively [29–33].
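Mapping substitutions onto a fixed tree by parsimony, as done for the GLP-2 and GIP trees, can be sketched with the Fitch algorithm, which counts the minimum number of state changes at each aligned position. This sketch assumes a rooted binary tree and weights every amino acid change equally, so it is simpler than the PROTPARS-weighted analysis performed with PAUP and MacClade; the tree and sequences below are toy examples and are not the data analysed here.

```python
def fitch_site(tree, states):
    """Fitch small-parsimony pass for one aligned site.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping leaf name to its residue at this site
    Returns (candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one change is required on a descendant branch.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Total minimum number of changes over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


if __name__ == "__main__":
    # Toy topology: ((placentals), marsupial), monotreme.
    tree = ((("human", "mouse"), "opossum"), "platypus")
    # Hypothetical four-residue sequences, for illustration only.
    alignment = {
        "human": "HAEG",
        "mouse": "HAEG",
        "opossum": "HADG",
        "platypus": "HSDG",
    }
    print(parsimony_length(tree, alignment))
```

Fixing the topology in advance, rather than searching for the best tree, mirrors the approach described in the methods, where the trees were drawn manually to follow the accepted species phylogeny.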
The genes that flank the proglucagon gene were identified in the human, chimpanzee, orangutan, rhesus macaque, bushbaby, mouse, rat, kangaroo rat, squirrel, tree shrew, guinea pig, cow, dolphin, pig, dog, horse, fox bat, opossum, platypus, anole lizard, zebra finch and chicken genome assemblies. The fibroblast activation protein (FAP) gene was always located 5′ to the proglucagon gene while the dipeptidyl peptidase IV (DPIV) gene was 3′, an arrangement identical to that found previously [16]. Proglucagon thus lies within a conserved genomic neighborhood, suggesting that changes in genomic structure near the proglucagon gene occur infrequently. It should also be noted that some genomic assemblies, e.g., 2× assemblies, are built on scaffolds based on assemblies from another species (usually human) and assume that a genomic rearrangement has not occurred (see: Mar2009.archive.ensembl.org/info/docs/genebuild/2x_genomes.html). Previous studies demonstrated that mammalian proglucagon genes are composed of 6 exons (5 coding exons) [3,4], while the chicken proglucagon gene has one extra coding exon that is located between the exons that encode glucagon and GLP-1 and encodes part of intervening peptide-1 (IP-1 exon) [36]. A sequence similar to this exon has also been
found in a cDNA for proglucagon from a reptile, the gila monster [36,37]. Our genomic alignments generated by MultiPipMaker [25] demonstrate that all of the coding exons of the human proglucagon gene are conserved and identifiable in genomes of diverse mammals, reptiles and birds (Fig. 6A). Similar results were observed when mouse or dog sequences were used as guides for the alignment (data not shown). In contrast, when genomic sequences were compared to the chicken proglucagon gene using MultiPipMaker, only 5 of the 6 coding exons of the chicken gene were found in all species (Fig. 6B). The chicken IP-1 exon (exon 4, Fig. 6B) is conserved only in the zebra finch and the anole lizard genomic sequences (Fig. 6B), where the sequences were found to possess conserved consensus splice sites and an open reading frame that predicts protein sequences similar to those of the chicken and gila monster sequences (Fig. S1). The absence of the chicken IP-1 exon (exon 4, Fig. 6B) from all of the mammalian genomes, and from amphibians and fish [16,36], suggests that the exon originated on the lineage leading to birds and lizards. Genomic comparisons also identify conserved non-coding regions that are potential regulatory sequences [25,38]. Previously we identified three evolutionary conserved non-coding regions (ECRs)
Fig. 6. Conservation of proglucagon genomic sequences. A portion of the genomic alignments generated by MultiPipMaker with sequences compared to the (A) human and (B) chicken proglucagon genomic sequences. The structure of the proglucagon genes (GCG) is shown above with the tall boxes representing exons. The human proglucagon gene (A) is composed of 6 exons with exons 3, 4, and 5 encoding glucagon, GLP-1 and GLP-2, respectively. The chicken proglucagon gene (B) is composed of 7 exons with exons 3, 5, and 6 encoding glucagon, GLP-1 and GLP-2, respectively. The chicken proglucagon gene contains an exon, exon 4, that codes for part of IP-1, which is not found in mammalian proglucagon genes. Filled boxes are coding sequences, while open boxes are untranslated sequences. The direction of transcription is indicated by the arrow. Shorter boxes and triangles represent different types of repetitive DNA elements found in the genomic sequences [see 24]. Similarity of each species' (named on the left) genomic sequence to the human sequence is shown below the human (A) or chicken (B) sequence with dots or bars indicating the percent identity observed, if above 50%. Gaps represent either sequences that show less than 50% identity or gaps in the alignment. Gaps may be due to the absence of homologous sequence due to deletion/insertion or a sequence assembly gap. The locations of the three human evolutionary conserved regions (ECR1-3) shared by mammalian genomes (A) are indicated as boxes in the alignment to the human sequence, with the identity of the ECRs shown below. A potential bird ECR in intron 1 of the chicken proglucagon gene is shown by the box in the zebra finch alignment (B). The bottom three genomic comparisons in (A) are between the human proglucagon genomic sequence and non-mammalian genomes (lizard, chicken, and zebra finch). The top two comparisons in (B) are between the chicken proglucagon genomic sequence and other non-mammalian genomes (zebra finch and lizard).
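The idea of reporting only regions whose identity to a reference reaches 50% can be illustrated with a small calculation. The sketch below is not the MultiPipMaker algorithm; it assumes a pre-computed pairwise alignment with '-' for gaps and uses fixed, non-overlapping windows, and the function names, window size and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conserved_windows(reference, other, window=10, threshold=50.0):
    """Return (start, end, identity) for windows at or above the threshold."""
    hits = []
    for start in range(0, len(reference) - window + 1, window):
        end = start + window
        identity = percent_identity(reference[start:end], other[start:end])
        if identity >= threshold:
            hits.append((start, end, identity))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned DNA fragments, for illustration only.
    reference = "ACGTACGTAC" + "GGGGGGGGGG"
    other = "ACGTACGTAA" + "CCCCCCCCGG"
    for start, end, identity in conserved_windows(reference, other):
        print(f"{start}-{end}: {identity:.0f}% identity")
```

Windows falling below the threshold are simply omitted, which is why a poorly conserved region and a true alignment gap look the same in this kind of display.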
(ECR1, the immediate promoter region; ECR2, a sequence about 5 kb 5′ to the mRNA start site; ECR3, a 500-base sequence near the 3′ end of intron 1; see Fig. 6A) in the human proglucagon gene [39]. Surprisingly, none of these regions were found in the chicken proglucagon gene [36,39]. All three of the human ECRs are found in all of the mammalian genomes examined, including opossum and platypus (Fig. 6A), indicating that these sequences have been under strong selection pressure. When genomic comparisons were made to the chicken genome, only the zebra finch sequence showed any similarity to the chicken genomic sequence in non-exonic regions (Fig. 6B), while the anole lizard and mammalian sequences showed similarity only in the coding and 3′ untranslated sequences of exons. A few better-conserved regions were identified in the chicken–zebra finch comparison (e.g., a sequence in the middle of intron 1, Fig. 6B), raising the possibility that this sequence may be involved in the regulation of avian proglucagon gene expression. These observations suggest that different sequences function as transcriptional regulatory sequences for the proglucagon genes in non-mammalian vertebrates.

3.5. Evolution of GIP gene structure

The GIP gene was also a single-copy gene in all mammalian, reptilian and avian species (Tables 1 and S2). Genes neighboring the GIP gene were identified in the genomes of human, chimpanzee, orangutan, rhesus macaque, mouse lemur, mouse, rat, guinea pig, tree shrew, cow, pig, dog, horse, hyrax, opossum, anole lizard, and chicken. In all genomes the IGF2 binding-protein-1 gene (IMP-1) was 5′ to the GIP gene while SNF8 was located 3′, as previously observed in the human, mouse and chicken genomes [14]. GIP, like proglucagon, thus resides in a conserved genomic neighborhood.
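The flanking-gene comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function names, the representation of each genome as an ordered list of gene names, and the toy gene orders (including GENE_X, GENE_Y and GENE_Z) are all hypothetical and serve only to show the logic of testing whether IMP-1 lies immediately 5′ and SNF8 immediately 3′ of GIP.

```python
from typing import Dict, List, Optional, Tuple


def flanking_genes(
    gene_order: List[str], target: str
) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return the (5' neighbor, 3' neighbor) of `target`, or None if it is absent.

    `gene_order` is assumed to list genes along the chromosome in the
    transcriptional orientation of the target gene.
    """
    if target not in gene_order:
        return None
    i = gene_order.index(target)
    upstream = gene_order[i - 1] if i > 0 else None
    downstream = gene_order[i + 1] if i + 1 < len(gene_order) else None
    return upstream, downstream


def has_conserved_neighborhood(
    genomes: Dict[str, List[str]], target: str, expected_5p: str, expected_3p: str
) -> Dict[str, bool]:
    """For each species, report whether the target has the expected neighbors."""
    return {
        species: flanking_genes(order, target) == (expected_5p, expected_3p)
        for species, order in genomes.items()
    }


# Hypothetical gene orders, for illustration only (not data from this study).
toy_genomes = {
    "species_A": ["GENE_X", "IMP-1", "GIP", "SNF8", "GENE_Y"],
    "species_B": ["IMP-1", "GIP", "SNF8"],
    "species_C": ["IMP-1", "GENE_Z", "GIP", "SNF8"],  # neighborhood disrupted
}

print(has_conserved_neighborhood(toy_genomes, "GIP", "IMP-1", "SNF8"))
```

In this toy input only species_C fails the check, because a hypothetical gene separates IMP-1 from GIP. A real analysis would also need to handle gene orientation, assembly gaps and incomplete annotation.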
Both mammalian and chicken GIP genes are composed of 6 exons, 5 of which are coding [14,40,41], and no variation in the number of exons was found in any of the new GIP genes (Fig. S2). Previous work had shown that exon 3 of the human GIP gene is longer than that of the rat gene [40,41]. The size and sequence of exon 3 of most of the new GIP genes are similar to those of the rat gene (Fig. S2), indicating that the increase in size of exon 3 occurred in primates. Exon 3 is identical in size in all primates except the mouse lemur, where exon 3 is the same length as in other mammals (Fig. S2). A potential mechanism for the increase in an exon's size is intron sliding, where a mutation within an intron creates a new splice site and adds coding sequence to an exon [42]. Our genomic alignments show that the extra primate exon 3 sequences are similar to intronic sequences in the other mammalian GIP genomic sequences, and that a base change, which occurred after the divergence of mouse lemur and human, has created a new 5′ splice acceptor site that extends the 5′ end of exon 3; thus, intron sliding explains this change in exon size. Variation in length and sequence has also been observed for exons 4 and 5 between mammals and non-mammalian vertebrate species [14]. Unfortunately, these exons are missing from the platypus and opossum GIP genes (Table S2 and Fig. S1); thus, we cannot determine when, or how, during mammalian evolution exons 4 and 5 became better conserved in size and sequence. We also used MultiPipMaker to examine the conservation of genomic sequences of GIP genes. In contrast to proglucagon, only exons of the GIP gene show strong sequence conservation in mammals (Fig. S5). The closest flanking strongly conserved sequences were exons from the flanking genes. Similar results were observed if the genomic sequences were compared to the mouse, dog, or chicken genomic sequence (results not shown). Only one moderately conserved sequence, 5′ to exon 1 (Fig. 7), was observed, although
ECRs potentially important for GIP gene expression may exist at more distant locations embedded within flanking gene sequences.

3.6. Evolution of incretin hormones

The hormones encoded by the proglucagon and GIP genes are essential regulators of metabolism [1,3,9] and signal through closely related receptors [19,22,43]. A combination of selective forces acts on these hormones to ensure specific binding to their respective receptors. Previous work has shown that the receptors for glucagon and GIP are more closely related to each other than to receptors for any other peptide hormone [19,22,43]; therefore, if cross-talk were simply due to relatedness, one might expect these two receptors to be the most prone to cross-talk between their ligands. Instead, the glucagon receptor binds GLP-1 more strongly than GIP [44]. The GLP-1 receptor is most closely related to the glucagon and GIP receptors, but more distantly related than the glucagon and GIP receptors are to each other [19,22,43], which suggests that there should be equal levels of cross-talk with both glucagon and GIP if this were due only to relatedness. Characterization of the GLP-1 receptor, however, showed that glucagon binds more strongly to this receptor than GIP does [45]. Both glucagon and GLP-1 have equally poor ability to bind to the GIP receptor [46]. These observations suggest that it was GIP that lost its ability to interact with the glucagon and GLP-1 receptors. Above we have shown that the GIP sequence evolves more rapidly than the
glucagon or GLP-1 peptide sequences (compare Fig. 4 to Fig. 2A and B, or Fig. 5 to Fig. S3A and B). The strong conservation of glucagon is consistent with glucagon being the primary counter-regulatory hormone to insulin [5]. Both GIP and GLP-1 are incretin hormones [1,2], and thus could be considered to be equally redundant. The difference in the rate of GLP-1 and GIP sequence evolution implies that the GLP-1 sequence is under stronger constraints than GIP, which may be due to functions in addition to its role as an incretin hormone. Thus GIP appears to be the more dispensable hormone. GLP-2 also evolves more rapidly than glucagon or GLP-1 (compare Fig. 2C to Fig. 2A and B, or Fig. 3 to Fig. S3A and B). The GLP-2 receptor is also the most distantly related to the other receptors [19,22,43], and shows no cross-talk with the other hormones [47]. GLP-2, like GIP, would appear to have reduced selective constraints and may not be as essential as glucagon or GLP-1, and thus may be more likely to adapt to new species-specific functions.

Much of the knowledge concerning the functions of proglucagon-derived peptides and GIP is based on studies with mice and a few other mammals [1–4]. While many physiological functions are conserved among species, some are not. Sequences that evolve more rapidly typically have fewer functional constraints, and are thus more likely to alter function and gain species-specific functions. Since GLP-2 and GIP show the greater rates of sequence evolution, implying fewer constraints, these hormones would appear to be most likely to evolve species-specific functions. A better understanding of the evolutionary constraints acting on incretin hormone sequences
Fig. 7. Conservation of GIP genomic sequences. A portion of the genomic alignments generated by MultiPipMaker with sequences compared to the human sequence. The structure of the human proGIP gene (GIP) is shown above with the tall boxes representing exons. The GIP peptide is encoded by exons 3 and 4. Filled boxes are coding sequences, while open boxes are untranslated sequences. The direction of transcription is indicated by the arrow. Shorter boxes and triangles represent different types of repetitive DNA elements found in the genomic sequences [see 24]. Similarity of each species' (named on the left) genomic sequence to the human sequence is shown below the human sequence, with dots or bars indicating the percent identity observed, if above 50%. Blank regions represent either sequences that show less than 50% identity or gaps. Gaps may be due to the absence of homologous sequence caused by deletion/insertion, or to a sequence assembly gap. A potential ECR in the 5′ flanking region of the GIP gene is boxed in the alignments. The last two genomic comparisons are between the human GIP genomic sequence and genomic sequences from non-mammalian species (lizard and chicken).
suggests that greater caution is required in extrapolating results concerning the functions of GLP-2 and GIP from model species to human physiology.
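The reasoning above, that peptides accumulating more sequence differences among species are under weaker constraint, can be illustrated with a minimal Python sketch. It computes the uncorrected proportion of differing residues (p-distance) for every pair of aligned sequences and ranks peptides by the mean value. This is a simplified stand-in, not the phylogeny-based analysis used in this study, and the function names and toy alignments are hypothetical: the sequences are artificial strings of residues, not real hormone sequences.

```python
from itertools import combinations
from typing import Dict


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":  # skip alignment gaps
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def mean_pairwise_distance(alignment: Dict[str, str]) -> float:
    """Mean p-distance over all pairs of sequences in one peptide alignment."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments (artificial residues, NOT real hormone sequences).
toy_alignments = {
    "slow_peptide": {"sp1": "ACDEFGHIKL", "sp2": "ACDEFGHIKL", "sp3": "ACDEFGHIKM"},
    "fast_peptide": {"sp1": "ACDEFGHIKL", "sp2": "ACNEYGHVKL", "sp3": "SCDEFAHIRM"},
}

mean_distances = {
    name: mean_pairwise_distance(aln) for name, aln in toy_alignments.items()
}
# Rank peptides from most to least divergent.
ranked = sorted(mean_distances, key=mean_distances.get, reverse=True)
for name in ranked:
    print(f"{name}: mean p-distance = {mean_distances[name]:.3f}")
```

A pairwise mean of this kind ignores phylogeny and multiple substitutions at a site, so it cannot detect lineage-specific bursts of change. It is meant only to make the comparison of overall rates concrete.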
Acknowledgement

This work was supported by a grant from the Canadian Natural Sciences and Engineering Research Council.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.regpep.2009.04.009.
References

[1] Baggio LL, Drucker DJ. Biology of incretins: GLP-1 and GIP. Gastroenterology 2007;132:2131–57.
[2] Holst JJ, Vilsbøll T, Deacon CF. The incretin system and its role in type 2 diabetes mellitus. Mol Cell Endocrinol 2009;297:127–36.
[3] Kieffer TJ, Habener JF. The glucagon-like peptides. Endocr Rev 1999;20:876–913.
[4] Drucker DJ. In: Henry HL, Norman AW, editors. Glucagon gene expression. Encyclopedia of hormones. Boston: Academic Press; 2003. p. 47–55.
[5] Jiang G, Zhang BB. Glucagon and regulation of glucose metabolism. Am J Physiol 2003;284:E671–8.
[6] Anini Y, Brubaker PL. In: Henry HL, Norman AW, editors. Glucagon-like peptides: GLP-1 and GLP-2. Encyclopedia of hormones. Boston: Academic Press; 2003. p. 55–62.
[7] Irwin DM. Molecular evolution of proglucagon. Regul Pept 2001;98:1–12.
[8] Duguay SJ, Mommsen TP. In: Hoar WS, Randall DJ, editors. Molecular aspects of pancreatic peptides. Fish physiology. San Diego: Academic Press; 1994. p. 225–71.
[9] Drucker DJ. Biological actions and therapeutic potential of glucagon-like peptides. Gastroenterology 2002;122:531–44.
[10] Moon TW. Hormones and fish hepatocyte metabolism: “the good, the bad, the ugly!”. Comp Biochem Physiol 2004;139B:335–45.
[11] Fehmann HC, Göke R, Göke B. Cell and molecular biology of the incretin hormones glucagon-like peptide I and glucose-dependent insulin releasing peptide. Endocr Rev 1995;16:390–410.
[12] McIntosh CHS, Widenmaier, Kim SJ. Glucose-dependent insulinotropic polypeptide (Gastric inhibitory peptide; GIP). Vitamin Horm 2009;80:409–71.
[13] Hoyle CHV. Neuropeptide families: evolutionary perspectives. Regul Pept 1998;73:1–33.
[14] Irwin DM, Zhang T. Evolution of vertebrate glucose-dependent insulinotropic polypeptide (GIP) gene. Comp Biochem Physiol 2006;1D:385–95.
[15] Sherwood NM, Krueckl SL, McRory JE. The origin and function of the pituitary adenylate cyclase-activating polypeptide (PACAP)/glucagon superfamily. Endocr Rev 2000;21:619–70.
[16] Irwin DM. Ancient duplications of the human proglucagon gene. Genomics 2002;79:741–6.
[17] Lopez LC, Li WH, Frazier ML, Luo CC, Saunders GF. Evolution of glucagon genes. Mol Biol Evol 1984;1:335–44.
[18] Irwin DM, Huner O, Youson JH. Lamprey proglucagon and the origin of glucagon-like peptides. Mol Biol Evol 1999;16:1548–57.
[19] Irwin DM. Evolution of hormone function: proglucagon-derived peptides and their receptors. BioScience 2005;55:583–91.
[20] Wriston Jr JC. Comparative biochemistry of the guinea-pig: a partial checklist. Comp Biochem Physiol 1984;77B:253–78.
[21] Sieno S, Blackstone CD, Chan SJ, Whittaker J, Bell GI, Steiner DF. Appalachian spring: variations on ancient gastro–entero–pancreatic themes in New World mammals. Horm Metab Res 1988;20:430–5.
[22] Irwin DM, Wong K. Evolution of new hormone function: loss and gain of a receptor. J Heredity 2005;96:205–11.
[23] Wallis M. Mammalian genome projects reveal new growth hormone (GH) sequences: characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica). Gen Comp Endocrinol 2008;155:271–9.
[24] Wallis M. New insulin-like growth factor (IGF)-precursor sequences from mammalian genomes: the molecular evolution of IGFs and associated peptides in primates. Growth Hormone IGF Res 2009;19:12–23.
[25] Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. PipMaker – a web server for aligning two genomic sequences. Genome Res 2000;10:577–86.
[26] Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD. Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res 2003;31:3497–500.
[27] Swofford DL. PAUP⁎. Phylogenetic analysis using parsimony (⁎and other methods). Version 4.0b10, Sunderland, MA: Sinauer Associates; 2002.
[28] Maddison DR, Maddison WP. MacClade, analysis of phylogeny and character evolution. Version 4.0, Sunderland, MA: Sinauer Associates; 2000.
[29] Murphy WJ, Pevzner PA, O'Brien SJ. Mammalian phylogenomics comes of age. Trends Genet 2004;20:631–9.
[30] Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W. Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 2007;17:413–21.
[31] Hallström BM, Janke A. Resolution among major placental mammal interordinal relationships with genome data imply that speciation influenced their earliest radiations. BMC Evol Biol 2008;8:162.
[32] Arnason U, Adegoke JA, Gullberg A, Eric H, Harley EH, Janke A, Kullberg M. Mitogenomic relationships of placental mammals and molecular estimates of their divergences. Gene 2008;421:37–51.
[33] Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature 1998;392:917–20.
[34] Wynne K, Bloom SR. The role of oxyntomodulin and peptide tyrosine–tyrosine (PYY) in appetite control. Nat Clin Pract Endocrinol Metab 2006;2:612–20.
[35] Bataille D. Pro-protein convertases in intermediary metabolism: islet hormones, brain/gut hormones and integrated physiology. J Mol Med 2007;85:673–84.
[36] Yue W, Irwin DM. Structure and expression of the chicken proglucagon gene. Mol Cell Endocrinol 2005;230:69–76.
[37] Chen YE, Drucker DJ. Tissue-specific expression of unique mRNAs that encode proglucagon-derived peptides or exendin 4 in the lizard. J Biol Chem 1997;272:4108–15.
[38] Pennacchio LA, Rubin EM. Genomic strategies to identify mammalian regulatory sequences. Nat Rev Genet 2001;2:100–9.
[39] Zhou L, Nian M, Gu J, Irwin DM. Intron 1 sequences are required for pancreatic expression of the human proglucagon gene. Am J Physiol Regul Integr Comp Physiol 2006;290:R634–41.
[40] Inagaki N, Seino Y, Takeda J, Yano H, Yamada Y, Bell GI, Eddy RL, Fukushima Y, Byers MG, Shows TB, Imura H. Gastric inhibitory polypeptide: structure and chromosomal localization of the human gene. Mol Endocrinol 1989;3:1014–21.
[41] Higashimoto Y, Liddle RA. Isolation and characterization of the gene encoding rat glucose-dependent insulinotropic peptide. Biochem Biophys Res Commun 1993;193:182–90.
[42] Rogozin IB, Lyons-Weiler J, Koonin E. Intron sliding in conserved gene families. Trends Genet 2000;16:430–2.
[43] Gloriam DE, Fredricksson R, Schiöth. The G-protein-coupled receptor subset of the rat genome. BMC Genomics 2007;8:338.
[44] MacNeil DJ, Occi JL, Hey PJ, Strader CD, Graziano MP. Cloning and expression of a human glucagon receptor. Biochem Biophys Res Commun 1994;198:328–34.
[45] Graziano MP, Hey PJ, Borkowski D, Chicchi GG, Strader CD. Cloning and functional expression of a human glucagon-like peptide-1 receptor. Biochem Biophys Res Commun 1993;196:141–6.
[46] Volz A, Göke R, Lankat-Buttgereit B, Fehmann HC, Bode HP, Göke B. Molecular cloning, functional expression, and signal transduction of the GIP-receptor cloned from a human insulinoma. FEBS Lett 1995;373:23–9.
[47] Munroe DG, Gupta AK, Kooshesh F, Vyas TB, Rizkalla G, Wang H, Demchyshyn L, Yang ZJ, Kamboj RK, Chen H, McCallum K, Sumner-Smith M, Drucker DJ, Crivici A. Prototypic G protein-coupled receptor for the intestinotrophic factor glucagon-like peptide 2. Proc Natl Acad Sci U S A 1999;96:1569–73.