Chapter 239
In Silico Search for Biologically Active Peptides
Mariko Nishiyama, Munehiro Ishii, Shuichi Hirose, Hiroyuki Yamazaki and Sadao Kimura
ABSTRACT
Reverse pharmacology and the orphan receptor strategy, combined with a variety of screening assays, have successfully identified many endogenous ligands, including peptides, for orphan GPCRs. However, >100 orphan GPCRs remain, and their ligands and functions have yet to be identified for both fundamental and clinical research. Many screening methods for deorphanizing GPCRs have been developed and used successfully, whereas few practical methods for predicting novel ligand candidates have been reported. A potentially useful approach to predicting bioactive peptide candidates is bioinformatics, drawing on public databases of protein amino acid sequences derived from the human genome sequencing project. In this chapter, we outline a systematic in silico method for predicting bioactive peptide candidates with a C-terminal amide.
Handbook of Biologically Active Peptides. http://dx.doi.org/10.1016/B978-0-12-385095-9.00239-6 Copyright © 2013 Elsevier Inc. All rights reserved.
BACKGROUND OF GPCR DEORPHANIZATION
According to the reported human genome sequence data, approximately 800 genes encode G protein-coupled receptors (GPCRs).9,22 Excluding presumed odorant and pheromone receptors (~400 GPCRs), around 400 typical GPCRs exist in the human genome. About 100 of these, whose endogenous ligands are still unknown, are called “orphan receptors.”1,4,6,8,23 A wide variety of chemicals have been identified as GPCR ligands, including amines, amino acids, peptides, proteins, nucleosides, nucleotides, Ca2+ ions, and lipids. Among GPCRs with known ligands, the ratio of peptide receptors to nonpeptide receptors is approximately 90:100 (peptide:nonpeptide).1 If the same ratio is assumed for orphan receptors, about 50 peptide receptors may remain to be deorphanized, although this may be a slight overestimate.
Formerly, the conventional approach to GPCR investigation was first to discover natural ligands, then to search for and identify their receptors, next to characterize the various compounds that inhibit binding of the ligand to the receptor, and finally to develop novel therapeutics acting on the receptor. In recent years, however, other strategies have been widely accepted in orphan receptor research. One approach is to search for ligands in natural tissue extracts by monitoring the responses of cells expressing orphan receptors (orphan receptor strategy),11,15–17 and another is to identify ligands by pairing orphan receptors with chemically known compounds whose receptors are unknown (reverse pharmacology strategy).2,3,21 Both approaches have been applied successfully. In the past 15 years, ligands for >60 GPCRs have been newly identified. The hunt for endogenous ligands of the remaining orphan GPCRs is now the subject of intense worldwide competition among pharmaceutical companies, universities, and research institutes seeking to develop novel therapeutic agents.4,8,23 About 45% of the top-selling drugs target GPCRs.24
In the usual methods of searching for peptide ligands, extracts from animal tissues are used as sources of peptides. However, for peptides whose tissue content is extraordinarily low, or which are expressed only at extremely limited sites, this tissue-extract method may fail to identify the endogenous ligands. Peptides that are abundant in tissues and have potent biological activities have probably been identified already, so that the peptides still unknown are those present only in trace amounts or with unusual activities.
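The estimate of about 50 orphan peptide receptors quoted in this section can be reproduced with a short calculation. The Python sketch below assumes, as the text does, that the 90:100 peptide-to-nonpeptide ratio observed among liganded GPCRs carries over to the roughly 100 orphan receptors; the function name is illustrative only.

```python
def estimate_orphan_peptide_receptors(n_orphans: int,
                                      peptide_part: int,
                                      nonpeptide_part: int) -> float:
    """Scale the orphan receptor count by the peptide share of the ratio."""
    peptide_share = peptide_part / (peptide_part + nonpeptide_part)
    return n_orphans * peptide_share


# About 100 orphan GPCRs and a 90:100 peptide:nonpeptide ratio (from the text).
estimate = estimate_orphan_peptide_receptors(100, 90, 100)
print(round(estimate))  # 47, which the chapter rounds to "about 50"
```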
PREDICTION OF BIOACTIVE PEPTIDES: BIOSYNTHETIC PATHWAY OF BIOACTIVE PEPTIDES AND THE EMPIRICAL CLEAVAGE RULES
N-terminal and C-terminal Structures of Bioactive Peptides
Bioactive peptides, including peptide hormones, neuropeptides, and neurotransmitters, are initially synthesized as precursor proteins, usually called “preprohormones.” In this chapter, we use the term “preprohormone” for the precursors of all kinds of bioactive peptides, which may have various sizes and functional properties. Each preprohormone is cleaved into the signal peptide and the remaining part, the “prohormone,” which contains the bioactive peptide sequences. The presence of the signal peptide suggests that bioactive peptides are secreted to play physiological roles. The prohormones are further processed by proprotein convertases and carboxypeptidases to yield mature bioactive peptides (Fig. 1). Although each prohormone has a distinct primary sequence, proteolytic processing occurs at dibasic amino acid residue sites that commonly flank the amino (N)- and carboxyl (C)-termini of bioactive peptides within their precursors. The dibasic residues left at the C-terminus are then removed by carboxypeptidases to produce the mature bioactive peptides. Lys-Arg (KR) and Arg-Arg (RR) are the dibasic pairs most often found at the cleavage sites of known bioactive peptides.7,18 However, processing also occurs at the dibasic sites Lys-Lys and, sometimes, Arg-Lys (Table 1). In certain cases, such as procholecystokinin, provasopressin, and prosomatostatin, processing occurs at monobasic Arg sites.
For bioactive peptides with a C-terminal α-amide, including thyrotropin-releasing hormone, substance P, gastrin, oxytocin, and vasopressin, a specific sequence, Gly-Lys-Arg (GKR) or Gly-Arg-Arg (GRR), is located at the C-terminal side of the bioactive peptide (Fig. 1). The glycine (Gly) residue in these sequences is the substrate of the amidation enzymes (peptidylglycine α-amidating monooxygenase and peptidylamido-glycolate lyase)5,14 and serves as the amide donor for C-terminal amidation, which is essential for exerting most of the biological activity of these peptides.
Biochemical Characteristics of Bioactive Peptides and Their Precursors
To predict bioactive peptides by in silico methods, we introduced several criteria for the characteristics of preprohormones deduced from full-length cDNA and genomic DNA sequences. (1) Each preprohormone should contain a signal peptide at its N-terminus. (2) Preprohormones should not have membrane-spanning motifs in their sequences. (3) Peptide candidates produced from preprohormones should belong to one of the following six groups. We classified bioactive peptides into six groups based on the specific sequences at the N-terminal and C-terminal sites flanking the mature bioactive peptides (Table 2). In type A, each peptide has the most frequently cleaved signal, the dibasic pair (KR or RR), at both its N- and C-terminal sites. Type B peptides have an N-terminal dibasic pair and an amide signal, GKR or GRR, at the C-terminal site. As shown in the following section, in the final version of the criteria for our in silico search of bioactive peptides, the dibasic pairs KR and RR are accepted as processing sites and those of KK
FIGURE 1 A schematic representation of preprohormone precursors and their posttranslational processing. Labels in the schematic: preprohormone; signal peptide; Lys-Arg, Arg-Arg, and Gly-Arg-Arg sites; signal peptidase; proprotein convertase; carboxypeptidase-like protease; amidation enzyme; peptide hormone; C-terminal amidated peptide hormone (-NH2). Arrows indicate where prohormone convertases cleave the C-terminal side of a dibasic pair site (Lys-Arg, Arg-Arg, or Gly-Arg-Arg). C-terminal amides of mature peptide hormones are produced from Gly by amidation enzymes (peptidylglycine α-amidating monooxygenase and peptidylamido-glycolate lyase). See color plate 62.
SECTION | XX Handbook of Biologically Active Peptides: Peptide Biosynthesis/Processing
TABLE 1 Processing Patterns of Known Bioactive Peptides
TABLE 2 Classification of Bioactive Peptides with a Dibasic Pair at the N- or C-terminal

Cleavage type  Pattern*                                 Step 1   Step 2   Step 3   Known bioactive peptides†
Type A         ([KR]R-peptide-[KR]R)                    15,848    3,815      327        3
Type B         ([KR]R-peptide-G[KR]R)-amide              1,789    1,789      170       16
Type C         ([KR]R-peptide-[KR]n-STOP)                7,425      916      130        5
Type CA        ([KR]R-peptide-G[KR]n-STOP)-amide           638      638       78        3
Type D         (signal peptide-peptide-[KR]R)            5,896      726      108        0
Type E         (signal peptide-peptide-G[KR]R)-amide       709      709      104       13
No. of bioactive peptide candidates                     32,302    8,593      917       40
(C-terminal amide type peptides)                         3,136    3,136      352       32
(Non-C-terminal amide type peptides)                    29,168    5,457      565        8

Step 1: 6 ≤ peptides ≤ 60, precursors ≤ 300, Cys residues (0, 2, or 4).
Step 2: calculated from precursors with a C-terminal amide (including sequence overlapping).
Step 3: Lys-Arg and Arg-Arg in mature peptide sequences, no overlapping sequences.
*G = Gly, K = Lys, R = Arg, [KR] = K or R, n = 0-n.
†Predicted numbers of known bioactive peptides.
and RK are omitted, because our analysis of the precursor sequences of about 140 known bioactive peptides showed that most of the peptides are processed at the dibasic pairs KR or RR, and not KK or RK, at their N- and C-terminal cleavage sites (Table 1). Similarly, only GKR and GRR are accepted as C-terminal cleavage sites of amide-type peptides. In contrast, the four types of dibasic pairs occur almost equally often in the population of peptide candidates simply extracted in silico from the DNA databases. Type C and type CA peptides have an N-terminal dibasic pair and a stop signal at the C-terminus. Type D peptides have a signal peptide at the N-terminus and a dibasic pair at the C-terminus. Similarly, type E peptides have a signal peptide and a C-terminal amide signal. Further analysis of known bioactive peptides indicates the following biochemical properties: (1) preprohormones consist of 50–300 amino acid residues, and (2) most mature bioactive peptides are 6–60 amino acids in length (Fig. 2) and contain 0, 2, or 4 Cys residues.
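The six cleavage types of Table 2 can be expressed as a lookup on the sequence context around a candidate peptide. The Python sketch below is one possible reading of the patterns, not the authors' program: it assumes the prohormone string starts right after the signal peptide (so position 0 is the signal peptide boundary) and ends at the stop codon, and that the amide-donating Gly lies outside the candidate peptide. When a C-terminal context fits both an internal site and a stop pattern (for example a final KR), the sketch arbitrarily reports the internal-site type. All function names are hypothetical.

```python
import re
from typing import Optional

# (N-terminal context, C-terminal context) -> cleavage type in Table 2.
TYPE_TABLE = {
    ("dibasic", "dibasic"): "A",
    ("dibasic", "amide"): "B",
    ("dibasic", "stop"): "C",
    ("dibasic", "amide_stop"): "CA",
    ("signal", "dibasic"): "D",
    ("signal", "amide"): "E",
}


def n_terminal_context(prohormone: str, start: int) -> Optional[str]:
    if start == 0:
        return "signal"  # peptide begins right after the signal peptide
    if start >= 2 and re.fullmatch(r"[KR]R", prohormone[start - 2:start]):
        return "dibasic"
    return None


def c_terminal_context(prohormone: str, end: int) -> Optional[str]:
    rest = prohormone[end:]
    if re.match(r"G[KR]R", rest):
        return "amide"       # G[KR]R: amidation signal
    if re.fullmatch(r"G[KR]*", rest):
        return "amide_stop"  # G[KR]n-STOP
    if re.match(r"[KR]R", rest):
        return "dibasic"     # [KR]R
    if re.fullmatch(r"[KR]*", rest):
        return "stop"        # [KR]n-STOP
    return None


def classify(prohormone: str, start: int, end: int) -> Optional[str]:
    """Classify the candidate prohormone[start:end]; None if no type fits."""
    key = (n_terminal_context(prohormone, start),
           c_terminal_context(prohormone, end))
    return TYPE_TABLE.get(key)


seq = "SPEKRYGGFLRRQHWSYGLRPGGKRDAENL"  # made-up prohormone
print(classify(seq, 5, 10))   # A
print(classify(seq, 12, 22))  # B
```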
FIGURE 2 Relationship between the numbers of amino acid residues in mature peptide hormones and in their preprohormone precursors. The numbers of amino acid residues of 94 peptide hormones and their precursors are indicated. Most peptide hormones and their precursors contain <60 and <300 amino acid residues, respectively.
PREDICTION OF BIOACTIVE PEPTIDE CANDIDATES
Extraction of Protein Precursors
We used sequence information from three databases: Ensembl, IPI (International Protein Index), and H-InvDB (H-Invitational Database) (Table 3). In February 2006, we obtained about 25 million protein sequences, including redundant entries. Removing the overlapping sequences left about 200,000 protein sequences. Next, to extract the probable secreted precursors, we built two further criteria into our in-house software: (1) preprohormones contain a signal peptide, as judged by SignalP, and (2) they contain no membrane-spanning region, as judged by analysis of membrane retention signals. We finally obtained 33,119 proteins as starting precursor proteins for the in silico prediction of bioactive peptide candidates (Table 3).
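The precursor extraction described above amounts to removing duplicate sequences and then applying two yes/no tests. The Python sketch below shows that pipeline shape only. The two predicates are passed in as functions because the chapter's actual tools (SignalP and the membrane retention signal analysis) are external programs whose interfaces are not described here; the toy predicates in the usage example are placeholders with no biological meaning.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, amino acid sequence)


def extract_precursors(records: Iterable[Record],
                       has_signal_peptide: Callable[[str], bool],
                       has_membrane_span: Callable[[str], bool]) -> List[Record]:
    """Drop duplicate sequences, then keep putative secreted precursors."""
    seen = set()
    kept = []
    for identifier, sequence in records:
        if sequence in seen:
            continue  # same sequence already seen under another identifier
        seen.add(sequence)
        if has_signal_peptide(sequence) and not has_membrane_span(sequence):
            kept.append((identifier, sequence))
    return kept


# Toy stand-ins for the real predictors (placeholders, not real criteria).
toy_signal = lambda s: s.startswith("MKLL")
toy_membrane = lambda s: "WWWW" in s

records = [("a", "MKLLAAA"), ("b", "MKLLAAA"), ("c", "MKLLWWWW"), ("d", "GGGG")]
print(extract_precursors(records, toy_signal, toy_membrane))
# [('a', 'MKLLAAA')]
```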
Extraction of Bioactive Peptide Candidates by Pattern Matching Methods
We started the in silico analysis with the 33,119 protein sequences to obtain bioactive peptide candidates matching the six groups of bioactive peptides (Table 2). We first tried a pattern-matching analysis with the simple criterion that bioactive
TABLE 3 Number of Prepropeptide Precursors Found in Three Databases (February, 2006)

Databases                                                                  Ensembl*, IPI (International Protein Index)†, H-InvDB (H-Invitational Database)**
Precursors downloaded                                                      24,535,323
Precursors without overlapping                                             197,080
Precursors with a signal peptide and without membrane spanning sequence    33,119

*http://www.ensembl.org/index.html
†http://www.ebi.ac.uk/IPI/IPIhelp.html
**http://hinvdb.ddbj.nig.ac.jp/ahg-db/ChgLang.do?lang=en&path=/index.jsp
peptides contain dibasic residues (KR, RR, RK, KK) at their N- and C-terminal sites. Surprisingly, this yielded an enormous number of peptide candidates (about 300,000), indicating that more specific criteria were needed, based on biochemical features derived empirically from the biosynthetic pathways (Table 1). Next, we introduced several criteria into the computer program: (1) bioactive peptides are 6–60 amino acid residues in length, (2) precursor proteins are 50–300 amino acid residues in length, and (3) the number of Cys residues in a peptide sequence is 0, 2, or 4. These criteria gave 32,302 peptide candidates, including overlapping sequences (C-terminal amide-type peptides, 3,136; non-C-terminal amide-type peptides, 29,168) (step 1 in Table 2). We judged this number still too large for preparing peptide libraries by chemical synthesis. We therefore tried limiting the candidate proteins to those containing peptides with C-terminal amide signals, for example, GKR, GRR, and G-stop (step 2 in Table 2). Further, we added two criteria: (4) bioactive peptide candidates contain no dibasic pair (KR, RR, KK, or RK) within their mature peptide sequences, and (5) the N- and C-terminal cleavage sites are KR or RR (as mentioned above; step 3 in Table 2). Finally, 352 C-terminal amide-type and 565 non-C-terminal amide-type peptide candidates were obtained (step 3 in Table 2). Among these predicted candidates, 32 peptides of the C-terminal amide type and 8 of the nonamide type matched known bioactive peptides, indicating that the prediction method described above may be practically useful. The final number of predicted candidates may be acceptable for preparing a synthetic peptide library for screening. Some of these predicted peptides are now commercially available from PharmaDesign, Inc. (PharmaGPEP series, http://www.pharmadesign.co.jp/eng/index.html).
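The numbered criteria in this section translate directly into simple sequence checks. The Python sketch below is an illustration under assumptions, not the authors' program: `passes_step1` applies criteria (1) to (3), and `passes_step3` applies criteria (4) and (5). The argument `flanking_pairs` is a hypothetical convention that lists only the dibasic pairs actually present at the termini, so a signal peptide boundary or a stop codon contributes no entry.

```python
import re
from typing import Iterable

INTERNAL_DIBASIC = re.compile(r"[KR][KR]")  # KR, RR, KK, or RK
ACCEPTED_SITES = {"KR", "RR"}


def passes_step1(peptide: str, precursor_length: int) -> bool:
    """Criteria (1)-(3): peptide length, precursor length, Cys count."""
    return (6 <= len(peptide) <= 60
            and 50 <= precursor_length <= 300
            and peptide.count("C") in (0, 2, 4))


def passes_step3(peptide: str, flanking_pairs: Iterable[str]) -> bool:
    """Criteria (4)-(5): no internal dibasic pair; flanking pairs are KR or RR."""
    if INTERNAL_DIBASIC.search(peptide):
        return False
    return all(pair in ACCEPTED_SITES for pair in flanking_pairs)


# Made-up candidate flanked by RR on the N side and (G)KR on the C side.
print(passes_step1("QHWSYGLRPG", 92))            # True
print(passes_step3("QHWSYGLRPG", ["RR", "KR"]))  # True
```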
PERSPECTIVES
Screening methods for finding biologically active ligands among the predicted peptide candidates are not described in this chapter because of limited space. For many orphan GPCRs, it is difficult to specify the type(s) of G protein α subunit responsible for signaling and to choose adequate assay procedures. Therefore, multiple assay procedures are often combined for screening. Identifying a true ligand for an orphan receptor and elucidating its physiological and pathophysiological roles require much patience. Discoveries of novel peptide ligands have been decreasing in recent years. Some investigators have also pointed out that an orphan GPCR and a known receptor can form a heterodimer. In such situations, the orphan receptor itself might have no specific endogenous ligand and might serve only as a regulatory protein through dimer formation.12,13 However, as described above, with the systematic in silico prediction method based on public protein databases, the number of amide-type peptides predicted as GPCR ligand candidates may not be too large for screening. A combination of well-established expression systems for orphan GPCRs and adequately tuned assay procedures may lead to the discovery of novel ligands. In contrast, the number of nonamide-type peptide candidates predicted by the systematic in silico search is still too large for practical screening, and more effective and more reliable prediction methods are needed to narrow these down to a smaller number of good-quality candidates.
For the detection of endogenous bioactive peptides whose tissue content is very low, multiple assay systems have been improved in recent years, and for the identification of the detected peptides, sensitivity has increased dramatically with the introduction of mass spectrometry. However, to overcome the difficulties posed by peptides whose tissue content is very low, or which are expressed at very limited sites, it may be necessary to improve in silico prediction methods based on bioinformatics.10,13,19,20 Because identifying novel ligands of orphan GPCRs is indispensable for clarifying novel mechanisms of biological regulation, continuous efforts are required to discover them. For this purpose, improved methods for predicting peptide ligand candidates, together with improved screening techniques, may be powerful tools for researchers.
REFERENCES
1. Alexander SPH, Mathie A, Peters JA. 7TM receptors. In: Guide to Receptors and Channels (GRAC), 4th ed. Br J Pharmacol 2009;158(Suppl. 1):S5–101.
2. Ames RS, Sarau HM, Chambers JK, Willette RN, Aiyar NV, Romanic AM, et al. Human urotensin-II is a potent vasoconstrictor and agonist for the orphan receptor GPR14. Nature 1999;401:282–6.
3. Boels K, Schaller HC. Identification and characterisation of GPR100 as a novel human G protein-coupled bradykinin receptor. Br J Pharmacol 2003;140:932–8.
4. Civelli O. GPCR deorphanizations: the novel, the known and the unexpected transmitters. Trends Pharmacol Sci 2005;26:15–9.
5. Eipper BA, Milgram SL, Husten EJ, Yun HY, Mains RE. Peptidylglycine alpha-amidating monooxygenase: a multifunctional protein with catalytic, processing, and routing domains. Protein Sci 1993;2:489–97.
6. Foord SM. Receptor classification: post genome. Curr Opin Pharmacol 2002;2:561–6.
7. Hook V, Funkelstein L, Lu D, Bark S, Wegrzyn J, Hwang SR. Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 2008;48:393–423.
8. Howard AD, McAllister G, Feighner SD, Liu Q, Nargund RP, Van der Ploeg LH, et al. Orphan G protein-coupled receptors and natural ligand discovery. Trends Pharmacol Sci 2001;22:132–40.
9. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
10. Kliger Y. Computational approaches to therapeutic peptide discovery. Biopolymers 2010;94:701–10.
11. Kojima M, Hosoda H, Date Y, Nakazato M, Matsuo H, Kangawa K. Ghrelin is a growth-hormone-releasing acylated peptide from stomach. Nature 1999;402:656–60.
12. Levoye A, Dam J, Ayoub MA, Guillaume JL, Jockers R. Do orphan G protein-coupled receptors have ligand-independent functions? New insights from receptor heterodimers. EMBO Rep 2006;7:1094–8.
13. Levoye A, Clement N, Tenconi E, Jockers R. Past and future strategies for GPCR deorphanization. In: Gilchrist A, editor. GPCR molecular pharmacology and drug targeting, shifting paradigms and new directions. New Jersey: John Wiley & Sons Inc; 2010. p. 165–90.
14. Nakayama K. Furin: a mammalian subtilisin/Kex2p-like endoprotease involved in processing of a wide variety of precursor proteins. Biochem J 1997;327:625–35.
15. Reinscheid RK, Nothacker HP, Bourson A, Ardati A, Henningsen RA, Bunzow JR, et al. Orphanin FQ: a neuropeptide that activates an opioidlike G protein-coupled receptor. Science 1995;270:792–4.
16. Saito Y, Nothacker HP, Wang Z, Lin SH, Leslie F, Civelli O. Molecular characterization of the melanin-concentrating-hormone receptor. Nature 1999;400:265–9.
17. Sakurai T, Amemiya A, Ishii M, Matsuzaki I, Chemelli RM, Tanaka H, et al. Orexins and orexin receptors: a family of hypothalamic neuropeptides and G protein-coupled receptors that regulate feeding behavior. Cell 1998;92:573–85.
18. Seidah NG, Mayer G, Zaid A, Rousselet E, Nassoury N, Poirier S, et al. The activation and physiological functions of the proprotein convertases. Int J Biochem Cell Biol 2008;40:1111–25.
19. Shemesh R, Toporik A, Levine Z, Hecht I, Rotman G, Wool A, et al. Discovery and validation of novel peptide agonists for G protein-coupled receptors. J Biol Chem 2008;283:34643–9.
20. Shichiri M, Ishimaru S, Ota T, Nishikawa T, Isogai T, Hirata Y. Salusins: newly identified bioactive peptides with hemodynamic and mitogenic activities. Nat Med 2003;9:1166–72.
21. Szekeres PG, Muir AI, Spinage LD, Miller JE, Butler SI, Smith A, et al. Neuromedin U is a potent agonist at the orphan G protein-coupled receptor FM3. J Biol Chem 2000;275:20247–50.
22. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science 2001;291:1304–51.
23. Wise A, Jupe SC, Rees S. The identification of ligands at orphan G protein coupled receptors. Annu Rev Pharmacol Toxicol 2004;44:43–66.
24. Xiao SH, Reagan JD, Lee PH, Fu A, Schwandner R, Zhao X, et al. High throughput screening for orphan and liganded GPCRs. Comb Chem High Throughput Screen 2008;11:195–215.