In-silico determination of Pichia pastoris signal peptides for extracellular recombinant protein production



Journal of Theoretical Biology 364 (2015) 179–188

Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Aslan Massahi a,b, Pınar Çalık a,b,*

a Industrial Biotechnology and Metabolic Engineering Laboratory, Chemical Engineering Department, Middle East Technical University, 06800 Ankara, Turkey
b Department of Biotechnology, Graduate School of Natural and Applied Sciences, Middle East Technical University, 06800 Ankara, Turkey

HIGHLIGHTS

• Promising endogenous secretory signal peptides of Pichia pastoris were identified.
• The highest endogenous D-scores obtained were 0.932, 0.918, and 0.910.
• Eight signal peptides had a D-score higher than that of the Saccharomyces cerevisiae α-mating factor.
• Verified secretory signal peptides had a signal peptide-ness score (D-score) > 0.8.
• Overall, SignalP, Phobius, and WolfPsort predicted the same cleavage site for 82% of the signal peptides.

ARTICLE INFO

Article history: Received 27 May 2014; received in revised form 13 August 2014; accepted 27 August 2014; available online 8 September 2014

ABSTRACT

Novel secretory signal peptides (SPs) identified in silico are required in vivo to achieve efficient transfer, or to prevent other cellular proteins from interfering with the process, in extracellular recombinant protein (r-protein) production. Fifty-six endogenous and exogenous secretory SPs that have been used, or have the potential to be used, in Pichia pastoris for r-protein secretion were analyzed in silico using SignalP4.1, Phobius, WolfPsort0.2, ProP1.0, and NetNGlyc1.0. Among the 41 predicted endogenous secretory SPs, five have been used in P. pastoris and were regarded as positive controls, whereas 36 have not been used. For 32 of the endogenous secretory SPs, the three programs predicted the same cleavage site. The secretory SPs having the highest D-scores, the score quantifying the signal peptide-ness of a given sequence segment, were MKILSALLLLFTLAFA (D = 0.932), MRPVLSLLLLLASSVLA (D = 0.932), and MFKSLCMLIGSCLLSSVLA (D = 0.918). As the D-scores of these SPs are higher than that of the Saccharomyces cerevisiae α-mating factor signal peptide, which has been widely used for r-protein production, they can be considered promising candidates. Among the 15 predicted exogenous SPs, 11 have been used in P. pastoris and were therefore evaluated as positive controls. The three programs predicted a unique cleavage site for each of 10 of these exogenous SPs, whose D-scores were within D = 0.805–0.900, whereas the four exogenous secretory SPs that have not been used in P. pastoris have D-scores within D = 0.494–0.702. © 2014 Elsevier Ltd. All rights reserved.

Keywords: In-silico Signal peptide Pichia pastoris Secretion mechanism Protein

* Corresponding author at: Department of Chemical Engineering, Middle East Technical University, 06800 Ankara, Turkey. Tel.: +90 312 210 43 85; fax: +90 312 210 26 00. E-mail address: [email protected] (P. Çalık).

http://dx.doi.org/10.1016/j.jtbi.2014.08.048
0022-5193/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Signal peptides (SPs) associate newly synthesized secretory proteins with the membrane by promoting transfer of the attached protein into the membrane; thus, designing a novel signal peptide (SP) on a metabolic engineering basis opens a new era in extracellular recombinant protein (r-protein) production. In recent years, the yeast Pichia pastoris has become one of the most successful and popular host systems for heterologous protein production, as it grows rapidly on an inexpensive minimal medium to high cell densities and secretes the r-protein(s) into the fermentation medium, which simplifies the downstream processes. Among the crucial steps in r-protein production, which involve several physiological and genetic factors, choosing a secretory SP that transfers the r-protein to the extracellular medium is one of the most challenging (Çelik and Çalık, 2012). In spite of its importance in extracellular protein production, the choice of a secretory SP is rather arbitrary and is based on trial-and-error experiments (Cereghino et al., 2002; Damasceno et al., 2012;

Macauley-Patrick et al., 2005; Sreekrishna et al., 1997). Secretory SPs can differ greatly in secretion efficiency for different r-proteins (Liang et al., 2013).

All secreted proteins, as well as many transmembrane (TM) proteins, are synthesized with N-terminal SPs. Functioning as an "address tag" or "zip-code" that directs proteins to their proper cellular and extracellular locations, SPs control the entry of virtually all secretory proteins into the secretion pathway in both eukaryotes and prokaryotes (Chou and Shen, 2007). Although SPs are generally located at the N-terminus of secreted and TM proteins, they can also be found within the protein or at its C-terminus (Chou, 2002); examples are nuclear localization SPs and peroxisomal targeting signals, which can be found at both the N-terminus and the C-terminus (Nakai, 2000).

A three-region structure is conserved among secretory SPs, namely the N-region, H-region, and C-region (Martoglio and Dobberstein, 1998; Monod et al., 1989; Nakai, 2000; Nagarajan, 1993) (Fig. 1). The N-region is located at the N-terminus of the secretory SP and has basic segments, often with positively charged polar residues. The H-region spans the ER membrane and facilitates the translocation of the nascent polypeptide; its hydrophobic α-helical domain commonly has 7–15 amino acids, which is shorter than a hydrophobic TM helix (Nagarajan, 1993). The C-region, at the C-terminus of the secretory SP, has a slightly polar structure and serves as the recognition site for cleavage of the secretory SP by the signal peptidase I enzyme (Kjarulff and Jensen, 2005; Stroud and Walter, 1999). Rather than a clear motif, only a "weak consensus pattern" determines the cleavage site (Nakai, 2000): small and neutral residues are located at the −1 and −3 positions relative to the cleavage site (Fig. 1).

Fig. 1. Schematic representation of tripartite structure of a secretory SP (a), and two Sec-dependent translocation pathways in eukaryotes (b). SPase: signal peptidase I enzyme, SRP: signal recognition particle, DP: docking protein, Pro: pro-sequence, ER: endoplasmic reticulum, PTMs: Post-translational modifications.
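The weak consensus at the −1 and −3 positions can be illustrated with a short check. This is a minimal sketch and not part of the original analysis: the residue set `SMALL_NEUTRAL` and the function name are assumptions made here for illustration (valine is included because it is commonly tolerated at −3), and the three example pre-sequences are the highest-scoring SPs quoted in the abstract.

```python
# Residues treated as "small and neutral" in this sketch. The exact set is an
# illustrative assumption, not a definition taken from the paper.
SMALL_NEUTRAL = set("AGSTCV")


def follows_minus3_minus1_rule(pre_sequence: str) -> bool:
    """Check positions -1 and -3, counted back from the end of the pre-sequence
    (i.e. from the predicted signal peptidase I cleavage site)."""
    if len(pre_sequence) < 3:
        return False
    return (pre_sequence[-1] in SMALL_NEUTRAL
            and pre_sequence[-3] in SMALL_NEUTRAL)


# Pre-sequences with the highest D-scores reported in the abstract.
examples = {
    "SP23": "MKILSALLLLFTLAFA",
    "SP34": "MRPVLSLLLLLASSVLA",
    "SP15": "MFKSLCMLIGSCLLSSVLA",
}

for name, seq in examples.items():
    print(name, "-3:", seq[-3], "-1:", seq[-1],
          follows_minus3_minus1_rule(seq))
```

A check like this only screens for the pattern; it does not predict a cleavage site, which is what the programs described in Section 2.2 are used for.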
The classical endoplasmic reticulum (ER)/Golgi-dependent pathway and the non-classical ER/Golgi-independent pathway are the two routes of protein secretion in eukaryotes. Co-translational, signal recognition particle (SRP)-dependent translocation and post-translational, SRP-independent translocation are the two proposed mechanisms of the classical secretion pathway (Fig. 1). In the SRP-dependent pathway, the secretory SP is recognized by the SRP while the protein is being translated, whereas in the SRP-independent pathway the secretory SP is recognized by a protein complex on the ER membrane after translation. Since both pathways require the Sec proteins, interacting at the Sec61p translocon on the ER membrane, both are considered Sec-dependent. Yeast and bacterial SRPs bind preferentially to highly hydrophobic secretory SPs (Hegde and Bernstein, 2006); a highly hydrophobic H-region, recognized by the flexible hydrophobic segment of the SRP, directs proteins toward the SRP-dependent pathway (Martoglio and Dobberstein, 1998). However, hydrophobicity is not the only requirement for the SRP-dependent pathway (Stroud and Walter, 1999).

Higher productivity and an authentic N-terminus are the two indispensable requirements in r-protein production. When an efficient secretory SP is used, a higher amount of extracellular r-protein is produced, provided that there is no limitation in the functioning of the intracellular reaction network. Consequently, the secretion efficiency of proteins can be enhanced by screening different SPs and optimizing them (Kober et al., 2013). Reliable and fast computational programs for predicting SPs are required for in-silico analyses of the overwhelming amount of proteome data. Because SPs vary greatly in both length and amino acid sequence, their prediction is complex and can be confusing.
Therefore, researchers have proposed several models and algorithms to overcome the obstacles in the identification of SPs (Chou, 2001a, 2001b, 2001c, 2002; Emanuelsson et al., 2000; Horton et al., 2007; Käll et al., 2004; Liu et al., 2005, 2007; Petersen et al., 2011; Wang and Yang, 2005). In the present study, five computer programs, namely SignalP4.1, Phobius, WolfPsort0.2, ProP1.0, and NetNGlyc1.0, were utilized. SPs were identified by analyzing the extracellular proteins of P. pastoris as well as available extracellular proteins of other yeast species and organisms. The sequences of the secretory SPs are reported along with the corresponding H-regions, their D-scores quantifying the signal peptide-ness, and their probable pro-regions. In addition, possible glycosylation sites in the pro-regions were analyzed to determine whether the presence of these sites can halt the cleavage of the pro-region. Finally, promising candidates for extracellular r-protein production by P. pastoris are proposed.

2. Materials and methods

2.1. Sequences

The amino acid sequences of the endogenous proteins found in the extracellular medium of P. pastoris, as determined by Huang et al. (2011), were downloaded from NCBI (http://www.ncbi.nlm.nih.gov; 20.03.2014) and analyzed to determine their secretory SPs (SP1–SP41) (Table 1). In addition, the amino acid sequences of the exogenous proteins (SP42–SP56), obtained from UNIPROT (http://www.uniprot.org; 20.03.2014), were analyzed.

2.2. Computational tools

Among the models and tools proposed for identifying SPs and their cleavage sites, Signal-CF (Chou and Shen, 2007) and Signal-3L (Shen and Chou, 2007) have been introduced as two promising tools. Signal-3L could correct SPs mispredicted by SignalP3.0 (Shen and Chou, 2007), and Signal-CF showed the best performance among the known SP predictor tools, especially for long SPs (Hiss and Schneider, 2009). PrediSi (Hiller et al., 2004) and SPEPlip (Fariselli et al., 2003) are two other predictive models that have been widely used for the prediction of SPs. Compared to PrediSi, SignalP2.0 (both SignalP-Neural Network and SignalP-Hidden Markov Model) seems to be slightly more reliable in predicting eukaryotic SPs (Hiller et al., 2004). Furthermore, Signal-CF was found to be more accurate than PrediSi in discriminating secretory from non-secretory proteins as well as in predicting the cleavage point of the secretory SP (Chou and Shen, 2007). For fungal proteins, the highest secretory SP prediction accuracy was achieved by using SignalP (versions 3.0 and 4.0), WolfPsort, and Phobius in combination (Melhem et al., 2013; Min, 2010). Thus, in this work, SignalP4.1, WolfPsort, and Phobius were used as prediction tools. Huang et al. (2011) reported that 41 endogenous proteins found in the extracellular medium of P. pastoris have secretory SP sequences; however, they did not give the sequences. SignalP4.1, WolfPsort, and Phobius were utilized to identify these sequences.

SignalP4.1 (Petersen et al., 2011) (http://www.cbs.dtu.dk/services/SignalP/; 20.03.2014)

SignalP4.1 predicts the secretory SP and its signal peptidase I cleavage site using a neural network method. The prediction is based on the D-score (discrimination score). Thus, the program treats the sequences in terms of cleavage site prediction and discrimination of the different segments in the secretory SP, and the decision on a probable secretory SP is made by the D-score. Sequences having a D-score > 0.7 are considered to be a secretory SP with high probability (Liang et al., 2013). In SignalP4.1, the user has the option to tune the cut-off values to increase the sensitivity

Table 1. List of endogenous and exogenous SPs for Pichia pastoris.

SP No. | Gene index (P. pastoris ORF) | Function/localization of corresponding protein^a | Predicted SP (pre-sequence…pro-sequence)^c,f | No. of amino acids in pre-sequence predicted by SignalP4.1 | Pro-sequence No.^b | N-glyc.^b | SignalP4.1 D-score | WolfPsort0.2^d | Phobius^d | Experimental confirmation^e
1 | 254564921 | Cell wall protein with similarity to glucanases | MQVKSIVNLLLACSLAVA | 18 | – | – | 0.707 | + | + | NYC
2 | 254574366 | Protein disulfide isomerase, multifunctional protein resident in the endoplasmic reticulum lumen | MQFNWNIKTVASILSALTLAQA | 22 | – | – | 0.747 | + | + | NYC
3 | 254569190 | Hypothetical protein | MYRNLIIATALTCGAYS…AYVPSEPWSTLTPDASLESALKDYSQTFGIAIKSLDADKIKR | – | 42 | – | no SP^g | 17 aa^h | 17 aa | (Khasa et al., 2011)
4 | 254568502 | Major exo-1,3-beta-glucanase of the cell wall, involved in cell wall beta-glucan assembly | MNLYLITLLFASLCSA…ITLPKR | 16 | 6 | – | 0.860 | + | + | (Liang et al., 2013)
5 | 254567221 | Lectin-like protein with similarity to Flo1p, thought to be expressed and involved in flocculation | MFEKSKFVVSFLLLLQLFCVLGVHG | 25 | – | – | 0.885 | + | + | NYC
6 | 254573232 | Putative protein of unknown function | MQFNSVVISQLLLTLASVSMG | 21 | – | – | 0.694 | + | + | NYC
7 | 254572688 | Mitochondrial outer membrane and cell wall localized SUN family member | MKSQLIFMALASLVAS…APLEHQQQHHKHEKR | 16 | 15 | – | 0.695 | 17 aa | + | NYC
8 | 254570259 | Hypothetical protein | MKFAISTLLIILQAAAVFA | 19 | – | – | 0.838 | + | + | NYC
9 | 254565617 | Peptidyl-prolyl cis-trans isomerase (cyclophilin) of the endoplasmic reticulum | MKLLNFLLSFVTLFGLLSGSVFA | 23 | – | – | 0.904 | + | + | NYC
10 | 254566893 | Endo-beta-1,3-glucanase, major protein of the cell wall, involved in cell wall maintenance | MIFNLKTLAAVAISISQVSA | 20 | – | – | 0.651 | + | + | NYC
11 | 254570078 | Protein of the SUN family (Sim1p, Uth1p, Nca3p, Sun4p) that may participate in DNA replication | MKISALTACAVTLAGLAIA…APAPKPEDCTTTVQKRHQHKR | 19 | 21 | – | 0.694 | + | + | NYC
12 | 254568684 | Hypothetical protein | MSYLKISALLSVLSVALA | 18 | – | – | 0.760 | + | + | NYC
13 | 254567645 | Cell wall protein with similarity to glucanases | MLSTILNIFILLLFIQASLQ | 20 | – | – | 0.925 | + | + | (Liang et al., 2013)
14 | 254570357 | Protein of unknown function, has similarity to Pry1p and Pry3p and to the plant PR-1 class of pathogen | MKLSTNLILAIAAASAVVSA…APVAPAEEAANHLHKR | 20 | 16 | – | 0.822 | + | 26 aa | NYC
15 | 254564915 | Beta-1,3-glucanosyltransferase, required for cell wall assembly | MFKSLCMLIGSCLLSSVLA | 19 | – | – | 0.918 | + | + | NYC
16 | 254573228 | O-glycosylated protein required for cell wall stability | MKLAALSTIALTILPVALA | 19 | – | – | 0.861 | + | + | NYC
17 | 254565329 | Daughter cell-specific secreted protein with similarity to glucanases, endo-1,3-beta-glucanase | MSFSSNVPQLFLLLVLLTNIVSG | 23 | – | – | 0.893 | + | + | (Liang et al., 2013)
18 | 254573224 | Putative protein of unknown function | MQLQYLAVLCALLLNVQS…KNVVDFSRFGDAKISPDDTDLESRERKR | 18 | 28 | – | 0.699 | + | + | NYC
19 | 254567547 | Hypothetical protein | MKIHSLLLWNLFFIPSILG | 19 | – | – | 0.875 | + | + | NYC
20 | 254573944 | Hypothetical protein | MSTLTLLAVLLSLQNSALA | 19 | – | – | 0.910 | + | + | NYC
21 | 254565023 | Mucin family member | MINLNSFLILTVTLLSPALA…LPKNVLEEQQAKDDLAKR | 20 | 18 | – | 0.868 | + | + | NYC
22 | 254568260 | Hypothetical protein | MFSLAVGALLLTQAFG | 16 | – | – | 0.644 | + | + | NYC
23 | 254565391 | Protein disulfide isomerase, multifunctional protein resident in the endoplasmic reticulum lumen | MKILSALLLLFTLAFA | 16 | – | – | 0.932 | + | + | NYC
24 | 254572565 | Hypothetical protein | MKVSTTKFLAVFLLVRLVCA | 20 | – | – | 0.897 | + | + | NYC
25 | 254569896 | Cell wall protein that contains a putative GPI-attachment site | MQFGKVLFAISALAVTALG | 19 | – | – | 0.761 | + | 26 aa | NYC
26 | 254572672 | Hypothetical protein (unknown function) | MWSLFISGLLIFYPLVLG | 18 | – | – | 0.883 | + | + | NYC
27 | 254573438 | Ferric reductase and cupric reductase | MRNHLNDLVVLFLLLTVAAQA | 21 | – | – | 0.885 | + | 23 aa | NYC
28 | 254567898 | Hypothetical protein | MFLKSLLSFASILTLCKA | 18 | – | – | 0.863 | + | + | NYC
29 | 254569230 | Ferro-O2-oxidoreductase | MFVFEPVLLAVLVASTCVTA | 20 | – | – | 0.735 | + | + | NYC
30 | 254569662 | Hypothetical protein | MVSLRSIFTSSILAAGLTRAHG | 22 | – | – | 0.540 | 20 aa | + | NYC
31 | 254567750 | One of three repressible acid phosphatases, a glycoprotein that is transported to the cell surface | MFSPILSLEIILALATLQSVFA | 22 | – | – | 0.865 | + | + | (Murasugi and Tohma-Aiba, 2001; Yoshimasu et al., 2002)
32 | 254570525 | Hypothetical protein | MIINHLVLTALSIALA | 16 | – | – | 0.565 | + | + | NYC
33 | 254567531 | Phosphatidylglycerol/phosphatidylinositol transfer protein | MLALVRISTLLLLALTASA | 19 | – | – | 0.885 | 20 aa | + | NYC
34 | 254565679 | Cell wall protein that functions in the transfer of chitin to beta(1-6) glucan | MRPVLSLLLLLASSVLA | 17 | – | – | 0.932 | + | + | NYC
35 | 254567499 | Protein ROT1 | MVLIQNFLPLFAYTLFFNQRAALA | 24 | – | – | 0.546 | 45 aa | + | NYC
36 | 254570227 | Hypothetical protein | MKFPVPLLFLLQLFFIIATQG | 21 | – | – | 0.852 | + | + | NYC
37 | 254573778 | Putative chitin transglycosidase, cell wall protein | MVSLTRLLITGIATALQVNA | 20 | – | – | 0.708 | + | + | NYC
38 | 254572447 | Vacuolar aspartyl protease (proteinase A) | MIFDGTTMSIAIGLLSTLGIGAEA | 24 | – | – | 0.477 | + | + | NYC
39 | 254572379 | Putative integral membrane protein | MVLVGLLTRLVPLVLLAGTVLLLVFVVLSGG | – | – | – | no SP | 31 aa | 29 aa | NYC
40 | 254564917 | Beta-1,3-glucanosyltransferase, required for cell wall assembly | MLSILSALTLLGLSCA | 16 | – | – | 0.818 | + | + | NYC
41 | 254566331 | Hypothetical protein (unknown function, has similarity to Pry1p and Pry3p) | MRLLHISLLSIISVLTKANA | 20 | – | – | 0.822 | + | + | NYC

Exogenous SPs (the second column gives the source of the exogenous SP, other than P. pastoris)

SP No. | Source of exogenous SP | Function/localization of corresponding protein | Predicted SP (pre-sequence…pro-sequence) | No. of amino acids in pre-sequence predicted by SignalP4.1 | Pro-sequence No.^b | N-glyc.^b | SignalP4.1 D-score | WolfPsort0.2 | Phobius | Experimental confirmation
42 | Saccharomyces cerevisiae | Alpha pheromone precursor; the active factor is excreted into the culture medium by haploid cells of the alpha mating type and acts on cells of the opposite mating type | MRFPSIFTAVLFAASSALA…APVNTTTEDETAQIPAEAVIGYLDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLDKR…EAEA | 19 | 66^i | 3 | 0.885 | + | + | (Batra et al., 2010)
43 | S. cerevisiae | PHO5, repressible acid phosphatase (1 of 3) that also mediates extracellular nucleotide-derived phosphate hydrolysis; secretory pathway-derived cell surface glycoprotein | MFKSVVYSILAASLANA | 17 (Arima et al., 1983) | – | – | 0.494 | + | + | NYC
44 | Saccharomyces spp. | SUC2 (invertase), hydrolysis of sucrose | MLLQAFLFLLAGFAAKISA | 19 | – | – | 0.834 | + | + | (Paifer et al., 1994; Kuwae et al., 2005)
45 | Phaseolus vulgaris | PHA-E (Phaseolus vulgaris erythroagglutinin), lectin found in plants; it agglutinates erythrocytes | MASSNLLSLALFLVLLTHANS | 21 | – | – | 0.889 | + | + | (Raemaekers et al., 1999)
46 | Kluyveromyces lactis | Killer toxin, hypothetical protein | MNIFYIFLFLLSFVQG…LEHTHRRGSLVKR | 16 | 13 | – | 0.843 | + | + | (Kato et al., 2001)
47 | Pichia acaciae | Killer toxin, product of linear DNA plasmid pPac1-2, similar to Kluyveromyces lactis killer toxin | MLIIVLLFLATLANS…LDCSGDVFFGYTRGDKTDVHKSQALTAVKNIKR | 15 | 33 | – | 0.882 | + | + | (Crawford et al., 2003)
48 | S. cerevisiae | Killer toxin, K28 preprotoxin (M28 virus) | MESVSSLFNIFSTIMVNYKSLVLALLSVSNLKYARG…MPTSERQQGLEER | 36 | 13^j | – | no SP | 31 aa | no SP | (Eiden-Plach et al., 2004)
49 | S. carlsbergensis | MEL1 (melibiase or α-galactosidase), pre-alpha galactosidase (melibiase); probably extracellular | MFAFYFLTACISLKGVFG | 18 | – | – | 0.702 | + | + | NYC
50 | S. cerevisiae | BGL2 (endo-beta-1,3-glucanase), major protein of the cell wall, involved in cell wall maintenance | MRFSTTLATAATALFFTASQVSA | 23 | – | – | 0.703 | + | + | NYC
51 | Kluyveromyces marxianus | INU (inulinase), hydrolyzing the beta-D-2,1-fructan fructoside of inulin; extracellular | MKFAYSLLLPLAGVSA…SVINYKR | 16 | 7 | – | 0.685 | + | + | NYC
52 | Trichoderma reesei | Hydrophobin I, contributes to surface hydrophobicity; secreted, cell wall | MKFFAIAALFAAAAVA…QPLEDR | 16 | 7 | – | 0.805 | + | + | (Kottmeier et al., 2011)
53 | T. reesei | Hydrophobin II, responsible for spore hydrophobicity and protection; spore wall, secreted (cell wall) | MQFFAVALFATSALA | 15 | – | – | 0.854 | + | + | (Kottmeier et al., 2011)
54 | Human | Serum albumin; main function is the regulation of the colloidal osmotic pressure of blood; major zinc transporter in plasma | MKWVTFISLLFLFSSAYS…RGVFRR | 18 | 6^k | – | 0.848 | + | + | (Xiong et al., 2008)
55 | Chicken (Gallus gallus) | Lysozymes have primarily a bacteriolytic function; those in tissues and body fluids are associated with the monocyte-macrophage system and enhance the activity of immunoagents | MRSLLILVLCFLPLAALG | 18 | – | – | 0.897 | + | + | (Oka et al., 1999)
56 | Bos taurus (bovine) | Beta casein | MKVLILACLVALALA | 15 | – | – | 0.900 | + | + | (Zuyong et al., 2012)

a Functions of SP1 to SP41 have been obtained from Huang et al. (2011).
b "No." refers to the number of amino acids in the pro-sequence and "N-glyc." refers to the number of N-glycosylation sites in the pro-sequence.
c The bold sequences are the ones that have been predicted in this work. The underlined part of the SPs is the "H-region" predicted by the Phobius program.
d The (+) signs in the WolfPsort and Phobius columns refer to the same cleavage site prediction as obtained by SignalP4.1; cleavage site predictions by WolfPsort and Phobius that differ from SignalP4.1 are indicated by the number of amino acids in the column of the respective program.
e "NYC": has not yet been controlled in the P. pastoris expression system.
f Where a "pro" sequence is present, it is separated from the "pre" part by "…".
g "no SP": no secretory signal peptide was predicted.
h "aa": amino acid.
i In the pro-sequence column, underlined numbers show that the length of the pro-sequence was obtained from the literature or data banks and not from the ProP program.
j Obtained from Riffer et al. (2002).
k The pro-sequence is the prediction of the ProP program; however, in UNIPROT it is stated to be four amino acids.

of the program. This trade-off leads to a slightly higher false positive rate. In this research, the default D-cut-off values, which were optimized for each organism group, were used; the value is 0.45 for sequences that may contain TM regions and 0.5 for sequences with no TM region. SignalP4.1 uses SignalP-TM as a pre-processor to decide whether to use SignalP-TM or SignalP-noTM in the final prediction. For all 56 SPs, SignalP-TM was used as the pre-processor, but the program selected SignalP-noTM by its own evaluation, except for SP39, where the program reported the result obtained with SignalP-TM. It should be emphasized that SignalP4.1 is able to discriminate between SPs and non-SPs in a realistic setting.

Phobius (Käll et al., 2004) (http://phobius.binf.ku.dk/; 20.03.2014)

Phobius was developed with an algorithm that specifically separates secretory SPs from membrane proteins, as the initial TM helix of a membrane protein can be confused with a secretory SP. Phobius discriminates TM segments from SPs, and its cross-prediction between TM helices and SPs is lower than that of other TM predictor programs, e.g., TMHMM (Käll et al., 2004). It should be noted that Phobius has lower sensitivity and accuracy than SignalP in predicting SPs and their cleavage sites (Käll et al., 2004).

WolfPsort 0.2 (Horton et al., 2007) (http://wolfpsort.org/; 20.03.2014)

Resident proteins of the ER, Golgi complex, lysosomes, endosomes, and plasma membrane enter the ER along with secreted proteins and thus need to be distinguished from extracellular proteins. WolfPsort predicts the subcellular localization of a protein along the secretory pathway; its overall accuracy in predicting the localization is over 80%. It also predicts a possible cleavage site for the secretory SP. WolfPsort converts the amino acid sequence of the protein into numerical localization features. For common localization sites such as the cytosol, nucleus, and mitochondria, WolfPsort performs better than majority classifier predictions, even for sequences that do not have strong sequence similarity to any sequence in the dataset (Horton et al., 2007). The performance of WolfPsort in predicting fungal secreted proteins is higher than that of SignalP4.1, SignalP3.0, or Phobius used individually (Melhem et al., 2013; Min, 2010).

ProP1.0 (Duckert et al., 2004) (http://www.cbs.dtu.dk/services/ProP/; 20.03.2014)

ProP1.0 predicts the possible sites at which pro-regions are cleaved by members of the proprotein convertase family. The "pro" region of the secretory SP remains attached after the "pre" region has been cleaved by signal peptidase I, until it is cleaved by a proprotein convertase in the Golgi apparatus (Fig. 1). The default setting of the program is furin-specific propeptide cleavage site prediction, which can be changed by ticking the box for general proprotein convertase cleavage site prediction. As the P. pastoris genome (GS115) contains just one proprotein convertase, Kex2, located on chromosome 2, the analyses were carried out with the general proprotein convertase method.

NetNGlyc1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/; 20.03.2014)

This program was also used in the analyses, as putative N-glycosylation sites in the "pro" region can interfere with the cleavage procedure. NetNGlyc1.0 predicts the possible N-glycosylation sites by using the Asn-X-Ser/Thr sequence. However, it should be kept in mind that not all sequons are modified. The artificial neural network was trained to discriminate between glycosylated and non-glycosylated motifs. NetNGlyc1.0 can rapidly scan whole proteomes.
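To make the idea of a proprotein convertase site concrete, the following sketch scans a pro-sequence for dibasic Lys-Arg or Arg-Arg pairs, the motif after which Kex2-type convertases typically cleave. It is a deliberately simplified stand-in and does not reproduce the ProP1.0 model; the function name and the regular expression are assumptions introduced here, while the three pro-sequences are taken from Table 1.

```python
import re
from typing import List

# Dibasic pairs (K-R or R-R). This motif scan is a simplification chosen for
# illustration; it is not the neural-network model used by ProP1.0.
DIBASIC = re.compile(r"(?=([KR]R))")


def dibasic_site_ends(sequence: str) -> List[int]:
    """Return the 1-based position of the Arg that closes each K-R or R-R
    pair, i.e. the residue after which cleavage would be expected."""
    return [match.start() + 2 for match in DIBASIC.finditer(sequence)]


# Pro-sequences as listed in Table 1.
pro_sequences = {
    "SP4": "ITLPKR",
    "SP54": "RGVFRR",
    "SP52": "QPLEDR",
}

for name, pro in pro_sequences.items():
    print(name, pro, dibasic_site_ends(pro))
```

In this sketch the SP4 and SP54 pro-sequences end in a dibasic pair, whereas the SP52 pro-sequence does not, so a motif scan of this kind finds no site there.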

3. Results

Forty-one endogenous proteins with putative secretory SPs found in the extracellular medium of P. pastoris (Huang et al., 2011) and 15 exogenous proteins from other organisms were analyzed by three computer programs, SignalP4.1, WolfPsort, and Phobius, to predict the secretory SPs and their cleavage sites (Table 1). Thereafter, ProP1.0 and NetNGlyc1.0 were used for the prediction of the pro-sequences and the putative N-glycosylation sites, respectively. The number of amino acids in each pre-sequence is shown in the "predicted SP" column (Table 1). For the prediction of secretory SPs, SignalP4.1 was used as the primary predictive program, and WolfPsort and Phobius were used as validating programs. The (+) signs in the WolfPsort and Phobius columns show that the prediction results agree with the SignalP4.1 results. Cleavage site predictions by WolfPsort and Phobius that differ from SignalP4.1 are indicated in the respective column by the number of amino acids predicted by that program.

Endogenous secretory SPs: Among the 41 predicted endogenous secretory SPs, five have been used in P. pastoris and were thus regarded as positive controls, whereas 36 have not yet been used in P. pastoris. For 32 endogenous secretory SPs, the three programs predicted the same cleavage site (Table 1). For 40 endogenous secretory SPs, at least two programs predicted the same cleavage site; in the case of SP39, the three programs gave three different results. For SP3 and SP39, in contrast to WolfPsort and Phobius, SignalP4.1 did not predict any signal peptide. For SP7, SP14, SP25, SP27, SP30, SP33, and SP35, all three programs predicted the presence of a secretory SP; however, the predicted cleavage sites were inconsistent.
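The agreement notation used in Table 1 can be expressed as a small classification over the three predicted pre-sequence lengths. The sketch below is illustrative only: the function name and the data layout are assumptions, the four rows are copied from Table 1, and the count of 14 consistent exogenous SPs is read off the (+) entries of Table 1 rather than stated in the text.

```python
# Predicted pre-sequence lengths (SignalP4.1, WolfPsort, Phobius) for a few
# rows of Table 1; None stands for "no SP predicted".
predictions = {
    "SP1": (18, 18, 18),
    "SP7": (16, 17, 16),
    "SP14": (20, 20, 26),
    "SP39": (None, 31, 29),
}


def agreement(lengths):
    """Return 'all' if the three programs agree, 'two' if exactly two agree,
    and 'none' if all three results differ."""
    return {1: "all", 2: "two", 3: "none"}[len(set(lengths))]


for name, lengths in predictions.items():
    print(name, lengths, agreement(lengths))

# Overall share of SPs with one cleavage site predicted by all three programs:
# 32 endogenous SPs (from the text) plus 14 exogenous SPs (rows of Table 1
# marked "+" in both validation columns), out of 56 SPs.
consistent = 32 + 14
print(round(100 * consistent / 56), "% of 56 SPs")
```

Under these counts the share comes to about 82%, which appears to correspond to the figure quoted in the highlights.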
Among the nine endogenous secretory SPs with inconsistent predictions, SP27 and SP39, having seven and four TM helices, respectively, were predicted to be multi-spanning TM proteins by the Phobius and TMHMM programs, in line with their function and localization (results not shown). Unlike for SP39, SignalP4.1 did not select SignalP-TM for SP27, in spite of finding seven TM regions in the pre-processing step. Many membrane proteins do not possess a removable SP (Nakai, 2000); however, it is possible that SP27, as an integral component of the membrane, meets the substrate specificity of signal peptidase I so that its SP is cleaved. The endogenous secretory SPs with the highest D-scores that have not yet been controlled experimentally were SP23 (0.932), SP34 (0.932), SP15 (0.918), SP20 (0.910), SP9 (0.904), and SP24 (0.897). The 36 endogenous secretory SPs that have not yet been used in P. pastoris should be verified experimentally with different fusion proteins. On the other hand, five SPs, namely SP3, SP4, SP13, SP17, and SP31, which were previously utilized in P. pastoris, were considered positive controls (Khasa et al., 2011; Liang et al., 2013; Murasugi and Tohma-Aiba, 2001; Yoshimasu et al., 2002). Except for SP3, the results of the analyses were in good agreement with the experimental results.

Exogenous secretory SPs: Among the 15 analyzed exogenous secretory SPs (SP42–SP56), 11, namely SP42, SP44, SP45, SP46, SP47, SP48, SP52, SP53, SP54, SP55, and SP56, were previously utilized in P. pastoris (Table 1) and were therefore regarded as positive controls. On the other hand, three exogenous secretory SPs (SP49, SP50, and SP51) were used in S. cerevisiae but have not yet been utilized in P. pastoris (Achstetter et al., 1992; Chung et al., 1996; Hofmann and Schultz, 1991). For 10 of the 11 exogenous secretory SPs previously utilized in P. pastoris, SignalP4.1, WolfPsort, and Phobius predicted the same cleavage site, and their D-scores were between D = 0.805 and 0.900; only for SP48 did two programs predict no secretory SP. For the M28 virus toxin (SP48), previous research (Eiden-Plach et al., 2004) revealed that the secretory SP contains 36 amino acids; however, SignalP4.1 and Phobius did not predict any secretory SP, whereas WolfPsort predicted a secretory SP of 31 amino acids. For PHO5 of S. cerevisiae (SP43), although the D-score of the secretory SP was 0.494, which is regarded as a low value, the corresponding protein was reported to be secreted to the extracellular medium, as it has a secretory SP of 17 amino acids (Arima et al., 1983). SP43 has not been used for r-protein production in either S. cerevisiae or P. pastoris, probably due to its low efficiency, as all exogenous secretory SPs successfully utilized in P. pastoris have a D-score > 0.8. In addition, the D-scores of SP49, SP50, and SP51 were lower than 0.8; thus, SP49, SP50, and SP51 should have lower priority than the other promising candidates having a D-score > 0.8.

To compare the prediction results, the analyses were also conducted with the Signal-3L and Signal-CF programs, and the findings were compared with those of SignalP4.1, Phobius, and WolfPsort. WolfPsort predicted a secretory SP for all the proteins analyzed, whereas Phobius did not predict a secretory SP for one protein (SP48) and SignalP4.1 did not predict a secretory SP for three proteins (SP3, SP39, and SP48). Signal-3L did not predict a secretory SP for six proteins (SP2, SP13, SP28, SP30, SP32, and SP41), and Signal-CF did not predict a secretory SP for nine proteins (SP10, SP15, SP21, SP23, SP25, SP30, SP32, SP38, and SP42). For SP13 and SP42, no SP was found by Signal-3L and Signal-CF, respectively, which contradicts the experimental evidence and the SignalP4.1, Phobius, and WolfPsort results. For SP3, on the other hand, SignalP4.1 could not predict a secretory SP, whereas Signal-3L and Signal-CF predicted the presence of one; however, the number of amino acids in SP3 was found to be 17 and 18 by Signal-3L and Signal-CF, respectively.
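The comparison of the five predictors against the positive controls can be retraced with set operations. This is a minimal sketch: the variable names are assumptions, while the SP numbers are those listed in the text for the proteins on which each program reported no secretory SP and for the SPs already used in P. pastoris.

```python
# SP numbers for which each program reported no secretory SP (from the text).
no_sp = {
    "SignalP4.1": {3, 39, 48},
    "Phobius": {48},
    "WolfPsort": set(),
    "Signal-3L": {2, 13, 28, 30, 32, 41},
    "Signal-CF": {10, 15, 21, 23, 25, 30, 32, 38, 42},
}

# SPs previously utilized in P. pastoris (positive controls in the text).
positive_controls = {3, 4, 13, 17, 31,
                     42, 44, 45, 46, 47, 48, 52, 53, 54, 55, 56}

# Positive controls that each program failed to recognize as secretory SPs.
missed = {program: sorted(sps & positive_controls)
          for program, sps in no_sp.items()}

for program, sps in missed.items():
    print(program, sps)
```

Read this way, each program except WolfPsort misses at least one experimentally used SP, which is the kind of discrepancy discussed for SP3, SP13, SP42, and SP48.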
Similarly, for SP48, although no secretory SP was predicted by SignalP4.1, Signal-3L and Signal-CF predicted the presence of a secretory SP, and the prediction of Signal-CF matched the experimentally verified length (Eiden-Plach et al., 2004). Finally, for SP39, which has not yet been verified experimentally in P. pastoris, Signal-3L and Signal-CF predicted a secretory SP, in contrast to SignalP4.1.

Pro-sequences and N-glycosylation site analyses: The pre-region of the secretory SP directs the protein to the ER and is then cleaved, either while the protein molecule is being translocated into the ER or shortly after translocation is complete (Kjarulff and Jensen, 2005); the “pro” region of the secretory SP therefore does not assist in guiding the immature protein to the ER. The same holds for glycosylation of the pro region, which has no role in directing the preprotein toward the ER. The ProP prediction results are also presented in Table 1; they were obtained using the general proprotein convertase pattern. For SP3 and SP46, the ProP results were in good agreement with the literature and data bank entries in predicting the pro-sequence length. The other pro-sequences were determined either by ProP or by referring to the literature or data banks. For SP42, SP48, SP51, and SP52, ProP did not predict any pro-sequence, although all of them have been reported to possess one. SP54 was predicted to have a 6-amino acid pro-sequence, but its pro-sequence is four amino acids long in UniProt. Interestingly, ProP predicted a 36-amino acid secretory SP for SP48, in accordance with the findings of Eiden-Plach et al. (2004). As an important parameter affecting protein secretion and protein maturation, the presence of putative N-glycosylation sites

in putative pro-sequences was also investigated using the NetNGlyc program. Except for the S. cerevisiae α-mating factor (α-MF) pro-sequence, none of the specified pro-sequences was predicted to possess a putative N-glycosylation site.
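To make the notion of a putative N-glycosylation site concrete, the sketch below scans a pro-sequence for the canonical N-X-S/T sequon (X is any residue except proline). This is only a pattern match, not NetNGlyc's prediction method, which scores sequons with trained models. The α-MF pre- and pro-sequences used here are the commonly reported S. cerevisiae leader and are an assumption of this example (they are not reproduced in this section); the function name `find_sequons` is hypothetical.

```python
import re

# Assumption: commonly reported S. cerevisiae alpha-MF leader, split into the
# 19-residue pre-sequence and the pro-sequence that ends at residue 85.
ALPHA_MF_PRE = "MRFPSIFTAVLFAASSALA"
ALPHA_MF_PRO = (
    "APVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLEKR"
)

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping candidates be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str, offset: int = 0) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon.

    `offset` shifts positions, e.g. by the pre-sequence length, so that the
    numbering refers to the full prepro-sequence.
    """
    return [m.start() + 1 + offset for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    sites = find_sequons(ALPHA_MF_PRO, offset=len(ALPHA_MF_PRE))
    print(sites)  # [23, 57, 67]
```

Under these assumptions the scan reports three candidate sites in the α-MF pro-sequence. A sequon match is necessary but not sufficient for glycosylation, so such a scan should be read as a pre-filter rather than a prediction.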

4. Discussion and conclusion

Computational methods allow rapid prediction of potential secretory SPs and their potential cleavage sites. The programs SignalP4.1, Phobius, WolfPsort, and TMHMM were used to predict the final destination of the proteins, whereas the putative cleavage sites of pro-sequences and the putative N-glycosylation sites were predicted by ProP and NetNGlyc, respectively, to gain an adequate understanding of the structure of the secretory SPs. Even though available endogenous or exogenous secretory SPs have been successfully used for r-protein production by P. pastoris, novel SPs based on metabolic engineering design will open a new era in extracellular r-protein production. A repertoire of secretory SPs makes it possible to try many options and choose the best one based on efficient secretion and authenticity of the r-protein. The microorganisms’ own secretory SPs, obtained from their secretome analyses, can be regarded as promising candidates. It is worth mentioning that the secretory mode of expression is an efficient and rational choice for proteins that are secreted in their native hosts (Cereghino and Cregg, 2000). Although secretory SPs are sufficient for targeting the precursor polypeptide to the secretory pathway, the protein sequence itself can also affect the secretion efficiency; that is, the efficiency cannot be determined by either one alone (Nagarajan, 1993). Correct cleavage of the secretory SP is of great importance for protein bioactivity (Kato et al., 2001), and N-terminal authenticity should be tested. The efficiency of secretory SP cleavage may be influenced by the N-terminal amino acids of the mature protein; thus, the D-score and the cleavage site should be re-analyzed together with the desired protein sequence. Further, the substitution of particular residues that favour a β-turn can affect the processing of the secretory SP (Nagarajan, 1993).
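As a rough illustration of inspecting the residues on both sides of an intended cleavage site, the sketch below checks the (−3, −1) rule for signal peptidase I sites on the three candidate SPs quoted in the abstract and prints the junction window for a fusion. The allowed residue sets are a simplified assumption, the mature N-terminus is made up, and the helper names are hypothetical. It does not replace re-running the predictors on the full fusion sequence.

```python
# Simplified residue sets often quoted for the (-3, -1) rule; an assumption
# for illustration, not the scoring used by SignalP, Phobius, or WolfPsort.
ALLOWED_MINUS_1 = set("AGSCT")
ALLOWED_MINUS_3 = set("AGSCTVIL")

# Candidate signal peptides quoted in the abstract of this paper.
CANDIDATES = [
    "MKILSALLLLFTLAFA",
    "MRPVLSLLLLLASSVLA",
    "MFKSLCMLIGSCLLSSVLA",
]


def obeys_rule(signal_peptide: str) -> bool:
    """True if the residues at positions -1 and -3 are in the allowed sets."""
    if len(signal_peptide) < 3:
        raise ValueError("signal peptide must have at least 3 residues")
    return (
        signal_peptide[-1] in ALLOWED_MINUS_1
        and signal_peptide[-3] in ALLOWED_MINUS_3
    )


def junction_window(signal_peptide: str, mature: str, flank: int = 3) -> str:
    """Residues -flank..-1 and +1..+flank around the intended cleavage site."""
    return f"{signal_peptide[-flank:]}|{mature[:flank]}"


if __name__ == "__main__":
    hypothetical_mature = "DIQMTQSP"  # made-up N-terminus, illustration only
    for sp in CANDIDATES:
        print(sp, obeys_rule(sp), junction_window(sp, hypothetical_mature))
```

All three candidates pass this simplified check. Because the residues at +1 and beyond also matter, the window on the right-hand side of the bar is the part that changes with each target protein and is the reason for re-analyzing each fusion.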
The secretory SPs having the highest D-scores are SP23 (0.932), SP34 (0.932), SP15 (0.918), SP20 (0.910), SP9 (0.904), and SP24 (0.897). As the D-scores of these SPs are higher than that of the S. cerevisiae α-MF signal peptide, which has been widely used for r-protein production, they are considered promising candidates (Table 1). Thus, the effect of these secretory SPs on r-protein production is currently being studied in our research group. In addition, SP13 (0.925) and SP17 (0.893), which were used by Liang et al. (2013), also have higher D-scores than that of α-MF (SP42). Inconsistent results were obtained for SP14, SP25, and SP27, for which Phobius was not able to confirm the cleavage site. Since Phobius is known to be less accurate than SignalP (specifically version 2.0.2b) in predicting cleavage sites (Käll et al., 2004), more weight should be given to the SignalP4.1 predictions. For SP39, “no SP” implies the absence of a signal peptidase I cleavage site and conveys no information about entry into the secretory pathway. The protein corresponding to SP39 is a putative integral membrane protein; it therefore enters the ER, and the hydrophobic part of the signal peptide can be inserted into the membrane without cleavage and act as an anchor for the mature protein, so it may not be recognized as a cleavable secretory SP (Nakai, 2000). Both SP predictors and TM predictors first scan the sequence for a stretch of hydrophobic residues as the primary recognition pattern; therefore, SPs and TM helices may be misclassified. The prediction results obtained for SP4, SP13, SP17, and SP31 by the programs SignalP4.1, Phobius, and WolfPsort were in good

agreement with the experimental results; the D-scores of these SPs were higher than 0.8. However, for SP3, SignalP4.1 (cut-off value 0.45) found no cleavable secretory SP, in contrast to the SignalP3.0 results of Huang et al. (2011), whereas WolfPsort and Phobius predicted a 17-amino acid secretory SP, as SignalP3.0 did. The cut-off values are set to maximize the prediction performance as measured by the Matthews correlation coefficient (MCC). With the default cut-off value of 0.45, the sensitivity of SignalP4.1 was lower than that of SignalP3.0, which means fewer true positive predictions or more false negative predictions; this may explain the conflicting result for SP3. When the cut-off value was set to 0.38, SignalP4.1 was able to predict an SP for SP3. Min (2010) and Melhem et al. (2013) reported that the individual accuracy of the programs for fungal proteins was in the following order: WolfPsort > Phobius > SignalP3.0/SignalP4.0. Therefore, when deciding on the SPs to be used experimentally, the three programs should be used together and their results compared side by side. Last but not least, if SignalP4.1, WolfPsort, and Phobius are not successful in predicting the secretory SP, it would be better to analyze the proteins with other programs, such as Signal-3L and Signal-CF.

The previously identified pre-sequence of α-MF (SP42) contains 19 amino acids; this was supported by a high D-score of 0.885 from SignalP4.1 and further confirmed by Phobius and WolfPsort. The H-region length was determined to be 12 amino acids. The two alanine residues at the (−1) and (−3) positions conform to the (−3, −1) rule described before. In the literature, the pro-sequence has been reported to be cleaved at the 85th amino acid; however, interestingly, the ProP program could not find any pro region.
Thus, new efficient programs are needed for the determination of pro-regions. The pro-sequence may undergo post-translational modifications such as glycosylation, as is the case for the S. cerevisiae α-MF prepro-sequence, which can affect protein secretion efficiency (Oka et al., 1999). Using the NetNGlyc program, three putative N-glycosylation sites were found in the pro-sequence of S. cerevisiae α-MF, in accordance with the literature.

The identified endogenous SPs are more compatible with the secretion machinery of the host P. pastoris; therefore, these endogenous SPs can lead to a more efficient r-protein production process with an authentic N-terminus. Determining the endogenous SPs of a host can reveal the amino acid preferences of that microorganism in its SPs; by comparing the selected candidates, this can lead to the design of novel artificial SPs with even higher efficiency than the native endogenous SPs. As an extension of this work, it would be useful to examine how the efficiency of r-protein secretion differs among SPs obtained from different groups of the secretome, such as ER-resident proteins and cell wall proteins. Furthermore, in-silico analyses can be generalized to other commercial host organisms, which can consequently lead to finding the most efficient universal SP(s), at least in each distinct category of cell factories. Endogenous secretory SPs can also be engineered in order to increase the efficiency of secretion. The optimization of SPs would be an option to improve the yield of r-protein production, as was applied in Escherichia coli (Klatt and Konthur, 2012). Mutations introduced into secretory SPs could reveal the function of different segments of the secretory SP (Nagarajan, 1993), as was done by Monod et al. (1989) on the secretory SP of the yeast PHO5 protein and by Lin-Cereghino et al. (2013) on different parts of the S. cerevisiae α-MF signal peptide.
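The recommendation to read the three main predictors side by side, fall back on Signal-3L and Signal-CF when they disagree, and then compare D-scores against α-MF can be sketched as a small triage routine. The "no SP" sets and D-scores below are the ones quoted in the text; the category labels and function names are hypothetical, and only SPs whose D-scores are stated in this section are ranked.

```python
PRIMARY = ("SignalP4.1", "Phobius", "WolfPsort")
FALLBACK = ("Signal-3L", "Signal-CF")

# Proteins for which each program reported no secretory SP (from the text).
NO_SP = {
    "SignalP4.1": {"SP3", "SP39", "SP48"},
    "Phobius": {"SP48"},
    "WolfPsort": set(),
    "Signal-3L": {"SP2", "SP13", "SP28", "SP30", "SP32", "SP41"},
    "Signal-CF": {"SP10", "SP15", "SP21", "SP23", "SP25",
                  "SP30", "SP32", "SP38", "SP42"},
}

# D-scores quoted in the text; SP42 is the S. cerevisiae alpha-MF baseline.
D_SCORE = {
    "SP23": 0.932, "SP34": 0.932, "SP13": 0.925, "SP15": 0.918,
    "SP20": 0.910, "SP9": 0.904, "SP24": 0.897, "SP17": 0.893,
    "SP42": 0.885, "SP43": 0.494,
}
BASELINE = "SP42"


def predicted_by(sp: str, programs: tuple[str, ...]) -> list[str]:
    """Programs in `programs` that did report a secretory SP for `sp`."""
    return [p for p in programs if sp not in NO_SP[p]]


def triage(sp: str) -> str:
    """Hypothetical labels: all three primary programs agree, or fall back."""
    if len(predicted_by(sp, PRIMARY)) == len(PRIMARY):
        return "consensus"
    if predicted_by(sp, FALLBACK):
        return "fallback-supported"
    return "unsupported"


def candidates_above_baseline() -> list[str]:
    """Consensus SPs whose D-score exceeds the alpha-MF baseline, best first."""
    base = D_SCORE[BASELINE]
    hits = [sp for sp, d in D_SCORE.items()
            if d > base and triage(sp) == "consensus"]
    return sorted(hits, key=lambda sp: (-D_SCORE[sp], int(sp[2:])))


if __name__ == "__main__":
    for sp in ("SP3", "SP39", "SP48", "SP13"):
        print(sp, triage(sp))
    print(candidates_above_baseline())
```

With these inputs, SP3, SP39, and SP48 are labelled as needing fallback support, and eight SPs rank above α-MF, consistent with the count given in the highlights. Using unanimity among the primary programs as the consensus criterion is a design choice of this sketch; a majority vote would be an equally plausible reading of the text.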
Engineering P. pastoris to perform more humanized N-glycosylation opens the route for it to become the major industrial host for the production of therapeutic proteins, especially glycoproteins. Choosing an effective secretory SP for extracellular r-protein production from the set of endogenous proteins secreted under different conditions can greatly improve the reliability and acceptability of P. pastoris. It is important to note that microorganisms produce different proteins when grown under severe and limited conditions; therefore, proteomics research on P. pastoris under extreme conditions should be conducted, focusing on the determination of novel secreted proteins and, consequently, their secretory SPs.

Conflict of interest

No conflict of interest declared.

References

Achstetter, T., Nguyen-Juilleret, M., Findeli, A., Merkamm, M., Lemoine, Y., 1992. A new signal peptide useful for secretion of heterologous proteins from yeast and its application for synthesis of hirudin. Gene 110 (1), 25–31.
Arima, K., Oshima, T., Kubota, I., Nakamura, N., Mizunaga, T., Toh-e, A., 1983. The nucleotide sequence of the yeast PHO5 gene: a putative precursor of repressible acid phosphatase contains a signal peptide. Nucleic Acids Res. 11 (6), 1657–1672.
Batra, G., Gurramkonda, C., Nemani, S.K., Jain, S.K., Swaminathan, S., Khanna, N., 2010. Optimization of conditions for secretion of dengue virus type 2 envelope domain III using Pichia pastoris. J. Biosci. Bioeng. 110 (4), 408–414.
Çelik, E., Çalık, P., 2012. Production of recombinant proteins by yeast cells. Biotechnol. Adv. 30 (5), 1108–1118.
Cereghino, G.P.L., Cereghino, J.L., Ilgen, C., Cregg, J.M., 2002. Production of recombinant proteins in fermenter cultures of the yeast Pichia pastoris. Curr. Opin. Biotechnol. 13, 329–332.
Cereghino, J.L., Cregg, J.M., 2000. Heterologous protein expression in the methylotrophic yeast Pichia pastoris. FEMS Microbiol. Rev. 24, 45–66.
Chou, K.C., 2001a. Prediction of protein signal sequences and their cleavage sites. Proteins: Struct. Funct. Genet. 42, 136–139.
Chou, K.C., 2001b. Prediction of signal peptides using scaled window. Peptides 22, 1973–1979.
Chou, K.C., 2001c. Using subsite coupling to predict signal peptides. Protein Eng. 14, 75–79.
Chou, K.C., 2002. Review: prediction of protein signal sequences. Curr. Protein Pept. Sci. 3, 615–622.
Chou, K.C., Shen, H.B., 2007. Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. Biochem. Biophys. Res. Commun. 357, 633–640.
Chung, B.H., Nam, S.W., Kim, B.M., Park, Y., 1996. Highly efficient secretion of heterologous proteins from Saccharomyces cerevisiae using inulinase signal peptides. Biotechnol. Bioeng. 49, 473–479.
Crawford, K., Zaror, I., Bishop, R.J., Innis, M.A., 2003. Pichia Secretory Leader for Protein Expression. US Patent.
Damasceno, L.M., Huang, C., Batt, C.A., 2012. Protein secretion in Pichia pastoris and advances in protein production. Appl. Microbiol. Biotechnol. 93, 31–39.
Duckert, P., Brunak, S., Blom, N., 2004. Prediction of proprotein convertase cleavage sites. Protein Eng. Des. Sel. 17, 107–112.
Eiden-Plach, A., Zagorc, T., Heintel, T., Carius, Y., Breinig, F., Schmitt, M.J., 2004. Viral preprotoxin signal sequence allows efficient secretion of green fluorescence protein by Candida glabrata, Pichia pastoris, Saccharomyces cerevisiae, and Schizosaccharomyces pombe. Appl. Environ. Microbiol. 70 (2), 961–966.
Emanuelsson, O., Nielsen, H., Brunak, S., von Heijne, G., 2000. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005–1016.
Fariselli, P., Finocchiaro, G., Casadio, R., 2003. SPEPlip: the detection of signal peptide and lipoprotein cleavage sites. Bioinform. 19, 2498–2499.
Hegde, R.S., Bernstein, H.D., 2006. The surprising complexity of signal sequences. Trends Biochem. Sci. 31 (10), 563–571.
Hiller, K., Grote, A., Scheer, M., Munch, R., Jahn, D., 2004. PrediSi: prediction of signal peptides and their cleavage positions. Nucleic Acids Res. 32, W375–W379.
Hiss, J.A., Schneider, G., 2009. Architecture, function and prediction of long signal peptides. Brief. Bioinform. 10, 569–578.
Hofmann, K.J., Schultz, L.D., 1991. Mutations of the α-galactosidase signal peptide which greatly enhances secretion of heterologous proteins by yeast. Gene 101 (1), 105–111.
Horton, P., Park, K., Obayashi, T., Fujita, N., Harada, H., Adams-Collier, C.J., Nakai, K., 2007. WolfPsort: protein localization predictor. Nucleic Acids Res., 10.1093/nar/gkm259.
Huang, C., Damasceno, L.M., Anderson, K.A., Zhang, S., Old, L.J., Batt, C.A., 2011. A proteomic analysis of the Pichia pastoris secretome in methanol-induced cultures. Appl. Microbiol. Biotechnol. 90, 235–247.
Käll, L., Krogh, A., Sonnhammer, E.L.L., 2004. Combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338 (5), 1027–1036.

Kato, S., Ishibashi, M., Tatsuda, D., Tokunaga, H., Tokunaga, M., 2001. Efficient expression, purification and characterization of mouse salivary α-amylase secreted from methylotrophic yeast, Pichia pastoris. Yeast 18, 643–655.
Khasa, Y.P., Conrad, S., Sengul, M., Plautz, S., Meagher, M.M., Inan, M., 2011. Isolation of Pichia pastoris PIR genes and their utilization for cell surface display and recombinant protein secretion. Yeast 28, 213–226.
Kjarulff, S., Jensen, M.R., 2005. Comparison of different signal peptides for secretion of heterologous proteins in fission yeast. Biochem. Biophys. Res. Commun. 336, 974–982.
Klatt, S., Konthur, Z., 2012. Secretory signal peptide modification for optimized antibody-fragment expression-secretion in Leishmania tarentolae. Microb. Cell Fact. 11, 97.
Kober, L., Zehe, C., Bode, J., 2013. Optimized signal peptides for the development of high expressing CHO cell lines. Biotechnol. Bioeng. 110, 1164–1173.
Kottmeier, K., Ostermann, K., Bley, T., Rodel, G., 2011. Hydrophobin signal sequence mediates efficient secretion of recombinant proteins in Pichia pastoris. Appl. Microbiol. Biotechnol. 91, 133–141.
Kuwae, S., Ohyama, M., Ohya, T., Ohi, H., Kobayashi, K., 2005. Production of recombinant human antithrombin by Pichia pastoris. J. Biosci. Bioeng. 99 (3), 264–271.
Liang, S., Li, C., Ye, Y., Lin, Y., 2013. Endogenous signal peptides efficiently mediate the secretion of recombinant proteins in Pichia pastoris. Biotechnol. Lett. 35, 97–105.
Lin-Cereghino, G.P., Stark, C.M., Kim, D., Chang, J., Shaheen, N., Poerwanto, H., Agari, K., Moua, P., Low, L.K., Tran, N., Huang, A.D., Nattestad, M., Oshiro, K.T., Chang, J.W., Chavan, A., Tsai, J.W., Lin-Cereghino, J., 2013. The effect of α-mating factor secretion signal mutations on recombinant protein expression in Pichia pastoris. Gene 519 (2), 311–317.
Liu, D.Q., Liu, H., Shen, H.B., Yang, J., 2007. Predicting secretory protein signal sequence cleavage sites by fusing the marks of global alignments. Amino Acids 32, 493–496.
Liu, H., Yang, J., Ling, J.G., 2005. Prediction of protein signal sequences and their cleavage sites by statistical rulers. Biochem. Biophys. Res. Commun. 338, 1005–1011.
Macauley-Patrick, S., Fazenda, M.L., McNeil, B., Harvey, L.M., 2005. Heterologous protein production using the Pichia pastoris expression system. Yeast 22, 249–270.
Martoglio, B., Dobberstein, B., 1998. Signal sequences: more than just greasy peptides. Trends Cell Biol. 8, 410–415.
Melhem, H., Jia Min, X., Butler, G., 2013. The impact of SignalP4.0 on the prediction of secreted proteins. CIBCB, 16–22.
Min, X.J., 2010. Evaluation of computational methods for secreted protein prediction in different eukaryotes. J. Proteomics Bioinform. 3 (5), 143–147.
Monod, M., Haguenauer-Tsapis, R., Rauseo-Koenig, I., Hinnen, A., 1989. Functional analysis of the signal-sequence processing site of yeast acid phosphatase. Eur. J. Biochem. 182, 213–221.
Murasugi, A., Tohma-Aiba, Y., 2001. Comparison of three signals for secretory expression of recombinant human midkine in Pichia pastoris. Biosci. Biotechnol. Biochem. 65 (10), 2291–2293.
Nagarajan, V., 1993. Protein Secretion in Bacillus subtilis and Other Gram-positive Bacteria, Biochemistry, Physiology, and Molecular Genetics. In: Sonenshein, A.L., Hoch, J.A., Losick, R. (Eds.), American Society for Microbiology, Washington DC, pp. 713–726.
Nakai, K., 2000. Protein sorting signals and prediction of subcellular localization. Adv. Protein Chem. 54, 277–344.
Oka, C., Tanaka, M., Muraki, M., Harata, K., Suzuki, K., Jigami, Y., 1999. Human lysozyme secretion increased by α-factor pro-sequence in Pichia pastoris. Biosci. Biotechnol. Biochem. 63 (11), 1977–1983.
Paifer, E., Margolles, E., Cremata, J., Montesino, R., Herrera, L., Delgado, J.M., 1994. Efficient expression and secretion of recombinant α-amylase in Pichia pastoris using two different signal sequences. Yeast 10, 1415–1419.
Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786.
Raemaekers, R.J.M., de Muro, L., Gatehouse, J.A., Fordham-Skelton, A.P., 1999. Functional phytohemagglutinin (PHA) and Galanthus nivalis agglutinin (GNA) expressed in Pichia pastoris: correct N-terminal processing and secretion of heterologous proteins expressed using the PHA-E signal peptide. Eur. J. Biochem. 265, 394–403.
Riffer, F., Eisfeld, K., Breinig, F., Schmitt, M.J., 2002. Mutational analysis of K28 preprotoxin processing in the yeast Saccharomyces cerevisiae. Microbiology 148 (5), 1317–1328.
Shen, H.B., Chou, K.C., 2007. Signal-3L: a 3-layer approach for predicting signal peptides. Biochem. Biophys. Res. Commun. 363, 297–303.
Sreekrishna, K., Brankamp, R.G., Kropp, K.E., Blankenship, D.T., Tsay, J., Smith, P.L., Wierschke, J.D., Subramaniam, A., Birkenberger, L.A., 1997. Strategies for optimal synthesis and secretion of heterologous proteins in the methylotrophic yeast Pichia pastoris. Gene 190, 55–62.
Stroud, R.M., Walter, P., 1999. Signal sequence recognition and protein targeting. Curr. Opin. Struct. Biol. 9, 754–759.
Wang, M., Yang, J., 2005. Using string kernel to predict signal peptide cleavage site based on subsite coupling model. Amino Acids 28, 395–402.
Xiong, R., Chen, J., Chen, J., 2008. Secreted expression of human lysozyme in the yeast Pichia pastoris under the direction of the signal peptide from human serum albumin. Biotechnol. Appl. Biochem. 51, 129–134.
Yoshimasu, M.A., Ahn, J., Tanaka, T., Yada, R.Y., 2002. Soluble expression and purification of porcine pepsinogen from Pichia pastoris. Protein Expr. Purif. 25, 229–236.
Zuyong, H., Huang, Y., Qin, Y., Liu, Z., Mo, D., Cong, P., Chen, Y., 2012. Comparison of α-Factor preprosequence and a classical mammalian signal peptide for secretion of recombinant xylanase xynB from yeast Pichia pastoris. J. Microbiol. Biotechnol. 22 (4), 479–483.