Do Toxoplasma gondii apicoplast proteins have antigenic potential? An in silico study


Computational Biology and Chemistry 83 (xxxx) xxxx

Contents lists available at ScienceDirect

Computational Biology and Chemistry journal homepage: www.elsevier.com/locate/cbac

Hüseyin Can a,*, Sedef Erkunt Alak a, Ahmet Efe Köseoğlu a, Mert Döşkaya b, Cemal Ün a

a Ege University Faculty of Science, Department of Biology, Molecular Biology Section, İzmir, Turkey
b Ege University Faculty of Medicine, Department of Parasitology, İzmir, Turkey

ARTICLE INFO

ABSTRACT

Keywords: T. gondii; Apicoplast; Ribosomal protein; Antigen; Epitope; Vaccine

Toxoplasma gondii, one of the most extensively studied apicomplexan parasites, is prevalent worldwide in animals and humans. Apart from its nuclear genome, T. gondii contains a 35-kb apicoplast genome that originated from a secondary endosymbiotic event. In this study, we aimed to investigate the antigenic potential of the apicoplast genome-encoded proteins (n = 28) of T. gondii using in silico analysis. For this purpose, the proteins were first screened for antigenic probability. Several bioinformatics analyses were then applied to all predicted antigenic apicoplast proteins to analyze physico-chemical parameters, subcellular localization and transmembrane domains. Further analyses were performed for the antigenic proteins that have a signal peptide or a high antigenicity value. These included structural prediction, B cell and MHC-I/II epitope prediction and prediction of post-translational modifications. Of the 28 apicoplast proteins, 19 were predicted to be probable antigens. Among these, ribosomal protein S5, L11 and S2 were predicted to have a signal peptide, whereas ribosomal protein L36 and S17 were predicted to have a significantly high antigenicity value (P < 0.05). In addition, ribosomal protein S5, L11, S2, L36 and S17 were predicted to have many epitopes with low IC50 and percentile rank values, indicating strong binding between the epitopes and MHC-I/II alleles. They were also predicted to carry post-translational modifications such as N-linked glycosylation, acetylation and phosphorylation. To the best of the authors' knowledge, this is the first study to show the antigenic potential and other properties of apicoplast-derived proteins of T. gondii.

1. Introduction

Apicomplexa is a large phylum of single-celled, obligate intracellular protozoan parasites. About 6000 apicomplexan species have been named to date, although some studies estimate that the phylum may contain more than one million species (Sato, 2011). Apicomplexan species are characterized by an apical complex and by the absence of locomotory structures (Seeber and Steinfelder, 2016; Siński and Behnke, 2004). The apical complex is composed of polar rings, rhoptries, micronemes and usually a conoid. In addition to their nuclear and mitochondrial genomes, most apicomplexan species carry a circular extrachromosomal DNA of about 35 kb (kilobases). This DNA resides in a cytoplasmic organelle with plastid-like properties, named the apicoplast. The apicoplast is surrounded by three or four membranes and originates from a secondary endosymbiotic event (Ajioka et al., 2001; Lim and McFadden, 2010; Striepen, 2011; McFadden and Yeh, 2017). Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., Cyclospora cayetanensis, Babesia spp., Theileria spp. and Eimeria spp. are among the best-known members of Apicomplexa because of their medical, veterinary and economic importance (Sato, 2011).

T. gondii, one of the most extensively studied apicomplexan parasites, is prevalent worldwide in animals and humans. Although the prevalence of toxoplasmosis is generally high, it varies with factors such as the host's feeding habits, geographical location and cultural background. In healthy humans with normal immune function, T. gondii infection is generally asymptomatic. In immunocompromised patients and in the fetus, however, it can cause a wide range of severe symptoms and may result in death (Tenter et al., 2000). T. gondii infection is also a common cause of infectious abortion in livestock such as sheep and goats, and it leads to considerable economic losses worldwide (Edwards and Dubey, 2013). To date, a wide range of nuclear genome-encoded proteins have been used in vaccine development against T. gondii infection, but an effective vaccine has not yet been developed (Gedik et al., 2016; Verma and Khanna, 2013). Examples include dense granule protein GRA1, rhoptry protein ROP2, heat shock protein BAG1 and microneme protein MIC1.

The T. gondii apicoplast genome includes:
- large- and small-subunit rRNA genes;
- RNA polymerase genes;
- the elongation factor tufA gene;
- tRNA (transfer RNA) genes;
- a putative ribosomal protein gene operon;
- five ORFs of unknown function;
- the sufB gene (formerly named ycf24);
- the ClpC gene (Ajioka et al., 2001; Sato, 2011).

Some ribosomal proteins and elongation factor Tu have been shown to be immunogenic and could be used as vaccine candidates against bacterial infections (Du et al., 2016; Pyclik et al., 2018; Yang et al., 2018). Some ribosomal proteins were also found to be immunogenic in parasitic infections, including those caused by Leishmania spp. (Arora et al., 2005; Stober et al., 2006). Similar proteins, among others, are encoded in the apicoplast genome. However, it is not known whether they have immunogenic potential and could be used as vaccine candidates against toxoplasmosis.

Therefore, the primary aim of the present study was to reveal the antigenic potential of the proteins encoded in the apicoplast genome. The secondary aim was to predict the following properties of the antigenic proteins:
- physico-chemical parameters;
- subcellular localization;
- transmembrane domains;
- secondary and tertiary structure;
- B cell and MHC-I/II epitope sites;
- post-translational modifications.

All analyses were conducted using online bioinformatics servers. To the best of our knowledge, this is the first study to investigate the antigenic potential and other properties of apicoplast-derived proteins.

* Corresponding author: H. Can.

https://doi.org/10.1016/j.compbiolchem.2019.107158
Received 29 April 2019; received in revised form 10 September 2019; accepted 2 November 2019
1476-9271/© 2019 Elsevier Ltd. All rights reserved.

2. Materials and methods

2.1. Amino acid sequences

The amino acid sequences of the T. gondii apicoplast proteins (n = 28) were retrieved from the databank of the National Center for Biotechnology Information (NCBI). The names and accession numbers of the apicoplast proteins are given in Supplementary 1.

2.2. Prediction of antigenicity

Antigenicity of the apicoplast proteins was predicted with the Vaxijen v2.0 online server (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) using a threshold value of 0.5 (Doytchinova and Flower, 2007).

2.3. Prediction of physico-chemical parameters and allergenic properties

Apicoplast proteins predicted to be antigens were analyzed with the ExPASy ProtParam online server (https://web.expasy.org/protparam/) (Wilkins et al., 1999). The following physico-chemical parameters were predicted:
- number of amino acids;
- molecular weight;
- total number of positively and negatively charged residues;
- theoretical isoelectric point;
- extinction coefficients;
- aliphatic index;
- estimated in vitro and in vivo half-life;
- instability index;
- grand average of hydropathicity.

The allergenic properties of these proteins were predicted on the AlgPred online server (http://crdd.osdd.net/raghava/algpred/) (Saha and Raghava, 2006). Two approaches were used: MEME/MAST motif search and mapping of IgE epitopes. The solubility of these proteins was predicted with default parameters by SolPro (http://scratch.proteomics.ics.uci.edu/), which is part of the SCRATCH Protein Predictor (Cheng et al., 2005).

2.4. Prediction of transmembrane domain and subcellular localization

Transmembrane domains of all antigenic apicoplast proteins were predicted with the TMHMM online server (http://www.cbs.dtu.dk/services/TMHMM/) (Krogh et al., 2001). Their subcellular localization was predicted with the TargetP online server (http://www.cbs.dtu.dk/services/TargetP/) (Emanuelsson et al., 2007). Default parameters were used for the transmembrane domain analysis. For subcellular localization, a specificity > 0.95 was used as the cut-off value.

2.5. Prediction of secretory antigens

The SignalP online server (http://www.cbs.dtu.dk/services/SignalP/) was used to predict whether the antigenic apicoplast proteins have a signal peptide (Emanuelsson et al., 2007). Two different D cut-off values were used in this analysis: 0.45 and 0.30.

2.6. Prediction of secondary and tertiary structure

The following analyses were applied to the antigenic apicoplast proteins that have a signal peptide or a high antigenicity value:
- Secondary structures were predicted by GOR IV (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html) using default parameters (Garnier et al., 1996).
- Tertiary structures were modeled by SWISS-MODEL (https://swissmodel.expasy.org/) using default parameters (Waterhouse et al., 2018).
- The models were refined by 3Drefine (http://sysbio.rnet.missouri.edu/3Drefine/), with refined models assessed by RWplus (Bhattacharya et al., 2016).
- The tertiary structures were validated by Rampage Ramachandran Plot Analysis (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) using default parameters (Lovell et al., 2003).

2.7. Prediction of B cell epitopes

Linear B cell epitopes were predicted for the antigenic proteins that have a signal peptide or a high antigenicity value. Two web-based servers were used: the Immune Epitope Database (IEDB, https://www.iedb.org/) (Vita et al., 2018) and Bcepred (http://crdd.osdd.net/raghava/bcepred/) (Saha and Raghava, 2004). Bepipred Linear Epitope Prediction 2.0, a method available in IEDB, was used with a threshold value of 0.5, which gives high sensitivity and specificity. Bcepred predicts B cell epitopes from physico-chemical properties. The default threshold for each property was used:
- hydrophilicity: 1.9;
- flexibility: 2;
- accessibility: 1.9;
- turns: 2.4;
- exposed surface: 2.3;
- polarity: 1.8;
- antigenic propensity: 1.9.

2.8. Prediction of MHC-I and MHC-II epitopes

MHC-I and MHC-II epitopes of the antigenic proteins that have a signal peptide or a high antigenicity value were predicted with IEDB (https://www.iedb.org/) (Vita et al., 2018).

For MHC-I epitopes, ANN 4.0 was selected as the method and human as the MHC-I source. Twelve HLA supertype-representative MHC-I alleles were used: HLA-A*01:01, A*02:01, A*03:01, A*24:02, A*26:01, B*07:02, B*08:01, B*27:05, B*39:01, B*40:01, B*58:01 and B*15:01. Outputs were given as half-maximal inhibitory concentration (IC50) and percentile rank values.

For MHC-II epitopes, NetMHCIIpan was selected as the method and human as the MHC-II source. The seven MHC-II alleles of the HLA reference set recommended by IEDB were used: HLA-DRB1*03:01, DRB1*07:01, DRB1*15:01, DRB3*01:01, DRB3*02:02, DRB4*01:01 and DRB5*01:01. These outputs were likewise given as IC50 and percentile rank values.

2.9. Prediction of post-translational modifications

Post-translational modifications of the antigenic proteins that have a signal peptide or a high antigenicity value were predicted with the following servers and settings:
- N-linked glycosylation: NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta et al., 2004), with the option "Predict on all Asn residues" selected.
- O-linked glycosylation: NetOGlyc 4.0 server (http://www.cbs.dtu.dk/services/NetOGlyc/) (Steentoft et al., 2013), with default parameters.
- Phosphorylation: NetPhos 3.1 server (http://www.cbs.dtu.dk/services/NetPhos/) (Blom et al., 1999), with default parameters.
- Methylation and acetylation: GPS-MSP and GPS-PAIL, respectively, accessed through the CSS-Palm online service (http://csspalm.biocuckoo.org/online.php) (Ren et al., 2008), with "all types" selected.

NetSurfP 2.0 (http://www.cbs.dtu.dk/services/NetSurfP/) was also used to predict the surface accessibility of the post-translational modification sites.

2.10. Statistical analysis

Antigenicity values obtained by Vaxijen v2.0 were analyzed with PASW Statistics 18. A one-sample t-test with a 95% confidence interval was used to assess the significance of differences between antigenicity values.

Table 1 Antigenicity, allergenicity and solubility results of the 19 selected proteins, predicted by Vaxijen, AlgPred and SolPro, respectively.

Apicoplast protein | Vaxijen v2.0 value | Vaxijen prediction | AlgPred IgE epitopes | AlgPred MEME/MAST motif | SolPro
Elongation factor-Tu | 0.5258 | Probable antigen | – | – | insoluble
Ribosomal protein S11 | 0.6961 | Probable antigen | – | – | soluble
Ribosomal protein L36 | 1.1173 | Probable antigen | – | – | soluble
Ribosomal protein S5 | 0.6889 | Probable antigen | – | – | soluble
Ribosomal protein S17 | 0.9567 | Probable antigen | – | – | soluble
Ribosomal protein L16 | 0.7035 | Probable antigen | – | – | soluble
Ribosomal protein S3 | 0.6998 | Probable antigen | – | – | soluble
Ribosomal protein S19 | 0.5100 | Probable antigen | – | – | soluble
Ribosomal protein L2 | 0.7340 | Probable antigen | – | – | insoluble
Ribosomal protein L4 | 0.5309 | Probable antigen | – | – | soluble
Ribosomal protein S4 | 0.6095 | Probable antigen | – | – | soluble
RNA polymerase B | 0.5243 | Probable antigen | – | – | soluble
RNA polymerase C1 | 0.5347 | Probable antigen | – | – | insoluble
ORF D | 0.7582 | Probable antigen | – | – | soluble
ORF B | 0.5625 | Probable antigen | – | – | soluble
ORF F | 0.5334 | Probable antigen | – | – | soluble
ORF E | 0.7624 | Probable antigen | – | – | insoluble
Ribosomal protein L11 | 0.6862 | Probable antigen | – | – | soluble
Ribosomal protein S2 | 0.6783 | Probable antigen | – | – | soluble

(–) indicates that the protein is not an allergen according to AlgPred.

Table 2 Physico-chemical parameter results of the 19 selected proteins predicted by ExPASy ProtParam.

Apicoplast protein | Number of amino acids | Molecular weight (Da) | Theoretical pI | Negatively charged residues (Asp + Glu) | Positively charged residues (Arg + Lys) | Estimated half-life (h) | Instability index (II) | Aliphatic index | GRAVY
Elongation factor-Tu | 401 | 44312.65 | 6.53 | 48 | 46 | 30 | 28.16 (stable) | 108.95 | 0.032
Ribosomal protein S11 | 135 | 15549.64 | 10.17 | 5 | 22 | 30 | 24.72 (stable) | 112.67 | 0.087
Ribosomal protein L36 | 37 | 4545.56 | 10.36 | 0 | 13 | 30 | 38.03 (stable) | 63.24 | −1.211
Ribosomal protein S5 | 269 | 33063.11 | 10.07 | 7 | 43 | 30 | 29.89 (stable) | 126.51 | 0.406
Ribosomal protein S17 | 73 | 9036.01 | 10.25 | 2 | 18 | 30 | 22.91 (stable) | 101.37 | −0.329
Ribosomal protein L16 | 129 | 15925.32 | 10.25 | 4 | 32 | 30 | 13.46 (stable) | 86.12 | −0.320
Ribosomal protein S3 | 224 | 27315.85 | 10.01 | 3 | 36 | 30 | 29.9 (stable) | 118.75 | 0.167
Ribosomal protein S19 | 70 | 8657.58 | 9.92 | 2 | 13 | 30 | 13.81 (stable) | 115.57 | 0.206
Ribosomal protein L2 | 267 | 30610.77 | 10.72 | 8 | 53 | 30 | 31.13 (stable) | 108.54 | −0.018
Ribosomal protein L4 | 211 | 25042.02 | 10.18 | 5 | 35 | 30 | 30.97 (stable) | 113.65 | 0.066
Ribosomal protein S4 | 198 | 24377.67 | 10.04 | 11 | 42 | 30 | 18.39 (stable) | 114.24 | −0.014
RNA polymerase B | 1052 | 124397.09 | 9.68 | 75 | 151 | 30 | 25.14 (stable) | 107.21 | −0.096
RNA polymerase C1 | 565 | 66622.67 | 9.86 | 32 | 77 | 30 | 38.26 (stable) | 120.11 | 0.188
ORF D | 74 | 9374.36 | 10.05 | 0 | 13 | 30 | 51.75 (unstable) | 117.30 | 0.131
ORF B | 43 | 5432.64 | 10.56 | 0 | 12 | 30 | 25.28 (stable) | 106.51 | −0.672
ORF F | 58 | 7171.88 | 10.05 | 0 | 13 | 30 | 10.34 (stable) | 104.14 | 0.281
ORF E | 105 | 13491.11 | 10.26 | 0 | 20 | 30 | 6.46 (stable) | 78 | −0.085
Ribosomal protein L11 | 131 | 15530.71 | 10.17 | 4 | 22 | 30 | 34.96 (stable) | 119.77 | 0.110
Ribosomal protein S2 | 233 | 28195.8 | 9.95 | 4 | 31 | 30 | 29.71 (stable) | 128.50 | 0.258
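The GRAVY, aliphatic index and charged-residue columns of Table 2 follow from simple residue counts. The sketch below assumes the formulas documented for ProtParam: Kyte-Doolittle hydropathy averaged over the sequence, and Ikai's aliphatic index with coefficients 2.9 for Val and 3.9 for Ile + Leu. It is an illustration for checking such values, not the server's own code. The peptide in the example is a hypothetical toy sequence, not an apicoplast protein.

```python
# Minimal sketch of three ProtParam-style descriptors (assumed formulas).

# Kyte-Doolittle hydropathy values, used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_counts(seq: str) -> tuple:
    """Return (Asp + Glu, Arg + Lys), the two charge columns of Table 2."""
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


if __name__ == "__main__":
    toy = "MKVLIADEKR"  # hypothetical toy peptide
    print(f"GRAVY = {gravy(toy):.3f}")
    print(f"Aliphatic index = {aliphatic_index(toy):.2f}")
    print("(Asp + Glu, Arg + Lys) =", charged_counts(toy))
```

For the toy peptide MKVLIADEKR this gives a GRAVY of −0.31 and an aliphatic index of 117.0. It also gives 2 negatively and 3 positively charged residues. A negative GRAVY, as for ribosomal protein L36 in Table 2, indicates an overall hydrophilic sequence.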
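N-linked glycosylation (Section 2.9) was predicted with NetNGlyc. Its classic target is the Asn-Xaa-Ser/Thr sequon, where Xaa is any residue except Pro. The "Predict on all Asn residues" option extends scoring to every Asn.

As a minimal, hypothetical illustration, the sketch lists sequon positions and all Asn positions in one Bepipred epitope of ribosomal protein S5 from Table 7. It is a plain motif scan, not NetNGlyc's neural-network scoring. Positions are counted within the epitope, not the full protein.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width lookahead
# lets overlapping sequons (e.g. "NNS...") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def asparagine_positions(seq: str) -> list:
    """1-based positions of every Asn, i.e. the 'all Asn' scope."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "N"]


if __name__ == "__main__":
    # Bepipred epitope of ribosomal protein S5 (Table 7).
    epitope = "FLLNLNFLKLLNININNSFNTISKIKKNISLY"
    print("sequons:", sequon_positions(epitope))
    print("all Asn:", asparagine_positions(epitope))
```

For this epitope the scan finds sequons starting at positions 16 (NNS) and 28 (NIS), out of eight Asn residues in total.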


Table 3 Transmembrane domain results of the 19 selected proteins predicted by TMHMM.

Protein | Length (aa) | ExpAA | First60 | PredHel | Topology
Elongation factor Tu | 401 | 0.13 | 0.06 | 0 | o
*Ribosomal protein S11 | 135 | 21.41 | 0.54 | 1 | i64-86o
Ribosomal protein L36 | 37 | 0.00 | 0.00 | 0 | i
*Ribosomal protein S5 | 269 | 116.05 | 19.57 | 5 | i28-47o62-79i100-122o192-209i222-244o
Ribosomal protein S17 | 73 | 0.00 | 0.00 | 0 | o
Ribosomal protein L16 | 129 | 0.82 | 0.07 | 0 | i
*Ribosomal protein S3 | 224 | 69.24 | 19.48 | 3 | i27-49o69-91i115-137o
*Ribosomal protein S19 | 70 | 19.38 | 19.38 | 1 | i21-40o
*Ribosomal protein L2 | 267 | 47.13 | 6.56 | 2 | i53-70o75-97i
Ribosomal protein L4 | 141 | 1.26 | 0.33 | 0 | i
Ribosomal protein S4 | 128 | 2.32 | 0.57 | 0 | o
RNA polymerase B | 1052 | 6.14 | 0.53 | 0 | o
*RNA polymerase C1 | 565 | 38.35 | 0.28 | 0 | o
*ORF D | 74 | 22.16 | 18.65 | 1 | i45-64o
ORF B | 43 | 0.02 | 0.02 | 0 | i
ORF F | 58 | 17.70 | 17.70 | 1 | i38-57o
*ORF E | 105 | 18.27 | 14.94 | 0 | o
*Ribosomal protein L11 | 131 | 36.85 | 20.51 | 2 | i20-39o59-81i
*Ribosomal protein S2 | 233 | 81.68 | 22.66 | 4 | i39-61o90-107i128-147o212-231i

Column definitions:
- ExpAA: expected number of amino acids in transmembrane helices.
- First60: expected number of such amino acids within the first 60 residues.
- PredHel: number of predicted helices.
- Topology: i, inside; o, outside; with helix boundaries.

* marks proteins predicted to have transmembrane domains or a signal peptide.
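The ExpAA criterion used in Section 3.3 follows the TMHMM output documentation. An expected number of residues in transmembrane helices above 18 suggests a transmembrane protein or a signal peptide. A large First60 value warns that an N-terminal helix may actually be a signal peptide.

The sketch below applies these rules to a few rows of Table 3. The First60 cut-off of 10 is an assumption, since the documentation only speaks of "more than a few". The function name is hypothetical.

```python
# Selected rows of Table 3: (protein, ExpAA, First60, PredHel).
TABLE3_ROWS = [
    ("Ribosomal protein S5", 116.05, 19.57, 5),
    ("RNA polymerase C1", 38.35, 0.28, 0),
    ("ORF E", 18.27, 14.94, 0),
    ("ORF F", 17.70, 17.70, 1),
    ("Ribosomal protein L36", 0.00, 0.00, 0),
]


def interpret_tmhmm(exp_aa, first60, pred_hel, exp_aa_cut=18.0, first60_cut=10.0):
    """Flag one TMHMM summary line. exp_aa_cut follows the TMHMM documentation;
    first60_cut is an assumed threshold for an N-terminal signal-peptide warning."""
    return {
        "tm_or_signal": exp_aa > exp_aa_cut,
        "n_terminal_warning": first60 > first60_cut,
        "no_helix_called": pred_hel == 0,
    }


if __name__ == "__main__":
    for name, exp_aa, first60, pred_hel in TABLE3_ROWS:
        print(name, interpret_tmhmm(exp_aa, first60, pred_hel))
```

RNA polymerase C1 and ORF E pass the ExpAA rule although no helix is called, which is the situation described in Section 3.3. ORF F has one predicted helix, but its ExpAA is just below 18, so it is not flagged.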

3. Results

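Table 4 Subcellular localization results of the 19 selected proteins predicted by TargetP.

Name | Len | mTP | SP | Other | Loc | RC | TPlen
Elongation factor Tu | 401 | 0.145 | 0.066 | 0.841 | _ | 2 | –
Ribosomal protein S11 | 135 | 0.122 | 0.293 | 0.665 | * | 4 | –
Ribosomal protein L36 | 37 | 0.226 | 0.038 | 0.830 | _ | 2 | –
Ribosomal protein S5 | 269 | 0.314 | 0.374 | 0.349 | S | 5 | 81
Ribosomal protein S17 | 73 | 0.148 | 0.059 | 0.851 | _ | 2 | –
Ribosomal protein L16 | 129 | 0.468 | 0.099 | 0.458 | * | 5 | –
Ribosomal protein S3 | 224 | 0.272 | 0.138 | 0.478 | * | 4 | –
Ribosomal protein S19 | 70 | 0.213 | 0.192 | 0.559 | * | 4 | –
Ribosomal protein L2 | 267 | 0.172 | 0.074 | 0.843 | _ | 2 | –
Ribosomal protein L4 | 141 | 0.302 | 0.043 | 0.753 | _ | 3 | –
Ribosomal protein S4 | 128 | 0.110 | 0.312 | 0.505 | * | 5 | –
RNA polymerase B | 1052 | 0.208 | 0.198 | 0.582 | * | 4 | –
RNA polymerase C1 | 565 | 0.382 | 0.183 | 0.340 | * | 5 | –
ORF D | 74 | 0.130 | 0.139 | 0.781 | _ | 2 | –
ORF B | 43 | 0.158 | 0.072 | 0.777 | _ | 2 | –
ORF F | 58 | 0.262 | 0.077 | 0.553 | * | 4 | –
ORF E | 105 | 0.527 | 0.259 | 0.098 | * | 4 | –
Ribosomal protein L11 | 131 | 0.027 | 0.797 | 0.208 | S | 3 | 23
Ribosomal protein S2 | 233 | 0.106 | 0.450 | 0.409 | S | 5 | 19

Column definitions:
- mTP, SP and Other: TargetP scores for a mitochondrial targeting peptide, a secretory pathway signal peptide and any other location.
- Loc: predicted location (S, secretory pathway; _, any other location; *, no prediction above the cut-off).
- RC: reliability class (1 = strongest prediction).
- TPlen: predicted presequence length.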

3.1. Antigenicity

Among the 28 apicoplast proteins, 19 were predicted to be probable antigens by Vaxijen v2.0, with antigenicity values ranging from 0.51 to 1.11. Significantly high antigenicity values were predicted for ribosomal protein L36 and S17 (P < 0.05), whereas ribosomal protein S19 had the lowest value. The other apicoplast proteins predicted to be antigens were elongation factor-Tu, ribosomal protein S11, S5, L16, S3, L2, L4 and S4, RNA polymerase B and C1, ORF D, ORF B, ORF F, ORF E, and ribosomal protein L11 and S2 (Table 1).

3.2. Physico-chemical parameters and allergenic properties

The number of amino acids ranged from 37 to 1052. The highest molecular weight (approximately 124.4 kDa) was predicted for RNA polymerase B, and the lowest (4.5 kDa) for ribosomal protein L36. Theoretical pI values varied from 6.53 to 10.72. The number

In Table 4, the proteins predicted to have a signal peptide (Loc = S) are ribosomal protein S5, L11 and S2.

Table 5 Secretory antigen results of the 19 selected proteins predicted by SignalP. For all proteins, D-max cut = 0.300 and networks used = SignalP-noTM.

Name | C max | pos | Y max | pos | S max | pos | S mean | D | Signal peptide
Elongation factor Tu | 0.291 | 38 | 0.172 | 38 | 0.141 | 37 | 0.100 | 0.133 | N
Ribosomal protein S11 | 0.106 | 70 | 0.117 | 12 | 0.160 | 3 | 0.128 | 0.123 | N
Ribosomal protein L36 | 0.107 | 12 | 0.099 | 12 | 0.109 | 20 | 0.090 | 0.094 | N
Ribosomal protein S5 | 0.104 | 63 | 0.121 | 11 | 0.168 | 7 | 0.150 | 0.137 | N
Ribosomal protein S17 | 0.106 | 31 | 0.100 | 70 | 0.107 | 55 | 0.093 | 0.096 | N
Ribosomal protein L16 | 0.118 | 29 | 0.119 | 26 | 0.181 | 9 | 0.125 | 0.122 | N
Ribosomal protein S3 | 0.161 | 46 | 0.131 | 46 | 0.157 | 45 | 0.108 | 0.119 | N
Ribosomal protein S19 | 0.130 | 49 | 0.128 | 49 | 0.240 | 48 | 0.107 | 0.117 | N
Ribosomal protein L2 | 0.150 | 26 | 0.118 | 26 | 0.109 | 25 | 0.094 | 0.105 | N
Ribosomal protein L4 | 0.109 | 39 | 0.110 | 64 | 0.168 | 47 | 0.104 | 0.107 | N
Ribosomal protein S4 | 0.276 | 28 | 0.243 | 28 | 0.439 | 15 | 0.236 | 0.239 | N
RNA polymerase B | 0.106 | 39 | 0.113 | 11 | 0.169 | 40 | 0.116 | 0.115 | N
RNA polymerase C1 | 0.130 | 19 | 0.156 | 19 | 0.281 | 4 | 0.192 | 0.175 | N
ORF D | 0.107 | 67 | 0.108 | 67 | 0.133 | 54 | 0.101 | 0.104 | N
ORF B | 0.105 | 8 | 0.107 | 12 | 0.117 | 1 | 0.101 | 0.104 | N
ORF F | 0.115 | 24 | 0.114 | 24 | 0.139 | 7 | 0.122 | 0.119 | N
ORF E | 0.109 | 26 | 0.158 | 12 | 0.272 | 1 | 0.221 | 0.192 | N
*Ribosomal protein L11 | 0.129 | 24 | 0.220 | 24 | 0.493 | 2 | 0.377 | 0.305 | Y
*Ribosomal protein S2 | 0.199 | 20 | 0.263 | 20 | 0.551 | 15 | 0.355 | 0.312 | Y

Column definitions:
- C, Y and S max: maximal raw cleavage-site, combined cleavage-site and signal-peptide scores, each with its position (pos).
- S mean: mean S-score.
- D: discrimination score.

* marks proteins predicted to have a signal peptide.
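The two SignalP runs in Section 3.4 differ only in the D-score cut-off. Using the D values of Table 5, the sketch below reproduces the calls at cut-offs of 0.45 and 0.30. Treating a score equal to the cut-off as positive is an assumption. The helper name is hypothetical.

```python
# D-scores from Table 5 (SignalP-noTM network).
D_SCORES = {
    "Elongation factor Tu": 0.133,
    "Ribosomal protein S11": 0.123,
    "Ribosomal protein L36": 0.094,
    "Ribosomal protein S5": 0.137,
    "Ribosomal protein S17": 0.096,
    "Ribosomal protein L16": 0.122,
    "Ribosomal protein S3": 0.119,
    "Ribosomal protein S19": 0.117,
    "Ribosomal protein L2": 0.105,
    "Ribosomal protein L4": 0.107,
    "Ribosomal protein S4": 0.239,
    "RNA polymerase B": 0.115,
    "RNA polymerase C1": 0.175,
    "ORF D": 0.104,
    "ORF B": 0.104,
    "ORF F": 0.119,
    "ORF E": 0.192,
    "Ribosomal protein L11": 0.305,
    "Ribosomal protein S2": 0.312,
}


def signal_peptide_calls(d_scores, d_cutoff):
    """Sorted names whose D-score reaches the cut-off (ties counted as positive)."""
    return sorted(name for name, d in d_scores.items() if d >= d_cutoff)


if __name__ == "__main__":
    for cutoff in (0.45, 0.30):
        calls = signal_peptide_calls(D_SCORES, cutoff)
        print(f"D cut-off {cutoff:.2f}:", calls or "none")
```

At 0.45 no protein is called, and at 0.30 only ribosomal protein L11 and S2 are called, matching Section 3.4. The next-highest D-score (ribosomal protein S4, 0.239) is well below either cut-off.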


Fig. 1. Secondary structure results of the antigenic proteins. Residues colored yellow indicate random coil, red extended strand and blue alpha helix.

Fig. 2. SWISS-MODEL results showing 3D structure of each antigenic protein.

of aspartic acid and glutamic acid (negatively charged residues) was highest (75) in RNA polymerase B and lowest (0) in ribosomal protein L36 and ORF D, B, F and E. The number of arginine and lysine (positively charged residues) was likewise highest in RNA polymerase B and lowest in ORF B. The estimated half-life was 30 h for all antigenic proteins. The instability index differed among the antigenic proteins: ORF D was predicted to be unstable, whereas the remaining proteins were predicted to be stable. The aliphatic index varied considerably, from 63.24 to 128.5. GRAVY values were negative for some antigenic proteins: ribosomal protein L36, S17, L16, L2 and S4, RNA polymerase B, ORF B and ORF E. Positive values

were predicted for the remaining antigenic proteins (Table 2). According to the solubility prediction, nearly all antigenic proteins were soluble, except elongation factor-Tu, ribosomal protein L2, RNA polymerase C1 and ORF E (Table 1). None of the antigenic proteins showed allergenic properties by either MEME/MAST motif or IgE epitope analysis (Table 1).

3.3. Transmembrane domain and subcellular localization

Of the 19 antigenic proteins, 10 were predicted to be transmembrane proteins or to have a signal peptide, since their ExpAA values exceeded 18. The highest number of transmembrane domains was


predicted as five, in ribosomal protein S5. Although some antigenic proteins, including RNA polymerase C1 and ORF E, had high ExpAA values, no transmembrane helices were predicted in them (Table 3). TargetP did not predict a subcellular localization for 16 antigenic proteins. The remaining three (ribosomal protein S5, L11 and S2) were predicted to have a signal peptide (Table 4).

3.4. Secretory antigens

The presence of signal peptides was also predicted by SignalP to confirm the TargetP results. No protein had a signal peptide at a D cut-off value of 0.45. At a cut-off value of 0.30, two proteins (ribosomal protein L11 and S2) had a signal peptide (Table 5).

3.5. Secondary and tertiary structure

Among the 19 antigenic proteins, ribosomal protein S5, L11, S2, L36 and S17 were selected for structural analysis, prediction of B cell and MHC-I/II epitope sites and prediction of post-translational modifications. Ribosomal protein S5, L11 and S2 were selected because they were predicted to have a signal peptide. L36 and S17 were selected because they were predicted to have a significantly higher antigenicity value (P < 0.05).

The secondary structure analysis gave the following compositions:
- Alpha helix: 46.1% for ribosomal protein S5, 46.5% for L11 and 39.5% for S2, the three antigens predicted to be secretory. Lower values were predicted for the two antigens with high antigenicity, ribosomal protein S17 (23.3%) and L36 (0%).
- Extended strand: 18.2% for ribosomal protein S5, 11.4% for L11 and 19.3% for S2.
- Random coil: 35.7% for ribosomal protein S5, 42% for L11 and 41.2% for S2 (Fig. 1).
- Ribosomal protein L36 showed 24.3% extended strand and 75.7% random coil.
- Ribosomal protein S17 showed 32.9% extended strand and 43.8% random coil.

SWISS-MODEL predicted only one model for each antigenic protein. Among the antigenic proteins predicted to be secretory, ribosomal protein S5 was similar to its template between amino acid positions 131 and 265, with a coverage of 48%. For ribosomal protein L11 and S2, the similar region was larger, covering amino acid positions 1–127 and 1–230, respectively, both with a coverage of 96%. Ribosomal protein L36, one of the two proteins with high antigenicity, was similar to its template between positions 1 and 37 (coverage 100%). Ribosomal protein S17 was similar to its template between positions 33 and 63 (coverage 42%). 3Drefine produced five refined models for each antigenic protein, and the model with the lowest RWplus value was selected (Fig. 2). The models obtained by SWISS-MODEL and 3Drefine were validated by Rampage analysis. For every antigenic protein, more than 88% of the residues were in favoured and allowed regions (Table 6).

Table 6 Ramachandran plot analysis results of non-refined and refined antigenic proteins.

Protein | Region | Before refinement | After refinement
Ribosomal protein S5 | Favoured | 118 (88.7%) | 119 (89.5%)
Ribosomal protein S5 | Allowed | 9 (6.8%) | 6 (4.5%)
Ribosomal protein S5 | Outlier | 6 (4.5%) | 8 (6.0%)
Ribosomal protein L11 | Favoured | 104 (83.2%) | 113 (90.4%)
Ribosomal protein L11 | Allowed | 16 (12.8%) | 8 (6.4%)
Ribosomal protein L11 | Outlier | 5 (4.0%) | 4 (3.2%)
Ribosomal protein S2 | Favoured | 209 (91.7%) | 214 (93.9%)
Ribosomal protein S2 | Allowed | 14 (6.1%) | 10 (4.4%)
Ribosomal protein S2 | Outlier | 5 (2.2%) | 4 (1.8%)
Ribosomal protein L36 | Favoured | 30 (85.7%) | 31 (88.6%)
Ribosomal protein L36 | Allowed | 3 (8.6%) | 3 (8.6%)
Ribosomal protein L36 | Outlier | 2 (5.7%) | 1 (2.9%)
Ribosomal protein S17 | Favoured | 67 (95.7%) | 66 (94.3%)
Ribosomal protein S17 | Allowed | 2 (2.9%) | 2 (2.9%)
Ribosomal protein S17 | Outlier | 1 (1.4%) | 2 (2.9%)

3.6. B-cell epitopes

Two approaches, IEDB (Bepipred) and Bcepred, were used to predict linear B-cell epitopes. Bepipred predicted six epitopes for ribosomal protein S5, four for S2, two each for L11 and S17 and only one for L36. Bcepred predicted more epitopes for the same proteins, based on hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity and antigenic propensity (Tables 7 and 8).
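Bepipred 2.0 (Section 2.7) assigns a score to every residue, and residues scoring above the 0.5 threshold are treated as epitope residues. The following is a minimal sketch of how such per-residue scores can be grouped into contiguous candidate epitopes. The scores are hypothetical toy values, and the optional minimum-length filter is an assumption rather than an IEDB setting.

```python
def epitope_runs(seq, scores, threshold=0.5, min_length=1):
    """Return (start, end, peptide) for each contiguous run of residues whose
    score is above threshold; positions are 1-based and inclusive."""
    if len(seq) != len(scores):
        raise ValueError("expected one score per residue")
    runs = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i  # a new run begins at this residue
        elif start is not None:
            # The run ended at residue i - 1; keep it if long enough.
            if i - start >= min_length:
                runs.append((start + 1, i, seq[start:i]))
            start = None
    # Close a run that reaches the C-terminus.
    if start is not None and len(seq) - start >= min_length:
        runs.append((start + 1, len(seq), seq[start:]))
    return runs


if __name__ == "__main__":
    toy_seq = "MKTAYIAK"  # hypothetical peptide
    toy_scores = [0.2, 0.6, 0.7, 0.4, 0.8, 0.9, 0.1, 0.55]  # hypothetical scores
    print(epitope_runs(toy_seq, toy_scores))
    print(epitope_runs(toy_seq, toy_scores, min_length=2))
```

With the toy scores, the first call returns three runs: KT (2–3), YI (5–6) and K (8). Requiring a minimum length of 2 drops the single-residue run.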
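The secondary-structure percentages in Section 3.5 are state counts over a GOR IV prediction string. The sketch below assumes one letter per residue, using h, e and c (helix, extended strand, random coil) in either case.

The example string is synthetic, not GOR output. Its counts (17 h, 24 e, 32 c) were back-calculated from the percentages reported for ribosomal protein S17 and its 73-residue length.

```python
from collections import Counter


def ss_composition(states):
    """Percentage of helix (H), extended strand (E) and coil (C), one decimal."""
    counts = Counter(states.upper())  # missing states count as 0
    n = len(states)
    return {s: round(100.0 * counts[s] / n, 1) for s in "HEC"}


if __name__ == "__main__":
    synthetic_s17 = "h" * 17 + "e" * 24 + "c" * 32  # 73 residues, synthetic
    print(ss_composition(synthetic_s17))
```

The synthetic string yields 23.3% helix, 32.9% extended strand and 43.8% coil, matching the values given for S17. A string of 9 e and 28 c similarly reproduces the 24.3% and 75.7% reported for L36.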

3.7. MHC-I and MHC-II epitopes

Both MHC-I and MHC-II epitopes were predicted with the IEDB online server. In the MHC-I epitope analysis, epitopes with IC50 values < 50 nM were selected. Epitopes were predicted for the following alleles:
- Ribosomal protein S5: HLA-A*02:01, HLA-A*03:01, HLA-A*26:01, HLA-A*24:02, HLA-B*08:01, HLA-B*27:05 and HLA-B*15:01.
- Ribosomal protein L11 and S2: HLA-A*02:01, HLA-A*03:01 and HLA-B*15:01.
- Ribosomal protein S2 (further epitopes): HLA-A*03:01 and HLA-B*08:01.
- Ribosomal protein L36 and S17: HLA-A*03:01.
- Ribosomal protein S17 (additional epitopes): HLA-B*15:01.

Eight epitopes of 10 amino acids (KQLYNFLFYL, FLKKKTFFIL, YLINIFLFLL and FLLNLNFLKL corresponding to ribosomal protein S5, YLFNYLPYNL,

Table 7 B-cell epitopes predicted by Bepipred.

Protein | Epitope sequences
Ribosomal protein S5 | KKKTFFILNFNKFSKQYFKYNKNIIFNNLIKQLYN; FLLNLNFLKLLNININNSFNTISKIKKNISLY; IFNYNLFLLFFYKTNKYNK; LLLIGAKNAWIGIGVSTDFYLQEA; FYYSNTINKIKC; IQVSFLIQIFLDLLGYNVLIIKIFKHTTKYTLINFFIKLLI
Ribosomal protein L11 | KLKIIKLILHTELTNFLSSLSSILGPIGININLFFQEYNKRIKLKNNIDLPLHIIVYNDKSYVLNFNLIYTSFF; KKKQNKQYLIKKFSFLKQIKLFPKSNISICKTIKATLNSFY
Ribosomal protein S2 | KLNSLINIPIYIGSATSYRQKK; IYKKYNNKHFFYILSKYNRSLLFLNSSIKNFNLIKSLANLTNNFYI; QKLPKNIFICNSYSNYLYKNLNLKNFLIISIVDINNENTLKNISVRIIGNNKSYLAI; LTALLHGSL
Ribosomal protein L36 | SSLKYFCLNCKKKSYKKKQIIKCSNLKHNQ
Ribosomal protein S17 | IGYVITNKKNTKYK; LKYKNIQLKIKSNFFHDSRKEFLQNFIVLVKFNCKQKKYNLI


Table 8 B-cell epitopes predicted by Bcepred using different parameters. Epitope sequences are listed per protein and parameter; – indicates that no epitope was predicted.

Ribosomal protein S5
Hydrophilicity: KTNKYNK; SKTKQKGR
Flexibility: LNFNKFS; NTISKIK; IEIKKISKTKQKGR
Accessibility: MFLKKKTFFI; NFNKFSKQYFKYNKNII; FFKNKTYWYLNTISKIKKNISLY; FFYKTNKYNKILQKIIEIKKISKTKQKGRIRRFKV; QEAINKAR; EKLFYYSNTINKIK; FKHTTKYTL
Turns: NININNSFNTI
Exposed surface: NKFSKQYFKYNKN; KNKTYWY; TISKIKKNIS; FYKTNKYNKILQKIIEIKKISKTKQKGRIRR; KHTTKYT
Polarity: MFLKKKT; ISKIKKNIS; KTNKYNKILQKIIEIKKISKTKQKGRIRRFKVLL; EAINKAR; IIKIFKHTTKYT
Antigenic propensity: QLYNFLFYLYILKDFIFYKYFFF; YLINIFLFLLNLNFLKLLNI; NISLYYIYIIKILLIFNYNLFLLFFYKT; RRFKVLLLIG; VSTDFYL; KIYFFYIILEKLFYY; KIKCIIIYKPIFYGIQVSFLIQIFLDLLGYNVLIIKIFK; FFIKLLIDLT

Ribosomal protein L11
Hydrophilicity: KKKQNKQY
Flexibility: QKVFKKKQN
Accessibility: FFQEYNKRIKLKNNIDLP; VYNDKSYV; KLQKVFKKKQNKQYLIKKFSFLK; KLFPKSN
Turns: –
Exposed surface: FQEYNKRIKLKNNID; KLQKVFKKKQNKQYLIKK
Polarity: KLILHTE; FQEYNKRIKLKNN; KLQKVFKKKQNKQYLIKK
Antigenic propensity: VKFKLKIIKLILHTEL; FLSSLSSILGP; IDLPLHIIVYN; KSYVLNFNLIYTSFF; FLKQIKLF; ISICKTI

Ribosomal protein S2
Hydrophilicity: TSYRQKK; KKYNNKH; DINNENT
Flexibility: SATSYRQ; YIYKKYN; YILSKYN; LFLNSSI
Accessibility: SATSYRQKKFINYIYKKYNNKHFFN; KYLSKAY; ILSKYNRSLL; KWLKQLYTFY; EYMQKLPKNI; NSYSNYLYKNLNLKN; DINNENTLKNI; IGNNKSYLA
Turns: KYNNKHFF; NLTNNFYI; ICNSYSNY; VDINNENTL
Exposed surface: TSYRQKKFINYIYKKYNNKHF; YMQKLPKNI; YKNLNLKNF
Polarity: TSYRQKKFIN; YKKYNNKHF
Antigenic propensity: ILLKLNSLINIPIYI; FNIFIILKYLSK; YLYLYILSKY; SLLFLNSS; KLILLKWLKQLYTFYLKYLFNYLPYNLLIKLYYIYINIL; IFICNSYS; FLIISIVDI; FIFNILL; LLHGSLF

Ribosomal protein L36
Hydrophilicity: KKKSYKKKQ; KHNQRQK
Flexibility: FCLNCKKKSYKK; SNLKHNQRQ
Accessibility: KIRSSLKYF; LNCKKKSYKKKQIIK; SNLKHNQRQK
Turns: –
Exposed surface: LNCKKKSYKKKQIIKC; NLKHNQRQK
Polarity: KIRSSLK; CLNCKKKSYKKKQIIKCSNLKHNQRQK
Antigenic propensity: SSLKYFCLNCKKK; KQIIKCSNL

Ribosomal protein S17
Hydrophilicity: TNKKNTKYK
Flexibility: TNKKNTKYK; NFFHDSRK; VKFNCKQ
Accessibility: YVITNKKNTKYKFILPFWKKNLKYKNIQLKIKSNFFHDSRKEFLQN; KFNCKQKKYNLIK
Turns: SNFFHDSR
Exposed surface: ITNKKNTKYKF; FWKKNLKYKNIQLKIKSN; RKEFLQN; KFNCKQKKYNLIK
Polarity: TNKKNTKYKF; FWKKNLKYKN; FFHDSRKEFLQN; KFNCKQKKYNLIK
Antigenic propensity: VIKIGYVI; YKFILPF; FLQNFIVLVKFNCK; YNLIKILY
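In broad terms, Bcepred (Saha and Raghava, 2004) scores residues with a physico-chemical scale averaged over a sliding window and reports stretches that exceed a threshold. The Python sketch below illustrates that idea for a single property. It is a stand-in, not Bcepred itself: it uses the Kyte-Doolittle hydropathy scale with its sign flipped as a rough hydrophilicity proxy, and the window size (7), threshold (2.0) and minimum segment length (5) are illustrative assumptions rather than Bcepred's published settings.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy values (positive = hydrophobic). window_scores() flips
# the sign so that higher scores mean more hydrophilic. This scale is a stand-in:
# Bcepred uses its own property scales and normalisation.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq: str, window: int = 7) -> list[float]:
    """Average sign-flipped hydropathy over a window centred on each residue.

    Windows are truncated at the sequence ends, so every residue gets a score.
    """
    half = window // 2
    scores = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        scores.append(-sum(KYTE_DOOLITTLE[aa] for aa in chunk) / len(chunk))
    return scores


def predict_segments(seq: str, threshold: float = 2.0,
                     window: int = 7, min_len: int = 5) -> list[str]:
    """Return contiguous stretches whose window score exceeds `threshold`."""
    segments = []
    start = None
    # A trailing -inf sentinel closes a segment that runs to the end of the sequence.
    for i, score in enumerate(window_scores(seq, window) + [float("-inf")]):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= min_len:
                segments.append(seq[start:i])
            start = None
    return segments


# Usage on the ribosomal protein L36 polarity stretch from Table 8.
print(predict_segments("CLNCKKKSYKKKQIIKCSNLKHNQRQK"))
```

Swapping in another per-residue scale (flexibility, accessibility, polarity) only requires replacing the dictionary, which is why the window logic is kept separate from the scale.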

ALLHGSLFNK and YLAIKFIFNI corresponding to ribosomal protein S2, and LQNFIVLVKF corresponding to ribosomal protein S17) had IC50 values below 10 nM and percentile ranks between 0.02 and 0.09 (Table 9). In the MHC-II epitope analysis, epitopes with IC50 values below 50 nM were selected. Epitopes specific to ribosomal protein S5 were predicted for the HLA-DRB5*01:01, HLA-DRB3*02:02, HLA-DRB1*07:01 and HLA-DRB1*15:01 alleles, and epitopes specific to ribosomal proteins L11 and S2 were predicted for the HLA-DRB1*07:01 and HLA-DRB5*01:01 alleles. In addition, ribosomal protein L11 had epitopes for the HLA-DRB4*01:01 and HLA-DRB1*03:01 alleles, and ribosomal protein S2 had an epitope for HLA-DRB1*15:01. Epitopes specific to ribosomal protein S17 were predicted only for the HLA-DRB5*01:01 allele, and no epitopes were predicted for ribosomal protein L36. Seven epitopes of 15 amino acids (YLIKKFSFLKQIKLF and SFFFITKLQKVFKKK corresponding to ribosomal protein L11; LIKLYYIYINILKKF, YLYLYILSKYNRSLL, IFIILKYLSKAYLYL and IPIYIGSATSYRQKK corresponding to ribosomal protein S2; and NTKYKFILPFWKKNL corresponding to ribosomal protein S17) had IC50 values below 10 nM and percentile ranks between 0.38 and 1 (Table 10).

3.8. Post-translational modifications

Ribosomal proteins L36, S17, S5 and S2 were predicted to have N-linked glycosylation, acetylation and phosphorylation regions, most of which were predicted to be located on the exposed surface of the proteins. Ribosomal protein L11 was predicted to have N-linked glycosylation and phosphorylation regions, many of which are located on the exposed surface, but no acetylation region. None of the antigenic proteins was predicted to have O-linked glycosylation or methylation regions (Table 11).

Table 9 Epitopes specific to MHC-I alleles. All peptides are 10 amino acids long; IC50 values are in nM.

Ribosomal protein S5
Allele | Start | End | Peptide | IC50 | Percentile rank
HLA-A*02:01 | 35 | 44 | KQLYNFLFYL | 5.04 | 0.04
HLA-B*08:01 | 3 | 12 | FLKKKTFFIL | 5.09 | 0.02
HLA-A*02:01 | 65 | 74 | YLINIFLFLL | 6.30 | 0.05
HLA-A*02:01 | 72 | 81 | FLLNLNFLKL | 8.96 | 0.08
HLA-B*27:05 | 152 | 161 | RRFKVLLLIG | 11.79 | 0.02
HLA-A*03:01 | 7 | 16 | KTFFILNFNK | 12.49 | 0.03
HLA-A*03:01 | 50 | 59 | FIFYKYFFFK | 13.98 | 0.04
HLA-A*03:01 | 237 | 246 | LLGYNVLIIK | 14.53 | 0.04
HLA-A*02:01 | 111 | 120 | LLIFNYNLFL | 15.09 | 0.14
HLA-B*15:01 | 225 | 234 | IQVSFLIQIF | 15.11 | 0.07
HLA-B*15:01 | 81 | 90 | LLNININNSF | 16.17 | 0.08
HLA-B*15:01 | 180 | 189 | AINKARFFAF | 19.03 | 0.1
HLA-A*02:01 | 254 | 263 | YTLINFFIKL | 21.28 | 0.24
HLA-B*15:01 | 231 | 240 | IQIFLDLLGY | 22.45 | 0.12
HLA-A*03:01 | 100 | 109 | ISLYYIYIIK | 24.49 | 0.09
HLA-B*27:05 | 149 | 158 | GRIRRFKVLL | 26.16 | 0.08
HLA-A*03:01 | 45 | 54 | YILKDFIFYK | 27.88 | 0.11
HLA-A*26:01 | 196 | 205 | YIILEKLFYY | 30.50 | 0.04
HLA-A*24:02 | 105 | 114 | IYIIKILLIF | 33.49 | 0.05
HLA-A*03:01 | 116 | 125 | YNLFLLFFYK | 34.20 | 0.14

Ribosomal protein L11
Allele | Start | End | Peptide | IC50 | Percentile rank
HLA-B*15:01 | 118 | 127 | KTIKATLNSF | 16.29 | 0.08
HLA-B*08:01 | 94 | 103 | QYLIKKFSFL | 19.43 | 0.05
HLA-B*15:01 | 70 | 79 | FNLIYTSFFF | 33.47 | 0.21
HLA-A*02:01 | 107 | 116 | KLFPKSNISI | 40.77 | 0.44

Ribosomal protein S2
Allele | Start | End | Peptide | IC50 | Percentile rank
HLA-A*02:01 | 127 | 136 | YLFNYLPYNL | 2.94 | 0.02
HLA-A*03:01 | 224 | 233 | ALLHGSLFNK | 7.87 | 0.02
HLA-A*02:01 | 211 | 220 | YLAIKFIFNI | 9.78 | 0.09
HLA-A*03:01 | 99 | 108 | DFLFNWIMFK | 12.08 | 0.03
HLA-A*03:01 | 53 | 62 | AYLYLYILSK | 12.71 | 0.03
HLA-A*02:01 | 134 | 143 | YNLLIKLYYI | 13.09 | 0.13
HLA-A*03:01 | 118 | 127 | KQLYTFYLKY | 14.23 | 0.04
HLA-A*02:01 | 131 | 140 | YLPYNLLIKL | 16.35 | 0.16
HLA-A*03:01 | 167 | 176 | CNSYSNYLYK | 16.44 | 0.05
HLA-B*15:01 | 23 | 32 | RQKKFINYIY | 18.28 | 0.09
HLA-A*03:01 | 153 | 162 | GLEYMQKLPK | 24.79 | 0.09
HLA-A*24:02 | 120 | 129 | LYTFYLKYLF | 27.39 | 0.04
HLA-A*02:01 | 7 | 16 | NSLINIPIYI | 27.85 | 0.32
HLA-A*24:02 | 114 | 123 | LKWLKQLYTF | 30.30 | 0.05
HLA-A*02:01 | 216 | 225 | FIFNILLTAL | 31.39 | 0.35
HLA-A*02:01 | 105 | 114 | IMFKNKLILL | 31.49 | 0.35
HLA-A*24:02 | 48 | 57 | KYLSKAYLYL | 34.57 | 0.05
HLA-A*02:01 | 136 | 145 | LLIKLYYIYI | 34.86 | 0.39
HLA-A*03:01 | 66 | 75 | SLLFLNSSIK | 37.22 | 0.15
HLA-B*15:01 | 147 | 156 | ILKKFGGLEY | 38.59 | 0.22
HLA-A*24:02 | 34 | 43 | KYNNKHFFNI | 41.63 | 0.05
HLA-A*02:01 | 77 | 86 | FNLIKSLANL | 41.73 | 0.45
HLA-A*02:01 | 5 | 14 | KLNSLINIPI | 43.77 | 0.47
HLA-B*08:01 | 58 | 67 | YILSKYNRSL | 48.17 | 0.13
HLA-B*15:01 | 49 | 58 | YLSKAYLYLY | 48.51 | 0.28

Ribosomal protein L36
Allele | Start | End | Peptide | IC50 | Percentile rank
HLA-A*03:01 | 17 | 26 | KSYKKKQIIK | 22.71 | 0.08
HLA-A*03:01 | 6 | 15 | SLKYFCLNCK | 28.32 | 0.12
HLA-A*03:01 | 11 | 20 | CLNCKKKSYK | 34.45 | 0.14

Ribosomal protein S17
Allele | Start | End | Peptide | IC50 | Percentile rank
HLA-B*15:01 | 50 | 59 | LQNFIVLVKF | 5.17 | 0.02
HLA-A*03:01 | 20 | 29 | ILPFWKKNLK | 19.49 | 0.06
HLA-A*03:01 | 4 | 13 | KIGYVITNKK | 26.57 | 0.11
HLA-A*03:01 | 38 | 47 | KSNFFHDSRK | 34.41 | 0.14
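Selecting the strongest binders, as done for Tables 9 and 10, amounts to a threshold filter over the predicted binding table. The minimal Python sketch below applies the IC50 < 10 nM cut-off to a subset of the Table 9 rows. The `Prediction` record, the `strong_binders` helper and its optional percentile cut-off are illustrative assumptions, and the nM unit follows the IEDB convention for predicted IC50 values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    protein: str
    allele: str
    peptide: str
    ic50_nm: float     # predicted IC50 in nM; lower means stronger binding
    percentile: float  # percentile rank; lower means stronger binding


def strong_binders(rows, max_ic50: float = 10.0,
                   max_percentile: Optional[float] = None):
    """Keep predictions below the IC50 cut-off (and an optional percentile cut-off),
    ordered from strongest to weakest predicted binding."""
    kept = [r for r in rows
            if r.ic50_nm < max_ic50
            and (max_percentile is None or r.percentile <= max_percentile)]
    return sorted(kept, key=lambda r: (r.ic50_nm, r.percentile))


# A subset of the Table 9 rows (values copied from the table).
TABLE_9_SUBSET = [
    Prediction("S5", "HLA-A*02:01", "KQLYNFLFYL", 5.04, 0.04),
    Prediction("S5", "HLA-B*08:01", "FLKKKTFFIL", 5.09, 0.02),
    Prediction("S5", "HLA-A*02:01", "YLINIFLFLL", 6.30, 0.05),
    Prediction("S5", "HLA-A*02:01", "FLLNLNFLKL", 8.96, 0.08),
    Prediction("S5", "HLA-B*27:05", "RRFKVLLLIG", 11.79, 0.02),
    Prediction("L11", "HLA-B*15:01", "KTIKATLNSF", 16.29, 0.08),
    Prediction("S2", "HLA-A*02:01", "YLFNYLPYNL", 2.94, 0.02),
    Prediction("S2", "HLA-A*03:01", "ALLHGSLFNK", 7.87, 0.02),
    Prediction("S2", "HLA-A*02:01", "YLAIKFIFNI", 9.78, 0.09),
    Prediction("S2", "HLA-A*03:01", "DFLFNWIMFK", 12.08, 0.03),
    Prediction("L36", "HLA-A*03:01", "KSYKKKQIIK", 22.71, 0.08),
    Prediction("S17", "HLA-B*15:01", "LQNFIVLVKF", 5.17, 0.02),
]

for p in strong_binders(TABLE_9_SUBSET):
    print(f"{p.protein:<4}{p.allele:<13}{p.peptide}  IC50={p.ic50_nm:.2f} nM  rank={p.percentile}")
```

On this subset the filter returns the eight MHC-I peptides named in the text, led by YLFNYLPYNL (2.94 nM). The same function can be run over the MHC-II rows of Table 10 with the same cut-off.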

Table 10 Epitopes specific to MHC-II alleles. All peptides are 15 amino acids long; IC50 values are in nM.

Ribosomal protein S5
Allele | Start | End | Core sequence | Peptide | IC50 | Percentile rank
HLA-DRB5*01:01 | 172 | 186 | FYLQEAINK | STDFYLQEAINKARF | 10.83 | 1.22
HLA-DRB5*01:01 | 201 | 215 | FYYSNTINK | KLFYYSNTINKIKCI | 12.92 | 1.60
HLA-DRB5*01:01 | 86 | 100 | FNTISKIKK | INNSFNTISKIKKNI | 13.62 | 1.75
HLA-DRB5*01:01 | 119 | 133 | FYKTNKYNK | FLLFFYKTNKYNKIL | 14.18 | 1.85
HLA-DRB5*01:01 | 52 | 66 | YFFFKNKTY | FYKYFFFKNKTYWYL | 15.53 | 2.16
HLA-DRB5*01:01 | 68 | 82 | FLLNLNFLK | NIFLFLLNLNFLKLL | 15.68 | 2.17
HLA-DRB5*01:01 | 243 | 257 | FKHTTKYTL | LIIKIFKHTTKYTLI | 16.75 | 2.43
HLA-DRB5*01:01 | 180 | 194 | KARFFAFKK | AINKARFFAFKKIYF | 17.57 | 2.63
HLA-DRB1*15:01 | 146 | 160 | IRRFKVLLL | KQKGRIRRFKVLLLI | 20.31 | 0.20
HLA-DRB5*01:01 | 5 | 19 | TFFILNFNK | KKKTFFILNFNKFSK | 23.00 | 3.58
HLA-DRB5*01:01 | 1 | 15 | FLKKKTFFI | MMFLKKKTFFILNFN | 32.02 | 5.19
HLA-DRB3*02:02 | 93 | 107 | IKKNISLYY | ISKIKKNISLYYIYI | 32.54 | 0.12
HLA-DRB1*15:01 | 107 | 121 | ILLIFNYNL | IIKILLIFNYNLFLL | 32.82 | 0.84
HLA-DRB1*15:01 | 30 | 44 | IKQLYNFLF | FNNLIKQLYNFLFYL | 33.17 | 0.87
HLA-DRB3*02:02 | 19 | 33 | FKYNKNIIF | KQYFKYNKNIIFNNL | 33.85 | 0.16
HLA-DRB1*15:01 | 211 | 225 | IIIYKPIFY | KIKCIIIYKPIFYGI | 41.77 | 1.45
HLA-DRB1*15:01 | 109 | 123 | ILLIFNYNL | KILLIFNYNLFLLFF | 43.31 | 1.53
HLA-DRB1*07:01 | 219 | 233 | YGIQVSFLI | KPIFYGIQVSFLIQI | 43.86 | 2.96
HLA-DRB5*01:01 | 128 | 142 | YNKILQKII | KYNKILQKIIEIKKI | 45.62 | 7.14

Ribosomal protein L11
Allele | Start | End | Core sequence | Peptide | IC50 | Percentile rank
HLA-DRB5*01:01 | 95 | 109 | FSFLKQIKL | YLIKKFSFLKQIKLF | 7.67 | 0.65
HLA-DRB5*01:01 | 76 | 90 | TKLQKVFKK | SFFFITKLQKVFKKK | 8.74 | 0.85
HLA-DRB1*07:01 | 17 | 31 | FLSSLSSIL | LTNFLSSLSSILGPI | 11.82 | 0.22
HLA-DRB5*01:01 | 35 | 49 | FFQEYNKRI | INLFFQEYNKRIKLK | 15.58 | 2.16
HLA-DRB5*01:01 | 72 | 86 | FFFITKLQK | LIYTSFFFITKLQKV | 15.83 | 2.23
HLA-DRB5*01:01 | 100 | 114 | LKQIKLFPK | FSFLKQIKLFPKSNI | 19.92 | 2.96
HLA-DRB5*01:01 | 90 | 104 | LIKKFSFLK | KQNKQYLIKKFSFLK | 20.30 | 3.03
HLA-DRB5*01:01 | 80 | 94 | TKLQKVFKK | ITKLQKVFKKKQNKQ | 29.17 | 4.69
HLA-DRB1*07:01 | 116 | 130 | IKATLNSFY | ICKTIKATLNSFYNI | 40.90 | 2.55
HLA-DRB4*01:01 | 3 | 17 | LKIIKLILH | KFKLKIIKLILHTEL | 42.33 | 0.32
HLA-DRB1*07:01 | 65 | 79 | FNLIYTSFF | SYVLNFNLIYTSFFF | 43.36 | 2.92
HLA-DRB1*03:01 | 56 | 70 | IVYNDKSYV | LHIIVYNDKSYVLNF | 48.00 | 0.86

Ribosomal protein S2
Allele | Start | End | Core sequence | Peptide | IC50 | Percentile rank
HLA-DRB5*01:01 | 137 | 151 | YIYINILKK | LIKLYYIYINILKKF | 6.48 | 0.38
HLA-DRB5*01:01 | 54 | 68 | YLYILSKYN | YLYLYILSKYNRSLL | 7.87 | 0.69
HLA-DRB5*01:01 | 43 | 57 | LKYLSKAYL | IFIILKYLSKAYLYL | 9.51 | 0.98
HLA-DRB5*01:01 | 12 | 26 | YIGSATSYR | IPIYIGSATSYRQKK | 9.66 | 1
HLA-DRB5*01:01 | 102 | 116 | WIMFKNKLI | FNWIMFKNKLILLKW | 12.07 | 1.43
HLA-DRB1*07:01 | 126 | 140 | FNYLPYNLL | KYLFNYLPYNLLIKL | 22.68 | 1.04
HLA-DRB5*01:01 | 26 | 40 | YIYKKYNNK | KFINYIYKKYNNKHF | 22.69 | 3.45
HLA-DRB5*01:01 | 142 | 156 | YIYINILKK | YIYINILKKFGGLEY | 23.48 | 3.64
HLA-DRB5*01:01 | 152 | 166 | YMQKLPKNI | GGLEYMQKLPKNIFI | 31.63 | 5.08
HLA-DRB1*15:01 | 111 | 125 | LKWLKQLYT | LILLKWLKQLYTFYL | 32.35 | 0.79
HLA-DRB5*01:01 | 63 | 77 | LLFLNSSIK | YNRSLLFLNSSIKNF | 32.48 | 5.30

Ribosomal protein S17
Allele | Start | End | Core sequence | Peptide | IC50 | Percentile rank
HLA-DRB5*01:01 | 14 | 28 | YKFILPFWK | NTKYKFILPFWKKNL | 6.53 | 0.38
HLA-DRB5*01:01 | 2 | 16 | IGYVITNKK | VIKIGYVITNKKNTK | 22.48 | 3.41
HLA-DRB5*01:01 | 25 | 39 | YKNIQLKIK | KKNLKYKNIQLKIKS | 28.12 | 4.51
HLA-DRB5*01:01 | 19 | 33 | FWKKNLKYK | FILPFWKKNLKYKNI | 32.10 | 5.21

4. Discussion

Although secreted or surface antigenic proteins have been tested as vaccine candidates against toxoplasmosis in previous studies, a successful vaccine has not yet been developed. On the other hand, evolutionarily conserved antigenic proteins such as heat shock proteins, ribosomal proteins and nucleosomal proteins are known to modulate the host immune response (Requena et al., 2000). Therefore, in the present study, the proteins encoded by the apicoplast, whose 35 kb genome was acquired by secondary endosymbiosis, were analyzed for the first time to reveal their antigenic potential. The analysis followed the approach described by Goodswen et al. (2013), with some modifications. In the first step, 28 apicoplast-derived proteins were retrieved from the NCBI databank and their antigenic potential was tested with VaxiJen 2.0. Of these 28 proteins, 19, most of them ribosomal proteins, were predicted to be probable antigens. Among these 19 antigenic proteins, ribosomal proteins L36 and S17 had significantly higher antigenicity values than the others (P < 0.05). In addition, ribosomal proteins L36 and S17 had negative GRAVY values, indicating better interaction with the surrounding water molecules. Both proteins were also predicted to be soluble and stable, which are important
parameters for biophysical studies on epitope-based vaccine design. Moreover, a moderate aliphatic index (indicating stability over a wide temperature range), a sufficient molecular weight, a non-allergenic profile and a long estimated half-life (more than 10 h) suggest that ribosomal proteins L36 and S17 could be used as antigens. These results were not surprising, because ribosomal proteins have been found to be immunogenic in previous studies. For example, the L7/L12 ribosomal protein, which is conserved among Brucella species, was found to be a protective antigen (Du et al., 2016). In another study, a ribosomal protein of Leishmania donovani, a causative agent of visceral leishmaniasis, stimulated proliferation and IFN-γ secretion of a T-cell clone established from a human donor (Arora et al., 2005). Similarly, Stober et al. (2006) reported that two ribosomal proteins (60S ribosomal protein L22 and 40S ribosomal protein S19) were protective antigens against murine Leishmania major infection. In addition to ribosomal proteins, elongation factor Tu, RNA polymerase B and C1, and four ORF proteins (ORF D, ORF B, ORF F and ORF E) were predicted to be probable antigens in the present study. Elongation factor Tu (EF-Tu) is one of the most abundant bacterial proteins and is required for protein synthesis. Besides this major role, EF-Tu has been shown to act as a bacterial virulence factor with high immunoreactivity (Schaumburg et al., 2004). Yang et al. (2018) reported that recombinant EF-Tu induced antigen-specific antibodies and conferred moderate immune protection in tilapia, a fish species. In another study, epitopes of EF-Tu, one of the most immunoreactive proteins in sera from patients positive for group B streptococcus (GBS), were mapped (Pyclik et al., 2018). Although no study has yet addressed the antigenicity of RNA polymerase B or of the ORF B and ORF E proteins, we propose that these proteins could serve as new antigens for serological tests or vaccines, because they have a sufficient molecular weight (> 5 kDa), negative GRAVY values and long estimated half-lives, and are stable, non-allergenic and, except for ORF E, soluble.

In the second step of this study, the presence of signal peptides in the 19 antigenic proteins was predicted using the TargetP and SignalP online tools. SignalP with a D-score cut-off of 0.30 showed that ribosomal proteins L11 and S2 had signal peptides, whereas TargetP with a specificity > 0.95 showed that ribosomal proteins S5, L11 and S2 had signal peptides. The presence of a signal
peptide on a protein is an important indicator that the protein is destined for the secretory pathway. Therefore, we considered that these three proteins could be transported to the cytoplasm or membrane of T. gondii and could serve several functions, for example as structural proteins, virulence factors or secretory/excretory antigens. Owing to their high antigenicity values or the presence of a signal peptide, ribosomal proteins L36, S17, S5, L11 and S2 were selected for further prediction analyses, including secondary and tertiary structure, B-cell and MHC-I/II epitope sites and post-translational modifications. In the secondary structure analysis, ribosomal protein L11 had the highest predicted percentage of alpha-helix (46.5%) among these proteins, whereas ribosomal protein L36 had the lowest percentage of alpha-helix and the highest percentage of random coil (75.7%). An alpha-helix located in the inner part of a protein can protect its structure and promote a strong association between antigen and antibody, and an antigen with a high percentage of random coil can be preferentially recognized by antibodies (Shaddel et al., 2018). Accordingly, ribosomal protein L11 may interact strongly with antibodies, whereas ribosomal protein L36 may be preferentially recognized by them. 3D structure analyses were conducted to validate the protein structures in terms of alpha-helix and beta-sheet. After refinement, the structures of the analyzed proteins were considered to be of good quality and reliable, given the high percentages (> 88%) of residues in favoured and allowed regions (Lovell et al., 2003). Adaptive immune responses are known to be important for resistance to T. gondii infection in humans. For example, defects in T- or B-cell function have been reported to increase susceptibility to T. gondii infection (Kang et al., 2000; Johnson and Sayles, 2002).

Moreover, identifying the T. gondii epitopes recognized by immune cells not only allows the immune response to be measured but also shows which antigens stimulate immune cells (Dupont et al., 2012). In addition, epitope identification is reported to be of major significance for the development of diagnostic tests, epitope-based vaccine candidates and immunogenic peptides (Foroutan et al., 2018; Shaddel et al., 2018). Therefore, in the third step of this study, B-cell and MHC-I/II linear epitope analyses were performed for the five antigenic proteins using several online servers. Many of the B-cell epitopes predicted by the Bepipred server were 20 amino acids in length. Epitopes longer than 21 amino acids have been reported to be ideal antigens for diagnostic tests such as ELISA and Western blotting (Sousa et al., 2008, 2009). Bcepred, another B-cell epitope prediction tool, which predicts epitopes from physico-chemical properties such as hydrophilicity, flexibility, accessibility, turns, exposed surface, polarity and antigenic propensity, was also used. Our results showed that the predicted epitopes were suitable for use as antigens because they were flexible, hydrophilic and accessible, or had an exposed surface; antibodies have been reported to bind more easily and with higher affinity to accessible or flexible epitopes (Adekiya et al., 2017). For the MHC-I and MHC-II linear epitope analyses, the reference allele sets recommended by the IEDB server were used. More epitopes were predicted for MHC-I than for MHC-II, and low IC50 and percentile rank values, which indicate strong binding between an epitope and an MHC allele, were particularly notable for MHC-I. Since the MHC-I epitopes KQLYNFLFYL, FLKKKTFFIL, YLINIFLFLL, FLLNLNFLKL, YLFNYLPYNL, ALLHGSLFNK, YLAIKFIFNI and LQNFIVLVKF had the lowest IC50 values (< 10 nM) and low percentile ranks, they were accepted as antigenic peptides. Similarly, the MHC-II epitopes YLIKKFSFLKQIKLF, SFFFITKLQKVFKKK, LIKLYYIYINILKKF, YLYLYILSKYNRSLL, IFIILKYLSKAYLYL, IPIYIGSATSYRQKK and NTKYKFILPFWKKNL, which had IC50 values below 10 nM and low percentile ranks, were also accepted as antigenic peptides.

In the last step of this study, a post-translational modification analysis was performed for the five antigenic proteins. The analysis predicted N-linked glycosylation positions, one of the significant post-translational modifications, and most of these positions were predicted to lie on the exposed surface of the proteins. Surface exposure is known to increase the accuracy of glycosylation site prediction (Hamby and Hirst, 2008). In eukaryotic organisms such as parasites, including T. gondii, the presence of post-translational modifications is critical for selecting the proper expression system for recombinant protein production (Hansson et al., 2000). Accordingly, our results indicate that, if these antigenic proteins are produced recombinantly, their predicted N-linked glycosylation, phosphorylation and acetylation should be taken into account, and eukaryotic expression systems such as yeast, insect or mammalian cells should be preferred over bacterial systems.

Improvements in computer-based approaches have increased the reliability of in silico methods in biological studies. As a result, these methods are frequently used to predict antigenic proteins without first expressing them in vitro. In silico methods also save time and labor and are cost-effective, which makes them indispensable as a pre-analysis approach before wet-lab studies are initiated. Although many proteins can easily be screened in silico for antigen discovery, a very significant stage in vaccine design, three conditions should be taken into consideration: choosing the right analysis tools, using several parameters to obtain correct results, and validating the predictions in wet-lab studies.

In conclusion, T. gondii, which infects one-third of the human population, is currently one of the most significant intracellular parasites and can cause lethal clinical symptoms. To date, although secreted or surface antigenic proteins have been tested as vaccine candidates, no antigen has yet been discovered that can diagnose acute T. gondii infection in serological assays or serve as a vaccine candidate providing lifelong protection. In this study, the apicoplast-encoded proteins of T. gondii were analyzed for the first time to reveal whether they have antigenic potential. The results showed that 19 proteins had antigenic properties, and 5 of these were selected as the most antigenic because they had a signal peptide or a high antigenicity value. We conclude that, since organelle genomes acquired through endosymbiotic events during evolution can encode antigenic proteins, the proteins expressed from the apicoplast genome, in addition to those encoded by the nuclear genome, could be considered when developing serological tests and selecting vaccine candidate proteins.

Table 11 Post-translational modification results of antigenic proteins.
Proteins: Ribosomal S5, Ribosomal L11, Ribosomal S2, Ribosomal L36, Ribosomal S17.
Columns: number of N-glycosylation regions; number of N-glycosylation regions located in exposed surface; number of O-glycosylation regions; number of acetylation regions; number of acetylation regions located in exposed surface; number of phosphorylation regions; number of phosphorylation regions located in exposed surface; number of methylation regions.
– 2 6 2 2 – 6 protein
6
– 3 5 7 7 – 3 protein
3
– 16 24 – 1 – 24 protein
16
– 6 10 – – – 10 protein
6
– 7 19 6 6 – 21 protein
16
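The post-translational modification results above report predicted N-linked glycosylation regions, and the reference list cites Gupta et al. (2004), the NetNGlyc predictor, which evaluates Asn residues that sit in Asn-Xaa-Ser/Thr sequons. The Python sketch below only enumerates those candidate sequons (N-X-S/T with X not proline). It is a necessary-motif scan under that assumption, not a reimplementation of NetNGlyc's neural-network scoring, and it does not consider surface exposure.

```python
from __future__ import annotations

import re

# Asn-X-Ser/Thr with X != Pro. The lookahead lets overlapping sequons
# (for example in "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Usage on a hypothetical test sequence (not a T. gondii protein).
print(sequon_positions("MKNNSTAPNPSGNAT"))  # -> [3, 4, 13]
```

Asn 9 in the test sequence is skipped because it is followed by proline, which is the main exclusion rule encoded in the pattern.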

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compbiolchem.2019.107158.

References

Adekiya, T.A., Aruleba, R.T., Khanyile, S., Masamba, P., Oyinloye, B.E., Kappo, A.P., 2017. Structural analysis and epitope prediction of MHC class-1-chain related protein-A for Cancer vaccine development. Vaccines (Basel) 6 (1), 1.
Ajioka, J.W., Fitzpatrick, J.M., Reitter, C.P., 2001. Toxoplasma gondii genomics: shedding light on pathogenesis and chemotherapy. Expert Rev. Mol. Med. 3 (1), 1–19.
Arora, S.K., Pal, N.S., Mujtaba, S., 2005. Leishmania donovani: identification of novel vaccine candidates using human reactive sera and cell lines. Exp. Parasitol. 109 (3), 163–170.
Bhattacharya, D., Nowotny, J., Cao, R., Cheng, J., 2016. 3Drefine: an interactive web server for efficient protein structure refinement. Nucleic Acids Res. 44 (1), 406–409.
Blom, N., Gammeltoft, S., Brunak, S., 1999. Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294 (5), 1351–1362.
Cheng, J., Randall, A., Sweredoski, M., Baldi, P., 2005. SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 33, 72–76.
Doytchinova, I.A., Flower, D.R., 2007. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8 (1), 4.
Du, Z.Q., Li, X., Wang, J.Y., 2016. Immunogenicity analysis of a novel subunit vaccine candidate molecule-recombinant L7/L12 ribosomal protein of Brucella suis. Appl. Biochem. Biotechnol. 179 (8), 1445–1455.
Dupont, C.D., Christian, D.A., Hunter, C.A., 2012. Immune response and immunopathology during toxoplasmosis. Semin. Immunopathol. 34 (6), 793–813.
Edwards, J.F., Dubey, J.P., 2013. Toxoplasma gondii abortion storm in sheep on a Texas farm and isolation of mouse virulent atypical genotype T. gondii from an aborted lamb from a chronically infected ewe. Vet. Parasitol. 192 (1–3), 129–136.
Emanuelsson, O., Brunak, S., von Heijne, G., Nielsen, H., 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2 (4), 953–971.
Foroutan, M., Ghaffarifar, F., Sharifi, Z., Dalimi, A., Pirestani, M., 2018. Bioinformatics analysis of ROP8 protein to improve vaccine design against Toxoplasma gondii. Infect. Genet. Evol. 62, 193–204.
Garnier, J., Gibrat, J.F., Robson, B., 1996. GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol. 266, 540–553.
Gedik, Y., İz, S.G., Can, H., Döşkaya, A.D., Gürhan, S.İ.D., Gürüz, Y., Döşkaya, M., 2016. Immunogenic multistage recombinant protein vaccine confers partial protection against experimental toxoplasmosis mimicking natural infection in murine model. Trials Vaccinol. 5 (2016), 15–23.
Goodswen, S.J., Kennedy, P.J., Ellis, J.T., 2013. A guide to in silico vaccine discovery for eukaryotic pathogens. Brief. Bioinform. 14 (6), 753–774.
Gupta, R., Jung, E., Brunak, S., 2004. Prediction of N-Glycosylation Sites in Human Proteins. Unpublished.
Hamby, S.E., Hirst, J.D., 2008. Prediction of glycosylation sites using random forests. BMC Bioinformatics 9 (1), 500.
Hansson, M., Nygren, P.A., Ståhl, S., 2000. Design and production of recombinant subunit vaccines. Biotechnol. Appl. Biochem. 32 (2), 95–107.
Johnson, L.L., Sayles, P.C., 2002. Deficient humoral responses underlie susceptibility to Toxoplasma gondii in CD4-deficient mice. Infect. Immun. 70 (1), 185–191.
Kang, H., Remington, J.S., Suzuki, Y., 2000. Decreased resistance of B cell-deficient mice to infection with Toxoplasma gondii despite unimpaired expression of IFN-gamma, TNF-alpha, and inducible nitric oxide synthase. J. Immunol. 164 (5), 2629–2634.
Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.L., 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305 (3), 567–580.
Lim, L., McFadden, G.I., 2010. The evolution, metabolism and functions of the apicoplast. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365 (1541), 749–763.
Lovell, S.C., Davis, I.W., Arendall III, W.B., de Bakker, P.I., Word, J.M., Prisant, M.G., Richardson, J.S., Richardson, D.C., 2003. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins 50 (3), 437–450.
McFadden, G.I., Yeh, E., 2017. The apicoplast: now you see it, now you don't. Int. J. Parasitol. 47 (2–3), 137–144.
Pyclik, M., Górska, S., Brzozowska, E., Dobrut, A., Ciekot, J., Gamian, A., Brzychczy-Włoch, M., 2018. Epitope mapping of Streptococcus agalactiae elongation factor tu protein recognized by human sera. Front. Microbiol. 9, 125.
Ren, J., Wen, L., Gao, X., Jin, C., Xue, Y., Yao, X., 2008. CSS-Palm 2.0: an updated software for palmitoylation sites prediction. Protein Eng. Des. Sel. 21 (11), 639–644.
Requena, J.M., Alonso, C., Soto, M., 2000. Evolutionarily conserved proteins as prominent immunogens during Leishmania infections. Parasitol. Today 16 (6), 246–250.
Saha, S., Raghava, G.P.S., 2006. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res. 34, 202–209.
Saha, S., Raghava, G.P.S., 2004. BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia, G., Cutello, V., Bentley, P.J., Timmis, J. (Eds.), Artificial Immune Systems. ICARIS 2004. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg.
Sato, S., 2011. The apicomplexan plastid and its evolution. Cell. Mol. Life Sci. 68 (8), 1285–1296.
Schaumburg, J., Diekmann, O., Hagendorff, P., Bergmann, S., Rohde, M., Hammerschmidt, S., Jänsch, L., Wehland, J., Kärst, U., 2004. The cell wall subproteome of Listeria monocytogenes. Proteomics 4 (10), 2991–3006.
Seeber, F., Steinfelder, S., 2016. Recent advances in understanding apicomplexan parasites. F1000Res 5.
Shaddel, M., Ebrahimi, M., Tabandeh, M.R., 2018. Bioinformatics analysis of single and multi-hybrid epitopes of GRA-1, GRA-4, GRA-6 and GRA-7 proteins to improve DNA vaccine design against Toxoplasma gondii. J. Parasit. Dis. 42 (2), 269–276.
Siński, E., Behnke, J.M., 2004. Apicomplexan parasites: environmental contamination and transmission. Pol. J. Microbiol. 53, 67–73.
Sousa, S., Ajzenberg, D., Marle, M., Aubert, D., Villena, I., da Costa, J.C., Dardé, M.L., 2009. Selection of polymorphic peptides from GRA6 and GRA7 sequences of Toxoplasma gondii strains to be used in serotyping. Clin. Vaccine Immunol. 16 (8), 1158–1169.
Sousa, S., Ajzenberg, D., Vilanova, M., Costa, J., Dardé, M.L., 2008. Use of GRA6-derived synthetic polymorphic peptides in an immunoenzymatic assay to serotype Toxoplasma gondii in human serum samples collected from three continents. Clin. Vaccine Immunol. 15 (9), 1380–1386.
Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., Gupta, R., Bennett, E.P., Mandel, U., Brunak, S., Wandall, H.H., Levery, S.B., Clausen, H., 2013. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 32 (10), 1478–1488.
Stober, C.B., Lange, U.G., Roberts, M.T., Gilmartin, B., Francis, R., Almeida, R., Peacock, C.S., McCann, S., Blackwell, J.M., 2006. From genome to vaccines for leishmaniasis: screening 100 novel vaccine candidates against murine Leishmania major infection. Vaccine 24 (14), 2602–2616.
Striepen, B., 2011. The apicoplast: a red alga in human parasites. Essays Biochem. 51, 111–125.
Tenter, A.M., Heckeroth, A.R., Weiss, L.M., 2000. Toxoplasma gondii: from animals to humans. Int. J. Parasitol. 30 (12–13), 1217–1258.
Verma, R., Khanna, P., 2013. Development of Toxoplasma gondii vaccine: a global challenge. Hum. Vaccin. Immunother. 9 (2), 291–293.
Vita, R., Mahajan, S., Overton, J.A., Dhanda, S.K., Martini, S., Cantrell, J.R., Wheeler, D.K., Sette, A., Peters, B., 2018. The immune epitope database (IEDB): 2018 update. Nucleic Acids Res. 47 (1), 339–343.
Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T., 2018. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46 (1), 296–303.
Wilkins, M.R., Gasteiger, E., Bairoch, A., Sanchez, J.C., Williams, K.L., Appel, R.D., Hochstrasser, D.F., 1999. Protein identification and analysis tools in the ExPASy server. Methods Mol. Biol. 112, 531–552.
Yang, Q., Liu, J.X., Wang, K.Y., Liu, T., Zhu, L., He, S.Y., Geng, Y., Chen, D.F., Huang, X.L., Yang, P.O., 2018. Evaluation of immunogenicity and protective efficacy of the elongation factor Tu against Streptococcus agalactiae in tilapia. Aquaculture 492 (2018), 184–189.