Bioinformatics applications to aid high-throughput glycan profiling


Accepted Manuscript
Title: Bioinformatic applications to aid high-throughput glycan profiling
Authors: Ian Walsh, Roisin O’Flaherty, Pauline M. Rudd
PII: S2213-0209(16)30250-6
DOI: 10.1016/j.pisc.2016.01.013
Reference: PISC 388
Received date: 28-9-2015
Accepted date: 4-1-2016

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Bioinformatic applications to aid high-throughput glycan profiling
Ian Walsh, Roisin O’Flaherty and Pauline M. Rudd*
NIBRT GlycoScience Group, NIBRT – The National Institute for Bioprocessing, Research and Training, Foster’s Avenue, Mount Merrion, Blackrock, Co. Dublin, Ireland
Email: Pauline M. Rudd - [email protected]
*Corresponding author
Received: 28 September 2015
Accepted: 4 January 2016

ABSTRACT

High-throughput methods to identify and quantify glycans in a given sample are rare. We have optimised a robotic platform for analysing biopharmaceuticals at each stage of the manufacturing process; it can also be applied to basic research. The plate format makes it convenient for large sample sets, and the platform is relatively cheap, robust and quantitative. However, the large datasets it produces require significant time to interpret, so informatics tools are needed to help with annotation. This article briefly describes our robotic platform and concentrates on a set of software tools for the interpretation of quantitative glycoprofiling data.

Keywords glycomics, bioinformatics, high-throughput, databases, software tools, biomarker

Introduction

All cell surfaces are covered with glycoconjugates, and more than fifty percent of human proteins carry glycan moieties. Increasing evidence indicates that carbohydrates have critical roles in major biological events such as immunological recognition, metastasis, cell signaling and cell differentiation, to name just a few (Ohtsubo and Marth, 2006; Calarese et al., 2003). In addition, alterations in glycosylation are common in physiological and pathological processes, enabling the fine tuning of biological pathways. Tailoring glycosylation during the manufacture of biotherapeutics improves the safety and efficacy of drugs such as monoclonal antibodies and erythropoietin. The analysis of glycans is challenging, as is evident from the fact that glycomics lags significantly behind genomics and proteomics. A possible reason for the lag is the rarity of high-throughput analytical methods. One successful attempt to overcome this is the development of a robotic platform to release and label glycans from glycoproteins in a 96/384-well plate format (Royle et al., 2008; Stöckmann et al., submitted). It has been developed as a front end to glycan separation technologies including HILIC/MS/MS and capillary electrophoresis. The platform was initially optimized for analyzing biopharmaceuticals at each stage of the manufacturing process but is also applicable to basic research, in particular the linking of extensive sets of disparate data for systems biology. The plate format makes it convenient for large sample sets; it is relatively cheap, robust and quantitative. Calibration of the robotic platform produces massive amounts of raw data, which need to be annotated and analyzed. Annotation involves integrating the profile peaks and assigning glycan structures to each peak, which is far from a simple process. To give an idea of the complexity, it has been estimated that 10^12 structures are possible in mammalian systems (Laine, 1994), although an estimated ≥7,000 structures exist in nature, with approximately 700 proteins required to generate this diversity (Cummings, 2009). These numbers alone present significant analytical challenges for determining detailed, quantitative glycan structural data in complex organisms.
Bioinformatics applications are therefore essential to speed up structural annotation or, ideally, to automate it completely. At the analysis stage, large collections of high-quality experimental data can be inspected with the goal of detecting important molecular characteristics. For example, because glycans undergo rapid structural changes in response to biological stimuli, they provide a unique opportunity to identify and exploit glycans as clinical markers indicative of specific disease states, disease progression and/or therapy response. Moreover, glycan data can be linked to other data types such as metabolomics and genomics data, pointing towards the inclusion of glycan data in ‘big data’ sets for a better understanding of disease. However, manually analyzing large collections of samples is cumbersome, and bioinformatic tools are therefore a vital component of the data analysis stage.
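Peak integration, the first annotation step mentioned above, turns each chromatographic peak into an area that can be reported as a relative percentage of the total glycan signal. The following is a minimal sketch, assuming the peak boundaries have already been picked and ignoring baseline correction; the function names and the toy trace are hypothetical and are not taken from the NIBRT software.

```python
def trapezoid_area(times, signal, start, end):
    """Integrate a sampled chromatogram between two peak boundaries.

    Uses the trapezoidal rule over the sampling intervals that lie entirely
    inside [start, end]; no baseline correction is applied.
    """
    area = 0.0
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 >= start and t1 <= end:
            area += (t1 - t0) * (signal[i] + signal[i + 1]) / 2
    return area


def relative_peak_areas(times, signal, peak_windows):
    """Return each peak's area as a percentage of the total integrated area."""
    areas = [trapezoid_area(times, signal, s, e) for s, e in peak_windows]
    total = sum(areas)
    if total == 0:
        raise ValueError("no signal inside the given peak windows")
    return [100.0 * a / total for a in areas]


# Toy trace with two triangular peaks (made-up numbers).
times = [0, 1, 2, 3, 4, 5, 6]
signal = [0, 2, 0, 0, 4, 0, 0]
print(relative_peak_areas(times, signal, [(0, 2), (3, 5)]))
```

Here the second peak has twice the area of the first, so it receives two thirds of the total signal.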


In this article we briefly describe our high-throughput platform and pay particular attention to the bioinformatics tools used to annotate and analyze the large quantities of data it produces. Bioinformatics is a core component of our pipeline and greatly speeds up the storage, annotation and analysis of the data. The bioinformatics programs enable less experienced researchers to handle the data, and integrating glycan data with other -omics data is now on the horizon.

Robotic platform in 2015

Our recently published automated methods for antibody glycoprofiling incorporate a fully integrated platform that combines glycoprotein affinity purification, protein denaturation, enzymatic glycan release, fluorescent glycan labeling and sample clean-up (Stöckmann et al., 2013; Stöckmann et al., 2015), as shown in Figure 1. The robotic protocol is programmed on a Hamilton Starlet robot. The software-controlled workstation is equipped with pipette tip racks, plate carriers, reagent reservoirs, a software-controlled vacuum manifold, a plate transport tool, eight robotic pipettes with individual liquid level and pressure sensors, and a temperature-controlled orbital shaker. The fluorescently labeled glycans are run on HPLC/UPLC instruments equipped with hydrophilic interaction chromatography (HILIC) columns, and the resulting peaks are calibrated against a pre-run dextran ladder, which assigns a glucose unit (GU) value to each peak. Because standard glucose units make these values independent of the running conditions, chromatographic profile peaks and their relative glycan abundances can be compared directly. The platform has been used successfully in glycoprofiling studies of diseases such as cancer, galactosemia, rheumatoid arthritis and diabetes (Adamczyk et al., 2012; Coss et al., 2014; Albrecht et al., 2014; Thanabalasingham et al., 2016), and further studies aimed at biomarker discovery are ongoing in these areas.
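The GU assignment described above amounts to a calibration: ladder peaks of known glucose-oligomer size map retention time onto GU, and every sample peak is then read off that map. A minimal sketch, assuming piecewise-linear interpolation between neighbouring ladder peaks; the function name and the ladder retention times are made up for illustration.

```python
from bisect import bisect_right


def assign_gu(peak_rt, ladder):
    """Convert a retention time to a glucose unit (GU) value.

    ladder: (retention_time, gu) pairs from the dextran ladder run, sorted by
    retention time. Interpolates linearly between neighbouring ladder peaks.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= peak_rt <= times[-1]:
        raise ValueError("peak lies outside the calibrated ladder range")
    i = bisect_right(times, peak_rt)
    if i == len(times):  # peak coincides with the last ladder point
        return float(ladder[-1][1])
    (t0, g0), (t1, g1) = ladder[i - 1], ladder[i]
    return g0 + (g1 - g0) * (peak_rt - t0) / (t1 - t0)


# Hypothetical ladder: retention times (min) of the 5- to 8-glucose oligomers.
LADDER = [(10.0, 5), (12.0, 6), (13.5, 7), (14.8, 8)]
print(assign_gu(12.75, LADDER))  # halfway between GU 6 and GU 7 -> 6.5
```

Calibration software may instead fit a smooth curve (for example a polynomial) through the ladder; straight segments are used here only as the simplest stand-in that still returns the exact GU at every ladder peak.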

Assigning glycan structures

Data sources for revealing the glycan structures in a given sample encompass several orthogonal methodologies, such as high- and ultra-high-pressure liquid chromatography (H/UPLC), capillary electrophoresis (CE) and mass spectrometry (MS), all of which have inherent limitations.


Without high-quality bioinformatics tools, these limitations can become a bottleneck in our high-throughput analysis. Figure 2 shows how complicated structure assignment can be for a UPLC profile of human serum IgG. Each peak can contain several structures, and newer technologies such as UPLC provide higher resolution and therefore even more peaks to assign.

Expert manual glycan sequencing

In glycan sequencing, the glycan pool is analyzed before and after sequential digestion with arrays of linkage-specific exoglycosidases (Mariño et al., 2010). Digestion shifts the peaks, and the extent of each shift depends on the nature and number of the monosaccharides removed. The entire pool of glycans can be digested without separating individual peaks, and aliquots of the pool can be digested simultaneously with panels of enzyme arrays. Figure 3a shows an example of a complete exoglycosidase digestion scheme; the enzymes, their roles and the resulting empirical shifts are given in Figure 3b. To confirm the structures under the peaks at GU 9.12 and 8.33, ABS (Arthrobacter ureafaciens sialidase), which releases α2-3-, α2-6- and α2-8-linked non-reducing terminal sialic acids (NeuNAc and NeuNGc), is applied, shifting both peaks to GU 7.63. Further evidence can be obtained by digesting the ABS profile peak at GU 7.63 with fucosidase (bovine kidney α-fucosidase, BKF), galactosidase (bovine testes β-galactosidase, BTG) and hexosaminidase (β-N-(1-2,3,4,6)-acetylglucosaminidase S, GUH) to trim the glycan down to the Man3GlcNAc2 core. Although this manual assignment is of the highest experimental quality, it is often laborious. Three vital tools are used to improve the efficiency of glycan structure assignment in LC profiles (see Figure 1 for their role in the platform):

GlycoBase (https://glycobase.nibrt.ie): our publicly available database of experimentally determined glycan structures originally developed from the EurocarbDB project (Campbell et al., 2008).



GlycoProfileAssigner (https://bitbucket.org/fergaljd/glycoprofileassigner): a software tool, developed in the FP7 project GlycoBioM, that automates the structural assignment of glycan profile data from LC experiments given exoglycosidase digestion profiles (Duffy and Rudd, 2015).

 




GlycoDigest (http://www.glycodigest.org): a tool, also developed in GlycoBioM, that simulates exoglycosidase digestion using rules derived from expert knowledge and the experimental evidence available in GlycoBase (Gotz et al., 2014).
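The peak-shift reasoning used in manual sequencing can be made explicit with a simple additive model in which each released residue lowers the GU by a fixed amount. This is a sketch under loudly stated assumptions: the per-residue decrements below are hypothetical placeholders, not the empirical values of Figure 3b, and real shifts also depend on linkage and position, which this model ignores. The function names are likewise hypothetical.

```python
# HYPOTHETICAL per-residue GU decrements, chosen only for illustration.
# Real values are empirical (Figure 3b) and depend on linkage and position.
HYPOTHETICAL_GU_DECREMENT = {"NeuAc": 0.75, "Gal": 0.60, "Fuc": 0.50, "GlcNAc": 0.50}


def predicted_gu(start_gu, released):
    """Predict the GU of a digestion product.

    released: mapping of residue name -> number of residues removed.
    """
    return start_gu - sum(HYPOTHETICAL_GU_DECREMENT[r] * n for r, n in released.items())


def shift_is_consistent(start_gu, observed_gu, released, tol=0.30):
    """True if the observed peak shift matches the predicted one within tol GU."""
    return abs(predicted_gu(start_gu, released) - observed_gu) <= tol


# Sialidase digestion of the peak at GU 9.12: under these made-up decrements,
# is the product at GU 7.63 explained by losing two sialic acids, or one?
print(shift_is_consistent(9.12, 7.63, {"NeuAc": 2}))
print(shift_is_consistent(9.12, 7.63, {"NeuAc": 1}))
```

The point of the sketch is the logic of the comparison (predicted product GU versus observed peak, within a tolerance), which is the same check an expert performs by eye when reading a digestion array.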

GlycoBase to help with structure assignment

The use of standard glucose units (GU) and the low variation between prepared samples allow direct comparison with GlycoBase. For example, for all major IgG peaks (i.e., those with a relative percentage area above 1%), the coefficients of variation between samples prepared on different days with the automated robotic method are typically below 10%, indicating excellent reproducibility. GlycoBase currently contains 720 N- and O-linked glycan structures with GU values experimentally determined by HPLC, UPLC and CE, together with their standard deviations derived from data collected over ten years. This high-confidence glycan library was generated from many samples over the course of that decade using hydrophilic interaction liquid chromatography combined with fluorescence detection (HILIC-fluorescence), supplemented by exoglycosidase sequencing and mass spectrometric confirmation. Cross-links to other databases such as UniCarb-DB (Hayes et al., 2011) give further information on each structure; links to UniCarbKB (Campbell et al., 2013) are under construction. GlycoBase enables users to search for specific glycans in several ways: by name using regular expressions, or by antennary composition (e.g., A1, A2). Alternatively, searches can be carried out by GU value (± a given threshold) or by a particular glycan feature, for example the presence or absence of sialic acid or core fucose. Users can also carry out a stoichiometric search, for example by the number of hexoses or xyloses. Finally, GlycoBase provides a “summary report” that collates all the available data for a selected glycan (e.g., experimental conditions, source material, monoisotopic mass).
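Two operations from this paragraph, the coefficient of variation used to judge reproducibility and a GU-window search filtered by glycan features, can be sketched in a few lines. The record fields, function names and GU values below are hypothetical; this is not GlycoBase's schema or search API.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class GlycanRecord:
    name: str
    gu: float
    sialylated: bool
    core_fucosylated: bool


def search_by_gu(records, gu, threshold, **features):
    """Records whose GU lies within gu ± threshold and that match every
    requested feature flag (e.g. sialylated=False), closest GU first."""
    hits = []
    for rec in records:
        if abs(rec.gu - gu) > threshold:
            continue
        if all(getattr(rec, key) == value for key, value in features.items()):
            hits.append(rec)
    return sorted(hits, key=lambda r: abs(r.gu - gu))


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Toy records with made-up GU values, not GlycoBase data.
TOY_RECORDS = [
    GlycanRecord("glycan_1", 6.10, sialylated=False, core_fucosylated=True),
    GlycanRecord("glycan_2", 6.25, sialylated=False, core_fucosylated=False),
    GlycanRecord("glycan_3", 7.90, sialylated=True, core_fucosylated=True),
]
print([r.name for r in search_by_gu(TOY_RECORDS, 6.2, 0.2, core_fucosylated=True)])
# Relative areas (%) of one peak in three replicate preparations (made up).
print(round(coefficient_of_variation([12.1, 11.8, 12.4]), 2))
```

A peak passing the reproducibility criterion above would simply need a CV below 10% across its replicate preparations.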

 


GlycoProfileAssigner to assign candidate structures

Using GlycoBase naïvely, by simply extracting entries with similar GU values, gives a number of candidate structures for a peak. This is an effective first approximation of the structures under a given peak, but further automatic tools are required to narrow the candidates down to the correct ones. Exoglycosidase digestion profiles (Figure 3a) are another source of information that can be used for this. GlycoProfileAssigner (Duffy and Rudd, 2015) takes as input numerous digestion profiles of the same source material and automatically identifies the structures from the empirical peak shifts (see Figure 4 for an outline of the basic algorithm). Automatic assignment using GlycoProfileAssigner was shown to be quite accurate for human IgG, with only slight errors (Duffy and Rudd, 2015). To guarantee complete accuracy, GlycoProfileAssigner can be used to reduce the possible structures to a manageable few, from which an expert then picks the correct ones manually. Using GlycoProfileAssigner speeds up structure assignment considerably. Its role in the structural assignment workflow is summarized in Figure 4. The exoglycosidase array digestions are designed to remove individual monosaccharides in turn. Any peaks remaining after the last digestion indicate glycans that have not been exposed to all the enzymes needed to trim them to the GlcNAc2Man3 trimannosyl core; in the example shown in Figure 4, the missing enzyme could be a mannosidase. An important feature of this approach is that the data alert the experimenter to any unexpected glycans, so the experimenter does not have to search for them, as is the case with MS. Peaks that contain more than one glycan can also be detected, because at some point in the digestion series only part of such a peak will be digested.
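The three steps of the GlycoProfileAssigner algorithm outlined in the Figure 4 legend (GU window matching, removal of candidates inconsistent with the digestions, and scoring by shift error) can be sketched as below. This is an illustrative simplification, not the published implementation (Duffy and Rudd, 2015): it assumes one enzyme is added per profile and that each candidate's product GU is already known, and it reuses the ±0.30 GU window from the Figure 4 legend as the tolerance for locating digestion products. Apart from the observed peaks at GU 9.12 and 7.63 taken from the sequencing example, all names and GU values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    gu: float
    cleaved_by: frozenset                           # enzymes that act on this glycan
    product_gu: dict = field(default_factory=dict)  # enzyme -> GU of the product


def assign_peak(peak_gu, applied, next_enzyme, next_profile, library, tol=0.30):
    """Rank candidate structures for one peak of a digestion profile.

    applied: enzymes already used to produce the current profile.
    next_enzyme: enzyme added to produce the next profile.
    next_profile: peak GU values observed in that next profile.
    """
    # Step 1: every library structure within ±tol GU of the peak.
    candidates = [c for c in library if abs(c.gu - peak_gu) <= tol]
    # Step 2a: a glycan cut by an enzyme already applied cannot survive here.
    candidates = [c for c in candidates if not (c.cleaved_by & applied)]
    ranked = []
    for c in candidates:
        # Glycans resistant to the next enzyme should keep the same GU.
        expected = c.product_gu[next_enzyme] if next_enzyme in c.cleaved_by else c.gu
        observed = min(next_profile, key=lambda g: abs(g - expected))
        # Step 2b: drop candidates whose product is missing from the next profile.
        if abs(observed - expected) > tol:
            continue
        # Step 3: error between the expected and observed GU shift.
        error = abs((c.gu - expected) - (peak_gu - observed))
        ranked.append((c.name, round(error, 3)))
    return sorted(ranked, key=lambda item: item[1])


TOY_CANDIDATES = [
    Candidate("candidate_A", 9.10, frozenset({"ABS"}), {"ABS": 7.65}),
    Candidate("candidate_B", 9.30, frozenset({"BTG"}), {"BTG": 8.10}),
    Candidate("candidate_C", 10.50, frozenset({"ABS"}), {"ABS": 9.00}),
]
# Undigested peak at GU 9.12; the ABS-digested profile has peaks at 5.00, 7.63, 8.80.
print(assign_peak(9.12, frozenset(), "ABS", [5.00, 7.63, 8.80], TOY_CANDIDATES))
```

With this toy library only candidate_A survives: candidate_C falls outside the GU window in Step 1, and candidate_B is removed in Step 2 because it resists ABS yet no peak remains near its GU in the ABS profile. The surviving candidate carries a small shift error (about 0.04 GU), which is the kind of score an expert would then review.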

GlycoDigest to simulate digestions

GlycoDigest uses the experimentally determined digestion shifts in GlycoBase to build a knowledgebase, from which rules are constructed to model the action of exoglycosidases. The software is simple to use: it takes a candidate structure in one of several well-established formats and offers a selection of enzyme combinations. The theoretical digestion products are returned with links to UniCarbKB (Campbell et al., 2013) and further structural information. GlycoDigest can be used by the expert chemist to quickly summarize the effect of multiple digestions on a given glycan molecule. Figure 5 shows an improved protocol for assigning structures to LC profiles using GlycoBase, GlycoProfileAssigner and GlycoDigest.
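The rule-based digestion simulation can be illustrated on a glycan stored as a tree of residues. The sketch below is not GlycoDigest's rule set or input format: the linkages listed for ABS and GUH follow the enzyme descriptions in the sequencing section, while those for BTG and BKF are simplified assumptions. Because exoglycosidases act only on non-reducing terminal residues, trimming is repeated until no newly exposed residue can be cleaved. All names are hypothetical.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str                  # monosaccharide, e.g. "GlcNAc", "Man", "Gal"
    linkage: str = ""          # linkage to the parent; "" for the reducing end
    children: list = field(default_factory=list)


# Simplified specificity rules (assumed for illustration, not GlycoDigest's):
# each enzyme removes these (residue, linkage) pairs when they are terminal.
RULES = {
    "ABS": {("NeuAc", "a2-3"), ("NeuAc", "a2-6"), ("NeuAc", "a2-8")},
    "BTG": {("Gal", "b1-3"), ("Gal", "b1-4")},
    "BKF": {("Fuc", "a1-2"), ("Fuc", "a1-3"), ("Fuc", "a1-4"), ("Fuc", "a1-6")},
    "GUH": {("GlcNAc", "b1-2"), ("GlcNAc", "b1-3"), ("GlcNAc", "b1-4"), ("GlcNAc", "b1-6")},
}


def digest(glycan, enzymes):
    """Return a copy of glycan with every enzyme-accessible terminal residue removed."""
    glycan = deepcopy(glycan)          # leave the input structure untouched
    cleavable = set().union(*(RULES[e] for e in enzymes))
    changed = True
    while changed:                     # removing a residue may expose its parent
        changed = False
        stack = [glycan]
        while stack:
            node = stack.pop()
            kept = [c for c in node.children
                    if c.children or (c.name, c.linkage) not in cleavable]
            changed = changed or len(kept) != len(node.children)
            node.children = kept
            stack.extend(kept)
    return glycan


def size(glycan):
    """Number of monosaccharide residues in the tree."""
    return 1 + sum(size(c) for c in glycan.children)


def antenna():
    # NeuAc(a2-6)Gal(b1-4)GlcNAc(b1-2) arm
    return Residue("GlcNAc", "b1-2", [Residue("Gal", "b1-4", [Residue("NeuAc", "a2-6")])])


# Core-fucosylated, disialylated biantennary N-glycan (12 residues).
biantennary = Residue("GlcNAc", "", [
    Residue("Fuc", "a1-6"),
    Residue("GlcNAc", "b1-4", [Residue("Man", "b1-4", [
        Residue("Man", "a1-3", [antenna()]),
        Residue("Man", "a1-6", [antenna()]),
    ])]),
])

print(size(digest(biantennary, ["ABS"])))
print(size(digest(biantennary, ["ABS", "BTG", "BKF", "GUH"])))
```

Applying ABS alone removes only the two sialic acids (10 residues remain), whereas the full ABS, BTG, BKF and GUH panel trims the example down to the five-residue Man3GlcNAc2 core, mirroring the digestion series described in the glycan sequencing section.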

Biomarker detection

At the analysis stage, finding molecular markers that define a physiological or pathological process is of great clinical significance. Our platform has recently found interesting biomarkers in breast cancer (Saldova et al., 2014), lung cancer (Arnold et al., 2011), ovarian cancer (Saldova et al., 2007) and arthritis (Albrecht et al., 2014). In general, biomarkers can be found by measuring changes across large sets of LC profiles. For example, disease biomarkers could be statistically significant differences between sets of disease and control LC profiles. Markers can also be found to monitor response to therapy by comparing LC data at different time points (e.g., during chemotherapy). Moreover, beyond the obvious human health applications, markers of considerable economic value can also be detected, for example for the early detection of pregnancy in animals in the agricultural industry. A recent addition to our bioinformatics pipeline, GlycoMarker, is a tool to easily identify biomarkers in LC profiles. It is a client–server application available to the public over the web (https://glycobase.nibrt.ie/glycomarker). Client–server applications have advantages over locally installed ones: computations are carried out on servers (at NIBRT, Dublin, in GlycoMarker's case) rather than on the user's machine, and nothing needs to be installed locally, which makes the tool portable across many web-enabled devices. The three main components of GlycoMarker are tools for (i) statistical testing of markers, (ii) informative visualization and (iii) modeling with algorithms borrowed from statistics and machine learning. Figure 6 shows some of the functionality available in GlycoMarker on a dataset of 62 breast cancer and 107 control LC profiles (Saldova et al., 2014). Figure 6a shows the significant differences between breast cancer and control peaks, as determined by appropriate statistical tests and their resulting p-values.
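The text does not state which statistical tests GlycoMarker applies. As a swapped-in, generic illustration of per-peak testing, the sketch below uses a two-sided permutation test on the difference in mean relative peak area, followed by Benjamini-Hochberg adjustment because every peak in a profile is tested at once. The function names and data are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_pvalue(cases, controls, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean peak area."""
    rng = random.Random(seed)
    observed = abs(_mean(cases) - _mean(controls))
    pooled = list(cases) + list(controls)
    n = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random relabelling of cases vs controls
        if abs(_mean(pooled[:n]) - _mean(pooled[n:])) >= observed:
            hits += 1
    # Adding one to numerator and denominator avoids reporting p = 0.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):       # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up relative areas (%) of one peak in cases and controls.
cases = [5.0, 5.2, 5.1, 5.3, 5.4, 5.2]
controls = [3.0, 3.1, 2.9, 3.2, 3.0, 3.1]
p = permutation_pvalue(cases, controls)
print(p, benjamini_hochberg([p, 0.04, 0.30]))
```

A permutation test makes no distributional assumption about peak areas, which is one reason it is a reasonable generic choice; a dedicated tool may well use parametric or rank-based tests instead.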
Figure 6b shows some of GlycoMarker's visualization capabilities, such as similarity networks (measuring how similar profiles are across breast cancer and control), principal component analysis (PCA) and clustered PCA loading plots describing groups of features more prevalent in tumor or normal tissue. The clustered loading plots also group structural properties of the glycans, summarizing them in simple plots. Finally, Figure 6c shows the performance of machine learning algorithms modeling the separation between breast cancer and control. The best model (Random Forest) achieved a sensitivity of 48.4% and a specificity of 95.3%. In other words, the model correctly detects 30 of the 62 breast cancer cases (32 false negatives) while rarely misclassifying controls (102 of 107 controls correct, five false positives). All performance figures were measured on held-out data using 10-fold cross-validation.
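The reported sensitivity and specificity follow directly from the confusion counts given above (30/62 and 102/107). The sketch below reproduces that arithmetic and shows one way to split the 62 + 107 profiles into ten cross-validation folds. Stratifying the folds by class is an assumption rather than something the text specifies, the function names are hypothetical, and the model training step (for example a Random Forest fitted on the nine training folds) is omitted.

```python
import random


def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    return true_neg / (true_neg + false_pos)


def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k test folds with similar class ratios.

    Each sample lands in exactly one test fold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[j % k].append(idx)   # deal members round-robin across folds
    return folds


# Counts reported for the Random Forest model: 30 TP, 32 FN, 102 TN, 5 FP.
print(f"sensitivity = {sensitivity(30, 32):.1%}")   # 48.4%
print(f"specificity = {specificity(102, 5):.1%}")   # 95.3%

labels = [1] * 62 + [0] * 107        # 1 = breast cancer, 0 = control
folds = stratified_kfold(labels)
```

Each fold serves once as the test set while the other nine are used for training, so every profile is scored exactly once by a model that never saw it during training.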

Conclusion

The robotic N-glycan release and labeling platform, in combination with UPLC/HPLC and bioinformatics tools, provides the basis for high-throughput glycoprofiling and characterization of biological samples in settings ranging from biomarker discovery to clinical studies and genome-wide association studies. The most important components of any robotic platform remain the optimization of sample preparation and robotic procedures, together with expert manual reasoning. Combining this automated technology with bioinformatics allows far higher throughput than any protocol without bioinformatics support. In total, our platform uses four such computational tools (summarized in Table 1) to annotate and analyze the raw data semi-automatically. The glycan sample preparation platform can be easily adapted and allows glycan labeling with a variety of labels, so that it can be linked to complementary analytical technologies such as mass spectrometry and capillary electrophoresis.

Key points 

High-throughput techniques are needed to increase the pace of glycomics so that it can catch up with proteomics and genomics.



Bioinformatics is a necessary component of any high-throughput technique.



Structural assignment of peaks is difficult to do manually, and automated tools can help speed up the process. Three tools, GlycoBase, GlycoProfileAssigner and GlycoDigest, are useful for achieving this.

 




Biomarker discovery from large quantities of LC profiles is now much easier using GlycoMarker.

Funding

I. W.’s work was supported by funding from the European Union’s Seventh Framework Programme under the GlycoBioM grant agreement FP7-HEALTH-259869. R. O. F. acknowledges EU FP7 program HighGlycan, grant no. 278535, for funding this work.

Acknowledgements

We thank all members of the GlycoScience group at the National Institute for Bioprocessing Research & Training (NIBRT). The authors wish to acknowledge that GlycoDigest was developed in collaboration with the Swiss Institute of Bioinformatics. We thank Dr. Fergal Duffy for developing the GlycoProfileAssigner software used in our bioinformatics pipeline.

References

Adamczyk, B., Tharmalingam, T., Rudd, P. M., 2012. Glycans as cancer biomarkers. Biochim. Biophys. Acta, Gen. Subj. 1820, 1347–1353. doi:10.1016/j.bbagen.2011.12.001

Albrecht, S., Unwin, L., Muniyapp, M., Rudd, P. M., 2014. Glycosylation as a marker for inflammatory arthritis. Cancer Biomark. 14, 17–28. doi:10.3233/CBM-130373

Arnold, J. N., Saldova, R., Galligan, M. C., Murphy, T. B., Mimura-Kimura, Y., Telford, J. E., Godwin, A. K., Rudd, P. M., 2011. Novel glycan biomarkers for the detection of lung cancer. J. Proteome Res. 10, 1755–1764. doi:10.1021/pr101034t

Calarese, D. A., Scanlan, C. N., Zwick, M. B., Deechongkit, S., Mimura, Y., Kunert, R., Zhu, P., Wormald, M. R., Stanfield, R. L., Roux, K. H., Kelly, J. W., Rudd, P. M., Dwek, R. A., Katinger, H., Burton, D. R., Wilson, I. A., 2003. Antibody domain exchange is an immunological solution to carbohydrate cluster recognition. Science 300, 2065–2071. doi:10.1126/science.1083182

Campbell, M. P., Royle, L., Radcliffe, C. M., Dwek, R. A., Rudd, P. M., 2008. GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics 24, 1214–1216. doi:10.1093/bioinformatics/btn090

Campbell, M. P., Peterson, R., Mariethoz, J., Gasteiger, E., Akune, Y., Aoki-Kinoshita, K., Lisacek, F., Packer, N. H., 2013. UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res. 42 (D1), D215–D221. doi:10.1093/nar/gkt1128

Coss, K. P., Hawkes, C. P., Adamczyk, B., Stöckmann, H., Crushell, E., Saldova, R., Knerr, I., Rubio-Gozalbo, M. E., Monavari, A. A., Rudd, P. M., Treacy, E. P., 2014. N-glycan abnormalities in children with galactosemia. J. Proteome Res. 13, 385–394. doi:10.1021/pr4008305

Cummings, R. D., 2009. The repertoire of glycan determinants in the human glycome. Mol. BioSyst. 5, 1087–1104. doi:10.1039/b907931a

Duffy, F. J. and Rudd, P. M., 2015. GlycoProfileAssigner: automated structural assignment with error estimation for glycan LC data. Bioinformatics 31, 2220–2221. doi:10.1093/bioinformatics/btv129

Gotz, L., Abrahams, J. L., Mariethoz, J., Rudd, P. M., Karlsson, N. G., Packer, N. H., Campbell, M. P., Lisacek, F., 2014. GlycoDigest: a tool for the targeted use of exoglycosidase digestions in glycan structure determination. Bioinformatics 30, 3131–3133. doi:10.1093/bioinformatics/btu425

Hayes, C. A., Karlsson, N. G., Struwe, W. B., Lisacek, F., Rudd, P. M., Packer, N. H., Campbell, M. P., 2011. UniCarb-DB: a database resource for glycomic discovery. Bioinformatics 27, 1343–1344. doi:10.1093/bioinformatics/btr137

Laine, R. A., 1994. Invited Commentary: A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 × 10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems. Glycobiology 4, 759–767. doi:10.1093/glycob/4.6.759

Mariño, K., Bones, J., Kattla, J. J., Rudd, P. M., 2010. A systematic approach to protein glycosylation analysis: a path through the maze. Nat. Chem. Biol. 6, 713–723. doi:10.1038/nchembio.437

Ohtsubo, K. and Marth, J. D., 2006. Glycosylation in cellular mechanisms of health and disease. Cell 126, 855–867. doi:10.1016/j.cell.2006.08.019

Royle, L., Campbell, M. P., Radcliffe, C. M., White, D. M., Harvey, D. J., Abrahams, J. L., Kim, Y.-G., Henry, G. W., Shadick, N. A., Weinblatt, M. E., Lee, D. M., Rudd, P. M., Dwek, R. A., 2008. HPLC-based analysis of serum N-glycans on a 96-well plate platform with dedicated database software. Anal. Biochem. 376, 1–12. doi:10.1016/j.ab.2007.12.012

Saldova, R., Royle, L., Radcliffe, C. M., Abd Hamid, U. M., Evans, R., Arnold, J. N., Banks, R. E., Hutson, R., Harvey, D. J., Antrobus, R., Petrescu, S. M., Dwek, R. A., Rudd, P. M., 2007. Ovarian cancer is associated with changes in glycosylation in both acute-phase proteins and IgG. Glycobiology 17, 1344–1356. doi:10.1093/glycob/cwm100

Saldova, R., Shehni, A. A., Haakensen, V. D., Steinfeld, I., Hilliard, M., Kifer, I., Helland, A., Yakhini, Z., Børresen-Dale, A.-L., Rudd, P. M., 2014. Association of N-glycosylation with breast carcinoma and systemic features using high-resolution quantitative UPLC. J. Proteome Res. 13, 2314–2327. doi:10.1021/pr401092y

Stöckmann, H., Adamczyk, B., Hayes, J., Rudd, P. M., 2013. Automated, high-throughput IgG-antibody glycoprofiling platform. Anal. Chem. 85, 8841–8849. doi:10.1021/ac402068r

Stöckmann, H., O’Flaherty, R., Adamczyk, B., Saldova, R., Rudd, P. M., 2015. Automated, High-throughput Serum Glycoprofiling Platform. Integr. Biol. 7, 1026–1032. doi:10.1039/c5ib00130g

Stöckmann, H., Duke, R. M., Rudd, P. M. ULTRA 3: Ultrafiltration-based, ultra-high throughput N-glycomics platform for ultra-performance liquid chromatography. Submitted.

Thanabalasingham, G., Huffman, J., Kattla, J. J., Novokmet, M., Rudan, I., Gloyn, A. L., Hayward, C., Adamczyk, B., Reynolds, R. M., Muzinic, A., Hassanali, N., Pucic, M., Bennett, A. J., Essafi, A., Polasek, O., Mughal, S. A., Redzic, I., Primorac, D., Zgaga, L., Kolcic, I., Hansen, T., Gasperikova, D., Tjora, E., Strachan, M. W. J., Nielsen, T., Stanik, J., Klimes, I., Pedersen, O. B., Njølstad, P. R., Wild, S. H., Gyllensten, U., Gornik, O., Wilson, J. E., Hastie, N. D., Campbell, H., McCarthy, M. I., Rudd, P. M., Owen, K. R., Lauc, G., Wright, A. F., 2016. Mutations in HNF1A result in marked alterations of plasma glycan profile. Diabetes 62, 1329–1337. doi:10.2337/db12-0880

 


Figure Legends

Figure 1: Glycomics workflow. Sera/plasma, individual glycoproteins, mixtures or cell culture supernatant samples are processed on a robotic workstation, resulting in fluorescently labeled glycans, which are subsequently separated into pools and quantified by ultra-high pressure liquid chromatography (UPLC). Bottom right: structures are predicted using GlycoBase 3.2 and confirmed by enzyme digestion using bioinformatics software (Duffy and Rudd, 2015). Top right: the high-quality data produced by the platform can be analyzed using the new GlycoMarker bioinformatics software to find glycan biomarkers.

Figure 2: Human IgG N-glycome and peak assignments. IgG from native human serum was isolated and processed on the liquid-handling workstation, followed by glycan analysis by UPLC with fluorescence detection. GU: glucose units. Reprinted with permission from reference (Stöckmann et al., 2013). Copyright 2013 American Chemical Society.

Figure 3: Enzyme array digestions for glycan structure elucidation. (a) Exoglycosidase array digestions analyzed by UPLC with fluorescence detection. (b) Left: description of the glycosidase enzymes used in (a). Right: empirically derived influence of monosaccharide types and linkages on glycan GU.

Figure 4: The GlycoProfileAssigner algorithm. Profiles are listed from least digested (top, undigested) to most digested (bottom, ABS, BTG, BKF, GUH). An example is provided for each step. Step 1: all structures in GlycoBase with a GU value matching the peak GU ±0.30 are assigned. Step 2: glycans that would not survive the applied glycosidases, and any glycans that cannot digest to the next profile, are removed. Step 3: all remaining structures are given an assignment error value calculated from the difference between the expected and observed GU shifts. Steps 1 and 3 are simple operations, while Step 2 is the most complicated.

Figure 5: An improved protocol for assigning glycan structures to LC peaks. GlycoProfileAssigner and GlycoDigest were developed in the FP7 project GlycoBioM.

Figure 6: Some of the main functionality in GlycoMarker. (a) The statistically significant peaks in the LC profile. (b) Graphical representation of the separation between breast cancer and control LC profiles. (c) The modeling performance of the separation for some machine learning algorithms available in GlycoMarker.

 

Fig 1

Fig 2

Fig 3

Fig 4

Fig 5

Fig 6

Table 1: List of the bioinformatics tools used in the high-throughput platform, their availability, type of software and brief description of their role in the platform.

Name | Available web link | Software type | Role in platform
GlycoBase (Campbell et al., 2008) | https://glycobase.nibrt.ie | Web accessible database | Storage and glycan structural assignment
GlycoProfileAssigner (Duffy and Rudd, 2015) | https://bitbucket.org/fergaljd/glycoprofileassigner | Locally installable graphical user interface | Glycan structural confirmation
GlycoDigest (Gotz et al., 2014) | http://www.glycodigest.org | Locally installable graphical user interface; web application/server | Glycan structural confirmation
GlycoMarker | https://glycobase.nibrt.ie/glycomarker | Web application/server | Biomarker discovery