Advancing the large-scale CCS database for metabolomics and lipidomics at the machine-learning era


Available online at www.sciencedirect.com

Zhiwei Zhou1,2,3, Jia Tu1,2,3 and Zheng-Jiang Zhu1

Metabolomics and lipidomics aim to comprehensively measure the dynamic changes of all metabolites and lipids that are present in biological systems. The use of ion mobility–mass spectrometry (IM–MS) for metabolomics and lipidomics has facilitated the separation and identification of metabolites and lipids in complex biological samples. The collision cross-section (CCS) value derived from IM–MS is a valuable physicochemical property for the unambiguous identification of metabolites and lipids. However, the CCS values available from experimental measurement and computational modeling are limited, which significantly restricts the application of IM–MS. In this review, we discuss the recently developed machine-learning based prediction approach, which can efficiently generate precise CCS databases on a large scale. We also highlight the applications of CCS databases to support metabolomics and lipidomics.

Addresses
1 Interdisciplinary Research Center on Biology and Chemistry, and Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, PR China
2 University of Chinese Academy of Sciences, Beijing 100049, PR China
3 These authors contributed equally.

Corresponding author: Zhu, Zheng-Jiang ([email protected])

Current Opinion in Chemical Biology 2017, 42:34–41

This review comes from a themed issue on Omics

Edited by Erin Baker and Perdita Barran

https://doi.org/10.1016/j.cbpa.2017.10.033

1367-5931/© 2017 Elsevier Ltd. All rights reserved.

Introduction

Metabolomics and lipidomics aim to comprehensively profile the dynamic changes of endogenous small molecules (i.e., metabolites and lipids) in biological systems, and both have been widely applied to discover diagnostic biomarkers and understand disease pathogenesis [1–3]. The metabolome and lipidome feature high chemical and structural diversity, and therefore require a powerful platform for holistic analysis [4]. Recently, ion mobility–mass spectrometry (IM–MS) has been introduced to analyze complex biological samples in metabolomics and lipidomics applications [5,6,7]. Ion mobility technology utilizes the collisions between ions and an inert buffer gas under an electric field to rapidly separate ions based on their structural and conformational differences [5,8–10]. It provides an orthogonal separation that effectively reduces chemical noise [11], improves the signal-to-noise ratio [11,12] and enhances peak capacity [13,14]. It also makes it possible to distinguish the isomeric metabolites and lipids that commonly exist in biological samples [15,16,17,18]. Drift times derived from IM separation can be further converted to collision cross-section (CCS) values, which are a valuable physicochemical property for metabolite/lipid identification [19]. Unlike retention times and MS/MS spectra, which are affected by many experimental factors, CCS values are highly reproducible across instruments and laboratories, and easy to standardize. The use of CCS values significantly increases the confidence of metabolite/lipid identification [20,21,22,23]. Therefore, the development of CCS databases is indispensable to support the application of IM–MS to metabolomics and lipidomics.

The two most common methods to obtain CCS values are the experimental measurement of chemical standards and computational modeling (Figure 1a,b). Recently, our group and others have developed a third method, namely, machine-learning based prediction, to accurately and efficiently generate CCS values on a large scale (Figure 1c). In this review, we briefly compare the generation of CCS values for metabolites and lipids using the three strategies, with a focus on the machine-learning based approach, and discuss the applications of CCS values to support metabolomics and lipidomics.

Experimental measurement

In past decades, experimental measurement of CCS values from chemical standards has been the major approach to building CCS databases (Figure 1a). In a very recent report, this strategy generated CCS values with a relative standard deviation (RSD) of 0.29% across four laboratories [29]. In Table 1, we summarize the major resources of experimental CCS values for metabolites and lipids reported from 2014 to 2017. In total, about 5000 CCS values covering 3880 metabolites (with duplications) were reported, and the number of available CCS values is increasing rapidly every year. However, the coverage is still very limited compared to the total number of metabolites and lipids because of the limited availability of chemical standards. Among these CCS


Figure 1

[Figure not reproduced. Panel labels: (a) Experimental measurement: standards, gas, IM-MS measurement, drift time, mathematical transformation, experimental CCS; high precision, instrument dependent, limited by standard availability. (b) Computational modeling: 3D structure, geometry optimization, theoretical calculation, theoretical CCS; computationally intensive, prone to large error, conformational elucidation. (c) Machine-learning based prediction: 2D structure, selection of MD, prediction model, predicted CCS; computationally efficient, large scale, low error. (d) Development of machine-learning based prediction: training data set (MDs and experimental CCS for metabolites 1 to n), data training, prediction algorithm, prediction model, validation data set, external validation, predicted CCS, error evaluation.]

Three major strategies for the generation of CCS databases: (a) experimental measurement; (b) computational modeling; and (c) machine-learning based prediction; (d) the workflow to develop a machine-learning based prediction method. The acronym ‘MD’ represents ‘molecular descriptor’, and ‘Met.’ represents ‘metabolite’.

values, most experimental CCS values were measured in nitrogen gas using commercial IM–MS instruments such as drift tube ion mobility spectrometry (DTIMS) and traveling wave ion mobility spectrometry (TWIMS). Across the different ion mobility techniques, the ways to measure CCS values are generally divided into calibrant-independent and calibrant-dependent approaches. The stepped-field method in DTIM-MS is the only calibrant-independent approach, and allows the CCS value of an analyte to be measured directly according to the Mason–Schamp equation [24,29]. IM experiments at multiple drift voltages (at least 5 different voltages) are performed to build a linear regression between the measured arrival times and the inverse drift voltages (1/V). The CCS value is then calculated from the slope of the fitted line [30]. The uncertainty of the CCS values arises from the accuracy with which the drift tube length, experimental temperature and drift voltage are measured. Recently, a stringent inter-laboratory evaluation of those factors in the stepped-field method has achieved the most reproducible measurement of CCS values, with an


Table 1

Major available CCS values (N > 50) for metabolites and lipids from 2014 to 2017.

Reference | Type (a) | Number of CCS values | Coverage (N = compound number)

Experimental measurement
May et al., 2014 [24] | DT CCSN2 | 314 | Glycerophospholipids, sphingolipids (N = 294)
Paglia et al., 2014 [20] | TW CCSN2 | 209 | Amino acids and derivatives, carboxylic acids, nucleobases, phosphorylated compounds, sugars, nucleosides, nucleotides (N = 125)
Paglia et al., 2015 [21] | TW CCSN2 | 244 | Glycerophospholipids, sphingolipids, sterols, fatty acids (N = 244)
Groessl et al., 2015 [15] | DT CCSN2 | 135 | Glycerophospholipids, sphingolipids (N = 112)
Zhang et al., 2015 [25] | TW CCSN2 | 87 | Phosphorylated compounds, glycerophospholipids, nucleosides, nucleotides, nucleobases (N = 76)
Zhou et al., 2016 [22] | DT CCSN2 | 953 | Amino acids and derivatives, nucleobases, nucleosides, nucleotides, phosphorylated compounds, sugars, vitamins, polyols, organic acids, among others (N = 617)
Hines et al., 2017 [26] | TW CCSN2 | 258 | Glycerophospholipids, sphingolipids, glycerolipids (N = 148)
Hines et al., 2017 [27] | TW CCSN2 | 1440 | Drugs, natural products, metabolites and others (N = 1425)
Zhou et al., 2017 [23] | DT CCSN2 | 572 | Glycerophospholipids, sphingolipids, glycerolipids (N = 380)
Zheng et al., 2017 [28] | DT CCSN2 | 826 | Metabolites and xenobiotics (N = 459)

Computational modeling
Paglia et al., 2014 [20] | Theo CCSN2 | 205 | Amino acids and derivatives, carboxylic acids, nucleobases, phosphorylated compounds, sugars, nucleosides, nucleotides (N = 125)

Machine-learning based prediction
Zhou et al., 2016 [22] | Pred CCSN2 | 176 015 | Metabolites from HMDB (N = 35 203)
Zhou et al., 2017 [23] | Pred CCSN2 | 63 434 | Phospholipids, sphingolipids and glycerolipids from LipidMaps (N = 15 646)

(a) Type is written as T CCSX: X denotes the drift gas used for the generation of CCS values; T denotes the ion mobility instrument or approach for the generation of CCS values; TW: TWIM-MS; DT: DTIM-MS; Theo: theoretical calculation; Pred: prediction.

unprecedented precision of 0.29% (RSD) [29]. Therefore, the stepped-field method in DTIM-MS is recognized as the ‘gold standard’ for measuring CCS values. However, the stepped-field method is not compatible with chromatographic separation. To address this challenge, Kurulugama et al. [31] recently developed a single-field method in DTIMS that measures CCS values using one drift voltage. In this method, calibrants with known CCS values (i.e., Agilent Tune mixture solution) are measured first to establish a linear calibration curve between their arrival times and their CCS values. Two coefficients, the mobility-independent flight time (tfix) and the instrument-dependent proportionality coefficient (β), are obtained from the calibration curve. Then, the experimental CCS values of analyte ions are readily calculated from their measured arrival times. It has recently been reported that the single-field method gives an average error of 0.54% compared to the stepped-field method [29]. Since only one voltage is required, the single-field method is compatible with LC–IM–MS analysis and is widely used for metabolomics and lipidomics.

A similar calibrant-dependent method was also developed for TWIMS to measure experimental CCS values. In TWIMS, a calibration curve is first generated to establish the nonlinear relationship between the CCS values of the calibrants and their measured arrival times. Then, the CCS values of analyte ions are determined using the calibrant-derived calibration curve and the measured arrival times [7,32]. Currently, the commonly used calibrants are polyalanine peptides (PolyAla) [33] and Agilent tune mix solution, owing to their long-term stability and broad coverage of mass-to-charge ratios (m/z) and CCS values. However, it is important to note that the accuracy of CCS values from TWIMS depends significantly on the structural similarity between the analyte and calibrant ions. Hines et al. [34] recently reported that structurally mismatched calibrants lead to larger errors in the measurement of lipid CCS values. For example, lipid CCS values obtained with tryptic peptide calibrants are systematically larger than those obtained from DTIMS, with an average relative error of 6.4%, but the error falls to <2% with lipid calibrants [34]. No universal and unbiased calibrant system is available yet. Therefore, for TWIMS, the choice of suitable calibrants is important for the accuracy of experimental CCS values.
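As a rough illustration of the two drift-tube procedures described in this section, the following Python sketch derives a CCS value from a stepped-field experiment and from a single-field calibration. It assumes the standard form of the Mason–Schamp equation in SI units and, for the single-field case, the linear relation t_A = t_fix + β·γ·CCS with γ = sqrt(m_ion/(m_ion + m_gas))/z. The text only states that the calibration is linear with the coefficients tfix and β, so this functional form, all function names and the example numbers are assumptions for illustration, not the implementation used in the cited studies.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # kg per dalton


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def stepped_field_ccs(voltages, arrival_times, ion_mass, charge, gas_mass,
                      tube_length, pressure, temperature):
    """CCS (square angstroms) from arrival times at several drift voltages.

    SI units: volts, seconds, metres, pascals, kelvin; masses in daltons.
    """
    # t_A = t_0 + (L^2 / K) * (1 / V), so the slope of t_A against 1/V is L^2 / K.
    slope, t0 = linear_fit([1.0 / v for v in voltages], arrival_times)
    mobility = tube_length ** 2 / slope               # K, m^2 V^-1 s^-1
    number_density = pressure / (K_B * temperature)   # gas molecules per m^3
    reduced_mass = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON
    # Mason-Schamp equation solved for the collision cross-section.
    ccs_m2 = (3.0 * charge * E_CHARGE / (16.0 * number_density)
              * math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temperature))
              / mobility)
    return ccs_m2 * 1e20, t0


def gamma(ion_mass, charge, gas_mass):
    """Mass and charge term of the assumed single-field relation."""
    return math.sqrt(ion_mass / (ion_mass + gas_mass)) / charge


def fit_single_field(calibrants, gas_mass):
    """Fit t_fix and beta from calibrants given as (mass, charge, ccs, arrival_time)."""
    x = [gamma(m, z, gas_mass) * ccs for m, z, ccs, _ in calibrants]
    y = [t for _, _, _, t in calibrants]
    beta, t_fix = linear_fit(x, y)
    return t_fix, beta


def single_field_ccs(arrival_time, ion_mass, charge, gas_mass, t_fix, beta):
    """Invert the calibration line for an analyte measured at one drift voltage."""
    return (arrival_time - t_fix) / (beta * gamma(ion_mass, charge, gas_mass))


if __name__ == "__main__":
    # Hypothetical stepped-field data: six voltages and synthetic arrival times.
    volts = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0, 1500.0]
    times = [0.002 + 28.5 / v for v in volts]
    ccs, t0 = stepped_field_ccs(volts, times, ion_mass=622.0, charge=1,
                                gas_mass=28.0134, tube_length=0.7825,
                                pressure=526.6, temperature=300.0)
    print(f"stepped-field CCS = {ccs:.1f} A^2, t0 = {t0 * 1e3:.2f} ms")
```

The stepped-field function needs no calibrant, only the instrument geometry and gas conditions, whereas the single-field functions trade that independence for a measurement at one voltage, which is what makes the approach compatible with LC–IM–MS.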

Computational modeling

Computational modeling is an alternative approach to obtain theoretical CCS values (Figure 1b) [5,8,35]. First, the molecular ion is converted to a three-dimensional structure, and all possible conformers are generated. Several computational conditions need to be considered, such as protonation/deprotonation sites and bond lengths [5]. Then, geometry optimization is performed to obtain the conformer with the minimized energy using various theoretical approaches (see Ref. [5] for more details), such as molecular dynamics, density functional theory (DFT), and molecular mechanics. Finally, the theoretical CCS values of the target molecule are calculated with algorithms such as the projection approximation (PA) [36], the trajectory method (TM) [37], and exact hard sphere scattering (EHSS) [38]. Several programs are available to support the theoretical calculation, such as MOBCAL and Sigma. After calculation, the theoretical CCS values can be compared with experimental CCS values to help elucidate the conformation of small molecules. For example, Chouinard et al. [39] used computational modeling to determine the conformational differences between 25-hydroxyvitamin D3 epimers. However, the calculation of theoretical CCS values is computationally intensive and prone to large errors, especially for molecules with flexible structures (e.g., lipids) [40].

Machine-learning based prediction

Machine-learning based prediction has emerged as the third way to generate CCS values for small molecules (Figure 1c), such as metabolites [22], phenolics [41], pesticides [42], and lipids [23,43]. It uses a machine-learning algorithm to predict CCS values from the molecular descriptors (MDs) of small molecules. Compared to computational modeling, this method is computationally efficient, and each prediction can be completed within seconds. The prediction error is as low as 1–3% (median relative error) with interquartile ranges (IQR) of 1–3%, which is significantly lower than that of computational modeling. Therefore, this method is suitable for generating large-scale CCS databases with low bias to support IM–MS based metabolomics and lipidomics.

Generally, four components are required to develop a machine-learning based prediction method (Figure 1d): a training data set, a set of molecular descriptors, a prediction algorithm, and a validation data set. First, using the training data set, the machine-learning algorithm learns the ‘rule’ relating molecular descriptors to their experimental CCS values. This rule-learning process is also called data training. Through data training, a prediction model (i.e., the ‘rule’) is established, which makes it possible to rapidly predict CCS values from input structures (Figure 1c). Finally, the external validation data set is used to validate the model and evaluate the prediction error.

Training data set. The training data set is a set of compounds with known experimental CCS values. During data training, the characteristics of the training data set are encoded into the prediction model and passed on to the predicted CCS values. Important characteristics include experimental parameters such as ionization polarity, drift gas, and calibration method, and the structural coverage of the compounds. To increase the validity for different metabolites, a large training data set is required to represent diverse physicochemical properties. For example, Zhou et al. [22] used the experimental CCS values from 400 metabolites for data training to build the prediction model.

Molecular descriptors. Molecular descriptors are a series of numeric values that describe the physicochemical properties of a chemical structure [44]. Hundreds to thousands of MDs can be readily calculated from the input chemical structures using various cheminformatics programs, such as Dragon [45], CDK [46] and ChemAxon. The wide availability of molecular descriptors provides a comprehensive characterization of the structure, but makes it challenging to select the valuable ones. Our recent work demonstrated that including redundant MDs in the prediction model makes it prone to over-fitting and reduces its predictive power. For example, Zhou et al. [23] observed that the correlation coefficient (R²) of predicted CCS values increased to 0.9941 using the 45 optimized MDs, compared to 0.1322 using all 221 calculated MDs. Similarly, Soper-Hopper et al. [43] reported improved prediction capability after reducing the number of MDs from 3827 to 68. The selection of molecular descriptors is primarily based on bioinformatics approaches such as stepwise selection and genetic algorithm selection.

Prediction algorithm. The prediction algorithm establishes the correlation between the molecular descriptors and the CCS values. Several machine-learning algorithms have been reported for CCS value prediction, including support vector regression (SVR) [22,23] and artificial neural networks (ANN) [42]. The SVR algorithm uses a kernel function to map molecular descriptors into a high-dimensional feature space, and performs the high-dimensional regression between molecular descriptors and CCS values in a hyperplane [22,23]. Unlike SVR, the ANN algorithm constructs a network of artificial neuron connections between the input nodes (molecular descriptors) and the output node (CCS value) through threshold activation functions in several hidden layers [42].
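The error figures quoted in this section (median relative error and its interquartile range) can be computed for any set of compounds that has both experimental and predicted CCS values. The sketch below is a minimal, standard-library illustration of that calculation; the function names and the numbers in the example are hypothetical and are not taken from the cited studies.

```python
import statistics


def relative_errors(experimental, predicted):
    """Absolute relative error (in percent) for each compound."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    return [abs(p - e) / e * 100.0 for e, p in zip(experimental, predicted)]


def summarize_errors(experimental, predicted):
    """Return the median relative error and its interquartile range (IQR)."""
    errors = relative_errors(experimental, predicted)
    # quantiles(n=4) returns the three quartile cut points of the errors.
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {"median_re": statistics.median(errors), "iqr": q3 - q1}


if __name__ == "__main__":
    # Hypothetical validation set: experimental vs. predicted CCS (square angstroms).
    exp_ccs = [150.0, 200.0, 250.0, 300.0]
    pred_ccs = [153.0, 198.0, 255.0, 297.0]
    print(summarize_errors(exp_ccs, pred_ccs))
```

Reporting the median rather than the mean keeps a few badly predicted compounds from dominating the summary, and the IQR indicates how widely the errors are spread around that median.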
In order to obtain a prediction model with the best predictive capability, the parameters of the prediction algorithm need to be optimized iteratively. Other multivariate regression algorithms such as partial least squares regression (PLS) [41,43] are also used to build prediction models. However, no systematic comparison of the predictive capability of the different algorithms has been reported.

External validation and prediction error evaluation. The prediction method must be externally validated using an independent set of compounds that are not included in the training data set, in order to determine the prediction error. Ideally, the independent set of experimental CCS values should be measured on different instruments and in different laboratories. For example, Zhou et al. [22] used two independent CCS value sets with 78 and 95 metabolites to validate the prediction, which were measured either in their group (intra-lab validation) or in other laboratories (inter-lab validation), respectively. In external validation, machine-learning based prediction methods generally have median relative errors of 1–3% [22,23,42].

Finally, to help users predict CCS values of interest, our group has recently developed two web servers with user-friendly interfaces, namely, MetCCS Predictor [47] and LipidCCS Predictor [23], for the prediction of CCS values for metabolites and lipids, respectively. In addition, we found that MetCCS Predictor is also applicable to other chemicals such as natural products, drugs, and pesticides. This may have a broad impact in other fields, but predictions for such compounds should be used with care until they have been validated.

The large-scale CCS database to support metabolomics

The use of CCS databases in metabolomics and lipidomics facilitates the identification of metabolites and lipids of interest, especially when standard MS/MS spectra are not available (Figure 2a). For example, Paglia et al. [21] used an experimental CCS database (244 CCS values for lipids) to increase identification confidence by reducing false positive identifications in shotgun lipidomics. Similar results were demonstrated by Schmitz and co-workers [48] for contaminant screening in wastewater. However, the wide application of IM–MS to metabolomics is limited by the number of available CCS values. Machine-learning based prediction delivers the full potential of IM–MS for metabolite identification through the generation of large-scale CCS databases [22,23]. With a large-scale predicted CCS database, one can readily match the m/z and CCS values in untargeted metabolomics for fast metabolite identification. For example, Zhou et al. demonstrated that the use of predicted CCS values removes 50% of false positive identifications in untargeted metabolomics [22], and that the identification confidence can be further improved with lower prediction error [23]. For untargeted metabolomics, this strategy is particularly important for the identification of low-abundance metabolites when experimental MS/MS spectra are not acquired.

Figure 2

[Figure not reproduced. Panel labels: (a) Metabolic feature; metabolite identification by m/z match and by m/z + CCS match (m/z ± 1% CCS); reduce false positive; plot of feature number against m/z tolerance (ppm). (b) Four-dimensional data (LC, IM, MS, MS/MS, giving RT, DT, m/z, MS/MS); metabolite identification; identification confidence increasing from m/z, to m/z + CCS, to m/z + CCS + RT, to m/z + CCS + RT + MS/MS.]

The use of CCS values to support metabolomics and lipidomics. (a) The CCS value is an important criterion to identify metabolites and reduce false positives; (b) the integration of m/z value, CCS value, retention time, and MS/MS spectra improves the confidence of metabolite identification.
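The m/z plus CCS matching summarized in Figure 2a can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tolerance defaults (10 ppm for m/z, 1% for CCS), the record layout and the database entries are hypothetical placeholders, not values from a published CCS database.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    mz: float    # reference m/z of the candidate ion
    ccs: float   # reference (experimental or predicted) CCS, square angstroms


def within_ppm(measured, reference, tol_ppm):
    """True if the measured m/z lies within tol_ppm of the reference."""
    return abs(measured - reference) / reference * 1e6 <= tol_ppm


def within_percent(measured, reference, tol_pct):
    """True if the measured CCS lies within tol_pct percent of the reference."""
    return abs(measured - reference) / reference * 100.0 <= tol_pct


def match_feature(mz, ccs, database, mz_tol_ppm=10.0, ccs_tol_pct=1.0):
    """Return (m/z-only matches, m/z + CCS matches) for one measured feature."""
    mz_hits = [c for c in database if within_ppm(mz, c.mz, mz_tol_ppm)]
    both_hits = [c for c in mz_hits if within_percent(ccs, c.ccs, ccs_tol_pct)]
    return mz_hits, both_hits


if __name__ == "__main__":
    # Hypothetical candidates: two share one m/z but differ in CCS.
    db = [
        Candidate("candidate_A", 180.0634, 135.0),
        Candidate("candidate_B", 180.0634, 141.0),
        Candidate("candidate_C", 182.0790, 136.0),
    ]
    mz_hits, both_hits = match_feature(180.0640, 135.6, db)
    print([c.name for c in mz_hits], [c.name for c in both_hits])
```

In this toy case the m/z filter alone leaves two isobaric candidates, and the additional CCS filter removes one of them, which mirrors the reduction of false positives described in the text.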


Moreover, CCS values can be further combined with MS/MS spectra and retention time to improve the confidence of identification (Figure 2b). For example, Paglia et al. [20] combined CCS values with retention time and m/z values for metabolite identification, and compared the results with the traditional LC–MS approach. Regueiro et al. [49] used the m/z, CCS value, retention time, and MS/MS spectra to decrease false positive identifications in pesticide screening. This strategy is also important for the identification of co-eluting metabolites whose acquired MS/MS spectra are contaminated. In this case, metabolite identification using both MS/MS spectra and CCS values provides higher confidence. Similarly, in data independent acquisition (DIA) based metabolomics, MS/MS spectra are reconstructed through deconvolution [50] or targeted extraction [51] from the multiplexed MS/MS spectra. Metabolite identification using the MS/MS spectra from DIA-MS is prone to errors, while the addition of a CCS match improves the confidence. Finally, we think the integration of m/z, CCS value, retention time, and MS/MS spectra will be particularly useful for lipidomics, since predicted MS/MS databases and predicted retention time databases are also readily available.

The CCS database also has other interesting applications in metabolomics and lipidomics. For example, with the predicted CCS database, a two-dimensional (2D) plot is readily obtained to describe the correlation between the structure (as indicated by CCS values) and m/z values, also called a trend line. The 2D plot can then be used to classify unknown metabolites and lipids according to the location of their CCS values relative to the trend lines [52]. The CCS database also helps to develop targeted metabolomics assays with high selectivity. For example, comparing the CCS values of different adducts and ionization modes facilitates the selection of the best conditions (e.g., adducts, ionization polarities) to distinguish and separate isomers [53,54]. Moreover, the combination of diagnostic product ions and their CCS values assists lipid identification, which has been validated for the identification of neutral lipids [55] and phospholipids [56]. This implies that the development of a CCS database specific to fragment ions may help structural elucidation. In summary, CCS databases facilitate the identification of metabolites and lipids, and support the application of IM–MS to metabolomics and lipidomics.

Conclusion

In summary, we discussed three strategies for the generation of CCS values for metabolites and lipids: experimental measurement, computational modeling, and machine-learning based prediction. Machine-learning based prediction is computationally efficient, with a prediction error as low as 1–3% (median relative error). Currently, it is the only method suitable for generating large-scale CCS databases with low bias. Two web servers, namely, MetCCS and LipidCCS, have also been developed to help users predict CCS values of interest. With the large-scale predicted CCS databases, one can readily match the m/z and CCS values in metabolomics and lipidomics for identification. This strategy is particularly important for the identification of low-abundance metabolites/lipids when experimental MS/MS spectra are not acquired or no MS/MS spectra are available. The combination of the CCS value with other properties (e.g., MS/MS, RT) further improves the confidence of identification and supports the application of IM–MS to metabolomics and lipidomics. Therefore, the development of large-scale CCS databases in the machine-learning era will realize the full potential of IM–MS technology for metabolomics and lipidomics.

Acknowledgements

ZJZ received financial support from the National Natural Science Foundation of China (Grant No. 21575151) and the Thousand Youth Talents Program from the Government of China.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Johnson CH, Ivanisevic J, Siuzdak G: Metabolomics: beyond biomarkers and towards mechanisms. Nat Rev Mol Cell Biol 2016, 17:451-459.
2. Wishart DS: Emerging applications of metabolomics in drug discovery and precision medicine. Nat Rev Drug Discov 2016, 15:473-484.
3. Han X: Lipidomics for studying metabolism. Nat Rev Endocrinol 2016, 12:668-679.
4. Cajka T, Fiehn O: Toward merging untargeted and targeted methods in mass spectrometry-based metabolomics and lipidomics. Anal Chem 2016, 88:524-545.
5. Lapthorn C, Pullen F, Chowdhry BZ: Ion mobility spectrometry–mass spectrometry (IMS-MS) of small molecules: separating and assigning structures to ions. Mass Spectrom Rev 2013, 32:43-71.
6. May JC, Gant-Branum RL, McLean JA: Targeting the untargeted in molecular phenomics with structurally-selective ion mobility–mass spectrometry. Curr Opin Biotechnol 2016, 39:192-197.
7. • Paglia G, Astarita G: Metabolomics and lipidomics using traveling-wave ion mobility mass spectrometry. Nat Protoc 2017, 12:797-813. This work reports a detailed protocol to instruct the use of IM–MS for untargeted metabolomics and lipidomics, including sample preparation, instrument setup, data acquisition and analysis, and metabolite and lipid identification using the CCS database.
8. Lanucara F, Holman SW, Gray CJ, Eyers CE: The power of ion mobility–mass spectrometry for structural characterization and the study of conformational dynamics. Nat Chem 2014, 6:281-294.
9. May JC, McLean JA: Ion mobility–mass spectrometry: time-dispersive instrumentation. Anal Chem 2015, 87:1422-1436.

10. Zheng X, Wojcik R, Zhang X, Ibrahim YM, Burnum-Johnson KE, Orton DJ, Monroe ME, Moore RJ, Smith RD, Baker ES: Coupling front-end separations, ion mobility spectrometry, and mass spectrometry for enhanced multidimensional biological and environmental analyses. Annu Rev Anal Chem 2017, 10:71-92. 11. Zhang X, Kew K, Reisdorph R, Sartain M, Powell R, Armstrong M,  Quinn K, Cruickshank-Quinn C, Walmsley S, Bokatzian S et al.: Current Opinion in Chemical Biology 2018, 42:34–41

40 Omics

Performance of a high-pressure liquid chromatography–ion mobility–mass spectrometry system for metabolic profiling. Anal Chem 2017, 89:6384-6391. This work systematically evaluates the performance of LC–IM–MS for metabolomics, including the limit of detection, linear dynamic range, and resolving power. The authors also indicated the potential of LC–IM–MS to improve the throughput of metabolomics analysis. 12. Baker PR, Armando AM, Campbell JL, Quehenberger O, Dennis EA: Three-dimensional enhanced lipidomics analysis combining UPLC, differential ion mobility spectrometry, and mass spectrometric separation strategies. J Lipid Res 2014, 55:2432-2442. 13. Stephan S, Jakob C, Hippler J, Schmitz OJ: A novel fourdimensional analytical approach for analysis of complex samples. Anal Bioanal Chem 2016, 408:3751-3759. 14. Rainville PD, Wilson ID, Nicholson JK, Issacs G, Mullin L, Langridge JI, Plumb RS: Ion mobility spectrometry combined with ultra performance liquid chromatography/mass spectrometry for metabolic phenotyping of urine: effects of column length, gradient duration and ion mobility spectrometry on metabolite detection. Anal Chim Acta 2017, 982:1-8. 15. Groessl M, Graf S, Knochenmuss R: High resolution ion mobility–mass spectrometry for separation and identification of isomeric lipids. Analyst 2015, 140:6904-6911. 16. Kyle JE, Zhang X, Weitz KK, Monroe ME, Ibrahim YM, Moore RJ,  Cha J, Sun X, Lovelace ES, Wagoner J et al.: Uncovering biologically significant lipid isomers with liquid chromatography, ion mobility spectrometry and mass spectrometry. Analyst 2016, 141:1649-1659. The authors demonstrated the use of LC–IM–MS to analyze and distinguish biologically significant lipid isomers. 17. Gaye MM, Nagy G, Clemmer DE, Pohl NL: Multidimensional analysis of 16 glucose isomers by ion mobility spectrometry. Anal Chem 2016, 88:2335-2344. 18. 
Dodds JN, May JC, McLean JA: Investigation of the complete suite of the leucine and isoleucine isomers: toward prediction of ion mobility separation capabilities. Anal Chem 2017, 89:952959. 19. May JC, Morris CB, McLean JA: Ion mobility collision cross section compendium. Anal Chem 2017, 89:1032-1044. 20. Paglia G, Williams JP, Menikarachchi L, Thompson JW, Tyldesley Worster R, Halldorsson S, Rolfsson O, Moseley A, Grant D, Langridge J et al.: Ion mobility derived collision cross sections to support metabolomics applications. Anal Chem 2014, 86:3985-3993. This study indicates that the addition of CCS values to the searchable databases increase the identification confidence of metabolites in metabolomics. 21. Paglia G, Angel P, Williams JP, Richardson K, Olivos HJ,  Thompson JW, Menikarachchi L, Lai S, Walsh C, Moseley A et al.: Ion mobility-derived collision cross section as an additional measure for lipid fingerprinting and identification. Anal Chem 2015, 87:1137-1144. The authors demonstrated that the addition of CCS values in lipidomics improves the confidence of lipid identification. 22. Zhou Z, Shen X, Tu J, Zhu ZJ: Large-scale prediction of collision  cross-section values for metabolites in ion mobility–mass spectrometry. Anal Chem 2016, 88:11084-11091. The authors developed the first machine-learning based prediction approach to generate CCS values for metabolites, and demonstrated the use of MetCCS database to effectively improve the identification accuracy in untargeted metabolomics. 23. Zhou Z, Tu J, Xiong X, Shen X, Zhu ZJ: LipidCCS: prediction of  collision cross-section values for lipids with high precision to support ion mobility–mass spectrometry based lipidomics. Anal Chem 2017, 89:9559-9566. 
The authors developed a machine-learning based prediction to generate CCS values for lipids with a precision as high as 1%, and demonstrated that the use of LipidCCS database could effectively remove false positives in lipid identification and support lipidomics. The functions of LipidCCS web server are also described. Current Opinion in Chemical Biology 2018, 42:34–41

24. May JC, Goodwin CR, Lareau NM, Leaptrot KL, Morris CB, Kurulugama RT, Mordehai A, Klein C, Barry W, Darland E et al.: Conformational ordering of biomolecules in the gas phase: nitrogen collision cross sections measured on a prototype high resolution drift tube ion mobility–mass spectrometer. Anal Chem 2014, 86:2107-2116.
25. Zhang L, Vertes A: Energy charge, redox state, and metabolite turnover in single human hepatocytes revealed by capillary microsampling mass spectrometry. Anal Chem 2015, 87:10397-10405.
26. Hines KM, Herron J, Xu L: Assessment of altered lipid homeostasis by HILIC-ion mobility–mass spectrometry-based lipidomics. J Lipid Res 2017, 58:809-819.
27. Hines KM, Ross DH, Davidson KL, Bush MF, Xu L: Large-scale structural characterization of drug and drug-like compounds by high-throughput ion mobility–mass spectrometry. Anal Chem 2017, 89:9023-9030.
28. Zheng X, Aly NA, Zhou Y, Dupuis KT, Bilbao A, Paurus VL, Orton DJ, Wilson R, Payne SH, Smith RD: A structural examination and collision cross section database for over 500 metabolites and xenobiotics using drift tube ion mobility spectrometry. Chem Sci 2017, 8:7724-7736.
29. Stow SM, Causon T, Zheng X, Kurulugama RT, Mairinger T, May JC, Rennie EE, Baker ES, Smith RD, McLean JA et al.: An interlaboratory evaluation of drift tube ion mobility–mass spectrometry collision cross section measurements. Anal Chem 2017, 89:9048-9055.
30. Ma J, Casey CP, Zheng X, Ibrahim YM, Wilkins CS, Renslow RS, Thomas DG, Payne SH, Monroe ME, Smith RD et al.: PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association. Bioinformatics 2017, 33:2715-2722.
31. Kurulugama RT, Darland E, Kuhlmann F, Stafford G, Fjeldsted J: Evaluation of drift gas selection in complex sample analyses using a high performance drift tube ion mobility–QTOF mass spectrometer. Analyst 2015, 140:6834-6844.
32. Ruotolo BT, Benesch JL, Sandercock AM, Hyung SJ, Robinson CV: Ion mobility–mass spectrometry analysis of large protein complexes. Nat Protoc 2008, 3:1139-1152.
33. Bush MF, Campuzano ID, Robinson CV: Ion mobility mass spectrometry of peptide ions: effects of drift gas and calibration strategies. Anal Chem 2012, 84:7124-7130.
34. Hines KM, May JC, McLean JA, Xu L: Evaluation of collision cross section calibrants for structural analysis of lipids by traveling wave ion mobility–mass spectrometry. Anal Chem 2016, 88:7329-7336. This work demonstrated that the accuracy of CCS values from TWIMS significantly depends on the structural similarity between the analyte and calibrant ions.
35. Metz TO, Baker ES, Schymanski EL, Renslow RS, Thomas DG, Causon TJ, Webb IK, Hann S, Smith RD, Teeguarden JG: Integrating ion mobility spectrometry into mass spectrometry-based exposome measurements: what can it add and how far can it go? Bioanalysis 2017, 9:81-98.
36. Mesleh M, Hunter J, Shvartsburg A, Schatz GC, Jarrold M: Structural information from ion mobility measurements: effects of the long-range potential. J Phys Chem 1996, 100:16082-16086.
37. Shvartsburg AA, Schatz GC, Jarrold MF: Mobilities of carbon cluster ions: critical importance of the molecular attractive potential. J Chem Phys 1998, 108:2416-2423.
38. Shvartsburg AA, Jarrold MF: An exact hard-spheres scattering model for the mobilities of polyatomic ions. Chem Phys Lett 1996, 261:86-91.
39. Chouinard CD, Cruzeiro VWD, Beekman CR, Roitberg AE, Yost RA: Investigating differences in gas-phase conformations of 25-hydroxyvitamin D3 sodiated epimers using ion mobility–mass spectrometry and theoretical modeling. J Am Soc Mass Spectrom 2017, 28:1497-1505.

40. Paglia G, Kliman M, Claude E, Geromanos S, Astarita G: Applications of ion-mobility mass spectrometry for lipid analysis. Anal Bioanal Chem 2015, 407:4995-5007.
41. Gonzales GB, Smagghe G, Coelus S, Adriaenssens D, De Winter K, Desmet T, Raes K, Van Camp J: Collision cross section prediction of deprotonated phenolics in a travelling-wave ion mobility spectrometer using molecular descriptors and chemometrics. Anal Chim Acta 2016, 924:68-76.
42. Bijlsma L, Bade R, Celma A, Mullin L, Cleland G, Stead S, Hernandez F, Sancho JV: Prediction of collision cross-section values for small molecules: application to pesticide residue analysis. Anal Chem 2017, 89:6583-6589. The authors developed an artificial neural network (ANN)-based prediction method to generate CCS values for pesticides, and demonstrated its application to pesticide screening. This prediction approach is also applicable to metabolites and lipids.
43. Soper-Hopper MT, Petrov AS, Howard JN, Yu SS, Forsythe JG, Grover MA, Fernandez FM: Collision cross section predictions using 2-dimensional molecular descriptors. Chem Commun 2017, 53:7624-7627.
44. Todeschini R, Consonni V: Handbook of Molecular Descriptors. John Wiley & Sons; 2008.
45. Mauri A, Consonni V, Pavan M, Todeschini R: Dragon software: an easy approach to molecular descriptor calculations. Match 2006, 56:237-248.
46. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 2003, 43:493-500.
47. Zhou Z, Xiong X, Zhu ZJ: MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility–mass spectrometry based metabolomics. Bioinformatics 2017, 33:2235-2237. The first web server with a user-friendly interface was presented for rapidly predicting CCS values of metabolites. MetCCS predictor is also applicable to other chemicals such as natural products, drugs, and pesticides.
48. Stephan S, Hippler J, Kohler T, Deeb AA, Schmidt TC, Schmitz OJ: Contaminant screening of wastewater with HPLC–IM–qTOF–MS and LC+LC–IM–qTOF–MS using a CCS database. Anal Bioanal Chem 2016, 408:6545-6555.
49. Regueiro J, Negreira N, Berntssen MH: Ion-mobility-derived collision cross section as an additional identification point for multiresidue screening of pesticides in fish feed. Anal Chem 2016, 88:11169-11177.
50. Tsugawa H, Cajka T, Kind T, Ma Y, Higgins B, Ikeda K, Kanazawa M, VanderGheynst J, Fiehn O, Arita M: MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat Methods 2015, 12:523-526.
51. Li H, Cai Y, Guo Y, Chen F, Zhu ZJ: MetDIA: targeted metabolite extraction of multiplexed MS/MS spectra generated by data-independent acquisition. Anal Chem 2016, 88:8757-8764.
52. Sarbu M, Robu AC, Ghiulai RM, Vukelic Z, Clemmer DE, Zamfir AD: Electrospray ionization ion mobility mass spectrometry of human brain gangliosides. Anal Chem 2016, 88:5166-5178.
53. Chouinard CD, Cruzeiro VW, Roitberg AE, Yost RA: Experimental and theoretical investigation of sodiated multimers of steroid epimers with ion mobility–mass spectrometry. J Am Soc Mass Spectrom 2017, 28:323-331.
54. Chouinard CD, Beekman CR, Kemperman RH, King HM, Yost RA: Ion mobility–mass spectrometry separation of steroid structural isomers and epimers. Int J Ion Mobility Spectrom 2017, 20:31-39.
55. Hankin JA, Barkley RM, Zemski-Berry K, Deng Y, Murphy RC: Mass spectrometric collisional activation and product ion mobility of human serum neutral lipid extracts. Anal Chem 2016, 88:6274-6282.
56. Berry KA, Barkley RM, Berry JJ, Hankin JA, Hoyes E, Brown JM, Murphy RC: Tandem mass spectrometry in combination with product ion mobility for the identification of phospholipids. Anal Chem 2017, 89:916-921.

Current Opinion in Chemical Biology 2018, 42:34–41