New advances in analytical methods for mass spectrometry-based large-scale metabolomics study

Journal Pre-proof
Xinyu Liu, Lina Zhou, Xianzhe Shi, Guowang Xu
PII: S0165-9936(19)30345-0
DOI: https://doi.org/10.1016/j.trac.2019.115665
Reference: TRAC 115665
To appear in: Trends in Analytical Chemistry
Received Date: 1 June 2019
Revised Date: 7 September 2019
Accepted Date: 11 September 2019

Please cite this article as: X. Liu, L. Zhou, X. Shi, G. Xu, New advances in analytical methods for mass spectrometry-based large-scale metabolomics study, Trends in Analytical Chemistry, https://doi.org/10.1016/j.trac.2019.115665. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.


Xinyu Liu#, Lina Zhou#, Xianzhe Shi, Guowang Xu*
CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China

# Equal contribution.

* Corresponding author: Prof. Guowang Xu, CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China. Tel./Fax: +86-411-84379530. E-mail: [email protected].


Abstract

Large-scale metabolomics studies based on large population cohorts are increasingly applied to identify important metabolites or critical metabolic alterations associated with metabolite perturbations in disease states or under interventions, and to investigate metabolic differences and stimulus responses among genetically different individuals. Mass spectrometry (MS) coupled with different chromatographic methods is well suited to large-scale metabolomics because of its high sensitivity and selectivity, wide dynamic range, and rich information content. However, a series of challenges remain before large-scale metabolomics applications can be fully realized. In this review, we therefore focus on new advances in sample pretreatment methods, in nontargeted, targeted, and pseudotargeted metabolic data collection techniques, and in data correction methods for MS-based large-scale metabolomics. Typical applications of MS-based large-scale metabolomics in molecular epidemiology, precision medicine, and genome-wide association studies with metabolomics (mGWAS) are also presented.

Key words: large-scale metabolomics; mass spectrometry; nontargeted; targeted; pseudotargeted


1. Introduction

Metabolomics is defined as the simultaneous, comprehensive qualitative and quantitative analysis of small-molecule metabolites involved in biochemical processes within an organism, with the further aim of revealing metabolic responses to a series of endogenous and exogenous stimuli [1,2]. Large-scale metabolomics studies are now widely applied in fields such as molecular epidemiology [3], precision medicine [4], and genome-wide association studies with metabolomics (mGWAS) [5], helping to improve understanding of the (patho)physiological mechanisms of complex disorders as well as disease diagnosis, prevention, and treatment. Because individual phenotypes differ substantially, these studies require large population cohorts. It is therefore highly desirable to develop high-throughput analytical methods with high sensitivity, rich information content, and excellent stability and repeatability, together with robust data integration methods for large-scale metabolomics analysis.

Nontargeted and targeted methods are the two typical metabolomics approaches. The former obtains metabolite information with broad coverage using high-resolution mass spectrometry (HRMS); the latter focuses on quantifying known metabolites using low-resolution triple quadrupole (TQ) MS. The pseudotargeted method [6,7] combines the advantages of both: metabolite ion pairs are generated from nontargeted HRMS data and subsequently analyzed in targeted TQMS mode. As a result, both information coverage and quantitative accuracy are markedly improved in pseudotargeted metabolomics.

Currently, mass spectrometry (MS) and nuclear magnetic resonance spectroscopy (NMR) are the main analytical tools for metabolomics. Owing to its high sensitivity, good mass accuracy, and wide dynamic range, MS plays an increasingly important role in metabolomics when hyphenated with different chromatographic separation modes, such as liquid chromatography (LC) [8], gas chromatography (GC) [9,10], and capillary electrophoresis (CE) [11,12]. In addition, MS instruments offer a variety of mass analyzers, including low-resolution analyzers such as the TQ and high-resolution analyzers such as the quadrupole-time-of-flight (Q-TOF), Orbitrap, and Fourier transform ion cyclotron resonance (FTICR) MS [13]. This flexibility makes MS a main tool for metabolomics in the nontargeted, targeted, and pseudotargeted modes (Figure 1).

Although metabolomics has achieved great success in different fields, guaranteeing high data quality when thousands of biological samples are measured in one study remains a challenge because of instrument drift (signal fluctuation and sensitivity reduction), environmental fluctuation, and analytical variance [14,15]. A series of issues must be addressed when large-scale metabolomics is performed (Figure 2). First, highly efficient sample pretreatment methods are needed to increase throughput. Second, the data collection method is critical for ensuring rich metabolic profiling information and high analytical throughput. Third, data repeatability and integration across batches and instruments are key to realizing large-scale metabolomics studies. Last but not least, metabolite identification and biological interpretation determine whether the intended aims can be achieved. In response to these challenges, much progress has been made over the past 10 years. Because this journal recently accepted a review on metabolite identification [16], to avoid repetition and save space, this review summarizes new advances in analytical methods for MS-based large-scale metabolomics, focusing mainly on high-throughput sample pretreatment methods, nontargeted, targeted, and pseudotargeted analytical technologies, and data correction methodology. Typical applications of metabolomics methods in molecular epidemiology, precision medicine, and mGWAS are also given. For biological interpretation, especially functional metabolomics, readers can refer to our recent review [17].

2. High throughput sample preparation methods

Sample preparation is a time-consuming and tedious step before chromatographic separation and metabolomic fingerprint acquisition, especially when a large sample set must be handled. An ideal sample pretreatment method for metabolomics should offer high throughput, good repeatability, and broad metabolite coverage. Commonly used sample preparation methods are currently based mainly on organic solvent deproteinization [18], high-throughput 96-well plates [19-21], and solid phase (micro)extraction (SPE/SPME) [22]. Several advanced methods with potential for large-scale metabolomics have been reported. Li et al. developed an efficient SPE-based pretreatment method using Ostro 96-well plates, in which the plasma metabolome and lipidome were extracted with acetonitrile containing 1% formic acid and with chloroform/methanol (2/1, v/v), respectively; this method enables rapid, high-throughput extraction [19]. Ostro 96-well plate extraction has been tested for the metabolome and for selected lipid classes, but not all lipids can be extracted, and different lipid groups require different conditions [23]. Löfgren et al. developed a plasma lipid extraction method using a butanol-methanol (BUME) mixture with standard 96-well robots, with which lipids from 96 samples can be extracted automatically and efficiently in 60 min [20]. BUME methods for lipid extraction from biofluids and tissue samples perform better than commonly used chloroform-based extraction methods [24,25]. When 96-well plates are not available, a two-step liquid-liquid extraction protocol is an alternative [26,27].

Chen et al. [28] established a method for simultaneous extraction of the metabolome and lipidome with methyl tert-butyl ether (MTBE) from a small amount of tissue: lipids and lipophilic compounds are mainly extracted into the MTBE layer, polar and moderately polar metabolites into the methanol/water layer, and amphoteric metabolites are distributed between the upper and lower layers. For the analysis of amphoteric metabolites, the nonpolar (upper) and polar (lower) fractions must be combined to ensure metabolite coverage [28]. This method is especially useful for simultaneous metabolomics and lipidomics when only a limited amount of tissue or cells is available. Barbas et al. proposed an in-vial dual extraction (IVDE) procedure for small plasma volumes in which the entire extraction is performed in a single vial, and proteins can even be recovered for proteomics analysis [29,30]. To meet the requirements of time-sensitive clinical applications, particularly in the emergency department, Gehrke et al. developed a rapid (~4 min) liquid-liquid extraction protocol for hydrophilic compounds that can be applied universally to plasma, whole blood, and red blood cells [31].

In addition, commercial metabolomics kits have been launched by Biocrates Life Sciences AG (Innsbruck, Austria) over the past decade. For example, the AbsoluteIDQ® p400 HR Kit is a standardized "targeted metabolic profiling" solution for 408 metabolites based on HRMS (www.biocrates.com). The kit is designed so that all sample preparation steps are performed directly on the 96-well plate, and it provides quantification, reproducibility, and high throughput for various biological samples. At present, although no universal sample pretreatment method applies to all small-molecule metabolites, organic solvent-based extraction combined with 96-well plates and 96-well automated handlers is the most widely used high-throughput sample pretreatment approach in large-scale metabolomics. Automated liquid handling is another excellent choice for high-throughput sample preparation: rigorously standardized operating procedures combined with automated liquid handlers ensure good quality of sample extraction and processing.
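As an illustration of the plate-based workflow above, the sketch below generates a well map for one 96-well extraction plate: study samples are randomized, a pooled QC is interleaved after every n samples, and extraction blanks occupy the last used wells. The layout rules, the function name plate_layout and its default values are hypothetical choices for illustration, not a vendor worklist format or a protocol from the cited studies.

```python
"""Hypothetical 96-well extraction plate map (not a vendor worklist format)."""
import random
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]                                # A..H
WELLS = [f"{r}{c}" for r in ROWS for c in range(1, 13)]   # A1, A2, ..., H12 (row-major)


def plate_layout(sample_ids, qc_every=10, n_blanks=2, seed=0):
    """Return (well, kind, name) tuples for one plate.

    Study samples are shuffled with a fixed seed so that plate position is not
    confounded with study group; a pooled QC follows every `qc_every` samples,
    and `n_blanks` extraction blanks fill the last used wells.
    """
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    contents = []
    for i, sid in enumerate(order, start=1):
        contents.append(("sample", sid))
        if i % qc_every == 0:
            contents.append(("QC", f"QC{i // qc_every}"))
    contents.extend(("blank", f"BLK{j}") for j in range(1, n_blanks + 1))
    if len(contents) > len(WELLS):
        raise ValueError(f"{len(contents)} wells needed, but a plate has {len(WELLS)}")
    return [(well, kind, name) for well, (kind, name) in zip(WELLS, contents)]


if __name__ == "__main__":
    for well, kind, name in plate_layout([f"S{i:03d}" for i in range(1, 81)])[:12]:
        print(well, kind, name)
```

With 80 study samples and one QC per 10 samples, the plate holds 80 samples, 8 QCs and 2 blanks (90 wells); a larger set raises an error and would have to be split across plates.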

3. MS-based metabolic profiling data collection methods

3.1 Nontargeted metabolomics methods

The nontargeted method is the most characteristic and most frequently used metabolomics approach. It aims at unbiased detection of as many metabolites as possible in a biological sample in a single run, and it can be performed without prior knowledge of the sample components [32,33]. Nontargeted metabolomics platforms are predominantly realized by coupling high-efficiency chromatographic separations with high-sensitivity, high-resolution TOF or Orbitrap MS detectors operated in full scan mode. However, such coupled techniques require relatively long analysis times, which limits throughput, especially when a large cohort is studied. Hence, innovations in chromatographic separation have been made to improve throughput. Shorter or microbore chromatographic columns, higher column temperatures, and faster flow rates are good choices for shortening the elution gradient and thus the analysis time. Gray et al. developed a microbore UPLC-MS method that provides a high-throughput urine analytical platform using 1 mm × 50 mm columns, shortening the analysis time to 2.5 min/sample. However, approximately 19 000 features were detected with the common 2.1 mm × 100 mm columns versus ~6 000 with the microbore method: peak capacity and the number of detected ions were greatly reduced because of the very short analysis time [34]. To balance analytical speed and peak capacity, Ouyang et al. developed a 12 min LC-MS method that reduced analysis time by about 60% and increased analytical throughput significantly. Evaluation of its separation performance and metabolite coverage showed that the method was robust for nontargeted metabolic profiling in large-scale metabolomics [21]. Yin et al. developed an LC-MS-based method for bile acid profiling with a 14 min analytical cycle, which also revealed factors affecting bile acid separation and detection by LC-MS in negative mode [35]. Moreover, an online heart-cutting 2D-LC-MS method developed by Wang et al. achieved simultaneous collection of metabolome and lipidome information in 30 min with a single injection, which is valuable both for small sample amounts and for large-scale metabolomics studies [36].

Direct injection MS is another option for high-throughput metabolomics [37]. HRMS-based direct injection provides informative, high-quality data, including accurate mass and isotopic distribution, and such approaches enable acquisition of hundreds to thousands of metabolite fingerprints per day [38-40]. Chekmeneva et al. showed that a direct infusion nano-electrospray HRMS method enabled cost-efficient analysis of >2,200 urine samples in <3 weeks and >10,000 urine samples in 12 weeks with the required sensitivity and accuracy. The method was validated for linear range, inter- and intraday accuracy and precision, robustness, and long-term stability, and the results met the acceptance criteria for analytical quality, showing that the method was suitable for population-based large-scale epidemiological studies [41].

However, because direct injection lacks chromatographic separation, matrix effects can significantly hamper metabolite detection. Sample dilution helps to lessen matrix effects and improve data repeatability and quality. Apart from matrix effects, direct injection MS also has difficulty separating structural isomers and isobars. Nowadays, analyzing thousands of complex samples per day with matching instrumentation has become a reality for nontargeted metabolomics, and microfluidics-related techniques will likely further advance throughput in the future [42]. The frontiers of high-throughput metabolomics are reviewed by Fuhrer et al. [42]. MS data generated by the nontargeted approach are informative and, owing to HRMS, conducive to the structural identification of metabolites, but the limited linear range is a shortcoming for accurate quantification of metabolites spanning a wide concentration range in biological samples. Meanwhile, day-to-day and batch-to-batch data repeatability and the complexity of data handling, especially peak matching, are further issues that nontargeted methods must address.
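The throughput figures quoted in this section translate directly into cohort size. The sketch below converts a per-injection cycle time into daily capacity and the number of days a cohort needs, optionally reserving one injection for a pooled QC after every n samples (QC practice is discussed in Section 4). Continuous operation, the uptime fraction and the function names are illustrative assumptions; real sequences also include blanks, equilibration and maintenance.

```python
"""Back-of-envelope LC-MS throughput arithmetic (illustrative assumptions only)."""
import math

MIN_PER_DAY = 24 * 60


def injections_per_day(cycle_min, uptime=1.0):
    """Whole injections that fit into one day at a given cycle time (min per injection)."""
    return int(MIN_PER_DAY * uptime // cycle_min)


def samples_per_day(cycle_min, qc_every=None, uptime=1.0):
    """Study samples per day if one QC injection follows every `qc_every` samples."""
    inj = injections_per_day(cycle_min, uptime)
    if qc_every is None:
        return inj
    return inj * qc_every // (qc_every + 1)


def days_needed(n_samples, cycle_min, qc_every=None, uptime=1.0):
    """Whole days of instrument time needed for a cohort of n_samples."""
    return math.ceil(n_samples / samples_per_day(cycle_min, qc_every, uptime))


def max_cycle_min(n_samples, days):
    """Longest cycle time (min) that still fits n_samples into the given number of days."""
    return MIN_PER_DAY * days / n_samples
```

For example, injections_per_day(2.5) gives 576 and injections_per_day(12) gives 120, so 2,200 samples at a 12 min cycle need 19 days without QCs and 21 days with one QC per 10 samples.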

3.2 Targeted metabolomics methods

Targeted methods are generally used to quantify variations in one or more metabolites in a specific pathway, usually in selected ion monitoring (SIM) or multiple reaction monitoring (MRM) mode. The targeted approach is considered the gold standard for absolute quantification of metabolites because of its wider linear range (four to six orders of magnitude), better repeatability, and higher sensitivity [43,6]. Recently, parallel reaction monitoring (PRM), which is carried out on HRMS platforms, has contributed increasingly to accurate quantification of targeted metabolites (Figure 3).

3.2.1 TQ MS-based MRM analysis

MRM based on TQ MS or hybrid triple quadrupole-linear ion trap (QTRAP) MS is the most widely used approach for analyzing known targeted metabolites of interest (Figure 3). Ion pair transitions (precursor ion to product ion) and the corresponding MS parameters (e.g., collision energy (CE)) are important and should be optimized to obtain good sensitivity and selectivity. Many high-throughput targeted methods have been developed for metabolomics. Wei et al. combined three different chromatographic column conditions and two ionization modes to ensure metabolite coverage, separation, and sensitivity, detecting 205 metabolites, including amino acids, nucleic acids, sugars, and organic acids, in 10 min [44]. Gu et al. described a globally optimized targeted method for 160 biologically important metabolites [45], and Yuan et al. analyzed 258 metabolites in 15 min with positive/negative ion switching based on HILIC-MS, a method also suitable for different biological sample types [46]. A stepwise multiple ion monitoring-enhanced product ion (stepwise MIM-EPI) method has been developed for quantification of 277 metabolites in rice metabolomics; construction of an MS2 spectral tag library not only facilitated metabolite annotation but also enlarged metabolite coverage [47]. Compared with classic targeted strategies, these advanced targeted approaches extend the number of metabolites analyzed in a single run to hundreds. However, absolute quantification of targeted metabolites requires reference standards and corresponding internal standards, as well as a comprehensive method validation procedure. Fast scan speed and fast switching between positive and negative modes are the two main advantages of TQ/QTRAP instruments, making them suitable for developing targeted metabolomics methods for large-scale studies. The disadvantages are that transition selection and MS parameter optimization for the targeted metabolites are time-consuming, and metabolites sharing the same MRM transition cannot be resolved at the MS level.
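The last limitation can be checked mechanically when a transition list is assembled. The sketch below flags transition pairs that a unit-resolution TQ could not tell apart: same polarity, Q1 and Q3 within a unit-mass tolerance, and co-elution within a retention time window. The ±0.5 Da tolerance, the 0.5 min window, the Transition record and the example transitions are all hypothetical assumptions, not values from any cited method.

```python
"""Flag MRM transitions a unit-resolution triple quadrupole could not tell apart."""
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transition:
    name: str
    q1: float        # precursor m/z selected in Q1
    q3: float        # product m/z selected in Q3
    rt: float        # expected retention time, min
    ce: float        # collision energy, V
    polarity: str    # "+" or "-"


def unresolved_pairs(transitions, unit_tol=0.5, rt_window=0.5):
    """Pairs sharing polarity, Q1 and Q3 (within unit_tol Da) and co-eluting within rt_window min."""
    pairs = []
    for a, b in combinations(transitions, 2):
        if (a.polarity == b.polarity
                and abs(a.q1 - b.q1) <= unit_tol
                and abs(a.q3 - b.q3) <= unit_tol
                and abs(a.rt - b.rt) <= rt_window):
            pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    # Hypothetical transitions, not taken from any published method.
    panel = [
        Transition("metabolite_A", 180.1, 90.1, 3.2, 15, "+"),
        Transition("metabolite_B", 180.3, 90.2, 3.4, 20, "+"),  # same unit-mass transition, co-elutes with A
        Transition("metabolite_C", 180.1, 90.1, 6.0, 15, "+"),  # same transition, separated by retention time
        Transition("metabolite_D", 180.1, 90.1, 3.2, 15, "-"),  # opposite polarity
    ]
    print(unresolved_pairs(panel))
```

Only metabolite_A and metabolite_B are reported here: C shares their transition but elutes separately, which is why chromatographic separation remains essential for MRM panels.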

3.2.2 HRMS-based PRM analysis

The MRM method has been considered the gold standard for metabolite quantification because of its fast scan speed and stable analytical performance, but its low mass resolution brings unavoidable drawbacks. PRM is a newer approach carried out on HRMS platforms: the metabolite precursor is selected by Q1 and fragmented in HCD mode, and all MS/MS fragments are then scanned in parallel by a high-resolution mass analyzer (Figure 3). As an attractive new strategy, PRM has shown reliable performance in targeted peptide and metabolite quantification. Work by Peterson and by Gallien on targeted peptide PRM demonstrated good measurement accuracy, wide dynamic range, and reliable run-to-run reproducibility [48,49]; in total, 770 tryptic yeast peptides were analyzed in 60 min [49]. A PRM-based targeted strategy for metabolites has also been developed on a Q-Exactive LC-MS system, enabling simultaneous quantification of 237 polar metabolites with good quantitative accuracy and data reproducibility. A comparison of PRM with MRM measurements on a QTRAP 6500 showed that PRM provided higher mass precision and more comprehensive MS2 information [50]. MRM and PRM each have advantages and disadvantages. The most significant advantage of PRM assays is that metabolite identification is verified by higher mass accuracy, because PRM is performed on a high-resolution instrument; this also facilitates separation of metabolites in the MS2 dimension. In contrast, traditional MRM on TQ MS generally operates at unit mass resolution without informative full scan MS1 and MS2 data. The disadvantages of PRM are its relatively low scan speed and slow polarity switching. Hence, PRM and MRM are excellent complementary tools for accurate quantification in large-scale metabolomics. Targeted detection is one of the most crucial approaches in MS-based metabolomics, especially for absolute quantification. However, its major limitation is restricted metabolome coverage, often limited to a small-to-medium number of known metabolites. It is therefore necessary and valuable to develop new methods that broaden metabolome coverage, including the detection of unknown metabolites.
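The mass-accuracy argument for PRM over unit-resolution MRM reduces to simple arithmetic. The sketch below computes a ppm mass error, the nominal mass a unit-resolution quadrupole effectively sees, whether two ions fall outside a ppm extraction window, and a rough m/Δm resolving power needed to separate them. The m/z values in the usage note and the 5 ppm tolerance are hypothetical assumptions.

```python
"""Mass-accuracy arithmetic behind high-resolution (PRM) vs unit-resolution (MRM) detection."""


def ppm_error(measured_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def nominal_mass(mz):
    """Integer m/z, roughly what a unit-resolution quadrupole distinguishes."""
    return round(mz)


def separable_by_accurate_mass(mz1, mz2, tol_ppm=5.0):
    """True if mz2 falls outside a +/- tol_ppm extraction window centred on mz1."""
    return abs(ppm_error(mz2, mz1)) > tol_ppm


def resolving_power_needed(mz1, mz2):
    """Rough m/delta-m (FWHM convention) needed to separate two neighbouring peaks."""
    return ((mz1 + mz2) / 2) / abs(mz1 - mz2)


if __name__ == "__main__":
    a, b = 150.0550, 150.0914  # hypothetical isobaric fragment ions
    print(nominal_mass(a) == nominal_mass(b))        # same nominal mass for a TQ
    print(separable_by_accurate_mass(a, b))          # distinct within a 5 ppm window
    print(round(resolving_power_needed(a, b)))
```

Two fragments that share nominal mass 150 collapse into one MRM channel, whereas they differ by roughly 240 ppm and need a resolving power of only about 4 000, well within HRMS capability.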

3.3 Pseudotargeted metabolomics method

To integrate the advantages and avoid the disadvantages of targeted and nontargeted methods, pseudotargeted metabolomics methods were introduced based on GC-MS in 2012 and on LC-MS in 2013 [6,7]. This type of method offers higher selectivity and sensitivity, a wider linear range, better quantitative accuracy, and, especially, broader metabolome coverage. The pseudotargeted method acquires ion pairs from pooled biological samples in full scan nontargeted mode and then measures as many metabolites as possible in the individual study samples in targeted MRM mode.

The general workflow of the pseudotargeted method is as follows. First, comprehensive full scan information for a pooled biological sample is acquired by HRMS, while MS2 fragmentation information is collected in information-dependent or data-dependent acquisition (IDA or DDA) mode. Different CE voltages covering low, medium, and high energy levels (e.g., 15, 30, and 45 V or 20, 40, and 60 V) are set in parallel LC-MS injections. In this step, sample preparation takes around 4 hours, and LC-MS analysis in positive and negative modes requires 30 and 25 min, respectively. Second, a precursor ion and its characteristic product ions are selected as MRM ion pairs for known and unknown metabolites. The MRM-Ion Pair Finder software significantly speeds up this step (<1 hour); details of this key procedure are given below. MRM transitions are then defined after optimizing the collision energy of each metabolite (around 10 hours). Finally, the abundances of metabolites in individual biological samples are acquired in the MRM mode of TQ MS [51]. The quantitative performance of the resulting pseudotargeted method is evaluated by linearity, stability, and intra- and inter-day repeatability; this evaluation requires an additional 3 days.

The key, but time-consuming, step of this approach is extracting the characteristic ion pairs of metabolites from highly complex raw MS data to generate MRM transitions. To solve this problem, Luo et al. developed a strategy for defining MRM ion pairs (Figure 4). IDA-based experiments at different CE voltages (20, 40, and 60 V) were first carried out on a TripleTOF 5600+ MS to obtain comprehensive MS2 information; the in-house MRM-Ion Pair Finder software was then applied to select the product ion in four steps: precursor ion alignment, extraction and reduction of MS2 spectra, selection of the characteristic product ion, and ion fusion. Finally, the most intense parent ion and its corresponding CE value were defined to generate MRM transitions [51]. However, IDA-based acquisition favors MS2 spectra of relatively high-intensity ions, which limits construction of MRM ion pairs for low-abundance metabolites. Sequential Windowed Acquisition of All Theoretical Fragment Ion Mass Spectra (SWATH) is an emerging MS acquisition technology that theoretically obtains comprehensive fragment information for all precursor ions. Wang et al. developed an ion pair selection method based on SWATH with variable Q1 isolation windows and various CE voltages. Compared with IDA, SWATH yielded an additional 253 ion pairs, indicating that it provides richer fragment information for detected metabolites, especially low-abundance ones [52]. By integrating the lipids detected by nontargeted HRMS lipidomics of multiple matrices (e.g., plasma, cells, and tissue) with lipids predicted from the structures and chromatographic retention behavior of known lipids, a pseudotargeted lipidomics method was built that includes 3377 lipid ion pairs covering around 7000 lipid molecular structures across 19 lipid subclasses [53].

Similar to the pseudotargeted method, a widely targeted metabolomics method has also been proposed [54,55]. In the pseudotargeted method, MRM ion pair information comes mainly from MS2 spectra of pooled samples, whereas in the widely targeted method it comes mainly from standards. Because pseudotargeted analysis is established on pooled biological samples with comprehensive acquisition of MS1 and MS2 information, it eliminates the reliance on standards during method development. The approach retains the advantages of the targeted method while significantly expanding metabolome coverage, and it can detect known and unknown metabolites simultaneously. The pseudotargeted method is therefore the most promising technology for large-scale sample analysis, given its excellent quantitation capability [56].
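To make the ion pair definition step more concrete, the sketch below implements a simplified version of two of the steps listed above: aligning MS2 spectra of the same precursor acquired in runs at different CEs, and choosing the most intense product ion (ignoring the residual precursor) together with the CE that produced it. This is a hypothetical illustration, not the MRM-Ion Pair Finder algorithm; the 10 ppm and 0.2 min alignment tolerances, the 1 Da precursor exclusion window, the MS2Spectrum record and the example spectra are assumptions.

```python
"""Simplified ion-pair definition from multi-CE IDA/DDA MS2 data (hypothetical sketch)."""
from dataclasses import dataclass


@dataclass
class MS2Spectrum:
    precursor_mz: float
    rt: float                           # retention time, min
    ce: float                           # collision energy, V
    peaks: list[tuple[float, float]]    # (product m/z, intensity)


def align_precursors(spectra, mz_tol_ppm=10.0, rt_tol=0.2):
    """Group spectra of the same precursor acquired in runs at different CEs."""
    groups = []
    for s in sorted(spectra, key=lambda sp: (sp.precursor_mz, sp.rt)):
        for group in groups:
            ref = group[0]
            close_mz = abs(s.precursor_mz - ref.precursor_mz) / ref.precursor_mz * 1e6 <= mz_tol_ppm
            if close_mz and abs(s.rt - ref.rt) <= rt_tol:
                group.append(s)
                break
        else:  # no existing group matched this spectrum
            groups.append([s])
    return groups


def pick_ion_pair(group, exclude_window=1.0):
    """Most intense product ion across all CEs (residual precursor ignored), with its CE."""
    best = None  # (intensity, product m/z, CE)
    for s in group:
        for mz, intensity in s.peaks:
            if abs(mz - s.precursor_mz) <= exclude_window:
                continue  # skip the unfragmented precursor
            if best is None or intensity > best[0]:
                best = (intensity, mz, s.ce)
    if best is None:
        return None
    ref = group[0]
    return {"q1": ref.precursor_mz, "q3": best[1], "ce": best[2], "rt": ref.rt}


def define_ion_pairs(spectra):
    """One candidate MRM ion pair per aligned precursor that produced fragments."""
    pairs = (pick_ion_pair(g) for g in align_precursors(spectra))
    return [p for p in pairs if p is not None]


if __name__ == "__main__":
    demo = [  # hypothetical spectra of one precursor at 20, 40 and 60 V
        MS2Spectrum(250.1000, 5.02, 20, [(250.1000, 900.0), (150.0500, 300.0)]),
        MS2Spectrum(250.1010, 5.05, 40, [(150.0500, 800.0), (120.0400, 400.0)]),
        MS2Spectrum(250.1005, 5.04, 60, [(150.0500, 500.0), (120.0400, 700.0)]),
    ]
    print(define_ion_pairs(demo))
```

Picking the CE at which the chosen fragment is most intense folds a coarse CE optimization into ion pair selection; a finer CE ramp on the TQ would still be needed, as in the workflow described above.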

The pseudotargeted method has also been used for the sensitive analysis of protein phosphorylation in protein complexes [56].
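For the SWATH-based ion pair selection with variable Q1 isolation windows described earlier in this section, one common way to set such windows (assumed here; it is not confirmed as the scheme used by Wang et al. [52]) is to place the boundaries so that each window contains roughly the same number of precursors observed in a pooled-sample full scan. The function name, the 1 Da overlap and the synthetic precursor list in the test are hypothetical.

```python
"""Equal-count variable Q1 isolation windows for SWATH-type acquisition (illustrative sketch)."""
import numpy as np


def variable_windows(precursor_mz, n_windows, overlap=1.0):
    """Split the precursor m/z range into `n_windows` windows that each hold
    roughly the same number of precursors.

    Windows are narrow where precursors are dense and wide where they are sparse.
    Each window except the first is extended `overlap` Da to the low side.
    """
    mz = np.sort(np.asarray(precursor_mz, dtype=float))
    if n_windows < 1 or mz.size < n_windows:
        raise ValueError("need at least one window and one precursor per window")
    # Interior boundaries at equally spaced quantiles of the precursor m/z values
    inner = np.quantile(mz, np.linspace(0.0, 1.0, n_windows + 1)[1:-1])
    edges = np.concatenate(([mz[0]], inner, [mz[-1]]))
    return [(float(max(edges[0], lo - overlap)), float(hi))
            for lo, hi in zip(edges[:-1], edges[1:])]
```

Equal-count windows keep the number of co-isolated precursors per window roughly constant, which limits MS2 spectral complexity in crowded m/z regions where small metabolites cluster.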

4. Data correction in large-scale sample analysis

A large-scale metabolomics study involving at least hundreds of samples requires long continuous analysis, lasting from days to months. Such a study is practical only if the analytical sequences achieve high stability and repeatability. However, because of limited chromatographic column lifetime, changing MS sensitivity, and environmental variation during long analyses, systematic and gross errors are frequently observed in data generated from different analytical sequences and batches [56]. Factors affecting analytical repeatability across batches should therefore be carefully considered. Many protocols have been proposed as 'good practice' for LC-MS- and GC-MS-based large-scale metabolomics [56-59]. In these protocols, a pooled quality control (QC) sample is used to evaluate the robustness of the analytical sequence and to correct analytical variability [56]. QC sample selection, QC insertion frequency, and the data calibration and integration method are the keys to success. Generally, a pool of the real study samples is recommended as the quality control sample (batchQC) and inserted at equal intervals in the sequence to evaluate analytical repeatability and stability. For large-scale studies, however, it is difficult to pool a QC sample from all real samples before the analysis starts [60]. Consequently, an alternative laboratory quality control (LabQC) sample of the same sample type is used to monitor the analytical sequence. Luo et al. compared the effectiveness of batchQC and LabQC samples in monitoring analytical stability. The data demonstrated that although metabolite concentrations differed between the two types of QC, most detected ion features were measured steadily in both QC samples, the detected metabolites showed similar CV distributions, and signal drift calibration based on LabQC was as good as that based on batchQC [56]. Hence, LabQC is a better choice for large-scale metabolomics because it can be prepared in considerable amounts before sample analysis. More importantly, LabQC can also effectively monitor the stability and repeatability of the analytical sequence.

In principle, QC samples can be used to monitor the stability of the analytical sequence in real time, and a higher QC insertion frequency may yield better results after signal drift calibration [14]. However, frequent QC insertion reduces analytical throughput while increasing instrument deterioration and cost. A compromise must therefore be made between QC insertion frequency, with its associated correction effect, and high analytical throughput in large-scale studies. Dunn suggested inserting a QC sample at every fifth injection in nontargeted large-scale studies [14]. Our group investigated the influence of QC insertion frequency on data stability and correction effect in pseudotargeted analysis. The results showed that different insertion frequencies (one QC every 5, 10, or 15 samples) made no significant difference in data stability or correction effect. Pseudotargeted analysis can therefore reduce the QC insertion frequency (one QC every 10-15 sample injections) without compromising data quality or correction effect, and thereby increase analytical throughput significantly [56].

In many cases, a large sample set must be divided into several smaller batches to obtain high-quality metabolomics data and/or to increase speed by analyzing the batches on several instruments. Therefore, another important issue in large-scale metabolomics is integrating data from different analytical batches and instruments. Total intensity signal (TIS) correction and internal standard (IS) normalization are the most widely used methods for metabolomics data correction [10,61]. Good correction can be achieved when all measured metabolites follow a change trend similar to that of the TIS or IS. In practice, however, hundreds to thousands of detected metabolite features cannot all show the same change pattern as the TIS or IS, so data from different batches cannot be integrated by simple TIS or IS correction. To solve this problem, feature-based correction strategies have attracted increasing attention and made notable progress. QC samples can be used as external standards to calibrate each ion feature across all real samples, thereby minimizing variation within and between batches [60,62-64]. The local or global mean intensity of QCs (LoMec and GoMec) and local or global linear regression of QC intensity values (LoReg and GoReg) are commonly used to derive a correction factor for each feature and thus eliminate systematic bias [56,62]. Zhao et al. provided a strategy for removing both gross and systematic errors in large-scale metabolomics: the feature ratio between two adjacent QCs was used to construct statistical and fitting models for gross error calibration, while systematic bias was removed using a virtual QC for each sample, generated by a fitting model based on the feature intensities of the neighboring QCs [65]. This method not only improves data quality significantly but also makes it possible to integrate metabolomics data from multiple analytical batches and different GC-MS instruments. Recently, Thonusin et al. developed an Excel-based tool (MetaboDrift) to visually evaluate and correct intensity drift in multi-batch LC-MS datasets [66]. Based on the wavelet transform combined with independent component analysis, Deng et al. developed the WaveICA method to identify and remove batch effects in large-scale metabolomics data [67]. In another representative work, Vaughan et al. developed calibration transfer models that can merge data from two LC-MS platforms [68]. The development of data correction methods enables the integration and comparison of large-scale metabolomics data and removes a key data processing bottleneck. Figure 5 shows a typical workflow for large-scale metabolomics, including method development, optimization, and application [65]. Combining a reliable data collection technique, pooled QC insertion at a suitable frequency, and blank washes before and after acquisition can extend the analysis time of a single batch and can further be applied to multiple batches, giving promising results for large-scale analysis.
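The feature-based QC correction described above can be sketched in a few lines. In the version below, each injection gets a "virtual QC" intensity by linear interpolation between the neighboring QC injections, each feature is divided by it and rescaled to the batch QC median, and batches are then merged by scaling every batch to the median of the per-batch QC medians. This is a simplified local-interpolation scheme in the spirit of the LoReg and virtual-QC strategies cited above; it is not the published algorithm of Zhao et al. [65] or of any named tool, it omits gross-error handling, and all function names are hypothetical. The per-feature QC CV is included to show the effect of correction.

```python
"""Feature-wise QC-based drift correction via linear interpolation between neighbouring QCs."""
import numpy as np


def qc_cv(X, is_qc):
    """Per-feature coefficient of variation (%) across QC injections."""
    q = np.asarray(X, dtype=float)[np.asarray(is_qc, dtype=bool)]
    return 100.0 * q.std(axis=0, ddof=1) / q.mean(axis=0)


def qc_drift_correct(X, order, is_qc):
    """Correct one batch.

    X:      (n_injections, n_features) raw peak areas
    order:  injection order, increasing within the batch
    is_qc:  boolean mask marking pooled-QC injections
    The virtual QC of each injection is the linear interpolation of the
    neighbouring QC intensities (held flat beyond the first/last QC).
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(order, dtype=float)
    qc = np.asarray(is_qc, dtype=bool)
    if qc.sum() < 2:
        raise ValueError("need at least two QC injections per batch")
    corrected = np.empty_like(X)
    for j in range(X.shape[1]):
        qc_y = X[qc, j]
        virtual_qc = np.interp(t, t[qc], qc_y)
        corrected[:, j] = X[:, j] / virtual_qc * np.median(qc_y)
    return corrected


def merge_batches(batches):
    """Drift-correct each (X, order, is_qc) batch, then rescale each batch so its
    per-feature QC median equals the median of all batch QC medians."""
    corrected = [qc_drift_correct(X, order, is_qc) for X, order, is_qc in batches]
    qc_medians = np.array([
        np.median(c[np.asarray(is_qc, dtype=bool)], axis=0)
        for c, (_, _, is_qc) in zip(corrected, batches)
    ])
    target = np.median(qc_medians, axis=0)
    return [c * (target / m) for c, m in zip(corrected, qc_medians)]
```

Because each feature gets its own virtual QC, features that drift differently are corrected independently, which is exactly what a single TIS or IS factor cannot do. Linear interpolation assumes drift is roughly linear between neighboring QCs, one reason QC spacing matters.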

5. Typical applications of metabolomics in population-based cohorts

Metabolomics is often used to identify important metabolites or critical metabolic alterations associated with disease states, interventions and other perturbations. Because the physicochemical properties of the human metabolome are diverse, multiple analytical techniques are needed to obtain relatively comprehensive coverage of complex metabolic networks. Considering feasibility and the metabolic targets to be investigated, the most robust compromise among metabolomics approaches is often chosen for large-cohort experiments. With the technical progress described above, large-scale population-based metabolomics studies have been carried out (Table 1), facilitating the identification of potential diagnostic markers and risk factors for metabolic syndrome [69,70] and cancer [71,72], the exploration of individual variation in drug metabolism [73], and studies in nutritional epidemiology [74,75]. Here, we focus on how large-scale metabolomics studies help to screen risk factors or early clinical diagnostic biomarkers, identify key metabolic pathways, and understand the mechanisms of disease development at the population level in molecular epidemiology, precision medicine, and mGWAS.

5.1 Identification of clinical diagnosis or risk biomarkers for metabolic diseases and cancers

A variety of large-scale metabolomics studies have been performed to identify potential risk markers of metabolic diseases and cancers. Studies on prediabetes and type 2 diabetes (T2D) have been systematically reviewed by Liggi et al. [69] and Park et al. [76]. The former review describes, for each large-scale metabolomics study, the research aims, the epidemiological cohort sampled and the number of samples, the metabolomics detection method, the main findings, and whether the results were replicated in other studies. The latter performed a systematic review and meta-analysis of diagnostic indicators from large-scale cohort studies. A large-scale longitudinal metabolomics study of gestational diabetes mellitus (GDM) has also been performed, which suggested that 8 metabolites (leucine/isoleucine, glutamic acid, serine, proline, ornithine, tyrosine, uric acid and lysophosphatidylcholine 20:4 (LPC 20:4)) might contribute to the development of GDM [77].

Cardiovascular disease is another active field for large-scale metabolomics [70,78,79]. A shotgun lipidomics study profiled 135 lipid species from 8 lipid classes in 685 plasma samples from the prospective population-based Bruneck study and found that PE (36:5), CE (16:1) and TAG (54:2) improved risk prediction and classification for coronary heart disease (CHD) [80]. Based on targeted blood metabolomics measurements in two large German prospective cohorts, alterations in phosphatidylcholine and sphingomyelin metabolism, particularly metabolites of the arachidonic acid pathway, were found to be independently associated with a higher incidence of myocardial infarction in initially healthy adults [81]. To identify populations at risk of CHD, a nontargeted UPLC-MS metabolomics study was performed on 1,028 individuals (131 CHD events over a median follow-up of 10 years) and 1,670 individuals (282 CHD events over a median follow-up of 3.9 years). Four circulating metabolites, lysophosphatidylcholine 18:1 (LPC 18:1), LPC 18:2, sphingomyelin 28:1 and monoglyceride 18:2 (MG 18:2), were associated with CHD independently of the main cardiovascular risk factors [70].

Cancer is not only a genetic disease but also a metabolic disease, so metabolic marker discovery in large-scale metabolomics studies of cohorts has attracted considerable attention [82-84]. Our group collected sera from 1,448 subjects, including healthy controls and patients with chronic hepatitis B virus infection, liver cirrhosis, or hepatocellular carcinoma (HCC), from 6 centers in China, and used UPLC-MS-based metabolomics to screen and validate HCC biomarkers; phenylalanyl-tryptophan and glycocholate showed good diagnostic performance for the early detection of HCC in at-risk populations [82]. Stepien et al. investigated circulating amino acids (AA), biogenic amines and hexoses in a large prospective cohort to determine whether they were related to the onset and development of hepatobiliary cancer. Lysine, leucine, glutamine and the ratio of branched-chain to aromatic AA were inversely associated with HCC risk, whereas glutamate, the glutamate/glutamine ratio, phenylalanine, tyrosine and their ratio, and kynurenine and its ratio to tryptophan were positively associated with HCC risk [85]. Chajes et al. investigated pre-diagnostic plasma phospholipid fatty acid concentrations and the risk of breast and gastric cancers in two case-control studies nested in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohorts. The differential fatty acids, which presumably reflect dietary patterns, may be related to an increased risk of both gastric and breast cancer [86,87]. More applications in ovarian cancer, breast cancer, bladder cancer and rheumatoid arthritis are listed in Table 1; the inclusion criterion for each study in the table was a sample size greater than 400.
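The diagnostic performance of candidate markers such as those above is commonly summarized by the area under the ROC curve (AUC). The Python sketch below computes the empirical AUC through its equivalence with the Mann-Whitney U statistic: the fraction of (case, control) pairs in which the case has the higher marker level, with ties counted as one half. The function name and the example values are hypothetical and are not data from the cited studies.

```python
def empirical_auc(cases, controls):
    """Empirical ROC AUC of a single marker.

    Equals the Mann-Whitney U statistic divided by len(cases) * len(controls):
    the fraction of (case, control) pairs in which the case value is higher,
    with ties counted as 0.5. Values near 1 mean the marker is higher in cases,
    values near 0.5 mean no discrimination, and values below 0.5 indicate an
    inverse association.
    """
    if not cases or not controls:
        raise ValueError("both groups need at least one observation")
    score = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                score += 1.0
            elif case_value == control_value:
                score += 0.5
    return score / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical marker levels (arbitrary units), not data from any cited study.
    cases = [5.1, 6.3, 4.8, 7.0]
    controls = [3.9, 4.8, 4.1, 5.5]
    print(f"AUC = {empirical_auc(cases, controls):.3f}")
```

The pairwise loop keeps the definition transparent; for very large cohorts a rank-based computation of the same statistic would be faster. An inversely associated marker (such as the branched-chain to aromatic AA ratio above) yields an AUC below 0.5, which becomes 1 minus that value if the marker is negated.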

5.2 Understanding the mechanisms of metabolic diseases

Beyond pinpointing important risk factors, integrating metabolomics with genome-wide association studies (GWAS) or epigenome-wide association studies can provide biological insight and help clarify the mechanisms underlying disease onset and development. Recent studies have combined GWAS data from large cohorts with metabolomic data to explore the cross-correlations between genetic polymorphisms and metabolic alterations. A nontargeted metabolomics study was performed on 1,744 African Americans who were free of heart failure (HF) at their baseline examination in the Atherosclerosis Risk in Communities (ARIC) study [88]. Six known metabolites were identified as risk factors for incident HF, among them dihydroxy docosatrienoic acid, pyroglutamine, and X-11787 (either hydroxy-leucine or hydroxy-isoleucine). To identify novel genetic factors associated with these three metabolites, GWAS data from 1,260 African Americans free of HF at baseline were analyzed, and three SNPs were found to be associated with the three HF-related metabolites [89]. By integrating genotyping with targeted plasma betaine measurement, genetic risk variants for coronary artery disease (CAD) and their influence on plasma betaine levels were investigated in two-stage human cohort studies [90]; two loci, on chromosomes 2q34 and 5q14.1, were significantly associated with betaine levels, and the variant at 2q34 (rs715) also showed a highly significant female-specific decrease in CAD risk. Genome-wide genotyping and plasma metabolic profiling were also performed in 2,076 participants of the Framingham Heart Study (FHS). This cohort has a family-based structure and rich cardiometabolic phenotyping, which helped demonstrate that inherited factors contribute more to the metabolome than clinical covariates do. Thirty-one genetic loci were found to be associated with plasma metabolites, including 8 previously reported locus-metabolite associations. Notably, a new role for AGXT2 in regulating cholesteryl ester and triacylglycerol metabolism was identified [91]. In addition to integrated GWAS and metabolomics studies, methylome-metabotype associations were investigated by analyzing metabolic profiles and DNA methylation in blood from 1,814 participants of the Kooperative Gesundheitsforschung in der Region Augsburg (KORA) population study [92]. Although DNA methylation was found to play a critical role in regulating metabolism, its effects on metabotypes were weaker than the associations of genotype with metabotypes previously reported by the same authors. Moreover, the causal relationship between methylation and metabolic traits is difficult to interpret.
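The mGWAS analyses described above test, for each SNP, whether the number of copies of a coded allele is associated with a metabolite level. A common formulation is an additive linear model, sketched below in plain Python as an ordinary least-squares fit of metabolite level on genotype dosage (0, 1 or 2). This is a simplified illustration under stated assumptions: one SNP and one metabolite, no covariates (such as age, sex or ancestry components), no family structure, and no multiple-testing correction, all of which real genome-wide analyses have to handle. The function name and example values are hypothetical.

```python
import math


def additive_snp_association(dosages, levels):
    """Least-squares fit of levels = alpha + beta * dosage (additive genetic model).

    dosages: coded-allele counts (0, 1 or 2) per participant.
    levels: metabolite levels (e.g. log-transformed) for the same participants.
    Returns (beta, standard_error, t_statistic) with n - 2 residual degrees of freedom.
    No covariate adjustment is performed in this sketch.
    """
    n = len(dosages)
    if n != len(levels) or n < 3:
        raise ValueError("need paired observations for at least 3 participants")
    mean_g = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage has no variance")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, levels))
    beta = sxy / sxx  # change in metabolite level per additional allele copy
    alpha = mean_y - beta * mean_g
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(dosages, levels))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t statistic is unbounded unless beta is exactly zero.
        t_stat = 0.0 if beta == 0 else math.copysign(math.inf, beta)
    else:
        t_stat = beta / se
    return beta, se, t_stat


if __name__ == "__main__":
    # Hypothetical genotypes and log-scale metabolite levels.
    genotype = [0, 1, 2, 0, 1, 2, 1, 0]
    metabolite = [1.0, 2.0, 3.0, 1.2, 2.1, 2.9, 1.8, 0.9]
    beta, se, t = additive_snp_association(genotype, metabolite)
    print(f"beta = {beta:.3f}, SE = {se:.3f}, t = {t:.2f}")
```

In an mGWAS this test is repeated for every SNP-metabolite pair, which is why the number of tests, and hence the required significance threshold, grows very quickly with metabolome coverage.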

6. Future perspectives

With continuing progress in MS and chromatography, large-scale metabolomics studies will increasingly be applied in molecular epidemiology, precision medicine, and mGWAS. From the analytical point of view, advanced sample preparation based on high-throughput 96-well plates and automated robots, combined with rapid chromatographic separation, will shorten sample pretreatment time and significantly increase analytical throughput. Robotic and automated liquid handlers help to provide more reproducible and reliable results. Micro-separation techniques, including microfluidics, will likely further increase throughput. Combined with derivatization [93] and stable isotope labeling, these techniques will greatly improve the quantitation sensitivity for trace metabolites. Meanwhile, improving the sensitivity and response stability of MS remains a key factor, and both MS and chromatography should become more resistant to contamination in order to reduce matrix effects from biological samples. As a second-generation metabolomics technique, pseudotargeted metabolomics can be expected to help fill the gap between nontargeted and traditional targeted methods and to be applied more widely in clinical biomarker discovery, basic analysis of targeted gene function, and quantitative systems biology. On the other hand, artificial intelligence (deep learning) should be considered for handling metabolic profiling data; it will play an important role in identifying MS features and in improving the comparability and repeatability of data from different batches and instruments in large-scale metabolomics. To integrate GWAS and metabolomics data and expand our knowledge of the genetic effects on circulating metabolites, the next challenge will be to harmonize and standardize metabolome detection methods so that large-cohort metabolomics data can be compared across research projects and laboratories, as is currently done in GWAS.

Acknowledgements

This work was funded by the National Key Research and Development Program of China (2017YFC0906900), the foundation projects (No. 21705147 and No. 21876169) from the National Natural Science Foundation of China and the innovation program (DICP TMSR201601) of science and research from the DICP, CAS.

References

1. Nicholson JK, Lindon JC, Holmes E (1999) 'Metabonomics': understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29 (11):1181-1189.
2. Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling for plant functional genomics. Nature Biotechnology 18 (11):1157-1161. doi:10.1038/81137
3. Nicholson JK, Holmes E, Kinross JM, Darzi AW, Takats Z, Lindon JC (2012) Metabolic phenotyping in clinical and surgical environments. Nature 491 (7424):384-392. doi:10.1038/nature11708
4. Clish CB (2015) Metabolomics: an emerging but powerful tool for precision medicine. Cold Spring Harbor molecular case studies 1 (1):a000588-a000588. doi:10.1101/mcs.a000588
5. Dumas M-E, Kinross J, Nicholson JK (2014) Metabolic Phenotyping and Systems Biology Approaches to Understanding Metabolic Syndrome and Fatty Liver Disease. Gastroenterology 146 (1):46-62. doi:10.1053/j.gastro.2013.11.001
6. Chen S, Kong H, Lu X, Li Y, Yin P, Zeng Z, Xu G (2013) Pseudotargeted Metabolomics Method and Its Application in Serum Biomarker Discovery for Hepatocellular Carcinoma Based on Ultra High-Performance Liquid Chromatography/Triple Quadrupole Mass Spectrometry. Analytical chemistry 85 (17):8326-8333. doi:10.1021/ac4016787
7. Li Y, Ruan Q, Li Y, Ye G, Lu X, Lin X, Xu G (2012) A novel approach to transforming a non-targeted metabolic profiling method to a pseudo-targeted method using the retention time locking gas chromatography/mass spectrometry-selected ions monitoring. Journal of Chromatography A 1255:228-236. doi:10.1016/j.chroma.2012.01.076
8. Cajka T, Fiehn O (2016) Increasing lipidomic coverage by selecting optimal mobile-phase modifiers in LC-MS of blood plasma. Metabolomics 12 (2). doi:10.1007/s11306-015-0929-x
9. Tsugawa H, Tsujimoto Y, Sugitate K, Sakui N, Nishiumi S, Bamba T, Fukusaki E (2014) Highly sensitive and selective analysis of widely targeted metabolomics using gas chromatography/triple-quadrupole mass spectrometry. Journal of Bioscience and Bioengineering 117 (1):122-128. doi:10.1016/j.jbiosc.2013.06.009
10. Zhao Y, Zhao C, Lu X, Zhou H, Li Y, Zhou J, Chang Y, Zhang J, Jin L, Lin F, Xu G (2013) Investigation of the Relationship between the Metabolic Profile of Tobacco Leaves in Different Planting Regions and Climate Factors Using a Pseudotargeted Method Based on Gas Chromatography/Mass Spectrometry. Journal of proteome research 12 (11):5072-5083. doi:10.1021/pr400799a

11. Garcia A, Godzien J, Lopez-Gonzalvez A, Barbas C (2017) Capillary electrophoresis mass spectrometry as a tool for untargeted metabolomics. Bioanalysis 9 (1):99-130. doi:10.4155/bio-2016-0216
12. Soga T (2007) Capillary electrophoresis-mass spectrometry for metabolomics. Methods in molecular biology (Clifton, NJ) 358:129-137. doi:10.1007/978-1-59745-244-1_8
13. Zhou J, Yin Y (2016) Strategies for large-scale targeted metabolomics quantification by liquid chromatography-mass spectrometry. Analyst 141 (23):6362-6373. doi:10.1039/c6an01753c
14. Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, Brown M, Knowles JD, Halsall A, Haselden JN, Nicholls AW, Wilson ID, Kell DB, Goodacre R, Human Serum Metabolome HC (2011) Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols 6 (7):1060-1083. doi:10.1038/nprot.2011.335
15. Dunn WB, Wilson ID, Nicholls AW, Broadhurst D (2012) The importance of experimental design and QC samples in large-scale and MS-driven untargeted metabolomic studies of humans. Bioanalysis 4 (18):2249-2264. doi:10.4155/bio.12.204
16. Nash WJ, Dunn WB (2018) From mass to metabolite in human untargeted metabolomics: Recent advances in annotation of metabolites applying liquid chromatography-mass spectrometry data. TrAC Trends in Analytical Chemistry. doi:10.1016/j.trac.2018.11.022

17. Yan M, Xu G (2018) Current and future perspectives of functional metabolomics in disease studies–A review. Analytica chimica acta 1037:41-54. doi:10.1016/j.aca.2018.04.006
18. Raterink R-J, Lindenburg PW, Vreeken RJ, Ramautar R, Hankemeier T (2014) Recent developments in sample-pretreatment techniques for mass spectrometry-based metabolomics. Trac-Trends in Analytical Chemistry 61:157-167. doi:10.1016/j.trac.2014.06.003
19. Li Y, Zhang Z, Liu X, Li A, Hou Z, Wang Y, Zhang Y (2015) A novel approach to the simultaneous extraction and non-targeted analysis of the small molecules metabolome and lipidome using 96-well solid phase extraction plates with column-switching technology. Journal of Chromatography A 1409:277-281. doi:10.1016/j.chroma.2015.07.048
20. Lofgren L, Stahlman M, Forsberg G-B, Saarinen S, Nilsson R, Hansson GI (2012) The BUME method: a novel automated chloroform-free 96-well total lipid extraction method for blood plasma. Journal of Lipid Research 53 (8):1690-1700. doi:10.1194/jlr.D023036
21. Ouyang Y, Tong H, Luo P, Kong H, Xu Z, Yin P, Xu G (2018) A high throughput metabolomics method and its application in female serum samples in a normal menstrual cycle based on liquid chromatography-mass spectrometry. Talanta 185:483-490. doi:10.1016/j.talanta.2018.03.087
22. Mousavi F, Bojko B, Pawliszyn J (2019) High-Throughput Solid-Phase Microextraction-Liquid Chromatography-Mass Spectrometry for Microbial Untargeted Metabolomics. Methods in molecular biology (Clifton, NJ) 1859:133-152. doi:10.1007/978-1-4939-8757-3_7
23. Patterson RE, Ducrocq AJ, McDougall DJ, Garrett TJ, Yost RA (2015) Comparison of blood plasma sample preparation methods for combined LC-MS lipidomics and metabolomics. Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences 1002:260-266. doi:10.1016/j.jchromb.2015.08.018
24. Lofgren L, Forsberg G-B, Stahlman M (2016) The BUME method: a new rapid and simple chloroform-free method for total lipid extraction of animal tissue. Scientific Reports 6. doi:10.1038/srep27688
25. Cruz M, Wang M, Frisch-Daiello J, Han X (2016) Improved Butanol-Methanol (BUME) Method by Replacing Acetic Acid for Lipid Extraction of Biological Samples. Lipids 51 (7):887-896. doi:10.1007/s11745-016-4164-7
26. Liu R, Chou J, Hou S, Liu X, Yu J, Zhao X, Li Y, Liu L, Sun C (2018) Evaluation of two-step liquid-liquid extraction protocol for untargeted metabolic profiling of serum samples to achieve broader metabolome coverage by UPLC-Q-TOF-MS. Analytica chimica acta 1035:96-107. doi:10.1016/j.aca.2018.07.034
27. Wu H, Southam AD, Hines A, Viant MR (2008) High-throughput tissue extraction protocol for NMR- and MS-based metabolomics. Analytical Biochemistry 372 (2):204-212. doi:10.1016/j.ab.2007.10.002
28. Chen S, Hoene M, Li J, Li Y, Zhao X, Haering H-U, Schleicher ED, Weigert C, Xu G, Lehmann R (2013) Simultaneous extraction of metabolome and lipidome with methyl tert-butyl ether from a single small tissue sample for ultra-high performance liquid chromatography/mass spectrometry. Journal of Chromatography A 1298:9-16. doi:10.1016/j.chroma.2013.05.019
29. Whiley L, Godzien J, Ruperez FJ, Legido-Quigley C, Barbas C (2012) In-Vial Dual Extraction for Direct LC-MS Analysis of Plasma for Comprehensive and Highly Reproducible Metabolic Fingerprinting. Analytical chemistry 84 (14):5992-5999. doi:10.1021/ac300716u
30. Godzien J, Ciborowski M, Grace Armitage E, Jorge I, Camafeita E, Burillo E, Luis Martin-Ventura J, Ruperez FJ, Vazquez J, Barbas C (2016) A Single In-Vial Dual Extraction Strategy for the Simultaneous Lipidomics and Proteomics Analysis of HDL and LDL Fractions. Journal of proteome research 15 (6):1762-1775. doi:10.1021/acs.jproteome.5b00898
31. Gehrke S, Reisz JA, Nemkov T, Hansen KC, D'Alessandro A (2017) Characterization of rapid extraction protocols for high-throughput metabolomics. Rapid Communications in Mass Spectrometry 31 (17):1445-1452. doi:10.1002/rcm.7916
32. Shao Y, Zhu B, Zheng R, Zhao X, Yin P, Lu X, Jiao B, Xu G, Yao Z (2015) Development of Urinary Pseudotargeted LC-MS-Based Metabolomics Method and Its Application in Hepatocellular Carcinoma Biomarker Discovery. Journal of proteome research 14 (2):906-916. doi:10.1021/pr500973d
33. Rojo D, Barbas C, Ruperez FJ (2012) LC-MS metabolomics of polar compounds. Bioanalysis 4 (10):1235-1243. doi:10.4155/bio.12.100
34. Gray N, Adesina-Georgiadis K, Chekmeneva E, Plumb RS, Wilson ID, Nicholson JK (2016) Development of a Rapid Microbore Metabolic Profiling Ultraperformance Liquid Chromatography-Mass Spectrometry Approach for High-Throughput Phenotyping Studies. Analytical chemistry 88 (11):5742-5751. doi:10.1021/acs.analchem.6b00038
35. Yin S, Su M, Xie G, Li X, Wei R, Liu C, Lan K, Jia W (2017) Factors affecting separation and detection of bile acids by liquid chromatography coupled with mass spectrometry in negative mode. Analytical and Bioanalytical Chemistry 409 (23):5533-5545. doi:10.1007/s00216-017-0489-1
36. Wang S, Zhou L, Wang Z, Shi X, Xu G (2017) Simultaneous metabolomics and lipidomics analysis based on novel heart-cutting two-dimensional liquid chromatography-mass spectrometry. Analytica chimica acta 966:34-40. doi:10.1016/j.aca.2017.03.004
37. Rathahao-Paris E, Alves S, Boussaid N, Picard-Hagen N, Gayrard V, Toutain P-L, Tabet J-C, Rutledge DN, Paris A (2019) Evaluation and validation of an analytical approach for high-throughput metabolomic fingerprinting using direct introduction-high-resolution mass spectrometry: Applicability to classification of urine of scrapie-infected ewes. European journal of mass spectrometry (Chichester, England) 25 (2):251-258. doi:10.1177/1469066718806450
38. Draper J, Lloyd AJ, Goodacre R, Beckmann M (2013) Flow infusion electrospray ionisation mass spectrometry for high throughput, non-targeted metabolite fingerprinting: a review. Metabolomics 9 (1):S4-S29. doi:10.1007/s11306-012-0449-x
39. Fuhrer T, Heer D, Begemann B, Zamboni N (2011) High-Throughput, Accurate Mass Metabolome Profiling of Cellular Extracts by Flow Injection - Time-of-Flight Mass Spectrometry. Analytical chemistry 83 (18):7074-7080. doi:10.1021/ac201267k
40. Link H, Fuhrer T, Gerosa L, Zamboni N, Sauer U (2015) Real-time metabolome profiling of the metabolic switch between starvation and growth. Nature Methods 12 (11):1091-1097. doi:10.1038/nmeth.3584
41. Chekmeneva E, dos Santos Correia G, Chan Q, Wijeyesekera A, Tin A, Young JH, Elliott P, Nicholson JK, Holmes E (2017) Optimization and Application of Direct Infusion Nanoelectrospray HRMS Method for Large-Scale Urinary Metabolic Phenotyping in Molecular Epidemiology. Journal of proteome research 16 (4):1646-1658. doi:10.1021/acs.jproteome.6b01003
42. Fuhrer T, Zamboni N (2015) High-throughput discovery metabolomics. Current Opinion in Biotechnology 31:73-78. doi:10.1016/j.copbio.2014.08.006
43. Vrhovsek U, Masuero D, Gasperotti M, Franceschi P, Caputi L, Viola R, Mattivi F (2012) A Versatile Targeted Metabolomics Method for the Rapid Quantification of Multiple Classes of Phenolics in Fruits and Beverages. Journal of agricultural and food chemistry 60 (36):8831-8840. doi:10.1021/jf2051569
44. Wei R, Li G, Seymour AB (2010) High-Throughput and Multiplexed LC/MS/MRM Method for Targeted Metabolomics. Analytical chemistry 82 (13):5527-5533. doi:10.1021/ac100331b
45. Gu H, Zhang P, Zhu J, Raftery D (2015) Globally Optimized Targeted Mass Spectrometry: Reliable Metabolomics Analysis with Broad Coverage. Analytical chemistry 87 (24):12355-12362. doi:10.1021/acs.analchem.5b03812

46. Yuan M, Breitkopf SB, Yang X, Asara JM (2012) A positive/negative ion-switching, targeted mass spectrometry-based metabolomics platform for bodily fluids, cells, and fresh and fixed tissue. Nature Protocols 7 (5):872-881. doi:10.1038/nprot.2012.024
47. Chen W, Gong L, Guo Z, Wang W, Zhang H, Liu X, Yu S, Xiong L, Luo J (2013) A Novel Integrated Method for Large-Scale Detection, Identification, and Quantification of Widely Targeted Metabolites: Application in the Study of Rice Metabolomics. Molecular Plant 6 (6):1769-1780. doi:10.1093/mp/sst080
48. Peterson AC, Russell JD, Bailey DJ, Westphall MS, Coon JJ (2012) Parallel Reaction Monitoring for High Resolution and High Mass Accuracy Quantitative, Targeted Proteomics. Molecular & Cellular Proteomics 11 (11):1475-1488. doi:10.1074/mcp.O112.020131
49. Gallien S, Duriez E, Crone C, Kellmann M, Moehring T, Domon B (2012) Targeted Proteomic Quantification on Quadrupole-Orbitrap Mass Spectrometer. Molecular & Cellular Proteomics 11 (12):1709-1723. doi:10.1074/mcp.O112.019802
50. Zhou J, Liu H, Liu Y, Liu J, Zhao X, Yin Y (2016) Development and Evaluation of a Parallel Reaction Monitoring Strategy for Large-Scale Targeted Metabolomics Quantification. Analytical chemistry 88 (8):4478-4486. doi:10.1021/acs.analchem.6b00355
51. Luo P, Dai W, Yin P, Zeng Z, Kong H, Zhou L, Wang X, Chen S, Lu X, Xu G (2015) Multiple Reaction Monitoring-Ion Pair Finder: A Systematic Approach To Transform Nontargeted Mode to Pseudotargeted Mode for Metabolomics Study Based on Liquid Chromatography-Mass Spectrometry. Analytical chemistry 87 (10):5050-5055. doi:10.1021/acs.analchem.5b00615

52. Wang L, Su B, Zeng Z, Li C, Zhao X, Lv W, Xuan Q, Ouyang Y, Zhou L, Yin P, Peng X, Lu X, Lin X, Xu G (2018) Ion-Pair Selection Method for Pseudotargeted Metabolomics Based on SWATH MS Acquisition and Its Application in Differential Metabolite Discovery of Type 2 Diabetes. Analytical chemistry 90 (19):11401-11408. doi:10.1021/acs.analchem.8b02377
53. Xuan Q, Hu C, Yu D, Wang L, Zhou Y, Zhao X, Li Q, Hou X, Xu G (2018) Development of a High Coverage Pseudotargeted Lipidomics Method Based on Ultra-High Performance Liquid Chromatography-Mass Spectrometry. Analytical chemistry 90 (12):7608-7616. doi:10.1021/acs.analchem.8b01331
54. Sawada Y, Akiyama K, Sakata A, Kuwahara A, Otsuki H, Sakurai T, Saito K, Hirai MY (2008) Widely Targeted Metabolomics Based on Large-Scale MS/MS Data for Elucidating Metabolite Accumulation Patterns in Plants. Plant and Cell Physiology 50 (1):37-47. doi:10.1093/pcp/pcn183
55. Tsugawa H, Arita M, Kanazawa M, Ogiwara A, Bamba T, Fukusaki E (2013) MRMPROBS: A Data Assessment and Metabolite Identification Tool for Large-Scale Multiple Reaction Monitoring Based Widely Targeted Metabolomics. Analytical chemistry 85 (10):5191-5199. doi:10.1021/ac400515s
56. Luo P, Yin P, Zhang W, Zhou L, Lu X, Lin X, Xu G (2016) Optimization of large-scale pseudotargeted metabolomics method based on liquid chromatography-mass spectrometry. Journal of Chromatography A 1437:127-136. doi:10.1016/j.chroma.2016.01.078
57. Zelena E, Dunn WB, Broadhurst D, Francis-McIntyre S, Carroll KM, Begley P, O'Hagan S, Knowles JD, Halsall A, Wilson ID, Kell DB (2009) Development of a Robust and Repeatable UPLC-MS Method for the Long-Term Metabolomic Study of Human Serum. Analytical chemistry 81 (4):1357-1364. doi:10.1021/ac8019366
58. Want EJ, Wilson ID, Gika H, Theodoridis G, Plumb RS, Shockcor J, Holmes E, Nicholson JK (2010) Global metabolic profiling procedures for urine using UPLC-MS. Nature Protocols 5:1005. doi:10.1038/nprot.2010.50
59. Chan ECY, Pasikanti KK, Nicholson JK (2011) Global urinary metabolic profiling procedures using gas chromatography-mass spectrometry. Nature Protocols 6:1483. doi:10.1038/nprot.2011.375
60. Fages A, Pontoizeau C, Jobard E, Levy P, Bartosch B, Elena-Herrmann B (2013) Batch profiling calibration for robust NMR metabonomic data analysis. Analytical and Bioanalytical Chemistry 405 (27):8819-8827. doi:10.1007/s00216-013-7296-0
61. Li J, Hoene M, Zhao X, Chen S, Wei H, Häring H-U, Lin X, Zeng Z, Weigert C, Lehmann R, Xu G (2013) Stable Isotope-Assisted Lipidomics Combined with Nontargeted Isotopomer Filtering, a Tool to Unravel the Complex Dynamics of Lipid Metabolism. Analytical chemistry 85 (9):4651-4657. doi:10.1021/ac400293y
62. Kamleh MA, Ebbels TMD, Spagou K, Masson P, Want EJ (2012) Optimizing the Use of Quality Control Samples for Signal Drift Correction in Large-Scale Urine Metabolic Profiling Studies. Analytical chemistry 84 (6):2670-2677. doi:10.1021/ac202733q
63. Wang S-Y, Kuo C-H, Tseng YJ (2013) Batch Normalizer: A Fast Total Abundance Regression Calibration Method to Simultaneously Adjust Batch and Injection Order Effects in Liquid Chromatography/Time-of-Flight Mass Spectrometry-Based Metabolomics Data and Comparison with Current Calibration Methods. Analytical chemistry 85 (2):1037-1046. doi:10.1021/ac302877x
64. Fages A, Ferrari P, Monni S, Dossus L, Floegel A, Mode N, Johansson M, Travis RC, Bamia C, Sanchez-Perez M-J, Chiodini P, Boshuizen HC, Chadeau-Hyam M, Riboli E, Jenab M, Elena-Herrmann B (2014) Investigating sources of variability in metabolomic data in the EPIC study: the Principal Component Partial R-square (PC-PR2) method. Metabolomics 10 (6):1074-1083. doi:10.1007/s11306-014-0647-9
65. Zhao Y, Hao Z, Zhao C, Zhao J, Zhang J, Li Y, Li L, Huang X, Lin X, Zeng Z, Lu X, Xu G (2016) A Novel Strategy for Large-Scale Metabolomics Study by Calibrating Gross and Systematic Errors in Gas Chromatography-Mass Spectrometry. Analytical chemistry 88 (4):2234-2242. doi:10.1021/acs.analchem.5b03912
66. Thonusin C, IglayReger HB, Soni T, Rothberg AE, Burant CF, Evans CR (2017) Evaluation of intensity drift correction strategies using MetaboDrift, a normalization tool for multi-batch metabolomics data. Journal of Chromatography A 1523:265-274. doi:10.1016/j.chroma.2017.09.023
67. Deng K, Zhang F, Tan Q, Huang Y, Song W, Rong Z, Zhu Z-J, Li K, Li Z (2019) WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis. Analytica chimica acta 1061:60-69. doi:10.1016/j.aca.2019.02.010
68. Vaughan AA, Dunn WB, Allwood JW, Wedge DC, Blackhall FH, Whetton AD, Dive C, Goodacre R (2012) Liquid Chromatography-Mass Spectrometry Calibration Transfer and Metabolomics Data Fusion. Analytical chemistry 84 (22):9848-9857. doi:10.1021/ac302227c
69. Liggi S, Griffin JL (2017) Metabolomics applied to diabetes - lessons from human population studies. International Journal of Biochemistry & Cell Biology 93:136-147. doi:10.1016/j.biocel.2017.10.011
70. Ganna A, Salihovic S, Sundstrom J, Broeckling CD, Hedman AK, Magnusson PKE, Pedersen NL, Larsson A, Siegbahn A, Zilmer M, Prenni J, Arnlov J, Lind L, Fall T, Ingelsson E (2014) Large-scale Metabolomic Profiling Identifies Novel Biomarkers for Incident Coronary Heart Disease. Plos Genetics 10 (12). doi:10.1371/journal.pgen.1004801
71. Wen C-P, Zhang F, Liang D, Wen C, Gu J, Skinner H, Chow W-H, Ye Y, Pu X, Hildebrandt MAT, Huang M, Chen C-H, Hsiung CA, Tsai MK, Tsao CK, Lippman SM, Wu X (2015) The Ability of Bilirubin in Identifying Smokers with Higher Risk of Lung Cancer: A Large Cohort Study in Conjunction with Global Metabolomic Profiling. Clinical Cancer Research 21 (1):193-200. doi:10.1158/1078-0432.ccr-14-0748
72. Katagiri R, Goto A, Nakagawa T, Nishiumi S, Kobayashi T, Hidaka A, Budhathoki S, Yamaji T, Sawada N, Shimazu T, Inoue M, Iwasaki M, Yoshida M, Tsugane S (2018) Increased Levels of Branched-Chain Amino Acid Associated With Increased Risk of Pancreatic Cancer in a Prospective Case-Control Study of a Large Cohort. Gastroenterology 155 (5):1474-+. doi:10.1053/j.gastro.2018.07.033
73. Wikoff WR, Frye RF, Zhu H, Gong Y, Boyle S, Churchill E, Cooper-Dehoff RM, Beitelshees AL, Chapman AB, Fiehn O, Johnson JA, Kaddurah-Daouk R, Pharmacometabolomics Res N (2013) Pharmacometabolomics Reveals Racial Differences in Response to Atenolol Treatment. PloS one 8 (3). doi:10.1371/journal.pone.0057639
74. Su LJ, Fiehn O, Maruvada P, Moore SC, O'Keefe SJ, Wishart DS, Zanetti KA (2014) The Use of Metabolomics in Population-Based Research. Advances in Nutrition 5 (6):785-788. doi:10.3945/an.114.006494
75. Guertin KA, Moore SC, Sampson JN, Huang W-Y, Xiao Q, Stolzenberg-Solomon RZ, Sinha R, Cross AJ (2014) Metabolomics in nutritional epidemiology: identifying metabolites associated with diet and quantifying their potential to uncover diet-disease relations in populations. American Journal of Clinical Nutrition 100 (1):208-217. doi:10.3945/ajcn.113.078758
76. Park J-E, Lim HR, Kim JW, Shin K-H (2018) Metabolite changes in risk of type 2 diabetes mellitus in cohort studies: A systematic review and meta-analysis. Diabetes Research and Clinical Practice 140:216-227. doi:10.1016/j.diabres.2018.03.045
77. Zhao H, Li H, Chung ACK, Xiang L, Li X, Zheng Y, Luan H, Zhu L, Liu W, Peng Y, Zhao Y, Xu S, Li Y, Cai Z (2019) Large-Scale Longitudinal Metabolomics Study Reveals Different Trimester-Specific Alterations of Metabolites in Relation to Gestational Diabetes Mellitus. Journal of proteome research 18 (1):292-300. doi:10.1021/acs.jproteome.8b00602

78. Gao X, Ke C, Liu H, Liu W, Li K, Yu B, Sun M (2017) Large-scale Metabolomic Analysis Reveals Potential Biomarkers for Early Stage Coronary Atherosclerosis. Scientific Reports 7 (1):11817. doi:10.1038/s41598-017-12254-1
79. Zhou H, Li L, Zhao H, Wang Y, Du J, Zhang P, Li C, Wang X, Liu Y, Xu Q, Zhang T, Song Y, Yu C, Li Y (2019) A Large-Scale, Multi-Center Urine Biomarkers Identification of Coronary Heart Disease in TCM Syndrome Differentiation. Journal of proteome research 18 (5):1994-2003. doi:10.1021/acs.jproteome.8b00799
80. Stegemann C, Pechlaner R, Willeit P, Langley SR, Mangino M, Mayr U, Menni C, Moayyeri A, Santer P, Rungger G, Spector TD, Willeit J, Kiechl S, Mayr M (2014) Lipidomics Profiling and Risk of Cardiovascular Disease in the Prospective Population-Based Bruneck Study. Circulation 129 (18):1821-1831. doi:10.1161/circulationaha.113.002500
81. Floegel A, Kuehn T, Sookthai D, Johnson T, Prehn C, Rolle-Kampczyk U, Otto W, Weikert C, Illig T, von Bergen M, Adamski J, Boeing H, Kaaks R, Pischon T (2018) Serum metabolites and risk of myocardial infarction and ischemic stroke: a targeted metabolomic approach in two German prospective cohorts. European Journal of Epidemiology 33 (1):55-66. doi:10.1007/s10654-017-0333-0
82. Luo P, Yin P, Hua R, Tan Y, Li Z, Qiu G, Yin Z, Xie X, Wang X, Chen W, Zhou L, Wang X, Li Y, Chen H, Gao L, Lu X, Wu T, Wang H, Niu J, Xu G (2018) A Large-scale, multicenter serum metabolite biomarker identification study for the early detection of hepatocellular carcinoma. Hepatology 67 (2):662-675. doi:10.1002/hep.29561

83. Yousri NA, Bayoumy K, Elhaq WG, Mohney RP, Emadi SA, Hammoudeh M, Halabi H, Masri B, Badsha H, Uthman I, Plenge R, Saxena R, Suhre K, Arayssi T (2017) Large Scale Metabolic Profiling identifies Novel Steroids linked to Rheumatoid Arthritis. Scientific Reports 7 (1):9137. doi:10.1038/s41598-017-05439-1

84. Vantaku V, Donepudi SR, Piyarathna DWBA, Amara CS, Ambati CR, Tang W, Potluri V, Chandrashekar OS, Varambally S, Terris MK, Davies K, Ambs S, Bollag R, Apolo AB, Sreekumar A, Putluri N (2019) Large-Scale Profiling of Serum Metabolites in African American and European American Patients With Bladder Cancer Reveals Metabolic Pathways Associated With Patient Survival. Cancer 125 (6):921-932. doi:10.1002/cncr.31890

85. Stepien M, Duarte-Salles T, Fedirko V, Floegel A, Barupal DK, Rinaldi S, Achaintre D, Assi N, Tjonneland A, Overvad K, Bastide N, Boutron-Ruault M-C, Severi G, Kuehn T, Kaaks R, Aleksandrova K, Boeing H, Trichopoulou A, Bamia C, Lagiou P, Saieva C, Agnoli C, Panico S, Tumino R, Naccarati A, Bueno-de-Mesquita HB, Peeters PH, Weiderpass E, Ramon Quiros J, Agudo A, Sanchez M-J, Dorronsoro M, Gavrila D, Barricarte A, Ohlsson B, Sjoberg K, Werner M, Sund M, Wareham N, Khaw K-T, Travis RC, Schmidt JA, Gunter M, Cross A, Vineis P, Romieu I, Scalbert A, Jenab M (2016) Alteration of amino acid and biogenic amine metabolism in hepatobiliary cancers: Findings from a prospective cohort study. International Journal of Cancer 138 (2):348-360. doi:10.1002/ijc.29718

86. Chajes V, Thiebaut ACM, Rotival M, Gauthier E, Maillard V, Boutron-Ruault M-C, Joulin V, Lenoir GM, Clavel-Chapelon F (2008) Association between serum trans-monounsaturated fatty acids and breast cancer risk in the E3N-EPIC study. American Journal of Epidemiology 167 (11):1312-1320. doi:10.1093/aje/kwn069

87. Chajes V, Jenab M, Romieu I, Ferrari P, Dahm CC, Overvad K, Egeberg R, Tjonneland A, Clavel-Chapelon F, Boutron-Ruault M-C, Engel P, Teucher B, Kaaks R, Floegel A, Boeing H, Trichopoulou A, Dilis V, Karapetyan T, Mattiello A, Tumino R, Grioni S, Palli D, Vineis P, Bueno-de-Mesquita HB, Numans ME, Peeters PHM, Lund E, Navarro C, Ramon Quiros J, Sanchez-Cantalejo E, Barricarte Gurrea A, Dorronsoro M, Regner S, Sonestedt E, Wirfaelt E, Khaw K-T, Wareham N, Allen NE, Crowe FL, Rinaldi S, Slimani N, Carneiro F, Riboli E, Gonzalez CA (2011) Plasma phospholipid fatty acid concentrations and risk of gastric adenocarcinomas in the European Prospective Investigation into Cancer and Nutrition (EPIC-EURGAST). American Journal of Clinical Nutrition 94 (5):1304-1313. doi:10.3945/ajcn.110.005892

88. Zheng Y, Yu B, Alexander D, Manolio TA, Aguilar D, Coresh J, Heiss G, Boerwinkle E, Nettleton JA (2013) Associations Between Metabolomic Compounds and Incident Heart Failure Among African Americans: The ARIC Study. American Journal of Epidemiology 178 (4):534-542. doi:10.1093/aje/kwt004

89. Yu B, Zheng Y, Alexander D, Manolio TA, Alonso A, Nettleton JA, Boerwinkle E (2013) Genome-Wide Association Study of a Heart Failure Related Metabolomic Profile Among African Americans in the Atherosclerosis Risk in Communities (ARIC) Study. Genetic Epidemiology 37 (8):840-845. doi:10.1002/gepi.21752

90. Hartiala JA, Tang WHW, Wang Z, Crow AL, Stewart AFR, Roberts R, McPherson R, Erdmann J, Willenborg C, Hazen SL, Allayee H (2016) Genome-wide association study and targeted metabolomics identifies sex-specific association of CPS1 with coronary artery disease. Nature Communications 7. doi:10.1038/ncomms10558

91. Rhee EP, Ho JE, Chen M-H, Shen D, Cheng S, Larson MG, Ghorbani A, Shi X, Helenius IT, O'Donnell CJ, Souza AL, Deik A, Pierce KA, Bullock K, Walford GA, Vasan RS, Florez JC, Clish C, Yeh JRJ, Wang TJ, Gerszten RE (2013) A Genome-wide Association Study of the Human Metabolome in a Community-Based Cohort. Cell Metabolism 18 (1):130-143. doi:10.1016/j.cmet.2013.06.013

92. Petersen A-K, Zeilinger S, Kastenmueller G, Roemisch-Margl W, Brugger M, Peters A, Meisinger C, Strauch K, Hengstenberg C, Pagel P, Huber F, Mohney RP, Grallert H, Illig T, Adamski J, Waldenberger M, Gieger C, Suhre K (2014) Epigenetics meets metabolomics: an epigenome-wide association study with blood serum metabolic traits. Human Molecular Genetics 23 (2):534-545. doi:10.1093/hmg/ddt430

93. Zhang T-Y, Li S, Zhu Q-F, Wang Q, Hussain D, Feng Y-Q (2019) Derivatization for liquid chromatography-electrospray ionization-mass spectrometry analysis of small-molecular weight compounds. TrAC Trends in Analytical Chemistry 119:115608. doi:10.1016/j.trac.2019.07.019

94. Ke C, Hou Y, Zhang H, Fan L, Ge T, Guo B, Zhang F, Yang K, Wang J, Lou G, Li K (2015) Large-scale profiling of metabolic dysregulation in ovarian cancer. International Journal of Cancer 136 (3):516-526. doi:10.1002/ijc.29010


Figure legends

Figure 1. Properties of the common analytical strategies used in large-scale metabolomics, compared in terms of information content, data complexity and variability, stability and repeatability, and quantitative accuracy.

Figure 2. Key issues to consider when performing a large-scale metabolomics study. First, a high-throughput sample pretreatment method is needed to ensure efficient sample preparation. Second, data acquisition methods should provide rich metabolic profiling information while maintaining high analytical throughput. Third, data repeatability and data integration across different batches and instruments are key to realizing a truly large-scale metabolomics study.

Figure 3. Schematic representation of PRM (upper) and MRM (lower) as performed on Q Exactive and TQ MS instruments, respectively. In both modes, precursor ions are selected in Q1 and fragmented in Q2 (MRM) or in the HCD cell (PRM). In MRM, one predefined precursor-to-product ion transition is monitored at a time, whereas in PRM all product ions are recorded simultaneously with high resolution and high mass accuracy. The figure is reproduced from [50] with permission.

Figure 4. General workflow of MRM ion pair selection for LC-MS-based pseudotargeted metabolomics. First, information-dependent acquisition (IDA) experiments at different collision energies (CE) were carried out on a high-resolution mass spectrometer (HRMS) to obtain comprehensive MS2 information. The in-house MRM-Ion Pair Finder software was then applied to select, for each parent ion, the most intense daughter ion and its corresponding CE value, generating the MRM transitions. Finally, these MRM transitions were monitored on a TQ MS. The figure is reproduced from [51] with permission.

Figure 5. Workflow for developing and optimizing a pseudotargeted analysis strategy that improves stability and robustness in large-scale metabolomics. QC sample selection, QC insertion frequency and data calibration methods should all be considered to improve stability and robustness. The figure is reproduced from [56] with permission.
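The selection rule described in the Figure 4 legend (for each parent ion, keep the most intense daughter ion together with the CE at which it was observed) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the MRM-Ion Pair Finder implementation: the record types `MS2Spectrum` and `MRMTransition`, the function `select_transitions`, and the `min_neutral_loss` cutoff used to skip residual precursor signal are all hypothetical, and spectra of the same parent ion are assumed to share an identical precursor m/z (real data would need tolerance-based grouping).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MS2Spectrum:
    """Hypothetical record: one IDA MS2 spectrum acquired at a single CE."""
    precursor_mz: float
    collision_energy: float  # CE used for this acquisition
    peaks: List[Tuple[float, float]]  # (daughter ion m/z, intensity)


@dataclass(frozen=True)
class MRMTransition:
    """Hypothetical record: one parent -> daughter ion pair with its CE."""
    precursor_mz: float
    product_mz: float
    collision_energy: float
    intensity: float


def select_transitions(spectra: List[MS2Spectrum],
                       min_neutral_loss: float = 0.5) -> List[MRMTransition]:
    """Keep, for each parent ion, the most intense daughter ion across all CEs.

    Assumption (not from the source): daughter ions less than
    `min_neutral_loss` Da below the parent m/z are treated as residual,
    unfragmented precursor and ignored. Parent ions are grouped by exact
    m/z, a simplification of tolerance-based matching.
    """
    best: Dict[float, MRMTransition] = {}
    for spectrum in spectra:
        for product_mz, intensity in spectrum.peaks:
            if spectrum.precursor_mz - product_mz < min_neutral_loss:
                continue  # skip residual precursor (or heavier) signals
            current = best.get(spectrum.precursor_mz)
            if current is None or intensity > current.intensity:
                best[spectrum.precursor_mz] = MRMTransition(
                    spectrum.precursor_mz, product_mz,
                    spectrum.collision_energy, intensity)
    # One transition per parent ion, ordered by parent m/z.
    return sorted(best.values(), key=lambda t: t.precursor_mz)


if __name__ == "__main__":
    # Illustrative, made-up spectra for one parent ion at two CE values.
    demo = [
        MS2Spectrum(180.0655, 10.0, [(180.0650, 9000.0), (163.0390, 3000.0)]),
        MS2Spectrum(180.0655, 20.0, [(180.0650, 4000.0), (145.0285, 5200.0)]),
    ]
    for t in select_transitions(demo):
        print(t.precursor_mz, t.product_mz, t.collision_energy)
```

Run as a script, the demo prints `180.0655 145.0285 20.0`: the 145.0285 ion at CE 20 outranks the 163.0390 ion at CE 10, and the 180.0650 signal is skipped as residual precursor. Keeping a single transition per parent ion follows the "most intense daughter ion" rule in the legend; retention-time assignment and feature de-duplication, which a practical pseudotargeted workflow would likely also need, are deliberately left out.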


Table 1. Typical applications of large-scale metabolomics

Analytical strategy | Analytical tool/platform | Sample size | Sample type | Number of targets | Disease | Main results | Ref.
Targeted approach | Two commercial kits (BIOCRATES) | 3684 | Serum | 105 metabolites | Myocardial infarction (MI) and ischemic stroke | Metabolites of the arachidonic acid pathway are independently associated with risk of MI | [81]
Targeted methods | LC-MS | 510 | Plasma | Branched-chain amino acid (BCAA) | Pancreatic cancer | Increased plasma BCAA level is associated with increased risk of pancreatic cancer | [72]
Shotgun lipidomics | NanoMate coupled to TQ-MS | 685 | Plasma | 135 lipid species from 8 different lipid classes | Cardiovascular disease | Levels of individual species of cholesterol esters, LPCs, PCs, PEs, sphingomyelins and TAGs were related to cardiovascular disease | [80]
Global metabolomic profiling & targeted validation | LC-MS | Discovery: 579; validation: 425660 | Serum | >300 metabolites | Lung cancer | Low serum bilirubin levels contributed to a higher risk of lung cancer incidence and mortality in male smokers | [71]
Nontargeted method | UHPLC-MS | 428 | Serum | Metabolomics profiling (93 differential metabolites were identified) | Gestational diabetes mellitus | Trimester-specific alterations in amino acid metabolism, lipid metabolism and other pathways | [77]
Nontargeted method | UPLC-Xevo G2 Q-TOFMS | Discovery: 1028; validation: 1670 | Plasma | Metabolomics profiling | Coronary heart disease (CHD) | Four lipid-related metabolites showed potential clinical utility and contributed to CHD development | [70]
Nontargeted method | UPLC-Q-TOF/MS | 1072 | Urine | Metabolomics profiling | Coronary heart disease | 15 CHD-blood stasis syndrome biomarkers and 12 CHD-Qi and Yin deficiency syndrome biomarkers | [79]
Nontargeted methods | GC | 1065 | Serum | Fatty acids | Breast cancer | A high serum level of trans-monounsaturated fatty acids is probably related to increased risk of invasive breast cancer | [86]
Nontargeted method | GC | 864 | Plasma | Phospholipid fatty acids | Gastric adenocarcinomas | Plasma phospholipid fatty acid profile may be related to increased gastric cancer risk | [87]
Nontargeted method | UPLC-MS | 448 | Plasma | 53 differential metabolites were identified | Ovarian cancer | Piperine, 3-indolepropionic acid, 5-hydroxyindoleacetaldehyde and hydroxyphenyllactate were defined as metabolic biomarkers of epithelial ovarian cancer | [94]
Pseudotargeted method | LC-MS | 1448 | Serum | 776 (239 identified metabolites) | Hepatocellular carcinoma | A serum metabolite biomarker panel consisting of phenylalanyl-tryptophan and glycocholate was defined | [82]

Highlights
● Large-scale metabolomics studies are increasingly applied in disease-related fields.
● Mass spectrometry is a main platform for large-scale metabolomics due to its unique advantages.
● New advances in sample pretreatment methods, metabolic data collection techniques and data correction methods are summarized.
● Typical applications of large-scale metabolomics in precision medicine, molecular epidemiology and mGWAS are given.