Meta-analysis of gene expression profiles in preeclampsia

Meta-analysis of gene expression profiles in preeclampsia

Journal Pre-proofs Meta-analysis of gene expression profiles in preeclampsia Konstantina E. Vennou, Panagiota I. Kontou, Georgia G. Braliou, Pantelis ...

730KB Sizes 0 Downloads 37 Views

Journal Pre-proofs Meta-analysis of gene expression profiles in preeclampsia Konstantina E. Vennou, Panagiota I. Kontou, Georgia G. Braliou, Pantelis G. Bagos PII: DOI: Reference:

S2210-7789(19)30479-9 https://doi.org/10.1016/j.preghy.2019.12.007 PREGHY 673

To appear in:

Pregnancy Hypertension: An International Journal of Women's Cardiovascular Health

Received Date: Accepted Date:

30 August 2019 18 December 2019

Please cite this article as: Vennou, K.E., Kontou, P.I., Braliou, G.G., Bagos, P.G., Meta-analysis of gene expression profiles in preeclampsia, Pregnancy Hypertension: An International Journal of Women's Cardiovascular Health (2019), doi: https://doi.org/10.1016/j.preghy.2019.12.007

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 International Society for the Study of Hypertension in Pregnancy. Published by Elsevier B.V. All rights reserved.

Meta-analysis of Gene Expression Profiles in Preeclampsia Konstantina E. Vennou, Panagiota I. Kontou, Georgia G. Braliou, Pantelis G. Bagos *

Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia 35132, Greece

*Correspondence to: Pantelis Bagos, Department of Computer Science and Biomedical Informatics, University of Thessaly, Papasiopoulou 2-4, Lamia 35132, Greece, [email protected], tel. 00302231066914

1

ABSTRACT Objective: Preeclampsia is a serious complication of pregnancy. It is considered a complex condition influenced by maternal genes, environmental factors and a deregulated immune response of the mother, but the etiology is largely unknown. The aim of this study is to identify differentially expressed genes (DEGs) in preeclampsia, to help elucidate the identification of the disease etiological mechanisms. Study design: The databases Pubmed and GEO were searched according to PRISMA guidelines for the existence of gene expression data on placental samples from case-control studies. After meta-analysis the identified DEGs were further analyzed with STRING and PANTHER to retrieve interaction networks and overrepresented biochemical pathways. Results: Only 10 gene expression data-sets and articles fulfilled inclusion criteria, containing data on 195 patients and 231 controls, and were analyzed. Meta-analysis indentified 629 DEGs to be associated with preeclampsia at a False Discovery Rate p-value of 0.01. Network analysis showed few highly interconnected genes involved in innate immunity and signal transduction pathways indicative of a multifaceted disease with etiological heterogeneity. Over-representation analysis revealed that these genes participate mainly in carbohydrates, amino acids and pyrimidine metabolism, circadian clock system and signal transduction pathways. Conclusions: This work, combining rigorous methods of meta-analysis and the use of modern bioinformatics tools, proposes the existence of novel, overlooked so far, biochemical pathways and mechanisms to contribute to preeclampsia development such as carbohydrate, aminoacids and pyrimidine metabolism. Our findings pave the way for further investigation of the above pathways in experimental efforts to decipher the orchestrating mechanisms for preeclampsia development. KEYWORDS:

meta-analysis;

preeclampsia;

microarray;

gene

expression;

differentially expressed genes.

2

INTRODUCTION Preeclampsia is a serious complication of pregnancy and one of the major causes of morbidity and mortality, for both the mother and the embryo, since it affects 2-8% of all pregnancies worldwide [1]. Hypertensive disorders of pregnancy (which include preeclampsia) are the most common causes of death due to pregnancy [2] and they resulted in 46,900 deaths in 2015 [3]. There are two main subtypes of preeclampsia, mild and severe. The clinical symptoms of mild preeclampsia are usually the presence of newly occurring hypertension (diastolic blood pressure ≥90 mm Hg and systolic blood pressure ≥140 mm Hg), and proteinuria after the 20th week of pregnancy [4-6]. Severe preeclampsia, conversely, can be characterized by one of the following: sustained systolic blood pressure ≥160mmHg or diastolic blood pressure ≥110 mmHg, nephrotic-range proteinuria, sudden oliguria, central nervous system disturbances, pulmonary edema or cyanosis, epigastric or right upper quadrant pain, liver dysfunction, thrombocytopenia, and fetal growth restriction [7]. Preeclampsia may stem from the existence of placenta in the uterus and it ends with the removal of placental tissue upon delivery [8]. Generally, preeclampsia is considered

as a complex interaction between a series of maternal genes,

environmental factors (pollution, obesity) [9] and/or a deregulated immune response of the mother to the paternal component of the fetus’s genotype with no clear etiology [10]. Several candidate biomarkers that have been associated with preeclampsia are implicated in placentation, regulation of blood pressure, inflammation, vascular formation and function of endothelial cells. These biomarkers include EGF (epidermal growth factor), PLGF, sFLT-1, HO-2,

HIF1-a, Apo E, INHA, LEP, TLR4, pro-

inflammatory cytokines, apoptosis, cell survival and differentiation genes [11], however with no definitive prognostic value [12, 13]. Interest is growing into identifying gene expression profiles based principally on microarrays (or RNAseq) data for diagnosis and risk prediction of many multifactorial disorders, including preeclampsia. The purpose of the present study is

3

to collect all the available expression data on differentially expressed genes (DEGs) in preeclampsia and perform a rigorous meta-analysis. The identification of statistically significantly DEGs and their associated pathways could provide a basis for understanding the pathophysiological mechanisms of preeclampsia and even help deciphering

differentiating

mechanisms

underlying

the

different

types

of

preeclampsia.

METHODS A comprehensive search was performed to identify gene expression data regarding preeclampsia. Datasets were retrieved from the public microarray data repository Gene Expression Omnibus (GEO) [14] using the search term “preeclampsia” OR

“pre-eclampsia” AND “Homo sapiens”. PubMed [15] was also investigated for relevant research articles using the search terms: ("preeclampia" OR "pre-eclampsia" OR "PE") AND ("expressed genes" OR "microarray" OR "micro array" OR "microarray"). The electronic search was performed on 15 February 2017. Studies including gene expression data on non-placental tissues or reporting the effect of drugs on preeclampsia were excluded. This systematic review was performed according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16]. For each microarray study gene expression data matrices for every probe and every sample were used as input to the meta-analysis. Since in microarray experiments multiple probes may correspond to the same gene, the probe identifiers were converted to gene identifiers. Given the divergence of platforms, genes were identified using HUGO Gene Nomenclature Committee (http://www.genenames.org/) and GENE (https://www.ncbi.nlm.nih.gov/gene/). Many probes may map to the same Gene ID, and, conversely, a probe may also map to more than one Gene ID if the probe sequence is not specific enough. A simple approach would be to use only the probes

4

with one-to-one mapping for further analysis; however, this approach results in loss of information. To solve this, and in order to perform an analysis based on genes rather than probes, we followed the guidelines of Ramasamy and coworkers [17]. The GPL (Geo Platform) files containing gene symbols corresponding to probe id’s were used in order to combine studies from different platforms and resolve the “many-to-many” relationships between probes and genes, by averaging the expression of genes with more than one probes. As effect size we used the standardized mean difference (SMD) across studies as suggested by Cohen [18]. It has been reported though that standardized mean difference is biased; therefore we used the correction proposed be Hedges resulting to a new unbiased estimator called Hedge’s g [19]. The newly generated corrected standard errors for each gene/locus were used in a standard random effects meta-analysis [20, 21]. Various correction methods were also applied to account for the multiplecomparisons. These methods are grouped into two categories, the ones that control the family-wise error rate (FWER) and the ones that control the False Discovery Rate (FDR). The most common approach to control family-wise error rate is the Bonferroni [22] correction which is easily applied and intuitive, but it is very conservative. FDRcontrolling procedures [23] are designed to control the expected proportion of rejected null hypotheses that were incorrect rejections. FDR methods have greater power (i.e. they detect more differences as statistically significant), at the cost of increased Type I error rate. For both FWER and FDR analyses, genes with p-value less or equal to 0.01 are considered statistically significant. The integration-driven discovery rate (IDR) and the integration-driven revision rate (IRR) were used in order to calculate the differentially expressed genes identified purely by the meta-analysis and the differentially expressed genes identified in individual studies but not in metaanalysis, respectively [24]. For all the statistical analyses performed, statistical software package STATA 13 [25] was used.

5

In order to uncover the biological processes and the biochemical pathways in which the identified DEGs participate, the enrichment analysis tools PANTHER (Protein ANalysis THrough Evolutionary Relationships) (http://www.pantherdb.org/) [26] and STRING (Search Tool for the Retrieval of INteracting Genes/proteins) (https://string-db.org/) [27] were used. In order to further understand the role of each DEG in PE we investigated the gene function, phenotypic and physiological consequences of the highly connected DEGs in mice knock out models using the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org/) [28].

RESULTS From PubMed and GEO databases 60 datasets and articles were retrieved and reviewed. Only ten met the eligibility criteria and subsequently were included in the meta-analysis (Supplementary Figure 1). The datasets provided data for 27,867 unique genes/loci from 195 preeclampsia patients and 231 healthy individuals (Table 1). The number of times each of these genes appeared in the studies was also investigated and is presented in Supplementary Table 1. Random effects meta-analysis was performed followed by multiple testing correction methods (Bonferoni, Sidak, Holm and Holland) in order to narrow down the number of false positive results. 629 statistically significantly DEGs were identified in patients with preeclampsia compared to healthy individuals, at an FDR of 1%. The IDR was equal to 0.234 indicating that 23.4% of DEGs were identified as such through the meta-analysis but not in any of the primary studies (i.e. 147 from the 629), depicting the robustness and the efficiency of the meta-analysis. IRR was 0.92, indicating that the declared differentially expressed genes in individual studies but not in meta-analysis were 92.1% (5,640 out of 6,122 genes in all individual studies)

6

(Figure 1). The high number of IRR is expected in such analyses, since variability and non reproducibility is high in microarray data. All 629 DEGs identified in the meta-analysis along with their p-values are presented in Supplementary Table 2a. Notably, from the 629 DEGS 336 were overexpressed and 293 underexpressed (Supplementary Table 2a). Although overexpressed genes are slightly more abundant, we cannot draw conclusions on generalized suppression of genes. The possible interactions among the 629 DEGs were investigated and visualized with the use of STRING. Of the 629 genes, only 149 were strongly connected (at a confidence interaction cut off score of 0.9) (Figure 2). The fact that distinct hubs of gene products appear in the network implies that different gene pathways are involved in the pathogenesis of preeclampsia. The appearance of EGFR and to a lesser extend CTSC as hubs suggest that signal transduction pathways are in play regulating vesicle mediated transport. RPS27L, CSNK1E, DVL1 and IGFBP5 are involved in response to extracellular signaling through signal transduction pathways or translational control. Importantly, the hubs ATP8B4 and RNF130 suggest that innate immunity pathways related to either ion transport or antigen presentation are associated with preeclampsia development. Importantly, when we tried to create an interaction network with the top 20 genes (in terms of p-values) no connection was presented (data not shown), while the top 50 genes formed only a small network with nine genes (Supplementary Figure 2). The main sub-network consists of genes participating mainly in immune response. Because of the high heterogeneity in preeclampsia etiology and the unconnected gene sub-networks, we set to investigate whether this heterogeneity could be attributed to distinct causes between severe and mild preeclampsia. Stratification meta-analysis according to subtypes resulted in eighteen common genes between mild and severe preeclampsia (Figure 3). More importantly 216 from

7

365 DEGs in mild preeclampsia were found to be significantly DEGs in both mild and overall preeclampsia; only 89 from 287 DEGs in severe PE were in common with the DEGs discovered for overall preeclampsia. Noteworthy, overall meta-analysis revealed 306 additional significantly DEGs that were not identified in the sub-groups. The eighteen common DEGs in mild and severe preeclampsia take part in a) energy ultilization [ACSF2, APMAP, BTD and HK2, NR2F1

],

b)

in

developmental processes [DNM2, GREM2, CA4, PPP1R15A

and HTRA1] c) in

regulation of immune response [HTRA1, CTSC, PROCR, SASH1, RASL11B] d) in procedures for pregnancy maintenance [CGB3, MORC4], e) in regulating apoptosis [SOD1] or f) are transcription factors [MORC4, ZNF555]. The 629 DEGs were further investigated for the biological processes and biochemical pathways that are involved in with the use of PANTHER and STRING. PANTHER recognized 545 while STRING recognized 525 genes and thus the overrepresentation analysis (ORA) tool of PANTHER was used (Table 2). Signal transduction pathways were overrepresented (nine genes) among the 629 DEGs. Carbohydrates

metabolism

and

energy

utilization

pathways

were

also

overrepresented with seven genes [GNPDA1, NPL, HK2), TKT, PFKL, GAPDH, PKM]; these pathways are not well investigated in the literature. Three genes [NT5E, DPYSL3 and DPYD] involved in pyrimidine metabolism and two genes [BCAT2, ILVBL] are involved in amino acids metabolism were found to be DEGs in preeclampsia. Intriguingly, we found two genes involved in the circardian clock system [PER3, CSNK1E] to be DEGs in preeclampsia. In the overrepresented PANTHER

biochemical

pathways

for

overall

preeclampsia

15

DEGs

are

overexpressed and 9 underexpressed (Table 2). Since severe and mild preeclampsias constitute discrete disease entities we also investigated the enriched biochemical pathways according to PANTHER for each of the two subtypes of PE to investigate distinct mechanisms underlying the two subtypes. As shown in Table 2, circadian clock and cell cycle gene pathways are

8

enriched in mild preeclampsia but not in severe PE. Conversely, there are biochemical pathways that are enriched in severe preeclampsia but not in mild, such as Gonadotropin-releasing hormone receptor (GnRHR) pathway (with eight genes), amino acids biosynthesis and aminobutyrate degradation. Common pathways between mild and severe are the ones that involve energy utilization and signal transduction pathways. All the biochemical pathways enriched in separate types of preeclampsia are enriched in the overall preeclampsia search, apart from the ones that are related to GnRHR pathway and aminobutyrate degradation. To clarify whether the stringent FDR threshold of 1% could potentially miss some true positive associated DEGs, we also performed a new analysis with a less stringent FDR of 5% and enrichment analysis as well. We found 1,342 DEGs that formed a more interconnected network compared to the one formed at FDR 1% (Supplementary Figure 3). Four additional biochemical pathways were enriched, the one of sex hormone biosynthesis, angiogenesis and Notch signaling pathways, while GnRHR pathway was enriched in overall PE and not only in severe (Supplementary Figure 3 and Supplementarry Table 3). The 24 genes (Table 2) that participate in the enriched PANTHER pathways in overall PE were investigated for their biological function in MGI database on knockout mouse models [28]. We found that the deletion of certain genes led to mortality in different life stages of knockout mice development. GNPDA1 [28], HK2 [29] and TKT [30] led to embryonic lethality. ADCY3, ADCY7 and RGS10 led to postnatal mortality[28]. Finally PFKL and PKM led to premature lethality [28]. The deletion of CHRM2 [28] and NT5E [31, 32] caused problems in cardiovascular system, a situation related to hypertension. ADCY7 [33] and RGS10 [34] knockout mice had problems in their immune system, a finding that corroborates the notion supported by others [11-13] and our results herein that deregulated immune response is involved in PE development. GNPDA1 deletion led to abnormal uterus

9

morphology [28] a result that is accordance to our findings that deregulated expression of genes involved in developmental processes are associated with PE.

DISCUSSION The present meta-analysis performed on all available microarray data on preeclamptic placentas identified 629 statistically significantly DEGs significantly associated with PE. These DEGs can potentially be used in deciphering the etiology of the disease or as risk prediction markers. The interaction network for the 629 DEGs identified few highly interconnected gene-products, suggesting that multiple metabolic pathways are implicated in the formation of preeclamptic placenta which is indicative of the heterogeneity of the disease pathology. The heterogeneity in the disease causes is further supported by the fact that the top 20 DEGs (according to pvalue) do not interact and that from the top 50 DEGs only few interact. The notion of disease heterogeneity has been already suggested by other researchers (reviewed in [35], [8], [12]) thus giving validity to our analysis. The highly interconnected genes in PE (Figure 2) are involved in various signal transduction pathways and in innate immune response. Investigation for the 18 common DEGs in mild and severe PE revealed genes taking place in developmental/pregnancy processes, immune response and carbohydrate and energy utilization. Using bioinformatics analyses (ORA-PANTHER) the carbohydrate metabolism / energy utilization pathways emerged as the most prominent and common to the two types of PE. The fact that eight genes of this pathway are overrepresented in these pathways, strongly suggest a genuine involvement of energy utilization genes in PE development. The later finding is of high importance since deregulated function of these genes can lead to deregulated energy utilization and perhaps obesity. It is known that increased body mass index (BMI) and abnormal lipid profiles are characteristics of preeclampsia [36]. Though many studies exploring candidate biomarkers have never before referred to such energy utilization genes,

10

they had however investigated obesity as risk factor and found to be associated with PE (reviewed [9, 11-13]). It is also reported that dislipidemia contributes to endothelial cell dysfunction [37]

and is associated with increased inflammation,

angiogenesis and alterations of circulating growth regulators [38]. Mutations in the LPL and ApoE genes have been proposed as risk factors [35]. Thus, one cannot rule out the possibility that deregulated expression of genes involved in energy utilization cannot lead to obesity and thus to preeclampsia. Pathways enrichment analysis also led to some interesting findings. First, amino acid biosynthesis pathway was found to be enriched in PE. Though weird, a recent metabolomics analysis highlighted the fact that the blood concentration of some amino acids are altered in PE patients [39]. Moreover, it has been suggested that supplementation with L-arginine in PE patients can reduce their blood pressure levels [40]. Second, circadian clock system pathway was also enriched in overall and mild preeclampsia represented by two genes. This finding is expected since blood pressure in normal pregnancy follow a circadian rhythm with lowest values during night, while in many patients with PE such drop in not observed [41]. Third, we found three DEGs overrepresented in the pyrimidine metabolism pathways. It has been reported that the circulating levels of cell free fetal DNA (cff DNA) are higher in preeclampsia and cff DNA has been investigated as a possible predictive biomarker for PE [42-45]. Consequently, the involvement of genes taking part in pyrimidine metabolism pathway does appear as a plausible association. Finally, Gonadotropinreleasing hormone receptor pathway (GnRHR) is overrepresented only in severe PE. It is known that GnRH is expressed in placenta and acts through its specific GnRHR locally to exert autocrine functions [46] and thus is absolutely plausible to be found overrepresented. Interestingly, when the meta-analysis was performed with less strict FDR (FDR<0.05) GnRHR pathway was overrepresented in overall PE. Importantly, in the initial network built with the 629 DEGs we saw two main hubs with genes involved in signal transduction pathways which were also enriched

11

in overall and both types of PE. The abundance of signal transduction genes in all outcomes is both expected and required. In almost all specific pathways that respond to external signals such genes are involved and thus their abundance is expected. On the other hand, the abundance of genes involved in signal transduction serves as validation to our method showing that the analysis is correctly performed. This is not the first attempt to present such an analysis since several studies have tried in the past to combine, with meta-analysis, microarray data from preeclampsia. The present meta-analysis has many advantages compared to previous ones in several aspects (Table 3). Firstly, the older studies [47-52] contain fewer data-sets with fewer case and control individuals compared to the present work (3-7 included studies vs. 10 in this work), thus, the current study has larger statistical power. Secondly, we applied strict inclusion criteria by using only placental samples. Thirdly, we used a rigorous meta-analysis methodology by applying Hedge’s correction and using the produced standard errors in a random effects meta-analysis, with various correction methods and a low FDR value (1%) in order to identify the most reliable DEGs. Fourthly, we used advanced bioinformatics tools (STRING and PANTHER) to reveal few robust gene pathways. The previously published metaanalyses, using a variety of methods, demonstrated several generic biochemical pathways to be associated with PE, including inflammation, angiogenesis and amino acids metabolism [47-49]. Though PE cannot be attributed to only one biochemical pathway, the proposal by the previous meta-analyses that a multitude of DEGs and pathways are associated with PE placentas is of poor differential diagnostic value since the sum of all these pathways could be neither specific nor characteristic of PE. Instead, we identified few pathways and highlight that energy utilization pathway needs further experimental investigation. Taken together, our bioinformatics findings strongly propose carbohydrate metabolism and energy utilization as a new pathway that can be deregulated in PE. Though the above pathways might seem rather unrelated to the biomarkers known

12

thus far (EGF, PLGF, sFLT-1, pro-inflammatory cytokines, apoptosis, cell survival and differentiation genes [12, 13]), one has to keep in mind that these known biomarkers are tested in maternal blood samples. Because our results concern placenta tissue, we propose further experimental research for the molecular function of the energy utilization genes on placentas because it can directly show deregulated expression of genes before they exert their systemic actions. However, future functional analysis for the rest of the enriched gene/pathways is also needed to verify our results and shed more light on the mechanisms underlying preeclampsia.

13

Funding This work is part of Konstantina’s Vennou thesis, which is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project “Strengthening Human Resources Research Potential via Doctorate Research” (MIS-5000432), implemented by the State Scholarships Foundation (ΙΚΥ).

Declaration of interest The

authors

declare

that

they

have

no

competing

interests.

14

REFERENCES 1. Steegers EA, von Dadelszen P, Duvekot JJ, Pijnenborg R. Pre-eclampsia. Lancet (London, England). 2010;376(9741):631-44. 2. Henderson JT, Whitlock EP, O'Connor E, Senger CA, Thompson JH, Rowland MG. Low-dose aspirin for prevention of morbidity and mortality from preeclampsia: a systematic evidence review for the U.S. Preventive Services Task Force. Annals of internal medicine. 2014;160(10):695-703. 3. Global, regional, and national life expectancy, all-cause mortality, and causespecific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet (London, England). 2016;388(10053):1459-544. 4. Sibai BM. Diagnosis and management of gestational hypertension and preeclampsia. Obstetrics and gynecology. 2003;102(1):181-92. 5. Gifford RW, August PA, Cunningham G. Report of the National High Blood Pressure Education Program Working Group on High Blood Pressure in Pregnancy. American journal of obstetrics and gynecology. 2000;183(1):S1-S22. 6. Longo DL, Harrison TR. Harrison's principles of internal medicine. New York, N.Y.: McGraw-Hill Medical; 2012. 7. Turner JA. Diagnosis and management of pre-eclampsia: an update. International journal of women's health. 2010;2:327-37. 8. Louwen F, Muschol-Steinmetz C, Reinhard J, Reitter A, Yuan J. A lesson for cancer research: placental microarray gene analysis in preeclampsia. Oncotarget. 2012;3(8):759-73. 9. Bodnar LM, Catov JM, Roberts JM. Racial/ethnic differences in the monthly variation of preeclampsia incidence. American journal of obstetrics and gynecology. 2007;196(4):324 e1-5. 10. Walker JJ. Pre-eclampsia. Lancet (London, England). 2000;356(9237):12605. 11. Chappell S, Morgan L. Searching for genetic clues to the causes of preeclampsia. Clin Sci (Lond). 2006;110(4):443-58. 12. El-Sayed AAF. Preeclampsia: A review of the pathogenesis and possible management strategies based on its pathophysiological derangements. Taiwanese journal of obstetrics & gynecology. 2017;56(5):593-8. 13. Grill S, Rusterholz C, Zanetti-Dallenbach R, Tercanli S, Holzgreve W, Hahn S, et al. Potential markers of preeclampsia--a review. Reproductive biology and endocrinology : RB&E. 2009;7:70. 14. Barrett T, Edgar R. Mining Microarray Data at NCBI’s Gene Expression Omnibus (GEO). Methods in molecular biology (Clifton, NJ). 2006;338:175-90. 15. PubMed Help [Internet]. 16. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS medicine. 2009;6(7):e1000097. 17. Ramasamy A, Mondry A, Holmes CC, Altman DG. Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS medicine. 2008;5(9):e184. 18. Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1969. 19. Hedges LV. Distribution Theory for Glass's Estimator of Effect Size and Related Estimators. Journal of Educational Statistics. 1981;6(2):107-28. 20. Campain A, Yang YH. Comparison study of microarray meta-analysis methods. BMC Bioinformatics. 2010;11(1):408. 21. Choi JK, Yu U, Kim S, Yoo OJ. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics. 2003;19.

15

22. Dudoit SYHY, Matthew J. Callow, and Terence P. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical report # 578. 2000. 23. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289-300. 24. Conlon EM, Song JJ, Liu A. Bayesian meta-analysis models for microarray data: a comparative study. BMC Bioinformatics. 2007;8(1):80. 25. StataCorp. Stata 13 Base Reference Manual. College Station, TX: Stata Press. ; 2013. 26. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45(D1):D183-d9. 27. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447-52. 28. Smith CL, Blake JA, Kadin JA, Richardson JE, Bult CJ. Mouse Genome Database (MGD)-2018: knowledgebase for the laboratory mouse. Nucleic Acids Res. 2018;46(D1):D836-D42. 29. Heikkinen S, Pietila M, Halmekyto M, Suppola S, Pirinen E, Deeb SS, et al. Hexokinase II-deficient mice. Prenatal death of homozygotes without disturbances in glucose tolerance in heterozygotes. The Journal of biological chemistry. 1999;274(32):22517-23. 30. Xu ZP, Wawrousek EF, Piatigorsky J. Transketolase haploinsufficiency reduces adipose tissue and female fertility in mice. Molecular and cellular biology. 2002;22(17):6142-7. 31. Koszalka P, Ozuyaman B, Huo Y, Zernecke A, Flogel U, Braun N, et al. Targeted disruption of cd73/ecto-5'-nucleotidase alters thromboregulation and augments vascular inflammatory response. Circulation research. 2004;95(8):814-21. 32. Zukowska P, Kutryb-Zajac B, Jasztal A, Toczek M, Zabielska M, Borkowski T, et al. Deletion of CD73 in mice leads to aortic valve dysfunction. Biochimica et biophysica acta Molecular basis of disease. 2017;1863(6):1464-72. 33. Duan B, Davis R, Sadat EL, Collins J, Sternweis PC, Yuan D, et al. Distinct roles of adenylyl cyclase VII in regulating the immune responses in mice. Journal of immunology. 2010;185(1):335-44. 34. Yang S, Li YP. RGS10-null mutation impairs osteoclast differentiation resulting from the loss of [Ca2+]i oscillation regulation. Genes & development. 2007;21(14):1803-16. 35. Williams PJ, Broughton Pipkin F. The genetics of pre-eclampsia and other hypertensive disorders of pregnancy. Best practice & research Clinical obstetrics & gynaecology. 2011;25(4):405-17. 36. Lopez-Jaramillo P, Barajas J, Rueda-Quijano SM, Lopez-Lopez C, Felix C. Obesity and Preeclampsia: Common Pathophysiological Mechanisms. Frontiers in physiology. 2018;9:1838. 37. Belo L, Gaffney D, Caslake M, Santos-Silva A, Pereira-Leite L, Quintanilha A, et al. Apolipoprotein E and cholesteryl ester transfer protein polymorphisms in normal and preeclamptic pregnancies. European journal of obstetrics, gynecology, and reproductive biology. 2004;112(1):9-15. 38. Fontana L, Eagon JC, Trujillo ME, Scherer PE, Klein S. Visceral fat adipokine secretion is associated with systemic inflammation in obese humans. Diabetes. 2007;56(4):1010-3. 39. Nobakht MGBF. Application of metabolomics to preeclampsia diagnosis. Systems biology in reproductive medicine. 2018;64(5):324-39. 40. Vadillo-Ortega F, Perichart-Perera O, Espino S, Avila-Vergara MA, Ibarra I, Ahued R, et al. Effect of supplementation during pregnancy with L-arginine and

16

antioxidant vitamins in medical food on pre-eclampsia in high risk population: randomised controlled trial. BMJ (Clinical research ed). 2011;342:d2901. 41. Pears S, Makris A, Hennessy A. The chronobiology of blood pressure in pregnancy. Pregnancy hypertension. 2018;12:104-9. 42. Vlkova B, Turna J, Celec P. Fetal DNA in maternal plasma in preeclamptic pregnancies. Hypertension in pregnancy. 2015;34(1):36-49. 43. Konecna B, Vlkova B, Celec P. Role of fetal DNA in preeclampsia (review). International journal of molecular medicine. 2015;35(2):299-304. 44. Sifakis S, Koukou Z, Spandidos DA. Cell-free fetal DNA and pregnancyrelated complications (review). Molecular medicine reports. 2015;11(4):2367-72. 45. Contro E, Bernabini D, Farina A. Cell-Free Fetal DNA for the Prediction of Pre-Eclampsia at the First and Second Trimesters: A Systematic Review and MetaAnalysis. Molecular diagnosis & therapy. 2017;21(2):125-35. 46. Voltolini C, Petraglia F. Neuroendocrinology of pregnancy and parturition. Handbook of clinical neurology. 2014;124:17-36. 47. Vaiman D, Miralles F. An Integrative Analysis of Preeclampsia Based on the Construction of an Extended Composite Network Featuring Protein-Protein Physical Interactions and Transcriptional Relationships. PloS one. 2016;11(11):e0165849. 48. Brew O, Sullivan MH, Woodman A. Comparison of Normal and Pre-Eclamptic Placental Gene Expression: A Systematic Review with Meta-Analysis. PloS one. 2016;11(8):e0161504. 49. van Uitert M, Moerland PD, Enquobahrie DA, Laivuori H, van der Post JA, Ris-Stalpers C, et al. Meta-Analysis of Placental Transcriptome Data Identifies a Novel Molecular Pathway Related to Preeclampsia. PloS one. 2015;10(7):e0132468. 50. Sitras V, Fenton C, Acharya G. Gene expression profile in cardiovascular disease and preeclampsia: a meta-analysis of the transcriptome based on raw data from human studies deposited in Gene Expression Omnibus. Placenta. 2015;36(2):170-8. 51. Kawasaki K, Kondoh E, Chigusa Y, Ujita M, Murakami R, Mogami H, et al. Reliable pre-eclampsia pathways based on multiple independent microarray data sets. Molecular human reproduction. 2015;21(2):217-24. 52. Moslehi R, Mills JL, Signore C, Kumar A, Ambroggio X, Dzutsev A. Integrative transcriptome analysis reveals dysregulation of canonical cancer molecular pathways in placenta leading to preeclampsia. Scientific reports. 2013;3:2407. 53. Nishizawa H, Pryor-Koishi K, Kato T, Kowa H, Kurahashi H, Udagawa Y. Microarray analysis of differentially expressed fetal genes in placental tissue derived from early and late onset severe pre-eclampsia. Placenta. 2007;28(5-6):487-97. 54. Sitras V, Paulssen RH, Gronaas H, Leirvik J, Hanssen TA, Vartun A, et al. Differential placental gene expression in severe preeclampsia. Placenta. 2009;30(5):424-33. 55. Winn VD, Gormley M, Paquet AC, Kjaer-Sorensen K, Kramer A, Rumer KK, et al. Severe preeclampsia-related changes in gene expression at the maternal-fetal interface include sialic acid-binding immunoglobulin-like lectin-6 and pappalysin-2. Endocrinology. 2009;150(1):452-62. 56. Nishizawa H, Ota S, Suzuki M, Kato T, Sekiya T, Kurahashi H, et al. Comparative gene expression profiling of placentas from patients with severe preeclampsia and unexplained fetal growth restriction. Reproductive biology and endocrinology : RB&E. 2011;9:107. 57. Tsai S, Hardison NE, James AH, Motsinger-Reif AA, Bischoff SR, Thames BH, et al. Transcriptional profiling of human placentas from pregnancies complicated by preeclampsia reveals disregulation of sialic acid acetylesterase and immune signalling pathways. Placenta. 2011;32(2):175-82. 58. Meng T, Chen H, Sun M, Wang H, Zhao G, Wang X. Identification of differential gene expression profiles in placentas from preeclamptic pregnancies

17

versus normal pregnancies by DNA microarrays. Omics : a journal of integrative biology. 2012;16(6):301-11. 59. Jebbink JM, Boot RG, Keijser R, Moerland PD, Aten J, Veenboer GJ, et al. Increased glucocerebrosidase expression and activity in preeclamptic placenta. Placenta. 2015;36(2):160-9. 60. Liang M, Niu J, Zhang L, Deng H, Ma J, Zhou W, et al. Gene expression profiling reveals different molecular patterns in G-protein coupled receptor signaling pathways between early- and late-onset preeclampsia. Placenta. 2016;40:52-9. 61. Guo L, Tsai SQ, Hardison NE, James AH, Motsinger-Reif AA, Thames B, et al. Differentially expressed microRNAs and affected biological pathways revealed by modulated modularity clustering (MMC) analysis of human preeclamptic and IUGR placentas. Placenta. 2013;34(7):599-605. 62. Leavey K, Benton SJ, Grynspan D, Kingdom JC, Bainbridge SA, Cox BJ. Unsupervised Placental Gene Expression Profiling Identifies Clinically Relevant Subclasses of Human Preeclampsia. Hypertension (Dallas, Tex : 1979). 2016;68(1):137-47.

18

Figure Captions Figure 1: Venn diagram comparing DEGs identified by the individual studies and by metaanalysis. The results obtained by the meta-analysis (629 DEGs) are compared with DEGS identified by at least one study (6,121 DEGs) and by at least two studies (880 DEGs).

Figure 2: Gene/protein association network of the 525 (out of the 629) DEGs in preeclampsia, with a cut-off score of 0.9. These scores do not indicate the strength or the specificity of the interaction, but its confidence. Score 0.9 represents highest confidence between the genes. Lines of different colors indicate predicted modes of action with a confidence interaction cut-off score of 0.07. Lines colored green represent activation, blue binding, light blue phenotype, black reaction, red inhibition, purple catalysis, pink posttranslational modification and yellow transcriptional regulation. Lines ending in an arrow indicate a positive interaction, ending in a rectangle indicate a negative interaction and ending in a circle indicate an unspecified interaction. The network was constructed using STRING.

Figure 3: Venn diagram comparing differential expressed genes identified in the subtypes of preeclampsia (PE). The results obtained from meta-analysis for overall PE (629 DEGs) are compared with differential expressed genes identified from meta-analysis for mild PE (365 DEGs) and from meta-analysis for severe PE (287 DEGs).

19

Table 1. Study Characteristics included in meta-analysis

Mean Gestational Female References

GEO Dataset

Platform

PE

Health

Age PE

age PE

neonates

patients patients

patients

in the PE control

(years)

(weeks)

group

y

s

Mean

Gestation Female

Age

al age

neonates

Healthy

Healthy

in the

controls controls (years)

(weeks)

Whole Human

10

Genome

(severe

Oligo

PE) (5

Microarray

early-, 5

G4112A

late-

(Feature

onset)

Unique Genes

mmHg/6h and DBPc

30.3

32.9 ± 4.0

Informatio n not given

Informatio >110 mmHg/6h and 4

29.2

32.9 ± 5.6

n not

Proteinuria >2

given

g/24h. Controls

43931

18003

32879

16229

were normotensive women with normal pregnancies. Severe PE defined

ABI Human Genome

17

Survey

(severe

Microarray

PE)

Version 2

of

by SBPb >160

version)

GSE10588

probes

Number

Severe PE defined

Number

[54]

of

group

012391

GSE4707a

Criteria

control

Agilent-

[53]

Number

30.5

Informatio 34 ± 3.6

n not given

Informatio 26

30.2

39.6 ± 1.3

n not given

by SBP ≥160 mmHg/6h and/or DBP ≥110 mmHg/6h and Proteinuria ≥2+ g/24h on protein

20

dipstick after 20 weeks gestation. Controls were women with non-PE, normal pregnancies. No information for severe PE criteria.

[HG-U133A]

[55]

GSE14722

Affymetrix

12

Human

(severe

Genome

PE)

30.7

32.1 ± 3.3

Informatio n not given

Informatio 11

30.2

31.0 ± 4.6

n not given

U133A Array

Controls were normotensive women with preterm

22115

12709

24805

18761

48701

19866

labor with no evidence of infection. Severe PE defined

[HuGene-1_0-

by SBP >160 mmHg

st] Affymetrix

and DBP >110

Human Gene [56]

GSE24129 1.0 ST Array [transcript

8 (severe PE)

31.0

34.4 ± 1.8

Informatio n not given

8

31.5

38.1 ± 0.8

Informatio

mmHg and

n not

Proteinuria ≥2

given

g/24h. Controls

(gene)

were normotensive

version]

women with normal pregnancies.

[57]

GSE25906

Illumina

23

≥18

33.6 ± 3.7

56.5%

37

≥18

37.6 ± 1.93

43.2%

PE defined by SBP

21

human-6 v2.0

≥140 mmHg and

expression

DBP ≥90 mmHg and

beadchip

Proteinuria ≥0.3 g/24h. Controls were normotensive women with normal pregnancies. PE defined by SBP >140 mmHg/6h and DBP >90 mmHg/6h and Proteinuria ≥0.3

Illumina HumanHT-12 [58]

GSE30186

V4.0

6

26.0

36.4 ± 0.9

expression

Informatio n not given

6

28.5

39 ± 0.7

Informatio

g/24h or >2+ on

n not

protein dipstick after

given

20 weeks gestation.

beadchip

47323

23324

47231

23322

Controls were normotensive women with normal pregnancies.

[59]

GSE54618

Illumina

PE defined by SBP

HumanHT-12

≥140 mmHg/6h or

V4.0

5

31

34

50%

12

29

30

45%

DBP ≥90 mmHg/6h

expression

and Proteinuria ≥0.3

beadchip

g/24h or >1+ on

22

protein dipstick after 20 weeks gestation. Controls were women with normotensive pregnancies. PE defined by SBP =140 mmHg/6h and/or

Agilent-

DBP=90 mmHg/6h

039494 SurePrint G3 Human GE v2 [60]

GSE74341

8x60K

and Proteinuria ≥0.3

17.1 15 (7

(early-

early-, 8 onset),

Microarray

late-

28.2

039381

onset)

(late-

(Feature

29.8 31.7 (early onset), 30.6

(late onset)

Informatio n not given

(early 10

controls), 28.6 (late controls)

onset)

30.2 (early controls), 40.6 (late controls)

g/24h. Early-onset Informatio

PE <34 weeks of

n not

gestation; late-onset

given

>34 weeks of

62976

21815

48700

20007

gestation. Controls were normotensive

Number

women with preterm

version)

and at term pregnancies.

[61]

GSE35574

Illumina human-6 v2.0

19

No informati

33.6 ± 3.7

57.9%

40

No informati

37.7 ± 1.9

42.9%

PE defined by SBP >140 mmHg/6h and

23

expression

on

on

DBP >90 mmHg/6h

beadchip

and Proteinuria ≥0.03 g/4h or >1+ on protein dipstick after 20 weeks gestation. Controls were women with non-PE pregnacies PE defined by SBP ≥140 mmHg/6h

[HuGene-1_0-

and/or DBP ≥90

st] Affymetrix

mmHg/6h and

Human Gene [62]

GSE75010 1.0 ST Array [transcript

80

32.9

31.9 ± 3.5

Informatio n not given

77

32.8

34.9 ± 4.9

Informatio

Proteinuria ≥0.3

n not

g/24h or ≥2+ on

given

protein dipstick after

(gene)

20 weeks gestation.

version]

Controls were

32321

23016

women with non-PE pregnancies a

GSE = GEO accession number

b

SBP = Systolic blood pressure

c

DBP = Diastolic blood pressure

24

Table 2. Statistically significantly overrepresented PANTHER pathways Overall preeclampsia PANTHER Pathway

Gene

p-value

Valine biosynthesis

BCAT2 (↑), ILVBL (↑)

0.00607

Isoleucine biosynthesis

BCAT2 (↑), ILVBL (↑)

0.00896

Pyrimidine Metabolism

DPYSL3 (↓),NT5E (↑), DPYD (↓)

0.00381

N-acetylglucosamine metabolism

GNPDA1 (↓), NPL (↓)

0.0204

Pentose phosphate pathway

HK2 (↑), TKT (↓)

0.0251

Glycolysis

HK2 (↑), PFKL (↑), GAPDH (↑), PKM (↑)

0.00288

Circadian clock system

PER3 (↓), CSNK1E (↑)

0.0302

Fructose galactose metabolism

HK2 (↑), GALK1 (↑)

0.0476

Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway

CALM1 (↑), SSR4 (↑), ADCY7 (↓), GRK3 (↑), ADCY3 (↓), SSR1 (↑), RGS10 (↓), KCNJ6 (↑), CHRM2 (↑)

0.0471

Mild preeclampsia

Acetate utilization

ACSS2 (↓)

0.0495

Circadian clock system

PER3 (↓), CSNK1E (↑)

0.00807

Glycolysis

HK2 (↑), PFKL (↑), PKM (↑)

0.00291

25

Cell cycle

CCND2 (↓), APC2 (↑)

0.0336

Toll receptor signaling pathway

TLR8 (↓), TLR6 (↓), MAP3K8 (↑)

0.0384

Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway

SSR4 (↑),PRKCB (↓), CACNA1A (↑), RGS10 (↓),KCNJ6 (↑)

0.026

Severe preeclampsia

BCAT2 (↑)

0.0411

Aminobutyrate degradation

ALDH5A1 (↓)

0.0411

Alanine biosynthesis

BCAT2 (↑)

0.0411

Pentose phosphate pathway

HK2 (↑), TKT (↓)

0.00805

Pyruvate metabolism

PC (↓), PDHA1 (↓)

0.0136

Fructose galactose metabolism

GALE (↑), HK2 (↑)

0.0157

Glycolysis

HK2 (↑), GAPDH (↑), TPI1 (↑)

0.00383

TGF-beta signaling pathway

SMAD6 (↑), FKBP1C (↓), FKBP2 (↑), INHBA (↑), TGFB1 (↑), BMP5 (↓), JUNB (↑)

0.000629

Alzheimer disease-presenilin pathway

MME (↑), JUP (↑), FZD10 (↑), MMP1 (↓), CDH1 (↓)

0.0363

Gonadotropin-releasing hormone receptor pathway

INHA (↑), SLC2A1 (↑), DUSP1 (↑), INHBA (↑), LHB (↑), TGFB1 (↑), JUNB (↑), EGFR (↑)

0.0186

Leucine biosynthesis

↓ underexpressed, ↑ overexpressed

26

Table 3. Comparisons between other meta-analyses and the present study.

Study

Present Metaanalysis

[47]

[48]

[49]

Year

Included studies

2018

GSE4707 GSE10588 GSE14722 GSE24129 GSE25906 GSE30186 GSE54618 GSE74341 GSE35574 GSE75010

2016

GSE10588 GSE14722 GSE24129 GSE25906 GSE54618 GSE44711 (chorionic vill) GSE43942

2016

GSE47187 GSE43942 GSE30186 GSE25906 GSE4707 GSE35574

2015

GSE5461 GSE30186 GSE4707 GSE24129 GSE10588 GSE25906 GSE14722 Author1 Author2 Author3 Author4

Number of cases

195

85

68

116

Number of controls

231

109

99

139

DEGs

Pathways

629

9 PANTHER pathways (with PANTHER overrepresentation analysis)

369

28 (mentioned in the article) KEGG& WikiPathways (with WebGestat)

9540

207 KEGG Pathways (with WebGestat) not all mentioned in the article

388

71 shown in the supplementary, (not all mentioned in the article) KEGG Pathways (with Graphite)

27

[50]

[51]

[52]

1, 2, 3, 4:

2014

GSE4707 GSE10588 GSE12767 (chorionic villi) GSE25906 GSE30186

2014

GSE10588 GSE14722 GSE25906 GSE30186 GSE25906 GSE30186 GSE4711 (chorionic villi)

2013

GSE10588 GSE14722 GSE4707 GSE24129

63

66

47

85

947

22 (mentioned in the article) PANTHER pathways

88

Not mentioned in the article

21 mentioned but 10 are shown

419

8 (Not all mentioned in the article, with Biobase

49

data retrieved after personal communication with authors

28

29

30

31

Highlights • We performed meta-analysis in microarray data on placental samples. • We identified 629 (FDR1%) statistically significant DEGs associated with preeclampsia. • Potentially biochemical pathways and mechanisms were carbohydrate, aminoacids and pyrimidine metabolism.

32