Epigenetic footprint enables molecular risk stratification of hepatoblastoma with clinical implications

Epigenetic footprint enables molecular risk stratification of hepatoblastoma with clinical implications

Journal Pre-proof Epigenetic footprint enables molecular risk stratification of hepatoblastoma with clinical implications Juan Carrillo-Reixach, Laura...

11MB Sizes 0 Downloads 7 Views

Journal Pre-proof Epigenetic footprint enables molecular risk stratification of hepatoblastoma with clinical implications Juan Carrillo-Reixach, Laura Torrens, Marina Simon-Coma, Laura Royo, Montserrat Domingo-Sàbat, Jordi Abril-Fornaguera, Nicholas Akers, Margarita Sala, Sonia Ragull, Magdalena Arnal, Núria Villalmanzo, Stefano Cairo, Alberto Villanueva, Roland Kappler, Marta Garrido, Laura Guerra, Constantino Sábado, Gabriela Guillén, Mar Mallo, David Piñeyro, María Vázquez-Vitali, Olga Kuchuk, María Elena Mateos, Gema Ramírez, Manuel López Santamaría, Yasmina Mozo, Aroa Soriano, Michael Grotzer, Sophie Branchereau, Nagore García de Andoin, Blanca López-Ibor, Ricardo López-Almaraz, José Antonio Salinas, Bárbara Torres, Francisco Hernández, José Javier Uriz, Monique Fabre, Julià Blanco, Claudia Paris, Viera Bajčiová, Geneviève Laureys, Helena Masnou, Ariadna Clos, Cristina Belendez, Catherine Guettier, Lauro Sumoy, Ramón Planas, Mireia Jordà, Lara Nonell, Piotr Czauderna, Bruce Morland, Daniela Sia, Bojan Losic, Marie Annick Buendia, Maria Rosa Sarrias, Josep M. Llovet, Carolina Armengol PII:

S0168-8278(20)30187-2

DOI:

https://doi.org/10.1016/j.jhep.2020.03.025

Reference:

JHEPAT 7673

To appear in:

Journal of Hepatology

Received Date: 5 September 2019 Revised Date:

11 March 2020

Accepted Date: 13 March 2020

Please cite this article as: Carrillo-Reixach J, Torrens L, Simon-Coma M, Royo L, Domingo-Sàbat M, Abril-Fornaguera J, Akers N, Sala M, Ragull S, Arnal M, Villalmanzo N, Cairo S, Villanueva A, Kappler R, Garrido M, Guerra L, Sábado C, Guillén G, Mallo M, Piñeyro D, Vázquez-Vitali M, Kuchuk O, Mateos ME, Ramírez G, Santamaría ML, Mozo Y, Soriano A, Grotzer M, Branchereau S, García de Andoin N, López-Ibor B, López-Almaraz R, Salinas JA, Torres B, Hernández F, Uriz JJ, Fabre M, Blanco J, Paris C, Bajčiová V, Laureys G, Masnou H, Clos A, Belendez C, Guettier C, Sumoy L, Planas R, Jordà M, Nonell L, Czauderna P, Morland B, Sia D, Losic B, Buendia MA, Sarrias MR, Llovet JM, Armengol C, Epigenetic footprint enables molecular risk stratification of hepatoblastoma with clinical implications, Journal of Hepatology (2020), doi: https://doi.org/10.1016/j.jhep.2020.03.025.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 European Association for the Study of the Liver. Published by Elsevier B.V.

Carrillo-Reixach et al 1

Epigenetic footprint enables molecular hepatoblastoma with clinical implications

risk

stratification

of

Juan Carrillo-Reixach1, Laura Torrens2,3, Marina Simon-Coma1,4, Laura Royo1, Montserrat Domingo-Sàbat1, Jordi Abril-Fornaguera2,3, Nicholas Akers3,5, Margarita Sala4,6,7, Sonia Ragull1, Magdalena Arnal8, Núria Villalmanzo9, Stefano Cairo10,11, Alberto Villanueva12, Roland Kappler13, Marta Garrido14, Laura Guerra15, Constantino Sábado16, Gabriela Guillén17, Mar Mallo18, David Piñeyro19, María Vázquez-Vitali1, Olga Kuchuk3, María Elena Mateos20, Gema Ramírez21, Manuel López Santamaría22, Yasmina Mozo23, Aroa Soriano24, Michael Grotzer25, Sophie Branchereau26, Nagore García de Andoin27, Blanca López-Ibor28, Ricardo LópezAlmaraz29, José Antonio Salinas30, Bárbara Torres31, Francisco Hernández22, José Javier Uriz27, Monique Fabre32, Julià Blanco33,34, Claudia Paris35, Viera Bajčiová36, Geneviève Laureys37, Helena Masnou6, Ariadna Clos6, Cristina Belendez38, Catherine Guettier26, Lauro Sumoy19, Ramón Planas 4,6, Mireia Jordà9,39, Lara Nonell8, Piotr Czauderna40, Bruce Morland41, Daniela Sia2, Bojan Losic5,42,43,44, Marie Annick Buendia45, Maria Rosa Sarrias4,46, Josep M. Llovet2,3,47 and Carolina Armengol1,4* 1

Childhood Liver Oncology Group, Germans Trias i Pujol Research Institute (IGTP), Program for Predictive and Personalized Medicine of Cancer (PMPPC), Badalona, Spain 2 Mount Sinai Liver Cancer Program, Divisions of Liver Diseases, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, USA 3 Translational research in Hepatic Oncology, Liver Unit, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Hospital Clínic, Universitat de Barcelona, Barcelona, Spain 4 CIBER, Hepatic and Digestive Diseases, Barcelona, Spain 5 Department of Genetics and Genomic Sciences, The Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, USA 6 Gastroenterology Department, Hospital Universitari Germans Trias i Pujol Hospital, Badalona, Spain 7 Gastroenterology Department, Hospital Universitari Josep Trueta, Girona, Spain 8 MARGenomics, IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain 9 PMPPC, IGTP, Badalona, Spain 10 XenTech, Evry, France 11 Laboratory for Technologies of Advanced Therapies, Department of Morphology, Surgery and Experimental Medicine University of Ferrara, Italy 12 Chemoresistance and Predictive Factors Group, Program Against Cancer Therapeutic Resistance, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute, L'Hospitalet del Llobregat, Barcelona, Spain 13 Department of Pediatric Surgery, Dr. von Hauner Children's Hospital, Ludwig-MaximiliansUniversity Munich, Lindwurmstr. 2a, 80337 Munich, Germany 14 Hospital Vall d’Hebron, Pathology Department, Barcelona, Spain 15 University Hospital La Paz, Pathology Department, Madrid, Spain 16 Hospital Vall d’Hebron, Pediatric Oncology Department, Barcelona, Spain 17 Hospital Vall d’Hebron, Pediatric Surgery Department, Barcelona, Spain 18 MDS Research Group, Josep Carreras Leukaemia Research Institute, ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona, Spain 19 High Content Genomics and Bioinformatics Unit, PMPPC, IGTP, Badalona, Spain 20 Pediatric Oncology Unit, Department of Pediatrics, University Hospital Reina Sofía, Córdoba, Spain

Carrillo-Reixach et al 2 21

University Hospital Universitario Virgen del Rocío, Pediatric Oncology Department, Sevilla, Spain 22 University Hospital La Paz, Pediatric Surgery Department, Madrid, Spain 23 University Hospital La Paz, Pediatric Oncology Department, Madrid, Spain 24 Biomedical Research in Cancer Stem Cells Group, Pathology Department, Institut de Recerca Hospital Vall d'Hebron (VHIR), Barcelona, Spain. 25 Department of Pediatric Oncology, University Children's Hospital Zurich, University of Zurich, Zurich, Switzerland. 26 Bicêtre Hospital, Le Kremlin-Bicêtre, France. 27 Pediatric Oncology, Hospital Universitario Donostia, Doctor Begiristain Kalea, 117, 20080, Donostia, Spain 28 Department of Pediatric Hematology and Oncology, HM Montepríncipe Hospital, Boadilla del Monte, Madrid, Spain 29 Pediatric Oncology and Hematology, Hospital Universitario Cruces, Bilbao, Spain 30 Division of Hematology-Oncology, Department of Pediatrics, Hospital Universitari Son Espases, Palma de Mallorca, Spain 31 Medical Oncology Department, Pediatric Oncology Department, University Hospital La Fe, Valencia, Spain 32 Department of Pathology, Hôpital Universitaire Necker-Enfants Malades, Assistance Publique-Hôpitaux de Paris, and Université Paris Descartes, Paris, France. 33 IrsiCaixa AIDS Research Institute, IGTP, Badalona, Catalonia, Spain 34 University of Vic - Central University of Catalonia, Vic, Spain. 35 Stem Cell Transplant Unit, Hospital Luis Calvo Mackenna, Santiago, Chile 36 Department of Pediatric Oncology, Childrens University Hospital Brno, Brno, Czech 37 Department of Pediatric Hematology, Oncology and Hematopoietic Stem Cell Transplantation, Ghent University Hospital, Ghent, Belgium 38 Oncohematology Service, Hospital Gregorio Marañón, Madrid, Spain 39 Consortium for the Study of Thyroid Cancer, CECaT, Barcelona, Spain 40 Department of Surgery and Urology for Children and Adolescents, Medical University of Gdansk, Gdansk, Poland 41 Department of Oncology, Birmingham Women's and Children's Hospital, Birmingham, UK 42 Graduate School of Biomedical Sciences, One Gustave L. Levy Place, Box 1022, New York NY 43 Tisch Cancer Institute, Cancer Immunology Program, Icahn School of Medicine at Mount Sinai 44 Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai 45 INSERM, UMR 1193, Paul-Brousse Hospital, Hepatobiliary Centre, Villejuif France 46 Innate Immunity Group, IGTP, Badalona, Spain 47 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain *Corresponding author: Carolina Armengol, PhD

Adress: Germans Trias i Pujol Research Institute (IGTP), Edifici Muntanya, Campus Can Ruti Childhood Liver Oncology Group (c-LOG) Ctra. de Can Ruti. Camí de les Escoles, s/n 08916 Badalona, Spain Email: [email protected]

Carrillo-Reixach et al 3

Phone: + 34 93 554 30 72 Fax: +34 93 497 86 54

Key Words: Hepatoblastoma (HB), RNA editing, BLCAP, 14q32, DLK1-DIO3 locus, integrative molecular classification, prognostic biomarker, CHKA Electronic word count: 4502 words Number of Figures and Tables: 6 Figures, 12 Supplementary Figures, 1 Table and 9 Supplementary Tables Short title: Epigenetic footprint of hepatoblastoma

CONFLICT OF INTEREST Prof. Josep M. Llovet is receiving research support from Bayer HealthCare Pharmaceuticals, Eisai Inc, Bristol-Myers Squibb and Ipsen, and consulting fees from Bayer HealthCare Pharmaceuticals, Bristol-Myers Squibb, Eisai Inc, Celsion Corporation, Eli Lilly, Exelixis, Merck, Ipsen, Glycotest, Navigant, Leerink Swann LLC, Midatech Ltd, Fortress Biotech, Sprink Pharmaceuticals and Nucleix and CANFITE. CA has a research contract with CHIOME Biosciences Inc. The other authors report no conflicts of interest in this work.

FINANCIAL SUPPORT STATEMENT This article was possible thanks to the inputs from the Instituto de Salud Carlos III, ISCIII (PI09/00751, PI10/02082, PI13/02340). The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 668596 (ChiLTERN) and grant agreement No 826121 (iPC). JCR is supported by the Catalan Agency for Management of University and Research Grants (AGAUR, 2019 FI_B01024). LT is supported by an Accelerator Award (CRUCK, AECC, AIRC) (HUNTER, C9380/A26813). DS is supported by the Gilead Research Scholar in Liver Disease. JML is supported by the European Union’s Horizon 2020 research and innovation programme (HEPCAR, 667273-2), Institució Catalana de Recerca i Estudis Avançats (ICREA), U.S. Department of Defense (CA150272P3), an

Carrillo-Reixach et al 4

Accelerator Award (CRUCK, AECC, AIRC) (HUNTER, C9380/A26813), National Cancer Institute, Tisch Cancer Institute (P30-CA196521), Samuel Waxman Cancer Research Foundation, Spanish National Health Institute (SAF2016-76390) and AGAUR (SGR-1358). CA and MRS were supported by Ramón y Cajal (RYC-2010-07249) and Miguel Servet (CPII14/00021) programs of the Ministry of Science and Innovation of Spain and ISCIII, respectively. CA, MRS and MS received funding from CIBERehd (CB06/04/0033), AGAUR (2017-SGR-490). IGTP is a member of the CERCA network of institutes. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTIONS JCR designed, performed research, analyzed all data, and wrote the results; LT designed, performed research and reviewed the manuscript; MSC designed and performed research; JAF, LR, MV, NV and SR performed research; LN, MA and MM performed bioinformatics analyses of transcriptomic, genomic and epigenetic data; BL and NA developed and performed RNA variant and editing analysis and BL, DS and NA performed expression analyses of RNA-seq data; OK, performed research; AV performed PDX experiment; SC, RK, MAB, BM and PC provided clinical samples, the associated relevant clinical and pathological information and critically review the manuscript; MDS; performed research and critically review the manuscript; AS, CS, LG, MGa, MGo, MEM, SB, GR, MLS, GG, YM, NGA, JJU, BLI, BT, MF, RLA, JAS, CP, VB, GL, CB, CG, FH, RP, HM, AC and MS provided clinical samples as well as the associated relevant clinical and pathological information; JB, DP, LS and MJ designed and performed research; MRS, designed, analyzed data, and reviewed the manuscript; JML designed research and supervised the study. CA designed research, analyzed data, supervised the whole study and wrote the manuscript.

Carrillo-Reixach et al 5

ABSTRACT Background and aims: Hepatoblastoma (HB) is a rare disease. Nevertheless, it is the predominant pediatric liver cancer, with limited therapeutic options for patients with aggressive tumors. Here we sought to increase knowledge of HB pathobiology in an effort to move towards precision medicine. To this end, we used high-throughput technologies to identify new biomarkers and therapeutic targets for HB patients with poor prognosis. Methods: We performed a comprehensive genomic, transcriptomic and epigenomic characterization of 159 clinically-annotated samples from 113 HB patients. Results: We discovered an unprecedented widespread epigenetic footprint of HB that includes hyperediting of the tumor suppressor BLCAP concomitant with a genome-wide dysregulation of RNA editing and the overexpression of mainly non-coding genes of the oncogenic 14q32 DLK1-DIO3 locus. By unsupervised analysis, we identified two epigenomic clusters (Epi-CA, Epi-CB) with distinct degree of DNA hypomethylation and CpG island hypermethylation that are associated with the C1/C2/C2B transcriptomic subtypes. Based on these findings, we defined the first Molecular Risk Stratification of HB (MRS-HB), which encompasses three main prognostic categories and improves the current clinical risk stratification approach. The MRS-3 category (28%), defined by strong 14q32 locus expression and Epi-CB methylation features, was characterized by CTNNB1 and NFE2L2 mutations, a progenitor-like phenotype and clinical aggressiveness. Finally, we identified choline kinase alpha as a promising therapeutic target for intermediate and high-risk HBs since its inhibition in HB cell lines and patient-derived xenografts strongly abrogated tumor growth. Conclusions: Our findings expand our knowledge about the molecular features of HB and may contribute to improve current clinical stratification approaches and treatments to increase the survival of patients with HB.

LAY SUMMARY Hepatoblastoma is a rare childhood liver cancer little studied so far. We have used cutting-edge technologies to expand our molecular knowledge of this cancer. Our biological findings can contribute to improve its clinical management and pave the way for the development of novel promising therapies.

Carrillo-Reixach et al 6

GRAPHICAL ABSTRACT

HIGHLIGHTS •

Hepatoblastoma (HB) presents an overall dysregulation of RNA editing including that of BLCAP.



Overexpression of a 300kb region within the 14q32 DLK1/DIO3 locus is a new hallmark of HB



We identified two epigenomic HB subtypes -called Epi-CA and Epi-CB- with distinct degree of DNA hypomethylation and CpG island hypermethylation.



The Molecular Risk Stratification of HB (MRS-HB) based on the 14q32signature and Epi-CA/Epi-CB is strongly associated with patient outcome and may complement clinical stratification



The enzyme CHKA could be a novel therapeutic target for HB patients.

Carrillo-Reixach et al 7

INTRODUCTION Hepatoblastoma (HB) is the predominant pediatric liver tumor, affecting mainly infants under 3 years of age[1]. Although its incidence has increased markedly over the last 30 years, HB is a rare disease (1.8 cases per million children per year)[2,3]. Clinical studies combining chemotherapy and efficient surgical approaches have led to dramatic improvements in the outcome of HB patients, with a 3-year event-free survival (3-yr EFS) above 80%[4]. However, clinically advanced tumors have limited treatment options, with a 3-yr EFS of only 34%[5]. Furthermore, patient survivors can suffer severe and lifelong side effects derived from chemotherapy and immunosuppression. A recent unified analysis from the Children's Hepatic tumors International Collaboration (CHIC) led to the development of a new international clinical staging system for risk stratification in children with HB[6]. However, the rarity of the disease has impaired the incorporation of molecular data into this clinical classification. In this context, there is a need to increase our understanding of the biology of this rare tumor and its prognostic determinants in order to be able to move towards biology-driven precision medicine, which includes biomarkers for therapeutic tailoring. The origin of HB is largely unknown. Most tumors are sporadic, and their extreme rarity has limited our understanding of their underlying molecular mechanisms. Regarding the genetic alterations identified to date, the most significant are activating mutations of the catenin beta 1 (CTNNB1) gene, which encodes β-catenin, in more than 70% of HBs[7]. β-catenin is a key regulator of cell fate and proliferation during liver development and regeneration. CTNNB1 mutations in cancer impair the proteosomal degradation of this protein and lead to the constitutive activation of the Wnt pathway[8]. High-throughput technologies now enable us to identify the molecular subtypes of diverse cancers and their associated oncogenic aberrations. Based on transcriptomic studies, we identified two HB subclasses —C1 and C2— that resemble late and early stages of liver development, and a discriminating 16-gene signature[9]. The recent studies led by French[10] and American[11] teams described a third HB subclass not detected by the 16-gene signature. This subclass, called C2B in the paper by Hooks et al.[10], is characterized by increased expression of epithelial-mesenchymal transition markers such as vimentin (VIM). A recent pan-cancer analysis showed that HB is the

Carrillo-Reixach et al 8

tumor with the lowest rate of somatic mutations (1-7 mutations per tumor genome)[12]. However, exome sequencing studies of HB have revealed nuclear factor erythroid 2related factor 2 (NFE2L2), a regulator of critical antioxidant and stress-responsive genes, as the second most mutated gene in ~ 10% of cases[13]. In comparison with hepatocellular carcinoma (HCC), the main liver cancer in adults, HB has more than 10fold fewer mutations. This observation thus suggests that childhood liver tumorigenesis is driven by mechanisms other than DNA mutations, such as epigenetic modifications[14]. To date, genome-wide epigenetic studies on HB are scarce and included a limited number of cases[15–17]. Through a high-throughput genomic, transcriptomic and epigenomic study of unprecedented size, here we discovered and validated a profound epigenetic footprint in HB, spanning RNA editing dysregulation to specific DNA methylation profiles linked to strong overexpression of 14q32 genes, Wnt signaling, and a progenitor-like phenotype. Based our findings which includes an updated 16-gene signature, we present the first molecular risk stratification of HB, which seeks to improve on the current clinical CHIC risk staging system, and we identify CHKA as a potential therapeutic target for patients with HB.

MATERIALS/PATIENTS AND METHODS Patients and samples. The study included 113 patients with HB (discovery set: 67 samples, 33 patients; validation set: 92 samples, 80 patients). In total, we analyzed 112 primary tumors, 3 recurrences and 44 paired non-tumor samples (Table S1). The main clinical characteristics of these patients as well as the pathological and molecular (CTNNB1 status and C1/C2 classification) features of the tumors are summarized in Table 1.

Molecular profiling. RNA-sequencing (RNA-seq), Human Transcriptome Array (HTA), CytoScan HD and methylation 850K-arrays were performed on the discovery set. The main findings were confirmed in the validation set and 5 human fetal livers. Sample-assay overlap is detailed in Table S1. The omics data generated in this study have been deposited in NCBI's Gene Expression Omnibus[18] and are available through GEO Series accession number GSE132219. Whole genome sequencing data are

Carrillo-Reixach et al 9

available under accession number PRJNA548663 at the Sequence Read Archive of the NCBI.

Additional detailed protocols are provided in Supplementary data and CTAT methods.

RESULTS Genomic profiling reveals a recurrent altered sequence of BLCAP in HB RNA-seq data were examined for nucleotide alterations that lead to amino acid changes and fusion transcripts. The study of missense changes found the same point CTNNB1 mutations as identified by RT-Sanger-Sequencing (Table S1), and revealed changes (A/G) in nucleotide (nt) positions 5, 14 and/or 44 of the Apoptosis Inducing Factor BLCAP (for Bladder cancer-associated protein) transcript in 9 cases of the discovery set (28%) (Figure 1). NFE2L2 mutations were found in 3 cases (9%). Analysis of fusion transcripts identified 15 events with perfect alignment in 12 distinct tumors (Table S2). Four of these transcripts were selected and validated in tumor samples and their corresponding patient-derived xenografts (PDX) by RT-PCR-Sanger sequencing (Figure S1). No additional tumors with these fusion transcripts were detected in the complete set (total incidence <1%), thereby ruling out their relevance for HB tumorigenesis. We analyzed the genomic profiling of the same tumors with the high-resolution CytoScan HD array. The recurrent altered chromosomal regions in HB are shown in Figure 1, overall confirming earlier SNP array or karyotype-based reports[9,11,19]. Most frequent chromosomal alterations included broad and focal copy number gains in 1q, 2q, 5p, 6, 7, 8, 12, 13q34, 15q, 17q and 20, and losses in 1p, 4q and 18p11.32 (Figure S2). The most recurrent, already reported[20], allelic imbalance involved 3.9 Mb of the 11p15 locus (13/32, 41% cases). Other additional recurrent allelic imbalances were found in 1p, 2q, 2p, 3p, 7q, 11p and 17q in 13-22% of the primary tumors, of which the last two, from our knowledge, have not been found in previous studies. Tumors from recurrences or with a HCN-NOS (hepatocellular malignant neoplasms not otherwise specified[21]) histology showed an increased number of chromosomal aberrations (Figure S2).

Carrillo-Reixach et al 10

Genome-wide dysregulation of RNA editing and BLCAP hyperediting in HB Because BLCAP RNA is an highly conserved edited transcript[22], we examined whether the nt 5 A>G substitution (which confers a Y2C change) observed in the RNAseq data is due to an editing event. RT-PCR-Sanger sequencing revealed that the nt 5 substitution was present only in the RNA and not in the DNA of the tumors (Figure 2a). This observation strongly points to the dysregulation of RNA editing in the BLCAP transcript. The nt5 editing of BLCAP was further confirmed by droplet digital PCR (ddPCR). To this end, we used probes to measure the fractional abundance (FA) of wild-type and edited nt 5 of BLCAP and found that the latter was 1.85-fold higher in tumor than in non-tumor samples (p<0.0001, Figure 2b). These findings on BLCAP prompted us to study whether RNA editing is globally disrupted in HB. Genome-wide analysis of RNA changes using RNA-seq data revealed that tumor samples had a lower overall editing index than non-tumor ones in both Alu and non-Alu regions (p<0.0001, Figure 2c). Adenosine deaminases acting on RNA (ADARs) are responsible for converting A to I in nuclear-encoded RNAs, leading to A>G substitutions[23]. We therefore studied whether there was an aberrant expression of these enzymes in HB and found a significant overexpression of both ADAR1 and ADAR2 genes in tumors as compared with nontumor samples (p≤0.0005; Figure 2d). In summary, we discovered an unprecedented dysregulation of global editing and regulatory enzymes and identified BLCAP as the first hyperedited gene in HB.

Overexpression of 14q32 genes is a new hallmark of HB By performing an unsupervised hierarchical clustering of the transcriptomic data, we identified three groups of tumors according to the co-clustering study of 12 dendrograms (Figure S3a). Co-cluster 1 (CC1) and co-cluster 2 (CC2) were enriched in C1 and C2 tumors (p=0.001) whereas the third co-cluster (CC3) composed of C2 tumors, significantly overlapped with the recently identified C2B subclass[10] (Figure S3b) which had high expression of VIM (Figure S3c). The tumor gene expression profile showed upregulation of the Wnt/β-catenin pathway, and imprinted and stem cellrelated genes (FC>2, FDR<0.001, Table S3a/b). Among the most strongly dysregulated

Carrillo-Reixach et al 11

genes in tumors, as identified by HTA (FC>30, FDR<10-7), we also found a previously undescribed, highly upregulated 330 Kb region within the DLK1 (Delta-Like NonCanonical Notch Ligand)-DIO3 (Iodothyronine Deiodinase 3) locus, spanning from 14q32.2 to q32.31, called 14q32 henceforth (Figure 3a, see Figure S4a/b for more details). This is an imprinted region with a key role in human development and cancer[24,25]. It contains more than 100 transcripts, including DLK1 (a well-known hepatoblast marker highly expressed in HB[9,26]), MEG3 and MEG8 (two maternally expressed non-coding genes), small nucleolar RNAs of the C/D box family (namely SNORD113 and SNORD114), and the largest microRNA cluster in the human genome (Figure 3a). Tumor overexpression of 14q32 genes was further validated by Gene Set Enrichment Analysis (GSEA) (Figure 3b, Table S4). Hierarchical clustering based on the gene expression profile of all the genes localized at 14q32 showed two main groups of tumors, thus revealing variability among HBs (Figure 3c). This heterogeneity was observed at the level of tumor/non-tumor FC expression of 14q32 genes and in the number of tumor overexpressed genes (Figure S4c). Among these genes, four (DLK1, MEG3, SNORD113-3, SNORD114-22) were selected to classify tumors on the basis of the degree of 14q32 gene expression (strong/moderate), and they are referred to as the 14q32-gene signature hereafter (Figure S4d). Strikingly, the resulting 14q32 classification was also associated with mutations in the Wnt/β-catenin pathway (p<0.0001, Figure 3c). We assessed the mRNA expression of 14q32 in the validation set and confirmed the overexpression of 14q32 genes. Its correlation with the Wnt/β-catenin pathway activation was also confirmed in the validation set by measuring LGR5, a well-known wnt/β-catenin target gene[27] (Figure S5). Moreover, the study of fetal liver samples also indicated an elevated expression of 14q32 genes, thereby reinforcing the idea that HB recapitulates pathological and molecular features of developing livers[9,28]. To gain insight into the possible mechanisms conferring strong 14q32 gene expression in HB, we examined the 14q32 region at the genomic and epigenomic level. Since 14q32 gene overexpression has been previously linked to adeno-associated virus integration in this locus and hepatocarcinogenesis[29,30], we used whole genome sequencing to search for viral integration in a tumor with a strong 14q32-gene signature. No viral integration site was detected (Table S5, Figure S6). Neither did the CytoScan HD array reveal focal chromosomal rearrangements. In contrast, the comparison of

Carrillo-Reixach et al 12

tumor and non-tumor methylation profiles using the 850K-array identified 32 significant differently methylated CpGs localized at the 14q32 locus with predominant tumor DNA hypomethylation (FDR<0.05), specifically, the methylation levels of a CpG (cg02412314) localized within the intragenic MEG3 region showed a strong correlation with the mean gene expression of the 14q32 region (r=-0.61, p<0.0001, Figure 3d). Interestingly, tumors with a strong 14q32-gene signature displayed lower methylation of the 32 CpGs than tumors with a moderate signature (p=0.0082, Figure 3e). Moreover, fetal liver samples showed the lowest levels of 14q32 methylation as compared with tumor samples (Figure 3e). Identification of two distinct epigenetic profiles in HB Next, we used the 850K-array data to extend our methylation study to the complete genome. Principal component analysis of the methylation data showed that tumor samples were clearly distinct from non-tumor ones (Figure 4a). In general, tumors were characterized by genome-wide DNA hypomethylation (p<0.0001). The supervised analysis comparing tumor and non-tumor samples identified 30,165 differently methylated CpGs regulating 7,234 genes (|∆β|> 0.20, FDR<0.0001), including the hypermethylation of RASSF1 promoter[31]. Using HTA and RNA-seq to associate these data with gene expression, we found 21 hypermethylated (∆β>0.20, FDR<0.0001) genes in tumors with concomitant reduced gene expression (FC<- 2, FDR<0.05) (Table S6). Among them, we recognized genes endowed with tumor suppressor functions such as AKR7A3[32], EDNRB[33], ESRP2[34], PEMT[35] and PER3[36]. The unsupervised analysis showed separate clusters of tumor and non-tumor samples (p<0.0001). The epigenetic clustering also disclosed two distinct tumor clusters, which we called Epigenetic-Cluster A and Epigenetic-Cluster B (Epi-CA and Epi-CB) (Figure 4b) which were not associated with the 16-gene C1/C2 classification[9] (p=0.6882) but were strongly associated with our CC1/CC2/CC3 transcriptomic co-clusters (p≤0.0005) and with the Hooks signature[10] (p<0.005). Moreover, Epi-CB cluster, which was enriched with tumors of the C2 subtype, exhibited constitutive activation of Wnt/βcatenin signaling (p=0.0391) and a strong 14q32-gene signature (p=0.0010) (Figure 4b). The study of the methylation profiles between the two tumor clusters revealed that EpiCB tumors had a sharp global hypomethylation compared to Epi-CA tumors in all epigenomic structures, except for CpG islands, which were hypermethylated (Figure

Carrillo-Reixach et al 13

4c). We next studied the impact of this specific CpG island hypermethylation on the transcriptome of Epi-CB tumors and identified KLF6, ITGB3, NFIC, TRANK1 and TSPYL5 as possible tumor suppressor genes, as the hypermethylation of their CpG islands (β>0.2 and FDR<0.0001) was associated with a switch-off of their expression (RNA-seq/HTA: FC<-2 and FDR<0.001, Figure 4d). Next, we sought to investigate whether the dysregulation of methylation observed in HB was associated with changes in the expression of the enzymes regulating DNA methylation. The expression of tet methylcytosine (TET) family genes -specifically TET1 and TET3- involved in DNA demethylation, was significantly higher in tumors as compared with non-tumors and correlated with the degree of hypomethylation (Figure S7a). Therefore, Epi-CB tumors with strong global hypomethylation had significantly higher levels of TET1 and TET3 than Epi-CA tumors (p<0.0025). Similarly, the expression levels of DNA methyltransferases (DNMT1, DNMT3A and DNMT3B) were higher in tumors than nontumor samples (p<0.0001) and mainly in Epi-CB tumors characterized by CpG island methylation. Moreover, expression of DNMTs was also correlated with high level of CpG island methylation (Figure S7b). As we reported that C1/C2 subtypes resembled late and early liver developmental stages[9], we examined whether the two distinct tumor methylation profiles mimicked also the different methylation profiles of adjacent non-tumor and fetal liver samples based of different ages. In line with our previous findings[9], we observed that global methylation value of Epi-CB tumors, enriched with C2 tumors, was similar to that of early embryonal/fetal phases of liver development at 8.4 ± 7.7 weeks of gestation, while the Epi-CA tumors, enriched with C1 and C2B tumors, had a global methylation value similar to that of late fetal or postnatal liver phases at ~ 5 weeks after birth (Figure 4e).

Molecular risk stratification of HB To address the relevance of our molecular findings for the clinical setting, we studied their association with clinical parameters in the whole set of 113 patients (77% CTNNB1 mutations, 4% NFE2L2 mutations, 25% BLCAP nt 5 hyperediting, 63% strong 14q32-gene signature, 33% Epi-CB, and 44% C2; Table S7). Moreover, we measured VIM expression in order to determine its impact on our previous 16-gene signature[9] and defined an updated 16+VIM-gene signature that classified the C2 tumors as either

Carrillo-Reixach et al 14

C2B[10] (11%) or C2-Pure (34%) on the basis of high or low levels of VIM, respectively. The association of these molecular features with clinical data revealed that the losses of chromosome 4q or 18 and the 17q11.2 allelic imbalance where it is localized the tumor suppressor NF1 were associated with poor prognostic parameters (see more details Table S8). Moreover, patients with tumors with a strong 14q32-gene signature or classified as Epi-CB or C2-Pure had a poorer outcome than those with tumors with a moderate 14q32-gene signature or classified as Epi-CA or C1/C2B (Figure S7). On the contrary, CTNNB1, VIM and BLCAP editing were not associated with any parameter of poor outcome. Next, on the basis of the presence of the novel biomarkers of poor prognosis, we defined the first Molecular Risk Stratification of HB (MRS-HB) (Figure 5a). The lowrisk category (MRS-1) included tumors without any biomarker of poor prognosis (i.e., a moderate 14q32-gene signature and Epi-CA) and was enriched for wild-type CTNNB1 tumors (p=0.012). Tumors in the intermediate-risk category (MRS-2) were defined by having only one biomarker (i.e., a strong 14q32-gene signature or Epi-CB), whereas those in the high-risk category (MRS-3) had two poor prognostic biomarkers (i.e., a strong 14q32-gene signature and Epi-CB) and were enriched for NFE2L2 mutations (p=0.005). Kaplan-Meier survival curves showed that the 3-yr EFS was 91%, 82%, and 52% for patients with MRS-1, MRS-2 and MRS-3 tumors, respectively (p<0.0001, Figure 5b). To identify the most aggressive tumors, we integrated the 16+VIM-gene signature to the MRS and subdivided the high-risk category (MRS-3) into MRS-3a (C1 and C2B) and MRS-3b (C2-Pure), in which the latter defined as a very high-risk category had only 37% of 3-yr EFS (p<0.0001, Figure 5b). Importantly, the multivariate analysis indicated that this novel risk stratification based on molecular parameters is an independent prognostic factor of the current clinical CHIC-HS stratification[6] (Figure 5c and 5d). Accordingly, the combination of the clinical and molecular staging systems (Figure 5e) resulted in improved performance at discriminating low- and high-risk patients (p<0.0001, Figure 5f).

CHKA as a new therapeutic target for intermediate and high-risk HBs To identify therapeutic targets for aggressive HBs, we performed a supervised analysis comparing the three main molecular risk categories. Among the 392 differentially

Carrillo-Reixach et al 15

expressed genes (FDR<0.0001, Table S9), we observed overexpression of 14q32 transcripts (DLK1, MEG3, SNORD113-4 and SNORD114-13) and liver progenitor markers (GPC3, KRT19, AFP, EPCAM) in tumors belonging to the high-risk category (MRS-3) compared to tumors in the low-risk (MRS-1), and to a lesser extent, to those of intermediate-risk (MRS-2) categories. The most widely overexpressed coding gene in high-risk, but also in the intermediate tumors, was CHKA (Table S9, Figure 6a), which is the main regulator of the biosynthesis of phosphatidylcholine via the CDP-choline pathway, which plays a key role in regulating cell growth and carcinogenesis[37]. The differential expression of CHKA between MRS categories (Figure 6b), as well as proliferation (Ki67), 14q32 (DLK1) and liver progenitor (EpCAM, GPC3, KRT19, AFP) markers was also seen by immunohistochemistry (IHC) in 20 tumors (Figure S9). To address whether CHKA could be used as a therapeutic target for HB, we tested the anti-tumoral ability of two CHKA inhibitors (MN58b and TCD-717)[38] at growing concentrations (2-8 µM) in two HB cell lines, HepG2 and Huh6. Both CHKA inhibitors exherted a dose-dependent reduction of cell viability in the two cell lines (p=0.0005 and p<0.0001, respectively) (Figure 6c). Similarly, MN58b and TCD-717 completely inhibited colony formation in HepG2 and Huh6 cells (p<0.0001, Figure 6c). We also investigated the anti-tumoral effects of silencing CHKA gene expression via siRNA in the Huh6 cell line, which has the lower expression of this enzyme. After depleting CHKA in ~ 4-fold change, cell viability of Huh6 cells was significantly reduced by ~15% (p<0.0001; Figure 6d). Next, we assessed the in vivo anti-tumor effects of CHKA inhibition using MN58b in a PDX established from a high-risk HB whose CHKA mRNA and protein levels are representative of intermediate and high-risk tumors as well as remaining PDXs (Figure S10). Interestingly, CHKA inhibition fully abrogated tumor growth throughout treatment as compared with the control arm (vehicle) (p=0.028, Figure 6e). The IHC study revealed that MN58b-treated tumors showed a significantly lower proliferation rate, as determined by cyclin D1 (CCND1) and Ki67 as well as a common trend to revert the tumor progenitor-like phenotype compared with vehicle-treated PDXs (Figure S11). In addition, MN58b-treated tumors showed a significant increase of necrotic areas and a trend of having higher levels of the active form of caspase-3 than the tumors of the control arm (Figure S12).

Carrillo-Reixach et al 16

DISCUSSION Through comprehensive molecular profiling, here we identified an unprecedented widespread epigenomic footprint of HB that includes RNA editing dysregulation, overexpression of mainly non-coding genes in the oncogenic 14q32 DLK1-DIO3 locus, and two distinct epigenomic tumor profiles that associate with the transcriptomic C1/C2/C2B subtypes previously defined by Cairo-Armengol et al.[9] and Hooks et al. [10]. The integration of these epigenetic hallmarks together with an updated 16-gene signature allowed us to develop the first molecular risk stratification of HB, which improves on current clinical patient risk classification, and to identify CHKA as a potential therapeutic target for patients with HB. Our study revealed a genome-wide dysregulation of RNA editing in HB for the first time. RNA editing is a widespread epigenetic mechanism that confers specific nucleotide changes on RNA transcripts without altering the sequence of genomic DNA; thereby contributing to transcriptomic diversity in normal but also cancer cells[39]. The functional impact of RNA editing on cell biology ranges from protein recoding to alterations in alternative splicing, miRNA specificity and RNA stability[40]. Here, we identified a global hypo-editing, a specific hyper-editing of BLCAP, a highly conserved gene with potential tumor suppressor functions[22,41,42], and also an imbalance in the expression of ADAR enzymes; the main regulators of RNA editing[23]. Our findings agree with the observed dysregulation of RNA editing across different cancer types with over- and under-editing patterns relative to non-tumor samples[39,40,43]. Similar to our data, global RNA editing dysregulation in cancer has also been associated with the hyperediting of key genes[40]. In that regard, increased hyperediting of BLCAP has been already reported in HCC[44] and to a minor degree in brain, cervical, oral cavity and lung tumors[45,46]. Previous investigations revealed that BLCAP editing could affect the functions of several binding proteins such as RB[47] or STAT3[48]; thereby, influencing proliferation and apoptotic signalling pathways. In HCC, experiments performed in SMMC_7721 and Focus liver cancer cell lines revealed that nt 5 editing confers a growth advantage, modulating the activation of AKT/mTOR signaling[44]. Overall, our study opens up an unexplored field in HB related to RNA editing and

Carrillo-Reixach et al 17

future studies will need to clarify the functional effects of BLCAP editing in these tumors. The second epigenetic alteration discovered here pertains to the pronounced overexpression of mainly non-coding genes localized in a small 14q32 region of the DLK1-DIO3 locus in almost all the HBs examined and that is highly correlated with their degree of methylation. In addition to DLK1, a well-known hepatoblast marker overexpressed in HB[9,26], this locus is characterized by a cluster of imprinted genes whose altered dosage is associated with developmental defects and liver oncogenesis[24,29,49]. Interestingly, the expression of the DLK1-DIO3 locus has been proposed as a marker of induced pluripotent stem cells, thereby supporting its role in early development[50]. In agreement with these previous studies, our data suggest a fine-tuned regulation of the 14q32 region during liver development since its transcripts were highly expressed in fetal livers but strongly repressed in postnatal ones. In that regard, we found that HBs present an aberrant expression of 14q32 genes, an observation that supports the notion that these genes are involved in hepatic tumorigenesis. The oncogenic role of 14q32 genes in the liver was initially identified through research into the mechanisms involved in the spontaneous development of HCC in mice treated with adeno-associated viruses (AAVs)[29]. Moreover, the genetargeting frequency of this locus by AAVs was shown to be sufficient to initiate multiple foci of HCC in mice characterized by Dlk1-Glt2 overexpression linked to CpG hypomethylation[51]. In agreement with these experimental data, two independent studies reported the overexpression of 14q32 genes in a subset of 6-19% of HCC patients with poor prognosis[49,52]. In line with these studies on HCC, we found that the overexpression of 14q32 genes is linked to high expression of liver progenitor cell markers and Wnt/β-catenin targets and that it influences the survival of patients with HB. Collectively, our study pinpoints the overexpression of 14q32 genes of the DLK1DIO3 locus as a novel oncogenic hallmark of HB. Dysregulation of DNA methylation might be considered a third epigenetic hallmark of HB. DNA methylation plays an important role in cell differentiation and cancer, influencing the regulation of gene expression networks[53,54]. The genome-wide DNA hypomethylation that we found in HB is consistent with the findings of previous reports[15–17]. As a novelty, we discovered two distinct epigenetic profiles in HB

Carrillo-Reixach et al 18

based on the degree of DNA hypomethylation and CpG island hypermethylation. We also associate these epigenomic traits with the previously defined C1/C2/C2B molecular subclasses[9,10]. Our results support the strong interplay between the epigenome and the transcriptome for determining distinct tumor molecular entities. Moreover, we investigated the impact of CpG island hypermethylation, an additional level of epigenetic dysregulation in Epi-CB tumors belonging mainly to the C2 subtype, and identified novel tumor suppressor candidates whose expression was strongly repressed in aggressive HB, which could be explored in further functional studies. In an attempt to translate our findings into the clinical setting, we propose the first Molecular Risk Stratification system, called MRS, for HB based on the presence of the two novel prognostic biomarkers identified in the current study (i.e. 14q32-gene signature and Epi-CA/B). Of note, the prognostic impact of the MRS is improved by incorporating the updated 16-gene signature described here that includes VIM expression to distinguish the recently reported C2B subclass[10]. The use of the MRS compared to our previously published 16-gene signature is that it improves its performance at discriminating patients according to their prognosis. This could be explained arguing that, the fact that MRS integrates both epigenetic and transcriptomic classifiers allows a better representation of the molecular complexity of HB. Moreover, similar to the clinical CHIC stratification[6], the integration of multiple molecular prognostic factors may have an additive effect in terms of risk prediction. To note that by combining the clinical CHIC and the molecular MRS systems, we have been able to further improve patient risk prediction. In this regard, our findings highlight the importance of incorporating molecular factors into the clinical setting, thereby facilitating future precision medicine. The main limitations of the current study lies in the use of a retrospective cohort of patients treated with different chemotherapeutic protocols and the study of post-chemotherapy specimens. Thus, the implementation of our findings into the clinical setting requires a further validation in diagnostic biopsies from homogeneously-treated patients and probably the definition of a new algorithm integrating clinical and molecular parameters; the prospective cohort of patients enrolled in the ongoing Paediatric Hepatic International Tumour Trial (PHITT, NCT03017326) provides a unique opportunity to conduct such validation. Finally, we identified CHKA as a novel potential therapeutic target for HB patients. CHKA, a key gene for membrane biosynthesis, overexpressed in different neoplasms

Carrillo-Reixach et al 19

such as breast, lung, prostate, and HCC[55–58]. The complete abrogation of tumor growth that we observed in vitro and in vivo using two HB cell lines and a PDX model is achieved by an inhibition of proliferation and induction of cell death. Of note, CHKA has been proposed as a therapeutic target in different tumor types[58,59] and TCD-717 has already been evaluated in a phase I clinical trial (NCT01215864) in advanced solid tumors. Our findings thereby support further attention to CHKA inhibition as a potential therapy for HB patients. The similarities observed between HB (i.e. BLCAP hyperediting, 14q32 locus overexpression, CHKA overexpression) and HCC also point to common underlying mechanisms between the main hepatic tumors in childhood and adulthood and shed light on overlapping molecular mechanisms and common therapeutic approaches. In summary, our data provide novel epigenetic insights into HB and establish the rationale to advance towards precision medicine by identifying new biology-driven therapies and incorporating molecular data into patient stratification.

ACKNOWLEDGEMENTS We thank patients and their families for allowing the collection of biological samples. We also would like to acknowledge the support of the Spanish Group for the Study Of Childhood Liver Cancer of the Spanish Society of Paediatric Haematology and Oncology (SEHOP) and the clinical staff of Spanish hospitals that provided us with samples and clinical data (Maciej Murawski, Marta Villa, Sandra Pisa, Montserrat Torrent, Pablo Trujillo, Ester Anton, Jose Andres Molino, Hermenegildo González, Mª Esther Llinares, José Luis Fuster and Isabel Ojanguren). We thank the IGTP Biobank and Genomic Units and the HUVR-IBiS Biobank (Andalusian Public Health System Biobank and ISCIII-Red de Biobancos PT17/0015/0041) for the assessment and technical support provided. We also thank Dr. Eduard Cabré for his advice on statistical analysis. We acknowledge the personnel at the Genome Analysis Platform of CIC bioGUNE and CIBERehd for their help on experimental design and for processing the methylation Beadchips. We acknowledge Koji Nakamura (CHIOME Biosciences Inc) for kindly providing DLK1 antibody. We thank Ugne Balaseviciute for her assistance in experimental procedures.

Carrillo-Reixach et al 20

Carrillo-Reixach et al 21

TABLE Table 1. Main clinical and pathological features of the 113 HB patients included in the study.

Carrillo-Reixach et al 22

FIGURE LEGENDS

Figure 1. The genomic landscape of Hepatoblastoma (HB). a, Main gene mutations, fusion transcripts and chromosomal alterations in the 34 tumor (T) samples from the 32 patients (discovery set) ranked by their frequency (right plot) and considering only one sample per patient. CTNNB1 and NFE2L2 mutations and BLCAP nucleotide (nt) 5 change were identified by RNA-seq. Only copy number alterations and allelic imbalances identified by the CytoScan HD array in at least four primary tumors are shown. HBs with a HCN-NOS histology are marked with an asterisk. b, Total number of main aberrations in the above tumor samples. X-axis indicates the individual tumors; Y-axis indicates the number of total gene mutations (black), copy number alterations (gains, blue; losses, red) and allelic imbalances (pink) per sample. R, recurrence. Figure 2. Dysregulation of RNA editing in HB. a, Chromatogram of DNA and RNA Sanger sequences of the BLCAP gene in tumor (T) and non-tumor (NT) samples of a representative case with RNA editing of nucleotide (nt) 5 (highlighted in yellow). The black arrow indicates the ATG start codon. b, Fractional abundance (FA) of nt 5 edited vs. non-edited BLCAP assessed by droplet digital PCR (ddPCR) in the 31 paired T and NT samples (discovery set) for which RNA was available (paired t-test). c, Global editing index in the 32 cases of the discovery set determined by RNA-seq (paired t-test). d, Gene expression of ADAR1 and ADAR2 genes in T and NT samples (Mann Whitney test). HTA plot includes data of 18 NT and 32 T samples; RNA-seq plot includes data of 32 NT and 32 T samples. Gene expression is given in normalized arbitrary units (HTA array) or counts (RNA-seq). Figure 3. Overexpression of genes of the 14q32 DLK1-DIO3 locus is a hallmark of HB. a, Top, scheme of chromosome 14 in which the overexpressed 14q32 region in HB is marked with a red square. Middle, detail of the overexpressed 14q32 genes of the DLK1-DIO3 locus. Bottom, the X-axis indicates the gene chromosomal localization from 14q32.13 to 14qter. Y-axis is the mean tumor (T)/non-tumor (NT) fold-change (FC) of the genes of the 14q32 region present in the HTA array in normalized arbitrary units. The box and whiskers represent the 25th to 75th percentiles and ± min to max values, respectively, of log2 gene expression of 32 tumors. b, GSEA enrichment plot using the 96 genes of the HTA array out of the total of 101 genes presentin the 14q32

Carrillo-Reixach et al 23

region (chr14:101193164-101539274) based on the Genome Browser Database hg19. c, Unsupervised clustering and heatmap of the 133 HTA array probesets (right) at the overexpressed 14q32 chromosomal region of 32 T and 18 NT samples. d, Left, heatmap of the methylation degree of the 32 significant CpGs (FDR<0.05) localized in the overexpressed 14q32 region. The color range in the heatmap indicates the methylation degree of each CpG. Right, correlation between mean gene expression of 14q32 genes (133 probesets) and the methylation (β-value) of the most significant 14q32 CpG, which is localized in a MEG3 intronic region (Spearman test). e, Mean methylation levels (βvalue) of all 568 CpGs localized in the 14q32 region in the three different sample types (NT, n=19; T with moderate and strong 14q32-gene signature, n=12 and n=15, respectively; fetal liver samples, n=5). P-values were calculated using the ANOVA test with the Tukey post-test. Figure 4. Identification of two epigenetic HB subtypes resembling the transcriptomic C1/C2 subclasses. a, Top, Principal Component Analysis (PCA) of tumor (T, n=28) and non-tumor (NT, n=19) samples using normalized methylation data of the 685,375 CpGs of the 850K-array obtained after filtering those probes containing single nucleotide polymorphisms. Bottom, mean global methylation levels (β-value) of the same samples (unpaired t-test). The box and whiskers represent the 25th to 75th percentiles and ± min to max values, respectively. b, Representative unsupervised clustering (Euclidean Ward method) of same CpGs used for the PCA. P-values indicate the associations between the listed molecular features and the two tumor epigenetic clusters, Epi-CA and Epi-CB. P-values were calculated using Fisher’s exact and #Chisquare tests. c, Methylation levels of the epigenetic substructures in NT (n=19), EpiCA (n=13) and Epi-CB (n=13) tumors. The box and whiskers represent the 25th to 75th percentiles and ± min to max values, respectively. P-values were calculated using the ANOVA test with the Tukey post-test (*p<0.05; **p≤0.001, ***p<0.0001). d, Representation of the five most strongly repressed genes by CpG island hypermethylation in Epi-CB tumors (RNA-seq/HTA criteria: FC<-2 and FDR<0.001; 850K-array criteria: β>0.2 and FDR<0.0001). The X-axis indicates methylation levels of the most hypermethylated CpGs islands for each gene and the Y-axis the linear gene expression levels (HTA). The grey shadow indicated the low CpG island methylation levels associated to high gene expression. e, Global methylation levels (Y-axis) of 19 NT (white dots) and 5 FLs (green dots) at different gestational and postnatal ages (X-

Carrillo-Reixach et al 24

axis). The light and dark purple shadows indicate the 25th and 75th percentiles of the methylation levels of Epi-CA and Epi-CB tumors, respectively, and their extrapolation over time. Note the plateau of methylation levels at ~ 12-24 months of age. Red, exponential curve.

Figure 5. Molecular Risk Stratification of HB (MRS-HB). a, Top, representation of the MRS-HB categories for the 106 patients from which all biomarkers could be assessed. Patient stratification was based on transcriptomic and epigenetic classifiers (right): 14q32- and 16+VIM-gene signatures and epigenomic Epi-CA/Epi-CB classification. Bottom, main genomic, pathologic and clinical features and their association with the MRS (right, Chi-Square test). b, Event-Free Survival (EFS) Kaplan-Meier plots of the same patients stratified according to the MRS-HB into three (left) or four (right) categories. Vertical line indicates 3-year EFS probability (in %). c, Multivariate Cox regression analysis comparing MRS with clinical CHIC-HS Stratification[6]. CI, confidence interval; HR, hazard ratio. d, EFS Kaplan-Meier plots of the same patients with HB classified following the CHIC-HS stratification (VL/L, Very Low and Low; L, Low; I, Intermediate; H, High risk). e, Scheme used to combine CHIC-HS and MRS-HB classifications. f, EFS Kaplan-Meier plots of the 106 HB patients classified by combining clinical and molecular risk stratification systems. Figure 6. The effect of inhibition of Choline Kinase A (CHKA) in HB. a, CHKA gene expression assessed by HTA and RNA-seq in the different risk molecular risk categories (MRS-1, n=11; MRS-2, n=8; MRS-3, n=13)╪ b, Left, Images of CHKA immunostaining for representative tumors according to MRS. Right, quantification of stained areas of CHKA immunohistochemistry in formalin-fixed paraffin-embedded tumor samples (MRS-1, n=9; MRS-2, n=6; MRS-3, n=5)╪. c, Cell viability assay (MTT)╪ (top) and colony-formation assay╪ (middle) of HepG2 (left) and Huh6 (right) cells treated by at different concentrations of CHKA inhibitors (MN58b and TCD-717) for 48h. 0 µM of TCD-717 was prepared with DMSO at the same concentration of 8 µM of TCD-717. Representative colony-formation assay images of the different conditions (bottom). Data given are from a minimum of three independent experiments. P-values were calculated versus control. d, CHKA gene expression♯ assessed by RTqPCR of control (Sham siRNA, white) and CHKA knock-down (CHKA-siRNA, black) in Huh6 cells. Cell viability assay (MTT) assay╪ of the same cells at different time

Carrillo-Reixach et al 25

points of culture. Data given are from four independent experiments. P-values were calculated versus control. e, Tumor growth curve of a patient-derived xenograft (PDX) established from a high-risk tumor (MRS-3, case T50) following intraperitoneal injection of vehicle (phosphate buffered saline, control; n=5) or 3 mg/kg/day of CHKA inhibitor (MN58b; n=6). P-value was calculated using the two-tailed t-test with Welch’s correction at 26 days of treatment. Other statistical analysis performed were ANOVA test with the Tukey post-test╪ and unpaired t-test♯ (*p<0.05; **p≤0.001, ***p<0.0001).

SUPPLEMENTARY FIGURES Figure S1. Validation of fusion transcripts by RT-PCR-Sanger. a, Electrophoresis images of the RT-PCR products of the identified fusion transcripts in the tumor (T) and their corresponding patient-derived xenograft (PDX). b, Chromatogram of the RNA Sanger sequences of the identified fusion transcripts. NT, non-tumor. Telomeric repeat binding factor 2 (TERF2), AU RNA binding methylglutaconyl-CoA hydratase (AUH), DnaJ heat shock protein family (Hsp40) member C15 (DNAJC15), tumor protein, translationally controlled 1 (TPT1)¸ transmembrane protein 163 (TMEM163), solute carrier family 25 member 16 (SLC25A16). Figure S2. HB genomic profile. Top, genomic profile of copy number alterations (a) and allelic imbalances (b) in the 34 tumor (T) samples from 32 patients (discovery set) including 3 recurrences (R). Bottom, frequencies of the chromosomal aberrations in the 31 primary tumors, in the 2 HBs with a histology of hepatocellular malignant neoplasms not otherwise specified (HCN-NOS) and in 3 recurrences. Note that tumor samples T19 and R19 from a same patient with a family history indicative of familial adenomatous polyposis have a deletion at 5q22.2-q31.1, which leads to a truncation in the APC gene. Figure S3. Unsupervised transcriptomic co-clustering analysis of HB. a, Representation of the co-clustering index matrix at K=3 of 32 tumor and 18 non-tumor samples considering the unsupervised analysis obtained using 3 clustering methods (average, complete and ward D2) and 4 top-ranked coefficients of variation (CV) gene lists (60%, 80%, 95% and 97.5%). The black squares indicate the three co-clusters (CCs) found. b, Unsupervised clustering of the 4-gene signature reported by Hooks et al[10] in the 32 tumor samples classified according the three co-clusters. *P-values were

Carrillo-Reixach et al 26

calculated using Chi-square test. #P-values were calculated using Fisher’s exact test. c, Representation of VIM expression in normalized arbitrary units by HTA data (left) and RNA-seq (right) in the three co-clusters found above (CC1, CC2, CC3). P-values are calculated using the non-parametric Kruskal-Wallis test with the Dunn’s multiple comparisons post-test. Figure S4. 14q32 gene overexpression and definition of the 14q32-gene signature in the 32 HB patients of the discovery set. a, Representation of the 14q32 mean gene expression (HTA, 133 probe sets of 97 genes; RNA-seq: 110 genes) in 18 paired tumor (T) and non-tumor (NT) liver samples (left, HTA normalized data) and 32 paired T and NT liver samples (right, RNA-seq normalized counts). P-values were calculated using the Wilcoxon matched-pairs signed-rank test. b, Expression of the indicated genes included in the 14q32-gene signature for the same samples using HTA-array (left) and RNA-seq (right). P-values were calculated using the Wilcoxon matched-pairs signedrank test. c, Left, representation of tumor (T)/non-tumor (NT) fold-change (FC) of the mean expression of all 97 genes localized in the 14q32 region in the 32 tumor samples (one sample per patient) of the discovery set in normalized arbitrary units (using HTA data). Right, number of 14q32 genes with T/NT FC ≥ 10 in the same tumors. d, Representation of strong expression (black square, T/NT FC ≥ 10) of the four selected 14q32 genes in the 32 tumor samples and their classification on the basis of the 14q32gene signature. Cases with a moderate 14q32-gene signature were defined when none, one or two of the 14q32 genes were overexpressed (T/NT FC ≥ 10), whereas cases with strong 14q32-gene expression were defined when three or four of these transcripts were overexpressed using the same criteria. Figure S5. Validation of 14q32 gene overexpression in the 80 HB patients of the validation set. a, Gene expression (–∆∆Ct) of the four transcripts of the 14q32-gene signature in non-tumor (NT, n=11), tumor (T, n=80), and fetal liver (FL, n=5) samples of the validation set. P-values were calculated using the non-parametric Kruskal-Wallis test with the Dunn’s multiple comparisons post-test. b, Correlation of the expression (in –∆∆Ct) of the four genes of the 14q32-gene signature with the expression of LGR5, a well-known wnt/β-catenin target gene[27]. Coefficient of correlation and p-value were calculated using the Spearman test. c, Representation of strong expression (black square, T/NT FC ≥ 10) of the four selected 14q32 genes in the 80 tumor samples and their classification on the basis of the 14q32-gene signature. Cases with a moderate

Carrillo-Reixach et al 27

14q32-gene signature were defined when none, one or two of the 14q32 genes were overexpressed (T/NT FC ≥ 10), whereas cases with strong 14q32-gene expression were defined when three or four of these transcripts were overexpressed using the same criteria. d, Contingency plot of 80 samples with mutated (grey) and wild type (white) CTNNB1. P-value was calculated using Fisher’s exact test. e, Box-plot representation of LGR5 gene expression (in –∆∆Ct) according to the 14q32-gene signature (moderate, n=29; strong n=51). Each box represents the 25th–75th percentile range, the horizontal bar indicates the median, and the whiskers represent min to max. P-value was calculated using the unpaired t-test. Figure S6. Virus integration analysis. Scheme and results of the pipeline used to detect possible virus integration events from whole genome sequencing data of a tumor with a strong 14q32-gene signature (T31). Figure S7. Epigenomic dysregulation in HB is correlated with an aberrant expression of epigenetic regulators. a, Gene expression by HTA of the non-tumor and Epi-CA/B tumor samples of tet methylcytosine (TET) and DNA methyltransferases (DNMT) genes (p-values were calculated using the ANOVA test with the Tukey posttest). The box and whiskers represent the 25th to 75th percentiles and ± min to max values, respectively. b, Left, Linear correlation of gene expression of TET with global methylation levels (mean of all β-values) global. Right, DNMT genes and CpG island methylation levels (mean CpG island β-values). P-values were calculated using the Spearman test. Figure S8. Survival analysis of the transcriptomic and epigenomic classifiers. Kaplan-Meier curves of Event-Free Survival (EFS) on the basis of the 14q32-gene signature (moderate/strong), Epigenetic classification (Epi-CA/Epi-CB), 16-gene classification (C1/C2) and 16+VIM-gene classification (C1/C2A/C2B, see more details in Methods). Figure S9.

Proliferation and liver progenitor markers in MRS tumors.

Immunohistochemistry images of proliferation (Ki67), liver progenitor (EpCAM, GPC, KRT19 and AFP) and 14q32 (DLK1) markers in seriate sections of representative primary tumors of the 3 MRS categories, 40× magnification. Right, quantification of stained areas of 20 tumors (MRS-1, n=9; MRS-2, n=6; MRS-3, n=5). The box and

Carrillo-Reixach et al 28

whiskers represent the 25th to 75th percentiles and ± min to max values, respectively and the p-values were calculated using the ANOVA test with the Tukey post-test. Figure S10. CHKA expression in the Patient-Derived Xenografts (PDXs). a, CHKA gene expression of the 32 primary tumors classified according to MRS and the PDXs available (in red is marked the PDX model T50). P-values were calculated using the ANOVA test with the Tukey post-test. b, Representative images at 40× of CHKA immunohistochemistry of the selected PDX (right) and the primary tumor T50 from which it was derived (left). c, Representative images at 40x of CHKA immunohistochemistry of the panel of 12 different PDXs. Figure S11. Proliferation and liver progenitor markers in vehicle- and MN58btreated PDXs. Left, Immunohistochemistry images at 40× of proliferation (Ki67 and CCND1), liver progenitor (EpCAM, GPC, KRT19 and AFP) and 14q32 (DLK1) markers in representative PDXs treated with control (vehicle) or MN58b. Right, quantification of IHC-stained areas for the indicated markers in control (n=4) and MN58b-treated (n=6) PDXs from which tissue material was available (unpaired MannWhitney test). The box and whiskers represent the 25th to 75th percentiles and ± min to max values, respectively and the p-values were calculated using the Mann-Whitney test. Figure S12. Necrosis and apoptosis markers in vehicle- and MN58b-treated PDXs. a, Left, H&E images at 40x of a representative PDXs treated with control (vehicle) or MN58b. Right, quantification of the necrotic area in control (n=4) and MN58b (n=6)treated PDXs. P-value are calculated using unpaired t-test. b, Left, IHC images of the active caspase-3 at 40x of a representative PDXs treated with control (vehicle) or MN58b.

REFERENCES [1]

Schnater JM, Köhler SE, Lamers WH, Von Schweinitz D, Aronson DC. Where do we stand with hepatoblastoma? A review. Cancer 2003;98:668–78. https://doi.org/10.1002/cncr.11585.

[2]

Linabery AM, Ross JA. Trends in childhood cancer incidence in the U.S. (1992-2004). Cancer 2008;112:416–32. https://doi.org/10.1002/cncr.23169.

[3]

Kehm RD, Spector LG, Osypuk TL, Poynter JN, Vock DM. Do pregnancy

Carrillo-Reixach et al 29 characteristics contribute to rising childhood cancer incidence rates in the United States ? 2018:1–9. https://doi.org/10.1002/pbc.26888. [4]

Aronson DC, Czauderna P, Maibach R, Perilongo G, Morland B. The treatment of hepatoblastoma: Its evolution and the current status as per the SIOPEL trials. J Indian Assoc Pediatr Surg 2014;19:201–7. https://doi.org/10.4103/0971-9261.142001.

[5]

Semeraro M, Branchereau S, Maibach R, Zsiros J, Casanova M, Brock P, et al. Relapses in hepatoblastoma patients: Clinical characteristics and outcome-Experience of the International Childhood Liver Tumour Strategy Group (SIOPEL). Eur J Cancer 2013;49:915–22. https://doi.org/10.1016/j.ejca.2012.10.003.

[6]

Meyers RL, Maibach R, Hiyama E, Häberle B, Krailo M, Rangaswami A, et al. Riskstratified staging in paediatric hepatoblastoma: a unified analysis from the Children’s Hepatic tumors International Collaboration. Lancet Oncol 2017;18:122–31. https://doi.org/10.1016/S1470-2045(16)30598-8.

[7]

Armengol C, Cairo S, Fabre M, Buendia MA. Wnt signaling and hepatocarcinogenesis: The hepatoblastoma model. Int J Biochem Cell Biol 2011;43:265–70. https://doi.org/10.1016/j.biocel.2009.07.012.

[8]

Perugorria MJ, Olaizola P, Labiano I, Esparza-baquer A, Marzioni M, Marin JJG, et al. Wnt – β -catenin signalling in liver development , health and disease. Nat Rev Gastroenterol Hepatol 2019;16. https://doi.org/10.1038/s41575-018-0075-9.

[9]

Cairo-Armengol, De Reyniès A, Wei Y, Thomas E, Renard C-AA, Goga A, et al. Hepatic Stem-like Phenotype and Interplay of Wnt/β-Catenin and Myc Signaling in Aggressive Childhood Liver Cancer. Cancer Cell 2008;14:471–84. https://doi.org/10.1016/j.ccr.2008.11.002.

[10]

Hooks KB, Audoux J, Fazli H, Lesjean S, Ernault T, Senant N-D, et al. New insights into diagnosis and therapeutic options for proliferative hepatoblastoma. Hepatology 2017;00:1–36. https://doi.org/10.1002/hep.29672.

[11]

Sumazin P, Chen Y, Treviño LR, Sarabia SF, Hampton OA, Patel K, et al. Genomic analysis of hepatoblastoma identifies distinct molecular and prognostic subgroups. Hepatology 2017;65:104–21. https://doi.org/10.1002/hep.28888.

[12]

Gröbner SN, Worst BC, Weischenfeldt J, Buchhalter I, Kleinheinz K, Rudneva VA, et al. The landscape of genomic alterations across childhood cancers. Nature 2018;555:321–7. https://doi.org/10.1038/nature25480.

[13]

Eichenmüller M, Trippel F, Kreuder M, Beck A, Schwarzmayr T, Häberle B, et al. The genomic landscape of hepatoblastoma and their progenies with HCC-like features. J Hepatol 2014;61:1312–20. https://doi.org/10.1016/j.jhep.2014.08.009.

[14]

Buendia MA. Unravelling the genetics of hepatoblastoma: Few mutations, what else? J

Carrillo-Reixach et al 30 Hepatol 2014;61:1202–4. https://doi.org/10.1016/j.jhep.2014.09.016. [15]

Maschietto M, Rodrigues TC, Kashiwabara AY, de Araujo ÉSS, Aguiar TFM, da Costa CML, et al. DNA methylation landscape of hepatoblastomas reveals arrest at early stages of liver differentiation and cancer-related alterations. Oncotarget 2017;8:97871–89. https://doi.org/10.18632/oncotarget.14208.

[16]

Cui X, Liu B, Zheng S, Dong K, Dong R. Genome-wide analysis of DNA methylation in hepatoblastoma tissues. Oncol Lett 2016;12:1529–34. https://doi.org/10.3892/ol.2016.4789.

[17]

Honda S, Minato M, Suzuki H, Fujiyoshi M, Miyagi H, Haruta M, et al. Clinical prognostic value of DNA methylation in hepatoblastoma: Four novel tumor suppressor candidates. Cancer Sci 2016;107:812–9. https://doi.org/10.1111/cas.12928.

[18]

Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002;30:207–10. https://doi.org/10.1093/nar/30.1.207.

[19]

Tomlinson GE, Douglass EC, Pollock BH, Finegold MJ, Schneider NR. Cytogenetic evaluation of a large series of hepatoblastomas: Numerical abnormalities with recurring aberrations involving 1q12-q21. Genes, Chromosom Cancer 2005;44:177–84. https://doi.org/10.1002/gcc.20227.

[20]

Albrecht S, Schweinitz D von, Waha A, Kraus JA, von Detailing A, Pietsch T. Loss of Maternal Alleles on Chromosome Arm 11p in Hepatoblastoma. Cancer Res 1994;54:5041–4.

[21]

López-Terrada D, Alaggio R, de Dávila MT, Czauderna P, Hiyama E, Katzenstein H, et al. Towards an international pediatric liver tumor consensus classification: Proceedings of the Los Angeles COG liver tumors symposium. Mod Pathol 2014;27:472–91. https://doi.org/10.1038/modpathol.2013.80.

[22]

Levanon EY, Hallegger M, Kinar Y, Shemesh R, Djinovic-Carugo K, Rechavi G, et al. Evolutionarily conserved human targets of adenosine to inosine RNA editing. Nucleic Acids Res 2005;33:1162–8. https://doi.org/10.1093/nar/gki239.

[23]

Nishikura K. Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 2010;79:321–49. https://doi.org/10.1146/annurev-biochem-060208-105251.

[24]

Rocha ST da, Edwards CA, Ito M, Ogata T, Ferguson-Smith AC. Genomic imprinting at the mammalian Dlk1-Dio3 domain. Trends Genet 2008;24:306–16. https://doi.org/10.1016/j.tig.2008.03.011.

[25]

Benetatos L, Hatzimichael E, Londin E, Vartholomatos G. The microRNAs within the DLK1-DIO3 genomic region : involvement in disease pathogenesis 2013:795–814. https://doi.org/10.1007/s00018-012-1080-8.

Carrillo-Reixach et al 31 [26]

López-Terrada D, Gunaratne PH, Adesina AM, Pulliam J, Hoang DM, Nguyen Y, et al. Histologic subtypes of hepatoblastoma are characterized by differential canonical Wnt and Notch pathway activation in DLK+ precursors. Hum Pathol 2009;40:783–94. https://doi.org/10.1016/j.humpath.2008.07.022.

[27]

Van de Wetering M, Sancho E, Verweij C, De Lau W, Oving I, Hurlstone A, et al. The β-catenin/TCF-4 complex imposes a crypt progenitor phenotype on colorectal cancer cells. Cell 2002;111:241–50. https://doi.org/10.1016/S0092-8674(02)01014-0.

[28]

Zimmermann A. The emerging family of hepatoblastoma tumours: From ontogenesis to oncogenesis. Eur J Cancer 2005;41:1503–14. https://doi.org/10.1016/j.ejca.2005.02.035.

[29]

Donsante A, Miller DG, Li Y, Vogler C, Brunt EM, Russell DW, et al. AAV Vector Integration Sites in Mouse Hepatocellular Carcinoma. Science (80- ) 2007;317:477–477. https://doi.org/10.1126/science.1142658.

[30]

Wang P-R, Xu M, Toffanin S, Li Y, Llovet JM, Russell DW. Induction of hepatocellular carcinoma by in vivo gene targeting. Proc Natl Acad Sci 2012;109:11264–9. https://doi.org/10.1073/pnas.1117032109.

[31]

Harada K, Toyooka S, Maitra A, Maruyama R, Toyooka KO, Timmons CF, et al. Aberrant promoter methylation and silencing of the RASSF1A gene in pediatric tumors and cell lines. Oncogene 2002;21:4345–9. https://doi.org/10.1038/sj.onc.1205446.

[32]

Chow RKK, Sin ST-K, Liu M, Li Y, Chan THM, Song Y, et al. AKR7A3 suppresses tumorigenicity and chemoresistance in hepatocellular carcinoma through attenuation of ERK, c-Jun and NF-κB signaling pathways. Oncotarget 2017;8:83469–79. https://doi.org/10.18632/oncotarget.12726.

[33]

Chen C, Wang L, Liao Q, Huang Y, Ye H, Chen F, et al. Hypermethylation of EDNRB promoter contributes to the risk of colorectal cancer. Diagn Pathol 2013;8:909. https://doi.org/10.1186/1746-1596-8-199.

[34]

Deloria AJ, Höflmayer D, Kienzl P, Łopatecka J, Sampl S, Klimpfinger M, et al. Epithelial splicing regulatory protein 1 and 2 paralogues correlate with splice signatures and favorable outcome in human colorectal cancer. Oncotarget 2016;7:73800–16. https://doi.org/10.18632/oncotarget.12070.

[35]

Tessitore L, Sesca E, Vance DE. Inactivation of phosphatidylethanolamine Nmethyltransferase-2 in aflatoxin-induced liver cancer and partial reversion of the neoplastic phenotype by PEMT transfection of hepatoma cells. Int J Cancer 2000;86:362–7.

[36]

Zhang F, Sun H, Zhang S, Yang X, Zhang G, Su T. Overexpression of PER3 Inhibits Self-Renewal Capability and Chemoresistance of Colorectal Cancer Stem-Like Cells via Inhibition of Notch and β-Catenin Signaling. Oncol Res Featur Preclin Clin Cancer Ther

Carrillo-Reixach et al 32 2017;25:709–19. https://doi.org/10.3727/096504016X14772331883976. [37]

Aoyama C, Liao H, Ishidate K. Structure and function of choline kinase isoforms in mammalian cells. Prog Lipid Res 2004;43:266–81. https://doi.org/10.1016/j.plipres.2003.12.001.

[38]

de la Cueva A, Ramírez de Molina A, Álvarez-Ayerza N, Ramos MA, Cebrián A, Pulgar TG del, et al. Combined 5-FU and ChoKα Inhibitors as a New Alternative Therapy of Colorectal Cancer: Evidence in Human Tumor-Derived Cell Lines and Mouse Xenografts. PLoS One 2013;8:1–13. https://doi.org/10.1371/journal.pone.0064961.

[39]

Paz-Yaacov N, Bazak L, Buchumenski I, Porath HT, Danan-Gotthold M, Knisbacher BA, et al. Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors. Cell Rep 2015;13:267–76. https://doi.org/10.1016/j.celrep.2015.08.080.

[40]

Kung C-P, Maggi LB, Weber JD. The Role of RNA Editing in Cancer Development and Metabolic Disorders. Front Endocrinol (Lausanne) 2018;9:1–21. https://doi.org/10.3389/fendo.2018.00762.

[41]

Galeano F, Leroy A, Rossetti C, Gromova I, Gautier P, Keegan LP, et al. Human BLCAP transcript: New editing events in normal and cancerous tissues. Int J Cancer 2010;127:127–37. https://doi.org/10.1002/ijc.25022.

[42]

Gromova I, Romov P, E. Celis J. bc10 : A NOVEL HUMAN BLADDER CANCERASSOCIATED PROTEIN WITH A CONSERVED GENOMIC STRUCTURE DOWNREGULATED IN INVASIVE CANCER 2002;546:539–46. https://doi.org/10.1002/ijc.10244.

[43]

Han L, Diao L, Yu S, Xu X, Li JBJJ, Zhang R, et al. The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers. Cancer Cell 2015;28:515– 28. https://doi.org/10.1016/j.ccell.2015.08.013.

[44]

Hu X, Wan S, Ou Y, Zhou B, Zhu J, Yi X, et al. RNA over-editing of BLCAP contributes to hepatocarcinogenesis identified by whole-genome and transcriptome sequencing. Cancer Lett 2015;357:510–9. https://doi.org/10.1016/j.canlet.2014.12.006.

[45]

Paz N, Levanon EY, Amariglio N, Heimberger AB, Ram Z, Constantini S, et al. Altered adenosine-to-inosine RNA editing in human cancer. Genome Res 2007;17:1586–95. https://doi.org/10.1101/gr.6493107.

[46]

Chen W, He W, Cai H, Hu B, Zheng C, Ke X, et al. A-to-I RNA editing of BLCAP lost the inhibition to STAT3 activation in cervical cancer. 2017.

[47]

Zhao M, Zhang L, Qiu X, Zeng F, Chen W, An Y, et al. BLCAP arrests G1/S checkpoint and induces apoptosis through downregulation of pRb1 in HeLa cells. Oncol Rep 2016;35:3050–8. https://doi.org/10.3892/or.2016.4686.

Carrillo-Reixach et al 33 [48]

Chen W, He W, Cai H, Hu B, Zheng C, Ke X, et al. A-to-I RNA editing of BLCAP lost the inhibition to STAT3 activation in cervical cancer. Oncotarget 2017;8:39417–29. https://doi.org/10.18632/oncotarget.17034.

[49]

Luk JM, Burchard J, Zhang C, Liu AM, Wong KF, Shek FH, et al. DLK1-DIO3 genomic imprinted microRNA cluster at 14q32.2 defines a stemlike subtype of hepatocellular carcinoma associated with poor survival. J Biol Chem 2011. https://doi.org/10.1074/jbc.M111.229831.

[50]

Benetatos L, Vartholomatos G, Stem ÁMÁ. DLK1-DIO3 imprinted cluster in induced pluripotency : landscape in the mist 2014:4421–30. https://doi.org/10.1007/s00018-0141698-9.

[51]

Wang P, Xu M, Toffanin S, Li Y, Llovet JM, Russell DW. Induction of hepatocellular carcinoma by in vivo gene targeting 2012. https://doi.org/10.1073/pnas.1117032109//DCSupplemental.www.pnas.org/cgi/doi/10.1073/pnas.1117032109.

[52]

Toffanin S, Alsinet C, Cornella H, Sia D, Llovet JM. MicroRNAs and the MYC network: A major piece in the puzzle of liver cancer. Gastroenterology 2011;140:2138– 40. https://doi.org/10.1053/j.gastro.2011.04.023.

[53]

Pujadas E, Feinberg AP. Regulated noise in the epigenetic landscape of development and disease. Cell 2012. https://doi.org/10.1016/j.cell.2012.02.045.

[54]

Kulis M, Esteller M. DNA Methylation and Cancer 2010;70:27–56. https://doi.org/10.1016/B978-0-12-380866-0.60002-2.

[55]

Ramírez de Molina A, Gutiérrez R, Ramos MA, Silva JM, Silva J, Bonilla F, et al. Increased choline kinase activity in human breast carcinomas: clinical evidence for a potential novel antitumor strategy. Oncogene 2002;21:4317–22. https://doi.org/10.1038/sj.onc.1205556.

[56]

Lin X-M, Hu L, Gu J, Wang R-Y, Li L, Tang J, et al. Choline Kinase α Mediates Interactions Between the Epidermal Growth Factor Receptor and Mechanistic Target of Rapamycin Complex 2 in Hepatocellular Carcinoma Cells to Promote Drug Resistance and Xenograft Tumor Progression. Gastroenterology 2017;152:1187–202. https://doi.org/10.1053/j.gastro.2016.12.033.

[57]

de Molina AR, Sarmentero-Estrada J, Belda-Iniesta C, Tarón M, de Molina VR, Cejas P, et al. Expression of choline kinase alpha to predict outcome in patients with early-stage non-small-cell lung cancer: a retrospective study 2007;8:889–97. https://doi.org/10.1016/S1470-2045(07)70279-6.

[58]

Asim M Massie C Orafidiya F Pértega-Gomes N Warren A et. Choline Kinase Alpha as an Androgen Receptor Chaperone and Prostate Cancer Therapeutic Target. J Natl Cancer Inst 2015;108:djv371. https://doi.org/10.1093/jnci/djv371.

Carrillo-Reixach et al 34 [59]

Hernández-Alcoceba R, Fernández F, Lacal JC. In vivo antitumor activity of choline kinase inhibitors: A novel target for anticancer drug discovery. Cancer Res 1999;59:3112–8.

HIGHLIGHTS •

Hepatoblastoma (HB) presents an overall dysregulation of RNA editing including that of BLCAP.



Overexpression of a 300kb region within the 14q32 DLK1/DIO3 locus is a new hallmark of HB



We identified two epigenomic HB subtypes –called Epi-CA and Epi-CB- with distinct degree of DNA hypomethylation and CpG island hypermethylation.



The Molecular Risk Stratification of HB (MRS-HB) based on the 14q32signature and Epi-CA/Epi-CB is strongly associated with patient outcome and may complement clinical stratification



The enzyme CHKA could be a novel therapeutic target for HB patients.

Carrillo-Reixach et al Suppl. Methods 1

SUPPLEMENTARY METHODS Patient and samples All samples were collected in accordance with European and Spanish law and institutional ethical guidelines. Informed consent was obtained in accordance with European Union guidelines for biomedical research. The study conformed to the ethical guidelines of the 1975 Declaration of Helsinki and it was approved by the Human Ethics Committee of the Hospital Universitari Germans Trias i Pujol. A total of 159 tissue samples from 113 patients with HB were obtained from 17 pediatric hospitals in Spain and main European biorepositories in France (“HEPATBIO” project and SFCE participating centers the BRIF CRB PARIS SUD BB0033-00089), Ludwig-Maximilians-Universitätand the SIOPEL Liver Tumor Tissue Bank. Patient-derived xenografts (PDXs) were established from 13 cases and genomically characterized elsewhere[1]. Samples with highest quality were selected for the high-throughput study (discovery set, 33 patients); the remaining samples were included in the validation set (80 patients). Both sets were comparable in their clinicopathological features and similar to other HB patient cohorts from studies of the SIOPEL group[2]. HB diagnosis was defined by expert pathologists, and tumors with rhabdoid components were excluded from the study. Almost all patients (92%) included in the study were enrolled in SIOPEL clinical trials. Six human fetal liver samples at different gestational ages (10, 11, 20, 21 weeks) were obtained with the appropriate approval of the Ethical and Scientific Committees by the Tumor Bank of the Vall d’Hebron University Hospital Biobank (PT17/0015/0047), integrated in the Spanish National Biobank Network and Xarxa de Bancs de Tumors de Catalunya (XBTC). Total DNA and RNA extraction was performed on snap-frozen tissues, and nucleic acids were stored at -80ºC until use. Formalin-fixed paraffin-embedded samples were used for pathological examination and immunostaining.

DNA and RNA extraction For DNA and RNA extraction of discovery set samples, the MagMAX™ DNA MultiSample kit and mirVanaTM Isolation kit were used (Thermo Fisher Scientific),

Carrillo-Reixach et al Suppl. Methods 2

respectively. Before RNA isolation, tissue samples were homogenized using the Lysing Matrix type D tubes with the Fastprep system (MP Biomedicals, Santa Ana, California, USA). For validation set samples, RNA was extracted using TRIzolTM reagent following data sheet instructions (Thermo Fisher Scientific), and DNA was extracted using the Phenol:Chloroform method. Briefly, tissue was incubated with Lysis buffer and Proteinase K (Qiagen) overnight at 56ºC. Then, 1 volume (V) of Phenol:Chloroform was added and centrifuged for 20 min at 13000 rpm at 4ºC. The supernatant was precipitated overnight at -20ºC after adding 0.1V of sodium acetate and 3V of cold 100% EtOH. After centrifugation for 30 min at 13000 rpm at 4ºC, the pellet was washed twice with cold 75% EtOH. The resulting pellet was resuspended with 1020 µL of Tris-EDTA buffer solution (Sigma-Aldrich) and stored at -20ºC. Before use, all RNA samples were treated using the DNA-freeTM kit, following manufacturer’s instructions (Thermo Fisher Scientific), in order to eliminate any contaminating DNA. Electrophoresis with 1% of agarose was used to assess DNA integrity and only those samples with DNA fragments above 500 bp were used for the CytoScan HD array (Thermo Fisher Scientific). DNA was measured using a NanoDrop™ 2000/2000c Spectrophotometer (Thermo Fisher Scientific). RNA quantification was determined by Qubit (Thermo Fisher Scientific) and integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). Only samples with an RNA Integrity Number (RIN) above 7 were subsequently used in sequencing and array experiments.

RNA-sequencing (RNA-seq) RNA-seq was conducted on poly-A enriched RNA of high quality (RIN > 7), using the HiSeq2500 (2x100 bp paired-end reads, Illumina). Raw sequencing reads were mapped to the GRCh37/hg19 reference genome (UCSC) using STAR (2.4.2g1). For the statistical analysis, R programming (Version 3.3.2) was used, together with different packages from Bioconductor[3]. The FeatureCounts method[4] implemented in the R subread package was applied on these bam files with the gencode version 23 GTF file to count reads aligned to genes in the 66 samples. Genes with counts equal to zero in more than 60 samples were filtered out. Raw library size differences between

Carrillo-Reixach et al Suppl. Methods 3

samples were treated with the weighted “trimmed mean method” (TMM)[5] implemented in the edgeR package[6]. These normalized counts were used for the unsupervised analysis, heatmaps and clustering methods. Fusion transcript and mutation discovery. RNA fusions were detected by filtering chimeric STAR alignments using STARChip (https://github.com/LosicLab/starchip) Mutations were called from paired tumor/normal RNA-Seq data. Mapped RNA-seq reads were subject to splitting, trimming, local indel realignment, and base-score recalibration pre-processing with the IndelRealigner and TableRecalibration tools from Genome Analysis Tool Kit (GATK)[7] under the GATK Best Practices for RNA-seq paradigm

(https://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-

practices-for-variant-calling-on-rnaseq-in-full-detail). The list of putative fusion events with perfect alignment was filtered according to the following criteria: to discard events that (i) included immunoglobulin domains where chromosomal rearrangements typically occur, (ii) were present in non-tumor samples and (iii) that had no clear protein product. MuTect[8] was then used with default settings to quantify somatic mutation burden. Editing analysis. Analysis of RNA editing was performed based on a previously published methodology[9]. Briefly, tumor and non-tumor mutation calls were first filtered to remove known SNP sites.

Mutations were then filtered using GATK

VariantFiltration to keep only those sites with quality by depth < 2.0, Fisher strand score > 30, and root mean square mapping quality >= 25. An in-house script filtered variants with a quality score < 25 or less than 3 reads supporting either the reference or variant allele. Further filtrations were identical to those already reported[9]; however, we excluded chr X and required non-ALU RNA-editing sites to be present in at least 4 samples. Only A>G or T>C variants were considered as editing. GATK methodology was used for calling mutations from RNA-seq data to generate bam files. The editing index for an Alu region was calculated using the ratio of A-to-G mismatches to the total number of reads aligned with the genomic adenosine within the given Alu region[10]. A global editing index was obtained by averaging the editing index across all Alu and nonAlu regions Differential expression. This analysis was carried out at gene level. Read counts were converted to log2-counts-per-million (logCPM), and the mean-variance relationship was

Carrillo-Reixach et al Suppl. Methods 4

modelled

with

precision

weights

using

voom

approach

in

the

limma

package[11].Analysis of differential expression was made with limma lineal models. Unsupervised analysis. Hierarchical clusters were generated from normalized data filtered by four coefficient of variation (CV) thresholds: 60%, 80%, 95% and 97.5%. Three linkage methods were applied: Average, Complete and Ward.D2, and correlationbased distance was used. All dendrograms were generated using the function hclust, and cutree function was used with k=3 to obtain the different clusters; both tools from the stats package (https://www.R-project.org/).

Transcriptomic array Human Transcriptome Array

(HTA, Thermo Fisher Scientific) hybridization was

performed using 100 ng of RNA (RIN > 7) of discovery set tumors and following the manufacturer’s protocol (P/N 703174 Rev. 1). Quality controls were assessed with Expression Console software (Thermo Fisher Scientific). Differential gene expression analysis. Supervised analysis of gene expression data was performed using Transcriptome Analysis Console software (Thermo Fisher Scientific). Unsupervised clustering analysis. This analysis was made using R (v 3.3.2) programming. The CV was calculated for each probeset ID of the array. Samples with the minimum and the maximum value were deleted in each calculation. IDs with the highest CV values were selected (quantiles: 0.975, 0.95, 0.80 and 0.60), which resulted in 4 subsets of matrices. For each matrix, methods Average, Complete and Ward.D2 with Pearson correlations were used to cluster samples. In each method, the ideal number of groups was predicted using Hartigan k-means clustering with Euclidean distances.

Genomic array The CytoScanTM HD array was performed using 250 ng of DNA following the standard protocol and QC guidelines supplied by the manufacturer (User Manual, P/N 703038 Rev. 3, Thermo Fisher). SNP-array data analysis was performed with Chromosome Analysis Suite Version 2.0.1.2 (Thermo Fisher). All samples met quality parameters.

Carrillo-Reixach et al Suppl. Methods 5

The genomic data (annotated using GRCh37/hg19version) were analyzed using Chromosome Analysis Suite software (Thermo Fisher) to identify Copy Number Aberration (CNA) and Allelic imbalance (AI), the latter defined as the disequilibrium of alleles. Resultant CNA and AI were then manually curated in order to remove false annotated genomic aberrations. Only copy number segments with a minimum genomic size of 50 kb and with at least 10 markers and telomeric AI ≥ 2 Mb and interstitial AI ≥ 25 Mb were reported. Inborn genetic abnormalities were excluded by comparing findings with control databases. Integrative Genomics Viewer was used to visualize CNA and AI data and to calculate frequencies. Small overlapping regions of imbalance were

identified

using

R

software

and

the

Rcpp

package

(https://cran.r-

project.org/web/packages/Rcpp/Rcpp.pdf).

Methylation array The Infinium MethylationEPIC (850K) array was performed following the manufacturer’s instructions (Illumina). 850K array data were subsequently analyzed in R and preprocessed using the normalization method Funnorm, recommended for tumor samples[12], as implemented in the minfi Bioconductor Package (v 1.24). CpGs were annotated with the closest gene. Probes containing single nucleotide polymorphisms were discarded from the analyses to avoid confounding. T7 and T8 samples were excluded of the subsequent analysis because we cannot rule out a minimal non-tumor contamination (to note that they were grouped with the non-tumors in the unsupervised clustering). The MEAL package (v 1.9.3) was used to identify differentially methylated CpGs. DMS analysis was performed using a linear model (limma), and CpGs with a standard deviation of < 0.1 for each comparison were discarded. To study the similarities of the methylation profile of tumor and non-tumor samples, global methylation values of 24 non-tumor samples (19 non-tumor and 5 fetal liver tissues) were plotted on the basis of patient age (in the case of adjacent non-tumor tissues) and gestational age (in the case of fetal liver specimens). The resulting exponential curve was calculated using the non-lineal regression tool (one phase decay) in GraphPad Prism v.5 software. The methylation values of individual Epi-CA and EpiCB tumors were then extrapolated by plotting the semi-logarithmic values of gene expression and methylation and using the linear regression equation.

Carrillo-Reixach et al Suppl. Methods 6

Droplet digital Polymerase Chain Reaction (ddPCR) ddPCR was performed using the QX200 Droplet Digital PCR system (Bio-Rad Laboratories), following the manufacturer’s instructions and using an edited nt 5 FAMlabeled probe (5’-TCATGTgTTGCCTCCAG-3’), wild-type nt 5 HEX-labeled probe (5’- ATCATGTaTTGGCTCCAGT-3’) and specific BLCAP primers. Briefly, cDNA of each sample (diluted 1/25) was partitioned into 20 000 droplets with random edited and wild-type probes, uniformly distributed among the droplets. The emulsified PCR reactions were run in a 96-well plate on a C1000 Touch thermal cycler. The plates were incubated at 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec, 60°C for 60 sec, and a 10-min incubation at 98°C. ddPCR data were analyzed with QuantaSoft TM v1.4.0 software (Bio-Rad Laboratories), and the fractional abundance (FA) of the edited nt5 of BLCAP was automatically calculated (FA= a/a+b, where a=FAM and b=HEX). Hyper-editing of BLCAP was determined on the basis of the tumor/mean non-tumor FA fold-change (FC) ≥ 2[13].

Quantification of unmethylated Alu (QUAlu) The Percentage of UnMethylated Alu (PUMA) was used as a surrogate for global hypomethylation and measured using the QUAlu method[14]. First, 5 ng of DNA was digested using the methylation-sensitive and -insensitive restriction enzymes HpaII and MspI, respectively, which leave identical sticky ends (C/CGG). Subsequently, a synthetic adaptor was ligated. The HpaII and MspI digested-ligated DNA (1:20 diluted) was amplified by quantitative polymerase chain reaction (qPCR) using a primer homologous to the adapter and a primer specific for the Alu consensus sequence. As an internal control to normalize DNA input, we performed a specific qPCR to amplify L1PA (a LINE-1 subfamily).

Whole genome sequencing (WGS) DNA samples were prepared using the Illumina TruSeq DNA PCR free (350 bp) library preparation kit, and pair-end (2 x 150 bp) WGS was performed by Macrogen using an Illumina HiSeqX instrument, following the manufacturer’s instructions (Illumina).

Carrillo-Reixach et al Suppl. Methods 7

Virus integration discovery pipeline. The 1,278,717,208 pair-end sequences (coverage ~117X) obtained were interrogated for the presence of viral sequences. Quality control was performed on raw data using FASTQC[15] software (v0.11.4). Illumina

adapter

sequences

and

low

quality

bases

were

trimmed

using

Trimmomatic[16] software (v0.36). The remaining sequences were aligned to the human reference genome (hg38 assembly) using Bowtie2[17] software (v2.3.2) in local mode to allow local alignments. Partially mapped reads and unmapped reads were considered as virus integration candidates and were selected for subsequent steps. Partially mapped reads may appear as a result of genomic structural variation. Therefore, an additional round of Bowtie2 human genome alignment was performed in end-to-end mode. From this second alignment, non-aligned reads and previous unmapped reads from the first alignment were processed using Prinseq[18] software (v0.20.4) to eliminate low complexity sequences (e.g. homopolymers tracks, short repeats, etc.). The remaining reads, constituting a collection of non-human sequences, were then aligned using Bowtie2 in end-to-end mode to a reference from virus RefSeq sequences (02/02/2017 release, ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/). Using in-house developed scripts, the resulting mapped reads were then processed to integrate human and virus mapping information and thus unravel virus integration sites. As a final filtering step to reduce false positives, only virus integration events supported by more than a single read pair and with more than one region detected were considered reliable virus integration events. See a scheme of the virus integration pipeline in Figure M1 below. Validation of the virus integration discovery pipeline. Analyses on additional datasets were conducted to assess the performance and accuracy of our virus integration discovery pipeline. Three additional datasets were used for this purpose: an artificially constructed dataset, made of 100,000 read pairs from our analyzed sample plus 5 artificially constructed read pairs simulating human adenovirus B1 insertion (in chr2:60001001, hg38 assembly) and two real HCC samples[19] (samples 71T and 200T) with an average coverage about 32X and with known human Hepatitis B virus (HBV) integrations. An overview of the pipeline followed to discover virus integration events is shown in Figure M1. Our methodology detects virus integrations by taking information from

Carrillo-Reixach et al Suppl. Methods 8

three types of read-pairs: a) read-pairs in which one is aligned to the host genome and the other to a virus genome; b) split-reads in which one of the reads (or both) maps partially to the host genome and partially to a virus genome (these are used to define integration sites at single-base resolution); and c) read-pairs both aligning to a virus genome (are collected to support integration events, but cannot be assigned confidently to a host genome location). Using our virus discovery pipeline, we detected 172 integration site candidates. A closer inspection of the results suggested that most of them were probably false positive detections, as most were supported by a single readpair and/or detected non-human virus. Virus integration detection from WGS is highly prone to giving false positives[20]. For this reason and taking advantage of the high coverage achieved in our sample, we filtered our results, taking as real virus integrations only those with more than one supporting read-pair and with more than a single region of the insertion detected. The rationale behind this final filtering step was that, by achieving a high sequencing depth, any virus integrated should be totally or almost totally sequenced, thus allowing us to detect many different sequences from the same integration event. By performing this final filtering, all integration candidates were filtered out. Therefore, we can conclude that no virus integration events were confidently detected in our sample. In addition, to assess the validity and performance of our virus integration discovery pipeline, we also analyzed three different datasets, with known virus integration events, as positive controls: an artificially constructed dataset with a small proportion of our sample reads plus 5 artificially generated read-pairs simulating an insertion event of human adenovirus B1 in a known position, and the two real human HCC samples[19] sequenced at about 32X and with known HBV integration. Our virus integration discovery pipeline precisely detected the integration event of the artificial dataset, as well as at least 9 reliable HBV integration events in sample 71T and 6 in sample 200T, being the only virus species detected.

Carrillo-Reixach et al Suppl. Methods 9

Figure M1. Scheme of virus integration pipeline.

Primer and probe design Primers for qPCR were designed using the Primer designing tool (NCBI) provided by Sigma-Aldrich. BLCAP probes for ddPCR were designed by Bio-Rad. AXIN1 was assessed following the approach described in our previous publication[21]. Primers used for detecting gene mutations and fusion transcripts are detailed in Table M1. Primers used for assessing gene expression are detailed in Table M2.

Table M1. Primers used for the validation of fusion transcripts and gene mutations by PCR and Sanger Sequencing. Gene/Event

Amplicon

Melting

Primer

Sequences (5’-3’)

Carrillo-Reixach et al Suppl. Methods 10 Length

Temp. (ºC)

location

300

58

Exon 10

653

58

Exon 2

517

55

Exon 1-4

1043 208

55 55

Exon 1-6 Exon 1-3

317

58

Exon 7

389 200

58 58

502

62

TERF2genomic

300

58

Exon 8-9 Exon 6 Exon 4 Intron (exon 2/3) exon2-4

TMEM163SLC25A16

200

58

(bp) AUHgenomic BLCAP

CTNNB1ϕ

DNAJC15TPT1 NFE2L2

¥

Intron UTR3P

F: TTTATAGCGAGGGGGCCTTT R: GGGGGCTTTCTCTAGCCTGT F: AGGGTTGAAGAAAGCGGAGG R: AGTAATGAGGCGAGAGGGGT F1: GCGTGGACAATGGCTACTCAAG R1: TATTAACCACCACCTGGTCCTC R2¥: TTCAGCACTCTGCTTGTGGTCC R3§: CCCACTCATACAGGACTTGGGAGG F7: GGTTGGTAATATGGCTCTTCT R7: CAGTAGTTAAAGTTCTACCACC F8: TGGCAAAGTGAAGGAAACTG R8: CAAGGAGACCTTCCATCCC F: GTGTGGTCCTTAAGCATCAATGT R: GGGAAACTTGAAGAACAGAGACC F: CGTGTAGCCGATTACCGAGT R: GACTGGGCTCTCGATGTGAC F: CCACTGGAATCAGCTATCAATG R: AAGATGAGAAAAGGGGATTAGAAC F: TAAGCCGTATCAAACAGTCCATGA R: TTCGCTGTATTCCCTCTCAAGC

Reverse primer used with F1, as already published[21]; §primer used for PCR product

sequencing; and ϕCTNNB1 mutations assessed on RNA to detect exon 3 deletions. Abbreviations: F, forward; R, reverse.

Table M2. Primers used for the gene expression assessment by RT-qPCR.

Gene

Amplicon Length

Melting

(bp)

Temp. (ºC)

DLK1

154

60

MEG3

152

60

SNORD113-3

68

59

SNORD114-22

69

60

MEG9

100

60

LGR5

161

60

Sequences (5’-3’)

F: ACCTATGGGGCTGAATGCTT R: CGGGTTCTCCACAGAGTCC F: CTGAAGAACTGCGGATGGA R: CACATTCGAGGTCCCTTCC F: ACCAATGATGAGTATTCTGGGGT R: TGGACCTCAGAGTTACAGGGT F: TGGATCGATGATGACTACCG R: GACCTCAGAGTTCCAAACACG F: CTCACACCCTTGGACAGGAG R: TTTCTGCTTCTTTATTCCCTCAA F: GGAGTTACGTCTTGCGGGAA R: CATCCAGACGCAGGGATTGA

Carrillo-Reixach et al Suppl. Methods 11 VIM

250

60

PNN

151

60

AFP

154

60

DLG7

151

60

DUSP9

151

59

E2F5

151

60

GHR

151

60

IGSF1

152

60

NLE1

151

60

RPL10A

151

59

APCS

151

60

ALDH2

151

61

APOC4

151

60

AQP9

151

60

BUB1

152

59

C1S

141

59

CYP2E1

151

58

F: TCTCTGAGGCTGCCAACCG R: CGAAGGTGACGAGCCATTTCC F: CCTTTCTGGTCCTGGTGGAG R: TGATTCTCTTCTGGTCCGACG F: AACTATTGGCCTGTGGCGAG R: TCATCCACCACCAAGCTGC F: GCAGGAAGAATGTGCTGAAACA R: TCCAAGTCTTTGAGAAGGGCC F: CGGAGGCCATTGAGTTCATT R: ACCAGGTCATAGGCATCGTTG F: CCATTCAGGCACCTTCTGGT R: ACGGGCTTAGATGAACTCGACT F: CTTGGCACTGGCAGGATCA R: AGGTGAACGGCACTTGGTG F: CACTCACACTGAAAAACGCCC R: GGGTGGAGCAATTGAAAGTCA F: ATGTGAAGGCCCAGAAGCTG R: GAGAACTTCGGGCCGTCTC F: TATCCCCCACATGGACATCG R: TGCCTTATTTAAACCTGGGCC F: GGCCAGGAATATGAACAAGCC R: CTTCTCCAGCGGTGTGATCA F: GTTTGGAGCCCAGTCACCCT R: GGGAGGAAGCTTGCATGATTC F: GGAGCTGCTGGAGACAGTGG R: TTTGGATTCGAGGAACCAGG F: GCTTCCTCCCTGGGACTGA R: CAACCAAAGGGCCCACTACA F: ACCCCTGAAAAAGTGATGCCT R: TCATCCTGTTCCAAAAATCCG F: TTGTTTGGTTCTGTCATCCGC R: TGGAACACATTTCGGCAGC F: CAACCAAGAATTTCCTGATCCAG R: AAGAAACAACTCCATGCGAGC

Reverse Transcriptase and Polymerase Chain Reaction (RT-PCR) RNA (0.5-1 µg) was retrotranscribed using the RNA to cDNA Ecodry Premix (Clontech) following the manufacturer’s instructions. The resulting cDNA was the template for PCR amplification using primers designed for ad-hoc for each fusion gene called by RNA-seq (Table M1). To detect the presence of the fusion product, the PCR amplifications were performed in 25 µL of reaction mixture using the following protocol: 95ºC for 2 min, 40 cycles of 95ºC for 30 sec, annealing step at primer-pair specific temperature (see Table M1) for 30 sec, 72ºC for 2 min and final extension of 72ºC for 10 min. Each PCR reaction included a negative control without RNA (blank) and a positive control.

Carrillo-Reixach et al Suppl. Methods 12

Validation of the mutations identified by RNA-seq was performed by PCR on cDNA or genomic DNA using specific primers. Each PCR reaction contained 1X Platinum® Taq DNA Polymerase Buffer (Invitrogen), 0.2 mM dNTPs mix, 1.5 mM MgCl2, 0.2 µM of each primer and 100 ng of gDNA. The following PCR reaction conditions were used: 95°C for 2 min, 35 cycles of 95°C denaturation for 30 sec, annealing step at primer pairspecific temperature for 30 sec, and a 30-sec extension at 72°C, followed by 10 min of final extension at 72°C. We confirmed that the mutations identified were somatic because they were negative in their matched non-tumoral surrounding tissues. CTNNB1 mutational screening, including exon 3 deletions, was performed as described previously[21].

Quantitative PCR (qPCR) cDNA from total RNA was generated using the EcoDry cDNA synthesis kit (Takara, Shiga, Japan), following the manufacturer’s protocol. cDNA from human tissues was diluted 1/100 with Water Molecular Biology Grade (Applichem). Sybr Green Master Mix (Applied Biosystems) and an ABI PRISM 7900HT instrument were used for qPCR. PNN was used as reference gene[21]. Briefly, 5 µL of diluted cDNA was mixed with 5 µL of SYBR® Select Master Mix (Invitrogen) with 10 µM of specific primers (Table M1) and then amplified in a 384 qPCR wellplate. Expression levels were determined in triplicate using a 7900HT Fast Real-Time PCR System (Applied Biosystems). The qRT-PCR conditions used were: 50ºC for 2 min, 95ºC for 2 min, 40 cycles of 95ºC denaturation for 15 sec, annealing step at primer pair-specific temperature, and a 1-min extension at 72°C. Relative quantification (RQ) of mRNA expression was calculated using the expression levels of PNN gene as a reference and non-tumor expression as a control and applying the Delta-Delta Ct method (∆∆Ct= Tumor (Ct gene of interest – Ct reference gene) – mean non-tumor (Ct gene of interest – Ct

reference gene);

RQ

gene of interest

= 2-∆∆Ct). To confirm the specificity of the reaction,

melting curves were carried out during the same qRT-PCR reaction, and only primers with a single peak were chosen. cDNA from CHKA knock-down and siRNA control of Huh6 cells was diluted 1/40 and CHKA was amplified using specific TaqMan probes (Hs00608045_m1). 18S

Carrillo-Reixach et al Suppl. Methods 13

(Hs99999901_s1) was used as reference housekeeping gene. Expression levels were determined in triplicate using a 7900HT Fast Real-Time PCR.

Sanger sequencing The specificity of PCR and qPCR products from positive samples was checked by the presence of a single band of the correct size after agarose gel electrophoresis and confirmed by Sanger sequencing. Sequencing reactions of both DNA strands were performed with the BigDye® Terminator v3.1 Cycle Sequencing Kit (Thermo Fisher Scientific), run on a Genetic Analyzer ABI 3130 and analyzed with SeqScape v5.3.1 (both from Applied Biosystems). Briefly, 5 µL of PCR product was incubated with 2 µL of illustra ExoProStar 1-Step (GE Healthcare Life Sciences) for 45 min at 37ºC and 15 min at 80ºC. After purification, 2 µL of purified PCR product was mixed with 2 µL of BigDye Buffer, 1 µL of BigDye, 1.6 µL of 1 µM of primer and 3.4 µL of DNase RNase-free water. The sequencing reaction conditions were: 95ºC for 5 min, 96ºC for 10 sec, 30 cycles of 50ºC for 50 sec, and a 4-min extension at 60°C.

Immunohistochemistry (IHC) IHC was performed on 5-µm formalin-fixed paraffin-embedded tissue sections using ultraview DAB detection and the Ventana System (Roche Diagnostics). In parallel, a positive sample was stained as control. For a negative control, 0.1% BSA in phosphatebuffered saline was used instead of primary antibody. Monoclonal antibodies against βcatenin (clone 14), Ki67 (Ventana clone 30-9, Roche Diagnostics), KRT19 (clone A53B/A2.26), EpCAM (clone Ber-EP4) and GPC3 (Ventana clone Gc33, Roche Diagnostics) and polyclonal antibody against AFP (CMC20311050) and CHKA (HPA024153, 1/30, Sigma-Aldrich) were used (remaining antibodies were purchased from Cell Marque, Millipore Sigma). Monoclonal antibody against DLK1 used at 1/200 was kindly provided by CHIOME Biosciences Inc. Tumor stained area was calculated by examination of at least six random high-power fields (40x) and quantified with specific thresholds using ImageJ software v. 45.s (National Institutes of Health).

Carrillo-Reixach et al Suppl. Methods 14

Molecular risk assessment Training and test sets were assigned to the different categories of the Molecular Risk Stratification (MRS) on the basis of the biomarkers listed below. Of note, no statistical differences were observed in these biomarkers between the training and test sets, despite distinct techniques were used (Table S7). 14q32-signature. The 14q32-signature included four genes of the 14q32 region: DLK1, MEG3, SNORD113-3 and SNORD114-22. These genes were selected by multiple criteria: DLK1 is the only overexpressed gene that codifies for protein; MEG3 is the largest gene in the region; SNORD113-3 and SNORD114-22 are representatives of their respective SNORD families amenable to amplification by RT-qPCR. The tumor (T) gene expression assessed by HTA or qPCR was referred to the mean of non-tumor (NT) samples. When this T/NT fold change (FC) was ≥ 10 in at least 3 genes, we classified tumors as strong whereas when this criterion was not met, tumors were classified as moderate. The 14q32 signature was defined using HTA array data and further validated by qPCR, with 88% concordance with the HTA-based classification. This classification approach was highly concordant with the 14q32 hierarchical clustering (p<0.0001, Figure 3c), thereby indicating that the four genes selected were representative of the expression level of the 14q32 region. Epigenetic Classification (Epi-C). The global percentage of DNA hypomethylation in tumors as compared with the mean of NT samples measured by 850K array or QUAlu techniques was used for the epigenetic classification of tumor samples. The strong correlation between global methylation beta values of all CpGs (850K-array) and the UnMethylated Alu (QUAlu) in 16 samples of the discovery set confirmed QUAlu as excellent tool to measure the global hypomethylation status of tumor samples (R2=0.91, p<0.0001). We performed a ROC curve to establish a specific and sensitive cut-off to differentiate Epi-CA and Epi-CB tumors of the training set (Figure M2). The hypomethylation cut-off higher than 6.6% was able to classify Epi-CB tumors of the discovery set with 85 and 90% of agreement using 850K-array and QUAlu data. All tumors of the whole series were classified taking into account this cut-off.

Carrillo-Reixach et al Suppl. Methods 15

Figure M2. Receiver operating characteristic (ROC) curves for percentage of DNA hypomethylation ((1-T/mean NT)*100) assessed by 850K-arrays using 26 discovery set samples, including 13 cases defined as Epi-CB and 13 cases as Epi-CA by unsupervised clustering (see Figure 4). The selected cut-off is marked in red. 16+VIM-gene signature. To classify samples on the basis of the 16-gene expression profile, we used the Class prediction tool of BRB ArrayTools software (https://brb.nci.nih.gov/BRB-ArrayTools/) and using 6 algorithms as previously reported[21]. Each sample was classified on the basis of the majority of the 6 algorithms. Here we improved the 16-gene signature by incorporating VIM gene expression, which allowed us to distinguish two distinct C2 subtypes: one with high T/NT VIM levels (C2B) and another with low T/NT VIM levels (C2-Pure). The C2B tumors of the discovery set were defined by unsupervised analysis (Figure S3), corresponding to the cocluster CC3. Then, a T/NT FC cut-off of VIM gene expression was established using a ROC curve and discovery set samples, (Figure M3). There was a linear regression between HTA and qPCR gene expression (R2=0.88, p=0.0006). The T/NT FC cut-off higher than 6.5 was able to classify CC3 tumors of the discovery set with 87% of agreement using HTA-array and qPCR data. All tumors of the whole series were classified taking into account this cut-off.

Carrillo-Reixach et al Suppl. Methods 16

Figure M3. Receiver operating characteristic (ROC) curves for VIM gene expression FC (T/mean NT) of 32 tumor samples of the discovery set and defining the five tumors of the robust co-cluster CC3 as C2B (see Figure S3). The selected cut-off is marked in red.

Molecular Risk Stratification (MRS). Tumor samples were classified on the basis of transcriptomic (e.g. 14q32- and 16+VIM-gene signatures) and epigenomic (DNA hypomethylation) data (see algorithm in Figure M4, below). First, taking into account the 14q32-signature and epigenetic classifications, MRS-1 tumors had a moderate 14q32-signature and Epi-CA, MRS-2 tumors were defined as having a strong 14q32 signature or Epi-CB, whereas MRS-3 tumors had a strong 14q32 signature and Epi-CB. The MRS-3 subclass can be in turn subdivided into MRS-3b or MRS-3a, depending on whether the tumors were classified as C2-Pure or not in the 16+VIM-gene signature, respectively.

Carrillo-Reixach et al Suppl. Methods 17

Strong 14q32 OE? No

Yes

(Moderate)

Epi-CB?

Epi-CB?

No (Epi-CA)

No Yes (Epi-CA)

Yes MRS-3

C2-Pure?

No

MRS-1

MRS-2

(C1/C2B)

Yes

MRS-3a

MRS-3b

Strong 14q32 overexpression (OE): At least 3 out of 4 genes of the 14q32-signature are overexpressed (T/NT FC ≥10) Epi-CB: Strong tumor DNA hypomethylation (T/NT < 6.82%) by 850K-array or QUAlu C2Pure: C2 subclass defined by 16-gene signature and low levels of VIM gene expression (T/NT FC <6.5) by HTA or qPCR

Figure M4. Algorithm for Molecular Risk Stratification of Hepatoblastoma (MRS-HB).

Gene Set Enrichment Analysis (GSEA) GSEA[22] was used to evaluate the correlation of a specific gene list with two different sample groups (phenotypes). Briefly, this method calculates an enrichment score after ranking all genes in the data set on the basis of their correlation with a chosen phenotype and identifying the rank positions of all the members of a defined gene set. To evaluate statistical differences, we used the signal2noise ratio as a statistic to compare specific and random phenotypes. Statistical significance was defined when FDR q-value < 0.25. The 14q32 GSEA analysis was performed using HTA data of 26,831 genes because of the high numbers of 14q32 small RNAs detected as compared with RNA-seq. The

Carrillo-Reixach et al Suppl. Methods 18

14q32 gene set including 96 genes was defined by extracting all genes mapped in the 14q32 overexpressed region (chr14: 101,193,164-101,539,274) using the UCSC Genome Browser (https://genome.ucsc.edu/) annotated using GRCh37/hg19 version.

Cell lines, and reagents HepG2 (ATCC) and Huh6 (Japanese Collection of Research Bioresources) cell lines were grown in Dulbecco's Modified Eagle Medium (ThermoFisher, Waltham, USA) supplemented with 10% heat‐inactivated fetal bovine serum. Cells were maintained at 37 ºC in a 5% CO2 atmosphere. They were then transfected with 434 pmol siRNA targeting CHKA (s3008, ThermoFisher). Sham siRNA (4390843, ThermoFisher) was used as a negative control. Overexpression and knock-down of the targeted genes was confirmed by qRT-PCR. CHKA inhibitors MN58b and TCD-717 were purchased from AOBIUS (Gloucester, USA), respectively.

In vitro functional assays Cell viability and colony formation capacity was evaluated in HepG2 and Huh6 knockdown cells for CHKA and wild type cells treated with the CHKA inhibitors MN58b and TCD-717. For the cell viability assay, cells were seeded at 3000 cells/ml in 96-well plates for 48 h in a humidified atmosphere at 37°C and 5% CO2. They were then incubated with increasing concentrations of MN58b and TCD-717 (0, 2, 4, 6 and 8 µM). Cell viability was determined by 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) dye uptake using the CellTiter 96® Cell Proliferation Kit (Promega, Madison, USA) following the manufacturer’s instructions. To assess colony formation capacity, 300 cells/well were seeded in 6-well plates and incubated for 12 days. Huh6 and HepG2 cells were treated with 0, 2 or 4 µM MN58b or TCD-717. Thereafter, cells were stained with 0.5% Crystal Violet (Sigma) and the number of colonies and colony size was measured.

Carrillo-Reixach et al Suppl. Methods 19

Animal procedures All animal experiments were approved by the Ethical Committee of IDIBELL and performed in accordance with the guidelines for Ethical Conduct in the Care and Use of Animals as stated in The International Guiding Principles for Biomedical Research Involving Animals, developed by the Council for International Organizations of Medical Sciences (CIOMS). All animals received human care. They were housed in groups of five mice in ventilated cages on a 12-hour light-dark cycle at 21 to 23C and 40% to 60% humidity. Mice were allowed free access to an irradiated diet and sterilized water. Orthotopic PDXs of HBs were established as previously reported[23]. Briefly, 5to-6-week-old male Crl:NU-Foxn1nu, weighing 18–22 g (Envigo), were used for tumor tissue implantation. The mice were anesthetized with a continuous flow of 1% to 3% isoflurane/oxygen mixture (2 l/min). After performing a median laparotomy, the tumor fragment was anchored to the anterior hepatic lobe with a Prolene 6-0 suture. The abdominal incision was closed with staples. Mice were inspected twice a week. Tumor growth was examined by an exploratory laparotomy under anesthesia every 3-6 weeks. Successive passages into new mice were done at least 3 months after implantation. A total of 11 mice used for the in vivo experiment targeting CHKA. At that time, the mice were sacrificed by cervical dislocation. At each passage, small fragments of xenograft tumors were snap-frozen or formalin-fixed for tumor characterization. PDXs and their corresponding primary tumors were characterized by pathological analysis and CTNNB1 mutations. For the experiment involving CHKA inhibition, a PDX model established from a molecular high-risk tumor (MRS-3b) with high levels of CHKA was selected. Treatment with MN58b (AOBIOUS, Inc) was performed twice per week for 3 weeks at a dose of 3 mg/kg/day in phosphate-buffered saline (PBS) in 6 mice, whereas only PBS was administered to 5 control mice at the same dates. Tumor diameters were measured twice per week and tumor volumes were calculated using the formula: Tumor Volume= (π x length x width2)/6, where length is the largest tumor diameter and width is the perpendicular tumor diameter. After 3 weeks of treatment, mice were sacrificed and tumor tissue samples were frozen, formalin-fixed and paraffin-

Carrillo-Reixach et al Suppl. Methods 20

embedded for histological and IHC studies. Histological and IHC evaluation of primary and PDX tumors was performed by expert pathologists.

Statistical analysis Statistical analysis (only one sample/patient) was performed with SPSS v.15 (Chicago, IL, USA) and GraphPad Prism v.5 (La Jolla, CA, USA) software. To avoid any bias, all statistical analysis were performed taking only one tumor sample per patient and prioritizing the primary tumors. Appropriate two-side statistical tests were applied when necessary: qualitative variables (Chi-square and Fisher Exact tests) and quantitative variables (parametric data: t-test; non-parametric data: Mann-Whitney test and Wilcoxon matched-pairs signed rank test for unpaired and paired data, respectively) and are indicated in each analysis. We applied the ANOVA test with Tukey post-test for parametric data and the Kruskal-Wallis test with Dunn post-test for non-parametric data to compare more than two groups. The Shapiro-Wilk test was performed for the study of the normality of the data. Data are given as mean ± standard deviation. The correlation analysis was performed using the Spearman or Pearson test when necessary. For event-free survival (EFS) analysis, Kaplan-Meier’s method and Log rank test were performed to compare differences between curves. Receiving Operating Curves were performed to define the best cut-off of molecular data. Variables independently associated with EFS were determined by stepwise forward Cox regression analysis. Pvalues < 0.05 were considered significant.

REFERENCES (SUPPLEMENTARY METHODS) [1]

Nicolle D, Fabre M, Simon-Coma M, Gorse A, Kappler R, Nonell L, et al. Patient-derived mouse xenografts from pediatric liver cancer predict tumor recurrence and advise clinical management. Hepatology 2016;64:1121–35. https://doi.org/10.1002/hep.28621.

[2]

Maibach R, Roebuck D, Brugieres L, Capra M, Brock P, Dall’Igna P, et al. Prognostic stratification for children with hepatoblastoma: The SIOPEL experience. Eur J Cancer 2012. https://doi.org/10.1016/j.ejca.2011.12.011.

[3]

Access O, Gentleman RRC, Carey VJVVJV, Bates DDMDDM, Bolstad B, Dettling M, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80. https://doi.org/10.1186/gb-2004-5-10-r80.

Carrillo-Reixach et al Suppl. Methods 21 [4]

Liao Y, Smyth GK, Shi W. FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 2014;30:923–30. https://doi.org/10.1093/bioinformatics/btt656.

[5]

Robinson M, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 2010;11:1–9. https://doi.org/10.1186/gb-201011-3-r25.

[6]

Robinson MD, McCarthy DJ, Smyth GK. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2009;26:139–40. https://doi.org/10.1093/bioinformatics/btp616.

[7]

Piskol R, Ramaswami G, Li JB. Reliable identification of genomic variants from RNA-seq data. Am J Hum Genet 2013;93:641–51. https://doi.org/10.1016/j.ajhg.2013.08.008.

[8]

Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213–9. https://doi.org/10.1038/nbt.2514.

[9]

Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O’Connell MA, et al. Identifying RNA editing sites using RNA sequencing data alone. Nat Methods 2013;10:128–32. https://doi.org/10.1038/nmeth.2330.

[10]

Paz-Yaacov N, Bazak L, Buchumenski I, Porath HT, Danan-Gotthold M, Knisbacher BA, et al. Elevated RNA Editing Activity Is a Major Contributor to Transcriptomic Diversity in Tumors. Cell Rep 2015;13:267–76. https://doi.org/10.1016/j.celrep.2015.08.080.

[11]

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47–e47. https://doi.org/10.1093/nar/gkv007.

[12]

Zynger DL, Gupta A, Luan C, Chou PM, Yang GY, Yang XJXR, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Hepatology 2013;18:1–6. https://doi.org/10.1007/s10147-013-0624-8.

[13]

Hu X, Wan S, Ou Y, Zhou B, Zhu J, Yi X, et al. RNA over-editing of BLCAP contributes to hepatocarcinogenesis identified by whole-genome and transcriptome sequencing. Cancer Lett 2015;357:510–9. https://doi.org/10.1016/j.canlet.2014.12.006.

[14]

Buj R, Mallona I, Díez-Villanueva A, Barrera V, Mauricio D, Puig-Domingo M, et al. Quantification of Unmethylated Alu (QUAlu): a tool to assess global hypomethylation in routine clinical samples. Oncotarget 2016;7. https://doi.org/10.18632/oncotarget.7233.

[15]

S. A. FastQC A Quality Control tool for High Throughput Sequence Data 2010.

[16]

Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20. https://doi.org/10.1093/bioinformatics/btu170.

[17]

Langmead B, Slazberg SL. Fast gapped-read alignmnet with Bowtie 2. Nat Methods 2013;9:357–9. https://doi.org/10.1038/nmeth.1923.Fast.

[18]

Schmieder R, Edwards R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 2011;27:863–4. https://doi.org/10.1093/bioinformatics/btr026.

[19]

Sung WK, Zheng H, Li S, Chen R, Liu X, Li Y, et al. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat Genet 2012;44:765–9.

Carrillo-Reixach et al Suppl. Methods 22 https://doi.org/10.1038/ng.2295. [20]

Hemmrich G, Forster M, Szymczak S, Franke A, Wienbrandt L, Ellinghaus D, et al. VyPER: eliminating false positive detection of virus integration events in next generation sequencing data. Sci Rep 2015;5:1–13. https://doi.org/10.1038/srep11534.

[21]

Cairo-Armengol, De Reyniès A, Wei Y, Thomas E, Renard C-AA, Goga A, et al. Hepatic Stem-like Phenotype and Interplay of Wnt/β-Catenin and Myc Signaling in Aggressive Childhood Liver Cancer. Cancer Cell 2008;14:471–84. https://doi.org/10.1016/j.ccr.2008.11.002.

[22]

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50. https://doi.org/10.1073/pnas.0506580102.

[23]

Armengol C, Tarafa G, Boix L, Line B-C, Sole M. Orthotopic Implantation of Human Hepatocellular Carcinoma in Mice : Analysis of Tumor Progression and Establishment of the BCLC-9 Cell Line Orthotopic Implantation of Human Hepatocellular Carcinoma in Mice : Analysis of Tumor Progression and Establishment 2004;10:2150–7. https://doi.org/10.1158/1078-0432.CCR-03-1028.