Identification of candidate SNPs for drug induced toxicity from differentially expressed genes in associated tissues


Gene 506 (2012) 62–68

Contents lists available at SciVerse ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Johanna Hasmats a,1, Ilya Kupershmidt a,b,1, Cristina Rodríguez-Antona c,d, Qiaojuan Jane Su b, Muhammad Suleman Khan e, Carlos Jara f, Xabier Mielgo f, Joakim Lundeberg a, Henrik Green a,e,⁎

a Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Royal Institute of Technology, SE-171 65 Solna, Sweden
b NextBio, 475 El Camino Real, Santa Clara, CA 95050, USA
c Endocrine Cancer Group, Human Cancer Genetics Programme, Spanish National Cancer Center (CNIO), Madrid, Spain
d ISCIII Center for Biomedical Research on Rare Diseases (CIBERER), Spain
e Clinical Pharmacology, Division of Drug Research, Department of Medical and Health Sciences, Faculty of Health Sciences, Linköpings Universitet, SE-581 85 Linköping, Sweden
f Unidad de Oncología Médica, Fundación Hospital Alcorcón (FHA), Madrid, Spain

Article info

Article history: Accepted 18 June 2012; Available online 1 July 2012

Keywords: Paclitaxel; Carboplatin; Single nucleotide polymorphism; Toxicity; Gene expression microarrays; Meta-analysis

Abstract

The growing collection of publicly available high-throughput data provides an invaluable resource for generating preliminary in silico data in support of novel hypotheses. In this study we used a cross-dataset meta-analysis strategy to identify novel candidate genes and genetic variations relevant to paclitaxel/carboplatin-induced myelosuppression and neuropathy. We identified genes affected by drug exposure and present in tissues associated with toxicity. From ten top-ranked genes, 42 non-synonymous single nucleotide polymorphisms (SNPs) were identified in silico and genotyped in 94 cancer patients treated with carboplatin/paclitaxel. We observed variations in 11 SNPs, of which seven were present at a sufficient frequency for statistical evaluation. Of these seven SNPs, three were present in ABCA1 and ATM, and showed significant or borderline significant association with either myelosuppression or neuropathy. The strikingly high number of associations between genotype and clinically observed toxicity provides support for our data-driven computational strategy to identify biomarkers for drug toxicity. © 2012 Elsevier B.V. All rights reserved.

Abbreviations: CYP, cytochrome P450; GWAS, genome-wide association studies; GEO, Gene Expression Omnibus; NCI-CTC, National Cancer Institute Common Toxicity Criteria; SMD, Stanford Microarray Database; SNPs, single nucleotide polymorphisms; TCGA, The Cancer Genome Atlas.
⁎ Corresponding author at: Science for Life Laboratory, Karolinska Institute Science Park, KTH Royal Institute of Technology, School of Biotechnology, Tomtebodavägen 23 B, S‐171 65 Solna, Sweden. Tel.: +46 8 52481501, +46 76 0511580 (mobile); fax: +46 13 104195. E-mail address: [email protected] (H. Green).
1 The authors contributed equally to this work.
0378-1119/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2012.06.053

1. Introduction

Biomarkers are a key tool in predicting an individual's risk of disease, therapeutic response or drug-induced toxicity. Genome-wide association studies (GWAS) represent one high-throughput strategy for discovering genetic biomarkers associated with a phenotype of interest. However, the complex experimental design and the requirements for large sample sizes make GWAS impractical as a standard experimental approach for biomarker identification. Large volumes of high-throughput data available in public repositories, such as GEO (Gene Expression Omnibus; Edgar et al., 2002), ArrayExpress (Brazma et al., 2003), SMD (Stanford Microarray Database; Sherlock et al., 2001) and TCGA (The Cancer Genome Atlas; Anon., 2008), provide a great source of experimental data for generating novel hypotheses in silico and for identifying potential candidate biomarkers linked to a phenotype of interest. A large body of work already exists in the field of cancer chemotherapy, for example. Many datasets that link compound effects to gene expression changes are publicly available to the research community. Given the right strategy, these data can be mined for additional genes and genetic variants associated with clinical responses to anti-cancer therapies.

Paclitaxel and carboplatin are highly active anticancer drugs that are used in combination to treat many forms of cancer, such as cancer of the breast, lung, and ovary. In ovarian cancer, for example, although paclitaxel/carboplatin combination therapy improves survival rates compared with earlier regimens, patients are at risk of considerable, sometimes life-threatening, toxicity, especially neuropathy (sometimes immobilizing the patient) and myelosuppression (McGuire et al., 1996). Beyond the impact on quality of life, severe toxicity may necessitate dose reduction, delay, or even cessation of treatment. Several studies have attempted to establish a link between patient genotypes and toxicities associated with paclitaxel/carboplatin chemotherapy (Green et al., 2006, 2008, 2009; Leskela et al., 2011; Marsh et al., 2007; Sissung et al., 2006). It has been suggested that the pharmacodynamics and pharmacokinetics of paclitaxel and carboplatin are influenced by the activity of

J. Hasmats et al. / Gene 506 (2012) 62–68

several proteins, such as metabolic enzymes and drug transporters (Marsh et al., 2007). Systemic elimination of paclitaxel occurs by hepatic metabolism involving the cytochrome P450 (CYP) enzymes CYP3A4 and CYP2C8 (Walle et al., 1995). Paclitaxel is also a substrate for the multidrug resistance P-glycoprotein, encoded by the ABCB1 gene, which is believed to affect both tumor resistance to paclitaxel (Marsh et al., 2007) and the elimination of the drug via the liver (Sparreboom et al., 1997). Genetic variations in CYP2C8 have been associated with altered clearance of the drug as well as hematological toxicity (Green et al., 2009). In some studies, polymorphisms in ABCB1 have been associated with development of neuropathy (Sissung et al., 2006) and progression-free survival (Green et al., 2006, 2008). However, these associations have not been seen by others (Marsh et al., 2007). The discrepancies between these studies imply a problem in selecting accurate and relevant markers for different traits.

In this study, we developed a novel approach for identifying new candidate biomarkers of toxicity, starting from the body of publicly available microarray gene expression data. We applied our strategy to the study of carboplatin- and paclitaxel-induced toxicity in lung and ovarian cancer patients. More than 100 high-quality public-domain gene expression data sets from studies on paclitaxel and carboplatin were used to identify candidate genes, which were subsequently evaluated by targeted genotyping in chemotherapy patient samples.

2. Material and methods

2.1. Overall approach for candidate biomarker analysis from gene expression signatures

The overall strategy is shown in Fig. 1. For each drug (carboplatin and paclitaxel) two meta-signatures were computed: one representing genes regulated by the drug, and the other representing genes with tissue-specific expression, either in tissues associated with toxicity or in tissues responsible for drug elimination and metabolism (kidney and liver). The four meta-signatures for carboplatin and paclitaxel were combined to create a candidate list of genes associated with toxicity in ovarian cancer patients (Fig. 1A). The final step in our analysis involved genotyping non-synonymous SNPs from the top candidate genes in the ovarian cancer patients and testing them for association with toxicity (Fig. 1B).

2.2. Computing individual study signatures of carboplatin and paclitaxel

Each microarray dataset that tested drug treatments relative to control cells or tissues was analyzed to identify differentially expressed sets of genes as previously described (Kupershmidt et al., 2010). Briefly, genes that were differentially expressed between treated and control samples were identified using a t-test p-value cut-off of 0.05 and a fold change cut-off of 1.2 (either increased or decreased expression). Using this strategy, we created gene expression signatures for carboplatin and paclitaxel from each individual study investigating drug effects (Supplemental Table S1).
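The two cut-offs in Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the t-test p-value is taken as a precomputed input, and two details the text does not specify are assumed, namely that a decrease is counted when the treated/control ratio is at most 1/1.2 and that the p-value comparison is strict.

```python
from statistics import mean

P_VALUE_CUTOFF = 0.05  # t-test p-value cut-off given in Section 2.2
FOLD_CUTOFF = 1.2      # fold change cut-off given in Section 2.2


def fold_change(treated, control):
    """Ratio of mean treated to mean control expression (linear scale)."""
    return mean(treated) / mean(control)


def in_signature(p_value, fc):
    """True if a gene passes both cut-offs, in either direction.

    Assumption: decreased expression is a ratio of at most 1 / 1.2.
    """
    changed = fc >= FOLD_CUTOFF or fc <= 1.0 / FOLD_CUTOFF
    return p_value < P_VALUE_CUTOFF and changed


def build_signature(genes):
    """genes maps a gene name to (p_value, treated values, control values).

    Returns a dict of gene name -> fold change for genes that pass.
    """
    signature = {}
    for name, (p_value, treated, control) in genes.items():
        fc = fold_change(treated, control)
        if in_signature(p_value, fc):
            signature[name] = fc
    return signature
```

Taking the p-value as an input keeps the sketch dependency-free; in practice it would come from whichever t-test routine the analysis pipeline uses.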

2.3. Mapping cross-species and cross-platform identifiers

To enable meta-analysis of datasets generated on different platforms, supplied by different vendors and derived from different organisms (e.g., human, mouse, rat), ortholog-based translation tables and a universal gene index were developed. Detailed methodology is outlined in Kupershmidt et al. (2010). Once we mapped the different signature identifiers to common gene references, we applied our meta-analysis statistics (see below).
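As a rough illustration of the identifier mapping in Section 2.3, the sketch below translates a ranked list of platform-specific probes into universal gene index entries. Everything in it is an assumption for illustration: the two lookup tables, all identifiers (placeholders, not real probe or gene IDs), and the choice to keep the best-ranked probe when several map to the same gene. The actual translation tables are described in Kupershmidt et al. (2010).

```python
# Hypothetical lookup tables; every identifier here is a placeholder.
PROBE_TO_GENE = {
    ("platform_A", "probe_001"): ("human", "GENE1"),
    ("platform_A", "probe_002"): ("human", "GENE1"),
    ("platform_B", "probe_9"): ("mouse", "Gene1"),
    ("platform_C", "probe_42"): ("rat", "Gene2"),
}
ORTHOLOG_TO_INDEX = {
    ("human", "GENE1"): "IDX_0001",
    ("mouse", "Gene1"): "IDX_0001",
    ("rat", "Gene2"): "IDX_0002",
}


def to_common_index(platform, ranked_probes):
    """Translate ranked probes into ranked universal gene index ids.

    Probes without a mapping are dropped. If several probes map to the
    same index id, only the first (best-ranked) one is kept (assumption).
    """
    seen = set()
    ranked_ids = []
    for probe in ranked_probes:
        gene = PROBE_TO_GENE.get((platform, probe))
        index_id = ORTHOLOG_TO_INDEX.get(gene)
        if index_id is None or index_id in seen:
            continue
        seen.add(index_id)
        ranked_ids.append(index_id)
    return ranked_ids
```

With the placeholder tables, a human probe and a mouse probe for orthologous genes resolve to the same index entry, which is what allows signatures from different species and platforms to be compared.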


2.4. Computing signatures of tissues associated with toxicity

To identify tissue-specific gene expression signatures, we first defined a set of tissues and organs associated with carboplatin and paclitaxel toxicity and pharmacokinetics. Carboplatin is eliminated via the kidneys and is mainly associated with myelosuppression (Table 1). Paclitaxel is metabolized and eliminated via the liver and induces sensory and motor neurotoxicity (Table 1). Bone marrow is mainly associated with myelosuppression, while a panel of multiple tissues is associated with sensory and motor neurotoxicity (Table 1). For each tissue/organ listed in Table 1, an average absolute expression level across replicate samples was computed. For kidney and liver tissue, samples from two independent studies were combined to compute the final tissue meta-signatures. Next, genes were ranked by the fold change between expression in the tissue and median expression across all tissues, according to Kupershmidt et al. (2010). To select for genes highly expressed in a tissue of interest, we filtered out genes with negative fold changes in a target tissue relative to the full tissue panel (Table 1).

2.5. Meta-analysis of signatures

To identify the drug meta-signatures (i.e., the ranked sets of genes associated with carboplatin or paclitaxel response) and the toxicity-related tissue meta-signatures (ranked sets of genes associated with liver, kidney, neurotoxicity or myelosuppression), we applied meta-analysis across multiple gene expression signatures. A meta-score was computed for each significant gene among the N signatures, and genes with top meta-scores were identified for further analysis (Supplemental Fig. 1). The meta-score for gene i (Eq. (1)) was computed as the sum of individual scores for gene i across all signatures j:

$$\text{Meta-score}(\text{Gene } i) = \sum_{j=1}^{N} \text{Score}(ij) \tag{1}$$

The individual score for gene i in signature j is defined as follows:

$$\text{Score}(ij) = \frac{100}{1 + \text{Normalized Rank}/50{,}000} \tag{2}$$

The formula for the individual score (Eq. (2)) is defined such that: 1) the range of the score is 0–100; 2) more differentiation is reflected in the rank changes at the top rankings; and 3) the score decreases to 50% when the rank drops below 500 (on a common platform U133A, size 22,215).

2.6. Computing carboplatin and paclitaxel meta-signatures

For each study associated with a drug, either carboplatin (n=3) or paclitaxel (n=2) (Table S1), we first computed within-study drug meta-signatures (Supplemental Figs. S1 and S2A). This approach simplified the logistics of our analysis by producing a single meta-signature from the different time points and concentrations used in each study. For each within-study meta-signature, ranks were recomputed according to the logic outlined in the previous section. The final drug meta-signature was computed by combining within-study meta-signatures (Supplemental Fig. S2A) using the meta-analysis logic described above. Using this approach we computed one carboplatin and one paclitaxel meta-signature (Supplemental Fig. S1A).

Fig. 1. Overall workflow to identify candidate toxicity SNPs in ovarian cancer patients treated with carboplatin and paclitaxel. A. Sets of genes that were regulated by carboplatin or paclitaxel and expressed in either toxicity- or pharmacokinetics-related tissues were organized into ranked lists or signatures. These signatures were combined to create four meta-signatures representing genes that were both regulated by a drug and expressed in toxicity- or pharmacokinetics-related tissues. These four signatures were then combined in a final meta-analysis to create a signature of ranked genes involved in toxic responses. B. Diagram outlining the workflow of identifying toxicity-associated SNPs by genotyping. From the candidate toxicity gene list, we selected non-synonymous SNPs from a subset of genes (10 genes in total), which were then genotyped across a cohort of ovarian cancer patient samples. The final candidate SNPs were identified by performing association analysis of genotyping calls and toxicity associated with each patient sample.

2.7. Computing tissue meta-signatures

No meta-analysis was required for kidney, bone marrow, and liver, since each was represented by a single tissue. Thus, they produced three corresponding meta-signatures. For neurotoxicity, meta-analysis was applied to the multiple tissues that were associated with either sensory or motor neurotoxicity (Table 1). Another round of meta-analysis was applied to combine the sensory and motor neurotoxicity related tissue signatures to produce a single final neurotoxicity tissue meta-signature (Supplemental Fig. S2B).
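The meta-score of Section 2.5 (the per-signature score and its sum over signatures) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, normalized ranks are taken as given because the normalization itself is described in Kupershmidt et al. (2010), the constant 50,000 is used as printed in the formula, and a gene missing from a signature is assumed to contribute nothing to the sum.

```python
from collections import defaultdict

SCALE = 50_000  # constant in the individual-score formula, as printed


def individual_score(normalized_rank):
    """Individual score: 100 / (1 + normalized_rank / 50,000)."""
    return 100.0 / (1.0 + normalized_rank / SCALE)


def meta_scores(signatures):
    """Sum the individual scores of each gene over all signatures.

    signatures: list of dicts mapping gene -> normalized rank.
    A gene absent from a signature adds nothing (assumption).
    """
    totals = defaultdict(float)
    for signature in signatures:
        for gene, normalized_rank in signature.items():
            totals[gene] += individual_score(normalized_rank)
    return dict(totals)


def rank_by_meta_score(signatures):
    """Genes ordered from highest to lowest meta-score."""
    totals = meta_scores(signatures)
    return sorted(totals, key=totals.get, reverse=True)
```

The same function can be applied repeatedly, for example first within a study and then across studies, by re-ranking the output of one round and passing it in as a signature for the next; how ranks are recomputed between rounds is left out of the sketch.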

2.8. Meta-analysis to identify drug-toxicity candidate genes

First, we computed carboplatin- and paclitaxel-tissue meta-signatures for both tissues associated with toxicity and tissues associated with pharmacokinetics. This produced paclitaxel-neurotoxicity, paclitaxel-liver, carboplatin-myelosuppression and carboplatin-kidney meta-signatures (Fig. 1A). Each signature was then filtered to contain only genes that are present in both the drug and the tissue meta-signatures (Supplemental Fig. S2C). In the next step, all four results were combined in the final meta-analysis to compute a ranked list of toxicity gene candidates. The same formula as described in Section 2.5 was used to compute the final drug-toxicity candidate gene set. This list was filtered to contain only those genes that are present in at least two of the four meta-signatures.

Table 1
Tissues and organs involved in carboplatin and paclitaxel induced toxicity, metabolism, and elimination.

Drug | Toxicity/metabolism | Tissues/organs
Paclitaxel | Neurotoxicity: sensory | Cerebellum, thalamus, hypothalamus, parietal lobe, dorsal root ganglia
Paclitaxel | Neurotoxicity: motor | Basal ganglia, motor cortex, cerebral cortex, spinal cord
Paclitaxel | Pharmacokinetics: elimination and metabolism | Liver
Carboplatin | Myelosuppression | Bone marrow
Carboplatin | Pharmacokinetics: elimination and metabolism | Kidney

The table lists compounds with their associated toxicities and the tissues and organs where pharmacokinetic activities occur. While we could define single target tissues for the pharmacokinetics of each drug and for the myelosuppressive toxicity of carboplatin, the neurotoxicity profile of paclitaxel comprises a larger panel of associated tissues.

2.9. SNP selection

dbSNP (build 126) was used to identify non-synonymous SNPs within candidate genes that were at least 40 bp apart, and to exclude low complexity regions. Duplicated regions were also excluded. A haplotype analysis was performed using the information available in HapMap (Okazaki et al., 2008) (http://www.hapmap.org/, release 22) to exclude SNPs in high linkage disequilibrium (LD > 0.8). SNPs not recognized by HapMap were excluded. Primers for amplification and pyrosequencing (Supplemental Table S2) were ordered from Eurofins MWG Operon and used as described previously (Kaller et al., 2006). When SNPs were initially identified in our candidate gene list, rs10845981 was mapped to SLC2A3. Analysis of this SNP at later stages, however, revealed that it is associated with the nearby gene SLC2A14 as well. This may reflect the dynamic nature of publicly curated databases or may be evidence of duplication events in the region (see Supplemental Fig. S3). The genes

were also checked for copy number variations in the Database of Genomic Variants (http://projects.tcag.ca/cgi-bin/variation/gbrowse/hg19/).

2.10. Patients

To investigate the influence of the identified SNPs on carboplatin- and paclitaxel-induced toxicity (Fig. 1B), 94 Caucasian patients treated with paclitaxel and carboplatin were genotyped. Thirty-three patients were from a prospective pharmacokinetic study in Sweden (Green et al., 2009) and 61 were treated at the Hospital de Alcorcón, Madrid, Spain (Leskela et al., 2011). In 45 patients, the diagnosis and histology were consistent with epithelial ovarian cancer and in 32 patients with lung cancer. The remaining patients had carcinoma of the uterus (n=8), the peritoneal tissue (n=4) or the breast (n=2), or cancer of uncertain origin (germinal, ovarian or peritoneal, n=3). Most patients were treated with paclitaxel at 175 mg/m2 (n=85) given as a 3-hour infusion in combination with carboplatin (AUC 5 or 6); the other patients received paclitaxel at doses of 80 mg/m2 (n=2), 135 mg/m2 (n=3), 150 mg/m2 (n=1) or 200 mg/m2 (n=3), all in combination with carboplatin.

2.11. Toxicity assessments

Acquired neurosensory and neuromotor toxicities were evaluated according to the National Cancer Institute Common Toxicity Criteria (NCI-CTC version 2.0). The total dose of paclitaxel (mg/m2) received until the patient acquired grade 2 neuropathy (either neurosensory or neuromotor) was correlated with the genotype of each patient. Patients who did not acquire grade 2 neuropathy were censored at the total dose of paclitaxel received during chemotherapy. Myelosuppression was recorded as the nadir value for leukocytes, neutrophils and platelets at sampling occasions 21 (±3) days after each chemotherapy cycle.

2.12. DNA isolation, PCR and pyrosequencing

Genomic DNA was isolated using QIAamp® DNA mini-kits (VWR International, Stockholm, Sweden) according to the manufacturer's protocol.
The quantity of DNA extracted was determined using absorbance spectroscopy (260 and 280 nm), and the DNA was diluted to 10 ng/μl and stored at −20 °C. For working solutions the samples were diluted in 10 mM Tricine–KOH, 1 mM EDTA, pH 9.2, to a concentration range of 2–5 ng/μl, suitable for the subsequent reactions. Three non-template controls and three genomic DNA controls were present on all plates for each SNP. Amplitaq Gold or HiFi Fast start (Applied Biosystems, Foster City, USA; Roche, Basel, Switzerland) was used for PCR with approximately 5 ng of DNA in each reaction according to the manufacturer's recommendations. The specific gene regions were amplified using the primers shown in Supplemental Table S2. Pyrosequencing was performed on a PyroMark Q96 HS (Biotage, Uppsala, Sweden) using the sequencing primers and dispensing orders shown in Supplemental Table S2 and according to the manufacturer's instructions. Unresolved pyrosequencing data were complemented with conventional Sanger DNA sequencing.

2.13. Statistics

The effect of the genotype on the total dose of paclitaxel until grade 2 neuropathy was evaluated using Kaplan–Meier curves and log-rank tests. Fisher's exact test was used to investigate the correlation between the genotype and toxicity according to the CTC scale. t-Tests not assuming equal variance were used to evaluate the correlation between the different genotypes and the nadir values of blood status. For those SNPs where we found both heterozygous and homozygous variant patients, the phenotype of the wild type patients was tested against the phenotype of all variant patients. SPSS version 18 was used for the statistical analysis.

2.14. Ethics statement

This study was approved by the Regional Ethical Review Boards in Linköping, Sweden, and in Madrid, Spain, and each patient gave written informed consent.

3. Results

3.1. Candidate toxicity gene sets for SNP selection

We sought to identify a single combined drug-tissue toxicity signature that prioritized genes that were affected by the drugs and were expressed in tissues associated with toxicity or pharmacokinetics. This required successive rounds of meta-analysis applied to gene expression data from the relevant tissues and from experimental studies that measured the effects of drug treatment (see Table S1 for a summary of source data materials). First, we computed carboplatin and paclitaxel meta-signatures from the source data (see Material and methods).
We then combined each of these signatures with the respective toxicity-associated and pharmacokinetics-associated tissue signatures to compute a pair of drug-tissue meta-signatures for each drug (Fig. 1A). The final meta-analysis step combined all four meta-signatures to compute a set of ranked candidate genes for carboplatin/paclitaxel-associated toxicity (Fig. 1A). In total, 320 candidate toxicity genes were identified; Table 2 lists the top 20. Partially validating our strategy, we found that a number of genes listed in Table 2 have been associated with carboplatin/paclitaxel efficacy in previous studies. The majority of gene candidates are novel and require further validation. The five top-ranked genes were selected for subsequent analysis in the lung and ovarian cancer patient cohort, together with five additional genes from the top 20 that had prior associations with drug efficacy based on a literature analysis (shown in bold in Table 2).

Table 2
Ranked list of candidate genes associated with carboplatin and paclitaxel toxicity.

Gene | Rank score | Chromosome location | # Meta-signatures | Annotation
ADD2 | 74.2899 | 2p13.3 | 4/4 | Adducin 2 (beta)
CA2 | 73.0979 | 8q22 | 4/4 | Carbonic anhydrase II
CAPN3 | 70.2063 | 15q15.1 | 3/4 | Calpain 3, (p94)
AQP3 | 69.8465 | 9p13 | 3/4 | Aquaporin 3 (Gill blood group)
PDS5B | 65.5267 | 13q12.3 | 4/4 | PDS5, regulator of cohesion maintenance
EPOR | 65.4159 | 19p13.3–p13.2 | 3/4 | Erythropoietin receptor
HCN3 | 64.1609 | 1q22 | 3/4 | Potassium/sodium nucleotide-gated channel 3
C5orf13 | 63.2063 | 5q22.1 | 3/4 | Chromosome 5 open reading frame 13
STXBP2 | 62.9387 | 19p13.3–p13.2 | 4/4 | Syntaxin binding protein 2
YARS | 62.4701 | 1p35.1 | 3/4 | Tyrosyl-tRNA synthetase
CCNF | 62.3824 | 16p13.3 | 4/4 | Cyclin F
PIP4K2A | 61.0837 | 10p12.2 | 3/4 | Phosphatidylinositol-5-phosphate 4-kinase
LILRB3 | 60.3562 | 19q13.4 | 4/4 | Leukocyte immunoglobulin-like receptor
ABCA1 | 60.2848 | 9q31.1 | 3/4 | ATP-binding cassette, sub-family A (ABC1)
SLC2A3 | 60.1262 | 12p13.3 | 4/4 | Solute carrier family 2
DEFB1 | 59.6807 | 8p23.1 | 3/4 | Defensin, beta 1
CNOT2 | 59.345 | 12q15 | 3/4 | CCR4-NOT transcription complex, subunit 2
AGT | 58.8434 | 1q42–q43 | 3/4 | Angiotensinogen
MAP4K4 | 58.4781 | 2q11.2–q12 | 3/4 | Mitogen-activated protein kinase 4
ATM | 58.293 | 11q22–q23 | 3/4 | Ataxia telangiectasia mutated

PubMed citations (values as extracted; row alignment lost): 2, 9, 1, 1, 14 (MAP4), 40

The rank score for each gene associated with drug toxicity was computed as described in Sections 2.1–2.8 and in Kupershmidt et al. (2010). A cut-off was applied that included only genes that were members of at least two compound-tissue meta-signatures. The number of PubMed citations linking a listed gene with drug toxicity is indicated. Note: the abbreviation AGT can also refer to O6-alkylguanine DNA alkyltransferase, which is associated with cisplatin toxicity, but in this instance we are referring to the angiotensinogen gene.

3.2. SNP selection criteria and genotyping results

A total of 123 non-synonymous SNPs were present in these genes. The number of SNPs that passed the first criterion (non-synonymous) varied from 1 to 66 per gene (see Supplemental Fig. S4). We evaluated the genomic regions covered by these SNPs using HapMap and selected 42 to represent the haplotype blocks of interest. We then designed primers to investigate these regions (Supplemental Table S2). Of the 42 SNPs selected, 11 showed genetic variation among the lung and ovarian cancer patients (Table 3). Of these

11 genetic variants, seven were found at a frequency high enough for statistical comparison (Table 3) and were therefore tested for association with carboplatin- and paclitaxel-induced myelosuppression and neuropathy. The remaining SNPs were examined for association with toxicity, but without any further statistical comparison.

Table 3
Genotype distribution of the 11 SNPs where genetic variations were found in the ovarian cancer patients.

Gene | SNP | Wild type | Heterozygote | Homozygote | Classification group
ABCA1 | rs2230808 | 54 | 35 | 5 | High freq
ABCA1 | rs13306072 | 82 | 2 | 0 | Low freq
ABCA1 | rs28937314 | 91 | 2 | 0 | Low freq
ABCA1 | rs4149313 (a) | 70 | 22 | 1 | High freq
ABCA1 | rs2066718 | 83 | 11 | 0 | High freq
ABCA1 | rs2230806 | 52 | 34 | 8 | High freq
ATM | rs1800058 | 88 | 5 | 1 | High freq
ATM | rs1800059 | 92 | 1 | 1 | Low freq
ATM | rs1801516 | 70 | 20 | 4 | High freq
EPOR | rs45516306 | 90 | 4 | 0 | Low freq
SLC2A3/14 | rs10845981 | 25 | 66 | 3 | High freq

SNPs that were found at a frequency suitable for further statistical analysis (High freq) were tested for associations with toxicity. SNPs that were found at an allele frequency too low for any further statistical comparison are only described, without any statistical reference.
(a) Due to the dynamic structure of databases, the SNP rs4149313 has been merged into rs2066714. In addition, one patient's genotype could not be determined for this SNP.

3.3. SNPs in ABCA1 and ATM have an influence on toxicity in cancer patients

Significant to borderline significant differences in toxicity between groups of different zygosities were found for three of the seven SNPs with a frequency high enough for statistical evaluation and the generation of a toxicity p-value (Table 4). In the transporter gene ABCA1, SNP rs4149313 showed a significant (p = 0.03) association with thrombocytopenia. Patients wild type for this variant had lower platelet counts (mean 162, std deviation 71) than the rest of the patients (mean 219, std deviation 108). SNP rs1800058 in ATM also showed a significant association with thrombocytopenia (p = 0.044). For this SNP, wild type patients had a significantly higher mean platelet count (180, std deviation 85) than the rest of the patients (mean 121, std deviation 54). rs2230806 in the ABCA1 gene was

borderline associated with neurotoxicity of grade 2 or higher (p = 0.083, odds ratio = 2.22, 95% CI 0.93–5.30). It is noteworthy that this SNP was also borderline significantly associated with a linear trend for neuropathy when comparing the dose until grade 2 neuropathy using Kaplan–Meier analysis (additive model, p = 0.090). For the SNPs with low frequency (see Table 3), no statistical tests of association between genotype and toxicity were performed. We did observe that the two patients who were G/A for the SNP rs13306072 in ABCA1 had very low neutrophil counts (0.88 and 1.2) compared with the mean of 2.3 for the rest of the patients.

Table 4
Correlations between SNPs and paclitaxel/carboplatin induced toxicities.

Gene | SNP | Toxicity | p-Value
ABCA1 | rs4149313 | Wild type patients had significantly lower nadir platelet counts than the rest of the patients | 0.031
ABCA1 | rs2230806 | 28% of the wild type patients suffered grade 2 neuropathy or higher, compared with 46% of the rest of the patients | 0.083
ATM | rs1800058 | Wild type patients had significantly higher nadir platelet counts than the rest of the patients | 0.044

4. Discussion

In this study we present a new strategy for using meta-analysis of publicly available gene expression data to generate new gene candidates for pharmacogenetic traits, such as drug-induced toxicity. Our study demonstrates that by using large collections of gene expression data as a starting point, we can identify sequence variants associated with adverse responses to chemotherapy in ovarian cancer patients. To make our approach even more robust, we expanded our consideration of tissues beyond those that exhibit toxicity to include tissues responsible for metabolism and elimination, since they may regulate exposure to the drug.

Publicly available genome-wide data play an increasingly important role in today's research. Intelligent use of these data can help researchers generate novel hypotheses and design new experiments. Previous studies have demonstrated the value of meta-analysis across related sets of data for identifying candidate genes (Chan et al., 2008; Gardiner-Garden & Littlejohn, 2001; Jiang et al., 2004; Rhodes et al.,

2002). A number of studies have also pointed out the linkage between gene expression changes and corresponding disease-associated variants (Brem et al., 2002; Chen et al., 2008; Eaves et al., 2002; Karp et al., 2000; Schadt et al., 2003). In our case, meta-analysis of relevant public datasets led us to novel and known toxicity-associated genes, which subsequently drove our SNP selection for genotyping analysis in patients. It is interesting that even with a limited number of patient samples, three of the seven SNPs present at a high enough frequency for comparison showed a significant or borderline significant association with toxicity. This suggests that our approach can provide an efficient method of identifying candidate biomarkers that can direct further experiments with patient samples.

Our study led to the identification of genetic variations in ABCA1 and ATM and the discovery of correlations between these SNPs and hematological and neurotoxic side effects in cancer patients. The ABCA1 gene encodes a transporter protein. Interestingly, ABCA1 is involved in the efflux of cholesterol, particularly in macrophages (Yvan-Charvet et al., 2010). Furthermore, oxidized low density lipoproteins increase the IC50 for cisplatin in ovarian cancer cell lines, suggesting a potential mechanism (Scoles et al., 2010). Other transporters, such as ABCB1, SLCO1B1, and SLCO1B3, have been investigated for association with paclitaxel or carboplatin effects. Polymorphisms in ABCB1, for example, have been associated with susceptibility to neuropathy and neutropenia in response to paclitaxel treatment (Sissung et al., 2006). The second gene for which we uncovered an association with toxicity is the DNA-repair gene ATM. Polymorphisms in this gene have previously been associated with overall survival in pancreatic cancer patients (Okazaki et al., 2008).
Polymorphisms in another DNA repair gene, ERCC1 (C8092A), known as a resistance mechanism for carboplatin, have also been linked to neuropathy after taxane and platinum treatment (Kim et al., 2009). Most other reported associations between genotype and paclitaxel/carboplatin-induced toxicity have been observed in genetic variants of metabolizing enzymes. Recently, Leskelä et al. investigated the influence of 13 polymorphisms on neuropathy and showed a correlation between CYP2C8*3, CYP2C8 Hap C (rs1113129) and CYP3A5*3 and paclitaxel dose-dependent induction of neuropathy (Leskela et al., 2011). Interestingly, in the material used in this study we also saw a correlation between CYP2C8*3 and neuropathy, as previously reported (Green et al., 2009). CYP2C8*3 has also previously been shown to influence paclitaxel clearance in vivo (Bergmann et al., 2011; Green et al., 2009). CYP2C8 Hap C has also been associated with altered activity of CYP2C8 (Rodriguez-Antona et al., 2008). The I105V polymorphism in GSTP1, which detoxifies carboplatin, has been associated with hematological toxicity, with wild type patients more susceptible than others (Kim et al., 2009). In contrast, a large study on taxane and carboplatin chemotherapy found no association between toxicity and polymorphisms in transporters and relevant metabolizing enzymes (Marsh et al., 2007).

By bringing an alternative approach to linking genetic variation with drug toxicity, we have identified candidate biomarkers that might otherwise have been overlooked by existing technologies. In this study we concentrated on non-synonymous SNPs in the coding regions of ten selected genes, since these genetic variants might impact the activity of the associated proteins. SNPs in two of these genes showed a significant or borderline significant association with toxicity (Table 4).
However, since our meta-analysis is based on expression profiling, it would be interesting to also investigate the gene-regulatory regions of these genes. Since most of these genes are relatively novel with respect to toxicity, information on which parts of each gene are involved in its regulation in these tissues is limited, so we did not attempt such an analysis. The SNPs we found associated with toxicity were present in the top 20 ranked genes, but not in the five top-ranked genes identified in our search (Table 2). However, the five top-ranked genes contained relatively few genetic variants in the coding region (Supplemental Fig. S4), and their activity might be affected by non-coding SNPs. This opens the way for an investigation of the introns and regulatory regions of these genes using sequence capture or
next-generation sequencing techniques. Even better predictive markers for toxicity might be found in the regulatory regions of these genes, especially since their expression in toxicity-associated tissue is affected by paclitaxel and carboplatin exposure.

5. Conclusions

In conclusion, the approach described in this paper represents a novel methodology for using gene expression data to inform the selection of candidate toxicity genes and to subsequently identify genetic variants linked to patient responses to chemotherapy. The large quantities of data accumulating in public databases present a new and exciting opportunity for the research community to use in silico research approaches and to design better and more cost-effective experiments.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2012.06.053.

Competing interests

Some of the authors (Ilya Kupershmidt, Qiaojuan Jane Su) are employed by a commercial company, NextBio. There are also a number of patents filed with respect to the technology and algorithms described in the article. NextBio also provides a commercial software platform in both free and paid versions. These competing interests do not alter the authors' adherence to all the journal policies on sharing data and materials.

Acknowledgments

The authors thank Kicki Holmberg for technical help and assistance during the genotyping analysis. We also thank Bahram Amini for his help with the Sanger sequencing. The authors also thank Professor Dimcho Bachvarov, Département de médecine, Université Laval for sharing his microarray data. This work was financially supported by grants from the European Commission (CHEMORES LSHC-CT-2007-037665), the Swedish Cancer Society, Swedish Research Council, Swedish Cancer Foundation and the County Council in Östergötland.

References

Comprehensive genomic characterization defines human glioblastoma genes and core pathways, Nature 455 (2008) 1061–1068.
Bergmann, T.K., et al., 2011. Impact of CYP2C8*3 on paclitaxel clearance: a population pharmacokinetic and pharmacogenomic study in 93 patients with ovarian cancer. Pharmacogenomics J. 11, 113–120.
Brazma, A., et al., 2003. ArrayExpress — a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68–71.
Brem, R.B., Yvert, G., Clinton, R., Kruglyak, L., 2002. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755.
Chan, S.K., Griffith, O.L., Tai, I.T., Jones, S.J., 2008. Meta-analysis of colorectal cancer gene expression profiling studies identifies consistently reported candidate biomarkers. Cancer Epidemiol. Biomarkers Prev. 17, 543–552.
Chen, R., et al., 2008. FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease. Genome Biol. 9, R170.
Eaves, I.A., et al., 2002. Combining mouse congenic strains and microarray gene expression analyses to study a complex trait: the NOD model of type 1 diabetes. Genome Res. 12, 232–243.
Edgar, R., Domrachev, M., Lash, A.E., 2002. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
Gardiner-Garden, M., Littlejohn, T.G., 2001. A comparison of microarray databases. Brief Bioinform. 2, 143–158.
Green, H., Soderkvist, P., Rosenberg, P., Horvath, G., Peterson, C., 2006. mdr-1 single nucleotide polymorphisms in ovarian cancer tissue: G2677T/A correlates with response to paclitaxel chemotherapy. Clin. Cancer Res. 12, 854–859.
Green, H., Soderkvist, P., Rosenberg, P., Horvath, G., Peterson, C., 2008. ABCB1 G1199A polymorphism and ovarian cancer response to paclitaxel. J. Pharm. Sci. 97, 2045–2048.
Green, H., et al., 2009. Pharmacogenetic studies of Paclitaxel in the treatment of ovarian cancer. Basic Clin. Pharmacol. Toxicol. 104, 130–137.
Jiang, H., et al., 2004. Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinforma. 5, 81.
Kaller, M., et al., 2006. Comparison of PrASE and pyrosequencing for SNP genotyping. BMC Genomics 7, 291.


Karp, C.L., et al., 2000. Identification of complement factor 5 as a susceptibility locus for experimental allergic asthma. Nat. Immunol. 1, 221–226.
Kim, H.S., et al., 2009. Genetic polymorphisms affecting clinical outcomes in epithelial ovarian cancer patients treated with taxanes and platinum compounds: a Korean population-based study. Gynecol. Oncol. 113, 264–269.
Kupershmidt, I., et al., 2010. Ontology-based meta-analysis of global collections of high-throughput public data. PLoS One 5.
Leskela, S., et al., 2011. Polymorphisms in cytochromes P450 2C8 and 3A5 are associated with paclitaxel neurotoxicity. Pharmacogenomics J. 11, 121–129.
Marsh, S., Paul, J., King, C.R., Gifford, G., McLeod, H.L., Brown, R., 2007. Pharmacogenetic assessment of toxicity and outcome after platinum plus taxane chemotherapy in ovarian cancer: the Scottish Randomised Trial in Ovarian Cancer. J. Clin. Oncol. 25, 4528–4535.
McGuire, W.P., et al., 1996. Cyclophosphamide and cisplatin compared with paclitaxel and cisplatin in patients with stage III and stage IV ovarian cancer. N. Engl. J. Med. 334, 1–6.
Okazaki, T., Jiao, L., Chang, P., Evans, D.B., Abbruzzese, J.L., Li, D., 2008. Single-nucleotide polymorphisms of DNA damage response genes are associated with overall survival in patients with pancreatic cancer. Clin. Cancer Res. 14, 2042–2048.
Rhodes, D.R., Barrette, T.R., Rubin, M.A., Ghosh, D., Chinnaiyan, A.M., 2002. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 62, 4427–4433.
Rodriguez-Antona, C., et al., 2008. Characterization of novel CYP2C8 haplotypes and their contribution to paclitaxel and repaglinide metabolism. Pharmacogenomics J. 8, 268–277.
Schadt, E.E., et al., 2003. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302.
Scoles, D.R., et al., 2010. Liver X receptor agonist inhibits proliferation of ovarian carcinoma cells stimulated by oxidized low density lipoprotein. Gynecol. Oncol. 116, 109–116.
Sherlock, G., et al., 2001. The Stanford Microarray Database. Nucleic Acids Res. 29, 152–155.
Sissung, T.M., et al., 2006. Association of ABCB1 genotypes with paclitaxel-mediated peripheral neuropathy and neutropenia. Eur. J. Cancer 42 (17), 2893–2896.
Sparreboom, A., et al., 1997. Limited oral bioavailability and active epithelial excretion of paclitaxel (Taxol) caused by P-glycoprotein in the intestine. Proc. Natl. Acad. Sci. U. S. A. 94, 2031–2035.
Walle, T., Walle, U.K., Kumar, G.N., Bhalla, K.N., 1995. Taxol metabolism and disposition in cancer patients. Drug Metab. Dispos. 23, 506–512.
Yvan-Charvet, L., Wang, N., Tall, A.R., 2010. Role of HDL, ABCA1, and ABCG1 transporters in cholesterol efflux and immune responses. Arterioscler. Thromb. Vasc. Biol. 30, 139–143.