Deciphering the genetic landscape of cancer – from genes to pathways


Review

Neal G. Copeland and Nancy A. Jenkins

Genomics and Genetics Division, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, 61 Biopolis Drive, Proteos, Singapore 138673

Advances in genomic technologies have made it possible to screen the entire cancer genome for mutations, leading to a better understanding of the genetic landscape of cancer. Emerging results suggest that the cancer genome is composed of a few commonly mutated genes and many infrequently mutated genes. Although the number of mutated genes in any one tumor is limited, there is much heterogeneity in the genes mutated in two tumors of even the same class because of the large number of infrequently mutated genes. This could explain the wide variation in how tumors respond to chemotherapeutic intervention. Pathway analysis suggests that this large collection of cancer genes functions in a few signaling pathways, providing a simplifying picture of cancer and indicating the possibility of treating cancer using target-based therapeutics directed against the deregulated signaling pathways themselves rather than the individually mutated genes.

The identification of cancer-causing genes

Small molecule inhibitors, such as Imatinib (Gleevec) [1], that interfere with the activated breakpoint cluster region-Abelson (BCR-ABL) tyrosine kinase found in 95% of cases of chronic myelogenous leukemia, have had great clinical success without inducing the deleterious side effects often associated with conventional cancer therapeutics that target rapidly dividing cells. This has spurred a worldwide effort to identify all the genes that cause cancer, with the hope that some will lead to new drug targets. Although this effort is still in its infancy, the results suggest that it will be a daunting task. Recent studies suggest that the number of cancer genes is large and that most are infrequently mutated in cancer. There is also great variability in the genes mutated in different cancers of the same type as well as among different cancer types.
Validating the large number of candidate cancer (CAN) genes identified by high-throughput genomic methods will, therefore, be difficult and require multiple methods, including the use of model organisms. Finally, because there are few frequently mutated cancer genes, it will be challenging to identify new drug targets such as BCR-ABL, the target of Imatinib. Fortunately, early studies suggest that the large number of infrequently mutated cancer genes function in a relatively small number of signaling pathways that cooperate to induce disease. Therefore, it might be possible to target the signaling pathways themselves rather than individual mutant proteins such as BCR-ABL. In this article, we review the recent genome-wide screens that aim to identify cancer-causing genes, discuss why there seem to be so many of them and how such candidate genes can be validated.

Corresponding author: Copeland, N.G. ([email protected]). 0168-9525/$ – see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.tig.2009.08.004

Glossary

Candidate cancer (CAN) genes: genes that are mutated at a higher rate in cancer than predicted by random chance. Mutations in these genes are referred to as driver mutations.

DNA mismatch repair: a system for recognizing and repairing erroneous insertion, deletion and mis-incorporation of bases that can arise during DNA replication and recombination, as well as repairing some forms of DNA damage.

Driver and passenger mutations: many somatic mutations accumulate during the process of tumor development. A small subset of these mutations contributes to tumor development and is referred to as driver mutations, whereas the majority are effectively neutral and are called passenger mutations.

Microsatellite instability: microsatellites are repeated sequences of DNA. In cells with mutations in DNA repair, some of these sequences accumulate errors and become longer or shorter.

Nonsynonymous mutation: DNA mutation that changes the amino acid sequence of a protein.

Pathway analysis: an approach that assigns the genes mutated in cancer to known biological signaling pathways and then determines the pathways that are most commonly mutated.

Single nucleotide polymorphism: DNA sequence variation occurring when a single nucleotide (A, T, C or G) differs between members of the same species.

Synonymous mutation: DNA mutation that does not change the amino acid sequence of a protein.

Transcriptome sequencing: makes use of next-generation massively parallel sequencing to detect, identify and quantitate nearly every class of transcripts in a cell, from short microRNAs to the longer 5’ and 3’ untranslated regions to the longest, full-length mRNA.

Identifying mutations by exon sequencing – the early days

With improvements in DNA sequencing technologies, it became possible in the latter part of the 20th century to contemplate performing systematic genome-wide screens for new cancer genes using exon resequencing. At the time these studies were initiated, sequencing technology had not advanced to the point where it was possible to sequence all coding exons in the human genome. Consequently, early studies prioritized sequencing genes in signaling pathways where at least one gene in the pathway was known to be mutated in cancer, such as the rat sarcoma (RAS) extracellular-signal-regulated kinase (ERK) mitogen-activated protein (MAP) kinase pathway [2]. Coding exons, and their upstream and downstream splice sites, were amplified using PCR from tumor and control DNA from the same patient, and the sequences were compared to find somatic mutations unique to the cancer cell. Among the collection of 15 cancer cell lines and matched lymphoblastoid cell lines screened in these studies, three were identified that had somatic mutations in the v-RAF murine sarcoma viral oncogene homolog B1 (BRAF) serine/threonine kinase domain, which led to follow-up studies in which BRAF coding exons were sequenced in an additional 530 cancer cell lines. BRAF mutations were identified in 66% of malignant melanomas and at a lower frequency in a wide variety of human cancers, including melanomas, colorectal cancers, gliomas, lung cancers, sarcomas, ovarian carcinomas, breast cancers and liver cancers. All mutations were in the kinase domain, elevated kinase activity and were transforming in NIH 3T3 cells. Similar studies performed thereafter identified a high frequency of mutations in the phosphatidylinositol 3-kinase (PI3K), catalytic, alpha-subunit gene (PIK3CA) in colorectal cancer (37%), glioblastomas (27%), gastric cancers (25%) and lung cancers (4%) [3]. The mutations clustered in the helical and kinase domains and increased lipid kinase activity, consistent with a role in tumorigenesis. These early studies validated the use of large-scale exon resequencing for identifying new CAN genes in human cancers.

Mutational analysis of protein tyrosine kinases and phosphatases in colorectal cancer

Further advances in exon sequencing rapidly made it possible to screen even more genes for mutations in a larger number of cancers. In one study making use of these expanded DNA sequencing capabilities, all the coding exons and associated splice sites for the 138 known protein tyrosine kinases (PTKs) and 87 known protein tyrosine phosphatases (PTPs) were sequenced in a collection of colorectal cancer lines [4]. Phosphorylation of tyrosine residues is a central feature of many cellular signaling pathways involved in cancer and is coordinately controlled by PTKs and PTPs. Although a variety of PTKs had been linked to tumorigenesis, only a few PTPs had been implicated in cancer (http://www.sanger.ac.uk/genetics/CGP/Census).
Among the 138 PTK genes sequenced from 35 colorectal cancer cell lines, 14 contained mutations in their kinase domains. These genes were then reanalyzed for mutations in another 147 colorectal cancers. Seven of the 14 genes were mutated in more than one cancer, and the ratio of nonsynonymous to synonymous mutations (see Glossary) was higher than predicted, supporting the notion that these are driver mutations for cancer. These studies suggested that a minimum of 30% of colorectal cancers contain at least one mutation in a PTK gene. PTKs, as demonstrated for BCR-ABL, provide good drug targets for therapeutic intervention, and these mutant PTKs, therefore, represent many new opportunities for drug development. Among the 87 PTP genes sequenced from 18 colorectal cancers [5], six contained mutations and were further analyzed in another 157 colorectal cancers. Altogether, 83 mutations in six PTPs were identified, affecting 26% of colorectal cancers and a smaller fraction of lung, breast and gastric cancers. Fifteen mutations were predicted to produce truncated proteins lacking phosphatase activity, suggesting that these PTPs are tumor suppressor genes. Consistent with this prediction, five mutations in the protein-tyrosine phosphatase receptor type T (PTPRT) gene were examined biochemically and found to reduce phosphatase activity. These two studies greatly expanded on what was known about the role of PTKs and PTPs in colorectal cancer and suggested that both PTKs and PTPs are major players in the disease.

Trends in Genetics Vol.25 No.10

The genome-wide screens for cancer genes

In 2006, Sjoblom et al. [6] published their seminal paper reporting the sequencing of all coding exons and splice sites for the majority of protein-coding genes in 11 breast and 11 colorectal cancers. Genes with somatic mutations in at least one cancer were then reanalyzed for mutations in 24 additional cancers of the same type. Genes mutated in two or more cancers were then further analyzed in additional tumors to better define their mutation frequency and aid subsequent bioinformatic analyses. Colorectal and breast cancers contained a median of 76 and 84 nonsynonymous mutations, respectively. The great majority were single base substitutions: 81% of the mutations were missense, 7% were nonsense and 4% altered splice sites. The remaining 8% were insertions, deletions and duplications. These mutations could have been selected during tumor growth because of their effect on net cell growth (driver mutations), or they could be nonfunctional alterations that arose spontaneously during repeated rounds of cell division in the tumor or its progenitor stem cell (passenger mutations). Statistical methods were developed to estimate the probability that the number of nonsynonymous mutations in any given gene was greater than predicted by the background mutation rate [6]. This analysis suggested that the majority of nonsynonymous mutations were passenger mutations and that the number of driver genes mutated in an individual colorectal or breast cancer averaged only 15 and 14, respectively.
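The idea of asking whether a gene carries more mutations than the background rate predicts can be illustrated with a simple binomial tail probability. The sketch below is only an illustration under stated assumptions and is not the statistical method used in [6]: the gene length, the number of tumors, the background rate and the observed count are all hypothetical values, and the model ignores factors that a real analysis would need to consider, such as sequence context and the two-stage study design.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(n_trials: int, k_observed: int, p: float) -> float:
    """Return P(X >= k_observed) for X ~ Binomial(n_trials, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for i in range(max(k_observed, 0), n_trials + 1):
        # Work in log space so that large binomial coefficients do not overflow.
        log_pmf = (
            lgamma(n_trials + 1) - lgamma(i + 1) - lgamma(n_trials - i + 1)
            + i * log(p) + (n_trials - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


# Hypothetical gene: 1,500 coding bases sequenced in each of 35 tumors.
bases_screened = 1_500 * 35
background_rate = 1.2e-6  # assumed passenger mutations per base per tumor
observed = 3              # assumed number of nonsynonymous mutations seen

tail = binomial_upper_tail(bases_screened, observed, background_rate)
print(f"P(at least {observed} mutations by chance) = {tail:.2e}")
```

A small tail probability suggests that the gene is mutated more often than the assumed background rate would predict, which is the sense in which a gene becomes a candidate driver. The result is only as reliable as the assumed background rate.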
Among the driver genes identified, only a few were mutated in a large proportion of tumors, whereas most were mutated in < 5% of the tumors (Figure 1). Comparison of the driver genes mutated in two tumors of the same type also showed little overlap, which is not surprising given that most driver genes are mutated in only a few percent of the tumors. These differences in the spectrum of driver genes mutated in different tumors of the same type are likely to cause the wide variations in tumor behavior and response to therapy seen in individual tumors. These data support the view that cancer is caused by a large number of driver mutations, each conferring a small fitness advantage, that together drive tumor formation.

Figure 1. A 2D map of the genes mutated in colorectal cancers, in which a few genes (i.e. TP53, APC, PIK3CA and FBXW7) are mutated in a large proportion of tumors, whereas most genes are mutated infrequently. Mutations in two different colorectal cancers are indicated on the lower map. Note that only a few genes are mutated in both colorectal cancers (in blue). Reproduced, with permission, from [12].

Statistical analysis of the genes mutated in breast cancer suggested that many function in PI3K signaling. This pathway was also overrepresented in the genes mutated in colorectal cancer, in addition to other pathways related to cell adhesion, the cytoskeleton and the extracellular matrix. Exactly how many pathways need to be deregulated to induce cancer remains unknown.

Although the fraction of mutations that were single base substitutions was similar in colorectal and breast cancer, the nature of the substitutions was strikingly different. The largest difference occurred at C:G base pairs: 59% of colorectal cancer mutations were C:G to T:A transitions, whereas only 7% were C:G to G:C transversions. By contrast, only 35% of breast cancer mutations were C:G to T:A transitions, whereas 29% were C:G to G:C transversions. A large proportion (44%) of the mutations in colorectal cancers were at 5’-CpG-3’ dinucleotide sites, but only 17% of the mutations in breast cancer occurred at such sites. This leads to an excess of nonsynonymous mutations, resulting in changes of arginine residues in colorectal cancer, but not in breast cancer. By contrast, 31% of mutations in breast cancer occurred at 5’-TpC-3’ sites, whereas only 11% of mutations in colorectal cancer occurred at these dinucleotide sites. This difference is highly significant (p < 0.0001) and consistent with other studies, which suggest that the spectrum of mutations in any given cancer is influenced by previous exposure to environmental and chemical carcinogens (Box 1).

Box 1. Pattern of somatic mutations in human cancer genomes

Insights into the general patterns of somatic mutations in human cancer have been provided by sequencing coding exons and splice sites of 518 protein kinase family genes in 210 different types of human cancers [21–23]. A total of 1007 somatic mutations were detected. Of these, 921 were nonsynonymous and 219 were synonymous mutations. There was substantial variation in the number and pattern of mutations in individual cancers, reflecting the different exposure to carcinogens, DNA repair defects and cellular origins. The highest mutation rate (77 mutations per Mb) was seen in two gliomas that were recurrences after treatment with the anticancer drug temozolomide, a known mutagen. Some melanomas and lung cancers also showed increased mutation rates that might relate to the extent of their past exposure to UV radiation and tobacco smoke, respectively. Another five cancers were identified with defective DNA mismatch repair, leading to microsatellite instability (MIN), an increased number of base substitutions (14–40 per Mb) and small insertions/deletions at polynucleotide tracts (5–12 per Mb). Among the primary cancers studied, lung carcinomas showed the greatest number of somatic mutations (4.21 per Mb), followed by gastric (2.10 per Mb), ovarian (1.85 per Mb), colorectal (1.21 per Mb) and renal (0.74 per Mb) cancers. Cancers with the fewest mutations were testicular cancer (0.12 per Mb), lung carcinoids (0 per Mb) and breast cancer (0.19 per Mb). Cancers with the highest mutation rates mainly originated from high-turnover surface epithelia that are subject to recurrent mutagen exposure. This was not the only factor affecting mutation rates, however, as ovarian cancer had a higher mutation rate than colorectal cancer even though it is thought to arise from a region not normally exposed to carcinogens.

Colorectal and breast cancers were also evaluated with single nucleotide polymorphism (SNP) arrays containing at least 317 000 SNP probes to detect homozygous deletions and amplifications. These data were then integrated with the sequencing data, which made it possible to identify genes and cellular pathways affected by both copy number changes and point mutations. Colorectal and breast tumors had on average seven and 18 copy number alterations, respectively, defined as homozygous deletions or amplifications to at least 12 copies per cell. The average numbers of protein-coding genes affected by amplification or homozygous deletion were nine and 24 per colorectal and breast cancer, respectively. CAN genes that harbored point mutations were sometimes amplified or deleted in other tumors. Among 19 CAN genes amplified in colorectal and breast cancer, only eight were previously implicated in tumorigenesis. A few genes were also altered in both breast and colorectal cancers. For example, the V-ERB-B2 avian erythroblastic leukemia viral oncogene homolog 2 (ERBB2) was amplified in both breast and colorectal cancer, whereas fibroblast growth factor receptor 2 (FGFR2) was mutated in breast cancer and amplified in colorectal cancer. Otherwise, there was little overlap in the genes amplified in breast and colorectal cancer. The same was true for the 19 CAN genes homozygously deleted in breast and colorectal cancer. Three were inactivated in both breast and colorectal cancer. Others were deleted in other types of cancer but not in breast and colorectal cancer. The rest are not known to be affected by homozygous deletions and are candidates for new tumor suppressor genes. Analysis of copy number changes also provided general insights into the functional effects of point mutations. For example, single nucleotide substitutions in genes that were deleted in other tumors are more likely to be inactivating, whereas substitutions in genes that were amplified in other tumors are more likely to be activating.

The genetic landscape of glioblastoma multiforme and pancreatic cancer

Genome-wide studies have also been reported for glioblastoma multiforme and pancreatic cancer [7,8], with fairly similar results. One major difference was in the total number of nonsynonymous mutations in glioblastoma multiforme, which was substantially smaller than the number in pancreatic, breast and colorectal cancer. The most likely reason is the reduced number of cell generations in glial cells before the onset of cancer. Pathway analysis showed that many of the signaling pathways deregulated in glioblastoma multiforme were also deregulated in pancreatic, colorectal and breast tumors, such as those regulating control of cellular growth, apoptosis and cell adhesion. However, a number of pathways were enriched only in glioblastoma multiforme, including channels involved in the transport of sodium, potassium and calcium ions, as well as nervous system-specific cellular pathways such as synaptic transmission, transmission of nerve impulses and axonal guidance. Thus, many signaling pathways seem to be shared among different cancer types, in addition to cancer type-specific pathways. Serial analysis of gene expression was also performed to analyze the transcriptome of glioblastoma multiforme because this technique can identify epigenetic alterations not detectable by sequencing or copy number changes alone. This helped identify previously uncharacterized target genes from amplified and deleted regions. It also aided in the identification of the best CAN gene lying within large amplified and homozygously deleted (HD) regions. Finally, it helped assess the significance of gene sets implicated in pathways enriched for genetic alterations. Interestingly, the integrated genomic analyses of glioblastoma multiforme led to the discovery of recurrent alterations in the active site of isocitrate dehydrogenase 1 (IDH1) in 12% of glioblastoma multiforme patients. IDH1 mutations occurred in a large proportion of young patients and in most patients with secondary glioblastoma multiforme, and were associated with an increase in overall survival.
This illustrates the value of a genome-wide screen because in a more limited screen, analyzing only genes deemed to have the highest probability of being involved in cancer, IDH1 would almost certainly have been excluded.

Analysis of the genes mutated in pancreatic cancer identified 12 partially overlapping pathways that were genetically altered in the great majority of pancreatic cancers (Table 1). One striking finding was that the pathway components that were altered in individual tumors varied widely. For example, 100% of pancreatic cancers had mutations in genes that are predicted to function in the hedgehog-signaling pathway. However, the number of mutated genes in the pathway was large. In addition, with few exceptions, only one gene in a pathway was mutated in any one cancer (Table 2). This again demonstrates the value of a genome-wide screen because only a small fraction of tumors will contain a mutation in any one gene in a pathway, and only through the analysis of all genes in a pathway will the contribution of the pathway to the cancer become evident.

Table 1. Core signaling pathways and processes genetically altered in most pancreatic cancers

Regulatory process or pathway        Number of mutated genes in pathway    Percentage of tumors with genetic alterations in pathway
Apoptosis                            9                                     100%
DNA damage control                   9                                     83%
Regulation of G1/S transition        19                                    100%
Hedgehog signaling                   19                                    100%
Homophilic cell adhesion             30                                    79%
Integrin signaling                   24                                    67%
c-Jun N-terminal kinase signaling    9                                     96%
KRAS                                 5                                     100%
Regulation of invasion               46                                    92%
GTPase-dependent signaling           33                                    79%
TGF-β signaling                      37                                    100%
WNT/Notch signaling                  29                                    100%

Data from [7], reproduced with permission from AAAS.

Assessing structural aberrations involved in cancer

None of the published studies have incorporated approaches designed to detect structural aberrations, such as balanced chromosome translocations, inversions and rearrangements, on a genome-wide scale, although this has now become possible (Box 2). Structural aberrations are an important class of cancer-causing mutations (Box 3) and have been detected in every major tumor type, representing a total of more than 56 400 tumors. They can be classified based on characteristic chromosome abnormalities (see the Mitelman Database of Chromosome Aberrations in Cancer; http://cgap.nci.nih.gov/Chromosomes/Mitelman). More than 500 recurrent balanced translocations have been identified, and many show remarkable specificity regarding tumor subtypes. The majority of translocations have been detected in hematological malignancies, which is not surprising given that these tumors are the easiest to karyotype. Solid tumors, such as epithelial tumors, which cause 80% of all cancer deaths, also contain recurrent balanced translocations, although the data on these tumors are much less complete because of the inherent difficulty in karyotyping them, and because they usually have multiple structural aberrations at the time of diagnosis.

Why are there so many cancer genes?

Analysis of the CAN genes identified so far suggests that signaling pathways rather than individual genes dictate the course of tumorigenesis, and that the number of signaling pathways that need to be deregulated to induce cancer is relatively small. The pathway components that are altered in individual tumors, however, vary widely, suggesting that mutations in many genes in the pathway can have a similar effect on tumor cell growth. This would explain why there are so many infrequently mutated cancer genes. There are a few instances, however, where only one or a few genes in a pathway are commonly mutated. A good example is the adenomatous polyposis of the colon (APC) gene, which functions in the WNT pathway and is mutated in 80% of human colorectal cancer.
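The contrast between gene-level and pathway-level mutation frequencies can be made concrete with a small calculation. The following is a minimal sketch on invented data: the tumor identifiers, gene labels and pathway membership are hypothetical placeholders rather than results from any of the studies discussed, and a real pathway analysis would also need to correct for pathway size and background mutation rates.

```python
def fraction_altered(tumors: dict[str, set[str]], genes: set[str]) -> float:
    """Fraction of tumors carrying a mutation in at least one of `genes`."""
    hits = sum(1 for mutated in tumors.values() if mutated & genes)
    return hits / len(tumors)


# Hypothetical data: each tumor maps to the set of genes mutated in it.
tumors = {
    "tumor_1": {"GENE_A", "GENE_X"},
    "tumor_2": {"GENE_B"},
    "tumor_3": {"GENE_C", "GENE_Y"},
    "tumor_4": {"GENE_D"},
}
# Hypothetical pathway made up of four genes.
pathway = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

per_gene = {gene: fraction_altered(tumors, {gene}) for gene in sorted(pathway)}
pathway_level = fraction_altered(tumors, pathway)

print(per_gene)       # each pathway gene is mutated in 1 of the 4 tumors
print(pathway_level)  # the pathway is hit in all 4 tumors
```

In this toy case every gene looks infrequently mutated when considered alone, yet the pathway is altered in every tumor, which is the pattern described for pancreatic cancer.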
In other cancers such as breast cancer, APC is not frequently mutated even though WNT pathway activation seems critical in this disease. WNT pathway activation in breast cancer is instead caused by the stabilization of β-catenin, a downstream component in the WNT pathway. Why this occurs is unclear but it might be because of the other functions ascribed to APC, which plays a critical role in several cellular processes such as cell division, how a cell attaches to other cells within a tissue and whether or not a cell moves within or away from a tissue. Not all these functions are associated with WNT activation. The loss of APC in colorectal cancer might, therefore, be highly selected because this activates WNT signaling and at the same time affects several other cellular processes important for a colon cancer stem cell to become a colorectal cancer.

Table 2. Mutations of the TP53, PI3K and RB1 pathways in GBM tumors

Columns: tumor number (1 to 22); TP53 pathway (TP53, MDM2, MDM4); PI3K pathway (PTEN, PIK3CA, PIK3R1, IRS1); RB1 pathway (RB1, CDK4, CDKN2A). [Per-tumor entries are not recoverable from the source text.]

Abbreviations: Mut, mutated; Amp, amplified; Del, deleted. Data from [8], reproduced with permission from AAAS.

Box 2. Methods for detecting structural aberrations in cancer

A number of methods for detecting genomic rearrangements have been reported that are amenable to high-throughput analysis of cancer genomes. One method uses high-throughput sequencing to generate millions of sequence reads from both ends of short DNA fragments derived from tumor cells. By investigating read pairs from two individuals with lung cancer that did not align correctly with respect to each other on the reference genome, Campbell et al. [24] identified 306 germline structural variants and 103 somatic rearrangements to the base pair level of resolution. Another method uses transcriptome sequencing to detect gene fusions in cancer cells. High-throughput transcriptome sequencing of a single breast cancer cell line, HCC1954, identified genomic rearrangements leading to fusions or truncations of genes known to be involved in cancer, in addition to a number of novel genes [25]. Transcriptome sequencing also successfully rediscovered the BCR-ABL gene fusion in a CML cell line and the TMPRSS2-ERG gene fusion in a prostate cancer cell line [26]. These studies open up an important class of cancer-related mutations for comprehensive characterization.

Validating CAN genes

There has been some discussion over whether the statistical methods used by Sjoblom et al. [6] to identify driver genes in breast and colon cancer were appropriate. Forrest et al. argued that P values rather than point probabilities should have been used [9]. Others agreed and further suggested that the background mutation rates used in the analysis were too low [10,11]. Their reanalysis of the data using P values and higher background rate assumptions identified few if any driver genes. Sjoblom et al. acknowledged this problem, which led them to reanalyze their data using a Bayesian approach [12]. The key feature of the Bayesian approach is the in silico generation of a study identical to the one originally performed, except that all mutations are now assumed to occur at the passenger rate (i.e. there are no driver genes). They argued that P values are computationally intensive to evaluate because the data were collected in a two-stage approach, where only genes that harbored at least one mutation in the discovery screen were reanalyzed for mutations in the validation screen. Although their reanalysis of the data yielded similar results to those originally published [12], the authors concede that sequencing data at best can only point to CAN genes worthy of further study, and that independent validation is required.

It is generally felt, however, that the biological methods used in genome-wide screens are sound, but that much larger samples are needed to obtain sufficient power. Towards this end, the National Cancer Institute and the National Human Genome Research Institute have launched The Cancer Genome Atlas Pilot Project (http://cancergenome.nih.gov), which aims to assess the feasibility of a full-scale effort to systematically explore the entire spectrum of genomic changes in a much larger set of tumors for each of the major cancers (Box 4). Although still in the early days, this project has already yielded interesting results for glioblastoma multiforme and lung cancer, and demonstrated the feasibility of performing large-scale cancer screens in a community-wide setting.
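The in silico "passenger-only" study described above for the reanalysis of the two-stage screen can be mimicked with a small Monte Carlo simulation. The sketch below is an assumption-laden illustration and not the Bayesian procedure of [12]: it treats every gene as having the same hypothetical per-tumor passenger mutation probability, reuses the 11 discovery and 24 validation tumors mentioned earlier only as example sizes, and flags a gene when it reaches an arbitrary total of two mutations.

```python
import random


def simulate_null_study(n_genes, p_mut, n_discovery, n_validation, min_total, rng):
    """Count genes flagged in one simulated study that contains no driver genes."""
    flagged = 0
    for _ in range(n_genes):
        discovery_hits = sum(rng.random() < p_mut for _ in range(n_discovery))
        if discovery_hits == 0:
            continue  # the gene never enters the validation screen
        validation_hits = sum(rng.random() < p_mut for _ in range(n_validation))
        if discovery_hits + validation_hits >= min_total:
            flagged += 1
    return flagged


def expected_false_positives(n_studies, seed=0, **study):
    """Average number of flagged genes over repeated passenger-only studies."""
    rng = random.Random(seed)
    counts = [simulate_null_study(rng=rng, **study) for _ in range(n_studies)]
    return sum(counts) / n_studies


# All parameter values below are hypothetical and chosen only for illustration.
null_expectation = expected_false_positives(
    100,
    n_genes=2000,
    p_mut=0.002,       # assumed chance that a gene is mutated in one tumor
    n_discovery=11,
    n_validation=24,
    min_total=2,
)
print(f"Genes flagged per passenger-only study: {null_expectation:.2f}")
```

Comparing the number of genes flagged in the real data with this null expectation gives a rough sense of how many candidates could be explained by passenger mutations alone. Because the simulation follows the same two-stage design as the real study, it avoids having to derive an analytical P value for that design.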

Box 3. Chromosome aberrations in cancer

Translocations

A chromosome abnormality can be caused by the reciprocal exchange of genetic material between nonhomologous chromosomes. A fusion gene can be created when the translocation joins two separate genes, which is a common event in cancer. Translocations are referred to as balanced, when genetic material is neither lost nor gained, or unbalanced.

Amplifications and deletions

Chromosome duplications can result from an error in homologous recombination, a retrotransposition event or duplication of an entire chromosome. Duplications resulting from errors in homologous recombination arise from unequal crossing over during meiosis or mitosis, and are facilitated by the sharing of repetitive DNA between chromosomes. The product of the recombination is a duplication at the site of exchange and a reciprocal deletion. Deletions can become homozygous following mitotic recombination involving non-sister chromatids, whereas the process of amplification can continue until multicopy amplicons are formed. Duplications and amplifications can be intragenic or span large amounts of DNA.

Inversions

An inversion is a chromosome rearrangement in which a segment of a chromosome is reversed end-to-end. Paracentric inversions are inversions where both breaks occur in one arm of a chromosome and do not involve the centromere, whereas pericentric inversions include the centromere and are caused by chromosome breaks in both chromosome arms. Inversions can be balanced or unbalanced and can create fusion genes of the type produced following chromosome translocation.

High-throughput insertional mutagenesis for cancer gene discovery

Formal proof that a CAN gene is a bona fide cancer gene still requires biological validation. Insertional mutagenesis screens in mice using slow transforming retroviruses have for years provided a powerful method for identifying novel mouse CAN genes and, at the same time, helped validate human CAN genes by demonstrating their potential involvement in mouse cancer [13]. Retroviruses induce cancer by inserting themselves into the mouse genome, where they occasionally deregulate a cellular proto-oncogene or inactivate a tumor suppressor gene. Thus, the insertion sites in the tumors serve to mark the location of candidate mouse cancer genes, which can then be identified through the cloning and sequencing of the insertion sites. Hundreds of candidate mouse cancer genes have been identified by insertional mutagenesis [14]. Tumors often also contain multiple insertions, which serve to mark the location of cooperating cancer genes and oncogenic signaling networks [13]. The limitation of retroviruses is that they primarily induce hematopoietic tumors and mammary cancers, but little else. Therefore, retroviral insertional mutagenesis cannot be used to screen for cancer genes in solid tumors, which are the primary tumor type affecting humans.

In the past few years, it has become possible to mobilize Sleeping Beauty (SB), a member of the Tc1/mariner class of transposons, in mouse somatic cells at frequencies high enough to induce cancer. As with retroviruses, cancer results from SB-induced insertional mutagenesis of oncogenes and tumor suppressor genes (Figure 2), and the SB insertion sites in tumors serve to mark the location of candidate mouse cancer genes. The development of a conditional floxed SB transposase allele that can be activated tissue-specifically by Cre recombinase [15,16] now makes it possible to model solid tumors in mice using SB insertional mutagenesis. When SB transposition was induced early in development in all somatic cells, the mice died from very aggressive hematopoietic tumors within a few months of birth, and analysis of the SB insertion sites identified a number of genes and signaling pathways important for hematopoietic cancer [17]. Ubiquitous mobilization of SB also accelerated tumor formation in mice that carried a knockout mutation in the Cdkn2a tumor suppressor gene [18], thereby permitting the identification of genes that cooperate with Cdkn2a in tumor formation. Interestingly, among the top 20 most commonly mutated genes in hematopoietic tumors, 13 were validated cancer genes and 11 were validated hematopoietic cancer genes.

When SB transposition was limited to the gastrointestinal (GI) tract, GI tumors were induced that were similar to those observed in adenomatous polyposis coli, multiple intestinal neoplasia (ApcMin) mice [16]. Analysis of over 16 000 SB insertion sites from these tumors identified 77 candidate mouse GI tract cancer genes, 60 of which are also mutated and/or deregulated in human colorectal cancer (CRC) and are likely to drive tumorigenesis. These genes include APC, which is mutated in 80% of human colorectal cancer, bone morphogenetic protein receptor, type 1a (BMPR1A), phosphatase and tensin homolog (PTEN), mothers against decapentaplegic, Drosophila, homolog 4 (SMAD4), which is mutated in human polyposis syndrome patients, and F-box and WD40 domain protein 7 (FBXW7), which is deleted in > 10% of human colorectal cancers. Seventeen new CAN genes were also identified that had not previously been implicated in human CRC. One of these was cyclin-dependent kinase 8 (Cdk8), which has recently been confirmed to be a human colorectal cancer oncogene that regulates β-catenin activity [19].
Similar types of results have also been reported for mouse hepatocellular carcinomas induced by SB [15]. These studies show the value of insertional mutagenesis for validating human CAN genes and identifying new genes involved in cancer that were missed in human studies.

Figure 2. A mutagenic SB transposon that can both activate cellular oncogenes and inactivate tumor suppressor genes [17]. The mutagenic SB transposon contains a ubiquitously expressed murine stem cell virus (MSCV) LTR and a splice donor (SD) site. When the transposon integrates upstream, or in an intron, of a cellular proto-oncogene it can deregulate expression of the gene by splicing into a downstream exon. The mutagenic SB transposon also contains splice acceptor sites in both strands (SA, En2-SA) and a bi-directional polyA (pA). Therefore, it can integrate into an intron of a tumor suppressor gene in either orientation and truncate its transcript, and thereby induce a loss-of-function mutation.

A transposon-based platform for validating human CAN genes
Transposons also offer vehicles for rapidly transferring CAN genes into the mouse genome to validate their potential biological role in cancer. In a proof-of-principle experiment, promoter-less cDNAs for the known human proto-oncogenes Kirsten rat sarcoma viral oncogene homolog (KRAS), BRAF and avian myelocytomatosis viral oncogene homolog (MYC), with or without oncogenic mutations, were cloned into SB transposons and mobilized randomly into mouse somatic cells via constitutively expressed SB transposase [20]. The cancer gene cDNAs then had the opportunity to search the entire genome for the optimal regulatory elements, appropriate temporal points and right cellular compartments to exert their oncogenic potential. This has many advantages over conventional transgenic approaches, where only one promoter and one cDNA are tested in an experiment. The mice developed a broad spectrum of tumors at an early age and expressed the relevant oncogenic cDNAs [20], attesting to the value of this method for the validation of human CAN genes.
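The idea behind the insertional mutagenesis screens described in this section, that genes carrying insertions in several independent tumors are candidate cancer genes, can be illustrated with a minimal Python sketch. It is not the analysis used in the cited studies, which apply statistical criteria not described here. The function name `recurrently_hit_genes`, the `min_tumors` threshold and the toy data are all assumptions made for illustration, and the sketch assumes each insertion has already been mapped to a gene.

```python
from collections import defaultdict


def recurrently_hit_genes(insertions, min_tumors=2):
    """Return (gene, tumor_count) pairs for genes hit in at least min_tumors tumors.

    insertions: iterable of (tumor_id, gene) pairs, one per mapped insertion.
    Several insertions in the same gene within one tumor count only once.
    """
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in insertions:
        tumors_by_gene[gene].add(tumor_id)

    hits = [(gene, len(tumors)) for gene, tumors in tumors_by_gene.items()
            if len(tumors) >= min_tumors]
    # Most frequently hit genes first; ties broken alphabetically.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Invented toy data (hypothetical), not taken from any published screen.
toy_insertions = [
    ("tumor1", "Apc"), ("tumor1", "Apc"), ("tumor1", "Pten"),
    ("tumor2", "Apc"), ("tumor2", "Smad4"),
    ("tumor3", "Apc"), ("tumor3", "Pten"),
]
print(recurrently_hit_genes(toy_insertions))
# → [('Apc', 3), ('Pten', 2)]
```

Counting tumors rather than raw insertions keeps a single tumor with many insertions in one gene from dominating the ranking. A real screen would also need to correct for gene size and local insertion bias.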

Box 4. The Cancer Genome Atlas Pilot Project

The Cancer Genome Atlas Pilot Project aims to systematically identify the entire spectrum of genomic changes in 500 tumors and matched controls for the most common human cancers. This will be a massive undertaking given the large number of tumor samples to be analyzed. A pilot project is focusing on three cancers: glioblastoma multiforme, lung squamous cell carcinoma and ovarian serous cystadenocarcinoma. If this pilot is successful, it will be expanded to include 50 different types of human cancers. All data will be placed in the public domain and made available to the international research community. Preliminary results published for glioblastoma multiforme [27] and lung cancer [28] are encouraging. They revealed a link between methylguanine-DNA methyltransferase (MGMT) promoter methylation and a hypermutator phenotype arising from mismatch repair deficiency in treated glioblastoma multiforme. They also suggested a link between smoking status and DNA repair defects in lung cancer, and identified several signaling pathways operative in lung cancer, including the MAPK, p53, WNT, cell cycle and mTOR pathways.

Concluding remarks
The ability to study cancer on a genome-wide scale has shown that cancer is much more complex than previously thought (Box 5) and that, to really understand cancer, hundreds of tumors and matched controls for each of the major cancers that affect us will need to be analyzed. This is a massive project that eclipses even the Human Genome Project and will require both national and international cooperation. It will also be expensive, even with anticipated improvements in sequencing technologies, and there are those in the scientific community who feel that this money could be better spent elsewhere. The amount of data generated will also be massive and will require the development of new methods for handling and interpreting it. Validating the hundreds of CAN genes that surely will be identified will also be challenging and will require the development of new methods that are not obvious now. Even when all the genes are validated and their roles in cancer known, how will we decide which genes are the best drug targets? Better yet, who will pay for all the new drug development, and who will fund the clinical trials? These are but some of the challenges facing us, but given the burden on society that cancer represents, how could we do anything but accept the challenges that lie ahead?

Box 5. Major unanswered questions

1. How many cancer genes are there; what are they; and how, when and where do they function during cancer development?
2. What are the core cancer signaling pathways?
3. How much variation is there among the genes and signaling pathways modified in cancers of the same class/subclass and different forms of cancer?
4. What is the role of epigenetic modifications in cancer?
5. How many genes and signaling pathways are mutated or epigenetically modified in an individual cancer?
6. What are the genetic determinants and environmental risks for cancer?
7. Can cancer be prevented before it occurs, or at a very early stage?
8. Once cancer develops can effective therapeutics be developed that are directed at the core cancer signaling pathways themselves rather than individually mutated genes?
9. Which genes are the best drug targets?
10. How much interconnectivity is there between cancer signaling pathways and how will perturbations in one pathway affect the others?
11. Will customized cancer therapies prove more effective than noncustomized ones?

Acknowledgements
Support was provided by the Biomedical Research Council (BMRC), Agency for Science, Technology and Research (A*STAR), Singapore.

References
1 Schiffer, C.A. (2007) BCR-ABL tyrosine kinase inhibitors for chronic myelogenous leukemia. N. Engl. J. Med. 357, 258–265
2 Davies, H. et al. (2002) Mutations of the BRAF gene in human cancer. Nature 417, 949–954
3 Samuels, Y. et al. (2004) High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554
4 Bardelli, A. et al. (2003) Mutational analysis of the tyrosine kinome in colorectal cancers. Science 300, 949
5 Wang, Z. et al. (2004) Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science 304, 1164–1166
6 Sjöblom, T. et al. (2006) The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274
7 Jones, S. et al. (2008) Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806
8 Parsons, D.W. et al. (2008) An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812
9 Forrest, W.F. and Cavet, G. (2007) Comment on “The consensus coding sequences of human breast and colorectal cancers”. Science 317, 1500
10 Getz, G. et al. (2007) Comment on “The consensus coding sequences of human breast and colorectal cancers”. Science 317, 1500
11 Rubin, A.F. and Green, P. (2007) Comment on “The consensus coding sequences of human breast and colorectal cancers”. Science 317, 1500
12 Wood, L.D. et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113
13 Kool, J. and Berns, A. (2009) High-throughput insertional mutagenesis screens in mice to identify oncogenic networks. Nat. Rev. Cancer 9, 389–399
14 Akagi, K. et al. (2004) RTCGD: retroviral tagged cancer gene database. Nucleic Acids Res. 32, D523–527
15 Keng, V.W. et al. (2009) A conditional transposon-based insertional mutagenesis screen for genes associated with mouse hepatocellular carcinoma. Nat. Biotechnol. 27, 264–274
16 Starr, T.K. et al. (2009) A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 323, 1747–1750
17 Dupuy, A.J. et al. (2005) Mammalian mutagenesis using a highly mobile somatic Sleeping Beauty transposon system. Nature 436, 221–226
18 Collier, L.S. et al. (2005) Cancer gene discovery in solid tumors using transposon-based somatic mutagenesis in the mouse. Nature 436, 272–276
19 Firestein, R. et al. (2008) CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature 455, 547–551
20 Su, Q. et al. (2008) A DNA transposon-based approach to validate oncogenic mutations in the mouse. Proc. Natl. Acad. Sci. U. S. A. 105, 19904–19909
21 Davies, H. et al. (2005) Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595
22 Greenman, C. et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158
23 Stephens, P. et al. (2005) A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat. Genet. 37, 590–592
24 Campbell, P.J. et al. (2008) Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729
25 Zhao, Q. et al. (2009) Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line. Proc. Natl. Acad. Sci. U. S. A. 106, 1886–1891
26 Maher, C.A. et al. (2009) Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101
27 The Cancer Genome Atlas Research Network (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068
28 Weir, B.A. et al. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893–898