Finding the ‘lost’ genes




News & Comment

TRENDS in Biotechnology Vol.20 No.4 April 2002

Journal Club

The full sequence of the yeast genome was reported in 1997. At that time, the genome was predicted to encode 6274 genes. Five years later, we are still counting. The earliest predictions were obtained using basic gene-finding algorithms and annotated sequences; open reading frames (ORFs) shorter than 100 codons, and ORFs nested within other ORFs, were not included.

In a recent publication, Kumar et al. [1] demonstrated an approach that can rapidly identify some of the ‘lost’ genes in yeast by combining gene trapping, microarray-based expression analysis and genome-wide homology searching. First, expressed sequences from the yeast genome were identified using a minitransposon carrying a lacZ gene that lacks its promoter and start codon. The result was a library of yeast strains, which was tested for β-gal activity; such activity indicates that the minitransposon lacZ has inserted within a transcribed and translated gene. The fusion proteins in the positive strains were identified by sequencing the corresponding plasmids at the transposon–yeast DNA junction. Next, 196 possible non-annotated ORFs were identified from the 15 360 alleles. These candidates were further screened by microarray expression analysis and genomic criteria, resulting in the confirmation of 101 new genes. A further 36 genes were identified by genome-wide homology searching, bringing the total to 137.

In this study, Kumar et al. identified twice as many new non-annotated genes as had been reported in the literature over the previous four years, adding 2% more genes to the yeast genome. Altogether, 3% more genes have been added to the yeast genome over the past four years. It can be estimated that ~7% of the genes in a eukaryotic genome are ‘lost’ genes. For the human genome, this could represent 2000 to 4000 new genes. Applying the technique of Kumar et al. at the scale of the human genome would be a challenge. Fortunately, alternative techniques, such as proteomics, can also be applied to the discovery of these ‘lost’ genes.

1 Kumar, A. et al. (2002) An integrated approach for finding overlooked genes in yeast. Nat. Biotechnol. 20, 58–63
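To illustrate how a length cutoff can hide short genes, here is a minimal sketch of an ORF scan with a 100-codon minimum, as mentioned in the piece above. It is not the annotation pipeline used for the yeast genome: the function name `find_orfs`, the single-strand scan and the two toy sequences are assumptions made for illustration only.

```python
# Minimal single-strand ORF scan over three reading frames (illustrative only).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for ATG...stop ORFs of at least min_codons codons."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons from ATG up to, not including, the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs

# Hypothetical toy sequences: a 50-codon ORF and a 150-codon ORF.
short_gene = "ATG" + "GCT" * 49 + "TAA"
long_gene = "ATG" + "GCT" * 149 + "TAA"

print(find_orfs(short_gene))                 # [] : dropped by the 100-codon cutoff
print(find_orfs(short_gene, min_codons=30))  # [(0, 153, 50)]
print(find_orfs(long_gene))                  # [(0, 453, 150)]
```

With the default cutoff the 50-codon ORF is never reported, even though it has a valid start and stop codon. Note that this sketch does not attempt to exclude ORFs nested in other reading frames, which the early predictions also left out.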

Daniel Figeys
Franco Rossetto

Array for cancer prognostics

Much publicity has surrounded the human genome project and its promise of revolutionizing healthcare but, until recently, little direct benefit has emerged. In the past, many cancer prognoses relied heavily on general parameters encompassed within the international prognostic index (IPI), such as tumour size and age of patient. The IPI is used to determine the level of treatment (e.g. chemo-, radio- and hormonal therapies) appropriate for the patient.

In a recent article, Margaret Shipp and colleagues [1] take cancer prognosis a step further, showing that the activities of genes that determine the biological behaviour of a tumour are more likely to reflect its aggressiveness than such general parameters. Oligonucleotide microarrays were used to determine the expression profile of 6817 genes in tissues of 58 patients with diffuse large B-cell lymphoma (DLBCL), the most common adult lymphoid malignancy, which has a mortality of >50%. Full treatment records and long-term follow-up were available for all 58 DLBCL patients in the study.

When analysed using a ‘supervised learning’ prediction method, the gene expression patterns of the DLBCL patients could be divided into two categories: one with a 5-year survival rate of 70% and another with a greatly reduced 5-year survival rate (12%). In short, some genes (such as E2F) are overexpressed in tissues from patients in remission from DLBCL, whereas others (such as VEGF) are overexpressed in tissues from patients with fatal, or refractory, DLBCL. The highest prediction accuracy was obtained when 13 ‘key’ genes were used in the DLBCL outcome model. Given the success of these 13 key genes in predicting the clinical outcome of DLBCL, many might represent attractive therapeutic targets.

The microarray classifier developed by Shipp et al. offers a refinement of tumour classification; the technology could therefore be used to improve the selection of patients for currently available treatments and should help in predicting the outcome of patients undergoing therapy. In a recent letter to Nature, the same technology was used to distinguish between embryonic tumours of the central nervous system, which had previously been difficult to diagnose [2]. The enormous amount of data generated by the human genome project has, through necessity, brought about collaboration between clinicians, pathologists, molecular biologists and bioinformaticians, and progress in cancer prognosis and treatment is the result.

1 Shipp, M.A. et al. (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised learning. Nat. Med. 8, 68–74
2 Pomeroy, S.L. et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–442
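The idea of ‘supervised learning’ on expression profiles, as described in the piece above, can be sketched with a toy example: profiles with known outcomes are used to build a model, which then assigns an outcome to a new profile. The nearest-centroid rule below is a stand-in chosen for brevity, not the method reported by Shipp et al.; the three-gene profiles, the labels and every number are invented for illustration.

```python
# Toy nearest-centroid outcome classifier; all data below are made up.
from math import dist

def centroid(profiles):
    """Mean expression per gene across a list of equal-length profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def train(profiles, labels):
    """Compute one centroid per outcome label from labelled profiles."""
    by_label = {}
    for profile, label in zip(profiles, labels):
        by_label.setdefault(label, []).append(profile)
    return {label: centroid(group) for label, group in by_label.items()}

def predict(model, profile):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], profile))

# Hypothetical expression values for three unnamed genes per patient.
training_profiles = [
    [2.0, 0.5, 1.0], [1.8, 0.7, 1.2],   # patients in remission
    [0.4, 2.1, 1.1], [0.6, 1.9, 0.9],   # patients with refractory disease
]
training_labels = ["remission", "remission", "refractory", "refractory"]

model = train(training_profiles, training_labels)
print(predict(model, [1.9, 0.6, 1.0]))  # remission
print(predict(model, [0.5, 2.2, 1.0]))  # refractory
```

In a real study the model would be built from thousands of genes and evaluated on patients held out from training; the small set of ‘key’ genes mentioned above corresponds to choosing which features enter such a model.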

Andrew J. Mungall

0167-7799/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved.