UPDATE
BIOSILICO Vol. 1, No. 4 September 2003
CONFERENCE these tests. Very few websites comprehensively list where to send blood for these tests (http://www.genetests.org). Biomedical informaticians have to fill this role of connecting patients with physicians and discovery researchers, as well as with diagnostic providers.
How does your recent work in the dynamics of gene expression fit into the dominant research trends or agendas you would like to move forward?
In studying at the genome scale, we often forget that these microarray measurements are only a snapshot in time. Consider the cancer that was taken out during surgery: the microarray measurement might be different if cancer were kicking in 12 hours later. With circadian rhythms, expression could change depending on the time of day. We might analyze microarray measurements while forgetting that biology takes time. Many analytical techniques assume simultaneity. If we see two genes that fit the same pattern – every time A goes up, B goes up – we tend to draw conclusions. But a biological connection between two genes might take a few minutes, enough to throw off a strict correlation. Only by measuring across time can we start to see connections; this is why the study of measurements changing across time is going to have an important role in interpreting genome-scale measurements.
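The point about delayed connections can be illustrated with a small sketch. It assumes two hypothetical expression series sampled at a fixed interval, where gene B repeats gene A's pattern two samples later; the data, the function names and the choice of Pearson correlation are illustrative assumptions, not the method used in the work discussed above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def lagged_correlation(a, b, lag):
    """Correlate a[t] with b[t + lag] for a non-negative lag, in samples."""
    if lag == 0:
        return pearson(a, b)
    return pearson(a[:-lag], b[lag:])

def best_lag(a, b, max_lag):
    """Return (lag, r) with the highest correlation over lags 0..max_lag."""
    scores = [(lag, lagged_correlation(a, b, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda item: item[1])

# Hypothetical data: 48 samples at a fixed interval, with a 12-sample cycle.
# Gene B follows gene A with a delay of two samples.
gene_a = [math.sin(2 * math.pi * t / 12) for t in range(48)]
gene_b = [math.sin(2 * math.pi * (t - 2) / 12) for t in range(48)]

r_zero = lagged_correlation(gene_a, gene_b, 0)   # simultaneous comparison
lag, r_best = best_lag(gene_a, gene_b, 6)        # comparison allowing a delay
print(f"r at lag 0: {r_zero:.2f}; best lag: {lag} samples (r = {r_best:.2f})")
```

In this toy case the simultaneous correlation is only moderate, whereas shifting B back by two samples recovers the relationship completely. Scanning lags in units of samples is only meaningful when the samples are evenly spaced in time.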
You have gone to some lengths to distinguish gene dynamics from temporal or static approaches to clustering and other computational approaches, and have also written how dynamic relationships to gene expression levels could imply a functional relationship. Which of these have yet to be investigated?
As an endocrinologist, I see that a lot of cancer biologists study the cell cycle, where cells grow and replicate. Many cancer biologists feel that this is a fundamental cycle in their field. But I know there are many other frequencies and cycles in the body. If I have a newborn patient in front of me and infuse sugar into her veins, the insulin expression level will go up in seconds. Twelve years later, that baby starts undergoing puberty, and something controls gene expression over that decade. In endocrinology, we have daily cycles and monthly reproductive cycles. We should not assume that there is one frequency with which to study these measurements. We need more and better methods for studying temporal processes. Only through studying temporality can we get to causality, and only when we have causality can we really think about etiology.
Using the dynamics approach to gene expression you found an interesting complementarity between statics and dynamics approaches. Why can’t we study everything using dynamics?
A real problem is that there are very few data measurements made across time. We should make them with a fixed time interval, for example, every 20 minutes or every hour. Unfortunately, many time series measurements are made using exponential time intervals, for example, at time 1, time 2, time 12 and time 24, so there is no fixed time interval. We typically give very few time points to later events, yet if I see that something occurs at 24 hours, something must have happened at 23 hours to cause that, and something else happened at 22 hours to cause that. I suggest that researchers wishing to move into dynamics make measurements using fixed time intervals.
What professional achievements are you most proud of and what would you like to have achieved before the end of your career?
I am proud to have reached my present point early enough in my life and career and early enough in a nascent field like genomics and bioinformatics. I would love to have an impact on diabetes, especially as it relates to paediatrics.
Atul Butte
Assistant in Endocrinology and Informatics and Attending Physician
Children’s Hospital Boston, USA
Bioinformatics: a glimpse of the future
Mark Ragan, e-mail: [email protected]
The 11th International Conference on Intelligent Systems for Molecular Biology (ISMB 2003) was held 29 June–3 July 2003 in Brisbane, Australia. ISMB is the major annual international conference in bioinformatics. This year’s conference was divided into seven main themes: (1) phylogeny and genome rearrangements; (2) expression arrays and networks; (3) predicting clinical outcomes; (4) protein clustering, alignment and patterns; (5) transcription motifs and modules; (6) structure and hidden Markov models; and (7) text mining and high-throughput methods. References to key problems in bioinformatics cropped up under multiple themes, ranging from motif detection and pattern classification at the computer-science end of the field to detecting remote homologs and understanding gene transcription at the more biological end.
Systems bioinformatics
This year at ISMB, there was a reduced emphasis on single stand-alone problems (e.g. motif finding, microarray analysis) and an increased focus on systems. In the
1478-5282/03/$ – see front matter ©2003 Elsevier Science Ltd. All rights reserved. PII: S1478-5282(03)02361-4
biological context, this meant more studies of metabolic and signaling pathways, protein–protein interaction networks and complex genetic regulation. For those more interested in informatics, there were challenging papers on chain functions, networked probabilistic models and Bayesian belief networks. In his keynote address, Ron Shamir (Tel Aviv University; http://www.math.tau.ac.il/~rshamir/) critically discussed module-based strategies for recognizing, assembling and annotating genetic networks. In addition, there was a
www.drugdiscoverytoday.com
growing sophistication of higher level analyses in bioinformatics. Complex biological problems require the mobilization of multiple analytical methods, whether sequentially or – as illustrated in several papers – in matrix-like schemes. This in turn necessitates careful attention to standards, integration, data management, workflow and overall optimization. The often-daunting issues of data management and computation merited only occasional mention and were discussed in special interest groups. Some of the more powerful analyses involved the coordinated use of different data types, for example, genomic sequence and expression data, germplasm and SNP data, or phylogenetic trees and graphs of metabolic pathways. Integration in such contexts requires, but extends beyond, common standards, semantics or ontology. The best paper and best student paper awards went to two studies, both presented by Eran Segal (Stanford University; www-cs.stanford.edu/~eran), that elegantly used the expectation maximization (EM) algorithm to combine gene expression data with protein–protein interactions and promoter-sequence data, respectively.
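For readers unfamiliar with expectation maximization, the following is a minimal textbook sketch of the alternating E-step and M-step, fitted to a one-dimensional mixture of two Gaussians. It is not the model used in the awarded papers, which integrate several data types; the data values, function names and starting guesses are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mean_a, mean_b, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM.

    mean_a and mean_b are starting guesses; variances start at 1.0 and the
    mixing weight at 0.5. Returns (mean_a, mean_b, weight_a).
    """
    var_a = var_b = 1.0
    weight_a = 0.5
    n = len(data)
    for _ in range(iterations):
        # E-step: probability that each point belongs to component A.
        resp = []
        for x in data:
            p_a = weight_a * normal_pdf(x, mean_a, var_a)
            p_b = (1 - weight_a) * normal_pdf(x, mean_b, var_b)
            resp.append(p_a / (p_a + p_b))
        # M-step: re-estimate parameters from the weighted points.
        n_a = sum(resp)
        n_b = n - n_a
        mean_a = sum(r * x for r, x in zip(resp, data)) / n_a
        mean_b = sum((1 - r) * x for r, x in zip(resp, data)) / n_b
        # A small floor keeps the variances from collapsing to zero.
        var_a = max(sum(r * (x - mean_a) ** 2 for r, x in zip(resp, data)) / n_a, 1e-6)
        var_b = max(sum((1 - r) * (x - mean_b) ** 2 for r, x in zip(resp, data)) / n_b, 1e-6)
        weight_a = n_a / n
    return mean_a, mean_b, weight_a

# Hypothetical log-expression values drawn from a "low" and a "high" state.
values = [0.8, 1.0, 1.1, 0.9, 1.2, 1.0, 4.9, 5.1, 5.0, 5.2, 4.8, 5.0]
low_mean, high_mean, low_weight = em_two_gaussians(values, min(values), max(values))
print(f"low state ~ {low_mean:.2f}, high state ~ {high_mean:.2f}, weight {low_weight:.2f}")
```

The same alternation, estimating hidden assignments and then re-fitting parameters, is what makes EM attractive when some variables in a model, such as module membership, cannot be observed directly.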
New directions prefigured
Genes and gene products play the lead roles in the so-called central dogma of molecular biology, so it is hardly surprising that much of classical bioinformatics has been developed to explain the structure, function and behavior of genes and proteins. These include the numerous methods for localizing exons and introns, locating upstream signals for the binding of ribosomes and transcription factors, understanding how proteins fold, annotating genes, and assembling enzymes into signaling, regulatory or biosynthetic pathways. As at previous ISMB conferences, many of the papers this year dealt with these undeniably important issues. Four of the keynote lectures, however, fired a series of loud warning shots across the bow of bioinformatics-as-usual. David Haussler (University of California, Santa Cruz; http://www.cse.ucsc.edu/~haussler/) reported on the use of comparative mammalian genomics to
identify functional elements in the human genome. Regions that show evidence of having evolved under purifying selection, and in which synteny and finer-scale order have been conserved between human and mouse, do not necessarily map neatly to exons. Alternative splicing of complex genes can likewise be conserved across mammalian species. Haussler argued that understanding of function requires reconstruction of the history of each base in the human genome – not those of exons alone.
John Mattick (The University of Queensland; http://www.imb.uq.edu.au/mattick.html) has proposed so-called noncoding RNAs as mediators of the development of complex organisms. Most of the human genome, and of genomes of other morphologically complex organisms, is transcribed in a tissue-specific manner. Evidence is accumulating that RNAs originating from transcripts of intergenic regions, exons, pseudogenes and other supposed genomic ‘junk’ can regulate gene expression. Thus, one gene or a genomic region can yield multiple outputs that could be modified to yield subtle – or perhaps not-so-subtle – temporal and spatial phenotypic variants. Mattick contended that this hidden layer of RNA-based developmental control has been the biggest ‘missed story’ of the molecular age in biology.
Yoshihide Hayashizaki (RIKEN Genomic Sciences Center; http://www.gsc.riken.go.jp/e/group/themegenomee.html) reviewed the technical tour-de-force that has made the mouse transcriptome the best-known transcriptome for any complex organism. Among the 37 086 transcriptional units currently recognized, 16 599 (44.8%) do not code for protein, and 2057 (5.5%) occur as sense–antisense pairs. Among regions represented by multiple sequences, 41% are alternatively spliced and 79% of these alter the amino acid sequence of the protein.
Many others contain 3′-end variants that probably modulate the lifetime of the transcript. The terms gene and locus are now inadequate to capture such complexity; transcriptional unit, covering both protein-coding and non-coding messages, is more useful both conceptually and computationally. Hayashizaki described the multi-tiered databases that will be needed to represent genome networks and he showed us the unique copy of a book printed with not only descriptions and annotations of mouse complementary DNAs (cDNAs), but also reconstitutable bacterial clones containing these cDNAs.
Sydney Brenner (Salk Institute; http://www.molsci.org) provided a memorable take-home message to the conference by reminding us that genes do not evolve independently from one another. In genomes of complex eukaryotes, genes exist in isochores (large regions of a genome that have a characteristic G+C content) and exhibit genetic linkage. These isochores could be maintained by differential rates of change in degenerate codon positions – a mutational dynamism that occurs to different extents in the germlines of humans, rodents or frogs. Individual genes within genomes (and by extension, different genomic regions such as isochores) could thus take on different characteristics over time.
The Overton Prize of the International Society for Computational Biology (ISCB), recognizing outstanding accomplishment by a scientist, was awarded to W. James Kent (UC Santa Cruz; http://www.cse.ucsc.edu/~kent/) for his contribution to bioinformatic tools for genomic research. The ISCB Senior Scientist Accomplishment Award, introduced at ISMB 2003, was given to David Sankoff (University of Ottawa; http://www.dms.umontreal.ca/professeurs/sankoff/) for his foundational work in many areas of computational biology including algorithmics, complexity, molecular structure and evolution, and genomic rearrangement.
For the novice bioinformatician or interested biologist, computer scientist or mathematician, ISMB 2003 presented a superb opportunity to get a sense of the breadth and excitement of computational biology and bioinformatics. For the specialist, there was outstanding rigor and depth.
And all of us were left with plenty to think about – better ways to do what we do, and the challenge to do what we currently cannot.