Where next?
The hunt for the “dark matter” in our genome
ACCORDING to one genome scan, my DNA makes me 9 per cent more likely than the average European man to develop type 2 diabetes. Should I be concerned? Not really, because in fact I could well be less likely to get diabetes than the typical man with similar ancestry. The reason is not just that my predicted risk is only slightly higher than normal, and that diet and exercise are more important than genes. It is that so far we have only identified a tiny fraction of the genetic variants that affect our risk of developing common diseases. The 21 variants I had scanned by the Icelandic company DeCode Genetics, for instance, collectively explain just a shade over 6 per cent of the heritability of type 2 diabetes. Why is it proving so hard to pin down the genetic variations underlying common disorders, and what does the “dark matter” in the genome that accounts for this “missing heritability” consist of?
Based on studies of diseases caused by a mutation in a single gene, such as sickle cell anaemia, some geneticists predicted that more complex conditions would be caused by a larger number of mutations in a series of genes. Each mutation would have a smaller effect, but collectively they would explain most of the heritability of the disease. Once geneticists knew what a “normal” human genome looked like, the thinking went, they could catalogue the most common variations and look for links with common diseases. That, it was hoped, would lead to ways to treat or prevent these diseases. Many common genetic variants have now been catalogued, most of them single-letter changes in the genetic code known as SNPs. With the help of “DNA chips” packed with genetic probes, individuals can now be scanned for about a million common SNPs. In this way, geneticists have been comparing thousands of people affected by common
diseases with thousands of healthy controls. Although such studies have revealed more than 500 disease-associated DNA variants, most only have a tiny effect. In fact, the common variants found so far explain less than 10 per cent of the heritability of most complex traits. A small part of the reason for this might be that the effect of a variant sometimes depends on the environment in which its owner lives. It can also depend on what other variants an individual has. Conventional studies do not have the power to spot subtleties such as these. Ironically, though, it seems that the main explanation for the missing heritability is that geneticists underestimated the power of evolution. If a mutation has a seriously bad effect, natural selection prevents it becoming common. The exceptions are variants whose ill effects strike late in life, after people have passed on their genes, or variants that have benefits as well as downsides.
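The case-control comparison described above boils down to simple counting, which the following minimal sketch illustrates. It is not any study's actual pipeline: the function name `allele_association` and the allele counts are made up for illustration, and the significance cut-off is an assumption, namely a plain Bonferroni correction of 0.05 for the roughly one million SNPs on a DNA chip.

```python
import math

def allele_association(case_risk, case_other, control_risk, control_other):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    # Odds of carrying the variant among cases, relative to controls.
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Hypothetical counts: 2000 cases and 2000 controls, two alleles per person.
odds_ratio, chi_square, p_value = allele_association(1300, 2700, 1200, 2800)

# Assumed threshold: 0.05 shared across about a million tested SNPs.
genome_wide_threshold = 0.05 / 1_000_000

print(f"odds ratio: {odds_ratio:.2f}")
print(f"chi-square: {chi_square:.2f}, p-value: {p_value:.3g}")
print("passes genome-wide threshold:", p_value < genome_wide_threshold)
```

With these invented numbers the variant is a little more common in cases than in controls, yet the result falls far short of the assumed threshold. That is one way to see why variants with tiny effects need thousands of participants before they stand out from chance.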
Dark matter
Many biologists now think that the “dark matter” responsible for common diseases consists of a host of mutations that are each very rare, but have a powerful effect in individuals unlucky enough to inherit them. That makes the task much harder, because DNA-chip scans cannot reveal rare mutations. Instead, the genomes of many individuals will have to be sequenced. It will not be easy even then. When genomics pioneer Craig Venter unveiled his entire genome in 2007, it contained 4.1 million
Making sense of it all
The finished human DNA sequences in official databases now add up to a whopping 300 billion base pairs – and this is just a fraction of the total. Nearly 4000 organisms have had their genomes sequenced, mostly viruses and bacteria, but the club now includes ever more plants and animals, from rice and wheat to the pufferfish and the platypus. Even death is no longer an obstacle. The genome of the long-extinct mammoth was unveiled two years ago, followed this year by the almost-complete sequence of a man who died 4000 years ago and most recently by the Neanderthal genome.
36 | NewScientist | 19 June 2010
While the amount of information is growing exponentially, our understanding of it is not keeping pace. “The sequencing is going faster than we have people to analyse the data,” says anthropologist John Hawks of the University of Wisconsin-Madison. Researchers are increasingly using software to predict what DNA sequences do – “in silico” biology – and trying to crack the codes that determine, for instance, how RNAs are spliced (see diagram, page 34) or where in the cell RNAs get turned into proteins. In the end, though, there is no substitute for getting your hands dirty: the only way to be sure what a particular
sequence does is to experiment with living cells and organisms. Fortunately, in some ways this is getting easier. Once a gene’s sequence is known, it is now possible to switch off, or silence, that specific gene in living cells. So biologists are now systematically silencing genes in cells in dishes or in animals such as mice and fruit flies to see what happens. In one recent study, almost every one of the 23,500-odd human genes was switched off, one at a time, and 200,000 videos recorded as cells divided under a microscope. The recordings will help reveal which genes are involved in cell division and what they do. MLP
Myth – the human genome has been completely sequenced
It’s not just the sequence, stupid
MAURO FERMARIELLO/spl
All existing sequencing methods work by breaking DNA strands into tiny pieces, sequencing them and reassembling the resulting “jigsaw”. The trouble is that some parts of the human genome – mainly the middle and ends of chromosomes – are so repetitive that it’s impossible to put it all back together. Around 10 per cent of the genome has yet to be sequenced, though it is unlikely to be hiding much of great importance.
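Why repeats defeat the “jigsaw” can be seen in a toy model. The sketch below is an illustration under strong assumptions, not a real assembler: the sequences are invented, every read has the same length, there are no errors, and every possible read is observed exactly once per position. Two different genomes share the same repeat three times, and when reads are shorter than the repeat they produce exactly the same pile of pieces.

```python
from collections import Counter

def reads(genome, length):
    """Return the multiset of every overlapping read of the given length."""
    return Counter(genome[i:i + length] for i in range(len(genome) - length + 1))

# Invented sequences: one 7-letter repeat and four short unique stretches.
repeat = "GATTACA"
start, middle_1, middle_2, end = "CCG", "TT", "AG", "TGC"

# The two genomes differ only in the order of the stretches between repeats.
genome_1 = start + repeat + middle_1 + repeat + middle_2 + repeat + end
genome_2 = start + repeat + middle_2 + repeat + middle_1 + repeat + end

# Reads shorter than the repeat: the two genomes cannot be told apart.
short_reads_match = reads(genome_1, 5) == reads(genome_2, 5)

# Reads long enough to span a whole repeat plus a letter on each side:
# now the piles of pieces differ, so the order can be recovered.
long_reads_match = reads(genome_1, 9) == reads(genome_2, 9)

print("genomes differ:", genome_1 != genome_2)
print("identical 5-letter reads:", short_reads_match)
print("identical 9-letter reads:", long_reads_match)
```

In this toy case longer reads resolve the ambiguity, but the repetitive stretches in real chromosomes are vastly longer than the invented 7-letter repeat used here.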
Genes underlie many traits, but it is hard to find the variants responsible
variants not seen in the reference human DNA sequence (PLoS Biology, vol 5, p e254). Most were rare SNPs, but there were also numerous insertions, deletions and variations in the number of copies of individual genes. Everyone’s genome is similarly varied, so the problem is how to distinguish variants that cause disease from harmless ones. “We might as well admit that we don’t know how to do it,” says David Goldstein of Duke University in Durham, North Carolina. Still, geneticists do have some strategies in mind. One is looking for variants shared by people who represent the extreme of a trait – this might work for high blood pressure, for example. Another is to focus on variants that have cropped up as new mutations within families. A third approach is to combine
conventional studies looking for links between DNA variants and diseases with all kinds of other biological information. With the help of information on gene expression in human tissues and breeding experiments in mice, for instance, geneticists recently pinpointed a gene linked to type 2 diabetes. The link between variants in this gene and diabetes had previously been hidden in the statistical “noise” (PLoS Genetics, vol 6, p e1000932). On the face of it, the failure to find many common variants underlying common diseases is bad news for us all – companies will not develop drugs that work only for a few. However, many rare mutations may act through the same pathways, meaning one drug might still work for large numbers of people even though they do not have precisely the same mutation. Peter Aldhous
Making sense of the 6 billion As, Gs, Ts and Cs that make up our genome is hard enough. Making matters even worse are chemical modifications known as “epigenetic” marks. Some of the Cs in our DNA have a small molecule called a methyl group attached. The methylation of lots of Cs in a region blocks gene expression, meaning none of a gene’s protein will be made. As an embryo develops, methylation is used to shut down different genes in different tissues. Incorrect methylation can lead to disorders such as cancers. While epigenetic marks can be inherited, the slate is mostly wiped clean with each generation. For this reason, most geneticists think epigenetic effects cannot explain our “missing heritability” (see “Where next?”, left) – though not all agree. Indeed, there may be much more to learn about epigenetics. Just last year, we discovered that in many brain cells, a slightly different molecule, 5-hydroxymethyl, is attached to some Cs (Science, vol 324, p 929). The importance, or not, of this form of methylation in the brain remains unclear. One of the reasons for this is that detecting epigenetic marks is still difficult. The technology is advancing fast, though: last month, Pacific Biosciences of Menlo Park, California, revealed that its new DNA sequencing method can detect methylation patterns at the same time (Nature Methods, DOI: 10.1038/nmeth.1459). It can even distinguish between a methyl and a 5-hydroxymethyl group. MLP
Bijal Trivedi is a freelance science writer based in Washington DC. Michael Le Page is New Scientist’s biology features editor. Peter Aldhous is chief of New Scientist’s San Francisco bureau.