Update
264
TRENDS in Genetics Vol.21 No.5 May 2005
Guanine–adenine bias: a general property of retroid viruses that is unrelated to host-induced hypermutation
Viktor Müller1,2 and Sebastian Bonhoeffer2
1 Department of Plant Taxonomy and Ecology, Eötvös Loránd University, Pázmány P.s. 1/C, 1117 Budapest, Hungary
2 Department of Ecology and Evolution, ETH Zürich, ETH Zentrum NW, 8092 Zürich, Switzerland
The recently discovered mammalian enzymes APOBEC3G and APOBEC3F induce guanine-to-adenine hypermutation in retroviruses. However, the preference for adenine over guanine in retroviral codon usage is not correlated with the presence or absence of APOBEC3G or its viral inhibitor (Vif), and its pattern does not reflect the biochemical properties of APOBEC3G action. The guanine–adenine bias of retroviruses is thus probably not a result of host-induced mutational pressure, but rather reflects a general predisposition associated with reverse transcription.
Corresponding author: Müller, V. ([email protected]). Available online 25 March 2005. www.sciencedirect.com

Introduction
The high rates of retroviral mutation are often regarded as advantageous for viruses; however, a host-induced guanine-to-adenine (G→A) hypermutation process has recently been identified as an innate defence mechanism against these pathogens [1–5]. Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3G (APOBEC3G), a cytidine deaminase identified in the immune cells of primates and rodents, induces hypermutation by deaminating cytidine to uridine in the minus-strand DNA of retroviruses, which results in a G→A change in the positive coding strand. This mechanism was first discovered because of its ability to inhibit the replication of HIV-1 and has since been demonstrated to be effective against the γ-retrovirus murine leukaemia virus [3,4] and hepatitis B virus [6], a hepadnavirus that also uses reverse transcription during its life cycle. APOBEC3G is expressed in the cells of the lymphoid and haematopoietic system [7], which are a major target of all retroviral genera. There is conflicting evidence for its expression in liver, pancreas and kidney (see Refs [7,8]).

The viral infectivity factor (Vif) protein of HIV-1 and its close relatives has evolved to counteract the effects of APOBEC3G [1–5]. The outcome of the struggle between APOBEC3G and Vif depends on the relative amounts of the gene products within the cell [1,3,9], and the protection from the APOBEC effect might therefore not always be complete, as shown by the high prevalence of hypermutated HIV-1 sequences [10]. G→A hypermutation has been observed in vivo in several retroviruses and also in the hepatitis B virus [11].

Unopposed hypermutation results in defective viruses that cannot produce viable offspring. Partially suppressed hypermutation, however, might produce a mutational pressure weak enough to enable the survival of mutated sequences, but strong enough to affect the codon usage of viruses. It has long been known that lentiviruses, the genus that includes HIV-1, preferentially use A-rich codons and have an adenine content that exceeds the neutral expectation of 25% [12]; this has been hypothesized to be due to APOBEC-induced mutational pressure [13]. This bias is shared by some but not all other retroviruses [14]. If APOBEC-induced pressure were the cause, one would expect a stronger bias in viruses that infect cells known to express a homologue of APOBEC3G than in viruses that infect a non-expressing cell type, and the strongest bias in viruses that lack Vif but infect cells that are armed with APOBEC3G. We investigated whether the codon usage of retroviruses corresponds to these patterns. We characterized guanine–adenine usage in terms of a guanine-to-adenine silent nucleotide bias (SNB_G→A) index (Box 1).

No correlation between G→A bias and the taxonomical distribution of APOBEC3G and Vif
In primates, APOBEC3G is a member of a cluster of seven genes [7], one of which, APOBEC3F, shares the ability to induce G→A hypermutation [8,15,16], whereas only a single homologue of the gene is known in mice [17]. We
Box 1. Sequence editing and analysis
One thousand six hundred and thirty complete viral reference sequences were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html). Satellite and unclassified viruses, and sequences with no identified open reading frame, were discarded; segments of segmented genomes were merged to obtain 1115 unique complete genomic sequences. Complete non-overlapping sequences were generated by in-frame concatenation of all non-overlapping and in-frame overlapping gene fragments. Out-of-frame overlapping regions and overlapping regions read from different nucleic acid strands were discarded. Overall guanine and adenine content (G_tot and A_tot, respectively), and the frequency of the two nucleotides in codons where a G-to-A change is silent (G_syn and A_syn), were then determined on the edited non-overlapping sequences. We defined the index of G→A silent nucleotide bias (SNB_G→A) as the natural logarithm of a quotient: the ratio of the two nucleotides in synonymous codon positions, divided by their ratio in the whole non-overlapping genome:

\mathrm{SNB}_{G \to A} = \ln \frac{A_{\mathrm{syn}} / G_{\mathrm{syn}}}{A_{\mathrm{tot}} / G_{\mathrm{tot}}}    [Eqn I]

C→T silent nucleotide bias (SNB_C→T) was defined analogously as:

\mathrm{SNB}_{C \to T} = \ln \frac{T_{\mathrm{syn}} / C_{\mathrm{syn}}}{T_{\mathrm{tot}} / C_{\mathrm{tot}}}    [Eqn II]

Overall nucleotide frequencies are shaped by forces acting on both the nucleotide composition of the nucleic acids and the amino-acid sequences that they encode. By contrast, the nucleotide frequency of synonymous codon positions is affected only by forces acting on the nucleotide composition of nucleic acids, including mutational pressure and selection acting on optimal codon usage and RNA structure [28]. The ratio of the two relative frequencies thus provides a good measure of nucleotide-specific effects, scaling the bias of synonymous nucleotide usage by the bias of overall usage. A natural logarithm is used in the index to make deviations of the ratio from unity comparable in the two directions. Positive SNB_G→A values correspond to a preference for adenine over guanine, whereas negative values correspond to a preference for guanine over adenine. Sequence editing and analysis were performed using Perl programs, which are available on request.
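The index defined in Eqn I can be illustrated with a minimal Python sketch. This is not the authors' Perl code: the function name `snb_g_to_a` is hypothetical, the input is assumed to be a single in-frame, non-overlapping coding sequence, and "codons where a G-to-A change is silent" is interpreted here, as an assumption, to mean third positions of codons whose A-ending and G-ending forms encode the same amino acid in the standard genetic code.

```python
import math

# Assumption: standard genetic code. These two-base prefixes are excluded
# because A and G in the third position are not synonymous sense codons:
#   AT: ATA = Ile, ATG = Met
#   TG: TGA = stop, TGG = Trp
#   TA: TAA and TAG are both stop codons
EXCLUDED_PREFIXES = {"AT", "TG", "TA"}


def snb_g_to_a(coding_seq: str) -> float:
    """Return ln[(A_syn / G_syn) / (A_tot / G_tot)] for an in-frame sequence."""
    seq = coding_seq.upper().replace("U", "T")

    # Overall adenine and guanine counts over the whole sequence.
    a_tot = seq.count("A")
    g_tot = seq.count("G")

    # Adenine and guanine counts at third positions where A <-> G is silent.
    a_syn = g_syn = 0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if any(base not in "ACGT" for base in codon):
            continue  # skip codons containing ambiguous symbols
        if codon[:2] in EXCLUDED_PREFIXES:
            continue
        if codon[2] == "A":
            a_syn += 1
        elif codon[2] == "G":
            g_syn += 1

    if 0 in (a_tot, g_tot, a_syn, g_syn):
        raise ValueError("index undefined: a required count is zero")
    return math.log((a_syn / g_syn) / (a_tot / g_tot))
```

Under these assumptions a positive return value indicates a preference for adenine at silent positions beyond what the overall composition would predict. The C→T index of Eqn II could be sketched the same way with the appropriate set of excluded prefixes.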
investigated whether G→A bias would therefore be observed in the viruses of primate and rodent hosts but not in the viruses of other host groups (Box 2). In addition to retroviral genera, we also analysed the other two families of viruses that use reverse transcription: hepadnaviruses, which might be affected by APOBEC3G in vivo, and caulimoviruses, a plant virus family. These two virus families together with retroviruses are collectively referred to as retroid viruses. The results demonstrate that the presence of APOBEC3G homologues is not a decisive factor in shaping G→A bias.

One could argue that a homologue, or analogue, of APOBEC3G might be expressed in those vertebrate groups where no such gene has been identified. [A distant homologue of APOBEC3G, activation-induced deaminase (AID), is present in all jawed vertebrates, although it is not involved in viral hypermutation.] In that case, however, the presence or absence of a defence mechanism in the virus should be decisive. Vif proteins that can block the effect of APOBEC3G and its homologues have been described in all lentiviruses except for equine infectious anaemia virus (EIAV) [18], but no homologues are known in the other virus groups. Vif-encoding lentiviruses would therefore be expected to have a weaker G→A bias than the other species that infect APOBEC3G-expressing cells. However, this expectation is not borne out (Box 2).

No genomic imprint of APOBEC3G action in viruses that are potentially affected
The presence or absence of APOBEC3G and/or Vif does not appear to explain the pattern of guanine–adenine usage across retroviral genera. However, we still cannot exclude the possibility that mutational pressure by APOBEC3G contributes to the G→A bias of some retroid viruses, and that its effect is not apparent in the taxonomical analysis because other factors blur the overall picture. If the G→A bias of the viruses with distorted codon usage is caused by host-induced hypermutation, the genomes of these species should be imprinted with the context preference of APOBEC3G action. Human APOBEC3G and APOBEC3F and mouse APOBEC3G preferentially mutate guanines that are followed by adenine or guanine in the +1 and +2 downstream positions [3–5,8,10,13,19,20]. Therefore, we performed a test of independence between the guanine–adenine usage of codons that are synonymous for guanine and adenine in the third position and the occurrence of guanine or adenine in the +1 and +2 downstream flanking nucleotides (Table 1 in the supplementary material online). In summary, third-letter synonymous-codon usage is correlated with the preferred mutational spectra of APOBEC3G homologues in only a minority of retro- and hepadnaviruses, and the species with a statistically significant association typically belong to groups that do not have an overall G→A bias. Thus, the analysis of context-dependence confirms that the G→A bias of retroid viruses is not related to APOBEC3G-induced hypermutation.

Finally, most APOBEC3G homologues act specifically on single-stranded DNA (ssDNA), and can therefore operate almost exclusively on the negative DNA strand during reverse transcription [13,21].
Cytosine deamination on the negative strand results in a G→A mutation on the positive coding strand, whereas deamination on the positive strand results in a cytosine-to-thymine (C→T) mutation. In retroviruses, the G→A and C→T bias indexes are strongly correlated (Figure 1), which indicates a biasing mechanism acting on both strands rather than the negative-strand-specific action expected of APOBEC3G. The correlation is not significant in hepadnaviruses, but it is highly significant in caulimoviruses (r = 0.85, P < 10^-5; data not shown). We note that negative-strand selectivity might not be complete for all members of the APOBEC family: rat APOBEC1 can deaminate viral RNA, thereby also generating C→T mutations [22], although to a lesser extent than G→A mutations, and the same might be true of human APOBEC3F [16].

G→A bias is a general property of retroid viruses
Three independent lines of evidence indicate that the G→A bias of retroviruses is not related to APOBEC3G-induced hypermutation. We cannot exclude the possibility that the bias reflects the effect of a related system that might have operated at an earlier stage of evolution or might be active even now. Alternatively, G→A bias might reflect a general predisposition of reverse
Box 2. G→A bias and the taxonomical distribution of APOBEC3G and Vif
[Figure 1: scatter plot of SNB_C→T (vertical axis, -1.2 to 1.0) against SNB_G→A (horizontal axis, -1.2 to 1.0) for retroviruses.]
Figure 1. The correlation between G→A and C→T silent nucleotide bias in the family of retroviruses. The two bias indexes are strongly correlated (r = 0.79, P < 10^-6) and the regression line crosses close to the origin (the intercept is not significantly different from zero) with a slope of 0.79, which indicates a biasing mechanism acting on both nucleic acid strands. Avian α-retroviruses, which have an A→G bias, and the groups with no consistent bias (γ- and δ-retroviruses) also fit the common regression line remarkably well. Within the individual genera, the correlation is statistically significant (P < 0.05) in α- and γ-retroviruses, lentiviruses and spumaviruses. β-, δ- and ε-retroviruses are represented by four, six and two species, respectively, which might contribute to the lack of significant correlation in these groups.
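The correlation coefficient, slope and intercept reported in the Figure 1 legend are standard quantities, and the following Python sketch shows one way they could be computed from paired bias indexes. It assumes ordinary least-squares regression of SNB_C→T on SNB_G→A, which the text does not state explicitly; the function name is hypothetical and the sketch does not reproduce the significance tests.

```python
import math


def pearson_and_regression(x, y):
    """Return (r, slope, intercept) for the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable is constant")

    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    slope = sxy / sxx                     # regression slope of y on x
    intercept = mean_y - slope * mean_x   # where the line crosses x = 0
    return r, slope, intercept


# Made-up illustrative values, not the data behind Figure 1.
snb_g_to_a_values = [-0.6, -0.1, 0.2, 0.5, 0.8]
snb_c_to_t_values = [-0.5, 0.0, 0.1, 0.4, 0.7]
r, slope, intercept = pearson_and_regression(snb_g_to_a_values, snb_c_to_t_values)
```

An intercept near zero together with a positive slope is the pattern that the legend interprets as a biasing mechanism acting on both strands.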
Homologues of APOBEC3G are present in primates and rodents only. Among the virus groups that infect more than one host group, β-retroviruses usually have a strong G→A bias; the bias is strongest in the species that infect primates and weakest in the species that infect rodents. The SNB_G→A index of γ- and δ-retroviruses is scattered around zero, indicating that no G→A bias exists. Among lentiviruses, the first identified targets of the APOBEC3G family, the bias of primate species lies within the broader range of non-primate (and non-rodent) species. Spumaviruses are also characterized by a strong G→A bias, which is stronger in primate than in non-primate, non-rodent species, making spumaviruses the only retroviral genus in which the expectations based on host APOBEC3G expression are fulfilled. Hepadnaviruses also have SNB_G→A consistently above zero, and are the only group in which the greatest nucleotide-bias values are found in rodent species. Interestingly, avian hepadnaviruses also have positive SNB_G→A, although the avian α-retroviruses are strongly biased in the A→G direction. The documented presence of APOBEC3G homologues is thus not a decisive factor in shaping G→A bias in retroid viruses: the bias of species infecting hosts with known APOBEC3G homologues (primates and rodents) is not consistently above the bias of viruses infecting other hosts.

Vif proteins that can block the effect of APOBEC3G are present only in lentiviruses, with the exception of the equine infectious anaemia virus (EIAV). Remarkably, among the genera that have a host range overlapping that of lentiviruses, β-retroviruses and spumaviruses have an SNB_G→A similar to that of lentiviruses, whereas γ- and δ-retroviruses have a lower SNB_G→A. In particular, human T-lymphotropic virus type 1 (HTLV-1), which replicates in T cells expressing APOBEC3G and lacks Vif, has a much lower SNB_G→A than HIV-1, which replicates in the same cell type and encodes Vif.
Furthermore, the bias of EIAV (SNB_G→A = 0.400), the only known lentivirus that lacks Vif, is close to the average of all lentiviruses (0.407). The presence of Vif proteins in all other non-primate lentiviruses suggests that either the Vif protein encoded by these viruses has evolved to perform a function that is unrelated to counteracting hypermutation, or that an APOBEC3G-like enzyme is also present in these host groups. The second possibility is supported by the finding that Vif-deleted variants of maedi-visna virus, a lentivirus that infects sheep,
[Figure I: SNB_G→A (vertical axis, -1.2 to 1.2) plotted for retroid virus taxa 1–9 (horizontal axis).]
Figure I. G→A silent nucleotide bias (SNB_G→A) in retroid viruses. Seven genera of retroviruses (1–7) and two families of retroid viruses (8,9) are shown: 1, α-retroviruses (avian type-C virus group); 2, β-retroviruses (mammalian B-type and D-type viruses); 3, γ-retroviruses (mammalian type-C virus group); 4, δ-retroviruses (T-lymphotropic viruses); 5, ε-retroviruses; 6, lentiviruses; 7, spumaviruses; 8, hepadnaviruses; 9, caulimoviruses. Colour-coding indicates the taxonomical classification of hosts: red, primate; yellow, rodent; blue, other mammalian species; green, avian; black, fish; brown, plant. HTLV-1 and HIV-1 are depicted as open red circles in the genus of δ-retroviruses (4) and lentiviruses (6), respectively. SNB_G→A values above zero indicate a preference for adenine over guanine in the nucleotide usage.
cannot establish a productive infection and undergo increased G→A mutation [29]. At any rate, the presence of Vif cannot explain the variation in G→A bias, either between retroviral genera or among lentiviruses (Figure I).
transcription. This is supported by the observation that hepadnaviruses and caulimoviruses, which also use reverse transcription in their life cycle, both show a general preference for adenine over guanine. Caulimoviruses are presumably not affected by APOBEC3G because the enzyme has not been shown to have a homologue in plants. To determine whether G→A-biased genome composition is indeed a distinguishing characteristic of retroid viruses, we calculated the SNB_G→A index for all 1115 viral reference sequences and found that this group displayed the greatest mean among the main virus groups (Figure 2). The ancestry of this property in retroviruses is evidenced by the A-biased codon usage of some human endogenous retroviruses, which probably reflects the genomic composition of the ancestral exogenous retrovirus that entered the genome ~30 million years ago [23].

These observations indicate that reverse transcription might have a general tendency to generate G→A bias. It is therefore noteworthy that the reverse transcriptase of HIV-1 generates G→A mutations at a greater rate than any other substitution, even in the absence of APOBEC3G [24]. Furthermore, several retroviruses use deoxyuridine triphosphatase, which breaks down dUTP, and uracil-DNA glycosylase, which removes misincorporated uracil from DNA; both enzymes protect against G→A substitutions [25]. Distorted DNA-precursor availability
[Figure 2: two panels plotting SNB_G→A by main virus group (dsDNA, ssDNA, dsRNA, ssRNA+, ssRNA–, retroid), each showing the mean, ±SE and ±2×SE. Panel (a): SNB_G→A of individual species (vertical axis, -0.4 to 0.5). Panel (b): mean SNB_G→A of families (vertical axis, -0.10 to 0.35).]
Figure 2. The G→A silent nucleotide bias in the main groups of viruses. (a) The distribution of values is plotted for individual species (n = 1115) and (b) for the means of families (n = 78). The means of families (and of genera not assigned to a family) were calculated to reduce the sampling bias arising from the uneven representation of virus groups (e.g. a strong emphasis on important human and agricultural pathogens) in the set of reference sequences. In both cases, retroid viruses have the strongest mean G→A bias among the main virus groups categorized by general life cycle. Moreover, all three families of retroid viruses (retroviruses, hepadnaviruses and caulimoviruses) are biased in the G→A direction. Individual indices for species and families are displayed in Figures 1 and 2 in the supplementary material online.
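The two-level averaging described in the Figure 2 legend (species averaged within families, then families summarized within main groups) can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(group, family, snb)` and the function name are hypothetical, and the standard error is taken as the sample standard deviation of the family means divided by the square root of their number.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def group_summary_from_family_means(records):
    """Summarize SNB values per main group via family means.

    records: iterable of (group, family, snb) tuples, one per species.
    Returns {group: (mean_of_family_means, standard_error)}.
    """
    # Step 1: collect species values per (group, family).
    species_by_family = defaultdict(list)
    for group, family, snb in records:
        species_by_family[(group, family)].append(snb)

    # Step 2: one mean per family, so heavily sampled families count once.
    family_means_by_group = defaultdict(list)
    for (group, _family), values in species_by_family.items():
        family_means_by_group[group].append(mean(values))

    # Step 3: mean and standard error of the family means in each group.
    summary = {}
    for group, family_means in family_means_by_group.items():
        if len(family_means) > 1:
            se = stdev(family_means) / math.sqrt(len(family_means))
        else:
            se = float("nan")  # standard error undefined for a single family
        summary[group] = (mean(family_means), se)
    return summary
```

Averaging within families first means that a family represented by many reference sequences does not dominate its group, which is the sampling-bias concern raised in the legend.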
has also been implicated in the generation of G→A hypermutation [26]. However, although a G→A mutational pressure produces G→A bias if it acts unopposed (e.g. if synonymous substitutions are truly neutral), a G→A bias does not necessarily imply this type of mutational pressure. C→A or G→T mutational pressure, for example, can also increase the proportion of adenine compared with guanine when acting on the positive coding strand; if such a pressure acts on both strands, its effect on the coding strand is doubled, because A increases and G decreases simultaneously. In lentiviruses, the increased percentage of adenine is actually compensated by a reduced cytosine but not guanine content [12].

There are two explanations that can account for the apparent inability of APOBEC3G-induced hypermutation to leave a detectable genomic trace. On the one hand, a functional vif gene might provide perfect protection against APOBEC3G, and vif-negative species might have comparably efficient alternative defence mechanisms. APOBEC could then affect recently generated vif mutants only, which would give rise to defective genomes that are never passed on to the next generation. On the other hand, if the viral defences are not 100% effective, G→A mutations could occur at an elevated rate during all rounds of infection, thereby generating a genuine mutational pressure. This is supported by the observation that a G→A substitution is also the most prevalent mutation in non-hypermutated HIV sequences, including adaptive drug-resistant genotypes [27]. In this scenario, the lack of a genomic imprint could also indicate the action of selective forces that override the effect of G→A mutational pressure in synonymous codon usage. For example, the correlation between G→A and C→T bias could, in principle, indicate selection for optimal GC-content, which might vary in different species.
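The earlier point that C→A or G→T pressure can also raise the proportion of adenine relative to guanine can be checked with a toy model. The sketch below is an assumption-laden illustration, not an analysis from the article: it supposes a uniform background substitution rate, adds an extra rate for chosen substitutions on the coding strand, and iterates base frequencies to equilibrium. A C→A pressure on the complementary strand is represented as G→T on the coding strand. All names and rate values are hypothetical.

```python
BASES = "ACGT"


def equilibrium_composition(extra_rates, base_rate=0.01, steps=5000):
    """Iterate base frequencies under per-step substitution rates.

    extra_rates: {(from_base, to_base): additional rate} on the coding strand.
    Returns a dict of base frequencies after `steps` iterations.
    """
    # Uniform background rate for all 12 substitutions, plus any extras.
    rate = {(a, b): base_rate for a in BASES for b in BASES if a != b}
    for pair, extra in extra_rates.items():
        rate[pair] += extra

    freq = {b: 0.25 for b in BASES}
    for _ in range(steps):
        updated = {}
        for b in BASES:
            gain = sum(freq[a] * rate[(a, b)] for a in BASES if a != b)
            loss = freq[b] * sum(rate[(b, c)] for c in BASES if c != b)
            updated[b] = freq[b] + gain - loss
        freq = updated
    return freq


def a_to_g_ratio(extra_rates):
    """Ratio of adenine to guanine frequency at (approximate) equilibrium."""
    freq = equilibrium_composition(extra_rates)
    return freq["A"] / freq["G"]


neutral = a_to_g_ratio({})
c_to_a_only = a_to_g_ratio({("C", "A"): 0.02})                      # raises A
g_to_t_only = a_to_g_ratio({("G", "T"): 0.02})                      # lowers G
both_strands = a_to_g_ratio({("C", "A"): 0.02, ("G", "T"): 0.02})   # C→A on both strands
```

In this toy setting either pressure alone raises the A:G ratio above the neutral value, and the two together raise it further, which is the qualitative argument made in the text. It says nothing about which pressure actually operates in retroid viruses.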
Concluding remarks
Mutational pressure is one of the processes that can give rise to deviations from the expectation that the genome uses all synonymous codons evenly. The discovery of a G→A hypermutation mechanism affecting retroviruses, which have long been known to have an A-rich genome and undergo G→A hypermutation, apparently provided an example of this phenomenon. However, in this article, we have presented evidence that the adenine-biased genome composition of most retroviruses is probably not a result of APOBEC3G-induced mutational pressure, but instead reflects a general predisposition associated with reverse transcription. The biochemical background for this predisposition remains to be elucidated.

Acknowledgements
We thank P. Sharp, R.J. De Boer, M.S. Neuberger, D. Trono, B. Berkhout, B. Papp, C. Pál and I. Miklós for valuable comments and technical advice. We also acknowledge the support of the Hungarian National Research Fund (OTKA grants No. D45948 and M037082), and the Swiss National Science Foundation.

Supplementary data
Supplementary data associated with this article can be found at doi:10.1016/j.tig.2005.03.004
References
1 Sheehy, A.M. et al. (2002) Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature 418, 646–650
2 Lecossier, D. et al. (2003) Hypermutation of HIV-1 DNA in the absence of the Vif protein. Science 300, 1112
3 Harris, R.S. et al. (2003) DNA deamination mediates innate immunity to retroviral infection. Cell 113, 803–809
4 Mangeat, B. et al. (2003) Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature 424, 99–103
5 Zhang, H. et al. (2003) The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature 424, 94–98
6 Turelli, P. et al. (2004) Inhibition of hepatitis B virus replication by APOBEC3G. Science 303, 1829
7 Jarmuz, A. et al. (2002) An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics 79, 285–296
8 Wiegand, H.L. et al. (2004) A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J. 23, 2451–2458
9 Kao, S. et al. (2003) The human immunodeficiency virus type 1 Vif protein reduces intracellular expression and inhibits packaging of APOBEC3G (CEM15), a cellular inhibitor of virus infectivity. J. Virol. 77, 11398–11407
10 Janini, M. et al. (2001) Human immunodeficiency virus type 1 DNA sequences genetically damaged by hypermutation are often abundant in patient peripheral blood mononuclear cells and may be generated during near-simultaneous infection and activation of CD4(+) T cells. J. Virol. 75, 7973–7986
11 Vartanian, J.P. et al. (2003) Death and the retrovirus. Trends Mol. Med. 9, 409–413
12 van Hemert, F.J. and Berkhout, B. (1995) The tendency of lentiviral open reading frames to become A-rich: constraints imposed by viral genome organization and cellular tRNA availability. J. Mol. Evol. 41, 132–140
13 Yu, Q. et al. (2004) Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome. Nat. Struct. Mol. Biol. 11, 435–442
14 Bronson, E.C. and Anderson, J.N. (1994) Nucleotide composition as a driving force in the evolution of retroviruses. J. Mol. Evol. 38, 506–532
15 Zheng, Y.H. et al. (2004) Human APOBEC3F is another host factor that blocks human immunodeficiency virus type 1 replication. J. Virol. 78, 6073–6076
16 Liddament, M.T. et al. (2004) APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr. Biol. 14, 1385–1391
17 Mariani, R. et al. (2003) Species-specific exclusion of APOBEC3G from HIV-1 virions by Vif. Cell 114, 21–31
18 Oberste, M.S. and Gonda, M.A. (1992) Conservation of amino-acid sequence motifs in lentivirus Vif proteins. Virus Genes 6, 95–102
19 Beale, R.C. et al. (2004) Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J. Mol. Biol. 337, 585–596
20 Bishop, K.N. et al. (2004) Cytidine deamination of retroviral DNA by diverse APOBEC proteins. Curr. Biol. 14, 1392–1396
21 Suspene, R. et al. (2004) APOBEC3G is a single-stranded DNA cytidine deaminase and functions independently of HIV reverse transcriptase. Nucleic Acids Res. 32, 2421–2429
22 Bishop, K.N. et al. (2004) APOBEC-mediated editing of viral RNA. Science 305, 645
23 Zsiros, J. et al. (1999) Biased nucleotide composition of the genome of HERV-K related endogenous retroviruses and its evolutionary implications. J. Mol. Evol. 48, 102–111
24 Mansky, L.M. and Temin, H.M. (1995) Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J. Virol. 69, 5087–5094
25 Chen, R. et al. (2002) Roles of uracil-DNA glycosylase and dUTPase in virus replication. J. Gen. Virol. 83, 2339–2345
26 Vartanian, J.P. et al. (1994) G→A hypermutation of the human immunodeficiency virus type 1 genome: evidence for dCTP pool imbalance during reverse transcription. Proc. Natl. Acad. Sci. U. S. A. 91, 3092–3096
27 Berkhout, B. and de Ronde, A. (2004) APOBEC3G versus reverse transcriptase in the generation of HIV-1 drug-resistance mutations. AIDS 18, 1861–1863
28 Sharp, P.M. et al. (1995) DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349, 241–247
29 Kristbjornsdottir, H.B. et al. (2004) The vif gene of maedi-visna virus is essential for infectivity in vivo and in vitro. Virology 318, 350–359

0168-9525/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2005.03.004
Letters
Conventional P-values fail to assure reproducibility for genetic association tests
Kenneth F. Manly1,2
1 Department of Pathology and Laboratory Medicine, University of Tennessee Health Science Center, Memphis, TN 38163 USA
2 Department of Biostatistics, University at Buffalo, Buffalo, NY 14216 USA
Corresponding author: Manly, K.F. ([email protected]). Available online 3 March 2005

Trends in Genetics recently published a timely article on meta-analysis of genetic association studies, a method that promises to remove some of the confusion generated by conflicting association results [1]. However, the review did not identify the cause of much of the conflict; namely, standard significance tests do not, in general, assure that a result will be reproducible. Scientists generally evaluate reproducibility by repeating an experiment, which is an expensive strategy for association studies.

Irreproducible associations
Most attempts to find statistical associations between genetic alleles and diseases have been irreproducible [2–7], and many causes have been invoked for this dismal record [2,5,8,9]. One cause is common to many studies: the confusion between Type I error and the posterior error rate (PER) [10], which is closely related to the false discovery rate (FDR) [11,12], the proportion of false positives (PFP) [13] and false positive report probability (FPRP) [14]. The Type I error rate is the probability of an apparent association when no true association exists. The PER is the probability of no true association when an apparent association has been observed. These two quantities are related by Bayes' Theorem, a relationship that also involves the power of the statistical test and the prior probability of a true association [10,13–15]. For a genetic association test, controlling the Type I error rate at conventional levels does not control either the PER or related error rates. The problem is that the prior probability of association is generally <0.05, and, when this is true, controlling the Type I error rate at 0.05 does not protect against a large proportion of false positives among apparent positive results [8,10–16].

Comparison with genetic linkage
Declaring linkage between a genetic locus and a complex trait in a genetic cross involves the same statistical