Hybrid Bayes factors for genome-wide association studies when a robust test is used

Hybrid Bayes factors for genome-wide association studies when a robust test is used

Computational Statistics and Data Analysis 55 (2011) 2698–2711 Contents lists available at ScienceDirect Computational Statistics and Data Analysis ...

473KB Sizes 0 Downloads 9 Views

Computational Statistics and Data Analysis 55 (2011) 2698–2711

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

Hybrid Bayes factors for genome-wide association studies when a robust test is used Gang Zheng a,∗ , Ao Yuan b , Neal Jeffries a a

Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, MD, USA

b

National Human Genome Center, Howard University, Washington, DC, USA

article

info

Article history: Received 11 February 2011 Received in revised form 29 March 2011 Accepted 30 March 2011 Available online 9 April 2011 Keywords: Bayesian model averaging Bayes factors Genetic models Genome-wide scan and ranking Posterior weighted likelihood Profile likelihood

abstract Bayes factor (BF) is often used to measure evidence against the null hypothesis in Bayesian hypothesis testing. In the analysis of genome-wide association (GWA) studies, extreme BF values support the associations detected based on significant p-values. Results from recent GWA studies are presented, which show that existing BFs may not be consistent with p-values when a robust test is used due to using different genetic models in the BF and p-value approaches and this may result in misleading conclusions. Two hybrid BFs, which combine the advantages of both the frequentist and Bayesian methods, are then proposed for the markers showing at least moderate associations (p-value < 10−5 ) based on a robust test. One is Bayesian model averaging using a posterior weighted likelihood and the other is the maximum BF using a profile likelihood. The proposed hybrid BFs and p-values of robust tests do not depend on a single genetic model, but instead, consolidate information over a set of models. We compare the hybrid BFs with two existing BF approaches, including an existing Bayesian model averaging method, in terms of false and true positive rates by simulations. The results show that, for markers showing at least moderate associations, both the hybrid BFs have higher true positive rates than the two existing BFs, while all false positive rates are similar. Applications of the two hybrid BFs to the markers associated with bipolar disorder, type 2 diabetes and age-related macular degeneration are presented. Our hybrid BFs provide better and more robust measures to compare significantly associated markers within and across GWA studies. Published by Elsevier B.V.

1. Introduction The frequentist and Bayesian analyses are two main methods for statistical inference. The former has the advantage of objectivity and simplicity to use, while the latter has the advantage of utilizing valuable prior information and interpreting results within and across studies. In the analysis of a genome-wide association (GWA) study, hundreds of thousands of genetic markers (single-nucleotide polymorphisms — SNPs) are tested. In the frequentist inference, an association of a marker with a disease is evaluated based on whether or not the p-value of an association test is less than a stringent significance level. The markers that have significant p-values are tested for replication in subsequent studies. A typical significance level for GWA studies is 5 × 10−7 . However, this significance level is determined and used in practice, independent of the sample size of studies, the effect size of associated markers, and the power to detect the association. For many common diseases, the effect size in GWA studies is small (odds ratio or genotype relative risk is in the range

∗ Corresponding address: Office of Biostatistics Research, National Heart, Lung, and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892-7913, USA. Tel.: +1 301 435 1287; fax: +1 301 480 1862. E-mail address: [email protected] (G. Zheng). 0167-9473/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.csda.2011.03.021

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2699

of 1.1–1.5). Markers with true associations can have p-values greater than the significance level or are not even ranked near the top based on the p-values (Zaykin and Zhivotovsky, 2005). The frequentist inference has difficulty in interpreting these intermediate p-values and in comparing results within and across studies. In Bayesian hypothesis testing, the Bayes factor (BF) is often used. BF has also been proposed for the analysis of GWA studies (WTCCC, 2007; Stephens and Balding, 2009). It measures the strength of association by integrating the significance of association with the power to detect it, while p-value only measures the significance of association (Stephens and Balding, 2009; Sawcer, 2010). When the sample size is large enough (e.g. thousands of cases and thousands of controls), large BF values, e.g. log10 BF > 5, strongly support observed small p-values and therefore the association. On the other hand, small BF values, e.g. log10 BF < 0, can be used to exclude the markers with small p-values as false positives with high confidence (Sawcer, 2010). In the analysis of GWA studies, we do not take a full frequentist approach nor a full Bayesian analysis. To our best knowledge, there has been no publication of finding novel genes in GWA studies by a full Bayesian analysis without reporting p-values (see also discussions in Stephens and Balding, 2009). Typically, both BFs and p-values may be reported for the top markers detected by significant p-values (e.g. WTCCC, 2007; Yasuno et al., 2010). In particular, BFs are reported to support the observed small p-values and hence the associations. For example, for a marker with true association and an observed small p-value, one expects log10 BF ≫ 0. In a real example shown later, log10 BFs ≈ 0 were observed for markers associated with bipolar disorder with very small p-values. Normally, this substantially different profile of p-values and BFs can be used as an evidence against associations. However, in the above example, the small BFs were obtained because the p-values and BFs were not calculated based on comparable genetic models. Thus, for this example, it is the genetic model mismatch, not lack of association or other factors, that contributes to the small BFs and small p-values. In this case, we say BF and p-value are not consistent. In GWA studies, the markers with true associations do not always show an additive effect and may have different modes of inheritance due to linkage disequilibrium of the functional loci (Zheng et al., 2009). Hence tests robust to the unknown genetic models are preferred. Many robust tests have been studied in the analysis of case-control genetic association studies. In this paper, we focus on two robust tests employed by the WTCCC (2007) and Sladek et al. (2007). Results from recent GWA studies (Section 3) show that existing BFs may not be consistent with the small p-values of the robust tests of the WTCCC (2007) and Sladek et al. (2007), when different genetic models are used in calculating BFs and p-values. Because the true genetic models are usually unknown in GWA studies, in this paper, we attempt to combine the advantages of both the frequentist and Bayesian methods, and propose two hybrid BF methods, Bayesian model averaging (BMA1 ) and maximum BF (maxBF). Here the hybrid refers to using BFs along with the smallest p-values and replacing the likelihood under the alternative hypothesis in the BF by the posterior weighted likelihood or the profile likelihood. By doing so, the two proposed hybrid approaches are not only robust to the model uncertainty but also more effectively support small p-values detected by the p-values of robust tests when associations are true. In Section 2, we give notation and briefly discuss trend tests and BF. Two motivating examples are presented in Section 3. In Section 4, the hybrid BFs are presented. Simulation studies are reported in Section 5 along with application to real data from three GWA studies, type 2 diabetes of Sladek et al. (2007), bipolar disorder of the WTCCC (2007), and age-related macular degeneration of Klein et al. (2005). Conclusions are given at the end. 2. Trend test and Bayes factor Let r = (r0 , r1 , r2 ) and s = (s0 , s1 , s2 ) be the counts of genotypes (G0 , G1 , G2 ) of a SNP among r cases and s controls, respectively. Under the null hypothesis of no association H0 , r and s follow the same multinomial distribution. Under the alternative hypothesis of association H1 , assume that G2 contains two risk alleles. Denote nj = rj + sj (j = 0, 1, 2) and n = r + s = n0 + n1 + n2 . Denote the observed data by X = (r, s). When the genetic model is known, let c (G0 ) = 0, c (G1 ) = θ and c (G2 ) = 1 be the coding function of the genetic effect. For the recessive, additive, and dominant models, θ = 0, 1/2, and 1, respectively. Using logistic regression model, the following likelihood functions are obtained: p0 = p0 (X |α, H0 ) = exp(r α)/ {1 + exp(α)}n , p1 = p1 (X |α, β, θ , H1 ) = exp {r α + β(θ r1 + r2 )} /

2 ∏ 

nj

1 + exp α + β c (Gj )



j =0

where pi is the likelihood under Hi (i = 0, 1). In both likelihoods, α is the log odds for the disease given G0 and a nuisance parameter, and β is the scalar parameter for the log odds ratio (OR) of the genetic effect. The trend test, denoted as Z (θ ) for a given θ , can be obtained as a score statistic using p1 to test H0 : β = 0 (Sasieni, 1997; Freidlin et al., 2002). Under H0 , for a given θ , Z 2 (θ ) ∼ χ12 , a chi-squared distribution with 1 degree of freedom. When the p-value of Z (θ ) is reported for a given θ , the Bayes factor (BF) is obtained as follows: BF(θ ) =

P (X |θ , H1 ) P (X |H0 )

 =

Ω1



p1 (X |η, θ , H1 )π1 (η|θ )dη

Ω0

p0 (X |α, H0 )π0 (α)dα

,

where η = (α, β)T and πi (·) is the prior density under Hi . One difference between Bayesian and classical inferences is that the parameters are averaged out in Bayesian analysis while they are maximized in classical inference. We call BF(θ )

2700

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

Table 1 p-value and log10 BF(θ) (θ = 0, 1/2, and 1) of the 14 SNPs with moderate or strong associations with BD reported by the WTCCC (2007). Strong association is indicated with *. SNP

p-valuea

log10 BF(0)

log10 BF(1/2)

log10 BF(1)

rs4027132 rs7570682 rs1375144 rs2953145 rs4276227 rs683395 rs6458307 rs2609653 rs10982256 rs10134944 rs11622475 rs420259* rs1344484 rs3761218

9.68 × 10−6 3.11 × 10−6 2.43 × 10−6 6.57 × 10−6 4.57 × 10−6 2.30 × 10−6 4.35 × 10−6 6.86 × 10−6 8.80 × 10−6 3.21 × 10−6 2.10 × 10−6 6.29 × 10−8 1.64 × 10−6 6.71 × 10−6

4.28 1.90 3.06 3.11 1.80 0.33 0.09 0.59 2.86 0.33 1.02 0.33 3.36 0.22

3.63 3.99 4.15 3.60 4.01 3.64 0.12 3.02 3.78 3.59 4.26 2.50 4.32 3.18

1.19 3.27 2.44 2.21 3.41 4.58 3.23 3.33 2.22 4.48 4.38 6.02 2.34 4.64

a

p-values (min2 ) were reported in WTCCC (2007).

consistent with Z (θ ) for a given θ , with which a small p-value for a small BF indicates that the association is likely to be false positive. Once the priors π0 (α) and π1 (η|θ ) are specified, BF(θ ) can be obtained by Laplace approximation (Kass and Raftery, 1995; WTCCC, 2007). See the Appendix for some computational detail. There are different ways to specify the priors, especially π1 (η|θ ). Readers can refer to the WTCCC (2007), Wakefield (2007, 2009), Stephens and Balding (2009), Sawcer (2010) and Fridley et al. (2010) for more discussions. Because our goal is not to discuss how to choose the priors, we do not repeat those discussions here. For comparing results presented later, we consider π0 (α) = N (0, 1), π1 (η|θ ) = π1 (α|θ )π1 (β|θ ), where π1 (α|θ ) = π0 (α) and π1 (β|θ ) is either normal or a mixture of normals. In fact, these specifications are quite standard (e.g. WTCCC, 2007; Stephens and Balding, 2009). For the normal prior, we use π1 (β|θ ) = N (0, W ), where the variance W is pre-specified, which may also depend on the genetic model θ , the effect size, or the allele frequency (WTCCC, 2007; Wakefield, 2007, 2009). But here we fix W = 0.06. The larger the variance W the larger the BF. For the mixture of normals, we use 0.9N (0, 0.04) + 0.05N (0, 0.16) + 0.05N (0, 0.64) (Stephens and Balding, 2009), which also places a small weight on larger variances. Note that, when the mixture of normals is used as the prior for the genetic effect, the BF can be obtained as a mixture of three BFs with the same weights as in the prior, each BF being calculated with one normal prior. Denote the prior odds of association as PO0 = P (H1 |θ )/P (H0 ) and the posterior odds of association as PO1 (θ ) = P (H1 |θ , X )/P (H0 |X ). Then PO1 (θ ) = BF(θ ) × PO0 (θ ),

(1)

for a given θ . Thus, given the same PO0 (θ ) = PO0 , PO1 (θ ) can be different when θ varies. Hence it is sensitive to the underlying genetic model. 3. Motivating examples 3.1. Bipolar disorder (BD) The WTCCC (2007) conducted a GWA study for seven common diseases using about 2000 cases and 3000 shared controls. One disease was BD. Both the trend test Z (1/2) and Pearson’s chi-squared test were applied to all the SNPs after quality control. The smaller of the two p-values was reported as the p-value (denoted as min2 here) to measure the significance of association. In the WTCCC (2007), a strong association corresponded to min2 < 5 × 10−7 ; otherwise an association was moderate if min2 < 10−5 . Fourteen representative SNPs were identified by the WTCCC (2007), among which only one SNP had strong association (rs420259) and others had moderate associations. These 14 SNPs were listed in Tables 3 and 4 in the WTCCC (2007), which are also listed in Table 1 here. Table 1 reports log10 BF(θ ) for θ = 0, 1/2, 1 using the normal prior N (0, 0.06). If one only presented log10 BF(1/2), the only SNP (rs420259) with strong association based on p-value (the smallest p-value among the 14 SNPs) would have the second smallest log10 BF(1/2). This BF would not support the association of the SNP with the observed smallest p-value. However, when log10 BF(1) was also presented, SNP rs420259 actually had the largest log10 BF(1) among the 14 SNPs. SNP rs6458307 had log10 BF(1/2) ≈ 0. But the value of log10 BF(1) for SNP rs6458307 becomes 3.23. 3.2. Type 2 diabetes (T2D) Sladek et al. (2007) conducted a GWA study for T2D. They applied MAX, the maximum of the three trend tests MAX = max(Z 2 (0), Z 2 (1/2), Z 2 (1)) and reported the minimum p-value of the three trend tests (denoted as min3 here) as the

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2701

Table 2 p-values of MAX and log10 BF(θ ) (θ = 0, 1/2, and 1) of the 8 SNPs having associations with T2D reported by Sladek et al. (2007). SNP

p-valuea

rs7903146 rs13266634 rs1111875 rs7923837 rs7480010 rs3740878 rs1103790 rs1113132

3.2 × 10 2.1 × 10−5 9.1 × 10−6 3.4 × 10−6 1.5 × 10−5 1.8 × 10−5 1.8 × 10−5 3.7 × 10−5

a

−17

log10 BF(0)

log10 BF(1/2)

log10 BF(1)

5.50 3.67 2.90 2.64 0.61 0.31 0.34 0.36

12.6 2.05 3.53 3.83 2.80 1.32 1.35 1.32

13.3 0.21 2.23 3.09 3.61 2.62 2.62 2.41

p-values (min3 ) were reported in Sladek et al. (2007).

p-value of MAX. About 800 cases and 700 controls were used. Eight SNPs were confirmed to have associations whose p-values are given in Table 2. We also calculated log10 BF(θ ) for θ = 0, 1/2, 1 using the same normal prior as in Section 3.1. The results are reported in Table 2. Like the results in Table 1, any single log10 BF(θ ) for a given θ may not be consistent with the p-values of MAX. If log10 BF(1/2) were employed, it would have strong evidence to separate SNP rs7480010 from SNPs rs3740878 and rs1103790. All three SNPs had similar p-values (or p-values cannot be really used to separate these three SNPs) but SNPs rs3740878 and rs1103790 had only less than half of the log10 BF of SNP rs7480010. This example itself implies that p-values alone cannot be used to compare markers/results within the study because it only measures the significance of association without taking into account the effect size of the marker, while the BF integrates both parts and hence is a better measure to compare the results. But using log10 BF(1/2) to compare the three SNPs would over-estimate their difference because a better and consistent BF for the three SNPs is log10 BF(1) (their dominant trend tests are also the largest). The difference in log10 BF(1) among the three SNPs is substantially reduced. Like the previous example, if the p-value of MAX is used to measure the significance of association in GWA studies, one might need to report all three BFs in the hope that at least one of them would support the association based on the p-value. Both motivating examples suggest that we need BF measures to be consistent with the p-values of robust tests, so that a false positive can be claimed with high probability if a substantially different interpretations of association are given by the p-value and BF approaches. 4. Hybrid BF measures The examples presented in the previous section show there is a need to report multiple BFs to be consistent with p-values of robust tests in GWA studies. We are interested in whether or not a single BF-type measure would be consistent with the p-value of a robust test. A single BF measure per SNP has an advantage in genome-wide ranking, where all SNPs can be ranked by their single BF measures. Unique ranks of SNPs based on multiple BF measures per SNP cannot be obtained. 4.1. Bayesian inference under genetic model uncertainty In Bayesian analysis, the model parameter θ can be averaged out as follows. p (X |η, θ )π1 (η|θ )π1 (θ )dηdθ Ω1 ×Θ 1  BF = = p0 (X |α)π0 (α)dα Ω



0

∑ j

π1 (θj ) 

Ω0



Ω1

p1 (X |η, θj )π1 (η)dη

p0 (X |α)π0 (α)dα

=



π1 (θj )BF(θj ),

(2)

j

where π1 (θ ) is the prior probability of model θ . The above Bayesian model averaging (BMA) was considered by Stephens and Balding (2009), denoted here as BMA0 . Note that (2) is a weighted BF by the prior probabilities of models and a BF itself. ∑ ∑ 1 0 If PO0 in (1) is independent of θ , we have PO1 = j π1 (θj )PO (θj ) = j π1 (θj )BF(θj )PO , a posterior odds of association weighted by π1 (θ ). 4.2. Bayesian model averaging (BMA1 ) The Bayesian model averaging is a standard approach to handle model uncertainty in Bayesian inference (Madigan and Raftery, 1994; Kass and Raftery, 1995; Hoeting et al., 1999). It is often used to make Bayesian prediction to take into account of model uncertainty. We adopt it here for BF. Denote the posterior probabilities of models (PPM) as π(θj |X , H1 ), j = 0, 1, 2. Then, P (θj , X |H1 )

π (θj |X , H1 ) = ∑ θj

P (θj , X |H1 )

π1 (θj )P (X |θj , H1 )/P (X |H0 ) π1 (θj )BF(θj ) = ∑ = ∑ , π1 (θj )P (X |θj , H1 )/P (X |H0 ) π1 (θj )BF(θj ) θj

θj

2702

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

where π1 (θ ) is the prior of θ given in (2). The BMA that we propose is a weighted average of BFs of the three common genetic models with weights equal to the PPMs, given by

∑ BMA1 =



π1 (θj )BF2 (θj )

j

π (θj |X , H1 )BF(θj ) = ∑

j

π1 (θj )BF(θj )

.

(3)

j

We choose π1 (0) = π1 (1) = 0.1 and π1 (1/2) = 0.8 as in BMA0 considered by Stephens and Balding (2009) because most associated SNPs reported in the literature show the additive effect. This choice is most relevant to genetic association studies. Otherwise, non-informative priors, e.g. π1 (0) = π1 (1/2) = π1 (1) = 1/3 can be considered. BMA0 of Stephens and Balding (2009) and BMA1 are different. First, they use π1 (θ ) and π1 (θ|X ) as weights respectively. Second, BMA0 is still a BF in which the genetic model is treated as a parameter under H1 and averaged out with respect to π1 (θ). But BMA1 is not a typical BF. The likelihood of the data X in the numerator of BF(θ ) is replaced by a∑ weighted likelihood ∑ ∑ ∑ 2 2 j p1 (X |η, θj , H1 )π1 (θj |X ). Thus, we call it a hybrid BF. Finally, using { j π1 (θj )BF(θj )} ≤ { j π1 (θj )}{ j π1 (θj )BF (θj )} =



j

π1 (θj )BF2 (θj ), one can show that BMA1 ≥ BMA0 , where the equality holds only when BF(θj ) is independent of θj , which

is asymptotically true under H0 or when the effect size goes to infinity under H1 . Since we calculate BMA0 and BMA1 for the markers with the smallest p-values, it is expected BMA1 > BMA0 . Both BMA0 and BMA1 are functions of BFs, so they both incorporate the significance of association (p-value) and the power to detect the association (or the effect size). 4.3. Maximum BF (maxBF) The second hybrid BF is the maximum of BF(θj ), j = 0, 1, 2, denoted as maxBF, which is given by max P (X |θj , H1 ) maxBF =

j

P (X |H0 )



Ω1

=

max p1 (X |η, θj , H1 )π1 (η)dη j



Ω0

p0 (X |α, H0 )π0 (α)dα



Ω =  1 Ω0

p˜ 1 (X |η, H1 )π1 (η)dη p0 (X |α, H0 )π0 (α)dα

,

where p˜ 1 is the profile likelihood of p1 with respect to θ for any η = (α, β)T . Note that the genetic model θ is independent of η. Thus, in maxBF, the parameter θ is not averaged out but maximized as in the classical inference. maxBF can also be written as maxBF = maxθ BF(θ ). Hence log10 maxBF = maxθ log10 BF(θ ). Since maxBF ≥ BF(θ ) for any θ , we have maxBF ≥ BMA1 . 5. Results From Sections 4.2 and 4.3, BMA1 ≥ BMA0 and maxBF ≥ BMA1 ≥ BMA0 , which hold under both H0 and H1 . Thus, we cannot directly use BMA1 and maxBF to measure the strength of association because they would have higher false positive rates under H0 than BMA0 when they are compared to a threshold level. Numerical results presented later confirm this. However, we apply them to the top SNPs in GWA studies. Hence, we are interested in the performance of BF(1/2), BMA0 , BMA1 and maxBF conditional on the SNPs whose p-values are <10−5 to demonstrate at least moderate associations. Common threshold levels for p-values in GWA studies are well established, e.g. 10−5 and 5 × 10−7 for moderate and strong associations, respectively (WTCCC, 2007). Such levels, however, are not clearly given for BFs in GWA studies. Stephens and Balding (2009) mentioned BF > 104 –106 and Sawcer (2010) suggested BF > 105 . In order to compare the four BF measures in simulations, we need to determine threshold values to use. Examining the results in Tables 1 and 2 using the maximum of log10 BFs over the three genetic models, it shows that the maximum BF for the SNP with strongest association >5. Most of the maximum log10 BFs in Tables 1 and 2 range from 2.5 to 5.0. Thus, we report results based on the two thresholds for log10 BFs: 2.5 for moderate association and 5.0 for strong association. In the following, the unconditional results refer to using the threshold 2.5 (5.0) alone. For example, in Table 3, the unconditional number of false positive using log10 maxBF with the threshold 2.5 (5.0) is the number of replicates under H0 such that log10 maxBF > 2.5 (>5.0). The conditional results refer to using both thresholds of log10 BF and p-value. For example, in Tables 5 and 4A–4C, the conditional number of true positive with threshold 2.5 (5.0) is the number of replicates under H1 such that log10 maxBF > 2.5 (>5.0) and p-value < 10−5 . 5.1. Simulations under H0 Under H0 , we estimated the frequency that each log10 BF was greater than a threshold t = 2.5 or 5 among 1 million replicates. For comparison, the frequencies that the p-value of the WTCCC (2007) (min2 ) and the p-value of Sladek et al. (2007) (min3 ) were less than 5 × 10−7 (strong) or 10−5 (at least moderate) were also estimated. The minor allele frequency (MAF) was 0.15, 0.30 or 0.45. Hardy–Weinberg equilibrium (HWE) was assumed in the population. Sample sizes were r = s = 2000 (n = 4000). We used the priors given in Section 2.1. Under H0 , genotype relative risks (GRRs) λi = P (case|Gi )/P (case|G0 ), i = 1, 2 were both 1, where fi = P (case|Gi ), i = 1, 2, 3, are penetrances. Given MAF p, the genotype counts of cases and controls were generated from multinomial distributions Mul(r ; g0 , g1 , g2 ) and Mul(s; g0 , g1 , g2 ), respectively, where g0 = (1 − p)2 , g1 = 2p(1 − p) and g2 = p2 .

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2703

Table 3 Observed unconditional (conditional) numbers of false positives given a threshold t among 1 million simulations. min2 and min3 are the p-values of robust tests of WTCCC (2007) and Sladek et al. (2007), respectively and m for p-value < 10−5 and s for p-value < 5 × 10−7 . The conditional numbers in the parentheses are those conditional on min3 < 10−5 (the first number) and min2 < 10−5 (the second number). The mixture of normals is 0.9N (0, 0.04) + 0.05N (0, 0.16) + 0.05N (0, 0.64). Prior

t

log10 BF > t

MAF

BF N (0 , W ) W = 0.06

2.5

5.0

Mixture of normals

2.5

5.0

1 2

min3

min2

BMA0

BMA1

maxBF

m

s

m

s

0.15 0.30 0.45 0.15 0.30 0.45

6 (6, 6) 32 (15, 14) 46 (17, 14) 0 (0, 0) 0 (0, 0) 0 (0, 0)

18 (18, 12) 44 (23, 16) 53 (27, 16) 0 (0, 0) 0 (0, 0) 0 (0, 0)

64 (18, 12) 119 (31, 21) 142 (33, 16) 0 (0, 0) 0 (0, 0) 0 (0, 0)

124 (18, 12) 208 (34, 22) 228 (33, 16) 0 (0, 0) 0 (0, 0) 0 (0, 0)

20 36 33 20 22 38

2 0 1 1 0 3

14 22 18 19 10 21

2 0 0 1 0 2

0.15 0.30 0.45 0.15 0.30 0.45

14 (8, 7) 28 (9, 8) 37 (13, 11) 0 (0, 0) 0 (0, 0) 0 (0, 0)

17 (12, 8) 31 (15, 10) 37 (17, 12) 0 (0, 0) 0 (0, 0) 0 (0, 0)

47 (16, 11) 75 (19, 11) 92 (25, 13) 0 (0, 0) 0 (0, 0) 0 (0, 0)

91 (16, 11) 128 (19, 11) 153 (25, 13) 0 (0, 0) 0 (0, 0) 0 (0, 0)

17 19 25 15 21 37

0 0 1 0 1 2

12 14 16 12 15 24

0 0 1 0 1 2

Table 4A Observed unconditional (conditional) numbers of true positives under recessive model given a threshold t among 100,000 simulations. The conditional numbers in the parentheses are those conditional on min2 < 10−5 . Prior

t

MAF

GRR

log10 BF > t BF

N (0 , W ) W = 0.06

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

Mixture of normals

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

1 2

BMA0

BMA1

maxBF

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

1 (1) 2 (2) 19 (9) 529 (284) 118 (42) 5623 (3366) 0 (0) 0 (0) 0 (0) 1 (1) 0 (0) 34 (34)

1 (1) 6 (5) 19 (10) 672 (456) 145 (58) 9259 (5291) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0) 123 (123)

6 (1) 12 (6) 44 (15) 1754 (691) 395 (64) 16938 (5528) 0 (0) 0 (0) 0 (0) 6 (6) 1 (1) 468 (468)

12 (1) 28 (7) 64 (15) 2338 (692) 584 (64) 20619 (5528) 0 (0) 0 (0) 0 (0) 7 (7) 1 (1) 535 (535)

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

3 (1) 9 (5) 22 (7) 472 (267) 102 (40) 4818 (3225) 0 (0) 0 (0) 1 (1) 1 (1) 0 (0) 82 (82)

3 (1) 7 (5) 23 (9) 810 (600) 143 (62) 7921 (5167) 0 (0) 0 (0) 1 (1) 5 (5) 0 (0) 237 (237)

7 (1) 18 (8) 61 (10) 1875 (677) 330 (69) 14556 (5467) 0 (0) 0 (0) 1 (1) 50 (50) 4 (4) 661 (661)

11 (1) 28 (8) 75 (10) 2487 (679) 448 (69) 17736 (5471) 0 (0) 0 (0) 1 (1) 51 (51) 4 (4) 711 (711)

The results are reported in Table 3. When the threshold t = 2.5 is used, maxBF and BMA1 have much more unconditional false positives than BMA0 and BF(1/2), and maxBF has more unconditional false positives than BMA1 , regardless of the prior used (the normal or the mixture of normals). The largest unconditional number of false positives using maxBF is 228 out of 1 million replicates. Using threshold t = 5.0, however, no false positive is observed. On the other hand, BF(1/2) and BMA0 always have similar false positive rates. These unconditional results confirm that BMA1 and maxBF cannot be used directly in applications unless some corrections are applied (see discussions of this in the last section). If BFs are calculated for the SNPs having at least moderate associations (min3 < 10−5 or min2 < 10−5 ), we are interested in the conditional false positive rates, which are the false positives given that min3 < 10−5 or min2 < 10−5 . Two conditional numbers of false positives are reported in Table 3 in the parentheses. Although the conditional false positive rates of maxBF and BMA1 are still slightly higher than those of BF1 (1/2) and BMA0 , with at most 22 false positives out of 1 million replicates, they are no more than those using p-values. In other words, using the hybrid BFs conditionally would not result in more false positive findings than the frequentist approach. In particular, when min2 < 10−5 was used for the conditional results, all four measures had similar conditional false positive rates. We also notice that using min2 < 10−5 has smaller false positive rates than using min3 < 10−5 (indicated with label ‘‘m’’ in Table 3). However, they result in similar false positive rates using the threshold level 5 × 10−7 (indicated with label ‘‘s’’ in Table 3). The counts under ‘‘m’’ (or ‘‘s’’) only depend on the MAF.

2704

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

Table 4B Observed unconditional (conditional) of true positives under additive model given a threshold t among 100,000 simulations. The conditional numbers in the parentheses are those conditional on min2 < 10−5 . Prior

t

MAF

log10 BF > t

GRR

BF N (0, W ) W = 0.06

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

Mixture of normals

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

1 2

BMA0

BMA1

maxBF

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

6 (6) 535 (534) 87 (41) 4219 (2315) 167 (50) 5883 (2820) 0 (0) 0 (0) 0 (0) 14 (14) 0 (0) 27 (27)

14 (9) 846 (661) 93 (44) 4383 (2367) 168 (54) 5784 (2875) 0 (0) 0 (0) 0 (0) 21 (21) 0 (0) 28 (28)

54 (10) 1763 (709) 171 (45) 5770 (2275) 278 (54) 7099 (2880) 0 (0) 22 (22) 1 (1) 73 (73) 0 (0) 53 (53)

117 (11) 2875 (720) 301 (45) 7363 (2375) 379 (54) 8452 (2880) 0 (0) 26 (26) 1 (1) 90 (90) 0 (0) 65 (65)

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

31 (25) 1034 (717) 92 (40) 4099 (2319) 109 (45) 5264 (2860) 0 (0) 15 (15) 0 (0) 63 (63) 0 (0) 95 (95)

33 (26) 1116 (733) 95 (45) 4106 (2362) 111 (50) 5107 (2907) 0 (0) 15 (15) 0 (0) 61 (61) 0 (0) 94 (94)

64 (26) 1661 (741) 157 (46) 6439 (2363) 185 (50) 6195 (2909) 0 (0) 23 (23) 0 (0) 95 (95) 1 (1) 124 (124)

110 (26) 2426 (742) 222 (46) 2533 (2363) 281 (50) 7311 (2909) 0 (0) 30 (30) 0 (0) 107 (107) 1 (1) 129 (129)

Table 4C Observed unconditional (conditional) of true positives under dominant model given a threshold t among 100,000 simulations. The conditional numbers in the parentheses are those conditional on min2 < 10−5 . Prior

t

MAF

GRR

log10 BF > t BF

N (0, W ) W = 0.06

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

Mixture of normals

2.5

0.15 0.30 0.45

5.0

0.15 0.30 0.45

1 2

BMA0

BMA1

maxBF

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

71 (71) 8953 (8950) 324 (152) 15840 (11909) 191 (67) 6159 (3987) 0 (0) 31 (31) 0 (0) 197 (197) 0 (0) 42 (42)

178 (121) 14999 (11708) 456 (173) 22874 (14272) 259 (89) 10371 (6056) 0 (0) 409 (409) 0 (0) 961 (961) 0 (0) 179 (179)

531 (129) 24946 (12015) 1040 (183) 34084 (14495) 626 (100) 18829 (6287) 2 (2) 1517 (1517) 5 (5) 2809 (2809) 1 (1) 670 (670)

884 (130) 31267 (12045) 1548 (183) 39798 (14497) 885 (100) 22918 (6288) 3 (3) 1661 (1661) 5 (5) 3097 (3097) 2 (2) 745 (745)

1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5 1.2 1.5

185 (132) 13047 (10910) 316 (149) 15529 (11816) 173 (70) 5681 (3964) 1 (1) 667 (667) 2 (2) 756 (756) 0 (0) 132 (132)

219 (144) 15534 (11660) 407 (180) 20926 (14136) 204 (89) 9207 (6114) 1 (1) 815 (815) 3 (3) 1245 (1245) 0 (0) 287 (287)

403 (153) 22118 (11951) 826 (196) 30680 (14451) 479 (94) 16455 (6426) 3 (3) 1351 (1351) 6 (6) 2576 (2576) 1 (1) 791 (791)

665 (155) 27666 (11962) 1171 (196) 36663 (14453) 679 (94) 20088 (6430) 6 (6) 1637 (1637) 6 (6) 2836 (2836) 1 (1) 857 (857)

Therefore, the varying observations under ‘‘m’’ (or ‘‘s’’) reflect randomness in the simulations. For example, the counts under ‘‘m’’ for MAF = 0.3 are 36, 22, 19, 21 using min2 and 22, 10, 14, 15 using min3 . 5.2. Simulations under H1 Under H1 , given a MAF, the genotype frequencies gi were calculated as in Section 5.1. Given a genetic model (recessive, additive or dominant) and λ2 = 1.2 or 1.5, λ1 was obtained. Then, penetrances were calculated as f0 = k/(g0 + g1 λ1 +

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2705

Table 5 Ranks of 10 associated SNPs among 500K SNPs. F is the Wright’s inbreeding coefficient in the controls. The counts in the parentheses are the numbers of 500K SNPs with min2 < 10−5 and min3 < 10−5 , respectively. Counts

(10,16)

n

4000

Model

REC ADD

DOM

(10,16)

6000

REC ADD

DOM

F

MAF

GRR

Ranks by min2

min3

BF( 12 )

BMA0

BMA1

maxBF

0.01 0.05 0.03 0.01 0.00 0.05 0.00 0.02 0.02 0.02

0.31 0.31 0.19 0.46 0.49 0.16 0.42 0.35 0.46 0.14

1.56 1.30 1.18 1.10 1.29 1.25 1.25 1.14 1.28 1.22

1 7 3511 228K 2 75K 25K 52K 344 1787

1 4 5110 284K 3 100K 35K 71K 290 2098

882 1262 2169 214K 1 46K 17K 38K 184 1163

1 8 2520 254K 2 51K 19K 42K 183 1297

1 4 3405 270K 3 60K 23K 49K 285 1903

1 4 4686 346K 3 87K 34K 71K 243 2656

0.01 0.00 0.03 0.04 0.03 0.01 0.00 0.01 0.02 0.05

0.39 0.39 0.41 0.24 0.41 0.21 0.49 0.22 0.36 0.31

1.52 1.54 1.14 1.25 1.22 1.24 1.20 1.13 1.26 1.21

1 2 66K 7868 5 1055 3 987 606 2380

1 2 77K 10K 6 523 3 1481 531 874

3 1 50K 4945 4 626 2 586 358 2851

2 1 55K 5243 5 572 3 650 356 1833

2 1 64K 7030 6 757 3 980 536 1205

2 1 96K 8933 6 509 4 1317 479 765

g2 λ2 ) and fi = λi f0 (i = 1, 2). The genotype counts of cases and controls were simulated from Mul(r ; P0 , P1 , P2 ) and Mul(r ; Q0 , Q1 , Q2 ), respectively, where Pi = gi fi /k and Qi = gi (1 − fi )/(1 − k), where disease prevalence was k = 0.1. The sample size was r = s = 2000 (n = 4000). The unconditional and conditional numbers of true positives of the four measures were estimated among 100,000 replicates and are presented in Tables 4A–4C for the recessive, additive and dominant models, respectively. The conditional numbers of true positives are only based on min2 < 10−5 . To compare the four measures, we focus on the conditional true positive rates. First, BMA0 always has greater true positives than BF(1/2), especially under the recessive or dominant models, except for a couple of scenarios. Second, both maxBF and BMA1 have greater true positives than BMA0 . Moreover, maxBF has at least the same numbers of true positives as BMA1 . The conclusions hold regardless of the thresholds t, MAFs and the underlying genetic models. Note that, when t = 5.0, the unconditional and conditional numbers of true positives are identical for each BF measure. 5.3. Genome-wide rankings The simulations reported in Tables 4A–4C are based on a single MAF and a given genetic model. We need to compare the performance of the four measures in ranking SNPs in a GWA study. In particular, we are interested in the ranks of true SNPs by min2 , min3 , BF(1/2), BMA0 , BMA1 , and maxBF, respectively. The design of the simulation is given below. We simulated 500,000 SNPs in a GWA study, where 10 SNPs were associated with a disease, among which the numbers of associated SNPs under the recessive, additive and dominant models were 2, 6 and 2, respectively. The rest 499,990 SNPs were null. The disease prevalence was fixed at 0.15. The MAFs of 500,000 SNPs were simulated from a uniform distribution U (0.1, 0.5). For the null SNPs, we simulated Wright’s inbreeding coefficient F from U (0, 0.05) with which the genotype frequencies in the population were calculated as g0 = (1 − p)2 + Fp(1 − p), g1 = 2p(1 − p)(1 − F ) and g2 = p2 + Fp(1 − p), where p is a MAF. Then the genotype frequencies in cases and controls were computed respectively as in Sections 5.1 and 5.2. For the controls of the 10 associated SNPs, we repeated the simulation procedures of the null SNPs as if the controls represented the general population. For the cases of the 10 associated SNPs, however, we chose and fixed a larger F = 0.1 for the 4 associated SNPs under the recessive and dominant models, but simulated F from U (0, 0.05) for the 6 associated SNPs under the additive model. The purpose of choosing F = 0.1 was to create strong deviation from Hardy–Weinberg equilibrium in the cases for some associated SNPs, but not in the controls. The GRR λ2 was simulated from U (1.2, 1.6) for the recessive model, U (1.1, 1.3) for the additive model, and U (1.2, 1.3) for the dominant model. The GRR λ1 was calculated given λ2 and a given genetic model. For the null SNPs, λ1 = λ2 = 1. The sample sizes were r = s = 2000 (n = 4000) and r = s = 3000 (n = 6000), respectively. In Table 5, we report the ranks of the 10 associated SNPs by each measure as well as the numbers of top SNPs such that min2 < 10−5 or min3 < 10−5 . The latter is reported in the parentheses. Only the normal prior was considered. When n = 4000, all ranks are similar except for the two SNPs under the recessive model, which cannot be detected using BF(1/2) but they are detected using p-values or other BF measures, including the existing BMA0 . All methods have exactly the same numbers of false positive conditional on min2 < 10−5 or min3 < 10−5 except for BF(1/2) which has more false positives. When n = 6000, all measures have similar ranks and the same numbers of false positives. Note that an associated SNP

2706

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

Table 6 Comparing p-values of robust tests and hybrid BFs within and across studies. All studies had the same additive model, the same disease prevalence 0.15, the same MAF 0.3, and the same normal prior with the same prior probabilities of models. They had either same sample sizes and GRRs for within study comparison or different sample sizes and GRRs for between study comparison. n

(r0 , r1 , r2 )

(s0 , s1 , s2 )

GRR

BMA1

maxBF

min2

min3

150 150 150 600 600 600 3000 3000 3000

(16, 28, 6) (15, 25, 10) (17, 25, 8) (117, 136, 47) (121, 143, 36) (128, 129, 43) (449, 445, 106) (454, 429, 117) (461, 444, 95)

(52, 43, 5) (52, 37, 11) (51, 44, 5) (144, 125, 31) (150, 126, 24) (151, 125, 24) (985, 843, 172) (984, 837, 179) (1007, 843, 150)

5.0 5.0 5.0 2.0 2.0 2.0 1.2 1.2 1.2

3.7 4.1 3.7 8.9 8.8 8.7 12.7 14.1 13.0

4.5 5.3 3.7 9.3 9.8 9.1 13.4 14.8 13.8

0.012 0.011 0.011 0.011 0.011 0.011 0.012 0.010 0.011

0.012 0.011 0.011 0.011 0.011 0.011 0.012 0.010 0.011

with larger GRR does not necessarily have smaller rank. The actual rank of an associated SNP depends not only on the MAF, F and genetic models, but also on all the null SNPs. For example, using min2 , min3 or BF(1/2) when n = 6000, the SNP with GRR 1.24 under the additive model has worse rank than the SNP with GRR 1.13 under the same model with similar F and MAF. 5.4. Comparison across studies using p-values and BFs It is known that p-values only summarize the significance of association while BF integrates both the significance of association and the power (sample size and effect size). Thus, to compare results across studies, the BF is a better measure than the p-value. Sawcer (2010) demonstrated this by simulating different studies with different sample sizes and genetic effects, and showed that the different studies had different BFs while the p-values were the same. We conducted similar simulations but used the p-values of robust tests, min2 and min3 , and two hybrid BFs BMA1 and maxBF. We simulated 9 studies, which were divided into 3 groups, each of which had three replicates. Within each group, the three replicated studies were simulated under the identical setting, while across the three groups, only sample sizes and GRRs were different. Details are given in Table 6. All p-values are close to 0.011. Thus, in terms of p-values, there would be no difference of the associations within and across studies. Within each of the three groups, their hybrid BFs are also comparable. However, between the groups, their hybrid BFs are quite different, which indicates that the hybrid BFs have better interpretation of associations across studies than the p-values of robust tests. 5.5. Applications For illustration, we apply log10 BF(1/2), log10 BMA0 , log10 BMA1 and log10 maxBF to the SNPs presented in Tables 1 and 2 sorted by their p-values min2 for BD and min3 for T2D. To compare results across GWA studies, we also included the top two SNPs associated with age-related macular degeneration (AMD) from Klein et al. (2005). The p-values min2 of these two SNPs were also included. The results are presented in Table 7. The highlighted SNP IDs in Table 7 are those that log10 BF(1/2) is not consistent with the p-value compared to other SNPs with similar p-values (see discussions in Section 3), while log10 BMA1 and log10 maxBF are more consistent with the p-values. Inconsistency between BF and p-value can arise from many reasons, which may also provide insight into the strength of associations. However, if it is due to using different genetic models, it would mislead the interpretation of results. Overall, BMA1 and maxBF have similar performance. Fig. 1 also plots − log10 min2 (the first row) or − log10 min3 (the second row) against log10 BF(1/2) (the left column) or log10 maxBF (the right column). The plots confirm the results presented in Table 7 that − log10 maxBF is more consistent with − log10 min2 or − log10 min3 than − log10 BF(1/2) in the SNPs associated with BD and T2D as seen by tighter clustering around a 45 degree line through the origin. Although we conducted simulations for genome-wide rankings (Section 5.3 and Table 5), in the next application, we analyzed all SNPs of BD in genome-wide ranking. The total number of SNPs in this application, after quality control, was 391,573. All SNPs were ranked by each method: min2 , min3 , BF(1/2), BMA0 , BMA1 , and maxBF. Then the ranks of those 14 SNPs were obtained for each method. Fig. 2 presents the plot of ranks of min2 versus min3 of the 14 SNPs. The results show that both rankings are quite comparable except for one SNP. For ranking using BFs, we first report the results based on the normal prior N (0, W ) with W = 0.06. Figs. 3 and 4 plot ranks of min3 and min2 versus one of the four BF measures of the 14 SNPs, respectively. It shows that one SNP had an extremely large ranking (>200,000) using BF(1/2) while the ranks of this SNP by other methods are significantly smaller. The results also show that the ranks of BMA1 and maxBF are more consistent with the ranks of min3 (Fig. 3) or min2 (Fig. 4) than those of BF(1/2) and BMA0 . The rank for that extreme SNP using BMA0 is still nearly 400. Fig. 5 demonstrates that the ranking of SNPs with association using a normal prior is very consistent with that using a mixture of normals prior. Finally, we discuss the results of AMD and compare them with the results of BD. Table 7 shows two SNPs (rs380390 and rs1329428) had very strong associations with AMD in terms of p-values although the sample sizes were small (about 50 cases

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2707

Table 7 Two existing BFs (log10 BF(1/2) and log10 BMA0 ) and two hybrid BFs (log10 BMA1 and log10 maxBF) for the top SNPs from GWA studies of BD reported by WTCCC (2007), T2D reported by Sladek et al. (2007), and AMD reported by Klein et al. (2005). The normal and the mixture of normals priors were used. The results reported in the parentheses are based on the mixture of normals. Disease

log10 BF(1/2)

log10 BMA0

log10 BMA1

log10 maxBF

rs420259 rs1344484 rs11622475 rs683395 rs1375144 rs7570682 rs10134944 rs6458307 rs4276227 rs2953145 rs3761218 rs2609653 rs10982256 rs4027132

−8

6.29 × 10 1.64 × 10−6 2.10 × 10−6 2.30 × 10−6 2.43 × 10−6 3.11 × 10−6 3.21 × 10−6 4.35 × 10−6 4.57 × 10−6 6.57 × 10−6 6.71 × 10−6 6.86 × 10−6 8.80 × 10−6 9.68 × 10−6

2.50 (2.42) 4.32 (4.20) 4.26 (4.12) 3.64 (3.53) 4.15 (4.02) 3.99 (3.85) 3.59 (3.46) 0.12 (0.16) 4.01 (3.89) 3.60 (3.46) 3.18 (3.09) 3.02 (2.98) 3.78 (3.68) 3.63 (3.53)

5.02 (4.96) 4.23 (4.11) 4.23 (4.11) 3.86 (3.75) 4.06 (3.93) 3.90 (3.76) 3.79 (3.67) 2.23 (2.12) 3.93 (3.81) 3.52 (3.38) 3.74 (3.68) 3.02 (2.97) 3.69 (3.59) 3.73 (3.63)

6.01 (5.96) 4.32 (4.19) 4.29 (4.17) 4.34 (4.23) 4.15 (4.02) 3.98 (3.84) 4.23 (4.13) 3.22 (3.11) 4.00 (3.88) 3.58 (3.44) 4.53 (4.48) 3.10 (3.04) 3.78 (3.67) 3.98 (3.88)

6.02 (5.96) 4.32 (4.20) 4.40 (4.35) 4.58 (4.47) 4.15 (4.02) 3.99 (3.85) 4.48 (4.38) 3.23 (3.11) 4.01 (3.89) 3.60 (3.46) 4.64 (4.58) 3.33 (3.22) 3.78 (3.68) 4.28 (4.18)

rs7903146 rs7923837 rs1111875 rs7480010 rs3740878 rs1103790 rs13266634 rs1113132 rs380390 rs1329428

3.2 × 10−17 3.4 × 10−6 9.1 × 10−6 1.5 × 10−5 1.8 × 10−5 1.8 × 10−5 2.1 × 10−5 3.7 × 10−5 3.1 × 10−7 8.7 × 10−7

12.6 (15.5) 3.83 (3.83) 3.53 (3.47) 2.80 (2.73) 1.32 (1.23) 1.35 (1.27) 2.05 (1.93) 1.32 (1.23) 1.50 (2.79) 1.08 (2.08)

12.7 (15.4) 3.75 (3.74) 3.44 (3.38) 2.96 (2.86) 1.77 (1.89) 1.78 (1.90) 2.74 (2.60) 1.64 (1.68) 1.51 (2.73) 1.25 (2.24)

12.9 (15.5) 3.83 (3.82) 3.51 (3.46) 3.33 (3.17) 2.49 (2.72) 2.48 (2.72) 3.59 (3.43) 2.22 (2.40) 1.55 (2.78) 1.63 (2.62)

13.3 (15.5) 3.84 (3.83) 3.66 (3.47) 3.61 (3.46) 2.62 (2.81) 2.62 (2.81) 3.66 (3.52) 2.42 (2.54) 1.75 (2.79) 1.91 (2.89)

SNP

p-value

BD

T2D

AMD

Fig. 1. Plots of − log10 min2 (or − log10 min3 ) and log10 BF(1/2) (the left column) and − log10 min2 (or − log10 min3 ) and log10 maxBF (the right column) for 14 SNPs associated with BD (the first row) and for 8 SNPs associated with T2D (the second row).

and 96 controls). The association of AMD with the gene covering of these two SNPs is true because it has been confirmed by multiple independent replications with larger sample sizes. One point to notice is that using the prior with mixture of normals almost doubled the hybrid BFs compared to using a single normal prior. This example perhaps demonstrates the benefit of using the mixture of normals prior. However, when comparing the results of AMD with those of BD, the p-values (min2 ) of AMD are smaller than all the p-values (min2 ) of BD except for one SNP. On the other hand, all hybrid BFs of AMD are smaller than those of BD. This provides real evidence to support the simulation results presented in Table 6 that the hybrid BFs have better interpretation than the p-values of robust tests.

2708

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

Fig. 2. Ranks of 14 SNPs associated with BD by min2 and min3 in genome-wide ranking of 319,573 SNPs.

Fig. 3. Ranks of 14 SNPs associated with BD by min3 versus each of the four BF measures in genome-wide ranking of 319,573 SNPs.

6. Discussion In this paper, we proposed two simple hybrid BFs, BMA1 and maxBF, for GWA studies when robust tests are used. They take advantages of both the frequentist and Bayes methods. Although the two hybrid BFs are not BF themselves, in terms of genetic models, they are more consistent with the p-values of the two robust tests used in the WTCCC (2007) and Sladek et al. (2007) than two existing BFs, including an existing Bayesian model averaging BMA0 . When other robust tests are employed (Zheng et al., 2009), it is expected that BMA1 and maxBF can still be used to accompany the p-values. The two hybrid BFs, when used alone, tend to have higher false positive rates than the two existing ones. However, we propose to use them for SNPs showing at least moderate association based on p-values (p-value < 10−5 ). Conditional on this, the false positive rates of BMA1 and maxBF are similar to those of existing BFs BF(1/2) and BMA0 . In other words, since the hybrid BFs are applied conditionally, they are not used to find novel genes in GWA studies that are missed by using p-values. Instead, as shown in our simulations and real applications, they are used to provide better interpretation and comparison of results within and across GWA studies than the two existing BFs. In practice, however, if BMA1 or maxBF is directly used without conditioning on the markers with significant p-values, a parametric bootstrap procedure has to be used to simulate appropriate thresholds for them given the thresholds used by

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2709

Fig. 4. Ranks of 14 SNPs associated with BD by min2 versus each of the four BF measures in genome-wide ranking of 319,573 SNPs.

Fig. 5. Ranks of 14 SNPs associated with BD by maxBF (normal prior versus mixture of normals prior) in genome-wide ranking of 319,573 SNPs.

BF(1/2) and BMA0 (e.g. see the Appendix). This correction for BMA1 and maxBF is like finding new critical values in the frequentist inference to control the tail probabilities (desired type I error rates). Acknowledgments This work extended a presentation of the maximum of three approximate Bayes factors at the 2009 Joint Statistical Meeting in Washington, DC by Mark Meyer and Gang Zheng. Mark Meyer was a post-Baccalaureate fellow at the Office of Biostatistics Research, National Heart, Lung and Blood Institute in 2008–2009. We would also like to thank two referees for their helpful suggestions and comments. Appendix A. Computing BF using Laplace approximation The computation of BF is often obtained by using Laplace approximation (Kass and Raftery, 1995). The following detail of the application of Laplace approximation to case-control genetic studies can be found in the WTCCC (2007).

2710

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

To compute BF, it is necessary to evaluate



pi (X |ηi , Hi )πi (ηi )dηi for i = 0, 1. Let ηˆ i maximize pi (X |ηi , Hi )πi (ηi ).

Expanding h(ηi ) = log pi (X |ηi , Hi )πi (ηi ) as h(ηi ) ≈ h(ηˆ i ) − 12 (ηi − ηˆ i )T Σi−1 (ηi − ηˆ i ), where Σi = −(∂ 2 h(ηi )/∂ηi ∂ηiT )−1 evaluated at ηi = ηˆ i . Then pi (X |ηi , Hi )πi (ηi ) ≈ pi (X |ηˆ i , Hi )πi (ηˆ i ) exp{−(ηi − ηˆ i )T Σi−1 (ηi − ηˆ i )}. Integrating both sides using multivariate normal,



pi (X |ηi , Hi )πi (ηi )dηi ≈ pi (X |ηˆ i , Hi )πi (ηˆ i )(2π )di /2 |Σi |1/2 ,

where di is the dimension of ηi under Hi . Using the likelihood p1 under H1 given in Section 2, for a given θ and priors π1 (α) ∼ N (0, 1) and π1 (β) ∼ N (0, W ), ˆ T satisfies ηˆ 1 = (α, ˆ β) r − αˆ −

2 − nj exp{αˆ + βˆ c (Gj )} = 0, ˆ + βˆ c (Gj )} j=0 1 + exp{α

r1 θ + r2 −

βˆ W



2 − nj c (Gj ) exp{αˆ + βˆ c (Gj )} j=0

1 + exp{αˆ + βˆ c (Gj )}

= 0.

And |Σ1−1 | is given by

−  1 + nj ∆j (ηˆ 1 )  − j −1 |Σ1 | =   nj c (Gj )∆j (ηˆ 1 )  j

− j

nj c (Gj )∆j (ηˆ 1 )

1+



− j

 , 

nj ∆j (ηˆ 1 ) 

where ∆j (η1 ) = exp{α + β c (Gj )}/[1 + exp{α + β c (Gj )}]2 . Using the likelihood p0 under H0 , ηˆ 0 = αˆ 0 satisfies αˆ 0 + n exp(αˆ 0 )/{1 + exp(αˆ 0 )} = r and Σ0−1 = 1 + n exp(αˆ 0 )/{1 + exp(αˆ 0 )}2 . Appendix B. Correcting BMA1 and maxBF without conditional on min2 or min3 We describe a simulation-based approach to correct for Bayesian type I error rates for BMA1 and maxBF, where Bayesian type I error for a BF is defined as P (BF > c |H0 ) for a given threshold level c. The following parametric bootstrap method can be used. Let (r0 , r1 , r2 ) and (s0 , s1 , s2 ) be the observed genotype counts for a SNP among r cases and s controls, respectively. Let nj and n be the counts of genotype Gj and the total sample size. Denote Pˆ j = nj /n for j = 0, 1, 2. Randomly draw

(r0 , r1 , r2 ) ∼ Mul(r ; Pˆ0 , Pˆ1 , Pˆ2 ) and (s0 , s1 , s2 ) ∼ Mul(s; Pˆ0 , Pˆ1 , Pˆ2 )M times. These replicate samples are drawn under H0 . For the mth replicate (m = 1, . . . , M), compute BF(θ ) for a given model θ = 0, 1/2, 1 (denoted as BFm (θ )), BMA1 (denoted

as BMA1m ) and maxBF (denoted as maxBFm ). These M replicates provide an estimation of the empirical null distribution of BF(θ), BMA1 and maxBF, respectively. Note that the BMA1 depends on the priors of models while maxBF does not. Suppose the threshold c is used for BF for a given model θ (by default we choose θ = 1/2). Then pc = P (BF > c |H0 ) is the Bayesian type I error, which can be simulated given c by the parametric bootstrap method. When data are generated under H0 , if maxBF and BF are not much different, it is expected to observe similar pc for maxBF. The empirical Bayesian ∑M type I error using maxBF is given by p ∗c = m=1 I (maxBFm > c )/M, where I () is an indicator function. If p ∗c > pc , then we need to find c ∗ > c so that p ∗c ∗ = m=1 I (maxBFm > c ∗)/M ≈ pc . Note that when M is large enough, we can always find c ∗ such that p ∗c ≥ pc ≥ p ∗c +1/M . So p ∗c exists. Thus, maxBF can be compared with BF under H1 (for their Bayesian powers) using thresholds c ∗ and c, respectively, because they have comparable probabilities in favor H1 when H0 is true. This approach is an analogue of the frequentist approach of comparing empirical power given the same significance level. It can also be applied to BMA1 as well where maxBFm is replaced by BMA1m . Note that the above correction depends on the marginal genotype distributions of SNPs. Thus, it has to be carried out for each SNP, just like the standard permutation procedure. Hence, this correction would be computationally demanding for GWA studies, under which the conditional approach based on min2 < 10−5 that we discussed is more appropriate and efficient. Results based on the above simulation procedure are not presented.

∑M

References Freidlin, B., Zheng, G., Li, Z., Gastwirth, J.L., 2002. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum. Hered. 53, 146–152. Fridley, B.L., Serie, D., Jenkins, G., White, K., Bamlet, W., Potter, J.D., Goode, E.L., 2010. Bayesian mixture models for the incorporation of prior knowledge to inform genetic association studies. Genet. Epidemiol. 34, 418–426.

G. Zheng et al. / Computational Statistics and Data Analysis 55 (2011) 2698–2711

2711

Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T., 1999. Bayesian model averaging: a tutorial. Stat. Sci. 14, 382–417. Kass, R., Raftery, A., 1995. Bayes factors. J. Am. Stat. Assoc. 90, 773–795. Klein, R.J., Zeiss, C., Chew, E.Y., et al., 2005. Complement factor H polymorphism in age-related macular degeneration. Science 308, 385–389. Madigan, D., Raftery, E., 1994. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Am. Stat. Assoc. 89, 1535–1546. Sasieni, P.D., 1997. From genotypes to genes: doubling the sample size. Biometrics 53, 1253–1261. Sawcer, S., 2010. Bayes factors in complex genetics. Eur. J. Hum. Genet. 18, 746–750. Sladek, R., Rocheleau, G., Rung, J., et al., 2007. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445, 881–885. Stephens, M., Balding, D.J., 2009. Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 10, 681–690. The Wellcome Trust Case Control Consortium (WTCCC), 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature 447, 661–678. Wakefield, J., 2007. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Hum. Genet. 81, 208–227. Wakefield, J., 2009. Bayes factors for genome-wide association studies: comparison with p-values. Genet. Epidemiol. 33, 79–86. Yasuno, K., Bilguvar, K., Bijlenga, P., et al., 2010. Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat. Genet. 42, 420–425. Zaykin, D.V., Zhivotovsky, L.A., 2005. Ranks of genuine associations in whole-genome scans. Genetics 171, 813–823. Zheng, G., Joo, J., Zaykin, D.V., Wu, C.O., Geller, N.L., 2009. Robust tests in genome-wide scans under incomplete linkage disequilibrium. Stat. Sci. 24, 503–516.