A powerful truncated tail strength method for testing multiple null hypotheses in one dataset


Journal of Theoretical Biology 277 (2011) 67–73

Contents lists available at ScienceDirect

Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Bo Jiang a, Xiao Zhang a, Yijun Zuo b, Guolian Kang c,*

a Department of Biostatistics, University of Alabama at Birmingham, AL 35294, USA
b Department of Probability and Statistics, Michigan State University, East Lansing, MI 48824, USA
c Department of Biostatistics and Epidemiology, School of Medicine, University of Pennsylvania, Pennsylvania, PA 19104, USA

Article info

Article history: Received 13 June 2010; received in revised form 14 January 2011; accepted 19 January 2011; available online 3 February 2011.

Keywords: Tail strength; Truncated tail strength; Monte Carlo

Abstract

In microarray analysis, medical imaging analysis and functional magnetic resonance imaging, we often need to test an overall null hypothesis involving a large number of single hypotheses (usually more than 1000) in one dataset. The tail strength statistic (Taylor and Tibshirani, 2006) and Fisher’s probability method can be used to measure the overall significance of a large set of independent single hypothesis tests, where the overall null hypothesis is that all single null hypotheses are true. In this paper we propose a new method that improves the tail strength statistic by considering only the p-values that are less than a pre-specified cutoff. We call it the truncated tail strength statistic. We illustrate our method using a simulation study and two genome-wide datasets analyzed chromosome by chromosome. Our method not only controls the type one error rate well, but also has significantly higher power than the tail strength method and Fisher’s method in most cases.

Published by Elsevier Ltd.

1. Introduction

In genome-wide association studies, microarray analysis and other biomedical studies, we often need to assess the overall significance of a large number (m) of single null hypotheses in one dataset. The overall null hypothesis is that all m single null hypotheses are true. Fisher’s probability method (Fisher, 1932) tests this overall null hypothesis by combining the m p-values of the single tests. Its test statistic is minus twice the logarithm of the product of all p-values, which has a chi-square distribution with 2m degrees of freedom if the m p-values are independent. However, when the number of single hypothesis tests in one dataset is large, Fisher’s method loses power because of the large number of degrees of freedom of its test statistic (Zaykin et al., 2002). Zaykin et al. (2002) generalized Fisher’s method and proposed a truncated product method that removes p-values larger than a cutoff from consideration; they found that it can perform better than Fisher’s method in most situations.

An alternative method for testing the overall null hypothesis in one dataset is the tail strength (TS) method. The TS test statistic is a function of the ordered p-values (Taylor and Tibshirani, 2006) and has an asymptotic normal distribution with mean 0 and variance 1/m under the overall null hypothesis if the m p-values are independent. It is also related to the false discovery rate of the collection of tests (Benjamini and Hochberg, 1995) and to the area under the receiver operating characteristic curve (Hanley and McNeil, 1982).

Similar to the method of Zaykin et al., we aim to improve the TS method by removing p-values larger than a cutoff using an indicator function, and we define a new truncated tail strength (TTS) statistic for testing the overall null hypothesis in one dataset. Since the asymptotic distribution of the TTS statistic is unknown, we propose a Monte Carlo method to evaluate its empirical p-value. The TTS statistic appears to have good properties, especially when there are a large number of independent tests in one dataset.

We review the tail strength measure and propose our method in Section 2, conduct a simulation study in Section 3, and illustrate our method using two real datasets in Section 4. Finally, we give a brief discussion in Section 5.

* Corresponding author. Tel.: +1 215 746 3519; fax: +1 215 573 1050. E-mail address: [email protected] (G. Kang).

0022-5193/$ - see front matter Published by Elsevier Ltd. doi:10.1016/j.jtbi.2011.01.029

2. Method

Suppose there are m (m ≥ 1000) single null hypotheses ($H_1, H_2, \ldots, H_m$) in one dataset, with corresponding p-values ($p_1, p_2, \ldots, p_m$). The overall null hypothesis for this dataset is that $H_i$ holds for all i, 1 ≤ i ≤ m, and the alternative hypothesis is that at least one $H_i$ (1 ≤ i ≤ m) does not hold. We assume that under the overall null hypothesis, the p-values are independent and identically distributed U[0, 1] random variables. This assumption has also been used by Taylor and Tibshirani (2006). Before introducing our truncated tail strength statistic, we first review the tail strength statistic.

2.1. Tail strength statistic

Assume that $p_{(1)}, p_{(2)}, \ldots, p_{(m)}$ are the ordered p-values with $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$. Taylor and Tibshirani (2006) defined the tail strength (TS) measure as

$$\mathrm{TS}(p_1, p_2, \ldots, p_m) = \frac{1}{m}\sum_{i=1}^{m}\left(1 - p_{(i)}\,\frac{m+1}{i}\right), \tag{1}$$

which has an asymptotic normal distribution with mean 0 and variance 1/m under the overall null hypothesis as m goes to infinity. The overall null hypothesis is rejected if $\sqrt{m}\,\mathrm{TS}(p_1, p_2, \ldots, p_m) > \Phi^{-1}(1-\alpha)$ for a one-sided test, where α is a preset nominal significance level and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. However, when only a small subset of the hypotheses is false in a dataset with a fixed number of single hypotheses, the tail strength method loses power because its mean value is small while its variance stays constant (1/m). Removing p-values larger than a specific cutoff from Eq. (1) can improve the power. We illustrate this point later.

2.2. Truncated tail strength statistic

Similar to the truncated product method of Zaykin et al. (2002), we suggest considering only the p-values that do not exceed a fixed cutoff t:

$$\mathrm{TTS}(p_1, p_2, \ldots, p_m) = \frac{1}{m}\sum_{i=1}^{m} I\!\left(p_{(i)} \le t\right)\left(1 - p_{(i)}\,\frac{m+1}{i}\right), \tag{2}$$

where I(·) is the indicator function. We call it the truncated tail strength (TTS) measure of an overall null hypothesis. For this method, we use a Monte Carlo method to estimate the empirical p-value if the p-values are independent (see Section 2.3). If the p-values are correlated, a re-sampling method such as permutation is applied instead (see Section 2.4). In Eq.
(2), if the cutoff point t = 1, then the truncated tail strength is the tail strength statistic.

2.3. Monte Carlo algorithm for independent p-values

If the p-values of the single tests are independent, the following Monte Carlo algorithm is applied to obtain the empirical p-value of the truncated tail strength statistic. First we fix the value of the truncation point t.

1) Calculate the observed statistic

$$\mathrm{TTS}_o = \frac{1}{m}\sum_{i=1}^{m} I\!\left(p_{(i)} \le t\right)\left(1 - p_{(i)}\,\frac{m+1}{i}\right)$$

and set N = 0.
2) Repeat the following steps M times:
   a. Generate m independent uniform random numbers, $u_1, u_2, \ldots, u_m$, on (0, 1).
   b. Calculate $\mathrm{TTS} = \frac{1}{m}\sum_{i=1}^{m} I\!\left(u_{(i)} \le t\right)\left(1 - u_{(i)}\,\frac{m+1}{i}\right)$.
   c. If TTS ≥ TTS_o, set N = N + 1.
3) The empirical p-value is N/M.

Since we consider a large number of single hypotheses in one dataset, we can choose a cutoff t as small as 0.005. For example, if m = 10,000 and t = 0.005, then about mt = 50 p-values are expected to fall below the cutoff under the overall null hypothesis. But if the number of single tests is small, such as 100 or 1000, then a cutoff of 0.05 is preferred, because with a cutoff of 0.005 essentially none or only about 5 p-values would be left for consideration in Eq. (2) (see below). To obtain very small empirical p-values of the truncated tail strength statistic, the number of Monte Carlo samples (M) needs to be very large, because the smallest nonzero empirical p-value is 1/M. This can be achieved by running the sampling in parallel on high performance computing clusters. If very small p-values are not needed, we can use M = 10,000. Based on our simulations below, this value provides a good estimate of the empirical p-value of the truncated tail strength statistic.

2.4. Correlated p-values

In practice, p-values are often correlated. For example, in single nucleotide polymorphism (SNP)-based association studies and gene-based association studies, linkage disequilibrium among SNPs causes the p-values of some markers to be correlated (Hedrick and Kumar, 2001; Zaykin et al., 2002). Following the idea of Zaykin et al. (2002), if the correlation matrix of the p-values is known, we can transform the correlated p-values of the single tests into independent ones and then apply our new method to them. When the correlation matrix is not known, we can apply a re-sampling method such as permutation to obtain the distribution and the empirical p-value of TTS. Furthermore, for some specific data, we can choose a subset of independent features and then calculate their corresponding individual p-values. For example, for SNP datasets in genome-wide association studies, we can first obtain a set of independent SNPs based on a threshold on their linkage disequilibrium coefficient r² and then apply our method to these tests (Purcell et al., 2007).
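As an illustration of Eqs. (1) and (2) and of the Monte Carlo algorithm of Section 2.3, the following is a minimal Python sketch using only the standard library. It is not the authors' tts.R or tts.m program. The function names `tts` and `tts_empirical_pvalue` and the argument names are hypothetical choices made for this example.

```python
import random


def tts(pvalues, t=0.05):
    """Truncated tail strength, Eq. (2); t = 1 gives the tail strength of Eq. (1)."""
    m = len(pvalues)
    total = 0.0
    # enumerate from 1 so that i is the rank of the ordered p-value p_(i)
    for i, p in enumerate(sorted(pvalues), start=1):
        if p > t:
            break  # input is sorted, so every later p-value also exceeds t
        total += 1.0 - p * (m + 1) / i
    return total / m


def tts_empirical_pvalue(pvalues, t=0.05, n_samples=10_000, rng=None):
    """Monte Carlo p-value of Section 2.3, valid for independent p-values."""
    rng = rng or random.Random()
    m = len(pvalues)
    observed = tts(pvalues, t)                    # step 1: TTS_o, with N = 0
    n_exceed = 0
    for _ in range(n_samples):                    # step 2: repeat M times
        u = [rng.random() for _ in range(m)]      # 2a: m uniform numbers
        if tts(u, t) >= observed:                 # 2b and 2c
            n_exceed += 1
    return n_exceed / n_samples                   # step 3: N / M


if __name__ == "__main__":
    rng = random.Random(1)
    # 200 null p-values; small n_samples only to keep the demo fast
    pvals = [rng.random() for _ in range(200)]
    print(tts_empirical_pvalue(pvals, t=0.05, n_samples=500, rng=rng))
```

Sorting once and stopping at the first p-value above t keeps each evaluation at the cost of one sort. For correlated p-values this independent-uniform sampler would have to be replaced by a permutation scheme, as Section 2.4 notes.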

3. Simulation results

3.1. Independent p-values

Among the 12 real data analyses reported by Taylor and Tibshirani (2006), the minimum number of single tests in one dataset is 2000 (see Appendix A). Thus, we consider m = 2000 and 10,000 single tests in one dataset in our simulations.

3.1.1. Empirical distribution of the truncated tail strength statistic

Before evaluating the type one error rate, we first evaluate the distribution of the truncated tail strength statistic under the overall null hypothesis by simulation. In each dataset, we simulated m = 2000 or 10,000 p-values independently from U[0, 1]. We simulated 10,000 replicates and considered cutoff values of 1, 0.9, 0.8, 0.7, 0.5, 0.25, 0.1, 0.05, 0.01 and 0.005. Fig. 1 shows the histograms and empirical cumulative distribution functions of the truncated tail strength statistic with t = 1, 0.05 and 0.005 for m = 2000 and 10,000, respectively. From Fig. 1, for all chosen cutoff values t, the truncated tail strength statistic appears to follow an approximately normal distribution with mean close to zero. Fig. 2 shows the sample mean, sample variance and skewness of the truncated tail strength statistic as a function of t for m = 2000 and 10,000. From Fig. 2, as the cutoff t approaches zero and the number of tests increases, the sample mean and the sample variance of the truncated tail strength approach zero. As the cutoff t decreases, the skewness increases, but as the number of tests increases, the skewness also approaches zero. Based on Figs. 1 and 2, and similar to Zaykin et al. (2002), we choose t = 1, 0.5, 0.25, 0.1, 0.05 and 0.01 below for the evaluation of the type one error rate and t = 1 and 0.05 for the evaluation of power.

Fig. 1. Histograms and empirical cumulative distribution functions of the truncated tail strength statistics with cutoff values of 1, 0.05 and 0.005 under the overall null hypothesis with m = 2000 and 10,000. X-axis is the value of the test statistic and Y-axis is frequency.

3.1.2. Type one error rates

To evaluate the type one error rate of the truncated tail strength method, we again simulated m = 2000 and 10,000 p-values from U[0, 1] for 10,000 replicates. For each dataset, we considered cutoff values t of 1, 0.5, 0.25, 0.1, 0.05, 0.01 and 0.005, and nominal significance levels of 0.01, 0.05 and 0.1. When t = 1, Eq. (2) is exactly the same as Eq. (1); that is, the p-values of all single hypotheses are included in Eq. (2). Here, however, we use the Monte Carlo algorithm rather than the asymptotic distribution to calculate the empirical p-value. Table 1 presents the empirical type one error rates of the truncated tail strength statistic under the overall null hypothesis with m = 2000 and 10,000. All empirical type one error rates of both the tail strength statistic and the truncated tail strength statistic are close to their nominal levels, which means that the truncated tail strength statistic controls the type one error rate well.
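The null-distribution summaries described in Section 3.1.1 can be approximated on a small scale with the sketch below. It is an illustration rather than the authors' simulation code. The helper names `tts` and `null_summary` are hypothetical, and m = 200 with 2000 replicates is chosen only so that the example runs quickly, not to match the paper's m = 2000 or 10,000 with 10,000 replicates. Skewness is computed as the mean cubed deviation divided by the cubed standard deviation, which is an assumption about the estimator used.

```python
import random
import statistics


def tts(pvalues, t):
    """Truncated tail strength of Eq. (2); t = 1 recovers Eq. (1)."""
    m = len(pvalues)
    total = 0.0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p > t:
            break
        total += 1.0 - p * (m + 1) / i
    return total / m


def null_summary(m, t, replicates, seed=0):
    """Sample mean, variance and skewness of TTS under the overall null."""
    rng = random.Random(seed)
    values = [tts([rng.random() for _ in range(m)], t) for _ in range(replicates)]
    mean = statistics.fmean(values)
    var = statistics.variance(values, mean)
    sd = var ** 0.5
    # moment-based skewness (assumed estimator): mean cubed deviation over sd**3
    skew = statistics.fmean((v - mean) ** 3 for v in values) / sd ** 3
    return mean, var, skew


if __name__ == "__main__":
    m = 200
    for t in (1.0, 0.05):
        mean, var, skew = null_summary(m=m, t=t, replicates=2000)
        print(f"t={t}: mean={mean:.4f}, m*var={m * var:.3f}, skewness={skew:.2f}")
```

For t = 1 the product m × variance should come out near 1, in line with the asymptotic variance 1/m quoted for the tail strength statistic.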

Fig. 2. Plots of the sample mean, sample variance and skewness of the truncated tail strength statistic as a function of t for m = 2000 and 10,000. The solid line is for m = 2000 and the dashed line is for m = 10,000.

Table 1. The empirical type one error rate for both the tail strength statistic and the truncated tail strength statistic when all m hypotheses are true, with nominal levels of 0.01, 0.05 and 0.1 (10,000 simulations).

| Test statistic | t in TTS | m = 2000, level 0.01 | m = 2000, level 0.05 | m = 2000, level 0.1 | m = 10,000, level 0.01 | m = 10,000, level 0.05 | m = 10,000, level 0.1 |
|---|---|---|---|---|---|---|---|
| TS^a | 1.00 | 0.0108 | 0.0493 | 0.103 | 0.0104 | 0.0512 | 0.0982 |
| TTS^b | 0.01 | 0.0093 | 0.0473 | 0.0994 | 0.0103 | 0.0487 | 0.0975 |
| | 0.05 | 0.0106 | 0.0509 | 0.0991 | 0.0094 | 0.0494 | 0.0995 |
| | 0.10 | 0.0097 | 0.052 | 0.1032 | 0.0093 | 0.049 | 0.097 |
| | 0.25 | 0.0103 | 0.0527 | 0.1027 | 0.0095 | 0.0502 | 0.0993 |
| | 0.50 | 0.0102 | 0.0492 | 0.1035 | 0.0098 | 0.0504 | 0.0988 |

a Tail strength statistic.
b Truncated tail strength.

3.1.3. Power

To compare the power of our truncated tail strength method with those of the tail strength method and Fisher’s method (see Appendix B), the p-values under the alternative hypotheses were calculated from random values generated from a normal distribution with mean μ and variance 1, where μ indicates the effect size on the outcome (Rubin et al., 2006), and the p-values under the null hypotheses were generated independently from the uniform distribution U[0, 1]. We simulated m = 2000 and 10,000 p-values, of which $m_1$ = 5, 10, 20 or 50 came from alternative hypotheses in one dataset. We considered cutoff values of 1 and 0.05 for the TTS method and used a nominal significance level of 0.05 for the evaluation of power.

Table 2 presents the empirical power of Fisher’s method, the tail strength method and the truncated tail strength method with m = 2000. From Table 2, we found that (1) TTS with t = 0.05 has the highest power and the TS method has the lowest power among the three methods in most situations; and (2) TTS with t = 0.05 has significantly higher power than TS in most situations. As the effect size increases, the power difference increases. The maximum power difference between TTS with t = 0.05 and TS is 49% (0.798 versus 0.308) when m = 2000, $m_1$ = 10 and μ = 4, and that between TTS with t = 0.05 and Fisher’s method is 26% (0.66 versus 0.402) when m = 2000, $m_1$ = 10 and μ = 3.

Table 2. The empirical power of the truncated tail strength statistic with a nominal level of 0.05 when $m_1$ of m = 2000 hypotheses are false (1000 simulations, 10,000 MC samples).

| $m_1$ | Test statistic | t in TTS | μ = 1.5 | μ = 2 | μ = 2.5 | μ = 3 | μ = 3.5 | μ = 4 |
|---|---|---|---|---|---|---|---|---|
| 5 | Fisher | | 0.085 | 0.109 | 0.126 | 0.193 | 0.205 | 0.273 |
| | TS^a | 1 | 0.084 | 0.106 | 0.113 | 0.136 | 0.123 | 0.153 |
| | TTS^b | 0.05 | 0.078 | 0.138 | 0.2 | 0.264 | 0.321 | 0.367 |
| 10 | Fisher | | 0.12 | 0.189 | 0.287 | 0.402 | 0.533 | 0.681 |
| | TS^a | 1 | 0.112 | 0.174 | 0.22 | 0.256 | 0.271 | 0.308 |
| | TTS^b | 0.05 | 0.126^c | 0.255 | 0.428 | 0.66 | 0.753 | 0.798 |
| 20 | Fisher | | 0.236 | 0.417 | 0.641 | 0.858 | 0.97 | 0.992 |
| | TS^a | 1 | 0.219 | 0.352 | 0.489 | 0.593 | 0.671 | 0.679 |
| | TTS^b | 0.05 | 0.29 | 0.616 | 0.893 | 0.98 | 0.997 | 1 |
| 50 | Fisher | | 0.697 | 0.966 | 0.999 | 1 | 1 | 1 |
| | TS^a | 1 | 0.655 | 0.921 | 0.987 | 0.993 | 0.997 | 1 |
| | TTS^b | 0.05 | 0.791 | 0.998 | 1 | 1 | 1 | 1 |

a Tail strength statistic.
b Truncated tail strength.
c Empirical powers of TTS larger than those of TS are shown in bold in the published table.

Table 3 presents the empirical power of Fisher’s method, the tail strength method and the truncated tail strength method with m = 10,000. The conclusions are the same as those from Table 2. Furthermore, the maximum power difference between TTS with t = 0.05 and TS is 54% (0.882 versus 0.338) when m = 10,000, $m_1$ = 20 and μ = 4, and that between TTS with t = 0.05 and Fisher’s method is 28% (0.609 versus 0.326) when m = 10,000, $m_1$ = 20 and μ = 3.
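The power comparison of Section 3.1.3 can be sketched in Python as follows. This is a hedged illustration, not the authors' code, and it relies on several assumptions that the text does not settle. The p-values under the alternative are taken to be one-sided upper-tail normal p-values. The defaults (m = 500, 200 replicates, 500 null draws) are far smaller than in the paper so that the example runs quickly. Rejection uses one shared set of simulated null statistics in place of a fresh Monte Carlo run per dataset. The names `fisher_pvalue` and `simulate_power` are hypothetical. Fisher's p-value uses the identity that, for 2m degrees of freedom, the chi-square upper tail at x equals the Poisson(x/2) probability of at most m − 1 events.

```python
import math
import random


def tts(pvalues, t):
    """Truncated tail strength of Eq. (2); t = 1 recovers Eq. (1)."""
    m = len(pvalues)
    total = 0.0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p > t:
            break
        total += 1.0 - p * (m + 1) / i
    return total / m


def fisher_pvalue(pvalues):
    """Upper-tail chi-square probability (2m df) of -2 * sum(ln p)."""
    m = len(pvalues)
    lam = -sum(math.log(p) for p in pvalues)      # half of Fisher's statistic
    if lam == 0.0:
        return 1.0
    # Poisson(lam) probability of at most m - 1 events, summed in log space
    log_terms = (-lam + k * math.log(lam) - math.lgamma(k + 1) for k in range(m))
    return min(1.0, sum(math.exp(v) for v in log_terms))


def simulate_power(m=500, m1=10, mu=4.0, t=0.05, alpha=0.05,
                   replicates=200, null_draws=500, seed=0):
    rng = random.Random(seed)

    def uniform():
        return 1.0 - rng.random()                 # in (0, 1], avoids log(0)

    def critical(cutoff):
        # empirical (1 - alpha) critical value from shared null draws
        null = sorted(tts([uniform() for _ in range(m)], cutoff)
                      for _ in range(null_draws))
        return null[math.ceil((1 - alpha) * null_draws) - 1]

    crit_ts, crit_tts = critical(1.0), critical(t)
    hits = {"Fisher": 0, "TS": 0, "TTS": 0}
    for _ in range(replicates):
        # assumed form: one-sided p-value of a N(mu, 1) draw
        alt = [max(0.5 * math.erfc(rng.gauss(mu, 1.0) / math.sqrt(2.0)), 1e-300)
               for _ in range(m1)]
        pvals = alt + [uniform() for _ in range(m - m1)]
        hits["Fisher"] += fisher_pvalue(pvals) <= alpha
        hits["TS"] += tts(pvals, 1.0) > crit_ts
        hits["TTS"] += tts(pvals, t) > crit_tts
    return {name: count / replicates for name, count in hits.items()}
```

Calling `simulate_power()` returns a dictionary of estimated powers for the three methods. Because the settings differ from those of Tables 2 and 3, the numbers are not expected to reproduce the tables, only the qualitative ordering.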

Table 3. The empirical power of the truncated tail strength statistic with a nominal level of 0.05 when $m_1$ of m = 10,000 hypotheses are false (1000 simulations, 10,000 MC samples).

| $m_1$ | Test statistic | t in TTS | μ = 1.5 | μ = 2 | μ = 2.5 | μ = 3 | μ = 3.5 | μ = 4 |
|---|---|---|---|---|---|---|---|---|
| 5 | Fisher | | 0.058 | 0.073 | 0.073 | 0.096 | 0.111 | 0.13 |
| | TS^a | 1 | 0.06 | 0.072 | 0.067 | 0.092 | 0.092 | 0.096 |
| | TTS^b | 0.05 | 0.05 | 0.089 | 0.103 | 0.137 | 0.156 | 0.201 |
| 10 | Fisher | | 0.08 | 0.091 | 0.144 | 0.129 | 0.195 | 0.272 |
| | TS^a | 1 | 0.08 | 0.088 | 0.125 | 0.112 | 0.141 | 0.182 |
| | TTS^b | 0.05 | 0.088^c | 0.121 | 0.196 | 0.259 | 0.369 | 0.405 |
| 20 | Fisher | | 0.095 | 0.165 | 0.23 | 0.326 | 0.478 | 0.617 |
| | TS^a | 1 | 0.096 | 0.156 | 0.199 | 0.245 | 0.321 | 0.338 |
| | TTS^b | 0.05 | 0.103 | 0.256 | 0.44 | 0.609 | 0.751 | 0.882 |
| 50 | Fisher | | 0.292 | 0.477 | 0.725 | 0.914 | 0.982 | 1 |
| | TS^a | 1 | 0.281 | 0.442 | 0.626 | 0.719 | 0.822 | 0.832 |
| | TTS^b | 0.05 | 0.34 | 0.719 | 0.971 | 1 | 1 | 1 |

a Tail strength statistic.
b Truncated tail strength.
c Empirical powers of TTS larger than those of TS are shown in bold in the published table.

3.2. Correlated p-values

In this case we considered association studies between correlated SNPs and a complex disease. We simulated 1000 correlated SNPs on 1000 cases and 1000 controls. The SNPs are correlated through the following compound symmetry variance–covariance matrix (Han and Pan, 2010):

$$\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$

Specifically, a latent vector was first generated from a multivariate normal distribution with a compound symmetry covariance matrix with ρ = 0.4. The latent vector was then used to form a haplotype, with the minor allele frequency (MAF) of each SNP ranging from 0.1 to 0.5; two haplotypes were then combined to generate the genotype data for one individual. Finally, the disease status of the ith individual was generated by the logistic regression model logit(P(disease)) = log(4) + log(OR) $X_i$, where the odds ratio (OR) is equal to 1 for no association and $X_i$ is the genotype of the ith individual at one SNP randomly selected from the 1000 SNPs. We generated 1000 replicates for the evaluation of the type one error rate. We used the allelic association test to calculate the p-value for each single association test and 1000 permutation re-samplings to obtain the empirical p-value of TTS. Table 4 shows the empirical type one error rates of our TTS method with 1000 correlated SNPs at significance levels of 0.01, 0.05 and 0.1. The truncated tail strength has empirical type one error rates close to the preset significance levels and controls the type one error rate well.

Table 4. The empirical type one error rate for both the tail strength statistic and the truncated tail strength statistic when all m = 1000 hypotheses are true and the p-values are correlated (1000 simulations).

| Test statistic | t in TTS | α = 0.01 | α = 0.05 | α = 0.1 |
|---|---|---|---|---|
| TS^a | 1.00 | 0.006 | 0.056 | 0.104 |
| TTS^b | 0.01 | 0.005 | 0.053 | 0.099 |
| | 0.05 | 0.007 | 0.048 | 0.103 |
| | 0.10 | 0.005 | 0.051 | 0.109 |
| | 0.25 | 0.005 | 0.054 | 0.108 |
| | 0.50 | 0.006 | 0.057 | 0.104 |

a Tail strength statistic.
b Truncated tail strength.

Fig. 3. Results of the tail strength method and the truncated tail strength method with a cutoff of 0.05 for rheumatoid arthritis (A) and hypertension (B). X-axis is the chromosome number and Y-axis is the negative logarithm of the p-values of both methods. The solid line represents the rejection threshold −log10(0.05/22), where 0.05 is the overall nominal significance level; the dotted line represents the tail strength method; the dashed line represents the truncated tail strength method with a cutoff of 0.05.
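The latent-variable construction of the correlated SNPs in Section 3.2 can be sketched as below. This is only one possible reading of that description. The one-factor representation is a standard way to obtain a compound symmetry correlation ρ between every pair of coordinates. Thresholding each latent coordinate at the normal quantile of its MAF is an assumption, since the text does not state how the latent vector is turned into alleles. The names `latent_vector`, `haplotype` and `genotype` are hypothetical, and the disease-status model is left out.

```python
import math
import random
from statistics import NormalDist


def latent_vector(n, rho, rng):
    """Normal vector with unit variances and pairwise correlation rho."""
    shared = rng.gauss(0.0, 1.0)                  # factor common to all SNPs
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n)]


def haplotype(thresholds, rho, rng):
    """Minor-allele indicators: 1 where the latent value falls below the cut."""
    z = latent_vector(len(thresholds), rho, rng)
    return [1 if value < cut else 0 for value, cut in zip(z, thresholds)]


def genotype(thresholds, rho, rng):
    """Minor-allele counts (0, 1 or 2) from two independent haplotypes."""
    first = haplotype(thresholds, rho, rng)
    second = haplotype(thresholds, rho, rng)
    return [x + y for x, y in zip(first, second)]


if __name__ == "__main__":
    rng = random.Random(2)
    # MAFs drawn uniformly in [0.1, 0.5]; 10 SNPs instead of 1000 for brevity
    mafs = [rng.uniform(0.1, 0.5) for _ in range(10)]
    cuts = [NormalDist().inv_cdf(maf) for maf in mafs]   # P(Z < cut) = MAF
    print(genotype(cuts, rho=0.4, rng=rng))
```

With this thresholding rule each SNP has the requested MAF, while the correlation between the resulting alleles is weaker than the latent correlation ρ.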

4. Application to two real datasets

To demonstrate the performance of our truncated tail strength method, we applied it to two real datasets, a rheumatoid arthritis (RA) dataset and a coronary artery disease (CAD) dataset, from the Wellcome Trust Case-Control Consortium (WTCCC, 2007). Here, for each chromosome, we test the overall null hypothesis that the chromosome (that is, all SNPs located on it) is not associated with the disease.
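Because one overall test is run per chromosome, the results in this section are judged against the threshold 0.05/22 that appears in Fig. 3 and in Sections 4.1 and 4.2. As a quick worked check of that arithmetic:

$$\frac{0.05}{22} \approx 2.27 \times 10^{-3}, \qquad -\log_{10}\!\left(\frac{0.05}{22}\right) = \log_{10}(440) \approx 2.64 .$$

Assuming the M = 10,000 Monte Carlo samples used in those sections, the empirical p-value N/M moves in steps of 1/M = 10^{-4}. A chromosome therefore falls below the threshold exactly when N ≤ 22, since 22/10,000 = 0.0022 < 0.05/22 < 0.0023 = 23/10,000.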

4.1. RA dataset

RA is a common inflammatory arthritis that can cause chronic inflammation of the joints, the tissue around the joints and other organs in the body (Plenge et al., 2007). For the analysis of the RA dataset, we performed data quality control by removing SNPs without SNP IDs, SNPs with the frequency of any genotype in cases and controls below 5, SNPs with bad clustering properties and SNPs with a p-value of the Hardy–Weinberg disequilibrium (HWD) test among controls less than 5 × 10^{-7} (Li et al., 2008). This left 318,958 SNPs for the RA dataset. We then obtained 273,308 independent SNPs across all 22 chromosomes by linkage disequilibrium based SNP pruning using PLINK (Purcell et al., 2007). We calculated the p-value for each SNP using the allelic association test and then applied our method to each of the 22 chromosomes in the RA dataset. We used 10,000 Monte Carlo samples to obtain the empirical p-values for the 22 chromosomes. Fig. 3A plots −log10(p) of the p-values for tail strength and truncated tail strength with t = 0.05 against the chromosome number, with an overall nominal level of 0.05. The threshold for rejection is −log10(0.05/22). From Fig. 3A, the truncated tail strength method identified 7 chromosomes, five of which were also identified by the tail strength method; that is, our TTS method identified two more significant chromosomes. Furthermore, the p-values of all 22 chromosomes for the truncated tail strength method are smaller than those for the tail strength method, especially for the significant chromosomes.

4.2. CAD dataset

CAD is the most common type of heart disease and the leading cause of death worldwide among the elderly. We performed data quality control and LD-based SNP pruning as for the RA dataset and then focused on 273,308 SNPs for the CAD dataset. We calculated the p-value for each SNP using the allelic association test and then applied our method to each of the 22 chromosomes in the CAD dataset. We again used 10,000 Monte Carlo samples. Fig. 3B plots −log10(p) of the p-values for tail strength and truncated tail strength with t = 0.05 against the chromosome number, with an overall nominal level of 0.05. The threshold for rejection is −log10(0.05/22). From Fig. 3B, the truncated tail strength and tail strength methods identified 8 and 7 significant chromosomes, respectively, five of which were identified by both methods. For the other two chromosomes identified by the TS method, the p-values of the TTS method are very close to the threshold of 0.05/22. The p-values of the truncated tail strength method for the significant chromosomes are much smaller than those of the tail strength method.

5. Discussion

In this paper we have extended the tail strength method for combining p-values in one dataset and proposed a new truncated tail strength measure that removes p-values larger than a preset cutoff. We have applied a Monte Carlo method to estimate the empirical p-value of the truncated tail strength statistic when the p-values are independent. Simulations show that the method controls the type one error rate well, that with a cutoff of 0.05 it is more powerful than the tail strength measure, and that it is more powerful than Fisher’s method when the number of single hypotheses in a dataset is large. In practice, based on our simulation experience, we suggest a cutoff of 0.05 and 10,000 Monte Carlo samples when a dataset contains a large number of single hypotheses.

In the tail strength method, the p-values of the multiple hypothesis tests in one dataset are assumed to be independent and identically distributed uniform variables. As described above, the Monte Carlo method that we use to estimate the empirical p-value of the truncated tail strength generates independent uniform p-values, so it is not valid when the p-values in one dataset are dependent. In that case, a re-sampling method such as permutation can be applied to obtain the distribution of the truncated tail strength statistic. In addition, TS is related to the false discovery rate (FDR) and the area under the receiver operating characteristic (ROC) curve (AUC). Specifically, the tail strength measure is asymptotically approximately equal to 1 − FDR and to AUC − 1/2 under some conditions (Taylor and Tibshirani, 2006). We will investigate the relationships between the truncated tail strength measure and FDR and AUC in the future.

If prior information is available, different weights can be assigned to single hypothesis tests of different importance in one dataset (Kang et al., 2009) to improve the tail strength and truncated tail strength methods. We will explore this in the future. In summary, the truncated tail strength method can be applied to any dataset to test an overall hypothesis of no association between multiple features and an outcome of interest, for example in microarray analysis, haplotype-based analysis, gene-based analysis (Cui et al., 2008) and pathway-based analysis (Yu et al., 2009). Furthermore, the R and MATLAB programs (tts.R and tts.m) used in the simulations are available on request.

Acknowledgements

The authors thank two reviewers for their valuable suggestions, which have improved the quality of the manuscript. This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.

Appendix A

The number of single tests in each dataset used by Taylor and Tibshirani (2006) (see Table A1).

Table A1

| Name | Description | No. of single tests | Source |
|---|---|---|---|
| FL | Microarray, survival | 44,928 | Dave et al. (2004) |
| Skin cancer | Microarray, two classes | 12,625 | Rieger et al. (2004) |
| Diffuse large cell lymphoma | Microarray, survival | 7399 | Rosenwald et al. (2002) |
| Small round blue cell tumors | Microarray, four classes | 2308 | Khan et al. (2001) |
| Colon cancer | Microarray, two classes | 2000 | Alon et al. (1999) |
| Leukemia | Microarray, two classes | 3571 | Golub et al. (1999) |
| Prostate cancer | Microarray, two classes | 6033 | Singh et al. (2002) |
| Brain cancer | Microarray, five classes | 5597 | Pomeroy et al. (2002) |
| Lymphoma | Microarray, two classes | 4026 | Alizadeh et al. (2000) |
| aud-over | FMRI | 187,782 | Taylor and Worsley (2005) |
| aud-sent | FMRI | 187,762 | Taylor and Worsley (2005) |
| dtiTS | Diffusion tensor imaging | 20,931 | Schwartzmann et al. (2005) |

FMRI, functional magnetic resonance imaging.

Appendix B. Fisher’s method

Let $p_1, p_2, \ldots, p_m$ be the p-values of the m independent tests of the hypotheses $H_1, H_2, \ldots, H_m$ in one dataset, and assume that they are independent and identically distributed uniform variables. Fisher’s statistic for testing the overall null hypothesis that all m single hypotheses are true is

$$\chi^2_F = -2\sum_{i=1}^{m} \ln p_i ,$$

which has a chi-square distribution with 2m degrees of freedom under the null hypothesis (Fisher, 1932; Zaykin et al., 2002).

References

Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A., 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America 96, 6745–6750.
Alizadeh, A., Eisen, M., Davis, R.E., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., Tran, T., Yu, X., et al., 2000. Identification of molecularly and clinically distinct subtypes of diffuse large B cell lymphoma by gene expression profiling. Nature 403, 503–511.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300.
Cui, Y., Kang, G., Sun, K., Qian, M., Romero, R., Fu, W., 2008. Gene-centric genomewide association study via entropy. Genetics 179, 637–650.
Dave, S.S., Wright, G., Tan, B., Rosenwald, A., Gascoyne, R.D., Chan, W.C., Fisher, R.I., Braziel, R.M., Rimsza, L.M., Grogan, T.M., et al., 2004. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. The New England Journal of Medicine 351, 2159–2169.
Fisher, R.A., 1932. Statistical Methods for Research Workers. Oliver and Boyd, London.
Golub, T., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al., 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–536.
Han, F., Pan, W., 2010. A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 70, 42–54.
Hanley, J.A., McNeil, B.J., 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36.
Hedrick, P.W., Kumar, S., 2001. Mutation and linkage disequilibrium in human mtDNA. Eur. J. Hum. Genet. 9 (12), 969–972.
Kang, G., Ye, K., Liu, N., Allison, D., Gao, G., 2009. Weighted multiple hypothesis testing procedures. Stat. Appl. Genet. Mol. Biol. 8, 1–21.
Khan, J., Wei, J.S., Ringnér, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., et al., 2001. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7, 673–679.
Li, Q.Z., Yu, K., Li, Z., Zheng, G., 2008. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum. Genet. 123, 617–623.
Plenge, R.M., Seielstad, M., Padyukov, L., et al., 2007. TRAF1–C5 as a risk locus for rheumatoid arthritis—a genomewide study. N. Engl. J. Med. 357, 1199–1209.
Pomeroy, S., Tamayo, P., Gaasenbeek, M., Sturla, L., Angelo, M., McLaughlin, M., Kim, J., Goumnerova, L., Black, P., Lau, C., et al., 2002. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–442.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., Sham, P.C., 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.
Rieger, K., Hong, W., Tusher, V., Tang, J., Tibshirani, R., Chu, G., 2004. Toxicity from radiation therapy associated with abnormal transcriptional responses to DNA damage. Proceedings of the National Academy of Sciences of the United States of America 101, 6634–6640.
Rosenwald, A., Wright, G., Chan, W.C., Connors, J.M., Campo, E., Fisher, R.I., Gascoyne, R.D., Muller-Hermelink, H.K., Smeland, E.B., Staudt, L.M., 2002. The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma. The New England Journal of Medicine 346, 1937–1947.
Rubin, D., Dudoit, S., van der Laan, M.J., 2006. A method to increase the power of multiple testing procedures through sample splitting. Statistical Applications in Genetics and Molecular Biology 5, article 19.
Schwartzmann, A., Dougherty, R., Taylor, J., 2005. Cross-subject comparison of principal diffusion direction maps. Magnetic Resonance in Medicine 53, 1423–1431.
Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., et al., 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209.
Taylor, J., Worsley, K., 2005. Analysis of hemodynamic delay in the FIAC data. In: Proceedings of the 11th Annual Meeting of the Organization for Human Brain Mapping, Toronto, June 12–16, 2005.
Taylor, J., Tibshirani, R., 2006. A tail strength measure for assessing the overall univariate significance in a dataset. Biostatistics 7 (2), 167–181.
The Wellcome Trust Case Control Consortium (WTCCC), 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature 447, 661–683.
Yu, K., Li, Q., Bergen, A.W., Pfeiffer, R.M., Rosenberg, P.S., Caporaso, N., Kraft, P., Chatterjee, N., 2009. Pathway analysis by adaptive combination of P-values. Genet. Epidemiol. 33, 700–709.
Zaykin, D.V., Zhivotovsky, L.A., Westfall, P.H., Weir, B.S., 2002. Truncated product method for combining P-values. Genet. Epidemiol. 22, 170–185.