The importance of the microbiome in epidemiologic research

Accepted Manuscript

Blake M. Hanson, George M. Weinstock

PII: S1047-2797(16)30078-3
DOI: 10.1016/j.annepidem.2016.03.008
Reference: AEP 7936
To appear in: Annals of Epidemiology
Received Date: 11 January 2016
Revised Date: 3 February 2016
Accepted Date: 23 March 2016

Please cite this article as: Hanson BM, Weinstock GM, The importance of the microbiome in epidemiologic research, Annals of Epidemiology (2016), doi: 10.1016/j.annepidem.2016.03.008. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Blake M. Hanson* and George M. Weinstock

The Jackson Laboratory for Genomic Medicine
Farmington, CT 06032

*corresponding author

Abstract

Purpose: The human microbiome is the community of microorganisms that live on and in the body. Currently, most applications of microbiome analysis derive from the perspective of discovery and characterization. The completion of the NIH Human Microbiome Project and the European MetaHIT project will change the focus to studying the role of the microbiome in human health and disease.

Methods: Recent developments in technology and bioinformatics have afforded an opportunity to explore more fully the importance of community structure, detection of pathogens, and community interactions. The current state of microbiome research is reviewed in terms of effect size, power calculations, how stratification on community classes can increase power, and the importance of study design and power in reproducibility.

Results: Work is needed to characterize microbiome development, ecological stability, and variation. In future research, development and implementation of variance stabilization techniques should replace rarefaction of data, which reduces study power.

Conclusions: Epidemiologists have most of the tools necessary to explore the relationship between the microbiome and human health. Further development of tools for large-scale multivariate datasets will be helpful. Applying the methods of epidemiology will be critical in translating research results to preventive interventions and population health.

Keywords: Microbiota, Microbial Consortia, Epidemiology

The term “microbiome” was originally coined by Joshua Lederberg and represents “the ecological community of commensal, symbiotic and pathogenic microorganisms that literally share our body space” (1, 2). This microbial community is composed of roughly 100 trillion microbial cells (3), which is between three and ten times the number of human cells (4, 5). Lederberg put a strong emphasis on viewing the microbiome not as a distinct element, dividing the microbial cells and H. sapiens cells, but as a superorganism composed of all cells present, both microbial and H. sapiens (1). Currently there are an estimated 19,797 protein-encoding genes in the human genome according to GENCODE release 23 (6, 7), and there are 536,112 unique microbial genes within an average individual’s gut (8), resulting in roughly 27 times as many microbial genes as human genes in this tissue alone. Of course, microbes are not limited to the gut, but are present on every body surface that comes into contact with the environment (9-13) and fulfill a number of metabolic roles unfulfilled by human cells (14). This superorganism can be investigated utilizing a modification of the conceptual epidemiologic framework of the epidemiologic triangle where the microbiome is added in the form of a fourth node (15). Traditionally, the epidemiologic triangle has been used to organize the relationship between three potential causal determinants: the host, the agent, and the environment. The microbiome is impacted by and impacts all three of these determinants and must be considered not as a component of one of these three nodes, but as its own distinct node (Figure 1).
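The ratio of microbial to human genes quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the two counts are taken from the text, while the constant and function names are hypothetical and chosen for readability.

```python
# Gene counts quoted in the text (GENCODE release 23 and the gut gene catalogue).
HUMAN_PROTEIN_CODING_GENES = 19_797
GUT_MICROBIAL_GENES_PER_INDIVIDUAL = 536_112


def gene_ratio(microbial_genes: int, human_genes: int) -> float:
    """Return the number of microbial genes per human gene."""
    return microbial_genes / human_genes


ratio = gene_ratio(GUT_MICROBIAL_GENES_PER_INDIVIDUAL, HUMAN_PROTEIN_CODING_GENES)
print(f"Microbial genes per human gene in the gut: {ratio:.1f}")
```

The result rounds to 27, in line with the "roughly 27 times" stated in the text.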

The microbiome has historically been outside the reach of in-depth scientific inquiry since culture-based methods are not sufficient for sampling communities of hundreds of different taxa present at a range of abundances. With the advent of sampling by Sanger DNA sequencing of the ensemble community (“culture-independent” sampling), the microbiome began to come into focus. However, recent technological developments (“next-generation” sequencing) have provided much deeper and more cost-effective sampling. A typical microbiome data point from a human sample (stool, saliva, vaginal swab, skin scrape, nasal lavage, etc.) is a set of thousands to hundreds of millions of sequences, assigned to a list of hundreds of taxa or thousands of genes to provide abundances for each. Such data points are typically produced from tens of samples, with larger studies reaching hundreds of samples. This, accompanied by appropriate bioinformatics, has opened up new research avenues for analysis of the microbiome, such as details of community structure, detection of pathogens existing within the microbiome and their associated virulence mechanisms, and community interactions such as commensalism, mutualism, and amensalism.

With the completion of the first NIH Human Microbiome Project (HMP) (13) and the European MetaHIT project (8), the focus has turned from characterization and cataloging of the microbiome to investigating the interaction of the microbiome and human health. Human microbiome biomarkers have potential clinical utility for predicting disease risk, identifying disease onset, predicting treatment response, determining treatment success, guiding preventative measures, and developing therapeutics based on microbiome manipulation. However, to date most applications of microbiome analysis to clinical situations have had low diagnostic accuracy, so the results are primarily useful from a discovery perspective but not for general clinical use. This is in part due to a need for statistical tools and experimental designs to move the microbiome from the lab to the clinic. As concluded in a recent Annual Review of Statistics and Its Application article: “unique characteristics of the (microbiome) data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges” (16). The microbiome data and analysis are all part of the general trend in genomic big data. As summarized by a National Biomarker Development Alliance report: “new ‘omics technologies are generating data with rapidly escalating volume, velocity, and variety, and data analysis usually requires complex deconvolution of low signal-to-noise signatures. In addition, the large-scale molecular datasets that these new technologies generate differ fundamentally from traditional biological and clinical datasets. Traditional clinical and epidemiological datasets comprise a small number of variables tracked across a proportionately larger number of samples, while today’s ‘omics’ technologies are measuring variables per sample whose numbers far exceed the typical number of samples. Together, these factors create a need for new data analytics and infrastructure that are only now being developed…” (17).

Epidemiologists have the tools to investigate and elucidate the relationship between the microbiome and human health, but doing so will require consideration of the microbiome not just as an external entity, but also as a component of our self. Epidemiologists often analyze large, integrated datasets that contain different types of data and are therefore well equipped to work with this type of data. As microbiome development, ecological stability, and variation become better understood, epidemiologists are well poised to further characterize the microbiome and aid in the translation of research results to interventions and population health.

We will present a few key topics in the current state of microbiome research, focusing on the assessment of effect size and power calculation, how stratification on community classes can increase this power, and the importance of study design selection.

Effect Size

To develop power calculation tools, one must first consider what an effect size is in microbiome studies. This can be addressed in two ways: how much of a difference between microbiomes is needed to differentiate two groups, and how much of an effect size is biologically relevant. In these analyses, lists of taxa can be compared at different phylogenetic levels (phylum to species), where the number of categories increases at lower taxonomic levels, or lists of genes can be compared as single genes or grouped into pathways, which also reduces the number of categories. Additionally, we can consider more aggregate measures such as beta-diversity (18), an assessment of the similarity between two populations or samples, or measures with finer resolution such as changes in specific organisms or operational taxonomic units (OTUs). When considering effect size, identifying the number of individuals needed for each group, as well as the number of sequences that need to be generated for each sample, is essential (19, 20).
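As an illustration of a beta-diversity measure, the sketch below computes the classic presence/absence Jaccard distance between two samples. It is a minimal example and not the implementation used by any particular microbiome toolkit; the function name, sample names, and read counts are hypothetical.

```python
def jaccard_distance(sample_a: dict, sample_b: dict) -> float:
    """Jaccard distance on presence/absence: 1 - |A and B| / |A or B|."""
    present_a = {taxon for taxon, count in sample_a.items() if count > 0}
    present_b = {taxon for taxon, count in sample_b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


# Hypothetical read counts per taxon for two stool samples.
stool_1 = {"Bacteroides": 120, "Prevotella": 0, "Ruminococcus": 15, "Lactobacillus": 3}
stool_2 = {"Bacteroides": 80, "Prevotella": 40, "Ruminococcus": 0, "Lactobacillus": 1}

print(jaccard_distance(stool_1, stool_2))
```

Here two of the four taxa are shared, so the distance is 0.5. Because only presence or absence is used, the abundances themselves do not affect the result.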

Power Calculation

In characterizing the microbiome and generating new hypotheses, many studies compare the microbial communities of groups with different exposures or interventions. When designing these studies, special attention must be paid to statistical power (the ability to detect an expected effect and reject the null hypothesis) and to recruiting enough participants to achieve this power.
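To make the link between effect size, sample size, and power concrete, the sketch below uses a textbook normal approximation for comparing the mean of a single measurement between two equally sized groups. This is a deliberately simplified, univariate stand-in and is not the method of any microbiome-specific power tool; the function name and the example values are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_group_power(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized shift of the test statistic under the alternative.
    shift = abs(delta) / (sd * sqrt(2 / n_per_group))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical effect: a difference of half a standard deviation between groups.
for n in (10, 20, 40, 80):
    print(n, round(two_group_power(delta=0.5, sd=1.0, n_per_group=n), 3))
```

Power rises with the number of participants per group, and with no true difference (`delta=0`) the function returns the Type I error rate `alpha`.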

One way to represent microbial community structures and compare them between individuals and groups is through the use of distance matrices. One such method utilizes UniFrac (21, 22) or Jaccard (23) pairwise distance matrices for PERMANOVA (permutational multivariate analysis of variance using distance matrices) (24). The statistical power of PERMANOVA relies upon the number of exposure or intervention groups, the number of subjects in each group, the size of the effect, and the distances between subjects within each group (25). Additionally, because PERMANOVA utilizes a pseudo-F ratio, the power estimation techniques for parametric ANOVA are not applicable to PERMANOVA (24). Kelly et al. recently published a novel power and sample size calculation tool, implemented in the R programming language (26) as the micropower package, that allows researchers to simulate and model different effect sizes within microbiome composition (25).
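The pseudo-F ratio and its permutation test can be sketched directly from a distance matrix. The code below is a minimal illustration of the general PERMANOVA idea (sums of squared distances partitioned within and among groups, with significance assessed by shuffling group labels). It is not the micropower package or any other published implementation, and the function names and toy data are hypothetical. It assumes at least two groups and non-zero within-group distances.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F ratio from a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_p_value(dist, groups, permutations=999, seed=0):
    """Permutation p-value: share of label shuffles with pseudo-F >= the observed value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


# Toy example: four samples placed on a line, two per group.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = ["control", "control", "exposed", "exposed"]

print("pseudo-F:", pseudo_f(dist, groups))
print("p-value:", permanova_p_value(dist, groups, permutations=199, seed=1))
```

With only four samples there are very few distinct label arrangements, so the p-value cannot be small no matter how large the pseudo-F is. This is one way to see why the number of subjects per group matters for power.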

Another representation of a microbial community is to consider it as a whole assemblage of distinct microbes that can be assessed using multivariate tools. In addition to assessments of microbial diversity, multivariate analysis tools allow researchers to investigate the effects of multiple exposures or interventions and the association with individual members of the microbial community while still considering the other members of the microbial community (27). Many non-parametric multivariate tests have the limitation, however, that either the size of the effect cannot be quantified, or the dispersion of specific taxa within a set of samples is assumed to be consistent. The Dirichlet-multinomial distribution model provides adjustments over a general multinomial distribution that limit Type I error (rejecting the null hypothesis incorrectly) due to overdispersion in specific taxa frequencies (27). La Rosa et al. have published a power, sample size, and parameter estimation tool, also implemented in the R programming language (26), as the HMP package (27). This package requires an estimate of overdispersion, reflecting the variance of each individual taxon and the covariance between taxa, and an estimate of the number of expected taxa. This package was employed by Zhou et al., who utilized a range of observed values of overdispersion and number of expected taxa (28). The observed overdispersion was highest in the posterior fornix of the vagina and lowest in multiple oral sites. Additionally, a range of estimates of observed taxa was used to correspond to different relative sequencing depths, as the greater the sequencing depth, the more taxa the authors observed (28).
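The effect of overdispersion can be illustrated by simulating taxon counts with and without it. The sketch below assumes a common parameterization in which an overdispersion parameter `theta` between 0 and 1 corresponds to a Dirichlet concentration of `(1 - theta) / theta`, so that the variance of each taxon count is inflated by a factor of `1 + (reads - 1) * theta` relative to the multinomial. The function names, proportions, and parameter values are hypothetical, and this is not the HMP package's interface.

```python
import random
from statistics import pvariance


def variance_inflation(reads: int, theta: float) -> float:
    """Factor by which Dirichlet-multinomial variance exceeds multinomial variance."""
    return 1 + (reads - 1) * theta


def draw_counts(proportions, reads, theta, rng):
    """Draw taxon counts for one sample; theta = 0 gives a plain multinomial."""
    if theta > 0:
        concentration = (1 - theta) / theta  # sum of the Dirichlet parameters
        gammas = [rng.gammavariate(p * concentration, 1.0) for p in proportions]
        total = sum(gammas)
        proportions = [g / total for g in gammas]  # sample-specific proportions
    picks = rng.choices(range(len(proportions)), weights=proportions, k=reads)
    return [picks.count(taxon) for taxon in range(len(proportions))]


rng = random.Random(42)
proportions = [0.5, 0.3, 0.2]  # hypothetical mean taxon proportions
reads, theta, n_samples = 200, 0.2, 300

multinomial = [draw_counts(proportions, reads, 0.0, rng)[0] for _ in range(n_samples)]
overdispersed = [draw_counts(proportions, reads, theta, rng)[0] for _ in range(n_samples)]

print("multinomial variance of first taxon:", round(pvariance(multinomial), 1))
print("Dirichlet-multinomial variance of first taxon:", round(pvariance(overdispersed), 1))
print("theoretical inflation factor:", variance_inflation(reads, theta))
```

A test that assumes multinomial variance when the data are in fact overdispersed will understate the noise, which is how overdispersion leads to excess Type I error.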

Relative sequencing depth is an important factor to consider in both of these power calculations. A common technique employed when there are different numbers of sequencing reads between the samples in a dataset is to rarefy the data, that is, to normalize the number of reads across samples through random subsampling (29, 30). The level of subsampling is often set to the smallest number of reads present within the cohort of samples where the sample is considered a sequencing success. This subsampling removes useful and informative data, and has been deemed “statistically inadmissible” by McMurdie and Holmes (31).
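The cost of rarefying can be seen in a small sketch that subsamples every sample down to the smallest read count in the dataset. This is a minimal illustration of random subsampling without replacement, not the routine of any specific pipeline; the function name, sample names, and counts are hypothetical.

```python
import random


def rarefy(counts: dict, depth: int, rng: random.Random) -> dict:
    """Subsample `depth` reads without replacement from one sample's taxon counts."""
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    kept = rng.sample(reads, depth)
    return {taxon: kept.count(taxon) for taxon in counts}


# Hypothetical dataset: one shallowly and one deeply sequenced sample.
samples = {
    "subject_1": {"Bacteroides": 50, "Prevotella": 30, "Ruminococcus": 20},
    "subject_2": {"Bacteroides": 400, "Prevotella": 500, "Ruminococcus": 100},
}

rng = random.Random(7)
min_depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, min_depth, rng) for name, c in samples.items()}

total_reads = sum(sum(c.values()) for c in samples.values())
kept_reads = sum(sum(c.values()) for c in rarefied.values())
print(f"reads discarded: {total_reads - kept_reads} of {total_reads}")
```

In this toy dataset, 900 of the 1,100 reads are thrown away so that both samples have 100 reads, which illustrates the loss of information that motivates the criticism of rarefaction.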

When considering the statistical methods utilized in the power calculations above, the Dirichlet-multinomial distribution model relies on count data (27), the Jaccard (23) and unweighted UniFrac (21) distances utilize presence/absence data, and weighted UniFrac (22) uses either count or percent data once the data have been rarefied. It has been previously observed that UniFrac distance has differing sensitivities depending on the number of sequencing reads (21), and this is particularly apparent when considering rare OTUs (32). This is because more rare taxa are detected with increasing sequencing reads, causing a sequencing depth-dependent effect on the diversity metric (33). This depth-dependent effect is also present with other alpha- and beta-diversity measures (23, 32), but to a lesser degree than that observed with UniFrac. McMurdie and Holmes suggested this effect is “a failure of the implementation of these methods to properly account for rare species” and demonstrated it was better to utilize all of the data and “model the noise and address extra species using statistical normalization methods based on variance stabilization and robustification/filtering” (31). These variance stabilization methodologies have been implemented in R packages such as edgeR (34), metagenomeSeq (35), and DESeq2 (36). There is still work to be done developing and implementing these variance stabilization techniques so they are ubiquitous in microbiome analysis tools, but rarefaction of data should be avoided if at all possible. It is a poor substitute for appropriate statistical techniques and removes useful data from a study, decreasing the power of the study.
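The depth dependence described above, where more rare taxa are detected as the number of reads grows, follows from simple sampling arithmetic. The sketch below assumes reads are drawn independently from fixed taxon proportions (a multinomial model), under which a taxon with proportion p is seen at least once in n reads with probability 1 - (1 - p)^n. The community composition and function name are hypothetical.

```python
def expected_observed_taxa(proportions, reads: int) -> float:
    """Expected number of taxa seen at least once among `reads` multinomial draws."""
    return sum(1 - (1 - p) ** reads for p in proportions)


# Hypothetical community: 5 common taxa (90% of reads) and 95 rare taxa (10% of reads).
common = [0.18] * 5
rare = [0.10 / 95] * 95
community = common + rare

for depth in (100, 1_000, 10_000, 100_000):
    print(depth, round(expected_observed_taxa(community, depth), 1))
```

Under these assumptions, shallow samples miss most of the rare taxa, so any presence/absence metric computed on samples of unequal depth partly reflects sequencing effort, not biology.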

Community Classes

Different human body sites are composed of distinct microbial communities (9-13, 37-39) with a wide amount of variation between individuals (11, 39). Despite this diversity and variation, in many body sites the communities can be clustered into conserved groupings, known as community classes or biotypes (40). Two of the better-characterized body sites with biotypes are the vagina and the gut.

Previous studies have identified five biotypes of the vagina that are defined by the species or genera present: Lactobacillus iners, L. crispatus, L. gasseri, L. jensenii, or a heterogeneous community composed primarily of obligate anaerobes (12, 41). These bacterial communities are diverse and show frequent species turnover over time (41). Additionally, a shift from a vaginal biotype dominated by lactobacilli to a biotype dominated by obligate anaerobes has been associated with the development of bacterial vaginosis (42). Vaginal biotypes have also been associated with race (40).

The community of bacteria present within the gut has been studied extensively, with many studies observing gut biotypes, referred to as enterotypes, based upon differing abundances of the genera Bacteroides, Ruminococcus, and Prevotella (43). While cohort size and the method of clustering have been found to impact the detection of these enterotypes (44), the Bacteroides and Prevotella enterotypes are often observed (45, 46), and these enterotypes appear to be independent of age, gender, and nationality (43). A fourth enterotype has been proposed that has very low Bacteroides and is driven by a diverse collection of Firmicutes (47), as well as an “enterogradient” made up of a continuum of Prevotella and Bacteroides with an assorted composition of Firmicutes. While the number and makeup of enterotypes have been debated in the literature, it is clear the gut microbiome can be grouped into clusters and these clusters are stable over time (46).

In addition to the vagina and stool, biotypes have been identified in a number of other body sites, including the anterior nares, antecubital fossa, retroauricular crease, hard palate, buccal mucosa, subgingival plaque, supragingival plaque, throat, palatine tonsils, and saliva (40). A number of these biotypes have been associated with gender, age, and race. These community classes can be used to stratify individuals within a study, allowing for more focused study design, creation of more specific inclusion/exclusion criteria, and enrollment of participants specifically of interest, all of which help focus a study and increase its power.

Reproducibility and Power

In the short history of the study of the microbiome, there have been many compelling theories of how the microbiome modulates health, few more compelling than the theory that there is a causal link between obesity and the gut microbiome. Prompted by the observation that obesity can be provoked in lean mice by fecal transfer from an obese donor (48, 49), many researchers began investigating the link in both humans and mice (38, 50). While the mechanism was not understood, the theory was that obese individuals have a lower proportion of Bacteroides relative to Firmicutes than that observed in lean individuals. Larger studies aiming to reproduce the findings did not observe this association between obesity and the ratio of Bacteroides to Firmicutes (13, 43), and a separate study observed a higher proportion of Bacteroides compared to Firmicutes in obese individuals (51).

These contradictory findings led Finucane et al. to conduct an assessment of the relationship between the gut microbiome and obesity through the use of the HMP dataset and compare the results to the MetaHIT dataset (52). Finucane et al. found no association between the “Bacteroidetes:Firmicutes ratio” and obesity or BMI, and further indicated they “had 96% power to detect a difference in the relative abundance of Bacteroidetes and 80% power for Firmicutes” when comparing obese to lean individuals. The researchers also assessed for any possible sampling or measurement error and discussed the possibility of unmeasured confounders affecting the relationship between obesity and the microbiome (52). This study stands as a good example of a rigorous assessment of a finding from an underpowered study, and as such should be seen as an example that needs to be replicated for many other assessments of the relationship between the microbiome and human health.

Conclusion

There are many inherent complexities in how the microbiome interacts with the host, environment, and agent. Often, community characteristics (the makeup of the microbial community), the co-occurrence of a number of microbes within a community, or the specific abundance of a microbe or group of microbes are of interest. The challenge is associating those complex community dynamics with human health. Epidemiologists are well poised to contribute to the growing body of microbiome research through the implementation of well-designed studies that are powered to test the hypothesis of interest. There is still work to be done in characterizing microbiome development, ecological stability, and variation, and there is a great need for further development of statistical methodologies for dealing with large-scale epidemiologic data. Nevertheless, there is a great deal of promise in the study of the microbiome and in its potential utilization for new diagnostic, preventative, and therapeutic technologies.

Works Cited

1. Lederberg J. Infectious history. Science (New York, NY). 2000;288(5464):287-93.
2. Lederberg J. 'Ome Sweet 'Omics-- A Genealogical Treasury of Words. The Scientist. 2001.
3. Ley RE, Peterson DA, Gordon JI. Ecological and Evolutionary Forces Shaping Microbial Diversity in the Human Intestine. Cell. 2006;124(4):837-48.
4. Savage DC. Microbial ecology of the gastrointestinal tract. Annual review of microbiology. 1977;31:107-33.
5. Reid A, Greene S. FAQ: Human Microbiome. Washington, DC 20036: American Academy of Microbiology, 2014.
6. Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, et al. Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes. Human Molecular Genetics. 2014;23(22):5866-78.
7. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome research. 2012;22(9):1760-74.
8. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59-65.
9. Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, et al. The human oral microbiome. Journal of bacteriology. 2010;192(19):5002-17.
10. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, et al. Diversity of the human intestinal microbial flora. Science (New York, NY). 2005;308(5728):1635-8.
11. Grice EA, Kong HH, Conlan S, Deming CB, Davis J, Young AC, et al. Topographical and temporal diversity of the human skin microbiome. Science (New York, NY). 2009;324(5931):1190-2.
12. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SS, McCulle SL, et al. Vaginal microbiome of reproductive-age women. Proceedings of the National Academy of Sciences of the United States of America. 2011;108 Suppl 1:4680-7.
13. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207-14.
14. Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, et al. Evolution of mammals and their gut microbes. Science (New York, NY). 2008;320(5883):1647-51.
15. Foxman B, Rosenthal M. Implications of the human microbiome project for epidemiology. American journal of epidemiology. 2013;177(3):197-201.

16. Li H. Systems biology approaches to epidemiological studies of complex diseases. Wiley interdisciplinary reviews Systems biology and medicine. 2013;5(6):677-86.
17. Alliance NBD. Workshop I - Report. 2012.
18. Morgan XC, Huttenhower C. Chapter 12: Human Microbiome Analysis. PLoS computational biology. 2012;8(12):e1002808.
19. Kuczynski J, Costello EK, Nemergut DR, Zaneveld J, Lauber CL, Knights D, et al. Direct sequencing of the human microbiome readily reveals community differences. Genome biology. 2010;11(5):210.
20. Goodrich JK, Di Rienzi SC, Poole AC, Koren O, Walters WA, Caporaso JG, et al. Conducting a microbiome study. Cell. 2014;158(2):250-62.
21. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228-35.
22. Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. Applied and environmental microbiology. 2007;73(5):1576-85.
23. Chao A, Chazdon RL, Colwell RK, Shen TJ. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics. 2006;62(2):361-71.
24. McArdle BH, Anderson MJ. Fitting Multivariate Models to Community Data: A Comment on Distance-Based Redundancy Analysis. Ecology. 2001;82(1):290-7.
25. Kelly BJ, Gross R, Bittinger K, Sherrill-Mix S, Lewis JD, Collman RG, et al. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics (Oxford, England). 2015;31(15):2461-8.
26. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2015.
27. La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, Wang Q, et al. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PloS one. 2012;7(12):e52078.
28. Zhou Y, Gao H, Mihindukulasuriya KA, La Rosa PS, Wylie KM, Vishnivetskaya T, et al. Biogeography of the ecosystems of the healthy human body. Genome biology. 2013;14(1):R1.
29. Hughes JB, Hellmann JJ. The application of rarefaction techniques to molecular inventories of microbial diversity. Methods in enzymology. 2005;397:292-308.
30. Navas-Molina JA, Peralta-Sanchez JM, Gonzalez A, McMurdie PJ, Vazquez-Baeza Y, Xu Z, et al. Advancing our understanding of the human microbiome using QIIME. Methods in enzymology. 2013;531:371-444.
31. McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014;10(4):e1003531.
32. Lozupone C, Lladser ME, Knights D, Stombaugh J, Knight R. UniFrac: an effective distance metric for microbial community comparison. The ISME journal. 2011;5(2):169-72.

33. Schloss PD. Evaluating different approaches that test whether microbial communities have the same structure. The ISME journal. 2008;2(3):265-75.
34. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England). 2010;26(1):139-40.
35. Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nature methods. 2013;10(12):1200-2.
36. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15(12):550.
37. Fierer N, Hamady M, Lauber CL, Knight R. The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(46):17994-9.
38. Ley RE, Turnbaugh PJ, Klein S, Gordon JI. Microbial ecology: human gut microbes associated with obesity. Nature. 2006;444(7122):1022-3.
39. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial community variation in human body habitats across space and time. Science (New York, NY). 2009;326(5960):1694-7.
40. Zhou Y, Mihindukulasuriya KA, Gao H, La Rosa PS, Wylie KM, Martin JC, et al. Exploration of bacterial community classes in major human habitats. Genome biology. 2014;15(5):R66.
41. Gajer P, Brotman RM, Bai G, Sakamoto J, Schutte UM, Zhong X, et al. Temporal dynamics of the human vaginal microbiota. Science translational medicine. 2012;4(132):132ra52.
42. Shipitsyna E, Roos A, Datcu R, Hallen A, Fredlund H, Jensen JS, et al. Composition of the vaginal microbiota in women of reproductive age--sensitive and specific molecular diagnosis of bacterial vaginosis is possible? PloS one. 2013;8(4):e60670.
43. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011;473(7346):174-80.
44. Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Knight R, et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS computational biology. 2013;9(1):e1002863.
45. Roager HM, Licht TR, Poulsen SK, Larsen TM, Bahl MI. Microbial enterotypes, inferred by the prevotella-to-bacteroides ratio, remained stable during a 6-month randomized controlled diet intervention with the new nordic diet. Applied and environmental microbiology. 2014;80(3):1142-9.
46. Wu GD, Chen J, Hoffmann C, Bittinger K, Chen YY, Keilbaugh SA, et al. Linking long-term dietary patterns with gut microbial enterotypes. Science (New York, NY). 2011;334(6052):105-8.
47. Ding T, Schloss PD. Dynamics and associations of microbial community types across the human body. Nature. 2014.
48. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444(7122):1027-31.

49. Turnbaugh PJ, Backhed F, Fulton L, Gordon JI. Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell host & microbe. 2008;3(4):213-23.
50. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457(7228):480-4.
51. Ley RE. Obesity and the human microbiome. Current opinion in gastroenterology. 2010;26(1):5-11.
52. Finucane MM, Sharpton TJ, Laurent TJ, Pollard KS. A Taxonomic Signature of Obesity in the Microbiome? Getting to the Guts of the Matter. PloS one. 2014;9(1).

Figure 1: An epidemiologic triangle incorporating the microbiome as a fourth important and distinct node. Adapted from Foxman and Rosenthal, 2013 (15).

Table 1: List of programs and packages with their associated websites and citations.

Program/Package | Website | Citation
R | https://www.r-project.org | (26)
micropower | https://github.com/brendankelly | (25)
HMP | https://cran.r-project.org/web/packages/HMP/index.html | (27)
edgeR | https://bioconductor.org/packages/release/bioc/html/edgeR.html | (34)
metagenomeSeq | https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html | (35)
DESeq2 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html | (36)