Accepted Manuscript
The importance of the microbiome in epidemiologic research
Blake M. Hanson, George M. Weinstock
PII: S1047-2797(16)30078-3
DOI: 10.1016/j.annepidem.2016.03.008
Reference: AEP 7936
To appear in: Annals of Epidemiology
Received Date: 11 January 2016
Revised Date: 3 February 2016
Accepted Date: 23 March 2016
Please cite this article as: Hanson BM, Weinstock GM, The importance of the microbiome in epidemiologic research, Annals of Epidemiology (2016), doi: 10.1016/j.annepidem.2016.03.008. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
The importance of the microbiome in epidemiologic research
Blake M. Hanson* and George M. Weinstock
The Jackson Laboratory for Genomic Medicine
Farmington, CT 06032
*corresponding author
Abstract
Purpose: The human microbiome is the community of microorganisms that live on and in the body. Currently, most applications of microbiome analysis derive from the perspective of discovery and characterization. The completion of the NIH Human Microbiome and the European MetaHIT projects will change the focus to studying the role of the microbiome in human health and disease. Methods: Recent developments in technology and bioinformatics have afforded an opportunity to explore more fully the importance of community structure, detection of pathogens, and community interactions. The current state of microbiome research is reviewed in terms of effect size, power calculations, how stratification on community classes can increase this power, and the importance of study design and power in reproducibility. Results: Work is needed to characterize microbiome development, ecological stability, and variation. In future research, development and implementation of variance stabilization techniques should replace rarefaction of data, which reduces study power. Conclusions: Epidemiologists have most of the tools necessary to explore the relationship between the microbiome and human health. Further development of tools for large-scale multivariate datasets will be helpful. Applying the methods of epidemiology will be critical in translating research results to preventive interventions and population health.
Keywords: Microbiota, Microbial Consortia, Epidemiology
The term “microbiome” was originally coined by Joshua Lederberg and represents “the ecological community of commensal, symbiotic and pathogenic microorganisms that literally share our body space” (1, 2). This microbial community is composed of roughly 100 trillion microbial cells (3), which is between three and ten times the number of human cells (4, 5). Lederberg put a strong emphasis on viewing the microbiome not as a distinct element, dividing the microbial cells and H. sapiens cells, but as a superorganism composed of all cells present, both microbial and H. sapiens (1). Currently there are an estimated 19,797 protein-encoding genes in the human genome according to GENCODE release 23 (6, 7), and there are 536,112 unique microbial genes within an average individual’s gut (8), resulting in roughly 27 times as many microbial genes present as human genes in this tissue alone. Of course, microbes are not limited to the gut, but are present on every body surface that comes into contact with the environment (9-13) and fulfill a number of metabolic roles unfulfilled by human cells (14). This superorganism can be investigated utilizing a modification of the conceptual epidemiologic framework of the epidemiologic triangle where the microbiome is added in the form of a fourth node (15). Traditionally, the epidemiologic triangle has been used to organize the relationship between three potential causal determinants: the host, the agent, and the environment. The microbiome is impacted by and impacts all three of these determinants and must be considered not as a component of one of these three nodes, but as its own distinct node (Figure 1).
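The "roughly 27 times" figure quoted above follows directly from the two gene counts given in the text. As a quick check, the arithmetic can be sketched as follows (the variable names are illustrative only):

```python
# Gene counts quoted in the text above.
human_protein_coding_genes = 19_797  # GENCODE release 23 estimate (6, 7)
gut_microbial_genes = 536_112        # unique microbial genes in an average gut (8)

# Ratio of gut microbial genes to human protein-encoding genes.
ratio = gut_microbial_genes / human_protein_coding_genes
print(f"microbial-to-human gene ratio: {ratio:.1f}")
```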
The microbiome has historically been outside the reach of in-depth scientific inquiry since culture-based methods are not sufficient for sampling communities of hundreds of different taxa present at a range of abundances. With the advent of sampling by Sanger DNA sequencing of the ensemble community (“culture-independent” sampling), the microbiome began to come into focus. However, recent technological developments (“next-generation” sequencing) provided much deeper and more cost-effective sampling. A typical microbiome data point from a human sample (stool, saliva, vaginal swab, skin scrape, nasal lavage, etc.) is a set of thousands to hundreds of millions of sequences, assigned to a list of hundreds of taxa or thousands of genes to provide abundances for each. Such data points are typically produced from tens of samples, with larger studies reaching hundreds of samples. This, accompanied by appropriate bioinformatics, has opened up new research avenues for analysis of the microbiome, such as details of community structure, detection of pathogens existing within the microbiome and their associated virulence mechanisms, as well as community interactions, such as commensalism, mutualism and amensalism.
With the completion of the first NIH Human Microbiome Project (HMP) (13) and the European MetaHIT project (8), the focus has turned from characterization and cataloging of the microbiome to investigating the interaction of the microbiome and human health. Human microbiome biomarkers have potential clinical utility for predicting disease risk, identifying disease onset, predicting treatment response, determining treatment success, guiding preventative measures, as well as developing therapeutics based on microbiome manipulation. However, to date most applications of microbiome analysis to clinical situations have had low diagnostic accuracy, so the results are primarily useful from a discovery perspective but not for general clinical use. This is in part due to a need for statistical tools and experimental designs to move the microbiome from the lab to the clinic. As concluded in a recent Annual Review of Statistics and Its Application article: “unique characteristics of the (microbiome) data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges” (16). The microbiome data and analysis are all part of the general trend in genomic big data. As summarized by a National Biomarker Development Alliance report: “new ‘omics technologies are generating data with rapidly escalating volume, velocity, and variety, and data analysis usually requires complex deconvolution of low signal-to-noise signatures. In addition, the large-scale molecular datasets that these new technologies generate differ fundamentally from traditional biological and clinical datasets. Traditional clinical and epidemiological datasets comprise a small number of variables tracked across a proportionately larger number of samples, while today’s ‘omics’ technologies are measuring variables per sample whose numbers far exceed the typical number of samples. Together, these factors create a need for new data analytics and infrastructure that are only now being developed…” (17).
Epidemiologists have the tools to investigate and elucidate the relationship between the microbiome and human health, but doing so will require consideration of the microbiome not just as an external entity, but also as a component of our self. Epidemiologists often analyze large, integrated datasets that contain different types of data and are therefore well equipped to work with this type of data. As microbiome development, ecological stability, and variation become better understood, epidemiologists are well poised to further characterize the microbiome and aid in the translation of research results to interventions and population health.
We will present a few key topics in the current state of microbiome research, focusing on the assessments of power calculation, how stratification can increase this power, and the importance of study design selection.
Effect Size
To develop power calculation tools, one considers what an effect size is in microbiome studies. This can be addressed two ways: how much of a difference between microbiomes one needs to differentiate two groups, and how much of an effect size is biologically relevant. In these analyses, lists of taxa can be compared at different phylogenetic levels (phylum to species), where the number of categories increases at lower taxonomic levels, or lists of genes can be compared as single genes or grouped into pathways, which reduces the number of categories. Additionally, we can consider more aggregate measures such as beta-diversity (18), an assessment of the similarity between two populations or samples, or measures with finer resolution such as changes in specific organisms or operational taxonomic units (OTUs). When considering effect size, identifying the number of individuals for each group, as well as the number of sequences that need to be generated for each sample, is essential (19, 20).
Power Calculation
In characterizing the microbiome and generating new hypotheses, many studies compare the microbial communities of groups with different exposures or interventions. When designing these studies, special attention must be paid to statistical power (the ability to detect an expected effect and reject the null hypothesis) and to recruiting enough participants to achieve this power.
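To make the link between effect size, sample size, and power concrete, the following is a minimal sketch for the simplest possible case: a two-sided comparison of one continuous measure (for example, the relative abundance of a single taxon) between two equally sized groups, using a normal approximation. This is a textbook calculation, not one of the microbiome-specific tools discussed in this review, and the function names and example values are hypothetical.

```python
from statistics import NormalDist


def two_group_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: assumed true difference in means between the groups
    sd: assumed common standard deviation within each group
    """
    z = NormalDist()
    se = sd * (2.0 / n_per_group) ** 0.5      # standard error of the difference
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    shift = delta / se                        # standardized effect
    # Probability of landing in either rejection region under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def n_for_power(delta, sd, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest per-group sample size reaching the target power, or None."""
    for n in range(2, n_max + 1):
        if two_group_power(delta, sd, n, alpha) >= target:
            return n
    return None


# Hypothetical example: a half-standard-deviation difference between groups.
for n in (20, 40, 80):
    print(n, round(two_group_power(delta=0.5, sd=1.0, n_per_group=n), 3))
print("per-group n for 80% power:", n_for_power(delta=0.5, sd=1.0))
```

When the assumed difference is zero, the function returns the Type I error rate, which is a useful sanity check. Community-level comparisons need the simulation-based approaches described in the remainder of this section.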
One way to represent microbial community structures and compare them between individuals and groups is through the use of distance matrices. One such method utilizes UniFrac (21, 22) or Jaccard (23) pairwise distance matrices for PERMANOVA (permutational multivariate analysis of variance using distance matrices) (24). The statistical power of PERMANOVA relies upon the number of exposure or intervention groups, the number of subjects in each group, the size of the effect, and the distances between subjects within each group (25). Additionally, because PERMANOVA utilizes a pseudo-F ratio, the power estimation techniques for parametric ANOVA are not applicable to PERMANOVA (24). Kelly et al. recently published a novel power and sample size calculation tool, implemented in the R programming language (26) as the micropower package, that allows researchers to simulate and model different effect sizes within microbiome composition (25).
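The distance-based approach described above can be sketched in a few lines. The example below is an illustration only and is not the micropower implementation: it computes Jaccard distances from presence/absence data, forms the pseudo-F ratio from within-group and total sums of squared distances, and obtains a p-value by permuting group labels. The toy samples, taxon names, and function names are all hypothetical.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two samples given as sets of observed taxa."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(samples):
    """Symmetric matrix of pairwise Jaccard distances."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(samples[i], samples[j])
    return d


def pseudo_f(d, labels):
    """Pseudo-F ratio computed from squared pairwise distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, labels, permutations=999, seed=0):
    """Observed pseudo-F and permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: two groups of four samples with no shared taxa.
samples = [
    {"t1", "t2", "t3"}, {"t1", "t2", "t4"}, {"t1", "t3", "t4"}, {"t2", "t3", "t4"},
    {"t5", "t6", "t7"}, {"t5", "t6", "t8"}, {"t5", "t7", "t8"}, {"t6", "t7", "t8"},
]
labels = ["A"] * 4 + ["B"] * 4
f_obs, p_value = permanova(distance_matrix(samples), labels)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```

Because the null distribution is built by permutation rather than taken from an F table, power for such a test is typically estimated by simulating datasets under an assumed effect size and counting how often the test rejects, which is the general strategy the text attributes to Kelly et al. (25).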
Another representation of a microbial community is to consider it as a whole assemblage of distinct microbes that can be assessed using multivariate tools. In addition to assessments of microbial diversity, multivariate analysis tools allow researchers to investigate the effects of multiple exposures or interventions and the association with individual members of the microbial community while still considering the other members of the microbial community (27). Many non-parametric multivariate tests have the limitation, however, that either the size of the effect cannot be quantified, or that the dispersion of specific taxa within a set of samples is assumed to be consistent. The Dirichlet-multinomial distribution model provides adjustments over a general multinomial distribution that limit Type I error (rejecting the null hypothesis incorrectly) due to overdispersion in specific taxa frequencies (27). La Rosa et al. have published a power, sample size, and parameter estimation tool, also implemented in the R programming language (26), as the HMP package (27). This package requires an estimate of overdispersion, reflecting the variance of each individual taxon and the covariance between taxa, and an estimate of the number of expected taxa. This package was employed by Zhou et al., who utilized a range of observed values of overdispersion and number of expected taxa (28). The observed overdispersion was highest in the posterior fornix of the vagina and lowest in multiple oral sites. Additionally, a range of estimates of observed taxa was used to correspond to different relative sequencing depths, as the greater the sequencing depth, the more taxa the authors observed (28).
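The role of overdispersion can be illustrated with a small simulation. The sketch below is not the HMP package: it draws read counts for one taxon either from a plain multinomial, where every sample shares the same taxon proportions, or from a Dirichlet-multinomial, where each sample first draws its own proportions. The mean proportions, the concentration value, and the function names are hypothetical. Under the standard parameterization, the Dirichlet-multinomial variance is the multinomial variance multiplied by 1 + (n - 1)θ, where n is the number of reads and θ = 1 / (1 + sum of the Dirichlet parameters).

```python
import random
from statistics import pvariance


def multinomial_counts(rng, probs, n_reads):
    """Draw taxon counts for one sample with fixed taxon proportions."""
    counts = [0] * len(probs)
    for taxon in rng.choices(range(len(probs)), weights=probs, k=n_reads):
        counts[taxon] += 1
    return counts


def dirichlet(rng, alphas):
    """Draw one vector of proportions from a Dirichlet distribution."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


def simulate(n_samples=200, n_reads=1000, seed=1):
    """Variance of the first taxon's count under each model."""
    rng = random.Random(seed)
    mean_props = [0.5, 0.3, 0.15, 0.05]   # hypothetical mean taxon proportions
    concentration = 5.0                   # hypothetical; smaller means more overdispersion
    alphas = [p * concentration for p in mean_props]
    mn = [multinomial_counts(rng, mean_props, n_reads)[0] for _ in range(n_samples)]
    dm = [multinomial_counts(rng, dirichlet(rng, alphas), n_reads)[0]
          for _ in range(n_samples)]
    return pvariance(mn), pvariance(dm)


var_multinomial, var_dirichlet_multinomial = simulate()
print("multinomial variance:          ", round(var_multinomial))
print("Dirichlet-multinomial variance:", round(var_dirichlet_multinomial))
```

A test that assumes multinomial variance when the data behave like the second model will understate sampling variability, which is the source of the inflated Type I error described above.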
Relative sequencing depth is an important factor to consider in both of these power calculations. A common technique employed when there are different numbers of sequencing reads between the samples in a dataset is to rarefy the data, that is, to normalize the number of reads across samples through random subsampling (29, 30). The level of subsampling is often set to the smallest number of reads present within the cohort of samples where the sample is considered a sequencing success. This subsampling removes useful and informative data, and has been deemed “statistically inadmissible” by McMurdie and Holmes (31).
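The cost of rarefying is easy to see in a toy example. The sketch below subsamples every sample, without replacement, to the depth of the shallowest sample and reports the fraction of reads thrown away. The sample names, taxon names, and counts are hypothetical, and real pipelines (29, 30) differ in detail.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a taxon-to-count mapping."""
    reads = [taxon for taxon, c in counts.items() for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    kept = rng.sample(reads, depth)
    out = {taxon: 0 for taxon in counts}
    for taxon in kept:
        out[taxon] += 1
    return out


# Hypothetical read counts for three samples of very different depth.
samples = {
    "s1": {"taxonA": 9000, "taxonB": 900, "taxonC": 100},
    "s2": {"taxonA": 1200, "taxonB": 700, "taxonC": 100},
    "s3": {"taxonA": 300, "taxonB": 150, "taxonC": 50},
}

rng = random.Random(42)
# Rarefy every sample to the depth of the shallowest one.
min_depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, min_depth, rng) for name, c in samples.items()}

total_reads = sum(sum(c.values()) for c in samples.values())
kept_reads = sum(sum(c.values()) for c in rarefied.values())
discarded_fraction = 1 - kept_reads / total_reads
print(f"rarefied depth: {min_depth}, reads discarded: {discarded_fraction:.0%}")
```

In this toy cohort the shallowest sample dictates the depth for all three, so most reads from the deeper samples never enter the analysis.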
When considering the statistical methods utilized in the power calculations above, the Dirichlet-multinomial distribution model relies on count data (27), Jaccard (23) and unweighted UniFrac (21) utilize presence/absence data, and weighted UniFrac (22) uses either count or percent data once the data have been rarefied. It has been previously observed that UniFrac distance has differing sensitivities depending on the number of sequencing reads (21), and this is particularly apparent when considering rare OTUs (32). This is because more rare taxa are detected with increasing sequencing reads, causing a sequencing-depth-dependent effect on the diversity metric (33). This depth-dependent effect is also present with other alpha- and beta-diversity measures (23, 32), but to a lesser extent than that observed with UniFrac. McMurdie and Holmes suggested this effect is “a failure of the implementation of these methods to properly account for rare species” and demonstrated it was better to utilize all of the data and “model the noise and address extra species using statistical normalization methods based on variance stabilization and robustification/filtering” (31). These variance stabilization methodologies have been implemented in R packages such as edgeR (34), metagenomeSeq (35), and DESeq2 (36). There is still work to be done developing and implementing these variance stabilization techniques so they are ubiquitous in microbiome analysis tools, but rarefaction of data should be avoided if at all possible. It is a poor substitute for appropriate statistical techniques and removes useful data from study, decreasing the power of the study.
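The depth dependence noted in this paragraph, where more rare taxa are detected as more reads are generated, can be reproduced with a short simulation. The sketch below assumes a hypothetical community of 200 taxa with a few abundant members and a long tail of rare ones, and counts how many distinct taxa appear in the first 100, 1,000, and 10,000 reads of a single simulated sequencing run. The abundance curve and function name are assumptions made only for illustration.

```python
import random


def observed_richness_by_depth(abundances, depths, seed=0):
    """Number of distinct taxa seen in the first d reads, for each depth d."""
    rng = random.Random(seed)
    taxa = list(range(len(abundances)))
    # One long run of reads; shallower depths are prefixes of the same run.
    reads = rng.choices(taxa, weights=abundances, k=max(depths))
    return {d: len(set(reads[:d])) for d in depths}


# Hypothetical community: abundance decays geometrically with taxon rank.
abundances = [0.95 ** rank for rank in range(200)]
depths = [100, 1000, 10000]
richness = observed_richness_by_depth(abundances, depths)
for d in depths:
    print(f"{d:>6} reads -> {richness[d]} taxa observed")
```

Any presence/absence metric computed on such data inherits this dependence on depth, which is one reason the choice of normalization matters for the distance measures discussed above.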
Community Classes
Different human body sites are composed of distinct microbial communities (9-13, 37-39) with a wide amount of variation between individuals (11, 39). Despite this diversity and variation, in many body sites the communities can be clustered into conserved groupings, known as community classes or biotypes (40). Two of the better-characterized body sites with biotypes are the vagina and the gut.
Previous studies have identified five biotypes of the vagina that are defined by the species or genera present: Lactobacillus iners, L. crispatus, L. gasseri, L. jensenii, or a heterogeneous community composed primarily of obligate anaerobes (12, 41). These bacterial communities are diverse and show frequent species turnover over time (41). Additionally, a shift from a vaginal biotype dominated by lactobacilli to a biotype dominated by obligate anaerobes has been associated with the development of bacterial vaginosis (42). Vaginal biotypes have also been associated with race (40).
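As a rough illustration of how a sample might be labeled with one of the five vaginal biotypes, the sketch below assigns a biotype from relative abundances using a simple dominance rule. This is a deliberate simplification: the cited studies derive biotypes by clustering community profiles, and the 50% dominance threshold, the function name, and the example profiles here are hypothetical.

```python
LACTOBACILLI = ["L. iners", "L. crispatus", "L. gasseri", "L. jensenii"]


def assign_vaginal_biotype(rel_abundance, dominance_threshold=0.5):
    """Label a sample by its dominant Lactobacillus species, if any.

    rel_abundance: mapping of taxon name to relative abundance (0 to 1)
    dominance_threshold: hypothetical cutoff, not taken from the cited studies
    """
    top = max(LACTOBACILLI, key=lambda sp: rel_abundance.get(sp, 0.0))
    if rel_abundance.get(top, 0.0) >= dominance_threshold:
        return f"{top}-dominated"
    return "diverse (primarily obligate anaerobes)"


# Hypothetical relative-abundance profiles.
profiles = {
    "subject_1": {"L. crispatus": 0.92, "L. iners": 0.05, "Gardnerella": 0.03},
    "subject_2": {"L. iners": 0.20, "Gardnerella": 0.45, "Prevotella": 0.35},
}
biotypes = {name: assign_vaginal_biotype(p) for name, p in profiles.items()}
for name, biotype in biotypes.items():
    print(name, "->", biotype)
```

Labels of this kind are what allow participants to be stratified by community class at the design or analysis stage.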
The community of bacteria present within the gut has been studied extensively, with many studies observing gut biotypes, referred to as enterotypes, based upon differing abundances of the genera Bacteroides, Ruminococcus, and Prevotella (43). While cohort size and the method of clustering have been found to impact the detection of these enterotypes (44), the Bacteroides and Prevotella enterotypes are often observed (45, 46) and these enterotypes appear to be independent of age, gender, and nationality (43). A fourth enterotype has been proposed that has very low Bacteroides and is driven by a diverse collection of Firmicutes (47), as well as an “enterogradient” made up of a continuum of Prevotella and Bacteroides with an assorted composition of Firmicutes. While the number and makeup of enterotypes have been debated in the literature, it is clear the gut microbiome can be grouped into clusters and these clusters are stable over time (46).
In addition to the vagina and stool, biotypes have been identified in a number of other body sites, including the anterior nares, antecubital fossa, retroauricular crease, hard palate, buccal mucosa, subgingival plaque, supragingival plaque, throat, palatine tonsils, and saliva (40). A number of these biotypes have been associated with gender, age, and race. These community classes can be used to stratify individuals within a study, allowing for more focused study design, creation of more specific inclusion/exclusion criteria, and enrollment of participants specifically of interest, all of which help focus a study and increase the power.
Reproducibility and Power
In the short history of the study of the microbiome, there have been many compelling theories of how the microbiome modulates health, few more compelling than the theory that there is a causal link between obesity and the gut microbiome. Prompted by the observation that obesity can be provoked in lean mice by fecal transfer from an obese donor (48, 49), many researchers began investigating the link in both humans and mice (38, 50). While the mechanism was not understood, the theory was that obese individuals have a lower proportion of Bacteroides in comparison to the proportion of Firmicutes than that observed in lean individuals. Larger studies aiming to reproduce the findings did not observe this association between obesity and the ratio of Bacteroides and Firmicutes (13, 43), and a separate study observed a higher proportion of Bacteroides compared to Firmicutes in obese individuals (51).
These contradictory findings led Finucane et al. to conduct an assessment of the relationship between the gut microbiome and obesity through the use of the HMP dataset and compare the results to the MetaHIT dataset (52). Finucane et al. found no association between the “Bacteroidetes:Firmicutes ratio” and obesity or BMI, and further indicated they “had 96% power to detect a difference in the relative abundance of Bacteroidetes and 80% power for Firmicutes” when comparing obese to lean individuals. The researchers also assessed for any possible sampling or measurement error and discussed the possibility of unmeasured confounders affecting the relationship between obesity and the microbiome (52). This study stands as a good example of a rigorous assessment of a finding from an underpowered study, and as such should be seen as an example that needs to be replicated for many other assessments of the relationship between the microbiome and human health.
Conclusion
There are many inherent complexities in how the microbiome interacts with the host, environment, and agent. Often, community characteristics (the makeup of the microbial community), the co-occurrence of a number of microbes within a community, or the specific abundance of a microbe or group of microbes are of interest. The challenge is associating those complex community dynamics with human health. Epidemiologists are well poised to contribute to the growing body of microbiome research through the implementation of well-designed studies that are powered to test the hypothesis of interest. There is still work to be done in characterizing microbiome development, ecological stability, and variation, and there is a great need for further development of statistical methodologies for dealing with large-scale epidemiologic data. There is a great deal of promise, however, in the study of the microbiome and in the potential utilization of the microbiome for new diagnostic, preventative, and therapeutic technologies.
Works Cited
1. Lederberg J. Infectious history. Science (New York, NY). 2000;288(5464):287-93.
2. Lederberg J. 'Ome Sweet 'Omics-- A Genealogical Treasury of Words. The Scientist. 2001.
3. Ley RE, Peterson DA, Gordon JI. Ecological and Evolutionary Forces Shaping Microbial Diversity in the Human Intestine. Cell. 2006;124(4):837-48.
4. Savage DC. Microbial ecology of the gastrointestinal tract. Annual review of microbiology. 1977;31:107-33.
5. Reid A, Greene S. FAQ: Human Microbiome. Washington, DC 20036: American Academy of Microbiology, 2014.
6. Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, et al. Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes. Human Molecular Genetics. 2014;23(22):5866-78.
7. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome research. 2012;22(9):1760-74.
8. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59-65.
9. Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, et al. The human oral microbiome. Journal of bacteriology. 2010;192(19):5002-17.
10. Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, et al. Diversity of the human intestinal microbial flora. Science (New York, NY). 2005;308(5728):1635-8.
11. Grice EA, Kong HH, Conlan S, Deming CB, Davis J, Young AC, et al. Topographical and temporal diversity of the human skin microbiome. Science (New York, NY). 2009;324(5931):1190-2.
12. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SS, McCulle SL, et al. Vaginal microbiome of reproductive-age women. Proceedings of the National Academy of Sciences of the United States of America. 2011;108 Suppl 1:4680-7.
13. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207-14.
14. Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, et al. Evolution of mammals and their gut microbes. Science (New York, NY). 2008;320(5883):1647-51.
15. Foxman B, Rosenthal M. Implications of the human microbiome project for epidemiology. American journal of epidemiology. 2013;177(3):197-201.
16. Li H. Systems biology approaches to epidemiological studies of complex diseases. Wiley interdisciplinary reviews Systems biology and medicine. 2013;5(6):677-86.
17. Alliance NBD. Workshop I - Report. 2012.
18. Morgan XC, Huttenhower C. Chapter 12: Human Microbiome Analysis. PLoS computational biology. 2012;8(12):e1002808.
19. Kuczynski J, Costello EK, Nemergut DR, Zaneveld J, Lauber CL, Knights D, et al. Direct sequencing of the human microbiome readily reveals community differences. Genome biology. 2010;11(5):210.
20. Goodrich JK, Di Rienzi SC, Poole AC, Koren O, Walters WA, Caporaso JG, et al. Conducting a microbiome study. Cell. 2014;158(2):250-62.
21. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228-35.
22. Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. Applied and environmental microbiology. 2007;73(5):1576-85.
23. Chao A, Chazdon RL, Colwell RK, Shen TJ. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics. 2006;62(2):361-71.
24. McArdle BH, Anderson MJ. Fitting Multivariate Models to Community Data: A Comment on Distance-Based Redundancy Analysis. Ecology. 2001;82(1):290-7.
25. Kelly BJ, Gross R, Bittinger K, Sherrill-Mix S, Lewis JD, Collman RG, et al. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics (Oxford, England). 2015;31(15):2461-8.
26. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2015.
27. La Rosa PS, Brooks JP, Deych E, Boone EL, Edwards DJ, Wang Q, et al. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PloS one. 2012;7(12):e52078.
28. Zhou Y, Gao H, Mihindukulasuriya KA, La Rosa PS, Wylie KM, Vishnivetskaya T, et al. Biogeography of the ecosystems of the healthy human body. Genome biology. 2013;14(1):R1.
29. Hughes JB, Hellmann JJ. The application of rarefaction techniques to molecular inventories of microbial diversity. Methods in enzymology. 2005;397:292-308.
30. Navas-Molina JA, Peralta-Sanchez JM, Gonzalez A, McMurdie PJ, Vazquez-Baeza Y, Xu Z, et al. Advancing our understanding of the human microbiome using QIIME. Methods in enzymology. 2013;531:371-444.
31. McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS computational biology. 2014;10(4):e1003531.
32. Lozupone C, Lladser ME, Knights D, Stombaugh J, Knight R. UniFrac: an effective distance metric for microbial community comparison. The ISME journal. 2011;5(2):169-72.
33. Schloss PD. Evaluating different approaches that test whether microbial communities have the same structure. The ISME journal. 2008;2(3):265-75.
34. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England). 2010;26(1):139-40.
35. Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nature methods. 2013;10(12):1200-2.
36. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology. 2014;15(12):550.
37. Fierer N, Hamady M, Lauber CL, Knight R. The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proceedings of the National Academy of Sciences of the United States of America. 2008;105(46):17994-9.
38. Ley RE, Turnbaugh PJ, Klein S, Gordon JI. Microbial ecology: human gut microbes associated with obesity. Nature. 2006;444(7122):1022-3.
39. Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. Bacterial community variation in human body habitats across space and time. Science (New York, NY). 2009;326(5960):1694-7.
40. Zhou Y, Mihindukulasuriya KA, Gao H, La Rosa PS, Wylie KM, Martin JC, et al. Exploration of bacterial community classes in major human habitats. Genome biology. 2014;15(5):R66.
41. Gajer P, Brotman RM, Bai G, Sakamoto J, Schutte UM, Zhong X, et al. Temporal dynamics of the human vaginal microbiota. Science translational medicine. 2012;4(132):132ra52.
42. Shipitsyna E, Roos A, Datcu R, Hallen A, Fredlund H, Jensen JS, et al. Composition of the vaginal microbiota in women of reproductive age--sensitive and specific molecular diagnosis of bacterial vaginosis is possible? PloS one. 2013;8(4):e60670.
43. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011;473(7346):174-80.
44. Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Knight R, et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS computational biology. 2013;9(1):e1002863.
45. Roager HM, Licht TR, Poulsen SK, Larsen TM, Bahl MI. Microbial enterotypes, inferred by the prevotella-to-bacteroides ratio, remained stable during a 6-month randomized controlled diet intervention with the new nordic diet. Applied and environmental microbiology. 2014;80(3):1142-9.
46. Wu GD, Chen J, Hoffmann C, Bittinger K, Chen YY, Keilbaugh SA, et al. Linking long-term dietary patterns with gut microbial enterotypes. Science (New York, NY). 2011;334(6052):105-8.
47. Ding T, Schloss PD. Dynamics and associations of microbial community types across the human body. Nature. 2014.
48. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444(7122):1027-31.
49. Turnbaugh PJ, Backhed F, Fulton L, Gordon JI. Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell host & microbe. 2008;3(4):213-23.
50. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457(7228):480-4.
51. Ley RE. Obesity and the human microbiome. Current opinion in gastroenterology. 2010;26(1):5-11.
52. Finucane MM, Sharpton TJ, Laurent TJ, Pollard KS. A Taxonomic Signature of Obesity in the Microbiome? Getting to the Guts of the Matter. PloS one. 2014;9(1).
Figure 1: An epidemiologic triangle incorporating the microbiome as a fourth important and distinct node. Adapted from Foxman and Rosenthal, 2013 (15).
Table 1: List of programs and packages with their associated websites and citations.
Program/Package   Website                                                                   Citation
R                 https://www.r-project.org                                                 (26)
micropower        https://github.com/brendankelly                                           (25)
HMP               https://cran.r-project.org/web/packages/HMP/index.html                    (27)
edgeR             https://bioconductor.org/packages/release/bioc/html/edgeR.html            (34)
metagenomeSeq     https://bioconductor.org/packages/release/bioc/html/metagenomeSeq.html    (35)
DESeq2            https://bioconductor.org/packages/release/bioc/html/DESeq2.html           (36)