Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock

Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock

Accepted Manuscript Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock B. Av...

902KB Sizes 0 Downloads 44 Views

Accepted Manuscript

Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock B. Avila-Jaime , J.R. Kawas , J.F. Garcia-Mazcorro PII: DOI: Reference:

S1871-1413(18)30115-X 10.1016/j.livsci.2018.04.017 LIVSCI 3447

To appear in:

Livestock Science

Received date: Revised date: Accepted date:

1 December 2017 26 April 2018 29 April 2018

Please cite this article as: B. Avila-Jaime , J.R. Kawas , J.F. Garcia-Mazcorro , Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock , Livestock Science (2018), doi: 10.1016/j.livsci.2018.04.017

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Prediction of functional metagenomic composition using archived 16S rDNA sequence data from the gut microbiota of livestock B. Avila-Jaimea, J.R. Kawasb, J.F. Garcia-Mazcorroa,c,* a

Research and Development, MNA de Mexico, San Nicolas de los Garza, Nuevo Leon,

Mexico Faculty of Agronomy, Universidad Autonoma de Nuevo Leon, General Escobedo,

CR IP T

b

Nuevo Leon, Mexico c

Faculty of Veterinary Medicine, Universidad Autonoma de Nuevo Leon, General

AN US

Escobedo, Nuevo Leon, Mexico

*Corresponding author. Email address: [email protected] (J.F. Garcia-

M

Mazcorro)

To be submitted to Livestock Science as a Position paper:



Archived genetic sequence data is a valuable resource for livestock science. Functional bacterial metagenomes vary throughout the gut of cows and lambs.

AC



CE

Highlights



PT

ED

https://www.elsevier.com/journals/livestock-science/1871-1413/guide-for-authors

Feed efficiency may not be related with different functional bacterial metagenomes.

ACCEPTED MANUSCRIPT Abstract The volume of microbial genetic terabases in public repositories is getting infamous, yet much of this data has been used only once. On top of this global concern, there is still much to be done to fully understand the nature and function of microbial ecosystems, for example the gastrointestinal (GI) tract. Here we used archived 16S rDNA sequence

CR IP T

data to predict the metabolic profile of the GI microbiota from livestock using Phylogenetic Investigation of Communities by Reconstruction of Unobserved States

(PICRUSt). Study 1 (cows’ study) comprised samples of both tissues and contents of rumen, jejunum, ileum, cecum, and colon. There was no difference in any predicted

AN US

metagenomic feature between tissues and contents but interesting patterns were observed among the different sections of the gut (e.g. genes involved in protein

digestion and absorption were lower in the ileum and the jejunum). Study 2 (lambs’

M

study) comprised samples of rumen, duodenum, jejunum, ileum, colon and feces. The results revealed that the GI microbiota in efficient and inefficient lambs (as determined

ED

by Residual Feed Intake) did not display a distinctive metabolic profile for any anatomical region; however, over 200 features were significantly different (adjusted P <

PT

0.05) among different locations of the digestive tract, thus strongly supporting the well-

CE

known notion of distinctive metabolic profiles of the microbiota depending on the location where they inhabit. These results were confirmed using another tool to predict

AC

metagenomic content (Piphillin). It is our hope that this study encourages others to visualize the potential of using public archived DNA data for the benefit of livestock science. Moreover, this study adds valuable information for understanding the impact of the gut microbiota on livestock health and performance.

Keywords: microbiota; metagenome; 16S rRNA gene; livestock; feed efficiency; bioinformatics

ACCEPTED MANUSCRIPT Introduction There are two key thought-provoking matters to be addressed in this position paper. First, the growing amount of sequence data in public repositories is getting much bigger and much faster than anyone could have ever anticipated. For us the scientific community, the growing amount of data is as worrying as the fact that most of this data

CR IP T

has been used only once. This is a global concern because of the huge investment of time and other resources to produce and analyze the data as well as the resources used in storing the data. Also, a growing number of scientists are acknowledging the potential of existing (archived) data to help discover new biological phenomena and elucidate

AN US

novel molecular mechanisms (Plocik and Graveley, 2013).

The second matter we would like to address here is the need for more multidisciplinary efforts to understand the impact of the gut microbiota in livestock. The GI tract of

M

ruminants and other livestock is permanently colonized by complex microbial communities (the gut microbiota) that can affect health and performance of the animal

ED

host. Bacteria, Eukarya, Archaea as well as viruses all interact with each other and with the animal host throughout their whole productive life (Berg-Miller et al., 2011; Rudi et

PT

al., 2012; Jami et al., 2013; Kittelmann et al., 2013; Ross et al., 2013). Despite massive

CE

efforts and more technology to understand the gut microbial ecosystem, the association of the gut microbiota with the animal’s intestinal physiology and health as well as the

AC

role of host genetics is still considered a large “black box” (Malmuthuge and Guan, 2017), and the scenario is not better off for studies of the human gut microbiota (Koskella et al., 2017). A growing number of studies have helped define the taxonomic membership and diversity profile of gut microbial communities in several livestock species (Chaucheyras-Durand & Ossa, 2014; Weimer 2015). More importantly, metagenomic

ACCEPTED MANUSCRIPT and metatranscriptomic studies are offering vital clues about microbial metabolic function in ruminants (Hess et al., 2011; Li et al., 2017; Wolff et al., 2017). These studies have provided a strong foundation for understanding several key metabolic processes in livestock, such as feed digestion and methane production (Attwood et al., 2011). Most of the studies on host-associated microbiomes in livestock, however, have

CR IP T

the disadvantage of looking only at the microbial (mostly bacterial) membership composition; in most cases, the metabolic profile of such communities is not determined. The reasons for this are various but whole-genome sequencing,

metatranscriptomics and metabolomics remain prohibitive for many budgets and are

AN US

more computationally demanding (Hess et al., 2011; Stewart et al., 2018). Moreover,

the study of the function of complex microbial ecosystems are technically challenging and often confined to the transcripts or the metabolic products from the most abundant

M

microbial groups only (Aguiar-Pulido et al., 2016).

PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of

ED

Unobserved States) was made available by Langille et al. in 2013 in response to a growing number of researchers aiming to determine the functional roles of microbial

PT

communities in different ecosystems without the need of sequencing full genomes or

CE

explore intricate omics data. PICRUSt is a tool that is especially useful in areas of the world where research funding is scarce because it only requires 16S rRNA gene (rDNA)

AC

sequences (as opposed to the more expensive whole genomes or other omics technologies). Although other tools have been developed that can also predict the metabolic profile based on 16S rDNA data (e.g. Tax4Fun, Aßhauer et al., 2015; Piphillin, Iwai et al., 2016), PICRUSt remains the most widely used tool for this purpose. The main objective of this paper was to use PICRUSt to predict metabolic profiles of bacterial communities residing along the GI tract of livestock. This work was

ACCEPTED MANUSCRIPT done using publicly available data in an effort to persuade others to start considering this resource for scientific endeavors. Material and methods This current study used freely-available 16S rDNA sequence data from the Sequence Read Archive (SRA, Leinonen et al., 2011) at the NCBI from two different studies in

CR IP T

livestock, one in cows (study 1) and one in lambs (study 2) (Table 1). These two studies are interesting for livestock scientists because the authors sampled different regions throughout the GI tract. Study 1 (cows’ study)

AN US

Three-week-old Holstein preweaned bull calves (n=8) were used by Malmuthuge et al. (2014) from a previous study of their own research group (Malmuthuge et al., 2012). The authors sampled both tissues and contents from 5 different anatomical regions,

M

namely the rumen, jejunum, ileum, cecum, and colon from all animals within 20 minutes after euthanization. DNA extraction was performed on all 40 samples (8 calves,

ED

5 anatomical regions) and then subject to amplification using PCR using primers (5’-3’) 27F (GAGGTTTGATCCTGGCTCAG) and 338R (TGCTGCCTCCCGTAGGAGT),

PT

which yields a ~300 nucleotides amplicon. Accordingly to the authors, these primers

CE

can amplify the V1-V3 regions, but others have mentioned that the primers can amplify the V1-V2 regions (Donia et al., 2011). Indeed, accordingly to Baker et al. (2003), these

AC

primers do not reach the V3 region. The reverse primer contained 1 of 10 barcode identifiers and was used to amplify samples from the 5 anatomical regions for mucosal and digesta separately. After PCR reactions, the amplicons from all animals were pooled and sequenced using pyrosequencing. The resulting 16S rDNA data from the 10 pooled samples (5 anatomical regions, both contents and tissues) is freely available at the SRA (SRR1036467-SRR1036476).

ACCEPTED MANUSCRIPT A total of 10 sra files, two files (i.e. tissues and contents) for each portion of the digestive tract, were downloaded from the SRA and converted to fastq files using the fastq-dump tool (SRA Toolkit). The convert_fastaqual_fastq.py script in QIIME (Caporaso et al., 2010) was used to convert individual fastq files in fasta and qual files. Each fasta file contained various numbers of 16S sequences, each with varying numbers

CR IP T

of nucleotides (median=334 nucleotides, minimum 28, maximum 710). The split libraries step was not necessary because the files were already demultiplexed. The

QIIME script add_qiime_labels.py was used for combining individual fasta files. The read orientation of the sequences had to be changed by calculating the reverse

AN US

complement of the sequences because the reverse primer contained the barcode

identifiers (Malmuthuge et al., 2014). After changing the read orientation, the chop.seqs command in Mothur (Schloss et al., 2009) was used to trim the sequences to obtain a

M

fasta file containing only 300 nucleotide-long sequences (this was done based on the assumption that the primers would yield 300 nucleotide amplicons). The

ED

pick_closed_reference_otus.py script in QIIME was used for picking Operational Taxonomic Units (OTUs) against the GreenGenes (DeSantis et al., 2006) v13.5

PT

reference sequence file as currently requested by PICRUSt (see below). Unlike other

CE

OTU picking methods, this “closed” approach does not allow the generation of new clusters of OTUs.

AC

Study 2 (lambs’ study) Four-month-old crossbred wethers (n=65) were provided a 2-week dietary acclimation period, and then a 47-days residual feed intake (RFI) feeding trial after vaccination for enterotoxaemia. Samples of the rumen, duodenum, jejunum, ileum, colon and feces were collected from a total of 12 selected lambs (6 efficient and 6 inefficient based on RFI) within 30 minutes of harvesting. DNA extraction was performed on all 72 samples

ACCEPTED MANUSCRIPT (12 lambs, 6 anatomical regions) and then subject to PCR amplification using primers 341F (CCTACGGGAGGCAGCAG) and 806R (GGACTACHVGGGTWTCTAAT) that can amplify a region of ~465 nucleotides encompassing the V3-V4 region. The PCR amplicons were then used for 16S sequencing analysis using Illumina MiSeq (Perea et al., 2017). This data set comprises a total of 72 samples and are freely

CR IP T

available at the SRA (Bioproject PRJNA354152). A total of 72 fastq files, corresponding to 6 anatomical sites from 12 lambs, were

downloaded from the SRA using the fastq-dump tool (SRA Toolkit). The fastq files

were converted to fasta and qual files using QIIME as described above. Each fasta file

AN US

from study 2 contained a set of 602 nucleotide-long sequences without barcodes or linker sequences. Although the authors mentioned they assembled the forward and reverse reads, examination of the sequence data set from the SRA revealed that,

M

apparently, the sequences were uploaded before assembly. This is supported by the fact that the V3-V4 region is about 465 nucleotides long and the public sequences were 602

ED

nucleotides long. Therefore, we used the chop.seqs command in Mothur (Schloss et al., 2009) to split the sequences by half, resulting in two separate files each containing 301

PT

nucleotides. Indeed, the pick_closed_reference_otus.py script in QIIME confirmed our

CE

suspicions (see “Bioinformatics on data from lambs’ study” in Supplementary Information for more on this). The closed approach described above to pick OTUs in

AC

the cows’ study was also used for the samples of the lambs’ study against the same GreenGenes reference dataset. Metagenome prediction using PICRUSt The OTU tables from the closed_reference QIIME script described above were normalized by dividing the abundance of each organism by its predicted 16S copy number (Langille et al., 2013) and used to predict metagenome functional content using

ACCEPTED MANUSCRIPT the online Galaxy version of PICRUSt v1.1.1 (http://galaxy.morganlangille.com/). PICRUSt uses existing annotations of gene content as well as 16S copy numbers from reference microbial genomes in the Integrated Microbial Genomes (IMG) database (Markowitz et al., 2012) v3.5 (2,590 genomes) and a functional classification scheme to catalogue the predicted metagenome content. The current PICRUSt v1.1.1 supports two

CR IP T

types of functional predictions; this current study used the popular KEGG Orthologs (Kanehisa et al., 2012). STAMP (Parks et al., 2010) was used to visualize and analyze the PICRUSt data. Metagenome prediction using Piphillin

AN US

Similarly to PICRUSt, Piphillin is an algorithm that can also help predict functional metagenomes and has been shown to perform better than PICRUSt and Tax4Fun in some scenarios (Iwai et al., 2016), in part due to the use of a more updated IMG

M

database. However, only small files (<10 MB for the reference sequence file) can be uploaded in the current web user interface of Piphillin, which obligates users to select

ED

subsamples for analysis. Despite this pitfall, we also used Piphillin to predict the metagenomic content of data from study 2 (lambs’ study) to confirm the PICRUSt

PT

results. Piphillin allows choosing a more updated version of KEGG (May 2017), which

CE

was used in this study instead of BioCyc to allow more meaningful comparisons with PICRUSt.

AC

Statistical analysis

For the cows’ study (study 1), a White’s non parametric t-test in STAMP was used to compare the relative abundance of PICRUSt features between tissue (n=5) and content (n=5) samples (in study 1 we did not compare features among different anatomical regions because the pooling of samples from different animals yielded n=2 for each region). For the lambs’ study (study 2), we used the Welch’s t-test also in STAMP to

ACCEPTED MANUSCRIPT compare efficient and inefficient lambs regardless of anatomical region (n=36 for each group), the White’s non-parametric t-test for comparison of efficient and inefficient animals for each anatomical region (n=6 in each group), and ANOVA to compare the features among all different anatomical regions regardless of efficiency phenotype (n=12 for each group). The Benjamini-Hochber multiple test correction in STAMP was

CR IP T

used to avoid type 1 errors. An adjusted P < 0.05 was used to reject null hypotheses. Additionally, we constructed a general linear model in SAS University Edition (yijt=μ + γi + θj + λij +eijt) where i= represents the anatomical region, j= represents the efficiency group, and t=represents each lamb; yijt: relative proportion of each predicted feature (for

AN US

anatomical region i from efficiency group j from lamb t); γi: fixed main effect of

anatomical region; θj: fixed main effect of efficiency group; λit: fixed interaction effects of anatomical region i with efficiency group j; and eijt: error term for region i, efficiency

M

j and lamb t, mainly to investigate any possible interaction between anatomical region and efficiency group in the variability of each predicted feature. The results from

ED

Piphillin were only used to confirm the PICRUSt results of study 2 (lambs’ study). Results

PT

Study 1 (cows’ study)

CE

A total of 2,793 different OTUs (97% similarity) were detected among all 10 pooled samples when considering the first 300 nucleotides from each sequence. This number is

AC

much lower compared to the 15,771 OTUs originally published by Malmuthuge et al. (2014), likely because of the OTU clustering algorithm used. In order to contribute to answering this question, we also 1) used all sequences (not only the ones with at least 300 nucleotides) to assign OTUs and were able to detect a total of 9,729 OTUs, and 2) used a new open reference algorithm developed by Rideout et al. (2014) on all sequences and obtained a total 18,190 different OTUs in this current study. Based on

ACCEPTED MANUSCRIPT these additional analyses, we hypothesize that the 15,771 OTUs discovered by Malmuthuge et al. (2014) may be explained by the OTU picking approach used by the authors. Regardless, the taxonomic identities of the bacterial communities in this study were nearly identical to the results published by Malmuthuge et al. (2014) (results not shown).

CR IP T

The original PICRUSt v1.0.0 yielded information about 328 categories of genes organized in 7 main Level 1 categories, based on the KEGG (Kanehisa et al., 2012). However, 27 features were catalogued as “Unclassified” and therefore they are not

provided in the new PICRUSt version 1.1.1. Thus, the new v.1.1.1 of PICRUSt now

AN US

yields information from a total of 301 features for analysis. In this current study, we only detected 262 of these categories for the cows’ study (39 categories were not detected in any of the samples of study 1). The e-only Supplementary file

M

Results_cows.xlsx shows the relative abundance and associated P values. There was no statistical difference when comparing lumen vs. contents (corrected P > 0.17). Although

ED

we could not perform a statistical comparison among the different anatomical segments (n=2 for each segment of the GI tract), there was an interesting pattern of several gene

PT

categories among the different compartments (Fig. 1). For example, genes involved in

CE

protein digestion and absorption as well as energy metabolism were lower in the ileum and the jejunum, which is interesting because it is in these two sections where most of

AC

the absorption of nutrients take place. Study 2 (lambs’ study) A total of 3,961 different OTUs were detected (unfortunately we cannot compare this number with the original publication because the authors did not publish the total number of OTUs detected using their approach). As mentioned above, the new version of PICRUSt yields a total of 301 features for analysis. A total of 55 features were not

ACCEPTED MANUSCRIPT detected in any sample from the lambs’ study, thus leaving us with a new total of 273 features for analysis. Additionally, at least one sample showed an abundance of 0 in a total of 19 predicted features (in red in e-only Supplementary Results_lambs.xlsx), thus leaving us with 254 features for analysis. The Welch’s t-test did not reveal any difference between efficient and inefficient when

CR IP T

comparing all anatomical regions at once or each region separately. Additional PICRUSt analyses on specific taxa (i.e. orders Clostridiales and Bacteroidales, the

family Ruminococcaceae and the genus Fibrobacter) also did not reveal any difference between efficient and inefficient lambs (see “PICRUSt analyses on separate taxa” in

AN US

Supplementary Information). Interestingly, and unexpectedly, ANOVA revealed a total of 219 features that were significantly different (corrected P < 0.05; 199 features significantly different at corrected P < 0.01) among the anatomical regions (please note

M

that this analysis included both efficient and inefficient lambs regardless of any potential difference between the two). In our experience this high number of significant

ED

features is not due to stochastic variations; other studies from our research group have shown that PICRUSt sometimes yields no differences even in scenarios when one

PT

would expect to find many differences (Garcia-Mazcorro et al., 2017). Some of the most

CE

significant features (i.e. features with the lowest P values) that showed statistical difference were genes associated with transcription machinery, the ubiquitin system,

AC

penicillin and cephalosporin biosynthesis, epithelial cell signaling in Helicobacter pylori infection, pores ion channels, protein digestion and absorption, membrane and intracellular structural molecules, and vitamin B6 metabolism, revealing very interesting patterns among the different segments of the gut (Fig. 2), especially when compared to the results in study 1 (Fig. 1). Please note that although some features may apparently not be related with bacteria at all, some other features may open up new

ACCEPTED MANUSCRIPT venues for research. For instance, for over 30 decades ubiquitin or ubiquitin like proteins were thought to be exclusive to eukaryotes (Pearce et al., 2008) but we know now a great deal of the role of ubiquitination for example during bacterial infections (Bhaskarla et al., 2017). In this study it is intriguing that bacteria from colon and feces harbor more genes related to the ubiquitin system compared to the more proximal gut

CR IP T

sections (Fig. 2). Also, both ileum and jejunum showed lower abundances of genes related to protein digestion and absorption in a similar fashion than the cows’ study, and the rumen seems to harbor more genes related to energy metabolism. The e-only

Supplementary file Results_lambs.xlxs shows summary statistics of all 273 features

AN US

with associated original and adjusted P values that may be of interest for the reader.

Additional analyses on SAS using PICRUSt data from the full OTU tables revealed a significant Anatomical Region-Efficiency group interaction for carotenoid biosynthesis

M

(P = 0.0124) and mineral absorption (P = 0.0205), thus strongly supporting the hypothesis that the differences in predicted features between animals with different

ED

efficiency phenotypes may not be same for all segments of the gut. Overall, the predictions using Piphillin confirmed the PICRUSt results with regards to

PT

the lack of significant differences between the efficiency groups, and the high number

CE

of features that showed statistical significance among the different anatomical regions (see “Metagenome prediction using Piphillin” in Supplementary Information).

AC

Discussion

Currently there is a considerable amount of useful genomic data in public repositories. This paper argues that this archived data is a valuable resource that can be used for the benefit of livestock science. Here we used public 16S rDNA data from the bacterial communities inhabiting different sections of the GI tract of cows and lambs to predict metabolic profiles using PICRUSt.

ACCEPTED MANUSCRIPT Growing evidence shows that the study of the gut microbiota is of uttermost importance to optimize domestic livestock operations (Myer et al., 2017), yet there is currently much more information on gut microbial composition (answering the question of: who lives there?) compared to their functional profile (what do they do?) (Deusch et al., 2014). The reasons for this are various but metatranscriptomics and metabolomics are

CR IP T

still too expensive for some laboratories and do require a more complex computational approach. Tools like PICRUSt and others have helped the scientific community to investigate the presence or absence of certain functional profiles using only the

information in the 16S rDNA. Despite a growing community of users of such predictive

AN US

tools in several areas of microbial ecology (especially on the human microbiome), very few papers have used this tool in livestock science. Liu et al. (2016) used PICRUSt to reveal that different forage types are associated with significant differences in 21

M

metabolic pathways in cannulated Holstein cows. Also using PICRUSt, Zhang et al. (2017) showed that goats in a high-grain diet (65% grain) had a higher relative

ED

abundance of gene families related to energy metabolism and other features, and Gomez et al. (2017) discovered that diarrheic calves had decreased abundances of genes

PT

responsible for metabolism of amino acids, carbohydrates and vitamins in feces

CE

compared to healthy calves.

At the taxonomy level, each environment possess a remarkably unique microbial

AC

composition, even at the strain level (The Human Microbiome Project Consortium, 2012). Surprisingly, the comparisons across different environments have suggested that highly divergent communities share the same proportions of functional metabolic potential, suggesting a phenomenon of functional redundancy (i.e. overlap of function among multiple species, The Human Microbiome Project Consortium, 2012; Weimer, 2015). Ross et al. (2013) also found strong similarities at the functional level among

ACCEPTED MANUSCRIPT highly divergent ruminal viral samples, thus suggesting that metabolic redundancy may be a universal phenomenon. The analyses of functional potential, however, are performed using data from all microbial communities where some organisms have low abundance of one metabolic feature while others will have higher abundance of the same feature, thus masking potentially relevant differences for each taxa separately. We

CR IP T

have recently shown that the analysis of the functional profile for specific taxa (i.e. using filtered OTU tables) is useful to show the contribution of different groups to the overall functional profile (Garcia-Mazcorro et al., 2017; Garcia-Mazcorro et al., 2018, see “Functional redundancy” in Supplementary Information for more thoughts on this).

AN US

Interestingly, in this current paper we showed that the analysis of all OTUs and selected OTUs did not yield any difference between efficient and inefficient lambs either for all anatomical regions at once or for each region separately. Although here it is not our

M

purpose to investigate the nature or the generality of functional similarities and we acknowledge the efforts in this area (Fetzer et al., 2015; Taxis et al., 2015; Delgado-

ED

Baquerizo et al., 2016; Beier et al., 2017; Wolff et al., 2017; Eng and Borenstein, 2018), we believe that the functional predictive analysis of all taxa at once may overemphasize

PT

the functional similarities among whole communities in some scenarios.

CE

Any tool that predicts functional profiles (including PICRUSt) suffers from the disadvantage that the presence of a gene or a set of genes does not inform well about the

AC

expression of the genes. This is particularly important in livestock science because we have evidence that the differences in microbial profiles based on efficiency phenotypes is accompanied by differences in active microbial metabolic functions (Li et al., 2017) and metabolomics profile (Artegoitia et al., 2017), yet these important phenomena cannot be studied using predictors of functional composition. More importantly, it has been shown that identical environmental conditions can induce different solutions in the

ACCEPTED MANUSCRIPT metabolic networks of the microorganisms that inhabit them (Pál et al., 2006; Matias Rodrigues and Wagner, 2009). Both these observations and the fact that individual microbial strains can differ greatly in gene content are important limitations of PICRUSt and other predictive tools. Finally, taxonomy may not be a good predictor of ecosystem function (Doolittle and Booth, 2017) and ruminal genomes are underrepresented in

CR IP T

public databases (Stewart et al., 2018). In summary, this study adds valuable information to the current literature about the

metabolic profile of the GI microbiota in ruminant livestock. More studies are needed to study the relationship of the gut microbiota functional profile and health and diseases

AN US

processes, a topic of great interest to producers (Chen et al., 2012; González et al.,

2012). This study will also hopefully convince others of the potential of using archived genetic sequence data for the benefit of livestock science.

M

Acknowledgements

We thank the Institute of Innovation and Technology Transfer (i2t2.org.mx) from the

ED

State of Nuevo Leon, Mexico, for financial support to JFGM. BAJ and JFGM are

AC

CE

PT

employees of MNA de Mexico.

ACCEPTED MANUSCRIPT

References Aßhauer, K.P., Wemheuer, B., Daniel, R., Meinicke, P., 2015. Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31(17), 2882-2884.

CR IP T

Aguiar-Pulido, V., Huang, W., Suarez-Ulloa, V., Cickovski, T., Mathee, K., Narasimhan, G., 2016. Metagenomics, metatranscriptomics, and metabolomics approaches for microbiome analysis. Evol. Bioinform. 12(S1), 5-16.

Artegoitia, V.M., Foote, A.P., Lewis, R.M., Freetly, H.C., 2017. Rumen fluid

AN US

metabolomics analysis associated with feed efficiency on crossbred steers. Sci. Rep. 7, 2864.

Attwood, G.T., Altermann, E., Kelly, W.J., Leahy, S.C., Zhang, L., Morrison, M., 2011.

M

Exploring rumen methanogen genomes to identify targets for methane mitigation strategies. Anim. Feed Sci. Tech. 166-167, 65-75.

ED

Baker, G.C., Smith, J.J., Cowan, D.A., 2003. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Meth. 55, 541-555.

PT

Beier, S., Shen, D., Schott, T., Jürgens, K. 2017. Metatranscriptomic data reveal the

CE

effect of different community properties on multifunctional redundancy. Mol. Ecol. 26(24), 6813-6826.

AC

Berg Miller, M.E., Yeoman, C.J., Chia, N., Tringe, S.G., Angly, F.E., Edwards, R.A., Flint, H.J., Lamed, R., Bayer, E.A., White, B.A., 2011. Phage-bacteria relationship and CRISPR elements revealed by a metagenomic survey of the rumen microbiome. Environ. Microbiol. 14, 207-227. Bhaskarla, C., Bhosale, M., Banerjee, P., Chandra, N., Nandi, D., 2017. Protein tagging, destruction and infection. Curr. Protein Pept. Sci. 19(2), 155-171.

ACCEPTED MANUSCRIPT Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., Fierer, N., Gonzalez Peña, A., Goodrich, J.K., Gordon, J.I., Huttley, G.A., Kelley, S.T., Knights, D., Koenig, J.E., Ley, R.E., Lozupone, C.A., McDonald, D., Muegge, B.D., Pirrung, M., Reeder, J., Sevinsky, J.R., Turnbaugh, P.J., Walters, W.A., Widmann, J., Yatsunenko, T., Zaneveld, J., Knight, R., 2010.

CR IP T

QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335-336.

Chaucheyras-Durand, F., Ossa, F., 2014. Review: the rumen microbiome: composition, abundance, diversity, and new investigative tools. J. Prof. Scientist. 30, 1-12.

AN US

Chen, Y., Masahito, O, Guan, L.L., 2012. Variation of bacterial communities and expression of Toll-like receptor genes in the rumen of steers differing in susceptibility to subacute ruminal acidosis. Vet. Microbiol. 159, 451,459.

M

Delgado-Baquerizo, M., Giaramida, L., Reich, P.B., Khachane, A.N., Hamonts, K., Edwards, C., Lawton, L.A., Singh, B.K., 2016. Lack of functional redundancy in

104, 936-946.

ED

the relationship between microbial diversity and ecosystem functioning. J. Ecol.

PT

DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber,

CE

T., Dalevi, D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ.

AC

Microbiol. 72, 5069-5072.

Deusch, S., Tilocca, B., Camarinha-Silva, A., Seifert, J., 2014. News in livestock research – use of Omics-technologies to study the microbiota in the gastrointestinal tract of farm animals. Comput. Struct. Biotechnol. J. 13, 55-63. Doolittle, W.F., Booth, A., 2017. It’s the song, not the singer: an exploration of holobiosis and evolutionary theory. Biol. Philos. 32, 5-24.

ACCEPTED MANUSCRIPT Donia, M.S., Florian Fricke, W., Partensky, F., Cox, J., 2011. Proc. Natl. Acad. Sci. USA 108(51), E1423-E1432. Eng, A., Borenstein, E., 2018. Taxa-function robustness in microbial communities. Microbiome 6, 45. Fetzer, I., Johst, K., Schäwe, R., Banitz, T., Harms, H., Chatzinotas, A., 2015. The

CR IP T

extent of functional redundancy changes as species’ roles shift in different environments. Proc. Natl. Acad. Sci. USA 112(48), 14888-14893.

Garcia-Mazcorro, J.F., Mills, D.A., Murphy, K., Noratto, G., 2017. Effect of barley supplementation on the fecal microbiota, caecal biochemistry and key

AN US

biomarkers of obesity and inflammation in obese db/db mice. Eur. J. Nutr. Epub ahead of print.

Garcia-Mazcorro, J.F., Nunes Lage, N., Mertens-Talcott, S., Talcott, S., Chew, B.,

M

Dowd, S.E., Kawas, J.R., Noratto, G.D., 2018. Effect of dark sweet cherry powder consumption on the gut microbiota, short-chain fatty acids, and

ED

biomarkers of gut health in obese db/db mice. PeerJ 6, e4195. González, L.A., Manteca, X., Calsamiglia, S., Schwartzkopf-Gensweing, K.S., Ferret,

PT

A., 2012. Ruminal acidosis in feedlot cattle: interplay between feed ingredients,

CE

rumen function and feeding behavior (a review). Anim. Feed Sci. Tech. 172, 6679.

AC

Hess, M., Sczyrba, A., Egan, R., Kim, T.W., Chokhawala, H., Schroth, G., Luo, S., Clark, D.S., Chen, F., Zhang, T., Mackie, R.I., Pennacchio, L.A., Tringe, S.G., Visel, A., Woyke, T., Wang, Z., Rubin, E.M., 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331(6016), 463-467.

ACCEPTED MANUSCRIPT Iwai, S., Weinmaier, T., Schmidt, B.L., Albertson, D.G., Poloso, N.J., Dabbagh, K., DeSantis, T.Z., 2016. Piphillin: improved prediction of metagenomic content by direct inference from human microbiomes. PLoS ONE 11(11), e0166104. Jami, E., Israel, A., Kotser, A., Mizrahi, I., 2013. Exploring the bovine rumen bacterial community from birth to adulthood. ISME J. 7, 1069-1079.

CR IP T

Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., Tanabe, M., 2012. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109-D114.

Kittelmann, S., Seedorf, H., Walters, W.A., Clemente, J.C., Knight, R., Gordon, J.I.,

AN US

Janssen, P.H., 2013. Simultaneous amplicon sequencing to explore co-

occurrence patterns of bacterial, archaeal and eukaryotic microorganisms in rumen microbial communities. PLoS ONE 8, e47879.

M

Koskella, B., Hall, L.J., Metcalf, C.J.E., 2017. The microbiome beyond the horizon of ecological and evolutionary theory. Nat. Ecol. Evol. 1(11), 1606-1615.

ED

Langille, M.G.I., Zaneveld, J., Caporaso, J.G., McDonald, D., Knights, D., Reyes, J.A., Clemente, J.C., Burkepile, D.E., Vega Thurber, R.L., Knight, R., Beiko, R.G.,

PT

Huttenhower, C., 2013. Predictive functional profiling of microbial communities

CE

using 16S rRNA marker gene sequences. Nat. Biotechnol. 8, 1-10. Leinonen, R., Sugawara, H., Shumway, M., 2011. The sequence read archive. Nucleic

AC

Acids Res. 39, D19-D21.

Li, F., Guan, L.L., 2017. Metatranscriptomic profiling reveals linkages between the active rumen microbiome and feed efficiency in beef cattle. Appl. Env. Microbiol. 83(9), e00061-17.

ACCEPTED MANUSCRIPT Li, F., Neves, A.L.A, Ghoshal, B., Guan, L.L., 2017. Mining metagenomic and metatranscriptomic data for clues about microbial metabolic functions in ruminants. J. Dairy Sci. pii:S0022-0302(17)31175-X. Liu, J., Zhang, M., Xue, C., Zhu, W., Mao, S., 2016. Characterization and comparison of the temporal dynamics of ruminal bacterial microbiota colonizing rice straw

CR IP T

and alfalfa hay within ruminants. J. Dairy Sci. 99(12), 9668-9681. Malmuthuge, N., Chen, Y., Li, M., Fries, P., Griebel, P.J., Baurhoo, B., Zhao, X., Guan, L.L., 2012. Distinct commensal bacteria associated with ingesta and mucosal

epithelium throughout the gastrointestinal tract. FEMS Microbiol. Ecol. 79, 337-

AN US

347.

Malmuthuge, N., Griebel, P.J., Guan, L.L., 2014. Taxonomic identification of commensal bacteria associated with the mucosa and digesta throughout the

M

gastrointestinal tracts of preweaned calves. Appl. Environ. Microbiol. 80, 20212028.

ED

Malmuthuge, N., Guan, L.L., 2017. Understanding host-microbial interactions in rumen: searching the best opportunity for microbiota manipulation. J. Anim. Sci.

PT

Biotech. 8, 8.

CE

Markowitz, V.-M., Chen, I.-M., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., Ratner, A., Jacob, B., Huang, J., Williams, P., Huntemann, M., Anderson, I.,

AC

Mavromatis, K., Ivanova, N.N., Kyrpides, N.C., 2012. IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res. 40, D115-D122.

Matias Rodrigues, J.F., Wagner, A., 2009. Evolutionary plasticity and innovations in complex metabolic reaction networks. PLoS Comput. Biol. 5(12), e1000613.

ACCEPTED MANUSCRIPT Myer, P.R., Freetly, H.C., Wells, J.E., Smith, T.P.L., Kuehn, L.A., 2017. Analysis of the gut bacterial communities in beef cattle and their association with feed intake, growth, and efficiency. J. Anim. Sci. 95(7), 3215-3224. Pál, C., Papp, B., Lercher, M.J., Csermely, P., Oliver, S.G., Hurst, L.D., 2006. Chance and necessity in the evolution of minimal metabolic networks. Nature

CR IP T

440(7084), 667-670. Parks, D.H., Beiko, R.G., 2010. Identifying biologically relevant differences between metagenomic communities. Bioinformatics. 26, 715-721.

Pearce, M.J., Mintseris, J., Ferreyra, J., Gygi, S.P., Darwin, K.H., 2008. Ubiquitin-like

AN US

protein involved in the proteasome pathway of Mycobacterium tuberculosis. Science 322, 1104-1107.

Perea, K., Perz, K., Olivo, S.K., Williams, A., Lachman, M., Ishaq, S.L., Thomson, J.,

M

Yeoman, C.J., 2017. Feed efficiency phenotypes in lambs involve changes in ruminal, colonic, and small-intestine-located microbiota. J. Anim. Sci. 95, 2585-

ED

2592.

Plocik, A.M., Graveley, B.R., 2013. New insights from existing sequence data:

PT

generating breakthroughs without a pipette. Mol. Cell 49, 605-617.

CE

Rideout, J.R., He, Y., Navas-Molina, J.A., Walters, W.A., Ursell, L.K., Gibbons, S.M., Chase, J., McDonald, D., Gonzalez, A., Robbins-Pianka, A., Clemente, J.C.,

AC

Gilbert, J.A., Huse, S.M., Zhou, H.W., Knight, R., Caporaso, J.G., 2014. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences. PeerJ. 2, e545.

Ross, E.M., Petrovski, S., Moate, P.J., Hayes, B.J., 2013. Metagenomics of rumen bacteriophage from thirteen lactating dairy cattle. BMC Microbiol. 13, 242.

ACCEPTED MANUSCRIPT Rudi, K., Moen, B., Sekelja, M., Frisli, T, Lee, M.R., 2012. An eight-year investigation of bovine livestock fecal microbiota. Vet. Microbiol. 160, 369-377. Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Van Horn, D.J., Weber, C.F., 2009. Introducing mothur:

CR IP T

open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75(23), 7537-7541.

Sequence Read Archive Submissions Staff. Using the SRA Toolkit to convert .sra files

AN US

into other formats. In: SRA Knowledge Base [Internet]. Bethesda (MD):

National Center for Biotechnology Information (US); 2011-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK158900/.

M

Stewart, R.D., Auffret, M.D., Warr, A., Wiser, A.H., Press, M.O., Langford, K.W., Liachko, I., Snelling, T.J., Dewhurst, R.J., Walker, A.W., Roehe, R., Watson,

ED

M., 2018. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9(1), 870.

PT

Taxis, T.M., Wolff, S., Gregg, S.J., Minton, N.O., Zhang, C., Dai, J., Schnabel, R.D.,

CE

Taylor, J.F., Kerley, M.S., Pires, J.C., Lamberson, W.R., Conant, G.C., 2015. The players may change but the game remains: network analyses of ruminal

AC

microbiomes suggest taxonomic differences mask functional similarity. Nucl. Acids Res. 43(20), 9600-9612.

The Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486, 207-214.

ACCEPTED MANUSCRIPT Weimer, P.J., 2015. Redundancy, resilience, and host specificity of the ruminal microbiota: implications for engineering improved ruminal fermentations. Front. Microbiol. 6, 296. Wolff, S.M., Ellison, M.J., Hao, Y., Cockrum, R.R., Austin, K.J., Baraboo, M., Burch, K., Lee, H.J., Maurer, T., Patil, R., Ravelo, A., Taxis, T.M., Truong, H.,

CR IP T

Lamberson, W.R., Cammack, K.M., Conant, G.C., 2017. Diet shifts provoke complex and variable changes in the metabolic networks of the ruminal microbiome. Microbiome 5, 60.

Zhang, R., Ye, H., Liu, J., Mao, S. 2017. High-grain diets altered rumen fermentation

AN US

and epithelial bacterial community and resulted in rumen epithelial injuries of

AC

CE

PT

ED

M

goats. Appl. Microbiol. Biotechnol. 101(18), 6981-6992.

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN US

CR IP T

Figure captions

Fig. 1. Box plots showing proportion of sequences related to different predicted features for the cows’ study (study 1). A statistical analysis was not possible due to the small sample size (n=2) for each anatomical region. Stars denote means.

AC

CE

PT

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

Fig. 2. Box plots showing proportion of sequences related to the 8 more significantly different features (i.e. lowest P values) among the anatomical regions predicted by PICRUSt for the lambs’ study (study 2). Stars denote means, the plus (+) symbol denotes outliers as described by STAMP.

ACCEPTED MANUSCRIPT Table 1. Summary of the two studies from which the 16S rDNA data was obtained from in this manuscript.

Number of samples with sequencing Tissues collected

10 (pooled)

Lambs’ study (study 2) 12 (6 efficient and 6 inefficient) 72

Rumen, jejunum, ileum, cecum, colon (tissues and contents) V1-V2 420,453 sequences

Rumen, duodenum, jejunum, ileum, colon and feces (contents only) V3-V4 659,561 sequences

M

262

Malmuthuge et al. (2014)

ED

PICRUSt features detected Reference

319,523 sequences (48% of all sequences)

AN US

16S rDNA target Sequences subject to OTU picking Sequences that 136,958 sequences (33% of all failed to be clustered sequences) in a OTU OTUs* 2,793 OTUs using the first 300 nucleotides Original OTUs 15,771 OTUs

CR IP T

Number of animals

Cows’ study (study 1) 8

3,961 OTUs using first 300 nucleotides Not disclosed by the original authors 273 Perea et al. (2017)

*The OTUs were selected using a QIIME script pick_closed_reference_otus.py against

AC

CE

PT

a curated 16S rDNA dataset (GreenGenes).