Accepted Manuscript Relative abundance of deformed wing virus, Varroa destructor virus 1, and their recombinants in honey bees (Apis mellifera) assessed by kmer analysis of public RNA-Seq data Robert Scott Cornman PII: DOI: Reference:
S0022-2011(17)30134-9 http://dx.doi.org/10.1016/j.jip.2017.07.005 YJIPA 6973
To appear in:
Journal of Invertebrate Pathology
Received Date: Revised Date: Accepted Date:
16 March 2017 6 July 2017 20 July 2017
Please cite this article as: Cornman, R.S., Relative abundance of deformed wing virus, Varroa destructor virus 1, and their recombinants in honey bees (Apis mellifera) assessed by kmer analysis of public RNA-Seq data, Journal of Invertebrate Pathology (2017), doi: http://dx.doi.org/10.1016/j.jip.2017.07.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Relative abundance of deformed wing virus, Varroa destructor virus 1, and their recombinants in honey bees (Apis mellifera) assessed by kmer analysis of public RNA-Seq data Robert Scott Cornman U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado, U.S.A.
[email protected]
1
Abstract Deformed wing virus (DWV) is a major pathogen of concern to apiculture, and recent reports have indicated the local predominance and potential virulence of recombinants between DWV and a related virus, Varroa destructor virus 1 (VDV). However, little is known about the frequency and titer of VDV and recombinants relative to DWV generally. In this study, I assessed the relative occurrence and titer of DWV and VDV in public RNA-seq accessions of honey bee using a rapid, kmer-based approach. Three recombinant types were detectable graphically and corroborated by de novo assembly. Recombination breakpoints did not disrupt the capsid-encoding region, consistent with previous reports, and both VDVand DWV-derived capsids were observed in recombinant backgrounds. High abundance of VDV kmers was largely restricted to recombinant forms. Non-metric multidimensional scaling identified genotypic clusters among DWV isolates, which was corroborated by read mapping and consensus generation. The recently described DWV-C lineage was not detected in the searched accessions. The data further highlight the utility of high-throughput sequencing to monitor viral polymorphisms and statistically test biological predictors of titer, and point to the need for consistent methodologies and sampling schemes. Keywords Apis mellifera, deformed wing virus, Varroa destructor virus 1, viral recombination, pathogen detection, viral titer, kmer analysis
2
Introduction Poor growth and low survival of managed honey bee colonies is an ongoing agricultural concern (Evans et al. 2011, Potts et al. 2016). In addition to stressors such as long-distance transport, poor nutrition, and pesticide exposure (Goulson et al. 2015), high pathogen loads are linked to poor colony survival (Cornman et al. 2009, Dainat et al. 2012, McMenamin and Genersch 2015; Engel et al. 2016) and their control is expensive and challenging. RNA viruses have a substantial impact on honey bee health, particularly when hives are also infested with the parasitic mite, Varroa destructor (McMenamin and Genersch 2015). In addition to the direct effects of parasitism, Varroa mites depress honey bee immune function and vector viruses (Yang and Cox-Foster, 2005). A growing body of work has highlighted the impact of deformed wing virus (DWV; Picornavirales:Iflavirus) in particular (Highfield 2009, Dainat et al. 2012, Martin et al. 2012, DiPrisco et al. 2016, Wilfert et al. 2016, Martin et al. 2012, Mordecai 2015, McMahon et al. 2016, Benaets et al. 2017), a common virus of honey bees that has also been detected in other arthropods (Genersch 2006), including V. destructor, which is an effective vector of DWV (Bowen-Walker et al. 1999). The interaction of DWV and Varroa appears synergistic with respect to bee health (DiPrisco et al. 2016) and has enabled the predominance of a subset of DWV strains (Martin et al. 2012, Wilfert et al. 2016). As a consequence, DWV (sensu stricto) shows low genetic variation globally (Wilfert et al. 2016). Varroa destructor virus 1 (VDV) is ~84% similar at the nucleotide level to DWV (Ongus et al. 2004) and appears less common in both honey bees and mites (e.g., Chejanovsky et al. 2014, Mordecai 2015), although it may be less frequently assayed. It is not clear how the ecology of this virus differs from DWV, and co-infection may be suppressed if DWV is already established (Mordecai 2015). Recombinants between DWV and VDV are sometimes detected (Zioni et al. 2011, Moore et al. 2011, Wang et al. 2013, Dalmon et al. 2017), yet, as is typical of picornaviruses (Heath et al. 2006), breakpoints rarely fall within the coding sequence of capsid peptides or the enzymes responsible for genome maintenance, suggesting that each region constitutes a co-adapted complex within which recombination may be selected against. Ryabov and colleagues (2014) showed that recombinants containing a VDV-derived capsid region reached extremely high titer in bees if, and only if, vectored by mites or by injection (an experimental surrogate of Varroa parasitism). However, the generality of these observations remains to be determined, as strong effects of host genetic background and microbiome on pathogen loads are frequently noted (Mordecai 2015, Carrillo-Tripp et al. 2016, Schwarz et al. 2016). Furthermore, while Varroa is ubiquitous, recombinant DWV-VDV are not reported to be widespread. Given the potential impact of recombinants and the poorly known factors affecting the distribution of VDV, improved understanding of the abundance of VDV and/or recombinants is needed. In this study, I searched publically released sequence data using a kmer-counting approach to estimate titers of DWV and VDV relative to a subset of honey bee mRNA. Kmer counting tabulates the number of exact matches to short “words” of nucleotide sequence of arbitrary length k. Kmers unique or shared among reference sequences can be defined and rapidly tabulated in high-throughput sequence data without computationally intensive alignment or assembly procedures. In comparison, the read “mapping” 3
algorithms typically associated with RNA-Seq attempt to align entire sequences to references, using scoring algorithms that penalize mismatches and indels. The probability that an alignment is erroneous can then be expressed and is frequently used to filter alignments below a minimum threshold. Thus, novel sequences may not be discovered if too divergent to map at high quality, or if they constitute a minority of mappings to a reference. The presence of a kmer is unambiguous and can easily be reclassified as to taxonomic origin as new information becomes available, whereas read mapping can be highly contingent on the database searched and alignment parameters used, and must be repeated if reference sequences are updated. De novo assembly is an alternative to read mapping when no suitable reference is available or references share regions of high sequence identity, but they may be unable to reconstruct complex metagenomic mixtures accurately, particularly if coverage varies along the genomes or recombinants are present. Computational requirements for assembly of NGS data are typically higher than for mapping. My primary objective here is to compare the distribution and relative titers of DWV and VDV in a large, ad hoc sample. I then examine kmer coverage along each genome to 1) graphically identify potential recombinants, and 2) use a kmer-based distance metric to search for genotypic clusters of DWV. Materials and methods The bioinformatics workflow, including specified parameters, is summarized in Supplemental File 1. Sequences were retrieved from the Short Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), which is one of three primary curators of the International Nucleotide Sequence Database Collaboration and is comprehensive with respect to publically accessible genetic data. The taxonomy database was searched on 09/26/16, using “Apis mellifera” as the search term and subsequently filtering matches to include only NCBI taxonomic ID “7460” (Apis mellifera sensu lato). SRA summary data associated with this taxid were downloaded and filtered to identify RNA-Seq transcriptomic sequence runs performed on the Illumina platform with at least 5 million “spots” (unique templates sequenced in either one or two directions) and a read length of 50 or greater (Supplemental File 2). Downloads were performed with the “fastq-dump” utility of the SRA Tookit (http://ncbi.github.io/sra-tools/), and capped at the first 50 million reads. Only the first read of a pair was included in the analysis, as the quality of the first read is typically higher (Magoc and Salzberg 2011). The software package KMC v. 2.3.0 (Deorowicz et al. 2015) was used to identify and process kmers of length 25, as described in Supplemental File 1. To provide equal ascertainment of DWV and VDV, a single reference genome was used for each (NC_004830.2 and NC_006494.1, respectively). Each kmer was classified as unique to one of the two references or shared by both, using the intersect function in kmc. This terminology does not imply that kmers unique to one reference are truly fixed differences between two viral sequence pools. However, existing studies of variation in DWV (Wilfert et al. 2016, Cornman et al. 2013; Martin et al. 2012) indicate that spatial and temporal variation within each group is significantly less than between the two viruses (Ongus et al. 2004), such that one reference for each virus is sufficient to categorize kmers for the present purposes.
4
Kmers reported by KMC may be on either strand; only those reported on the plus strand were used in this analysis. The genome coordinate of each kmer was determined by aligning DWV-specific and shared kmers to the reference genome using BLASTN, or by aligning VDV-specific kmers to the VDV reference. The BLASTN alignment used the default parameters for the “blastn-short” setting. Counts were determined for all kmers presence at least twice in an SRA accession, capped at 100,000 counts. Additionally, the mean count for all analyzed kmers in consecutive 250-bp windows was determined. The median counts were used to produce a heat map of relative abundance, using conditional formatting in Excel (Microsoft), after normalizing the counts in each window to a maximum value of one. Kmers of length 25 were also tabulated for A. mellifera coding sequence (NCBI accession GCA_000002195.1 [Elsik et al. 2014]). Of these, 50,000 were selected randomly to represent relative coverage of A. mellifera transcripts. Relative viral titers among the accessions in this study were estimated as the ratio of summed viral kmer counts to summed A. mellifera kmer counts, then normalized by total number of kmers of each class. Kmer tables produced by KMC were converted to fasta format and a dissimilarity matrix for all SRA accessions as well as the reference genomes was computed with ffp (Sims et al. 2009), using the default Jensen-Shannon distance metric. Genetic clustering within the dissimilarity matrix was investigated with non-metric multidimensional scaling, using PAST3 (Hammer et al. 2001). SRA accessions identified as potential recombinants were assembled with CLC Genomics software v. 9.3 (Qiagen), using a kmer size of 25, a minimum contig length of 200, and “automatic” bubble size selection. To generate a consensus DWV sequence of accession SRR1255326 (see Results), bowtie2 v. 2.2.8 was used to map reads to the DWV reference using the “very-sensitive” and “end-to-end” switches. The consensus of this alignment was generated with samtools v. 1.3 using the mpileup, bcftools call, and bcftools consensus commands. Results Kmer sequences used and their genome coordinates are provided in Supplemental File 3. 181 SRA accessions had at least 10,000 total viral kmer counts and were included in the final data matrix. One accession was removed from analysis because coverage was highly erratic across the viral genomes, contrary to the pattern otherwise seen (see below), and had a titer approximately 10X higher than any other accession. These characteristics suggested the possibility that the source viral sequence was degraded and/or PCR artifacts were present. Twelve accessions were removed as duplicates, evident from identical file sizes, sample names, and kmer counts. The final data set included 168 accessions (Supplemental File 2). The median counts in 250-bp windows of each virus genome are shown (Figure 1, Supplemental File 4). The values are presented as a heat map, and scaled to a maximum of one within each accession. The relative viral titer (the ratio of all virus kmers to all honey bee kmers) is represented as a green-to-red color scale at the top of the figure, and is also shown in Figure 2 as a function of the proportion of VDVspecific kmers. The relative viral titer spanned approximately six orders of magnitude (Fig. 1), although these bounds are affected by lower and upper thresholds imposed methodologically (see Materials and Methods). 5
All but one of the 168 accessions (SRR798764) had “DWV-specific” kmers, whereas most had few or no “VDV-specific” kmers unless they were recombinant (see below). Allowing that some kmers differentiating the single accessions of DWV and VDV used in this study may not differentiate all such pairs, or may arise at low frequency through de novo mutation, this approach may be unable to differentiate very low levels of VDV from a background of high DWV. Graphical detection of recombinants Visual examination of kmer abundance windows clearly revealed known examples of recombinant strains (Ryabov et al. 2014) and suggested new ones (red asterisks at the bottom of Figure 1). Most accessions with appreciable levels of VDV kmers were from studies conducted by Ryabov et al. (2014) and Ryabov et al. (2016) (accessions ERR413293 to ERR413305 and ERR528750 to ERR528754, respectively). The former BioProject was already known to contain a predominantly recombinant viral type, whereas the latter had not been previously described as containing recombinant DWV-VDV to my knowledge. Another pair of recombinants with a different coverage pattern, indicating an independent recombination event, is represented by accessions ERR274419 and ERR274423, which are the first two columns of Figure 1. In these examples, all samples of each BioProject have comparable kmer profiles. In contrast, BioProject PRJNA243651 contained only a single sample with an apparent recombinant profile (SRR1255541), despite frequent co-infection of samples with both viruses. This visual approach assesses only predominant patterns and does not imply that a low-frequency recombinant, or strain diversity in general, is absent. Assembly of accession SRR1255541 recovered 7 contigs (Supplemental File 5). The two longest contigs could be manually overlapped based on an identical 86-bp region at their ends, creating an approximately full-length genome. The capsid region of the recombinant was DWV-derived, with a recombination breakpoint at approximately position 4621 (Table 1). Assembly of accession ERR528754 also recovered an approximately full-length genome (Supplemental File 6), with recombination breakpoints at 1738 and 5258, with the intervening capsid sequence VDV-derived. This accession was chosen to represent the general pattern seen in BioProject PRJEB6511 (Ryabov et al. 2016) because it had the highest apparent titer. Assembly of accessions ERR274419 and ERR274423 did not produce any contigs >200 bp, however, other than the phiX control sequence used to calibrate Illumina sequence runs, and a 16S fragment of Acinetobacter, a known honey bee commensal (Kim et al. 2014). This result was expected because that study targeted microRNAs and due to size-selection criteria would not be expected to generate long contigs. Other than the presence of recombinants, the most striking pattern of variation in kmer abundance is the relative enrichment of 3’ sequence. Some projects produced relatively uniform coverage of DWV, whereas others were much more strongly 3’ biased. These differences are likely attributable to RNA selection/conversion method, given the consistency within, but variation among, BioProjects. BioProjects are usually submitted by single groups of investigators and typically share the same laboratory methodology and sequencing strategy. Fragmentation of cDNA during library preparation is one source of
6
3’ bias, whereas fragmentation of RNA produces more even coverage overall but low coverage of ends of molecules (Wang et al. 2009). Relative viral titer As noted above, VDV-matched kmers rarely achieved high abundance in A. mellifera unless in the form of a DWV-VDV recombinant. The proportion of DWV-specific kmers in recombinant samples was 0.695 (SD 0.148), whereas it was 0.979 (SD 0.085) when not classified as such. However, high viral titer did not require the presence of VDV-like sequence, and only some recombinants reached high titer. As demonstrated by (Ryabov et al. 2014), the transmission mode greatly influences recombinant titer, which likely explains the lack of a simple relation between titer and proportion of VDV-specific kmers (Fig. 2). NMDS of kmer distances An NMDS plot of kmer distance among SRA accessions and the reference genomes reveals three distinct “genotypic” clusters (Fig. 3). The largest cluster, shaded in blue, appears to represent “typical” DWV that has spread pandemically (Wilfert et al. 2016). Another cluster, shaded in pink, encompasses the DWVVDV recombinant type examined by Ryabov et al. 2014 and Moore et al. 2011. A third cluster is composed of accessions from BioProject PRJNA243651 mentioned above, which examined a large number of tissues in both nurses and foragers. A consensus sequence produced by mapping representative accession SRA1255326 to the DWV reference resulted in a model that differed at 2.5% of nucleotides and 0.9% of predicted amino acids of the polyprotein (Supplemental File 7). (Mapping of whole reads was used rather than assembly to minimize any impact of VDV kmers on the result.) A neighbor-joining nucleotide tree placed this consensus as sister to other DWV whole-genome accessions in NCBI, when rooted with VDV (Fig. 4). The study design of BioProject PRJNA243651 also allowed an examination of titer by tissue and by adult developmental stage. While there is no apparent enrichment of DWV in any tissue, there appears to be a consistent difference in titer between nurses and foragers (Fig. 5). Treating each accession is an independent measure, the difference in mean log2-transformed titer was highly significant, as was the non-parametric Mann-Whitney U-test, which does not require an assumption of normality (Supplemental File 8). However, as the different tissues all derived from the same three foragers and three nurses, accessions from different tissues of the same individual should be considered pseudoreplicates and the apparent difference only suggestive until further evaluated. A significant difference in DWV level between nurses and foragers was not found in a previous study (Jefferson et al. 2013). Discussion Kmer counting provided a useful metric of the relative abundance of DWV-like sequence in metagenomic sequence, at low computational cost, allowing many data sets to be analyzed relatively quickly. This metric is readily transferable, in the sense that only a list of kmers and their classification needs to be provided (see Supplemental File 3). This ‘dictionary’ can also be revised or expanded as new information becomes available, without regenerating the kmer counts themselves. Given the large amount of sequence upon which titer estimates can be based, dispersed throughout the genomes of host and 7
pathogen, kmer counting may perform better than qPCR, which is usually based on one or a few marker sequences that are susceptible to base mismatches in the primer sites. Without access to the sequenced samples, however, a direct comparison cannot yet be made between the two approaches. Additional applications include graphical assessments of coverage, including detection of recombinants, and clustering analysis based on kmer-derived distance metrics. These applications are aided by the fact that kmers themselves are unambiguous (although their source may be), whereas mapping of metagenomic reads directly to reference genomes often entails substantial ambiguity when similar references are used, akin to that arising from alternatively spliced isoforms and closely related paralogs (Trapnell et al. 2012). Kmer-based metagenomic assembly is similarly challenging given the likely occurrence of identical kmers in different species. Current metagenomic assembly programs appeal to the additional information in the relative abundance of kmers to disentangle such mixtures (Scholz et al. 2012; Nielsen et al. 2014). While subject to more ambiguity, assembly ultimately provides more biological information and can follow as a second tier of analysis for data sets with kmer compositions of interest, as can more specialized approaches developed for the detection of mosaics (e.g., MosaicSolver (Wood et al. 2014)). For the recombinants identified here, graph-based assembly was in fact able to reconstruct approximately full-length recombinant viral genomes from two data sets, albeit requiring manual adjustment for one of the data sets at the break point. Recombinants should of course always be confirmed at the molecular level with an appropriate assay, if necessary for the question at hand. NMDS analysis of kmer counts identified three genotypic clusters of DWV in these data sets. One large cluster representing a relatively homogenous DWV type strongly contrasted with DWV-VDV recombinants, particularly the numerous recombinant accessions from Warwick, UK (Moore et al. 2011, Ryabov et al. 2014, Ryabov et al. 2016). A third distinct genotypic cluster was made up of accessions from a single BioProject, PRJNA243651. Phylogenetic analysis of a consensus sequence produced by mapping a representative accession (SRR1255326) to the reference genome confirmed that this DWV pool is an outlier relative to other DWV genome accessions (Fig. 4). As the NMDS analysis reduces multivariate distances to two axes, less strongly differentiated genotypic clusters may also exist that are not detectable by this means, but could be explored with UPGMA trees, k-means clustering, and the like. Kmer abundance declines from the 3’ end to the 5’ end of the viral genomes, but in two distinct patterns (Fig. 1). These two patterns conform to known differences in RNA vs. cDNA fragment recovery during library preparation (Wang et al. 2009). Thus, RNA-fragmentation is clearly preferable for addressing genomic polymorphism in these and other RNA viruses, particularly for ascertainment of recombinants. Kmer analysis identified additional DWV-VDV recombinants. The break points were consistent with known recombination hotspots in picornavirus evolution (Heath et al. 2006) that avoid disruption of the capsid-encoding region (encompassing positions 1795-4621 of NC_004830.2, based on the annotation of (de Miranda and Genersch 2010)). However, three distinct breakpoint patterns were observed in separate BioProjects, two of which included VDV-derived capsid and one of which had a DWV-derived capsid (Figure 1). Recombinants between DWV and VDV have become a focus of recent research because of evidence that they can be more prevalent (Zioni et al. 2011; Wang et al. 2013; Dalmon et al. 2017) or 8
more virulent than parental types (Moore et al. 2011, Ryabov et al. 2014). The virulence of recombinants depended on transmission by Varroa or by injection (an analog of Varroa parasitism) in Ryabov et al. (2014), however. Loss of polymorphism in the recombinant region was observed by both Wang et al. (2013) and Ryabov et al. (2014), consistent with a selective bottleneck. On the other hand, non-targeted amplification and detection of recombinants in sequence data is likely enhanced when such recombinants predominate, creating a potentially artificial positive correlation between detection and titer. Furthermore, the existence of hotspots for recombination (either due to genome structure or recombinant fitness [Heath et al. 2006]) means that commonalities among independently derived DWVVDV recombinants are not themselves evidence of evolutionary convergence on a highly virulent genotypic class. There is also limited basis for assessing whether a particular recombinant lineage is spreading, despite the high level of Varroa parasitism endured by managed honey bees globally (cf. Wilfert et al. 2016). The overall pattern in this study is that independentally-derived recombinants constitute the minority of infections globally, even if locally predominant. Of course, certain recombinant haplotypes may indeed be spreading but such spread is not evidenced in an eclectic pool of public data. Repeated experimental and epidemiological analysis should ultimately settle this hypothesis. In this study, single accessions of each virus were used to define clade-specific kmers. As DWV and VDV are about 84% similar at the nucleotide level (Ongus et al. 2014), results regarding the relative titer of each should be insensitive to misclassifications of a fraction of kmers as unique when they are actually shared or rare variants. However, genotypic clustering or phylogenetics based on kmer distance estimation should ultimately be based on a wide representation of total viral diversity. For example, Mordecai and colleagues (Mordecai 2015, Mordecai 2016) have recently offered a new terminology for DWV and VDV (“DWV-A” and “DWV-B”, respectively) and identified a third variant “DWV-C” that is sister to the two. To assess the abundance of this third variant in SRA data, I identified kmers unique to the EBI accession CEND01000001 (the DWV-C reference genome), DWV, or VDV. To maximize sensitivity, I collected counts for all unique kmers of each of three reference genomes, not the random subsample used in the DWV/VDV analysis, and I included all 398 unique SRA accessions regardless of total viral counts. The median counts per 250-bp window are again shown as a heat map (Supplemental File 9), but with the background color inverted to black to better visualize very low counts. No sample had appreciable coverage of DWV-C kmers in contiguous 250-bp windows. The relative abundances of DWVC-specific and DWV-specific kmers, expressed as the average of median counts across all genomic windows of the two viruses when both were detected, ranged from 3.73E-6 to 0.38 (mean 0.0065) and DWV-C-specific kmers were never detected in the absence of the other two virus types (Supplemental File 10). These results imply that DWV-C is not widely distributed in honey bees at levels detectable by this kmer-counting approach. While the present study focused on DWV lineages as a demonstration of principle, this approach can quantify the relative titer of any identifiable member of the bee microbiome, although few are expected to achieve titers as consistently high as DWV. As with traditional RNA-Seq, specific honey-bee transcripts can also be examined, such as those underlying nutritional status (e.g., vitellogenin), pesticide detoxification (e.g., cytochrome p450s), and immunity (e.g., antimicrobial peptides). Use of highthroughput sequencing as a first line of analysis for these diverse targets requires both high depth and 9
highly parallel processing of samples, whether at the level of tissues, individuals, or colonies. With multiplex labelling schemes allowing hundreds of samples to be combined in single runs (e.g. Kozich et al. 2013) and the depths of runs exceeding 109 sequences (e.g., the Illumina NovaSeq platform), an efficiently designed monitoring scheme can be cost-effective and processed with modest computing infrastructure. Such monitoring could serve several stakeholder interests, not least of which is the beekeeper community providing essential pollination services to agriculture. Conclusion This study confirmed in a large sample from multiple continents that VDV has a much lower incidence and titer than DWV in honey bees. Distinct patterns of recombinants between DWV and VDV were detectable by kmer counting, including variants with VDV-derived or DWV-derived capsids, and were corroborated by assembly of the flagged accessions. High abundance of VDV kmers was largely restricted to recombinant forms. Non-metric multidimensional scaling identified genotypic clusters among DWV isolates, but strong variation in genomic coverage was clearly related to sequencing methodologies. The data highlight the possibilities of high-throughput sequencing to monitor viral polymorphisms and statistically test biological predictors of titer, and point to the need for consistent methodologies and sampling schemes. Acknowledgments This work was supported by the Trust Species and Habitats Branch, Fort Collins Science Center, U. S. Geological Survey. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the US Government.
References Benaets, K., Van Geystelen, A., Cardoen, D., De Smet, L., de Graaf, D., Schoofs, L., et al. (2017). Covert deformed wing virus infections have long-term deleterious effects on honeybee foraging and survival. Proc. Roy. Soc. B, doi:10.1098/rspb.2016.2149. Bowen-Walker, P., Martin, S., Gunn, A. (1999). The transmission of deformed wing virus between honeybees (Apis mellifera L.) by the ectoparasitic mite Varroa jacobsoni Oud. J. Invert. Pathol. 73, 101– 106. Carrillo-Tripp, J., Dolezal, A.G., Goblirsch, M.J., Miller, W.A., Toth, A.L., Bonning, B.C. (2016). In vivo and in vitro infection dynamics of honey bee viruses. Scientific Reports 6, doi:10.1038/srep22265. Chejanovsky, N., Ophir, R., Schwager, M.S., Slabezki, Y., Grossman, S., Cox-Foster, D. (2014). Characterization of viral siRNA populations in honey bee colony collapse disorder. Virology 454, 176–183. Cornman, R.S., Tarpy, D.R., Chen, Y., Jeffreys, L., Lopez, D., Pettis, J.S., et al. (2012). Pathogen webs in collapsing honey bee colonies. PloS One 7, e43562. 10
Cornman, R.S., Boncristiani, H., Dainat, B., Chen, Y., Weaver, D., Evans, J.D., et al. (2013). Populationgenomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing. BMC Genomics 14, 154. Dainat, B., Evans, J.D., Chen, Y.P., Gauthier, L., Neumann, P. (2012). Dead or alive: deformed wing virus and Varroa destructor reduce the life span of winter honeybees. Appl. Env. Microbiol. 78, 981–987. Dalmon, A., Desbiez, C., Coulon, M., Thomasson, M., Le Conte, Y., Alaux, C., et al. (2017). Evidence for positive selection and recombination hotspots in Deformed wing virus (DWV). Scientific Reports 7, doi:10.1038/srep41045. De Miranda, J.R., Genersch, E. (2010). Deformed wing virus. J. Invert. Pathol. 103, S48–S61. Deorowicz, S., Kokot, M., Grabowski, S., Debudaj-Grabysz, A. (2015). KMC 2: Fast and resource-frugal kmer counting. Bioinformatics 31, 1569–1576. Di Prisco, G., Annoscia, D., Margiotta, M., Ferrara, R., Varricchio, P., Zanni, V., et al. (2016). A mutualistic symbiosis between a parasitic mite and a pathogenic virus undermines honey bee immunity and health. Proc. Nat’l. Acad. Sci. USA 113:3203-3208. Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., et al. (2014). Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15, 86. Engel, P., Kwong, W.K., McFrederick, Q., Anderson, K.E., Barribeau, S.M., Chandler, J.A., et al. 2016. The bee microbiome: impact on bee health and model for evolution and ecology of host-microbe interactions. mBio 7: e02164-15. Fürst, M., McMahon, D.P., Osborne, J., Paxton, R., Brown, M. (2014). Disease associations between honeybees and bumblebees as a threat to wild pollinators. Nature 506, 364–366. Genersch, E., Yue, C., Fries, I., de Miranda, J. R. (2006). Detection of deformed wing virus, a honey bee viral pathogen, in bumble bees (Bombus terrestris and Bombus pascuorum) with wing deformities. J. Invert. Pathol. 91, 61–63. Goulson, D., Nicholls, E., Botías, C., Rotheray, E.L. (2015). Bee declines driven by combined stress from parasites, pesticides, and lack of flowers. Science 347, 1255957. Hammer, O, Harper, D. A. T., Ryan, P. D. (2001) PAST: Paleontological statistics software package for education and data analysis. Palaeontologia Electronica 4.
11
Heath, L., van der Walt, E., Varsani, A., Martin, D.P. (2006). Recombination patterns in aphthoviruses mirror those found in other picornaviruses. J. Virol. 80, 11827–11832. Highfield, A.C., El Nagar, A., Mackinder, L.C., Laure, M.-L.N., Hall, M.J., Martin, S.J., et al. (2009). Deformed wing virus implicated in overwintering honeybee colony losses. Appl. Env. Microbiol. 75, 7212–7220. Jefferson, J.M., Dolstad, H.A., Sivalingam, M.D., Snow, J.W. (2013). Barrier immune effectors are maintained during transition from nurse to forager in the honey bee. PloS One 8, e54097. Kim, P.S., Shin, N.-R., Kim, J.Y., Yun, J.-H., Hyun, D.-W., Bae, J.-W. (2014). Acinetobacter apis sp. nov., isolated from the intestinal tract of a honey bee, Apis mellifera. J. Microbiol. 52, 639–645. Kozich, J.J., Westcott, S.L., Baxter, N.T., Highlander, S.K., Schloss, P.D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Env. Microbiol. 79, 5112–5120. Lamp, B., Url, A., Seitz, K., Eichhorn, J., Riedel, C., Sinn, L.J., et al. (2016). Construction and rescue of a molecular clone of deformed wing virus (DWV). PloS One 11, e0164639. Magoč, T., and Salzberg, S.L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963. Martin, D.P., Murrell, B., Golden, M., Khoosal, A., Muhire, B. (2015). RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1, vev003. Martin, S.J., Highfield, A.C., Brettell, L., Villalobos, E.M., Budge, G.E., Powell, M., et al. (2012). Global honey bee viral landscape altered by a parasitic mite. Science 336, 1304–1306. McMahon, D.P., Fürst, M.A., Caspar, J., Theodorou, P., Brown, M.J., Paxton, R.J. (2015). A sting in the spit: widespread cross-infection of multiple RNA viruses across wild and managed bees. J. Animal Ecol. 84, 615–624. McMahon, D.P., Natsopoulou, M.E., Doublet, V., Fürst, M., Weging, S., Brown, M.J., et al. (2016). Elevated virulence of an emerging viral genotype as a driver of honeybee loss. Proc. Roy. Soc. B, doi: 10.1098/rspb.2016.0811. McMenamin, A.J., and Genersch, E. (2015). Honey bee colony losses and associated viruses. Curr. Opin. Insect Sci. 8, 121–129. Moore, J., Jironkin, A., Chandler, D., Burroughs, N., Evans, D.J., Ryabov, E.V. (2011). Recombinants between Deformed wing virus and Varroa destructor virus-1 may prevail in Varroa destructor-infested honeybee colonies. J. Gen. Virol. 92, 156–161. 12
Mordecai, G.J., Brettell, L.E., Martin, S.J., Dixon, D., Jones, I.M., Schroeder, D.C. (2015). Superinfection exclusion and the long-term survival of honey bees in Varroa-infested colonies. ISME J. 10, 1182–1191. Mordecai, G.J., Wilfert, L., Martin, S.J., Jones, I.M., Schroeder, D.C. (2016). Diversity in a honey bee pathogen: first report of a third master variant of the Deformed Wing Virus quasispecies. ISME J. 10, 1264–1273. Nielsen, H.B., Almeida, M., Juncker, A.S., Rasmussen, S., Li, J., Sunagawa, S., et al. (2014). Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotech. 32, 822–828. Ongus, J.R., Peters, D., Bonmatin, J.-M., Bengsch, E., Vlak, J.M., van Oers, M.M. (2004). Complete sequence of a picorna-like virus of the genus Iflavirus replicating in the mite Varroa destructor. J. Gen. Virol. 85, 3747–3755. Potts, S.G., Imperatriz-Fonseca, V., Ngo, H.T., Aizen, M.A., Biesmeijer, J.C., Breeze, T.D., et al. (2016). Safeguarding pollinators and their values to human well-being. Nature 540, 220–229. Ryabov, E.V., Wood, G.R., Fannon, J.M., Moore, J.D., Bull, J.C., Chandler, D., et al. (2014). A virulent strain of deformed wing virus (DWV) of honeybees (Apis mellifera) prevails after Varroa destructor-mediated, or in vitro, transmission. PLoS Pathogens 10, e1004230. Ryabov, E.V., Fannon, J.M., Moore, J.D., Wood, G.R., Evans, D.J. (2016). The Iflaviruses Sacbrood virus and Deformed wing virus evoke different transcriptional responses in the honeybee which may facilitate their horizontal or vertical transmission. PeerJ 4, e1591. Scholz, M.B., Lo, C.-C., Chain, P.S. (2012). Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Curr. Opin. Biotech. 23, 9–15. Schwarz, R.S., Moran, N.A., Evans, J.D. (2016). Early gut colonizers shape parasite susceptibility and microbiota composition in honey bee workers. Proc. Nat’l. Acad. Sci. USA 113, 9345–9350. Sims, G.E., Jun, S.-R., Wu, G.A., Kim, S.-H. (2009). Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proc. Nat’l. Acad. Sci. USA 106, 2677–2682. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., et al. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562– 578.
13
Wang, H., Xie, J., Shreeve, T.G., Ma, J., Pallett, D.W., King, L.A., et al. (2013). Sequence recombination and conservation of Varroa destructor virus-1 and deformed wing virus in field collected honey bees (Apis mellifera). PloS One 8, e74508. Wang, Z., Gerstein, M., Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63. Wilfert, L., Long, G., Leggett, H., Schmid-Hempel, P., Butlin, R., Martin, S., et al. (2016). Deformed wing virus is a recent global epidemic in honeybees driven by Varroa mites. Science 351, 594–597. Wood, G.R., Ryabov, E.V., Fannon, J.M., Moore, J.D., Evans, D.J., Burroughs, N. (2014). MosaicSolver: a tool for determining recombinants of viral genomes from pileup data. Nucl. Acids Res. 42, e123. Yang, X., Cox-Foster, D.L. (2005). Impact of an ectoparasite on the immunity and pathology of an invertebrate: evidence for host immunosuppression and viral amplification. Proc. Nat’l. Acad. Sci. USA 102, 7470–7475. Zioni, N., Soroker, V., Chejanovsky, N. (2011). Replication of Varroa destructor virus 1 (VDV-1) and a Varroa destructor virus 1–deformed wing virus recombinant (VDV-1–DWV) in the head of the honey bee. Virology 417, 106–112.
Figure Legends Figure 1. A heat map of viral kmer abundance in 168 short-read data sets of Apis mellifera RNA. Each accession is a column of the matrix, and each row represents either a summary statistic or the median count of kmers in a 250-bp genomic window of each virus. The first row is color-scaled from green (lowest value) to red (highest value) to indicate relative proportion of viral kmers to honey bee kmers. The second row is colored alternately grey or black to group accessions by BioProject, and this pattern is repeated at the bottom of the matrix to aid differentiation of related SRA accessions. DWV kmers are represented as blue and VDV kmers are represented as red, with the intensity of color proportional to the relative abundance of kmers in that window. The total viral kmer abundance (unique plus shared) is represented by purple coloration. The maximum value within 250-bp windows is scaled to one within each accession, so that the relative coverage patterns can be discerned independent of total titer. A schematic of the genome organization of each virus is indicated, with blue representing non-coding sequence, green representing non-structural protein-coding sequence, and yellow representing capsid protein-coding sequence. Coverage patterns suggestive of recombination are marked with red asterisks at the bottom of the matrix. The data underlying the heat map is provided in Supplemental File 4.
14
Figure 2. The proportion of unique VDV kmers in each short-read data set, plotted as a function of hostnormalized abundance of DWV- or VDV-matching kmers. Only two accessions had a higher proportion of VDV kmers (labelled). Figure 3. A non-metric multidimensional scaling (NMDS) plot of relative titer versus proportion of VDV kmers. The inset shows the NMDS “stress plot”, indicating little distortion of actual multivariate distances when reduced to two dimensions. Figure 4. A neighbor-joining phylogeny of whole-genome accessions of DWV, as well as the consensus sequence inferred for sample SRR1255326. Bootstrap values are based on 1,000 bootstrap replicates. Figure 5. Relative titer of DWV kmers in nurse and forager bees in BioProject PRJNA243651. For each tissue, N indicates that it derives from a nurse and F indicates that it derives from a forager. The average of all tissue replicates (pseudoreplicates of the three individuals performing each activity) is also shown. Supplemental File 1. A summary of program settings and pseudocode used in the bioinformatics analysis.
Supplemental File 2. List of SRA accessions used in this study, with metadata. Supplemental File 3. Dictionary of unique and shared kmers of VDV and DWV tabulated with kmc (Deorowicz et al. 2015) based on the reference accessions used. Supplemental File 4. Raw and transformed kmer count data. Supplemental File 5. DWV-matching contigs produced by initial de novo assembly of SRR125541, and final genome model produced by manual overlap of two contigs. Supplemental File 6. The longest DWV-matching contig produced by assembly of SRA accession ERR528754. Supplemental File 7. Consensus sequences generated by mapping reads of SRR1255326 to DWV accession NC_004830.2. Both the genome consensus and the conceptual translation of the polyprotein region are provided. “X” in the conceptual translation indicates an ambiguity in the original reference accession. Supplemental File 8. Statistical tests of log2-transformed titer by caste, estimated from SRA accessions of BioProject PRJNA243651. Output is from the program PAST3 (Hammer et al. 2001). Supplemental File 9. Heat map scaled to median counts of virus-specific kmers of DWV-C and DWV, respectively. The structure of the figure is similar to that of Figure 1, with each virus being represented by forty 250-bp windows from 5’ to 3’, and each column representing a single SRA accession from which kmer counts were obtained. DWV-C kmers are represented in the first set of forty rows, in green, 15
whereas DWV kmers are represented in the second set of forty rows, in blue. The values in each column are scaled to a maximum of one. Values underlying the heat map are provided in Supplemental File 10. Supplemental File 10. Median counts of virus-specific kmers of DWV-C and DWV, respectively, in forty 250-bp windows.
16
Tables Table 1. Statistical tests of recombination and estimated break points for two DWV-like genome sequences reconstructed by de novo assembly. The six recombination tests listed were implemented in RDP4 (Martin et al. 2015). Sequence SRR1255541 contig_859 ERR528745 manually overlapped contig
Begin (alignment)
End (alignment)
Begin (recombinant)
End (recombinant)
4672
10135
4621
10080
RDP 9.04E177
1738
5262
1734
5258
4.52E177
1
GENECONV 3.88E-165
Bootscan 4.02E175
Maxchi 1.62E69
3.56E-171
2.99E175
2.56E56
Chimaera 2.67E-61
3Seq 2.22E16
9.64E-58
NS
Fig. 1
Fig. 2
1
Fig. 3
Fig. 4
2
Fig. 5
3
Graphical abstract
4
Highlights Examination of DWV and VDV kmers in honey bee sequence data identified known and novel recombinants VDV-like sequence was found at high titer only in recombinants Viral kmers provide an efficient means of estimating relative titer and identifying novel genotypes
5