Predicting tick-borne encephalitis using Google Trends


Ticks and Tick-borne Diseases xxx (xxxx) xxxx

Contents lists available at ScienceDirect

Ticks and Tick-borne Diseases journal homepage: www.elsevier.com/locate/ttbdis

Short communication

Predicting tick-borne encephalitis using Google Trends

Mihály Sulyok a,*, Hardy Richter b, Zita Sulyok c, Máté Kapitány-Fövény d,e, Mark D. Walker f

a Department of Pathology and Neuropathology, Eberhard Karls University of Tübingen, Liebermeisterstrasse 8, 72076, Tübingen, Germany
b Department of Neurology & Stroke, and Hertie-Institute for Clinical Brain Research, Eberhard Karls University of Tübingen, Hoppe-Seyler-Strasse 3, 72076, Tübingen, Germany
c Institute of Tropical Medicine, Eberhard Karls University, Wilhelmstraße 27, 72074, Tübingen, Germany
d Faculty of Health Sciences, Semmelweis University, Vas Str. 17, 1088, Budapest, Hungary
e Nyírő Gyula National Institute of Psychiatry and Addictions, Budapest, Hungary
f Department of the Natural and Built Environment, Sheffield Hallam University, Sheffield, S1 1WB, United Kingdom

ARTICLE INFO

Keywords: Web browser; Tick-borne encephalitis; Forecasting; ARIMA

ABSTRACT

Data generated through public Internet searching offers a promising alternative source of information for monitoring and forecasting infectious disease. Here, future cases of tick-borne encephalitis (TBE) were predicted using traditional weekly case reports, both with and without Google Trends data (GTD). Data on the weekly number of acute, confirmed TBE cases in Germany were obtained from the Robert Koch Institute. Data relating to the volume of Internet searching on TBE were downloaded from the Google Trends website. Data were split into training and validation parts. A SARIMA (0,1,1)(1,1,1) [52] model was used to describe the weekly TBE case number time series. GTD was used as an external regressor in a second model, for which SARIMA (4,1,1)(1,1,1) [52] was identified as optimal. Predictions of the number of future cases were made with both models and compared with the validation dataset. GTD showed a significant correlation with reported weekly case numbers of TBE (p < 0.0001). A comparison of forecasted values with reported ones resulted in an RMSE (root-mean-square error) of 0.71 for the model without Google search values, and an RMSE of 0.70 for the model enhanced with Google Trends values. However, the difference between predictive performances was not significant (Diebold-Mariano test, p-value = 0.14).

1. Introduction

Tick-borne encephalitis virus (TBEV), family Flaviviridae, genus Flavivirus, is the causative agent of tick-borne encephalitis (TBE) (Grard et al., 2007; Mansfield et al., 2009; Simmonds et al., 2017). Infection can result in acute meningoencephalitis, sometimes with myelitis (reviewed in Lindquist and Vapalahti, 2008; Bogovic and Strle, 2015). The vector of the European subtype is the Ixodes ricinus tick (Mansfield et al., 2009). The epidemiology of this tick-borne disease is transforming rapidly; TBE incidence has increased in Europe in recent decades (Süss, 2008, 2011; Beauté et al., 2018), as possibly has its geographical range (Bogovic and Strle, 2015; Dekker et al., 2019). Thus, novel methods to both monitor and predict TBE incidence are highly desirable to support public health interventions.

An interesting alternative to modelling with traditional data is to use data generated through online Internet activity. The volume of online searching has been found to be strongly correlated with disease incidence (Cook et al., 2011; Gluskin et al., 2014; Alicino et al., 2015; Teng et al., 2017; Kapitány-Fövény et al., 2019). There was initially much research interest in these supplementary methods of disease monitoring and prediction. However, previous studies sometimes obtained conflicting results; sole reliance on such data could be misleading, as demonstrated with Google Flu Trends (Lazer et al., 2014). Nevertheless, combining online activity-based data with traditional epidemiological methods may still offer valuable insights in infectious disease prediction (Teng et al., 2017; Kapitány-Fövény et al., 2019). Such alternative techniques could be particularly useful in the study of tick-borne diseases, where traditional epidemiological forecasting faces challenges due to rapidly changing environmental factors.

Recently, Kapitány-Fövény et al. (2019) successfully used Internet search data to model Lyme borreliosis. Can this be done for other tick-borne diseases? TBE provides a good comparison. Diagnosis of Lyme borreliosis may be protracted due to delayed clinical onset or identification of late-stage symptoms, which possibly affects correlations. Internet searching on TBE is likely to reflect disease incidence more precisely than searching on Lyme borreliosis, which is often the subject of media reports.

Corresponding author. E-mail address: [email protected] (M. Sulyok).

https://doi.org/10.1016/j.ttbdis.2019.101306 Received 20 May 2019; Received in revised form 9 September 2019; Accepted 21 September 2019 1877-959X/ © 2019 Elsevier GmbH. All rights reserved.

Please cite this article as: Mihály Sulyok, et al., Ticks and Tick-borne Diseases, https://doi.org/10.1016/j.ttbdis.2019.101306


autoregressive order (4), i.e. SARIMA (4,1,1)(1,1,1) [52]. The AIC of the 'without GTD' model was 308.12; for the 'GTD enhanced' model it was 314.41. Thus, AIC preferred the model without GTD. Detailed statistical results, including residual diagnostics, are shown in the Supplementary File. Forecasting using both models produced similar values; Fig. 2 shows these graphically along with the back-transformed values of weekly TBE case numbers. The closeness of the two curves illustrates the similarity in predictive performance. The forecasting accuracy of the first model ('without GTD') for the validation dataset was as follows: ME 0.18; RMSE 0.71. These figures are slightly lower for the 'GTD enhanced' model (ME 0.16; RMSE 0.70), which indicates slightly better prediction, with somewhat narrower confidence bands (Supplementary File). The two-sided Diebold-Mariano test showed no significant difference (DM = 1.48; p-value = 0.14).
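The accuracy measures and the model comparison reported here can be illustrated with a minimal Python sketch. It is not the authors' R script: it assumes that errors are defined as actual minus forecast, uses squared-error loss for one-step forecasts, and replaces the exact reference distribution of the Diebold-Mariano test with a normal approximation. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, pvariance


def mean_error(actual, forecast):
    """ME: average of (actual - forecast); the sign convention is an assumption."""
    return mean(a - f for a, f in zip(actual, forecast))


def rmse(actual, forecast):
    """RMSE: square root of the mean squared forecast error."""
    return math.sqrt(mean((a - f) ** 2 for a, f in zip(actual, forecast)))


def two_sided_p(z):
    """Two-sided p-value of a statistic under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def diebold_mariano(actual, forecast_1, forecast_2):
    """Simplified Diebold-Mariano test (horizon 1, squared-error loss).

    Returns (statistic, two-sided p-value). A positive statistic means that
    forecast_1 has the larger average squared error.
    """
    # Loss differential between the two competing forecasts.
    d = [(a - f1) ** 2 - (a - f2) ** 2
         for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    variance = pvariance(d)
    if variance == 0:
        raise ValueError("loss differential is constant; statistic undefined")
    statistic = mean(d) / math.sqrt(variance / len(d))
    return statistic, two_sided_p(statistic)
```

Under this normal approximation, `two_sided_p(1.48)` gives a two-sided p-value of about 0.14, in line with the value reported above.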

Additionally, TBE is a potentially more serious condition (Lindquist and Vapalahti, 2008). Thus, further study of TBE, with the aim of modelling both with and without Google Trends data (GTD), is pertinent.

2. Methods

2.1. Data

Weekly case numbers of TBE in Germany from the 20th of April 2014 to the 24th of March 2019 were downloaded from the Robert Koch Institute website (SurvStat@rki 2.0; https://survstat.rki.de/Content/Query/Create.aspx/en, accessed: 11th Apr 2019) using 'FSME' as disease keyword ('Frühsommer Meningoenzephalitis' is the usual German term for TBE) and 'Jahr und Meldewoche/Year and Week of notification' as time unit. These reported case numbers represented the laboratory-confirmed (Robert Koch Institut, 2019) cases of acute TBEV infections in Germany. Data relating to Internet searches were downloaded from Google Trends on the 15th of April 2019 (https://trends.google.com/trends/), searching under 'FSME' in 'Germany' with the timespan category set to 'last five years'. Values represented the relative weekly search activity (with a maximum value of 100). Both data sets (the reported case numbers and the GTD) were split into training and validation parts. The training data covered the 20th of April 2014 to the 1st of April 2018, while the validation dataset covered the 2nd of April 2018 to the 24th of March 2019.
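The date-based split described above can be sketched as follows. This is an illustration only: it assumes each week is indexed by a date seven days after the previous one, starting on the first day of the study period, and it uses placeholder zeros instead of real case counts. The helper names are hypothetical.

```python
from datetime import date, timedelta

# Study period and split boundary as stated in Section 2.1.
START = date(2014, 4, 20)
TRAIN_END = date(2018, 4, 1)
END = date(2019, 3, 24)


def weekly_dates(start, end):
    """Return dates spaced seven days apart from start to end (inclusive)."""
    out = []
    current = start
    while current <= end:
        out.append(current)
        current += timedelta(days=7)
    return out


def split_by_date(dates, values, train_end):
    """Split parallel lists into training (<= train_end) and validation parts."""
    pairs = list(zip(dates, values))
    train = [(d, v) for d, v in pairs if d <= train_end]
    valid = [(d, v) for d, v in pairs if d > train_end]
    return train, valid


dates = weekly_dates(START, END)
values = [0] * len(dates)  # placeholder only; NOT real case counts
train, valid = split_by_date(dates, values, TRAIN_END)
print(len(train), "training weeks;", len(valid), "validation weeks")
```

Splitting by date rather than at random keeps the validation weeks strictly after the training weeks, which is what a forecasting comparison requires.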

4. Discussion

Can data on Internet searching be used to increase forecasting accuracy for TBE? This study extends the research of Kapitány-Fövény et al. (2019) on Lyme borreliosis, thus allowing comparison and providing valuable information on these techniques for this tick-borne condition. As such, this is the first study evaluating the predictive performance of models based upon Internet searching data specifically for TBE. Predictions of the 'GTD enhanced' model were slightly better than those of the model using exclusively traditional data ('without GTD'). However, the difference between predictive performances was not statistically significant. Notably, the validation data showed an abrupt, sharp increase in case numbers compared to the training data. The models appear to have captured the seasonality of TBE but underestimated the number of cases for the majority of the forecasted weeks. The significant contemporaneous correlation identified here contrasts with the only previous such study, in which no significant correlation was identified between TBE case numbers and GTD (Walker, 2018).

TBE has a complex epidemiology, with incidence rates being influenced by multiple factors. Incidence may vary greatly annually and regionally (Beauté et al., 2018). Such temporal and spatial variations may be due to weather, tick and host abundance, and general ecological factors. Most published models utilise meteorological and ecological data (Daniel et al., 2018; Jaenson et al., 2018). However, human factors also appear to play a particularly important role in determining TBE incidence (Sumilo et al., 2008; Randolph and EDEN-TBD sub-project team, 2010; Godfrey and Randolph, 2011).

The study by Kapitány-Fövény et al. (2019) on Lyme borreliosis used a similar methodology and yielded similar results. Although GTD did not significantly improve the forecasting performance for Lyme borreliosis, the models used (with and without GTD) did provide a more accurate prediction than was seen in this study on TBE. Media coverage can influence GTD-based models (Butler, 2013; Lazer et al., 2014). The specificity of the search term used here, 'FSME', probably resulted in relatively focused data. Our choice to integrate traditional data hopefully also lessened this problem.

The amount of data utilized in this study was limited. Over a five-year time span, Google Trends provides data at only a weekly resolution. Even less data can be obtained when examining longer time spans, as only monthly Internet searching data is provided.

Prediction and modelling of zoonotic disease is challenging; thus, identifying alternative methods or data sources may be of great value. GTD is available in near real time and, compared to traditional methods, would allow near real-time monitoring of disease activity (Althouse et al., 2011). Thus, combining traditional field techniques with Internet-based data could be a promising avenue for future investigation of tick-borne conditions. For example, research by Bogdziewicz and Szymkowiak (2016) found a correlation between acorn masting and Google searching on ticks over a two-year period.

2.2. Statistics

Initially, the Kendall Tau-B correlation between the Google Trends data and the weekly case numbers was assessed. Then both the weekly case numbers and the Google Trends values were inverse hyperbolic sine transformed (as skewed numeric data with 0 values) to approximate a normal distribution (Zhang et al., 2000). Time series were modelled using a seasonal autoregressive integrated moving average model (SARIMA) as described previously (Kapitány-Fövény et al., 2019). A second model was established with the transformed GTD entered as an external regressor. Forecasting accuracies were characterized using RMSE (root-mean-square error) and ME (mean error). Predictions were compared with the two-sided Diebold-Mariano test (Diebold and Mariano, 2002). All statistical analyses were performed in R version 3.4.3 (R Core Team, 2017), using the forecast package version 8.4 (Hyndman and Khandakar, 2008; Hyndman et al., 2019). The script and the dataset are available online at https://github.com/msulyok/GoogleTBEforecasting; detailed results are provided as an online Supplementary File.

3. Results

Both the GTD and the reported case numbers showed strong seasonality. The reported case numbers peaked in mid-summer, from the 28th of June to the 10th of July (median calendar week: 26; IQR: 26–27). The GTD peaked earlier in each transmission season, from the 22nd of April to the 22nd of June (median calendar week: 23; IQR: 23–24). The median lag between the two peaks was 3 weeks (IQR: 2–4). The GTD showed a significant contemporaneous correlation (τ = 0.308; p < 0.0001) with the reported weekly TBE case numbers. The correlation is also visually apparent (Fig. 1). The optimal model without GTD used a single seasonal and a single non-seasonal differencing step, with an autoregressive order of 1 seasonally and 0 non-seasonally. The moving average order was 1 in both parts of the model, i.e. SARIMA (0,1,1)(1,1,1) [52].
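Two of the steps named in Section 2.2, the Kendall Tau-B correlation and the inverse hyperbolic sine transformation, can be sketched in plain Python. This is a minimal illustration rather than the authors' R code: the function names are hypothetical, the tau-b routine uses a simple pairwise loop, and it returns only the coefficient, not a p-value.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, with tie correction."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: drops out of both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


def ihs(values):
    """Inverse hyperbolic sine transform; defined at 0, unlike the logarithm."""
    return [math.asinh(v) for v in values]


def ihs_inverse(values):
    """Back-transform from the asinh scale to the original scale."""
    return [math.sinh(v) for v in values]
```

The back-transform matters for reporting, because forecasts produced on the transformed scale have to be converted with `ihs_inverse` before they can be read as weekly case numbers.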
The 'GTD enhanced' model showed the same orders, with the exception of non-seasonal


Fig. 1. Weekly TBE case numbers are shown with the blue curve; relative Google Trends search volumes are shown with the red curve (RKI: Robert Koch Institute, GTD: Google Trends). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Fig. 2. Forecasting of the optimal SARIMA model (blue curve) compared to the Google Trends extended SARIMA model (red curve) and to the actual weekly TBE case numbers (black curve). Shaded areas illustrate the 80% and 95% prediction intervals of the optimal SARIMA model. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Ecological processes and human activities are important in tick-borne conditions; considering these factors is important. In conclusion, GTD showed a significant correlation with reported weekly case numbers of TBE. However, enhancing models using weekly case numbers with the addition of GTD could not increase predictive performance significantly. GTD, and other Internet based data sources offer great potential in improving disease forecasting. Nevertheless, further refinement such as integration of other ecological and human activities-related factors is required. Investigation into such improvement offers promising avenues for future research.

Declaration of Competing Interest

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Mihály Sulyok received a research grant UKT-AKF Nr. 346-0-0 unrelated to the current project. Máté Kapitány-Fövény acknowledges the support by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. We have no other conflict of interest to declare.

Appendix A. Supplementary data

Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.ttbdis.2019.101306.

References

Alicino, C., Bragazzi, N.L., Faccio, V., Amicizia, D., Panatto, D., Gasparini, R., Icardi, G., Orsi, A., 2015. Assessing ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes. Infect. Dis. Poverty 4, 54. https://doi.org/10.1186/s40249-015-0090-9.

Althouse, B.M., Ng, Y.Y., Cummings, D.A.T., 2011. Prediction of dengue incidence using search query surveillance. PLoS Negl. Trop. Dis. 5, e1258. https://doi.org/10.1371/journal.pntd.0001258.

Beauté, J., Spiteri, G., Warns-Petit, E., Zeller, H., 2018. Tick-borne encephalitis in Europe, 2012 to 2016. Euro Surveill. Bull. Eur. Sur Mal. Transm. Eur. Commun. Dis. Bull. 23. https://doi.org/10.2807/1560-7917.ES.2018.23.45.1800201.

Bogdziewicz, M., Szymkowiak, J., 2016. Oak acorn crop and Google search volume predict lyme disease risk in temperate Europe. Basic Appl. Ecol. 17, 300–307. https://doi.org/10.1016/j.baae.2016.01.002.

Bogovic, P., Strle, F., 2015. Tick-borne encephalitis: a review of epidemiology, clinical characteristics, and management. World J. Clin. Cases 3, 430. https://doi.org/10.12998/wjcc.v3.i5.430.

Butler, D., 2013. When Google got flu wrong. Nature 494, 155–156. https://doi.org/10.1038/494155a.

Cook, S., Conrad, C., Fowlkes, A.L., Mohebbi, M.H., 2011. Assessing Google Flu Trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic. PLoS One 6, e23610. https://doi.org/10.1371/journal.pone.0023610.

Daniel, M., Danielová, V., Fialová, A., Malý, M., Kříž, B., Nuttall, P.A., 2018. Increased relative risk of tick-borne encephalitis in warmer weather. Front. Cell. Infect. Microbiol. 8, 90. https://doi.org/10.3389/fcimb.2018.00090.

Dekker, M., Laverman, G.D., de Vries, A., Reimerink, J., Geeraedts, F., 2019. Emergence of tick-borne encephalitis (TBE) in the Netherlands. Ticks Tick-Borne Dis. 10, 176–179. https://doi.org/10.1016/j.ttbdis.2018.10.008.

Diebold, F.X., Mariano, R.S., 2002. Comparing predictive accuracy. J. Bus. Econ. Stat. 20, 134–144. https://doi.org/10.1198/073500102753410444.

Gluskin, R.T., Johansson, M.A., Santillana, M., Brownstein, J.S., 2014. Evaluation of internet-based dengue query data: google Dengue Trends. PLoS Negl. Trop. Dis. 8, e2713. https://doi.org/10.1371/journal.pntd.0002713.

Godfrey, E.R., Randolph, S.E., 2011. Economic downturn results in tick-borne disease upsurge. Parasit. Vectors 4, 35. https://doi.org/10.1186/1756-3305-4-35.

Grard, G., Moureau, G., Charrel, R.N., Lemasson, J.-J., Gonzalez, J.-P., Gallian, P., Gritsun, T.S., Holmes, E.C., Gould, E.A., de Lamballerie, X., 2007. Genetic characterization of tick-borne flaviviruses: new insights into evolution, pathogenetic determinants and taxonomy. Virology 361, 80–92. https://doi.org/10.1016/j.virol.2006.09.015.

Hyndman, R.J., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., O'Hara-Wild, M., Petropoulos, F., Razbash, S., Wang, E., Yasmeen, F., 2019. Forecast: Forecasting Functions for Time Series and Linear Models. R Package Version 8.4. http://pkg.robjhyndman.com/forecast.

Hyndman, R.J., Khandakar, Y., 2008. Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 27. https://doi.org/10.18637/jss.v027.i03.

Jaenson, T.G.T., Petersson, E.H., Jaenson, D.G.E., Kindberg, J., Pettersson, J.H.-O., Hjertqvist, M., Medlock, J.M., Bengtsson, H., 2018. The importance of wildlife in the ecology and epidemiology of the TBE virus in Sweden: incidence of human TBE correlates with abundance of deer and hares. Parasit. Vectors 11, 477. https://doi.org/10.1186/s13071-018-3057-4.

Kapitány-Fövény, M., Ferenci, T., Sulyok, Z., Kegele, J., Richter, H., Vályi-Nagy, I., Sulyok, M., 2019. Can Google Trends data improve forecasting of lyme disease incidence? Zoonoses Public Health 66, 101–107. https://doi.org/10.1111/zph.12539.

Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014. The parable of Google Flu: Traps in big data analysis. Science 343, 1203–1205. https://doi.org/10.1126/science.1248506.

Lindquist, L., Vapalahti, O., 2008. Tick-borne encephalitis. Lancet 371, 1861–1871. https://doi.org/10.1016/S0140-6736(08)60800-4.

Mansfield, K.L., Johnson, N., Phipps, L.P., Stephenson, J.R., Fooks, A.R., Solomon, T., 2009. Tick-borne encephalitis virus - a review of an emerging zoonosis. J. Gen. Virol. 90, 1781–1794. https://doi.org/10.1099/vir.0.011437-0.

The R Core Team, 2017. R: a Language and Environment for Statistical Computing. R foundation for statistical computing, Vienna, Austria. https://www.R-project.org/.

Randolph, S.E., EDEN-TBD sub-project team, 2010. Human activities predominate in determining changing incidence of tick-borne encephalitis in Europe. Euro Surveill. Bull. Eur. Sur Mal. Transm. Eur. Commun. Dis. Bull. 15, 24–31. https://doi.org/10.2807/ese.15.27.19606-en.

Robert Koch Institut, 2019. Falldefinitionen des Robert Koch-Instituts zur Übermittlung von Erkrankungs-oder Todesfällen und Nachweisen von Krankheitserreger. Accessed on: 18th May 2019. https://www.rki.de/DE/Content/Infekt/IfSG/Falldefinition/Downloads/Falldefinitionen_des_RKI_2019.pdf?__blob=publicationFile.

Simmonds, P., Becher, P., Bukh, J., Gould, E.A., Meyers, G., Monath, T., Muerhoff, S., Pletnev, A., Rico-Hesse, R., Smith, D.B., Stapleton, J.T., ICTV Report Consortium, 2017. ICTV virus taxonomy profile: Flaviviridae. J. Gen. Virol. 98, 2–3. https://doi.org/10.1099/jgv.0.000672.

Sumilo, D., Bormane, A., Asokliene, L., Vasilenko, V., Golovljova, I., Avsic-Zupanc, T., Hubalek, Z., Randolph, S.E., 2008. Socio-economic factors in the differential upsurge of tick-borne encephalitis in Central and Eastern Europe. Rev. Med. Virol. 18, 81–95. https://doi.org/10.1002/rmv.566.

Süss, J., 2008. Tick-borne encephalitis in Europe and beyond - the epidemiological situation as of 2007. Euro Surveill. Bull. Eur. Sur Mal. Transm. Eur. Commun. Dis. Bull. 13. https://doi.org/10.2807/ese.13.26.18916-en.

Süss, J., 2011. Tick-borne encephalitis 2010: epidemiology, risk areas, and virus strains in Europe and Asia - an overview. Ticks Tick-Borne Dis. 2, 2–15. https://doi.org/10.1016/j.ttbdis.2010.10.007.

Teng, Y., Bi, D., Xie, G., Jin, Y., Huang, Y., Lin, B., An, X., Feng, D., Tong, Y., 2017. Dynamic forecasting of zika epidemics using Google Trends. PLoS One 12, e0165085. https://doi.org/10.1371/journal.pone.0165085.

Walker, M.D., 2018. Can Google be used to study parasitic disease? Internet searching on tick-borne encephalitis in Germany. J. Vector Borne Dis. 55, 327–329. https://doi.org/10.4103/0972-9062.256571.

Zhang, M., Fortney, J.C., Tilford, J.M., Rost, K.M., 2000. An application of the inverse hyperbolic sine transformation - a note. Health Serv. Outcomes Res. Methodol. 1, 165–171. https://doi.org/10.1023/A:1012593022758.
