Disease surveillance using online news: Dengue and zika in tropical countries

Journal Pre-proofs

Disease Surveillance Using Online News: Dengue and Zika in Tropical Countries

Yiding Zhang, Motomu Ibaraki, Franklin W. Schwartz

PII: S1532-0464(20)30001-0
DOI: https://doi.org/10.1016/j.jbi.2020.103374
Reference: YJBIN 103374

To appear in: Journal of Biomedical Informatics

Received Date: 25 June 2019
Revised Date: 10 December 2019
Accepted Date: 3 January 2020

Please cite this article as: Zhang, Y., Ibaraki, M., Schwartz, F.W., Disease Surveillance Using Online News: Dengue and Zika in Tropical Countries, Journal of Biomedical Informatics (2020), doi: https://doi.org/10.1016/j.jbi.2020.103374

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier Inc.

Paper Type: Research Paper
Title: Disease Surveillance Using Online News: Dengue and Zika in Tropical Countries

Yiding Zhang Corresponding author: Environmental Science Graduate Program, The Ohio State University, 234 Mendenhall Lab, 125 South Oval Mall, Columbus OH 43210, USA; Phone: (614) 787-1856; Fax: (614) 292-7688; Email: [email protected]

Motomu Ibaraki School of Earth Sciences, The Ohio State University, Columbus OH 43210, USA; Email: [email protected]

Franklin W. Schwartz School of Earth Sciences, The Ohio State University, Columbus OH 43210, USA; Email: [email protected]

Keywords: disease surveillance; newspaper; text mining; dengue fever; zika

Abstract

BACKGROUND: Around the world in tropical areas, certain vector-borne diseases have become endemic and hyperendemic. Among the developing nations, there are common difficulties in establishing the incidences of various diseases, especially vector-borne diseases with complex etiologies and a broad spectrum of presentations. One alternative approach to characterizing disease outbreaks examines the possibilities of developing proxy information from online news articles. Such sources are being evaluated for applications to disease surveillance, early outbreak detection, and epidemiology research. Our study examines the potential of news articles in elucidating outbreaks of dengue in India and zika disease in Brazil.

OBJECTIVE: This study is designed to assess the potential usefulness of news articles in tracking case numbers of dengue and zika through an improved understanding of how news outlets report on disease. We specifically examine the possibilities of providing near real-time reporting on the development of outbreaks of dengue and zika.

METHODS: Newspaper articles related to dengue fever and zika disease in India and Brazil, respectively, were extracted from the LexisNexis database. We targeted news articles available from five popular international news sources and two local newspapers in each country. The news articles were processed to provide yearly and weekly time series in the number of articles concerned with dengue and zika to test their potential suitability as proxies for disease prevalence. The collections of articles were analyzed using a text mining tool-kit that subdivides a collection of news articles into smaller clusters to study the topical focus of articles and their relevance to tracking diseases.

RESULTS: For dengue fever in India, the local newspapers provide a better source of information than international newspapers. The multi-year analysis (2010-2016) suggests that the numbers of dengue cases are strongly correlated with the numbers of news reports, with an R² value of 0.88. For zika disease in Brazil, the news reports provided useful information on the timing of the zika outbreak. Reporting increased sharply at the beginning of 2016, peaked in weeks 5 to 8, and then decreased sharply. The numbers of articles remained low for the remainder of 2016 and 2017. Comparisons with reported cases again show article numbers to be a useful proxy for the prevalence of zika in Brazil.

CONCLUSIONS: The paper describes a strategy that uses newspapers as proxies to monitor outbreaks of infectious diseases and to study their epidemiology. It has potential applicability in some developing countries and regions with relatively poor medical infrastructures and records. Clearly, large national newspapers in India provide a better source of information on diseases than international outlets. This approach has potential with selected diseases in a few selected countries. Article numbers internationally appear to vary in proportion to the perceived health impact.

Introduction

Around the world in tropical areas, certain vector-borne diseases are emerging to become endemic and hyperendemic. The drivers are typically poverty and large population densities, exemplified by poor housing, absent public services, and unhygienic conditions [1]. The particular diseases of interest here are (i) dengue fever, as well as the more severe dengue hemorrhagic fever and dengue shock syndrome, and (ii) various problems associated with the zika virus.

In his 2011 review, Gubler [2] explains how urbanization, globalization, and the inability to control mosquitoes have conspired to make dengue epidemic, placing more than half of the world’s population at risk. What is particularly concerning about dengue fever is the geographic expansion, and the increasing frequency and scale of epidemics over the past 40 years [2,3]. Dengue is now the “most important mosquito-borne viral disease” [3]. This emergence of dengue has coincided with a trend towards urbanization, especially evident in poor tropical and subtropical countries [2].

Issues associated with zika, a mosquito-borne virus similar to dengue, became evident in 2015, especially the potential for severe impacts to fetal health. The emergence of zika virus in Brazil in 2015 and its rapid spread throughout the Americas was sudden and unexpected [4]. Historically, zika infections were known but generally considered to be a subclinical or mild influenza-like disease. The outbreaks in the Americas were more serious in character, especially due to neurologic impacts and microcephaly in babies born to infected mothers.

With both dengue and zika-related infections, the full extent of human impacts has been difficult to measure. For example, for dengue, factors contributing to this problem include ineffective surveillance, minimal reporting for a variety of reasons, and misdiagnoses [3]. For many countries, the problem of reporting can be directly attributed to rudimentary medical infrastructure, including public health systems. Similar problems in reporting were evident with zika virus. When zika and associated health problems emerged in Brazil in 2015, the medical community had difficulty reacting to this serious emergency [5]. Little was known about the virus, its mosquito vectors, and the number of affected individuals. As a consequence, the international community was hindered in its response as the disease spread through South and Central America [6].

The common difficulties in collecting information, and the absence of comprehensive, systematic and timely reporting of disease-related data, are a continuing and serious problem. One potential solution relies on alternative information proxies to estimate numbers of dengue and zika cases. Various types of internet-based resources, such as databases of scientific literature, online newspapers, and blogs, have potential in this respect [7]. They are being considered as sources of information relevant to disease surveillance, early outbreak detection, and epidemiological research. There are indications that such informal sources of information have value. For example, nearly all major disease outbreaks investigated by the World Health Organization (WHO) were first identified through informal online sources rather than local medical reports [8]. What is also clear is that these information sources are challenging to work with. The eclectic array of available data sources touching on aspects of disease reporting is neither well organized nor integrated. Moreover, it is virtually impossible to read and assimilate information from these many different kinds of reports.

There has been progress in the development of disease surveillance systems that rely on online sources. For example, HealthMap [9,10] provides real-time surveillance using data from online media sources, such as Google News, WHO reports, and the Program for Monitoring Emerging Diseases (ProMED)-mails. Text mining approaches yield information on numbers and locations of disease outbreaks. MedISys [11] is a media monitoring system providing event-based surveillance using information from online news reports. All news items are categorized into classes which include geolocation, entities (persons, organizations, etc.), groups and subgroups (e.g., diseases) [11]. Google Flu Trends [12] is another such project. It began in 2008 and is designed to track seasonal flu. The approach relies on data mining records of flu-related search terms, which are entered in Google’s search engine, combined with computer modelling. The simulation results are then calibrated using real-time flu data. In some years, predictions matched data compilations from the Centers for Disease Control and Prevention (CDC) well. Other years, e.g., 2013, have been more problematic, pointing to a need to keep recalibrating the model [12]. Although progress is evident, more work is required to develop these online concepts.

Social media, such as Twitter, represent another important source of information. There are applications that utilize spatial, temporal, and text mining with Twitter data to create real-time disease surveillance systems, applied to illnesses like flu and cancer [13]. Our preliminary work, however, suggests that unlike other information sources, social media, particularly Twitter, is mostly useful for the early prediction of seasonal disease outbreaks. With Twitter and other social media, the brevity of the messages and the complexity and unregulated character of data can affect the accuracy of the predictions [13].

The goal of this study is to assess the potential usefulness of news information in tracking diseases in developing countries through an improved understanding of how news outlets report on disease. The objective is to explore the potential of utilizing online news sources to monitor dengue fever in India and zika in Brazil. The decision to target dengue fever and zika was based on two considerations. In the case of dengue, it has been noted that the prevalence and case numbers have been increasing. Moreover, its course from year to year is erratic, with dengue becoming epidemic or hyper-endemic on an irregular basis. Dengue would continue to be newsworthy mainly because of its inherent variability in intensity and public health impacts. The news interest in a serious emerging disease like zika should be similar to dengue fever but with more emphasis on health implications with microcephaly and other related problems, risk factors, and breaking news on the science. The second reason for selecting dengue fever is the likelihood that it will experience growth in case numbers and geographic distribution [2]. It is a disease of poverty, urbanization and globalization directly associated with the ecology of mosquitoes and climate [1]. A news-based approach then may offer some potential in rapidly tracking diseases with characteristics like dengue and zika. The incidence of dengue fever (and zika) is likely to increase and spread geographically due to global climate change and other factors.

We chose to utilize newspapers and written reports from news organizations [e.g., Cable News Network (CNN), British Broadcasting Corporation (BBC)]. Fast, well-organized and systematic coverage from these sources is now commonly available online. Daily reporting provides a large and diverse collection of world-wide articles. There are also significant advantages over other online sources, such as scientific papers and social media. Scientific papers are slow to come out and often limited geographically to developed countries. Blogs and messages on social media are commonly proprietary and unavailable, and rather inconsistent in style [14].

We also chose to focus our analysis on two of the world’s large developing countries, India and Brazil. These countries have both experienced an increase in intensity in dengue fever, especially in 2015. Dengue fever has emerged in India over the last 50 years as a severe health problem [15]. It is caused by dengue virus, which is spread by the bite of infected Aedes mosquitoes [3]. Sources in India reported an annual average of 20,474 dengue cases between 2006 and 2012 [16]. Although dengue fever is also a serious problem in Brazil, we chose to examine the recent emergence of zika virus, also transmitted by Aedes mosquitoes. Brazil was ground zero for the first reports of serious birth defects associated with zika. The outbreak of zika in Brazil in 2015-2016 received worldwide attention because of links between infection in pregnant women and microcephaly and other serious birth defects [17].

Another important consideration in our selection of countries is the availability of systematic news reporting in India and Brazil. Not every tropical or sub-tropical country has well-developed sources for news. For example, little news comes from the poorest of countries (e.g., Myanmar or Yemen) or countries where the news is government-controlled. Large newspapers [e.g., The New York Times (NYT), The Guardian (TG)] with broad appeal worldwide are only somewhat useful in this respect because their coverage of these countries is always limited. Both India and Brazil benefit from having a robust national network of news outlets and are sufficiently large and developed economically that events, business, problems, etc. are of international interest. India also has the largest expat population of all countries at 16 million [18].

Methods

Data Sources and Analyses

Our study examines whether news reports can inform the timing and intensity of disease. In order to verify the feasibility of using the numbers of news reports as the basis for a disease surveillance system, we created time series of reported numbers of monthly dengue cases in India and zika in Brazil. For India, we use data on monthly dengue cases from 2003 through 2016 that are compiled by WHO [19]. In 2003, the number of reported cases of dengue was 12,754. For the years up to 2011, case numbers increased but were commonly <20,000/year. From 2012 onward, reported numbers of dengue cases increased significantly to a peak in 2013 of 75,808. Although the numbers of dengue cases declined in 2014, they again spiked to 99,913 cases in 2015 and 90,277 cases in 2016.

Outbreaks of zika in Brazil began in 2015 and extended through 2016. However, no information on case numbers for zika in all of Brazil was available for 2015 because zika disease was not then a reportable disease for the Brazil Ministry of Health. Weekly data on both suspected and confirmed zika cases, however, became available from the start of 2016 from the Brazil Ministry of Health [20]. Numbers of both suspected and confirmed cases showed a sharp increase early in 2016, peaking in week 7 with >20,000 suspected and 10,000 confirmed cases. Subsequently, there was a marked and steady decline into week 24 of 2016, which was followed by a long tailing decline through the remainder of 2016 with <300 cases per week.

In terms of statistical analyses, we created linear/nonlinear regression models between numbers of disease cases and news reports. These statistical models were tested in terms of their ability to predict reported numbers of disease cases given the number of related news reports. The reliabilities of the models were evaluated by creating a plot of reported versus predicted case numbers and calculating the coefficient of determination (R²) between them.

Selection of Sources of Online Information

The study depends on the availability of relevant news articles from online sources. We explored the possibilities of downloading news reports directly from newspaper websites. This effort was unsuccessful because of design limitations in their individual search engines. We also investigated the possibility of using the Google News search engine, which required development of a web scraper to download news articles from the search results. This approach was also unsatisfactory because the search results from Google News tended to focus on major news outlets in developed countries, which were biased to their national view, providing minimal coverage of India and Brazil.

Several additional databases were tested, including LexisNexis [21], Factiva [22], InfoTrac Newsstand [23], and Historical Newspapers Archives [24]. LexisNexis was most suited for our application because of its broad coverage of international newspapers, large numbers of full-text articles, and well formatted text data. LexisNexis has access to over 15,000 news, business, and legal sources. Their broad news coverage includes deep archives and up-to-the-minute stories in national and regional newspapers, wire services, broadcast transcripts, international news, and non-English language sources [21]. LexisNexis has been used as the primary database for studies in areas such as medicine, social work, and informatics [25–27], as well as a somewhat similar study of dengue fever in India [10]. That study involved a search of all the possible news articles related to dengue in India in 2014 from LexisNexis. For unknown reasons, the temporal trend in article numbers did not agree with the dengue outbreak that year. By contrast, our study focuses on two Indian newspapers rather than many, and involves rigorous statistical modeling with calibration and careful evaluation of the efficacy of the time series.

In addition to the two Indian national papers, Times of India (TOI) and Hindustan Times (HT), we also selected five popular international news sources: BBC, CNN, NYT, TG, and Washington Post (WP). In the case of zika in Brazil, we used the same five major international news outlets and two nationally important Brazilian newspapers, Jornal O Globo (JOG) and O Estado de São Paulo (Estadão).

In addition to national and international newspapers, we examined the possibility of using local or regional newspapers. Our thinking was that the spatial variability in article numbers could shed light on geographically related features of the epidemiology of certain endemic diseases. However, our trials encountered two intractable logistical problems. First, the local news outlets often do not have organized archives and are difficult to use in real time because of unconventional formatting. Second, these news streams are not commonly archived on databases like LexisNexis. Thus, the numbers of local news reports available are small, and not usable for disease surveillance.

Computer Tool-kit for Text Data Mining and Clustering

The text mining and clustering techniques to process written text are widely used in bibliometric studies that involve medicine and health [28–33], science more broadly [34,35], and business [36,37]. A series of algorithms written in Python are used for the processing and analysis of news articles. These modules form the basis of a tool-kit that has been widely described both in our own works [38,39] and elsewhere [31,40]. Interested readers can refer to those papers for additional detail. Here, we present an overview of our approaches. Our algorithm is used to extract all the words or short terms from each news article text. After some natural language processing procedures, similarity between each pair of news articles is calculated based on the captured words or short terms in the texts. The tool-kit automatically clusters news articles with higher similarities into the same groups using a Louvain Modularity Method [39]. The clustered groups exhibit a general diversity of topics regarding the targeted disease.

The large collection of news articles is processed in two different ways. First, a Boolean logic search identifies news articles published during a specific time interval (e.g., each month) across a number of years, which are concerned with themes like “dengue” or “zika”. To be selected, the news articles must contain the disease names (“dengue” or “zika”) in the titles and country names [“India” or “Brazil” (or “Brasil” in Portuguese)] in any section of the news articles. The time series in the number of themed articles (e.g., weekly) is tested to determine its suitability as a proxy for disease prevalence.

The second strategy is an unstructured cluster analysis that takes a collection of articles and subdivides them into a smaller number of clusters (e.g., 5 to 15; about a dozen in this study). The articles present in a cluster have more in common with each other than with articles in any of the other clusters. For each cluster, we determine associated keywords based on their frequencies of occurrence. These words describe the “theme” of each cluster. The pattern of clustering can also be visualized to examine similarity patterns among the various clusters.
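The Boolean selection rule described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the tool-kit used in the study: the `Article` record, the function names, and the three toy articles are all hypothetical, and matching is done by simple case-insensitive substring search.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical record for one news article (not the study's data model)."""
    title: str
    body: str
    published: date


def matches(article, disease, country_names):
    """True if the disease name is in the title AND a country name appears anywhere."""
    title = article.title.lower()
    full_text = f"{article.title} {article.body}".lower()
    has_disease = disease.lower() in title
    has_country = any(name.lower() in full_text for name in country_names)
    return has_disease and has_country


def select_articles(articles, disease, country_names, start, end):
    """Keep articles published in [start, end] that satisfy the Boolean rule."""
    return [
        a for a in articles
        if start <= a.published <= end and matches(a, disease, country_names)
    ]


# Toy articles invented for illustration only.
articles = [
    Article("Dengue cases rise in Delhi", "Hospitals across India report admissions.", date(2015, 9, 14)),
    Article("Monsoon floods", "Dengue is a concern in India.", date(2015, 9, 15)),
    Article("Zika alert", "Autoridades no Brasil confirmam casos.", date(2016, 2, 10)),
]

dengue_india = select_articles(articles, "dengue", ["India"], date(2015, 1, 1), date(2015, 12, 31))
zika_brazil = select_articles(articles, "zika", ["Brazil", "Brasil"], date(2016, 1, 1), date(2016, 12, 31))
```

The second toy article is rejected because "dengue" appears only in the body, not the title. Substring matching also means that "India" matches "Indian"; whether that is desirable depends on the application.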

Results

Comparison between News Articles and Dengue Outbreaks in India

This section examines whether changes in the relative abundance of news articles concerned with dengue in India have power in estimating rates of occurrence of dengue fever. Our search of articles in high visibility news outlets outside of India discovered only 4 to 9 news reports on dengue in India in each of the five international news outlets between 2010 and 2017. The number of articles is too small to provide a useful temporal analysis. Fortunately, the results are much more promising for selected large national newspapers located in India. Over this same seven-year period, TOI and HT published 3,570 and 1,167 reports related to dengue, respectively. As could be expected, these outlets were much more focused on health issues relevant to India. The large differences in numbers of articles on dengue fever in the Indian newspapers as compared to the international press suggest that informal thresholds exist before the recurring disease problems of individual countries are deemed “newsworthy” for the international media. In particular, the international press was concerned more about global dengue problems and new research from the US. We therefore turned to the nationally important newspapers in India to study their usefulness in tracking cases of dengue fever, and examine the possibilities of using the numbers of news articles from TOI and HT for estimating annual dengue cases.

For calibration purposes, actual compilations of yearly dengue cases in India from 2003 through 2016 from WHO are utilized [19]. These numbers likely represent more serious dengue cases, probably resulting in hospital treatment [41]. As such, the annual totals are a lower estimate of dengue cases, because most patients are asymptomatic and do not require medical treatment. Yearly numbers of news reports associated with dengue in India from 2010-2016 were extracted to test their efficacy as a proxy for numbers of human disease cases. For both of these newspapers, the total number of published articles finding their way onto the LexisNexis Database is growing annually (approximately 7.3% per year). As a first approximation, the yearly totals on dengue-relevant articles were normalized in order to remove the effects of growth in article numbers.

Figure 1a provides a comparison of the WHO compilation of yearly dengue cases in India with the annual numbers of dengue reports from the newspapers. The article numbers began to increase rapidly in 2012, reaching a peak in 2013 with 558 reports. After a decline in 2014 to 442 reports, the numbers increased significantly in 2015 and 2016 with 703 and 873, respectively. There is a strong non-linear correlation with recorded dengue cases, providing an R² value of 0.88.

The blue curve in Figure 1b represents the WHO number of dengue cases yearly in India. The numbers of dengue cases were commonly <20,000/year before 2011. From 2012, numbers increased significantly to a peak in 2013 with 75,808 cases. Although the number of dengue cases declined in 2014, they again spiked up to 99,913 cases in 2015 and 90,277 cases in 2016. The numbers of dengue cases in 2015 and 2016 were the largest on record since monitoring began in 1998 [19,42]. The regression equation for the correlation in Figure 1a provides a statistical equation to predict case numbers for dengue fever as a function of annual newspaper articles from TOI and HT. The results are shown as the red curve in Figure 1b. For the years 2010-2016, the statistical prediction (red line) tracks the tabulated number of dengue cases (blue line) very well. Regression analysis comparing the modeled versus observed data yields an R² of 0.85. This result suggests that the number of dengue-related articles in national newspapers could be useful as a proxy to represent multi-year trends of dengue transmission in India.
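The normalization and regression steps described in this section can be sketched as follows. This is a hedged illustration only: the text does not state the functional form of the non-linear model or the exact normalization, so the power-law form, the constant-growth deflator, and all function names are assumptions, and the input values are toy numbers rather than the study's data. R² is computed here as 1 minus the ratio of residual to total sum of squares, which is one common definition.

```python
import math


def normalize_counts(counts_by_year, growth_rate, base_year):
    """Deflate yearly counts by an assumed constant annual growth in archive size."""
    return {
        year: count / (1.0 + growth_rate) ** (year - base_year)
        for year, count in counts_by_year.items()
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power_law(articles, cases):
    """Fit cases = a * articles**b by least squares in log-log space (assumed form)."""
    slope, intercept = fit_line(
        [math.log(x) for x in articles],
        [math.log(y) for y in cases],
    )
    return math.exp(intercept), slope


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy inputs (NOT study data): an exact power law with a = 2 and b = 1.5.
toy_articles = [10.0, 20.0, 40.0, 80.0]
toy_cases = [2.0 * x ** 1.5 for x in toy_articles]

a, b = fit_power_law(toy_articles, toy_cases)
predicted = [a * x ** b for x in toy_articles]
fit_quality = r_squared(toy_cases, predicted)

# A 7.3% annual growth rate applied to toy counts, with 2010 as the base year.
normalized = normalize_counts({2010: 100.0, 2011: 107.3}, growth_rate=0.073, base_year=2010)
```

With real data, the observed case numbers would replace `toy_cases`, and the reported-versus-predicted comparison would be made on the original (not logarithmic) scale, as in `r_squared` above.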

Figure 1. A statistical basis for predicting the yearly dengue cases based on the numbers of dengue-related news reports. Shown in Panel (a) is a seven-year regression (2010-2016) between numbers of articles and dengue cases with an R² value of 0.88. Panel (b) compares variation in annual reported case numbers for dengue fever in India (2003-2016, blue) with the predicted dengue cases (2010-2016) from the statistical model (red).

Temporal Trends of Number of News Articles related to Dengue in India

Next, we use the frequency data on local news articles to examine whether patterns exist with respect to intra-annual variability in dengue infections in India. Weekly numbers of news reports related to dengue in India in TOI and HT were extracted from the LexisNexis database and plotted versus time (Figure 2). Clearly, the results indicate that numbers of news articles in both newspapers display obvious and consistent yearly trends. Typically, the numbers of articles increased sharply between June and July, to reach a peak in late September to early October. The numbers of articles wane towards the end of each year. This trend matches the known behavior of dengue cases in India as shown in Figure 3. This figure presents the average number of patients for dengue fever in India from 2014-2017 [43]. The number of dengue patients becomes noticeable in July, reaching a peak in October, before declining through November. The higher peaks in article numbers in 2015 (83 and 62 reports in TOI and HT, respectively, Figure 2) also coincide with the worst recorded dengue outbreak in that year. These results provide additional confirmation of the usefulness of local news outlets in the monitoring of a serious long-lasting infectious disease such as dengue fever. When sufficiently large numbers of articles are available, temporal resolution on the order of weeks is practical.
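A weekly time series of the kind plotted in Figure 2 can be built by binning publication dates into weeks and locating the peak. The sketch below assumes ISO 8601 week numbering, which the text does not specify, and uses invented dates and hypothetical function names.

```python
from collections import Counter
from datetime import date


def weekly_counts(publication_dates):
    """Count articles per (ISO year, ISO week); ISO weeks are an assumption here."""
    counts = Counter()
    for d in publication_dates:
        iso_year, iso_week, _weekday = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


def peak_week(counts, year):
    """Return the (year, week) key with the most articles in a given ISO year."""
    in_year = {key: n for key, n in counts.items() if key[0] == year}
    return max(in_year, key=in_year.get) if in_year else None


# Toy publication dates (invented for illustration only).
toy_dates = [
    date(2015, 7, 6),
    date(2015, 9, 28),
    date(2015, 9, 29),
    date(2015, 9, 30),
    date(2015, 12, 1),
]

counts = weekly_counts(toy_dates)
peak = peak_week(counts, 2015)
```

Weeks with no articles are absent from the counter rather than stored as zero, so a plotting step would need to fill those gaps explicitly.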

Figure 2. The time variation in the number of weekly news reports from Indian local news outlets from 2013 to 2016, (a) TOI and (b) HT.

Figure 3. Average monthly number of patients for dengue fever in India between 2014 and 2017 [43].

Spectrum of Dengue-Related Topics in News Articles

This section examines the spectrum of topics being considered in newspaper articles on dengue-related topics. This information is important because certain types of articles may carry little information relevant to timing and degrade the correlations just discussed. Our text mining and clustering tool-kit was applied to provide unstructured clustering of the collection of articles from the two newspapers. The dataset includes 3,774 news reports in TOI and HT from 2013 through 2017, with 604, 504, 780, 966, and 920 articles, respectively, for these five years. All the article features, including titles, article texts, authors, and publication dates, were retrieved from the LexisNexis Database. The cluster analysis uses the words present in each article, across all the articles. The algorithm calculates the similarity between each pair of articles. For each article, the system records only the 10% of articles that are most similar to it. Articles with higher similarities are iteratively clustered as one group. The keywords associated with the resulting 14 clusters (Table 1) describe the focus of articles in each of these clusters.
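The first stage of this procedure, computing pairwise similarity and keeping only the most similar 10% of articles for each article, can be sketched as below. This is an assumption-laden illustration rather than the tool-kit itself: similarity is taken to be cosine similarity on raw word counts, the function names are hypothetical, the texts are invented, and the Louvain clustering step is not reproduced.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts over lowercase alphabetic tokens (a simplification)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_similarity_edges(texts, keep_fraction=0.10):
    """For each article, keep edges to its most similar keep_fraction of other articles."""
    vectors = [term_vector(t) for t in texts]
    n = len(vectors)
    keep = max(1, math.ceil(keep_fraction * (n - 1)))
    edges = {}
    for i in range(n):
        scored = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for similarity, j in scored[:keep]:
            if similarity > 0:
                edges[(min(i, j), max(i, j))] = similarity
    return edges


# Toy texts (invented for illustration only).
toy_texts = [
    "dengue cases rise in delhi hospitals",
    "delhi hospitals report dengue cases",
    "dengue vaccine trial results",
    "vaccine trial for dengue begins",
]

edges = top_similarity_edges(toy_texts)
```

The resulting weighted edge list defines a sparse similarity graph. A modularity-based community detection routine, such as the Louvain implementation in NetworkX (`louvain_communities`), could then be run on that graph to obtain clusters; in the toy example the two hospital articles and the two vaccine articles form separate pairs.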

Table 1. Topics of news reports discussing dengue fever in India.

Cluster No.   No. of Articles   Topic
1             783               Reports on numbers of dengue outbreak cases
2             695               Indian government fights against dengue / Government failure
3             384               Delhi Municipal Corporation reports / Dengue and chikungunya
4             357               Dengue diagnoses and virus studies
5             354               Dengue death reports
6             255               Information of mosquitoes
7             244               Reports on suspected dengue deaths
8             167               Reports on positive dengue tests in hospitals
9             163               Reports on dengue and swine flu
10            140               Dengue control and prevention
11            116               Dengue tests / Lab tests
12            50                Dengue vaccines
13            40                Dengue alarms
14            26                A law case of dengue death

As Table 1 shows, the dengue-related articles cover a variety of topics. The results show that approximately 2/3 of all articles are represented by five clusters (Table 1). These top five groups include topics related to dengue outbreaks, government activities, Delhi reports/dengue and chikungunya, dengue diagnoses and virus studies, and dengue death reports. Some other less popular but important topics include mosquitoes, hospital tests, lab studies, vaccines, and dengue alarms.

A temporal analysis was conducted by plotting monthly time series for the number of articles in each of the 14 clusters (Figure 4). It is evident that article numbers in 12 of 14 topics follow quite similar annual trends with annual peaks in late summer. Moreover, article numbers associated with Clusters 1, 3, 4, 6, 8, and 11 peaked in 2015 and 2016. These clusters were concerned with dengue outbreaks, Delhi reports/dengue and chikungunya, dengue diagnoses and virus studies, mosquitoes, hospital dengue tests, and lab tests (Figure 4a, c, d, f, h, and k). This pattern is similar to the yearly trends in dengue cases. The numbers of articles in Clusters 2 and 10, whose topics are government actions and dengue control, behaved somewhat differently. Although articles peaked towards the end of summer, their magnitude kept increasing after 2015. This pattern probably suggests that both the Indian government and citizens realized the seriousness of dengue fever in 2015 and that efforts were ramped up to prevent and to control dengue outbreaks in more recent years. The articles in Clusters 12 and 13, concerned with dengue vaccines and alarms, exhibited higher frequencies after 2015 as well (Figure 4l and m). Only article numbers from Clusters 12 (Figure 4l) and 14 (Figure 4n) failed to show a strong seasonal peak. The two topics are dengue vaccines and a law case of dengue death in 2017, and they are represented by a very small number of articles (Table 1).

This analysis shows that > 98% of the news articles exhibited patterns providing a substantial increase in article numbers in late summer. Given all the other uncertainties, it is appropriate to simply use all the news articles rather than some subset to characterize patterns in dengue outbreaks. The appearance of dengue cases in late summer consistently leads to the production of news articles of all kinds, which eventually wane as the case numbers decline through the fall.
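The proportions quoted in this section follow directly from the cluster sizes in Table 1, as the short calculation below shows. The cluster sizes are taken from Table 1; the variable names are illustrative.

```python
# Cluster sizes copied from Table 1 (cluster number -> number of articles).
cluster_sizes = {
    1: 783, 2: 695, 3: 384, 4: 357, 5: 354, 6: 255, 7: 244,
    8: 167, 9: 163, 10: 140, 11: 116, 12: 50, 13: 40, 14: 26,
}

total_articles = sum(cluster_sizes.values())

# Share of all articles that fall in the five largest clusters.
five_largest = sorted(cluster_sizes.values(), reverse=True)[:5]
top_five_share = sum(five_largest) / total_articles

# Share of articles outside Clusters 12 and 14, the two without a strong seasonal peak.
non_seasonal = cluster_sizes[12] + cluster_sizes[14]
seasonal_share = 1.0 - non_seasonal / total_articles
```

The total matches the 3,774 articles in the dataset, the five largest clusters account for roughly 68% of articles (about two thirds), and the clusters with a seasonal peak account for roughly 98%.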


Figure 4. Time series from 2013-2017 on the weekly number of articles in each of the 14 topical clusters. The cluster numbers correspond to the topics listed in Table 1.

Zika Transmission Trends in Brazil

Unlike dengue fever, zika is a disease that went largely unnoticed until it emerged in Brazil in 2015. Historically, zika was regarded as a rather benign disease that presented with flu-like symptoms. What made it newsworthy was the apparent association with serious congenital anomalies, e.g., microcephaly. Zika virus infection is also a tropical disease associated with Aedes aegypti mosquitoes, which are present in Brazil. Again, we used both international and local newspapers for analysis. Because the Brazilian newspapers are published in Portuguese, Google Translate was used to translate all the news articles into English. Altogether, 574 and 247 news reports were retrieved from international and local news outlets, respectively. Unlike results from the study of dengue fever in India, major international newspapers attached much greater importance to this newly emerging infectious disease. There was extremely limited scientific understanding of a disease that was considered to be a public health disaster.

The zika outbreaks in Brazil occurred in 2015 and 2016. In October 2015, reports from the State of Pernambuco indicated an increase in the number of newborns with microcephaly, i.e., 26 cases in 3 weeks [44]. The greatest number of cases (900 in one week) was reported in week 49 of 2015 [45]. It was soon discovered that the first trimester of pregnancy for women delivering in October coincided with the peak of the outbreak of an exanthematous disease later identified as zika, which occurred from January to March 2015 [44]. However, weekly data on zika cases for all of Brazil were not available for 2015 because zika was not then a reportable disease for the Brazil Ministry of Health [20]. The Brazil Ministry of Health began tracking the disease in 2016. The weekly data for both suspected and confirmed zika cases in Brazil are shown in Figure 5 [20]. Both suspected and confirmed zika cases increased sharply at the beginning of 2016, peaking in week 7 with >20,000 suspected and 10,000 confirmed cases. Subsequently, zika cases declined abruptly through week 24, followed by a long tail through the remainder of 2016 with <300 cases per week. The weekly number of zika cases in 2017 was relatively small at <500 cases per week.

Figure 5. The time variation in the number of weekly reported suspected and confirmed zika cases in Brazil from 2015 to 2017 [20].

To examine the responses of news outlets to the zika outbreak in Brazil, the weekly numbers of zika-related news reports from each news outlet were extracted from the LexisNexis database and plotted versus time in Figure 6. The weekly numbers of news articles in both international and local newspapers tracked the course of the zika outbreaks quite well (Figures 5 and 6). The reports from international newspapers started to increase sharply at the beginning of 2016 and peaked in weeks 5 to 8, similar to the pattern in the number of zika cases [20]. The numbers of reports then decreased gradually with long tails (Figure 6a-e). The number of reports in 2017 was consistently low, which agreed with the number of zika cases in 2017 as well [20].

The numbers of reports from the local Brazilian newspapers behaved similarly. However, their reports were more time sensitive. From Figure 6(f) and (g), it is clear that both JOG and Estadão picked up the problem sooner, with small numbers of articles appearing in weeks 48 to 53 at the end of 2015. These reports were published almost simultaneously with the peak of zika outbreaks in Brazil in week 49 of 2015. Both international and local news outlets performed well in the monitoring of zika, which was a useful example of a serious, newly emerging disease. Compared to the international newspapers, local newspapers were timelier in their reporting.
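One way to make the notion of timeliness concrete is a lagged correlation between weekly case counts and weekly article counts. The sketch below is an illustration only, not the analysis performed in this study: the two weekly series are invented, and the helper names `pearson` and `lagged_correlation` are hypothetical. The lag with the highest correlation indicates whether articles tend to lead or trail the cases.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(cases, articles, max_lag):
    """Correlate cases[t] with articles[t + lag] for lag in -max_lag..max_lag.

    A positive best lag means articles trail cases; a negative one means
    articles lead cases.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c, a = cases[:len(cases) - lag], articles[lag:]
        else:
            c, a = cases[-lag:], articles[:len(articles) + lag]
        result[lag] = pearson(c, a)
    return result

# Hypothetical weekly series (not the paper's data): the article counts
# echo the case counts one week later.
cases = [0, 2, 10, 40, 90, 60, 30, 12, 5, 2, 1, 0]
articles = [0] + [c / 10 for c in cases[:-1]]

corr = lagged_correlation(cases, articles, max_lag=3)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))
```

Applied separately to each outlet, a comparison of the best lags would give a rough, quantitative counterpart to the visual comparison of Figures 5 and 6.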


Figure 6. The time variation in the number of weekly news reports from international and local news outlets from 2015 to 2017. The international outlets included (a) BBC, (b) CNN, (c) NYT, (d) TG, and (e) WP; the local Brazilian newspapers included (f) JOG and (g) Estadão.

Spectrum of Topics of Zika-related Articles in News Outlets

Again, we used a clustering approach to examine the variety expressed in the content of news articles concerned with zika in Brazil. Although the patterns of variation in the numbers of articles versus time were quite similar for the international and local news outlets, it is not clear that the coverage of topics was also similar. The two analyses of topical clustering used 247 news reports from the local Brazilian newspapers (JOG and Estadão) and 574 from the international newspapers (BBC, CNN, NYT, TG, and WP) from 2015 to 2017. The resulting topics are listed in Table 2.

Table 2. Topics of news reports concerned with zika in Brazil.

| International newspapers: No. of articles | International newspapers: Topic | Local newspapers: No. of articles | Local newspapers: Topic |
|---|---|---|---|
| 76 | Zika epidemic / Birth defects | 32 | Warnings and announcements from health organizations |
| 65 | Zika global health emergency | 27 | Zika vaccine |
| 63 | Zika and Rio Olympics | 23 | Mosquitoes with zika |
| 60 | CDC zika reports | 22 | Zika outbreak cases |
| 55 | Symptoms, microcephaly, and brain damage | 21 | Pregnancy with zika |
| 50 | Brazilian government reports | 19 | Zika and Rio Olympics |
| 37 | Zika outbreaks reports from countries globally | 18 | Reports of microcephaly |
| 36 | Abortion and birth control | 16 | Zika epidemiology |
| 31 | Zika hot zones in the US | 16 | Zika virus and tests |
| 30 | Mosquitoes with zika | 13 | Reports from Sao Paulo |
| 24 | Zika transmission pathways | 13 | Zika stories |
| 23 | Zika in Uganda | 10 | International zika emergency |
| 16 | Zika vaccine | 10 | Abortion and birth control |
| 8 | Zika virus from blood supply | 7 | Funding for zika research |
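As a quick consistency check on Table 2, the article counts can be summed and the share of articles held by the six largest topics computed for each group of newspapers. The counts below are transcribed from Table 2; the function name `top_share` is a hypothetical helper introduced only for this sketch.

```python
# Article counts per topic, transcribed from Table 2 in descending order.
international = [76, 65, 63, 60, 55, 50, 37, 36, 31, 30, 24, 23, 16, 8]
local = [32, 27, 23, 22, 21, 19, 18, 16, 16, 13, 13, 10, 10, 7]

def top_share(counts, k):
    """Fraction of all articles that fall in the k largest topics."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(counts)

intl_share = top_share(international, 6)
local_share = top_share(local, 6)
print(sum(international), round(intl_share, 3))
print(sum(local), round(local_share, 3))
```

The column totals reproduce the 574 international and 247 local reports stated in the text, and the six largest topics account for roughly 64% and 58% of the articles, respectively.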


A detailed inspection of Table 2 shows that the topics of articles from the international news outlets and local newspapers are somewhat different. The six most popular topics from the international newspapers related to zika in Brazil are zika epidemic / birth defects, zika global health emergency, zika and Rio Olympics, CDC zika reports, symptoms (including microcephaly and brain damage), and Brazilian government reports. In contrast, the six most popular topics from the local newspapers are warnings from health organizations, zika vaccine, mosquitoes with zika, zika outbreak cases, pregnancy with zika, and zika and Rio Olympics. Seven of the topics covered by the international and local newspapers are similar. These include zika epidemic, global health emergency, zika and Rio Olympics, Brazilian government reports, abortion and birth control, mosquitoes with zika, and zika vaccine. These similar topics are mainly concerned with the epidemiology of zika as well as local reports from Brazil. However, the international newspapers stressed the implications of global transmission of zika, for example, zika outbreaks in countries globally, the historical origins of the zika virus in Uganda, and the global health emergency posed by the virus. Additionally, because of the leadership of the US in medical research, the international newspaper reports commonly presented a US perspective, e.g., CDC reports on zika and zika hot zones in the US. In contrast, the local newspapers in Brazil provided a local perspective with news relevant to Brazil, which included information on zika outbreaks, epidemiology, local solutions, and research. On balance, the local news reports provide more specific information on the emergence of zika in Brazil.

A Comparison of Zika Epidemiology between Brazil and India

The CDC issued a traveler alert on March 9, 2018, providing a world map that shows countries with high zika risk [46]. The alert mentioned India as one of the countries with the highest zika risk. Because zika is a newly emerging disease, it is important to understand whether it presents elsewhere, for example in Asian countries, with manifestations as serious as those in Brazil. A search for articles on zika in India in TOI and HT found no articles before 2016. However, from 2016 to 2018, a search of LexisNexis yielded 145 and 70 news articles from the TOI and HT, respectively. The weekly number of news reports is plotted against time in Figure 7.


Figure 7. The time variation in the number of weekly news reports on zika in India from TOI and HT from 2016 to 2018.

Inspection of Figure 7 shows three major peaks and one minor peak in article numbers concerned with zika in India from 2016 to 2018. By checking the contents of news articles associated with each peak, we found the topics to be (i) zika alerts and reactions due to zika outbreaks in South America (February 2016), (ii) zika tests of the athletes coming back from the Rio Olympics (August 2016), (iii) the first confirmed zika case in India (May-June 2017), and (iv) a zika outbreak in India (October 2018). The first two peaks show the media reaction, which reflects public concern about the sudden zika outbreaks in South America. They do not reflect actual zika outbreaks in India. The first zika case in India was confirmed in May 2017 [47], which caused a spike in news reports. The actual zika outbreak occurred in October 2018, with a report of the number of zika cases reaching 100 in Rajasthan [48]. However, the actual zika outbreaks were less consequential compared to those in Brazil. A review of the contents of articles concerned with zika in India found no reports of microcephaly in India attributable to zika virus. News reports essentially characterized zika disease in India as benign, similar to common flu. Clearly, then, reporting on zika was quite different in India than in Brazil. This very different treatment in news articles may simply reflect differences in zika strains in the different countries. At this time, there is simply not sufficient information to provide additional context for the news reporting.
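The peaks discussed above were identified by inspection of Figure 7. A simple automated counterpart is sketched below under stated assumptions: the weekly counts are invented, the thresholds are arbitrary, and `find_peaks` is a hypothetical helper rather than part of the study's workflow. As the India example shows, any peak found this way still has to be checked against article content, because a spike in reporting need not correspond to a local outbreak.

```python
def find_peaks(counts, threshold):
    """Return indices of strict local maxima with a value of at least `threshold`.

    Series endpoints are compared against their single neighbor only.
    """
    peaks = []
    for i, value in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if value >= threshold and value > left and value > right:
            peaks.append(i)
    return peaks

# Hypothetical weekly article counts (not the paper's data).
weekly = [0, 1, 9, 2, 0, 0, 7, 1, 0, 3, 0, 12, 2, 0]

major = find_peaks(weekly, threshold=5)
all_peaks = find_peaks(weekly, threshold=3)
minor = [i for i in all_peaks if i not in major]
print(major, minor)
```

Lowering the threshold separates the smaller spike from the larger ones, which mirrors the distinction between major and minor peaks drawn in the text.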

Discussion

Selection of Newspapers

Our analyses of dengue fever in India and zika in Brazil suggest that news articles related to diseases can be successfully applied as proxies for monitoring disease outbreaks. However, caution is warranted, as was evident when the number of news articles related to zika in India provided no information on zika outbreaks. Thus, for certain diseases in certain countries, there are good prospects for using news articles as a proxy for the number of disease cases. However, this approach is by no means a panacea. It works best with serious diseases that come and go interannually or intraannually. With diseases that are with us all the time (e.g., cancers and heart disease), newsworthiness is not reflected by disease incidence but by newly emerging treatments and research leading to fundamental breakthroughs in understanding.

An approach that may prove useful in one country may not be appropriate for other countries. For example, zika in India, without any clear linkage to microcephaly, was not particularly newsworthy compared to zika in Brazil. Thus, news articles were not a particularly useful proxy of case numbers in India. They were, however, amenable to providing a snapshot of public perceptions of the disease through cluster analysis. Techniques for natural language processing are evolving quickly, both in terms of recognizing themes and new graphical approaches. Although not the main focus of our study, we think it important that readers realize the opportunities of using news articles to track the evolution of ideas with respect to diseases. Our analyses of zika in India provide a simple example. One of our other studies (Schwartz et al., 2018 [38]) shows the power of text mining and clustering approaches in tracking the variation in research themes across 60 years.

Effects of Newsworthiness

Our study has also begun to probe the complexity of what is newsworthy in different kinds of news settings. The goal here would be to provide researchers with guidance that would optimize their use of news information. Our results here illustrate the broad variability in news focus associated with disease-related news reports. The differences are especially pronounced between popular international media (e.g., NYT, CNN) and national or local newspapers (e.g., TOI and HT in India). There appears to be a large barrier that needs to be overcome for disease reports to interest the international press. For example, the increased prevalence of dengue in India in recent years was relatively unremarkable from an international news perspective. Dengue after all is a rather predictable member of the family of neglected tropical diseases. The contrast with the treatment of zika in Brazil by the international news community is instructive. The main driver for the extensive international coverage appears to be the potential for horrific health impacts that include fetal congenital problems and neural issues like Guillain-Barré syndrome. Although serious impacts had occurred previously, it was with Brazil and the Olympics that zika emerged onto the world’s stage. Thus, diseases that are newly emerged, poorly understood, potentially dangerous, and potentially a global threat are especially newsworthy.

However, with more common diseases that are not particularly deadly, there is a need to discover news outlets that are geographically more associated with the particular disease. For example, Delhi, the home of the TOI and HT, is a locus of dengue impacts in India. This is a topic we address in an upcoming follow-on paper. Thus, in terms of local or national newspapers, it is important to discover news sources that are associated geographically with places where the disease of interest is a significant local problem. From this aspect, it is usually easier to track the detailed epidemiology of diseases from locally reported news articles. We think that our progress in developing news articles as a proxy for systems of surveillance came because dengue is a significant problem in Delhi (home of TOI and HT) and nearby states.

Limitations

Certain other factors need to be considered when applying news articles to model disease outbreaks (i.e., statistical models for prediction of disease cases). First, both our project and, for example, Google Flu Trends [12] suggest that prediction models need to be validated continuously over the years to account for the uncertainties and variations in disease outbreaks, as well as in news reports. Second, for various unknown reasons, online sources show an increase in the number of reports (e.g., from India). Here, we found that LexisNexis reporting of stories associated with India expanded with time. When news reports are considered as proxies, the numbers of articles need to be continuously monitored and recalibrated as practical.

Experience from this study shows the tremendous advantages available with the consistent reporting structure provided by LexisNexis. The most important in this respect is the availability of large article coverage. Also important is the accuracy in searching for relevant articles. However, to a large extent, serious researchers are really at the mercy of the information provider in providing a consistent product. Towards the end of our study, the LexisNexis database was transitioned to Nexis Uni [49]. Nexis Uni features modernized search capabilities and other improvements. Our preliminary analyses with Nexis Uni found an apparent reduction in the number of articles (as compared with LexisNexis). The result is that time series for dengue in India are less accurate because of a loss in coverage relative to LexisNexis’ results. This outcome may be a temporary technical issue or a permanent reduction in coverage.

Conclusions

The paper describes the potential usefulness of newspaper articles as proxies to monitor outbreaks of certain infectious diseases. These data are useful in better understanding the epidemiology of complex diseases like dengue and zika. Because of their availability online, it is usually easier to access information from newspapers than from formal medical reports. In addition, the disease surveillance systems created using newspapers are potentially useful tools to help the development of medical studies, especially in developing countries and regions with relatively poor medical infrastructure and records. In the case of dengue in India, the study found a strong correlation between numbers of cases and numbers of relevant news reports. Not surprisingly, important national newspapers provide a better source of information on diseases than key international outlets. From a disease perspective, our approach confirmed the temporal associations between mosquito-borne infectious diseases (i.e., dengue fever) and the waning monsoon rainfalls of late summer in India.

We consider our progress in the creative use of news articles to be unique to the special circumstances of dengue in India. This particular disease impacts the lives of many people in India. Case numbers have been growing in recent years but with large temporal variability, all of which creates continuing reader interest in news articles. Key national newspapers (TOI and HT) have been diligent in dengue reporting through a relatively large number of English-language articles each year. Most other countries are not so fortunate to have these large numbers of articles.

Our experience with zika shows the importance of newsworthiness when news media select topics to report. For Brazil, news articles provided useful information on the timing of the zika epidemic. Because zika was a newly emerging disease, international news outlets put much more focus on zika in Brazil as compared, for example, to reports of dengue in India. In India, the zika reporting appeared to consist of informational-style articles, given the apparent absence of the most serious manifestations of zika infections.
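The recalibration concern raised under Limitations, namely that database coverage for a country can expand over time, can be illustrated with a simple normalization. This is an assumption made for illustration and not a procedure reported in this study: disease-related article counts are divided by the total number of articles indexed for the same outlet and year. All numbers and the helper name `normalized_share` are hypothetical.

```python
def normalized_share(disease_counts, total_counts):
    """Disease-related articles per 1,000 indexed articles, by year.

    Years with no (or zero) total count are skipped to avoid division by zero.
    """
    return {
        year: 1000 * disease_counts[year] / total_counts[year]
        for year in disease_counts
        if total_counts.get(year)
    }

# Hypothetical yearly counts (not the paper's data): raw dengue article
# numbers double, but so does the total number of indexed articles.
dengue_articles = {2013: 200, 2014: 300, 2015: 400}
all_articles = {2013: 50_000, 2014: 75_000, 2015: 100_000}

shares = normalized_share(dengue_articles, all_articles)
print(shares)
```

In this invented case the normalized rate is flat even though the raw counts rise, which is the kind of coverage effect that would need to be separated from a genuine increase in disease reporting.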


References

[1]

M. Moore, P. Gould, B.S. Keary, Global urbanization and impact on health, Int. J. Hyg. Environ. Health. 206 (2003) 269–278. doi:10.1078/1438-4639-00223.

[2]

D.J. Gubler, Dengue, Urbanization and Globalization: The Unholy Trinity of the 21st Century, Trop. Med. Health. 39 (2011) S3–S11. doi:10.2149/tmh.2011-S05.

[3]

N.E.A. Murray, M.B. Quam, A. Wilder-Smith, Epidemiology of dengue: Past, present and future prospects, Clin. Epidemiol. 5 (2013) 299–309. doi:10.2147/CLEP.S34440.

[4]

A.R. Plourde, E.M. Bloch, A literature review of zika virus, Emerg. Infect. Dis. 22 (2016) 1185–1192. doi:10.3201/eid2207.151990.

[5]

A. Gulland, Zika virus is a global public health emergency, declares WHO, BMJ. 352 (2016) i657. doi:10.1136/bmj.i657.

[6]

J. Ikejezie, C.N. Shapiro, J. Kim, M. Chiu, M. Almiron, C. Ugarte, M.A. Espinal, S. Aldighieri, Zika Virus Transmission—Region of the Americas, May 15, 2015–December 15, 2016, Am. J. Transplant. 17 (2017) 1681–1686. doi:10.1111/ajt.14333.

[7]

C.D. Corley, D.J. Cook, A.R. Mikler, K.P. Singh, Text and structural data mining of influenza mentions in web and social media, Int. J. Environ. Res. Public Health. 7 (2010) 596–615. doi:10.3390/ijerph7020596.

[8]

T.W. Grein, K.B.O. Kamara, G. Rodier, A.J. Plant, P. Bovier, M.J. Ryan, T. Ohyama, D.L. Heymann, Rumors of disease in the global village: Outbreak verification, Emerg. Infect. Dis. 6 (2000) 97–102. doi:10.3201/eid0602.000201.

[9]

J.S. Brownstein, C.C. Freifeld, B.Y. Reis, K.D. Mandl, Surveillance sans frontières: Internet-based emerging infectious disease intelligence and the HealthMap project, PLoS Med. 5 (2008) 1019–1024. doi:10.1371/journal.pmed.0050151.

[10]

A. Villanes, E. Griffiths, M. Rappa, C.G. Healey, Dengue fever surveillance in India using text mining in public media, Am. J. Trop. Med. Hyg. 98 (2018) 181–191. doi:10.4269/ajtmh.17-0253.


[11]

A. Rortais, J. Belyaeva, M. Gemo, E. van der Goot, J.P. Linge, MedISys: An early-warning system for the detection of (re-)emerging food- and feed-borne hazards, Food Res. Int. 43 (2010) 1553–1556. doi:10.1016/j.foodres.2010.04.009.

[12]

D. Butler, When Google got flu wrong, Nature. 494 (2013) 155–156. doi:10.1038/494155a.

[13]

K. Lee, A. Agrawal, A. Choudhary, Real-Time disease surveillance using twitter data: Demonstration on flu and cancer, in: Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., ACM Press, New York, New York, USA, 2013: pp. 1474–1477. doi:10.1145/2487575.2487709.

[14]

K. Ghazinour, M. Sokolova, S. Matwin, Detecting health-related privacy leaks in social networks using text mining tools, in: Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), Springer, Berlin, Heidelberg, 2013: pp. 25–39. doi:10.1007/978-3-642-38457-8_3.

[15]

N. Gupta, S. Srivastava, A. Jain, U.C. Chaturvedi, Dengue in India, Indian J. Med. Res. 136 (2012) 373–390. doi:10.1016/S0140-6736(07)61687-0.

[16]

D.S. Shepard, Y.A. Halasa, B.K. Tyagi, S.V. Adhish, D. Nandan, K.S. Karthiga, V. Chellaswamy, M. Gaba, N.K. Arora, Economic and disease burden of dengue Illness in India, Am. J. Trop. Med. Hyg. 91 (2014) 1235–1242. doi:10.4269/ajtmh.14-0002.

[17]

S.A. Rasmussen, D.J. Jamieson, M.A. Honein, L.R. Petersen, Zika Virus and Birth Defects — Reviewing the Evidence for Causality, N. Engl. J. Med. 374 (2016) 1981–1987. doi:10.1056/NEJMsr1604338.

[18]

C. Abrams, This Map Shows Where India’s Huge Diaspora Lives, Wall Str. J. (2016). https://blogs.wsj.com/indiarealtime/2016/01/19/india-has-worlds-biggestdiaspora-and-this-map-shows-where-they-are/ (accessed January 22, 2019).

[19]

World Health Organization, WHO | Dengue, (2018). http://www.who.int/denguecontrol/en/ (accessed November 26, 2018).

[20]

Pan American Health Organization / World Health Organization, Zika-Epidemiological Report Brazil, 2017. doi:10.1126/science.aaf5036.

[21]

LEXIS-NEXIS® Academic, (2018). https://www.lexisnexis.com/communities/academic/w/wiki/30.lexisnexisacademic-general-information.aspx (accessed November 21, 2018).

[22]

Factiva, (2018). https://www.dowjones.com/products/factiva/ (accessed December 31, 2018).

[23]

InfoTrac Newsstand, (2018). https://www.gale.com/c/infotrac-newsstand (accessed December 31, 2018).

[24]

Newspapers.com, (2018). https://go.newspapers.com/welcome?xid=767&gclid=Cj0KCQiAmafhBRDUARIsACOKERO4PtO4FBuYZuILilnBvTeBoKtWS6I2txdBOkPPRFWjdAyB7rwonB4aAkbiEALw_wcB (accessed December 31, 2018).

[25]

R.L. Toblin, L.J. Paulozzi, J. Gilchrist, P.J. Russell, Unintentional strangulation deaths from the “Choking Game” among youths aged 6-19 years - United States, 1995-2007, J. Safety Res. 39 (2008) 445–448. doi:10.1016/j.jsr.2008.06.002.

[26]

D.A. Weaver, B. Bimber, Finding news stories: A comparison of searches using LexisNexis and google news, Journal. Mass Commun. Q. 85 (2008) 515–530. doi:10.1177/107769900808500303.

[27]

S.L. Pruitt, P.D. Mullen, Contraception or abortion? Inaccurate descriptions of emergency contraception in newspaper articles, 1992-2002, Contraception. 71 (2005) 14–21. doi:10.1016/j.contraception.2004.07.012.

[28]

X. Ji, P.-Y. Yen, Using MEDLINE Elemental Similarity to Assist in the Article Screening Process for Systematic Reviews, JMIR Med. Informatics. 3 (2015) e28. doi:10.2196/medinform.3982.

[29]

X. Ji, A. Ritter, P.Y. Yen, Using ontology-based semantic similarity to facilitate the article screening process for systematic reviews, J. Biomed. Inform. 69 (2017) 33–42. doi:10.1016/j.jbi.2017.03.007.

[30]

X. Ji, R. Machiraju, A. Ritter, P.-Y. Yen, Visualizing article similarities via sparsified article network and map projection for systematic reviews, in: Stud. Health Technol. Inform., 2017: pp. 422–426. doi:10.3233/978-1-61499-830-3-422.

[31]

X. Ji, R. Machiraju, A. Ritter, P.-Y. Yen, Examining the Distribution, Modularity, and Community Structure in Article Networks for Systematic Reviews., AMIA ... Annu. Symp. Proceedings. AMIA Symp. 2015 (2015) 1927–1936. http://www.ncbi.nlm.nih.gov/pubmed/26958292.

[32]

X. Ji, H.W. Shen, A. Ritter, R. MacHiraju, P.Y. Yen, Visual Exploration of Neural Document Embedding in Information Retrieval: Semantics and Feature Selection, IEEE Trans. Vis. Comput. Graph. 25 (2019) 2181–2192. doi:10.1109/TVCG.2019.2903946.

[33]

A. Albin, X. Ji, T.B. Borlawsky, Z. Ye, S. Lin, P.R. Payne, K. Huang, Y. Xiang, Enabling online studies of conceptual relationships between medical terms: Developing an efficient web platform, J. Med. Internet Res. 16 (2014) e23. doi:10.2196/medinform.3387.

[34]

C. Zhang, C. Ré, GeoDeepDive: Statistical Inference using Familiar Data-Processing Languages, Sigmod. (2013) 993–996. doi:10.1145/2463676.2463680.

[35]

L. Hirschman, J.C. Park, J. Tsujii, L. Wong, C.H. Wu, Accomplishments and challenges in literature data mining for biology, Bioinformatics. 18 (2002) 1553–1561. doi:10.1093/bioinformatics/18.12.1553.

[36]

E.W.T. Ngai, Y. Hu, Y.H. Wong, Y. Chen, X. Sun, The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature, Decis. Support Syst. 50 (2011) 559–569. doi:10.1016/j.dss.2010.08.006.

[37]

E.W.T. Ngai, L. Xiu, D.C.K. Chau, Application of data mining techniques in customer relationship management: A literature review and classification, Expert Syst. Appl. 36 (2009) 2592–2602. doi:10.1016/j.eswa.2008.02.021.

[38]

F.W. Schwartz, Y. Zhang, M. Ibaraki, What’s Next? Now that the Boom in Contaminant Hydrogeology has Busted, Groundwater. 57 (2019) 205–215. doi:10.1111/gwat.12851.

[39]

Y. Zhang, X. Ji, M. Ibaraki, F.W. Schwartz, Mining Information from Collections of Papers: Illustrative Analysis of Groundwater and Disease, Groundwater. 56 (2018) 993–1001. doi:10.1111/gwat.12804.


[40]

X. Ji, R. Machiraju, A. Ritter, P.-Y. Yen, Visualizing Article Similarities via Sparsified Article Network and Map Projection for Systematic Reviews, MedInfo 2017. (2017) 2–6.

[41]

V.H. Ratageri, T.A. Shepur, P.K. Wari, S.C. Chavan, I.B. Mujahid, P.N. Yergolkar, Clinical profile and outcome of dengue fever cases, Indian J. Pediatr. 72 (2005) 705–706. doi:10.1007/BF02724083.

[42]

S.R. Mutheneni, A.P. Morse, C. Caminade, S.M. Upadhyayula, Dengue burden in India: Recent trends and importance of climatic parameters, Emerg. Microbes Infect. 6 (2017) e70. doi:10.1038/emi.2017.57.

[43]

M. Murhekar, V. Joshua, K. Kanagasabai, V. Shete, M. Ravi, R. Ramachandran, R. Sabarinathan, B. Kirubakaran, N. Gupta, S. Mehendale, Epidemiology of dengue fever in India, based on laboratory surveillance data, 2014–2017, Int. J. Infect. Dis. (2019). doi:10.1016/j.ijid.2019.01.004.

[44]

M.G. Teixeira, M. Da Conceição N Costa, W.K. De Oliveira, M.L. Nunes, L.C. Rodrigues, The epidemic of Zika virus-related microcephaly in Brazil: Detection, control, etiology, and future scenarios, Am. J. Public Health. 106 (2016) 601–605. doi:10.2105/AJPH.2016.303113.

[45]

C. Barcellos, D.R. Xavier, A.L. Pavão, C.S. Boccolini, M.F. Pina, M. Pedroso, D. Romero, A.R. Romão, Increased hospitalizations for neuropathies as indicators of Zika virus infection, according to health information system data, Brazil, Emerg. Infect. Dis. 22 (2016) 1894–1899. doi:10.3201/eid2211.160901.

[46]

World Map of Areas with Risk of Zika, CDC. (2018). https://wwwnc.cdc.gov/travel/page/world-map-areas-with-zika (accessed January 16, 2019).

[47]

C.G.P. Doss, R. Siva, B.P. Christopher, C. Chakraborty, H. Zhu, Zika: How safe is India?, Infect. Dis. Poverty. 6 (2017) 37. doi:10.1186/s40249-016-0234-6.

[48]

In Rajasthan, number of Zika-hit 100 | India News - Times of India, The Times of India. (2018). https://timesofindia.indiatimes.com/india/in-rajasthan-number-ofzika-hit-100/articleshow/66278777.cms (accessed January 16, 2019).

[49]

Nexis Uni, (2019). https://www.lexisnexis.com/en-us/support/nexis-uni/default.page (accessed March 5, 2019).

Highlights

• News reports are verified to provide near real-time indications of outbreaks
• The numbers of dengue cases are strongly correlated with those of news reports
• Local newspapers provide better sources of information for dengue fever in India
• News reports provide useful information on the timing of zika outbreaks in Brazil


Yiding Zhang: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Visualization, Writing - Original Draft. Motomu Ibaraki: Conceptualization, Methodology, Writing - Review & Editing, Supervision. Frank Schwartz: Conceptualization, Methodology, Writing - Review & Editing, Supervision.

Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. ☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
