16 Data for Epidemiology and Public Health, and Big Data 1
16.1. Introduction
The approaches used by epidemiologists are diverse: they range from “field studies” for modeling and healthcare monitoring, to methods developed for researching and combating the emergence of diseases. Their analytical tools center on biostatistics, used to objectify phenomena studied in well-defined populations.
If we had to pinpoint the birth of modern epidemiology, we would propose the end of the 1940s. The case–control study on the association between tobacco and lung cancer carried out by Doll and Hill [DOL 50] in 1948 and the Framingham cohort study of cardiovascular risk factors begun the same year [DAW 51] mark the development of this discipline and illustrate the methodologies (case–control studies, cohort studies) that shape the paradigms of research in this field.
It should be noted that epidemiologists focused on having data collected specifically for the research protocols that they were developing. In doing so, they sought the quality of data that was necessary, given the sample sizes and cohorts studied, for reaching statistically significant conclusions. Reuse of data already available, for example in patient records, was rare.
1 The questions posed by data processing for epidemiology and public health are often similar to those discussed in the chapters on clinical research (Chapter 18) and bioinformatics data (Chapter 17). For the sake of clarity, we address these questions in different chapters, although the problems are of the same nature and the solutions are isomorphic. In order to avoid too much repetition, the issue of big data is discussed here without going into the content of the other chapters.
Health Data Processing
The researchers’ point of view on this issue has now changed. They have recognized the resource that these data represent and that it should be exploited. The reuse of the data available in the various information systems, including administrative systems, is therefore desirable. Without calling into question the methodological foundations mentioned above, the data currently available open up other avenues of analysis, for example to generate hypotheses or detect signals for vigilance.
In public health, health monitoring is aimed particularly at the early detection of epidemics in order to alert authorities to the risk identified, investigate its cause and put into place appropriate measures to limit its effects. The collection of surveillance data is based on the reporting of events produced by different sources (doctors, hospitals and healthcare establishments, etc.). The quality control of these data is a constant concern for epidemiologists. Various systems have been developed to carry out this monitoring. We will briefly present the Système de Surveillance Sanitaire des Urgences et des Décès (SurSaUD), the system set up in France for the monitoring of emergencies and deaths following the heat wave of 2003.
Similarly, in the fields of both communicable and non-communicable diseases (neoplastic and cardiovascular diseases, behavioral disorders, etc.), prevention has taken on great importance and can now be considered a priority. This explains the considerable development of modern epidemiology.
This evolution of public health is currently colliding with the development of information systems, sensors and connected objects for health, among other things. As a result, the data known as “big data” (also known as massive data or mega-data) have exploded into the field of public health and, there as elsewhere, are contributing to changing approaches and developing new concepts.
Observational research developed using these tools complements experimental research by providing large and varied population samples. It can also contribute to the design of new experimental studies and the standardization of the results. These aspects will be discussed in the second part of this chapter.
16.2. Multi-source monitoring systems
16.2.1. Monitoring devices
A monitoring device is a system for collecting information that allows adverse reactions related to the use of products or equipment to be detected. For example, this includes pharmacovigilance, hemovigilance and materiovigilance.
In order for a detection strategy to be effective and enable early action to be taken, an appropriate public health infrastructure must be established and effective monitoring and surveillance systems must be in place, whether they are structured monitoring systems or the reporting of unusual events by human and animal healthcare professionals or, more generally, by the world of research.
Traditionally, monitoring devices have focused on ad hoc information systems based on a dedicated collection of reported information. More recently, this outlook has been revised and enhanced. Systems are now multi-source and often based on reusing data.
16.2.2. The French Système de Surveillance Sanitaire des Urgences et des Décès (healthcare monitoring system for emergencies and deaths: SurSaUD)
The overall organization of the system is shown in Figure 16.1. The system is fed by four sources of information:
– Hospital emergency services in the OSCOUR network (Organisation de la surveillance coordonnée des urgences – Organization for Coordinated Emergency Monitoring).
– Associations of city emergency doctors. These networks of independent practitioners participate in continuous outpatient care in collaboration with the Service d’Aide Médicale Urgente (Emergency Medical Service – SAMU).
– Mortality data transmitted by INSEE. Every day, INSEE transmits to the InVS 2 demographic data on deaths registered the previous day by the municipalities in the network set up by the Institute.
– Electronic certifications of deaths. Doctors can certify deaths electronically using a secure application. This system allows data from the medical part of the certificate to be transmitted to the InVS and ARSs within 30 minutes of its validation by the doctor. At present, only a small number of deaths are collected in this way.
The data are individual, and are extracted from the business software of the various professionals and transmitted to the InVS. They correspond to the activity of these professionals on the day before transmission.
2 InVS: Institut de Veille Sanitaire (Institute for Healthcare Monitoring), part of the Santé Publique France (French Public Health) agency.
Figure 16.1. Architecture of the French monitoring system SurSaUD [CAS 14]. For a color version of this figure, see www.iste.co.uk/fieschi/health.zip
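Daily counts of the kind transmitted to such a system lend themselves to simple automated screening. The following is a minimal sketch, not a description of how SurSaUD itself works: it assumes a list of daily emergency-visit counts for one syndrome, and the counts, the seven-day baseline and the three-standard-deviation threshold are all illustrative assumptions. A day is flagged when its count clearly exceeds the recent baseline.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_counts, baseline_days=7, z_threshold=3.0):
    """Return the indices of days whose count exceeds the recent baseline.

    The baseline for day i is the preceding `baseline_days` days; a day is
    flagged when its count is above mean + z_threshold * standard deviation.
    All parameter values are illustrative assumptions.
    """
    alerts = []
    for i in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[i - baseline_days:i]
        mu = mean(baseline)
        sd = stdev(baseline)
        # Floor the spread at 1 so that a perfectly flat baseline
        # does not raise an alert on a trivial increase.
        threshold = mu + z_threshold * max(sd, 1.0)
        if daily_counts[i] > threshold:
            alerts.append(i)
    return alerts


# Hypothetical daily visit counts for one syndrome (invented data).
counts = [12, 14, 11, 13, 12, 15, 13, 14, 12, 40, 13]
alerts = flag_unusual_days(counts)
print(alerts)
```

With these invented counts, only the day with 40 visits stands out from its preceding week. Operational systems generally rely on more elaborate statistical methods that account for seasonality and reporting delays.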
The system proposes the routine monitoring of a number of pathologies. These are summarized in Table 16.1.
16.3. The challenges and opportunities of big data for public health
16.3.1. Big data call into question the traditional principles of data processing
Big data is a collection of technologies and processes designed to manage very large volumes of data and analyze them very quickly. Their emergence calls certain fundamental principles of data processing into question. Big data is based on a specific infrastructure for both storage and the “processor” resources required for processing. These are characterized by the four Vs, which constitute challenges for processing these data:
– V for volume. The quantity of data is enormous. This is particularly due to the connected objects and different sensors that have been used in the field for a number of years.
– V for velocity (speed). Velocity refers both to the speed at which the data are generated or captured and to the capacity for responsiveness in the data processing that needs to be put into place.
– V for variety (heterogeneity of data and sources). The data comes in different formats (digital, text, image, etc.) and can come from different providers/sources (patient records, environmental data, mHealth sensors and applications, social media, etc.).
– V for value (of the data). The (scientific, economic, social, etc.) value of the data most often lies in the original idea to (re)use and/or interconnect them in order to obtain new knowledge.
Category | Syndromes routinely monitored | Periodicity
Deaths | Mortality monitoring | Year round
Infectious diseases | Flu, flu-like symptoms, bronchiolitis, viral meningitis | October – March
Infectious diseases | Gastro-enteritis, measles | Year round
Other pathologies | Asthma, discomfort, fevers, allergies | Summer, spring, fall
Other pathologies | Monitoring of various cardiological, neurological, infectious, gastroenterological, traumatological, urological, psychiatric and pneumological pathologies | Year round
Health events | Impact of low temperatures | October – March
Health events | Impact of heatwaves | June – August
Health events | Carbon monoxide poisoning | Year round
Health events | Impact of exceptional events (storms, floods, etc.) | From the occurrence of the event; length varies in accordance with its scale
Other specific events | Impact of an industrial accident (monitoring of respiratory pathologies, etc.), large gatherings of people | From the occurrence of the event; length varies in accordance with its scale
Table 16.1. Pathologies followed as routine and in exceptional circumstances in the SurSaUD system [CAS 14]
A fifth V, for “veracity”, can also help us to understand the new big data paradigm. With big data, quality is generally not controlled as it can be with a traditional data set. Missing values are more frequent. The precision of the data and their fidelity to the measured fact can be imperfect, and the analysis methods used with big data must handle these imperfections.
It should be noted that this new approach to data processing requires the special skills of a data scientist manager. This person masters the methods of data analysis, semantic coherence and interoperability, as well as information processing technologies that enable informed technological choices to be made. The dimensions of this competence currently represent the core of the field of medical informatics.
When discussing the areas mentioned above, technologies for data retention, data storage and high-performance data mining and analysis (big data analytics) will not be addressed. We will simply state that the volume of big data allows us to move away from traditional methods of statistical analysis. It offers a comprehensiveness that challenges traditional sampling methods.
Within the framework of a systemic approach to processing information, our analysis is focused on issues of semantic consistency, business processes and interoperability:
– Epidemiological research was almost exclusively based on a hypothesis-driven approach, with a methodology (cohort studies, case–control studies) and specific criteria for evaluating the quality of the research. As a result, epidemiologists have traditionally had little interest in data collected for other purposes (hospital information systems, computerized patient records, etc.). With big data, we move on to a “data-driven” approach that involves a significant change, a difficulty in recognizing the scientific quality of this approach and sometimes a lack of perception of the stakes in national institutions (Figure 16.2).
– Big data illustrates, through the data reuse on which it is often built, the usefulness of information systems which, owing to their large quantities of data, are of interest in the search for new hypotheses or new knowledge. This interest requires semantic coherence and permeability between clinical, medico-social and medico-economic information systems, and so on. The search for working hypotheses involving health, environmental and socio-economic factors presupposes the linking of health and non-health data. These data come from many structured or unstructured databases, which use a variety of measurement and collection methods and contain missing data. They amplify the problems, well identified in the past, encountered in the development of data warehouses.
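Before sources of this kind are linked, it can help to measure how incomplete each one is. The following sketch computes the share of missing values per field; the field names and records are hypothetical, and treating `None`, an empty string and an absent key as "missing" is an assumption made for the example.

```python
def missing_rate_by_field(records, fields):
    """Return, for each field, the fraction of records where it is missing.

    A value counts as missing when the key is absent, or when the value
    is None or an empty string (an assumption made for this sketch).
    """
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


# Invented records, as they might look after pooling two sources.
records = [
    {"age": 54, "sex": "F", "smoker": None},
    {"age": None, "sex": "M", "smoker": "yes"},
    {"age": 61, "sex": "", "smoker": None},
    {"age": 47, "sex": "F"},  # "smoker" was never collected by this source
]
rates = missing_rate_by_field(records, ["age", "sex", "smoker"])
print(rates)
```

A profile like this only describes the gaps; whether to exclude, impute or model the missing values remains a methodological choice that depends on the study.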
Figure 16.2. The enriched paradigm. The upper part of the diagram illustrates the classical methodology (Hypothesis, Protocol, Data, Evidence). The lower part, carried by big data, enriches the traditional model (Data, Hypothesis, Evidence)
16.3.2. Big data pave the way for tailor-made medicine
Although it does not fall within the scope of epidemiology in the traditional sense of the term owing to the personal nature of its application, at this point we should mention the potential of big data. Among the multiple opportunities that they offer are applications focused on the use of large volumes of data, often biological (genome, omic), with individual aims 3. Predictive medicine, individual risk assessment and diagnostic decisions, forms of medicine popularized under the terms “personalized medicine” or “precision medicine”, are currently central to these applications. In addition, they have an economic and clinical value for the development of integrated care provision, traceability in materiovigilance, the monitoring of chronic diseases and so on.
3 See the chapter on bioinformatics data (Chapter 17).
16.3.3. Improving knowledge and development
The example of using data collected with mobile devices for development purposes shows another way of applying big data in the field of public health. Through their networks and antennae, cell phone operators collect billions of data items on calls and SMS (call times and duration, geolocation through antenna changes, periods of inactivity, etc.). These data, which can be cross-checked with payment data when mobile operators also provide financial services, as is the case in Africa, constitute a valuable source of knowledge on the movements and behaviors of populations, and their living conditions. Their use for humanitarian purposes (to prevent epidemics or manage actions), understanding epidemics, analyzing the
distribution of populations, and understanding migration trends is at the root of much of the work carried out. The United Nations agencies working in the fields of development (UNDP, FAO) and health (WHO) are closely monitoring the development of this work.
16.3.4. Processing more data to accelerate research and development
The recruitment of patients for a clinical trial benefits greatly from big data methods. By cross-checking and analyzing a large amount of data, it is possible to quickly identify patients eligible for the proposed study. The usefulness to pharmaceutical research is also clear. Recording and analyzing all known data on a molecule can determine whether it is likely to act on other pathologies, or simulate its action in a given context.
16.3.5. Reducing healthcare expenditure
According to a study by McKinsey, big data could contribute more than $300 billion a year to the US healthcare system. For French experts, the economic benefit of big data is indisputable, as the massive processing of data can make it possible to eliminate unnecessary expenditure (e.g. by reducing hospital readmission rates).
16.4. Epidemiology and big data
Whether observational studies (cohorts, health dashboard, real-time monitoring), experimental studies (therapeutic trials) or theoretical studies (epidemic models), all types of epidemiological research are affected by big data. They differ from traditional decision support tools in management and public health, allowing real-time data collection and analysis. Three examples illustrate their impact:
– The integration of collective and individual journeys (aircraft, public transport, etc.) into the modeling of the development of epidemics.
– The 2011 diarrhea epidemic in Germany [MAN 14], where the modeling of food distribution channels by product category identified the source of the epidemic.
– The prediction of epidemic locations and identification of risk areas for Ebola [PIG 14] using satellite and meteorological data.
Within this new wave of research, we can single out the “Precision Medicine” initiative launched by President Obama in 2015. This aims to form a cohort of
one million American volunteers for whom the computerized medical record will be enriched with genetic, biological and dietary data. This initiative (the “All of Us” project) aims to launch a new model for developing scientific knowledge, involving participants and empowering stakeholders for data sharing and privacy protection.
16.5. Multiple and heterogeneous data sources
Big data provides new data, increasingly often in real time, which is a considerable advantage for some applications. The tools for aggregating and visualizing these multiple data sources have led to a new generation of computer applications.
Social networks are a source of sociologically interesting data and are used by patients to share their healthcare experiences, especially when using medications. They are enjoying increased, even widespread, usage and provide unstructured data that raise methodological questions. Social networks are a new medium for direct patient expression, and are potentially useful in different areas of public health such as the detection of adverse drug reactions; see [SAR 15] for an example.
We also have sociological data with a fine degree of granularity, mobility and contact data, geographic data and so on. These allow analyses that were out of reach before big data became available and, owing to the multiplicity of sources, make it possible to develop more “realistic” models (the contribution of artificial intelligence methods is extremely promising). For example, Twitter data could be reused to assess the mood swings of thousands of people around the world, throughout the day and seasons [GOL 11]. The crucial challenge today is to support these techniques and integrate them into a coherent healthcare system that complements the techniques as they open up new fields of investigation, data and knowledge.
16.6. The contribution of big data and e-health to prevention, monitoring and health vigilance 4
4 This section takes large extracts from the following article: Fieschi M., Robin J.Y., “Esanté : nouvelles perspectives pour la sécurité sanitaire ?” (E-health: new possibilities for healthcare security?), Informatique et Santé, vol. 20, pp. 185–196, Lavoisier, 2017.
16.6.1. Monitoring and vigilance
WHO identifies three types of preventive action:
– primary prevention, which aims to prevent the appearance of the disease by acting on its causes;
– secondary prevention, which aims to detect the disease, or the lesion that precedes it, at a stage where the effective management of the unwell individuals can be useful;
– tertiary prevention, which aims to reduce the prevalence of recurrence and consequent disabilities.
Quaternary prevention can be added to this list. This is characterized by all healthcare activities designed to mitigate or avoid the consequences of the unnecessary, excessive or even iatrogenic interventions on the part of the healthcare system.
The efficacy and safety of medicinal products contribute to preventing health risks, which must be accompanied by effective pharmacovigilance. Health security belongs to the prevention register and is placed ahead of problems, as it aims to prevent crises and detect emerging phenomena, whether they are health-related, environmental or societal in nature.
Health crises in recent years have shown that it is necessary to anticipate potentially risky situations in order to help with improving decision-making. It is therefore a question not only of using the measures known to be effective on known and expected risks (e.g. monitoring of influenza), but also of being able to identify new and unexpected situations that could lead to health problems as soon as possible (e.g. vigilance in pharmacology).
In the two areas concerned, surveillance and vigilance, data and developments in e-health are currently making a clear contribution. Here are two examples to illustrate this:
– Data analysis leading to the withdrawal of dangerous medications: Vioxx [GRA 04]. 80 million patients took this medication between 1999 and 2004. The cardiovascular risks of Vioxx had been highlighted in a number of studies carried out on small samples, but the Merck laboratory made a decision on this problem only after five years. It took a study by D.J. Graham of the FDA to bring this about: the study involved 1.4 million patients from the Kaiser Permanente database and compared the risk of adverse cardiovascular events for Vioxx users against the risk for users of Pfizer’s Celebrex. The study concluded that more than 27,000 myocardial infarctions (heart attacks) and avoidable sudden deaths occurred between 1999 and 2003.
– Oral contraception and cardiovascular risk: In 2013, the French health authorities were led to stop reimbursing third-generation contraceptive pills following studies showing a slight but significant increase in the risk of pulmonary embolism, or even heart attack or stroke, if hormonal contraceptives were taken regularly. A study, published in the BMJ, was conducted in France on five million women aged 15–49 years old [WEI 16], on the basis of health insurance data, the SNIIRAM database, and the PMSI database, in order to study the risk of pulmonary embolism and thromboembolism in women using a third-generation contraceptive pill. The results give a refined estimate of the overall frequency of cardiovascular incidents while taking the pill. It appears to be slightly higher, at least for venous thromboembolic incidents, than in the general population.
In the USA, monitoring systems have benefited from new momentum owing to the incentives given by the meaningful use concept already mentioned several times. In the latest version of the criteria required by this program, hospitals and doctors are required to provide data to public healthcare bodies in order to support syndromic and infectious disease surveillance systems and immunization registries. The global adoption of monitoring systems, especially for reporting the results of laboratory tests, is growing rapidly. According to Wu [WU 14], in 2014, nearly two-thirds of the states in the USA had adopted the transmission of biological results, in part thanks to meaningful use regulations.
16.6.2. The contribution of big data is heralding a significant change for researchers in epidemiology and public health
This change expands and considerably magnifies the previous change that has been developing for years, stemming from data mining in data warehouses. It is linked to the origin of the data, which exist before the question to be analyzed is posed. The experiment is no longer planned in relation to the objective or the question asked. Data come from warehouses, traditional databases or web-based resources, stored for other reasons. This is a multi-source data-driven approach.
The development of connected objects and geolocation practices also greatly increases the amount of data available. It also makes it possible to address new research questions owing to the availability of previously inaccessible data.
Data aggregators from different sources and visualization tools have spawned a new generation of disease-monitoring applications that can search, filter and visualize data in order to understand epidemics online in real time. “Seeing is believing” is a statement that validly applies not only to imaging techniques, but also to presenting and giving meaning to health data.
These data can facilitate disease prevention:
– by increasing our capacity to understand the behavioral, social and environmental determinants of the health of the population;
– by enabling preventive efforts to be better targeted to the subpopulations for whom these efforts are most effective;
– by providing research with new ways to identify modifiable disease risk factors.
16.7. The heterogeneity of data, a feature of big data, underlines the importance of interoperability standards
If the multiplicity of data sources and the large masses of information that they generate are sources of value creation, the difficulties in using them are significant in a world which lacks standardization. As a result, if it were necessary to reaffirm this, the interoperability needed for individual patient care also becomes a priority for public healthcare applications. Epidemiological researchers must take this new dimension into account when developing this discipline. They must urge national agencies and organizations which are developing activities in the healthcare and medico-social fields to review their strategies for developing information systems, by integrating this issue in order to promote semantic interoperability and the processing of these data on a large scale.
16.7.1. Owing to the fragmented adoption of terminological standards, much of these data remain unusable in isolated computer systems
Data sharing in this context, where semantics are not sufficiently standardized, can lead to data that cannot be used, or to analytical difficulties or errors. There is no single technology that can solve the problem related to the heterogeneity of big data. Different techniques should be combined to improve the feasibility of processing. The problem to be addressed can be seen as the interoperability of large distributed systems providing data or data analysis services. In this case, web service technologies can address the complexity of networked systems.
Semantics is processed by enriching data, which is carried out by linking each data element to a shared ontology. Ontologies (see Appendix 6) are used here to provide a common interpretation of the terminologies used in the different data sources.
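The enrichment step described above can be sketched as a lookup from (source, local term) pairs to shared concepts. This is only an illustration under stated assumptions: the source names, local terms and concept identifiers below are invented, and a real deployment would draw its mappings from a maintained ontology or terminology service rather than from a hard-coded table.

```python
from collections import Counter

# Hypothetical mapping from (source, local term) to a shared concept.
# None of these identifiers come from a real ontology.
LOCAL_TO_SHARED = {
    ("hospital_a", "MI"): "concept:myocardial_infarction",
    ("hospital_a", "PE"): "concept:pulmonary_embolism",
    ("registry_b", "heart attack"): "concept:myocardial_infarction",
    ("registry_b", "lung embolism"): "concept:pulmonary_embolism",
}


def annotate(record):
    """Return a copy of the record with a shared 'concept' (None if unmapped)."""
    key = (record["source"], record["term"])
    return {**record, "concept": LOCAL_TO_SHARED.get(key)}


def count_by_concept(records):
    """Count records per shared concept across sources, skipping unmapped ones."""
    annotated = (annotate(r) for r in records)
    return Counter(r["concept"] for r in annotated if r["concept"] is not None)


# Invented records from two sources that name the same event differently.
records = [
    {"source": "hospital_a", "term": "MI"},
    {"source": "registry_b", "term": "heart attack"},
    {"source": "registry_b", "term": "lung embolism"},
    {"source": "registry_b", "term": "unknown term"},
]
counts = count_by_concept(records)
print(counts)
```

Once both sources point at the same concept, their records can be counted together; the unmapped term is left out rather than guessed, which makes the remaining terminology work visible.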
16.7.2. Data-sharing initiatives for research purposes
A policy of openness to scientific analysis of multiple data sources has been developed in the USA for many years. This is the case, for example, with Kaiser Permanente, the largest integrated care management organization, founded in 1945. It has long affirmed a policy of openness on data management that is not limited to administrative or billing data. It operates in nine states, has a database that gathers the health data of nine million policyholders and has set up a Division of Research to exploit its data.
The projects referred to in this chapter are just a few examples. They are consistent with those mentioned in the chapter on data integration in clinical research (Chapter 18). In both cases, the projects aim to improve or even ensure better interoperability using technology platforms. These platforms provide different models for integrating and using heterogeneous data. The questions that they raise are partly technical (ensuring consistent data models and good semantic interoperability), but also managerial and legal (data protection), especially when the platforms are implemented in international projects.
16.7.2.1. The FDA’s Sentinel system in the USA 5
In February 2016, the FDA 6, following an initial phase of the Mini-Sentinel project launched in 2007, launched the Sentinel project, which uses a distributed data infrastructure to enable the FDA to quickly access the healthcare data of more than 193 million patients and carry out active monitoring. The network provides access to data from multiple institutions and the expertise of collaborating centers. This initiative complements other FDA health monitoring systems.
16.7.2.2. PCORnet (Patient-Centered Clinical Research Network)
The PCORnet is an initiative of the Patient-Centered Outcomes Research Institute (PCORI). It is a national network in the USA with clinical and patient-centered research networks. It is designed to conduct clinical research studies that are faster, easier to implement and cheaper owing to the power of large quantities of health data.
5 http://www.fda.gov/Safety/FDAsSentinelInitiative/ucm149340.htm.
6 FDA: Food and Drug Administration.
7 AHRQ: Agency for Healthcare Research and Quality.
This network provides the PCORnet CDM data model based on the common data model of various AHRQ 7 distributed research projects, the FDA’s
Mini-Sentinel project 8 and the Standards & Interoperability Framework Query Health Initiative. The PCORnet CDM uses standard terminologies and healthcare coding systems (including ICD, SNOMED, LOINC, etc.) to enable interoperability and respond to the evolution of data standards.
16.7.2.3. OMOP (Observational Medical Outcomes Partnership)
This public–private partnership was created in order to use medical observation databases to study the effects of medical products. OMOP has several objectives:
– to carry out methodological research to empirically evaluate the performance of various analytical methods and their ability to identify proven associations, and to avoid misleading correlations;
– to develop tools for transforming, characterizing and analyzing disparate data sources across the spectrum of healthcare delivery;
– to establish a shared resource in order to facilitate the collaboration of research teams.
16.7.2.4. Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”) [HRI 15]
OHDSI is an interdisciplinary, international collaborative project involving more than 120 researchers from over 12 countries, whose goal is to create and apply open-source data analysis solutions to a vast network of health databases. The OHDSI collaboration brings together all of the researchers from the initial OMOP project and develops its tools using the data model and the shared OMOP vocabulary. The team includes academics, industry scientists, healthcare providers and regulatory bodies.
Its mission is to help us to transform medical decision-making by creating reliable scientific evidence relating to diseases, care provision and the effects of medical interventions using the analysis of large-scale, population-wide health observation databases. Researchers pool their expertise at all levels (from infrastructure to clinical research) in order to ensure the congruence of infrastructure developments and clinical research needs.
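The networks described in this section rest on a distributed infrastructure and a shared data model. One way to picture such a design is that records stay with each data partner in a common format, the same analysis runs locally, and only aggregate results are pooled. The following is a minimal sketch of that idea and not the actual implementation of Sentinel, PCORnet or OHDSI; the record fields, drug names and sites are hypothetical.

```python
from collections import Counter


def local_counts(site_records, exposure):
    """Run at one data partner: count exposed patients and their events.

    Only these two aggregate numbers are returned; individual records
    never leave the site in this sketch.
    """
    exposed = [r for r in site_records if r["drug"] == exposure]
    events = sum(1 for r in exposed if r["event"])
    return {"exposed": len(exposed), "events": events}


def pooled_counts(per_site_results):
    """Run at the coordinating center: sum the aggregates from all sites."""
    total = Counter()
    for result in per_site_results:
        total.update(result)  # adds counts key by key
    return dict(total)


# Invented records held separately by two hypothetical sites,
# already expressed in the same (made-up) common format.
site_a = [
    {"drug": "X", "event": True},
    {"drug": "X", "event": False},
    {"drug": "Y", "event": True},
]
site_b = [
    {"drug": "X", "event": True},
    {"drug": "Y", "event": False},
]

results = [local_counts(site, "X") for site in (site_a, site_b)]
pooled = pooled_counts(results)
print(pooled)
```

The common format is what makes the same function usable at every site; without it, each partner would need its own version of the query, which is the interoperability problem discussed earlier in this section.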
The Common Data Model (CDM) [OVE 12] makes it possible to represent health data from a variety of sources in a consistent and standardized manner.
8 www.mini-sentinel.org.
The consortium has developed tools to facilitate data mining. These tools include:
– ACHILLES (Automated Characterization of Health Information at Large-scale Longitudinal Exploration System), a browser-based visualization tool. It provides users with an interactive and exploratory framework for a clinical database in CDM
format based on summary statistics already extracted from data sets (demographic data, prevalence, medications, etc.). It allows quality assessment and the visualization of observation data. It is available as open source 9 on the Web.
– HERMES (Health Entity Relationship and Metadata Exploration System) is a web-browsing tool for exploring a set vocabulary with the ability to search for a term and explore related concepts.
9 ACHILLES Repository: http://github.com/ohdsi/achilles. ACHILLES Demonstration: http://ohdsi.org/web/achilles/.
OHDSI addresses a number of key themes:
– Data standardization: OHDSI adopted the OMOP data model as a platform, and is developing special open-source tools and processes to help with implementing the model as well as standardized analysis tools within partner institutions.
– Monitoring of the safety of medical products: The aim is to identify and evaluate the potential associations between exposure to any medical product and its adverse effects. OHDSI aims to establish an international system of risk identification and analysis, which is accessible to the public and proactive in detecting the potential effects of medications. The collaboration is also setting up an open-source reference terminology, so that any organization with observational data, who wishes to contribute to this objective, can produce (using open-source tools) and share evidence.
– Comparative research on effectiveness: In order to inform the public about possible treatments and enable them to make direct comparisons between their effectiveness, OHDSI develops open-source tools to generate evidence based on observational data.
– Personalized risk prediction: Patient-level predictive modeling can provide personalized risk-related information to estimate the potential of an adverse reaction, based on known medical history and past health behavior.
– Data characterization: In order to generate reliable and transparent evidence about the real world, OHDSI develops analyses of data collected for purposes other than research (insurance claims, EHR, etc.). To help improve interpretation, the origin of the data must be taken into account. OHDSI is developing tools to assess data quality and
database profiling in order to determine whether a database can be used for a given analysis.
– Quality improvement: Healthcare systems seek to improve the quality of care received by their patients. OHDSI is developing open-source tools to facilitate this work through the systematic application of quality measures in observational data in the OMOP common data model.
16.8. Conclusion
This overview allows us to illustrate the problem of data processing in epidemiology and in public health more generally. Data reuse in this field is taking on a central role that it did not play ten years ago. The examples show the value of a “data-driven” strategy, which emerged with the availability of big data. The question of semantics is central to the successful exploitation of these data, which we have been able to consider for some time, following their initial use as “data cemeteries”.
The evolution of data reuse (in all areas, not limited exclusively to the field of health) raises questions about individual freedoms. Nowadays, the total anonymization of the data is illusory, as profiles of consumption, health status or usage make individual identification highly probable. The reuse and sharing of data poses methodological and technical problems, as well as regulatory and legislative ones. The French law on this point changed in 2016, and the European regulation published in 2016 should apply to EU countries from 2018 (see chapter on data protection). In terms of ethics, it is crucial to make the restrictions on access to data more flexible, for the benefit of both researchers and public interest.
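The limits of anonymization noted in this conclusion can be made concrete by counting how many records share the same combination of indirectly identifying attributes. This is the idea behind k-anonymity, a technique that is not discussed in this chapter and is brought in here only as an illustration. The sketch below uses invented records and attribute names, none of which contain a name or an identifier.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group (the k of k-anonymity); 0 if no records."""
    groups = group_sizes(records, quasi_identifiers)
    return min(groups.values()) if groups else 0


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of values is unique."""
    groups = group_sizes(records, quasi_identifiers)
    unique = sum(1 for n in groups.values() if n == 1)
    return unique / len(records) if records else 0.0


# Invented "anonymized" records: no names, only indirect attributes.
records = [
    {"birth_year": 1970, "postcode": "13001", "sex": "F"},
    {"birth_year": 1970, "postcode": "13001", "sex": "F"},
    {"birth_year": 1985, "postcode": "13002", "sex": "M"},
    {"birth_year": 1985, "postcode": "13002", "sex": "F"},
    {"birth_year": 1992, "postcode": "13001", "sex": "M"},
]
all_attributes = ["birth_year", "postcode", "sex"]
print(unique_fraction(records, all_attributes))
print(smallest_group_size(records, ["sex"]))
```

In this invented data set, three of the five records become unique as soon as the three attributes are combined, even though each attribute alone is shared by several people. Adding further attributes, such as consumption or care profiles, can only increase that proportion.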