Taking a byte out of big data


COMMENTARIES

Editorials represent the opinions of the authors and not necessarily those of the American Dental Association.

EDITORIAL



Michael Glick, DMD

Among buzzwords, “big data” is a term and concept that is being hotly debated and is rapidly becoming an essential tool in the care of our patients. The idea of big data has been discussed for more than a decade, and its use is continuously being redefined. Basically, in health care, big data is the use of data that are too big and too cumbersome for health care providers to process with existing tools and technologies.

The following 6 Vs are attributes that are commonly used to define, explain, and describe the concept of big data:

- value (relevance of the data);
- variability (evolution and seasonality of diseases);
- variety (data from different categories, taxonomies, and data sources);
- volume (quantity of data and high-throughput technologies);
- velocity (speed of processing and generation of new data);
- veracity (quality of data).

For example, an ever-growing number of companies are offering genetic testing to both health care providers and the public, and it is important to put such output into the perspective of big data. What is the value, variability, variety, volume, velocity, and veracity of available genetic testing?

Making sense of available health care data that may soon reach an output measured in zettabytes (10^21 bytes) or even yottabytes (10^24 bytes) is an impossible task unless we develop and embrace new data management technologies. Data are continuously generated by real-time imaging (for example, cardiovascular magnetic resonance imaging), point-of-care devices, and various and sundry mobile and wearable devices.
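To give a rough sense of the scale mentioned above, the following sketch estimates how long a single sequential pass over one zettabyte or one yottabyte would take. The throughput of 1 gigabyte per second is an assumed, purely illustrative figure that does not come from this editorial, and the function name is hypothetical.

```python
# Illustrative arithmetic only. The throughput below is an assumption
# chosen for the example, not a figure from the editorial.
ZETTABYTE = 10**21  # bytes
YOTTABYTE = 10**24  # bytes

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Hypothetical sustained read rate: 1 gigabyte per second.
ASSUMED_BYTES_PER_SECOND = 10**9


def years_to_scan(total_bytes, bytes_per_second):
    """Return the years needed for one sequential pass over the data."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


zb_years = years_to_scan(ZETTABYTE, ASSUMED_BYTES_PER_SECOND)
yb_years = years_to_scan(YOTTABYTE, ASSUMED_BYTES_PER_SECOND)

print(f"1 ZB at the assumed rate: about {zb_years:,.0f} years")
print(f"1 YB at the assumed rate: about {yb_years:,.0f} years")
```

Under this assumption a single pass over one zettabyte would take tens of thousands of years, and a yottabyte a thousand times longer, which is one way to see why existing tools are described as insufficient.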
Advances in technology, including the ability to detect even minute processes such as metabolic signaling, will generate data that have never been seen before and that will result in the development of as yet unheard-of therapeutic agents.1 Health care professionals soon will be able to decode and interpret real-time patient data that may include an oral microbiome that will denote a state of health or disease; provide genomic, proteomic, transcriptomic, and metabolomic data to be used in pharmacogenomics, as well as for precision or personalized oral health care; and suggest specific dental materials and other treatment modalities that can interact directly with a patient’s own tissues.

The reason for using big data in health care is to provide better, more efficient, and more evidence-based clinical care (care that answers clinical questions that are supported by observational evidence).

Does big data create a hypothesis or will a hypothesis create big data? Having large data sets invites searches for statistically significant findings, which can result in a retrospective hypothesis or a post hoc analysis, one created after analyzing results. Unfortunately, commonly used statistical methods are not good at delineating

JADA 146(11) http://jada.ada.org

November 2015 793


significant findings from large amounts of data, as large amounts of data almost always will result in some kind of statistical significance. Thus, health care professionals need to be able to sift through and be selective when choosing which particular data set to use. Present algorithms may not be sufficient.

The advantage of using big data is the generation of predictive disease models for both chronic and acute conditions that can be made on the basis of voluminous patient information, sometimes even in real time. One pitfall of using big data is not being mindful of the gravitational pull of larger data sets, which will overwhelm significant and important information from smaller data sets.

Translational genomics already have helped better identify subtypes of different cancers and subsequently improved treatment. For example, targeted therapies (treatments that take advantage of gene changes associated with the development of specific cancers) have shown great promise for better outcomes in patients with breast cancer.2 In other areas, pharmacogenetic-guided anticoagulation dosing with warfarin has shown greater effectiveness and safety.3

The Human Genome Project,4 the Research Collaboratory for Structural Bioinformatics Protein Data Bank,5 and the Human Metabolome Database6 are 3 large databases that provide new insights into a person’s susceptibility to specific diseases and conditions and may be able to help clinicians discern in more detail disease etiology, prevention, treatment, and cures.7 Such enormous data sets enable better understanding of complex disease patterns and facilitate the discovery of novel and clinically useful biomarkers.

Big data also will change the way we define and diagnose oral diseases, including periodontal diseases, inflammatory and immunologic pathologies, and even cancers. Using


different -omic markers will result in the recognition that oral and oropharyngeal cancers actually are several different diseases with different causes, treatments, and cure rates.8 The availability of better and more data also eventually will result in more insight into the pathogenesis and biological pathways of many commonly occurring diseases, such as diabetes and cardiovascular diseases, which will assist in better surveillance and health outcomes. The use of genomic, transcriptomic, proteomic, and metabolomic data, together with physiologic monitoring, will create an integrative personal -omics profile that can enhance our understanding of a person’s overall health and disease status far beyond today’s commonly used screening and diagnostic tools.9,10

How to effectively use this enormous volume of data for clinical care poses an interesting and ambitious challenge. Already, a third term used to describe an experimental model has joined the commonly used terms in vivo and in vitro: in silico, meaning “performed on computer or via computer simulation.”11

One opportunity that has not been fully realized is the use and data mining of multiple electronic health records (EHRs), a source of big data, that can communicate with each other. Unfortunately, although there are numerous appropriate choices for dental EHRs, few can interact in a meaningful way with medical databases. The inability to effectively gather and exchange information within the entire health care system results in inefficiencies and much higher medical costs.

The proliferation and breaches of EHRs have highlighted the need to develop better and more robust policies that will protect personal medical records. Another important issue that needs better delineation is the ownership and appropriate use of personal health-related data, especially with the growth in personal health apps and devices that can record a person’s real-time data and send those data to


the user’s health care providers. In this era of big data, this issue will become an even more onerous task.

Opportunities to improve health and better serve patients through the use of big data are rapidly emerging. Dentistry needs to increase its involvement, and oral health care professionals need to proactively contribute information to databases that are being used to determine and assess health outcomes. If not, as the saying goes, “If you are not at the table, you will be part of the menu.”

http://dx.doi.org/10.1016/j.adaj.2015.09.002

Copyright © 2015 American Dental Association. All rights reserved.

Dr. Glick is a professor and the William M. Feagans Chair, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY. He also is the editor of The Journal of the American Dental Association. Address correspondence to Dr. Glick at School of Dental Medicine, University at Buffalo, The State University of New York, 325 Squire Hall, Buffalo, NY 14214-8006, e-mail [email protected].

Disclosure. Dr. Glick did not report any disclosures.

1. Kim T, Hyeon T. Applications of inorganic nanoparticles as therapeutic agents. Nanotechnology. 2014;25(1):012001.
2. Murphy CG, Morris PG. Recent advances in novel targeted therapies for HER2-positive breast cancer. Anticancer Drugs. 2012;23(8):765-776.
3. Maitland-van der Zee AH, Daly AK, Kamali F, et al. Patients benefit from genetics-guided coumarin anticoagulant therapy. Clin Pharmacol Ther. 2014;96(1):15-17.
4. National Human Genome Research Institute. All about the Human Genome Project. Available at: www.genome.gov/10001772. Accessed September 11, 2015.
5. Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank. Available at: www.rcsb.org/pdb/home/home.do. Accessed September 11, 2015.
6. The Human Metabolome Database. Available at: www.hmdb.ca. Accessed September 11, 2015.
7. Taylor JC, Martin HC, Lise S, et al. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat Genet. 2015;47(7):717-726.
8. Glick M, Johnson NW. Oral and oropharyngeal cancer: what are the next steps? JADA. 2011;142(8):892-894.
9. Chen R, Mias GI, Li-Pook-Than J, et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012;148(6):613-624.
10. Glick M. Personalized oral health care: providing “-omic” answers to oral health care queries. JADA. 2012;143(2):102-104.
11. In silico. Available at: https://en.wikipedia.org/wiki/In_silico. Accessed September 11, 2015.