CHAPTER 8
BLOCKCHAIN IN HEALTHCARE: CHALLENGES AND SOLUTIONS
Md. Mehedi Hassan Onik*, Satyabrata Aich*, Jinhong Yang†, Chul-Soo Kim*, Hee-Cheol Kim‡
* Department of Computer Engineering, Inje University, Gimhae, South Korea
† Department of Healthcare IT, Inje University, Gimhae, South Korea
‡ Institute of Digital Anti-Aging Healthcare, Inje University, Gimhae, South Korea
8.1 INTRODUCTION
Healthcare big data analysts and researchers all over the world struggle with multidimensional healthcare data. Similarly, healthcare data providers hesitate to share sensitive medical data. Consequently, patient-specific care and the associated large-scale data mining have become difficult. Recently, with the innovation of new technologies, the security and privacy of healthcare data have been given the highest priority. According to a study reported in 2015 by Forbes Magazine, more than 112 million data records were stolen, lost, or inappropriately disclosed [1].

Currently, the main healthcare big data stakeholders are patients, payers, providers, and analyzers. Fig. 8.1 presents the stakeholders of healthcare big data. To perform a detailed analysis of medical records, there must be adequate collaboration and communication among these four stakeholders. In one way or another, security and privacy breaches are also linked to those entities. Firstly, patients are the source of all types of data. Patients produce this information through clinical records or wearable devices [2]. Secondly, payers are those who directly or indirectly support the patients in paying the healthcare cost (e.g., insurance companies, private sources, and bank loans). Thirdly, providers are those who collect and store medical records (hospitals, clinics, medical centers, blood banks, etc.). Finally, researchers and analyzers are those who use the information provided by the aforementioned sources to improve the performance of the healthcare industry.

In the past, when blockchain technology was not available, the interoperability of healthcare data across institutions could be categorized into three models: push, pull, and view. In the push model, data are transferred between two providers, and a third provider does not have access to the system.
For example, data can be transferred from one department to another and accessed within the same hospital, whereas the same data cannot be accessed from a different hospital, even if they have been transferred there. The push model very often fails to protect end-to-end data integrity. In the pull model, one provider can ask for data from another provider in an informal way, i.e., without a standardized audit trail. For example, an
Big Data Analytics for Intelligent Healthcare Management. https://doi.org/10.1016/B978-0-12-818146-1.00008-8 © 2019 Elsevier Inc. All rights reserved.
FIG. 8.1 Different stakeholders of healthcare big data: providers, analysers, patients, and payers.
orthopedic surgeon may informally ask a cardiologist for some information. In the view model, one provider can see the data by looking into the record held by another provider. For example, a doctor from the surgery department can access an X-ray taken in the emergency department. The security approaches in this model are not based on the relationship between the patient and the provider; they are mostly applied in an ad hoc way. The policies behind the above models are subject to the rules and laws of the local government of the particular country.

Hassanien et al. [3] discussed the challenges of handling big data in the medical sector and introduced a new term called “medical of things.” Smart healthcare information pattern mining is another challenge [4]. A huge amount of medical big data is generated by smart Internet of Things (IoT) devices, so this is another large area to explore. Dey et al. [5] mentioned that an IoT system has several layers in which sensors are engaged to collect data; that study analyzed the challenges in every layer, i.e., across the whole IoT ecosystem. Similarly, Kamal et al. [6] analyzed medical data classification using a map-reduce framework, along with its existing challenges. Another study discussed the necessity of healthcare data optimization for cloud computing [7].

The blockchain-based model gives the healthcare market a new dimension by addressing data integrity and by developing standardized and formalized contracts for accessing data. When we work with electronic health records (EHRs), which store data under different workflows, it is difficult to know who did what and when. The blockchain-based model puts a time stamp and an identity on every workflow step, and copies are distributed to each participating node in the network. So, if there is a modification or update at any node, it is propagated to all the nodes and is visible to everybody accessing it anywhere in the world. The model ensures that data integrity is maintained between the endpoints without any human intervention.
Some of the notable opportunities through which the integration of blockchain could revolutionize the healthcare industry are as follows [8]:
• Decentralized storage: blockchain stores information transparently and delivers it to third parties based on the consent of the creator. Decentralized storage means keeping multiple copies of the information in multiple places.
• Consent: access, storage, and distribution are controlled by the global consensus algorithm. Changes to the data are allowed only after autonomous approval from all available parties.
• Immutability: alteration of stored data is impossible. Once data is stored in a particular block, it can never be changed or modified.
• Increased capacity: with no middleman and less approval complexity, blockchain is a cost-effective way of maintaining the privacy of data.
Fig. 8.2 presents the possible changes that blockchain can bring to the privacy of healthcare big data. A big concern in the healthcare big data sector is the increasing amount of data as well as its related analysis, privacy, and security [9]. With the increasing demand for healthcare data mining, the need for higher computing power is also increasing, as discussed by Pattnaik et al. [10]. For the security of personally identifiable information (PII) and EHRs, blockchain technology has already been welcomed by many researchers [11, 12]. This chapter identifies the effect of blockchain on healthcare big data privacy. It also lists the challenges, scope, and current status of the technologies in detail, and discusses the flaws associated with blockchain solutions, which can be a good reference point for future study.
FIG. 8.2 Benefits of using blockchain on the privacy of healthcare data. Without blockchain, the patient, doctor, data analytics company, insurance company, and researcher interact with less trust and less collaboration; connected through a blockchain, they interact with more trust and more collaboration.
8.1.1 ROADMAP
The remainder of this chapter is organized as follows: Section 8.2 gives an overview of healthcare big data and blockchain architecture; Section 8.3 elaborates on the privacy and regulations associated with healthcare big data; Section 8.4 explores the effectiveness of applying blockchain-based applications to healthcare big data with adequate case studies; Section 8.5 provides several blockchain-based challenges and solutions for healthcare big data management; and Section 8.6 concludes the chapter with future research directions, followed by the references.
8.2 HEALTHCARE BIG DATA AND BLOCKCHAIN OVERVIEW
8.2.1 HEALTHCARE BIG DATA
Laney [13] suggested that big data is a large set of complex data with manifold properties that is analyzed computationally in order to identify the patterns associated with it. Big data is a term for data sets so huge and composite that existing data handling applications are insufficient. The 3Vs (volume, velocity, and variety) define the characteristics and dimensions of big data. “Volume” relates to the data size and dimensionality. “Velocity” denotes the processing speed of the data. Finally, “variety” denotes the combination of a number of different types of data.

With the emergence of the modern healthcare sector, most components of the healthcare industry are producing enormous amounts of data. Diverse medical records are available from various sources, such as traditional patient data contained in text, clinical images, recorded sounds, X-rays and ultrasounds, MRI (magnetic resonance imaging), patients’ conversations with doctors, and healthcare IoT devices and trackers. Lee et al. [14] mentioned two major categories of EHRs or healthcare big data available at this time: sensor data from different healthcare devices and electronic medical records (EMRs). Healthcare data mining and big data are closely linked, as extracting a pattern from EHRs can help doctors in future disease prediction. For example, the relationship between Parkinson’s disease and gait was analyzed using a text mining approach [15], and the difference between patients with Parkinson’s disease and healthy older adults was analyzed using machine learning techniques based on patterns in gait characteristics [16]. Since millions of medical data types exist, the necessity of EHR classification has been mentioned in several studies [17–19].

EMR data are the information gathered by medical institutions from the start of treatment to the cure of the disease. They form a series of time-specific records kept by hospitals, as shown in Fig. 8.3. Widely used IoT equipment includes mobile phones, wearables, microphones, CCTV, ambient sensors, skin-embedded sensors, and smart watches. These devices collect information in order to monitor the status of several body parts or functions of patients. For example, Parkinson’s disease (PD) gait data were measured using wearable sensors and used to develop a diagnostic tool for assessing PD patients [20]. Fig. 8.4 shows the kinds of data that can be recorded using different sensors. Together, these sources produce a mammoth amount of healthcare big data. Healthcare data is increasing at an astronomical rate, as reported in a survey by Stanford Medicine [21]: 153 exabytes (one exabyte = one billion gigabytes) of data were generated in 2013, whereas
FIG. 8.3 Electronic medical record (EMR): body checkup reports, traditional patient data, images, sound files, X-ray images, lab tests, diagnoses, and medications.
the expected amount for 2020 is 2314 exabytes. Moreover, with a 48% annual rate of increase, the volume is expected to enter the yottabyte (one yottabyte = 1000^8 bytes) range. A survey [22] reported that the healthcare analytics industry is growing at an exponential rate, with a compound annual growth rate (CAGR) of 27.3%, and is anticipated to reach USD 29.84 billion by 2022 from USD 8.92 billion in 2017.

Besides physical data breaches, medical data can be breached during medical signal processing and sharing [23]. Therefore, context-aware big data processing is highly needed, in which the type of data (personal, nonpersonal, sensitive, etc.) is analyzed before processing, as mentioned by Reddy et al. [24]. Panigrahi et al. [25] discussed big data security aspects in detail. That study deals with the different uses of cloudlets for big data and focuses on the details of cyber foraging systems for managing different characteristics of healthcare big data. According to a report by the Guardian [26], 26% of consumers’ medical records were breached in the United States. A similar source reported the 10 biggest healthcare data breach incidents listed by the United States Department of Health and Human Services Office for Civil Rights. Anthem Blue Cross, a giant health insurance company, suffered a breach of 80 million healthcare records on January 29, 2015. The
FIG. 8.4 Overview of different kinds of sensor data: positioning sensor (accelerometer), airflow sensor (breathing), blood pressure sensor (sphygmomanometer), pulse and oxygen in blood sensor (SpO2), electrocardiogram sensor (ECG), galvanic skin response sensor (GSR, sweating), body temperature sensor, wearable sleep tracking device, electromyography sensor (EMG), and smart lenses.
main reasons for those breaches, which were mentioned in another report by Snell [27], are unintended data disclosure (41%), hacking and malware (19%), insider incidents (15%), and physical damage (8%). According to past reports, it costs around USD 380 per second for every healthcare record that is breached.
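The growth figures quoted in this section can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The short sketch below is only an arithmetic illustration; the function name `cagr` is a hypothetical helper and is not taken from the cited surveys.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Healthcare analytics market: USD 8.92 billion (2017) to USD 29.84 billion (2022).
market_cagr = cagr(8.92, 29.84, 2022 - 2017)

# Healthcare data volume: 153 exabytes (2013) to 2314 exabytes (2020).
volume_cagr = cagr(153, 2314, 2020 - 2013)

print(f"analytics market CAGR: {market_cagr:.1%}")
print(f"data volume CAGR:      {volume_cagr:.1%}")
```

Both results should land close to the rates quoted in the text (27.3% for the analytics market and roughly 48% for the data volume), which suggests that the quoted start and end values are consistent with the quoted growth rates.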
8.2.2 BLOCKCHAIN
Blockchain technology is a decentralized ledger that can carry out a transaction across a peer-to-peer network without any approval from a central authority. Swan [28] defined two versions of blockchain: cryptocurrencies were version 1.0 and all other applications were version 2.0. Robert Hackett [29] reported in Fortune Magazine in 2016, “This coding breakthrough, which consists of concatenated blocks of transactions, allows competitors to share a digital ledger across a network of computers without the need for a central authority. No single party has the power to tamper with the records: the math keeps everyone honest.” Zheng et al. [30] mentioned four key characteristics of blockchain technologies that will directly affect the healthcare industries of the upcoming fourth industrial revolution (Industry 4.0): decentralization, persistency, anonymity, and auditability. The amount of investment in blockchain technology is shown in Fig. 8.5. The major building blocks of this technology are shown in Figs. 8.6–8.9 and Table 8.1.
FIG. 8.5 Market summary of blockchain technology use: 1384 cryptocurrencies; 23 million blockchain wallet users; 35% of money transfer cost can be reduced; USD 20,000 highest bitcoin market price; USD 20 bn that the banking sector can save; 44.02 TWh annual electricity consumption; USD 6 bn global investment in blockchain; USD 3 bn investment by the banking and financial sectors.
FIG. 8.6 Architecture of blockchain. Transactions are grouped into a chain of blocks (Block 1 to Block 4). Each block contains a header (block version, Merkle tree root hash, time stamp, nonce, parent block hash, target threshold), a transaction counter (number of transactions in the block), and the transaction data.
In December 2017, it was reported that the blockchain market, growing at a CAGR of 79.6%, is expected to expand from USD 411.5 million in 2017 to USD 7683.7 million in 2022 [31]. In the beginning, Bitcoin was widely regarded as one of the important applications of blockchain technology [32]. However, with more advanced technical developments, the technology can be applied to other fields [33, 34]. Most organizations can get the same benefits of
FIG. 8.7 Merkle tree (hash), built over four transaction data items TD0 to TD3:
Root = Hash(H4, H5)
H4 = Hash(H0, H1), H5 = Hash(H2, H3)
H0 = Hash(TD0), H1 = Hash(TD1), H2 = Hash(TD2), H3 = Hash(TD3)
FIG. 8.8 Blockchain forking: the chain diverges into one branch that follows the old rule (Rule 0) and one branch that follows the new rule (Rule 1).
FIG. 8.9 Different types of blockchain technology.
Public: public permission; higher cost; highly secured; decentralized.
Private: public or private permission; medium cost; medium security; partially decentralized.
Consortium: public or private permission; low cost; medium security; almost centralized.
Table 8.1 Blockchain Consensus Algorithm Comparison
Property | PoW | PoS | PoET | PBFT | DPOS | Ripple | Tendermint
Node management | Open | Both | Both | Permission | Open | Open | Permission
Transmission rate | Low | High | Medium | High | High | High | Medium
Energy consumption | High | Medium | Low | Medium | Low | Low | High
Storage consumption | High | High | High | Medium | Medium | Medium |
Scalability | High | High | High | Low | Low | Low | Low
Finality process | Probabilistic | Probabilistic | Probabilistic | Immediate | Probabilistic | Immediate | Probabilistic
Transaction cost | High | High | Medium | Low | Medium | Low | Low
Adversary tolerated power | 25% computing power | <51% stake | Unknown | <33% voting power | <51% faulty replicas | <20% faulty nodes in UNL | <33.3% voting power
blockchain by utilizing the original characteristics of this technology in fields such as finance, healthcare, operations management, IoT, basic science, intellectual property rights, automobile sharing, energy conservation, government, the retail sector, and human resources management [35–39]. According to the Global Fintech Report by PwC, 55% of respondents (companies with more than 500 employees) were planning to adopt it as part of a production system or process by 2018, and 77% by 2020 [40].

The components of blockchain technology are defined below:

A block: this stores particular transaction information. In other words, a block is a permanent and immutable record. Similar to a page of an ordinary ledger, a block records the current transactions or decisions, and a new block is created as soon as new transactions occur. The header, transaction counter, and transaction data are the three main parts of a block [41].

Header: the header of a block contains a list of information in separate fields. The block version is the current block version number, which decides the set of rules that the block follows. The parent block hash stores the hash of the previous block and thus points to it. The Merkle tree root hash summarizes the transactions of the block after double SHA-256 hashing. A tree with double hashing was used by NISTIR [42], as shown in Fig. 8.7. The time stamp field records the approval time of this particular block. The target threshold expresses the mining difficulty, i.e., the block creation difficulty. The last field is the nonce (a number used only once); changing the nonce changes the hash output of the block contents.

Transaction counter: this counts the successfully completed transactions, i.e., the number of transactions in the block (Fig. 8.6).

Transaction data: the purpose of this field varies with the use case. It can be used for bitcoin transactions, contract records, healthcare information, business data, etc. For example, in the case of healthcare, it stores clinical and medical information related to patients.
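The block components described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the layout of any particular blockchain: the byte widths of the header fields, the padding rule for an odd number of hashes, the toy target value, and the names `double_sha256`, `merkle_root`, `BlockHeader`, and `mine` are all hypothetical choices made for the example. Only the field list and the pairwise hashing scheme of Fig. 8.7 come from the text.

```python
import hashlib
import time
from dataclasses import dataclass


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as described for the Merkle tree root hash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Fold transaction hashes pairwise until a single root remains (Fig. 8.7)."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pad odd levels by repeating the last hash
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class BlockHeader:
    version: int         # block version: which rule set the block follows
    parent_hash: bytes   # 32-byte hash of the previous block header
    merkle_root: bytes   # 32-byte summary of the transactions in this block
    timestamp: int       # seconds since the Unix epoch
    target: int          # target threshold: a header hash below it is accepted
    nonce: int = 0       # value varied during mining

    def hash(self) -> bytes:
        # Assumed fixed-width serialization of the six header fields.
        fields = (
            self.version.to_bytes(4, "big")
            + self.parent_hash
            + self.merkle_root
            + self.timestamp.to_bytes(8, "big")
            + self.target.to_bytes(32, "big")
            + self.nonce.to_bytes(8, "big")
        )
        return double_sha256(fields)


def mine(header: BlockHeader) -> BlockHeader:
    """Increase the nonce until the header hash falls below the target threshold."""
    while int.from_bytes(header.hash(), "big") >= header.target:
        header.nonce += 1
    return header


transactions = [b"TD0", b"TD1", b"TD2", b"TD3"]
header = BlockHeader(
    version=1,
    parent_hash=bytes(32),                  # all-zero parent: a hypothetical first block
    merkle_root=merkle_root(transactions),
    timestamp=int(time.time()),
    target=1 << 244,                        # toy difficulty: about 4096 attempts on average
)
transaction_counter = len(transactions)     # number of transactions in the block
mine(header)
print(transaction_counter, header.nonce, header.hash().hex())
```

Changing any transaction changes the Merkle root and therefore the header hash, so the nonce found earlier would in general no longer satisfy the target. This is the property that makes recorded data hard to falsify.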
A hash algorithm: a hash algorithm maps arbitrary information to a hash of fixed length. A small change in the input data results in a big difference in the hash, which makes falsification of any data recorded by blockchain technology difficult.

A digital signature: first used by Johnson et al. [43] in the form of the elliptic curve digital signature algorithm (ECDSA), a digital signature proves the validity of a blockchain transmission. Public and private keys together provide a double layer of authentication: personal transactions are managed with the private key, and external security is managed with the public key.

Forking: a fork happens when the chain of blocks diverges into two paths. Since diverse participants need to agree on common rules, more than one chain is often generated. However, after consensus and appropriate regulation, the fork is resolved. Three types of forking were mentioned by Castor [44]: hard fork, soft fork, and user-centric fork.

Blockchain types: Linn and Koo [45] divided blockchain technology into three types based on user-level permission and scope in the healthcare sector. Among the public, private, and consortium types, two are mostly used for organizations in the healthcare sector. Fig. 8.9 elaborates on the different types of blockchain.

Consensus algorithm: a consensus algorithm is the process by which any new block (new transaction) is approved by a set of legitimate nodes or distributed systems. Commonly used consensus protocols or mechanisms are compared in Table 8.1.
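The two properties of a hash algorithm mentioned above (fixed output length and a large change in the hash for a small change in the input) can be observed directly with the SHA-256 function from the Python standard library. The sample records and the helper names below are hypothetical and serve only as an illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Hexadecimal SHA-256 digest of a text message."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def differing_bits(a: str, b: str) -> int:
    """Number of bit positions in which the SHA-256 hashes of a and b differ."""
    xor = int(sha256_hex(a), 16) ^ int(sha256_hex(b), 16)
    return bin(xor).count("1")


# Hypothetical health records that differ in a single character.
original = "patient=42;systolic=120;diastolic=80"
tampered = "patient=42;systolic=120;diastolic=81"

# The digest length is fixed regardless of the input length.
print(len(sha256_hex("a")), len(sha256_hex(original)))

# A one-character change typically flips about half of the 256 output bits.
print(differing_bits(original, tampered))
```

Because the stored hash no longer matches after even a one-character edit, a reader who holds the original hash can detect that the record was altered.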
8.2.3 HOW BLOCKCHAIN WORKS
An example of a blockchain transaction is described in Fig. 8.10. First, two parties decide to perform a transaction or exchange information. A group of nodes associated with those parties verifies the legitimacy of the parties as well as the smart contract between them. After successful hashing and consensus among the involved parties using any of the aforementioned consensus algorithms, a new block is finalized. Finally, the new block with the detailed transaction information is added to the existing chain and the transaction finishes.

Prerequisites of blockchain technology: to set up this technology, the following are needed: several connected nodes (computers), block data storage or hashing software, a consensus algorithm to make decisions, and a network with fast Internet connectivity.
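The transaction flow described above can be sketched as a toy ledger. This is a simplified illustration, not one of the consensus algorithms of Table 8.1: approval is modeled as a plain majority vote, each block holds a single transaction, and the class names, party names, and record labels are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    parent_hash: str
    transaction: dict

    def hash(self) -> str:
        body = json.dumps(
            {"index": self.index, "parent_hash": self.parent_hash, "transaction": self.transaction},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Node:
    """A validating node that knows which parties are legitimate."""

    def __init__(self, name: str, known_parties: set[str]):
        self.name = name
        self.known_parties = known_parties

    def approves(self, transaction: dict) -> bool:
        return (transaction["sender"] in self.known_parties
                and transaction["receiver"] in self.known_parties)


class Ledger:
    def __init__(self):
        self.chain = [Block(0, "0" * 64, {"note": "genesis"})]

    def submit(self, nodes: list[Node], transaction: dict) -> bool:
        """Append the transaction as a new block if a majority of nodes approve it."""
        votes = sum(node.approves(transaction) for node in nodes)
        if votes * 2 <= len(nodes):
            return False
        parent = self.chain[-1]
        self.chain.append(Block(parent.index + 1, parent.hash(), transaction))
        return True

    def is_consistent(self) -> bool:
        """Check that every block still points to the hash of its predecessor."""
        return all(cur.parent_hash == prev.hash()
                   for prev, cur in zip(self.chain, self.chain[1:]))


parties = {"hospital-A", "insurer-B", "lab-C"}
nodes = [Node(f"node-{i}", parties) for i in range(3)]
ledger = Ledger()
accepted = ledger.submit(nodes, {"sender": "hospital-A", "receiver": "insurer-B", "record": "claim-001"})
ledger.submit(nodes, {"sender": "lab-C", "receiver": "hospital-A", "record": "blood-test-17"})
accepted_unknown = ledger.submit(nodes, {"sender": "unknown-X", "receiver": "hospital-A", "record": "?"})
print(accepted, accepted_unknown, len(ledger.chain), ledger.is_consistent())
```

If the transaction stored in an earlier block is edited afterwards, its hash changes and the following block no longer points to it, so `is_consistent()` reports the tampering. In this sketch an edit to the most recent block is only detected once another block has been appended after it.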
8.3 PRIVACY OF HEALTHCARE BIG DATA
Digital identity: a digital identity refers to any online identity of an individual or organization. It consists of information or attributes related to a particular entity (individual, organization, or electronic device). The information contained in a digital identity allows assessment and authentication of an entity on the web. However, through a digital identity (online identity), civil and personal information associated with online activities can also be detected.

Personally identifiable information (PII): any data or information that could lead to a specific organization or individual is called personally identifiable information, or PII. PII is also known as sensitive personal information (SPI), personal data (PD), or personal information (PI). Any distinguishable information that can be used to separate the personal identity is also
FIG. 8.10 Working principle of blockchain technology.
termed PII. Similarly, any information that helps de-anonymize unidentified information is also considered PII. In privacy research, several studies have identified digital identities as a source of user identifiers. Pfitzmann and Hansen [46] defined an identity as follows: “An identity is any subset of attribute values of an individual person which sufficiently identifies this individual person within any set of persons.” McCallister et al. [47] of the National Technical Information Service (NTIS) defined PII as “any information about an individual maintained by an agency, including (PII) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (linked PII or PPII) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.”

Potential personally identifiable information (PPII): the aforementioned studies also stated that a personal identity or PII is not limited to a single number or entity and that combining partial characteristics or entities can also generate a complete PII or identity. This concept of partial identity or potential personally identifiable information (PPII) was defined by Pfitzmann and Hansen [46] as “A partial identity is a subset of attribute values of a complete identity, where a complete identity is the union of all attribute values of all identities of this person.”

In Table 8.2, the information on the left side is PII and can lead to a particular person or organization at any time. For example, if the health insurance information of a particular person is disclosed, all health-related information of that person may leak. Eventually, the rest of the information related to that person also becomes obtainable. On the contrary, the information on the right side of Table 8.2 (PPII) does not directly expose any identity.
For example, blood pressure, height,
Table 8.2 A List of Personally Identifiable Information (PII) and Potential PII (PPII)
Personally Identifiable Information (PII) | Potential Personally Identifiable Information (PPII)
Social security number (SSN), full name, credit card information, bank information, car number, passport information, National Identification (NID), login information, handwriting, image, location, health insurance information. | Living area, partial name, few digits of SSN, food place, medical information, workplace information, partial email, race and sex information, educational institute, IP address, supported information, blood pressure, height, weight, partial phone number.
weight, race, and living area cannot identify a person on their own. However, a combination of such information can lead to a particular entity. Onik et al. [35–39] proposed a sequential question-based attack classification; a similar concept can be applied to classify healthcare information security issues. Healthcare data are also being breached through mobile applications linked with healthcare. A study by Onik et al. [35–39] described how easy it was for a collaborative application manufacturer to obtain patients’ health-related identity.

Protected health information (PHI): this refers to any individually identifiable health information generated by a clinic, health plan, or health payer (e.g., clinic, doctor, pathology, government health department, or insurance company). In detail, any physical or mental health information linked to an individual’s past, present, or future is known as PHI. PHI can be maintained or transmitted in any form: speech, paper, electronic document, etc. The term PHI was first used by HIPAA (the Health Insurance Portability and Accountability Act) in 1996. Some examples of sensitive PHI are given below:

Hospital information: first date of hospital visit, patient registration ID, hospital bed number, doctor’s ID, hospital address, etc.
Images: images related to individuals and their items of interest, including every kind of X-ray image, medical image, clinical report picture, and MRI.
Biometric data: biometric data carry an individual’s biological information. This information is very sensitive and includes special body marks, fingerprints, handwriting, retina color, weight, blood type, voice type, DNA, race, body color, etc.
Payment and contact information: every kind of payment method and associated numbers, patient phone number, address where therapy is received, contact email, etc.
Mehmet Kayaalp [48] stated a few relationships among different personal identities. That study considered PHI as the common set (intersection) of PII and medical records and classified three kinds of medical data. The idea is elaborated in Fig. 8.11.

Anonymization techniques: anonymization is the alteration of PII, PPII, or PHI into an anonymous state. Although Berinato [49] mentioned in the Harvard Business Review that “there is no such thing called anonymous data,” several researchers have proposed data anonymization techniques. Berinato mentioned that several MIT scientists experimented on a dataset of 1.1 million credit card information entries and that 90%–94% of these entries could be used to obtain personal information through reverse engineering. Nevertheless, a few anonymization methods for high-dimensional data or big data are discussed here. Abdalaal et al. [50] used a statistical learning method along with Hilbert curve anonymization to increase the utility of the dataset. This MSA-diversity technique converts multidimensional identifiers to single-dimensional data. Sweeney [51] first used the k-anonymity technique for database anonymization. Gradually, several other studies used this method to de-identify personal data.
FIG. 8.11 Relationship between clinical data and personal data: clinical information (only clinical data), personal identifiers (only personal data), and their overlap (demographic information and clinical personal data).
Aggarwal [52] used clustering techniques to show the consequences of high-dimensional data for k-anonymity, l-diversity, and other anonymization techniques. Ali [53] proposed the k-anonymity method for securing user passwords: on one side, the study generalized password information up to a certain limit, and on the other side, the password was hashed with an anonymous value. Gal et al. [54] reported that applying all three de-identification techniques (k-anonymity, l-diversity, and t-closeness) together may cause information loss due to overgeneralization and suppression. That study proposed a micro-aggregation method on several quasi-identifiers to create k-anonymous information by masking healthcare-related information. Long patient profiles with multidimensional data create a problem during anonymization. Ghinita et al. [55] recommended a method for correlation-aware anonymization of high-dimensional data, which uses correlations among big data attributes to reduce quasi-identifiers among sensitive personal data. Several noteworthy strategies for making data anonymous are mentioned below [56]:
(a) Aggregation: when this method is applied to data, users are unable to detect the sources of the information. In this way, data mining will be difficult.
(b) Elimination: through this process, some fields are removed from the actual data.
(c) Temporize: this method adds impurities or wrong information.
(d) Top to bottom coding: this method removes the tag information. The most important information is removed by this technique.
(e) Group: this process puts different information together to hide an individual’s privacy.
(f) Directory replacement: modifying the name related to the data is another way of anonymizing information.
(g) Scrambling: adding irrelevant matter to the actual data.
(h) Masking: hiding information with hidden characteristics or random characters.
(i) Personalized anonymization: this technique depends on the relevant user. Customized anonymization techniques can be used by the owner of the data.
(j) Blurring: an approximate value is used, which makes prediction difficult.
(k) Hash digest: cryptographic hashing is another solution. This chapter will mainly focus on this technique (i.e., blockchain).
(l) Pseudonymization: this method replaces one or more fields of the record with artificial identities. One or many pseudonyms can be used per field. By applying this method, data will no longer belong to a particular entity or ID.
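A few of the strategies listed above, together with the k-anonymity measure discussed earlier, can be illustrated with a short Python sketch. The records, the secret key, and all function names are hypothetical; the keyed hash (HMAC-SHA-256) used for pseudonymization and the ten-year age bands used for blurring are assumptions made for this example rather than methods prescribed by the cited studies.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed-hash pseudonym (stable for a given key)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters behind asterisks."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def blur_age(age: int, width: int = 10) -> str:
    """Replace an exact age with an approximate band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical patient records; age and living area act as quasi-identifiers (PPII).
raw = [
    {"patient_id": "P-1001", "age": 34, "area": "Gimhae", "diagnosis": "hypertension"},
    {"patient_id": "P-1002", "age": 36, "area": "Gimhae", "diagnosis": "diabetes"},
    {"patient_id": "P-1003", "age": 38, "area": "Gimhae", "diagnosis": "asthma"},
    {"patient_id": "P-1004", "age": 52, "area": "Busan", "diagnosis": "hypertension"},
    {"patient_id": "P-1005", "age": 57, "area": "Busan", "diagnosis": "arthritis"},
]
key = b"hypothetical-secret-key"
released = [
    {
        "patient_id": pseudonymize(r["patient_id"], key),
        "age": blur_age(r["age"]),
        "area": r["area"],
        "diagnosis": r["diagnosis"],
    }
    for r in raw
]

print(k_anonymity(raw, ["age", "area"]), k_anonymity(released, ["age", "area"]))
print(mask("4111111111111111"))
```

In the raw records every combination of age and living area is unique, so each patient can be singled out by combining PPII. After blurring the age, each combination is shared by at least two records, at the cost of less precise data.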
FIG. 8.12 Different types of security techniques (security safeguard themes).
Administrative: risk analysis and management; chief information security officer; disaster recovery planning.
Physical: assigned security responsibility; physical access controls; workstation security; radio frequency identification device.
Technical: access control; entity authentication; audit trails; data encryption; firewall protection; virus checking.
Kruse et al. [57] prepared a detailed study of three popular online research databases: PubMed (MEDLINE), CINAHL, and ProQuest Nursing. Three commonly used security techniques are mentioned in this study: administrative, physical, and technical safeguards. These techniques are illustrated in Fig. 8.12.
8.3.1 PRIVACY RIGHT BY COUNTRY AND ORGANIZATION Every country has its own policies and regulations for information privacy. The information protection regulations of several countries, with their key factors, are given in Table 8.3. The big data security risk cycle was described in a study by Abouelmehdi et al. [58]. This study marked the data modeling stage as the riskiest stage of the cycle. Fig. 8.13 shows the risk cycle in detail.
8.4 HOW BLOCKCHAIN IS APPLICABLE FOR HEALTHCARE BIG DATA 8.4.1 DIGITAL TRUST From patients to doctors and from data analysts to clinical data providers, everyone wants to be able to trust the parties they deal with. Blockchain technology can bring this digital trust to those who handle healthcare big data. Mattila [59] discussed three factors needed for digital trust that can be achieved by blockchain technology: security, identifiability, and traceability. When data generators or handlers want to trade data, there must be a smart contract among them to establish trust between the parties. In healthcare big data, clinical data is occasionally breached, which raises the question of who is going to take responsibility.
Table 8.3 Information Protection Regulations of Different Countries and Organizations
Country | Law | Key Factor
Angola | Information Protection Law | For sensitive information storing, processing, and collection, legal permission is needed.
United States | HIPAA (Health Insurance Portability and Accountability Act, started in 1996); the Health Information Technology for Economic and Clinical Health (HITECH) Act, February 17, 2009 | 1. Right to medical record and health insurance privacy record from 12 to 18 year old citizen. 2. Meaningful use of all Electronic Health Records (EHRs).
Brazil | Law of the Constitution | Personal data and private life and citizen’s image are considered as highly sensitive and secured information.
EU | Data Protection Law from Government | Protection of people’s personal information during storing and processing.
Russia | Personal Data Act by Russian Federal Law | Data operators are to take all responsibility regarding any unlawful access to data.
United Kingdom | Data Protection Act (DPA) | The individual has huge power in controlling the movement of personal data. Privacy information must follow the country’s territorial boundary.
India | IT Act | Any kind of breaching of personal data should compensate the victim.
South Korea | Personal Information Protection Act, September 30, 2011 | Right, interest, and dignity of the citizen are taken care of. No territorial scope is defined.
EU | General Data Protection Regulation (GDPR) | Personal information protection regulation, in which key factors are consent from the owner, right to forget, and the territorial boundary of data.
United States | National Institute of Standards and Technology (NIST) | Security automation, public awareness, and harmonized security rules are key factors.
New Zealand | Health Information Privacy Code | Special rules of medical institutions on collection, storage, and processing of health data.
Commonwealth | National Electronic Health Transition Authority (NEHTA) | NEHTA aims to unlock eHealth system aspects to improve the electronic health record collecting and exchanging ways.
Blockchain technology is capable of creating a trusted platform between the digital realm and the physical world. Several studies have already called for digital trust in healthcare technology in order to better analyze patient data and collaborate on it [60, 61]. Jirotka et al. [61] discussed the operability of e-healthcare data and how it can be maintained through digital trust among stakeholders over the Internet. Similarly, Hesse et al. [60] reported that around 63.0% of patients visit online platforms for health-related help. Of these, 62.4% trust their physician to share and protect their healthcare data.
[Figure 8.13: big data security risk cycle in healthcare. Stage labels: registration; data collection; filter and classification; data transformation; data modeling; analytics and prediction; delivery and visualization; knowledge creation.]
FIG. 8.13 Risk cycle of healthcare big data security.
8.4.2 INTELLIGENT DATA MANAGEMENT Big data mining is becoming more complicated with the growth of artificial intelligence, neural networks, and deep learning [62]. Healthcare data that was previously wasted can be transformed into information, and that information can be further processed to generate knowledge (e.g., for marketing and drug analysis). The main problem with extensive data handling is the risk to privacy and the risk of manipulation. Blockchain offers a multilayered data protection mechanism based on decentralization and smart consensus. Information from health records to wearable IoT sensors is safer when stakeholders (doctors, researchers, public authorities, patients, IoT industries etc.) can join the blockchain network as a “miner” [41]. The healthcare big data industry can deploy blockchain to develop data-driven business intelligence.
8.4.3 SMART ECOSYSTEM The healthcare network is formed by hospitals, entrepreneurs, and patients, along with their data suppliers, producers, and competitors. Blockchain technology makes a smart ecosystem for healthcare data management achievable. Peer-to-peer (P2P) and business-to-business (B2B) collaboration using blockchain technology can lead to a new age of the smart healthcare ecosystem. Blockchain technology can decrease costs, time, and system loss in medical data handling and processing. Blockchain uses the concept of a “smart contract,” a smart way of enforcing agreed rules [41], thus allowing anything of value to be exchanged in a transparent and dispute-free way. This stabilizes the contract and also confirms that it is carried out successfully. Best of all, blockchain technology ensures the privacy of the terms among the parties.
8.4.4 DIGITAL SUPPLY CHAIN Kim and Laskowski [63] described a digital supply chain (DSC) system in which each stakeholder of the healthcare industry can track every service. A food product, medicine, healthcare product, insurance product etc. can be tracked through the DSC. Tian [64] and Aitken [65] proposed a food and agricultural product supply chain lifecycle using blockchain technology. Shae and Tsai [66] introduced a blockchain-protected supply chain system for medicines and clinical products. A major benefit of blockchain is that it keeps a public ledger of the dealings without revealing the identity of the parties involved. Blockchain uses a public key infrastructure (PKI) to notify each party. Each of the data or service providers in the healthcare supply chain can verify a transaction. Every aspect of the contract can be checked with this technology, and if one of the aspects is missing, any party can abandon the contract at any time. Another study by Chhetri et al. [67] described how secure a digital supply chain is for future industry architecture.
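The idea that every party can verify the dealings recorded on a shared ledger can be sketched in a few lines of Python. This is a simplified, single-process illustration and not the design of any cited system: the class names `CustodyEvent` and `SupplyChainLedger`, the field names, and the sample events are assumptions. A real deployment would add digital signatures and a consensus protocol across nodes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


@dataclass(frozen=True)
class CustodyEvent:
    """One hand-over of a product, linked to the previous event by its hash."""
    product_id: str
    holder: str
    action: str
    prev_hash: str

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class SupplyChainLedger:
    """Hypothetical append-only ledger of custody events."""

    def __init__(self) -> None:
        self.events: List[CustodyEvent] = []

    def record(self, product_id: str, holder: str, action: str) -> CustodyEvent:
        prev = self.events[-1].digest() if self.events else GENESIS_HASH
        event = CustodyEvent(product_id, holder, action, prev)
        self.events.append(event)
        return event

    def is_consistent(self) -> bool:
        """Recompute every link; an edited earlier event breaks the chain."""
        expected = GENESIS_HASH
        for event in self.events:
            if event.prev_hash != expected:
                return False
            expected = event.digest()
        return True

    def trace(self, product_id: str) -> List[str]:
        """List the holders of one product in the order they were recorded."""
        return [e.holder for e in self.events if e.product_id == product_id]


ledger = SupplyChainLedger()
ledger.record("LOT-001", "Manufacturer", "produced")
ledger.record("LOT-001", "Distributor", "received")
ledger.record("LOT-001", "Pharmacy", "received")
print(ledger.trace("LOT-001"), ledger.is_consistent())

# Simulate tampering with the first event: later links no longer match.
ledger.events[0] = replace(ledger.events[0], holder="Unknown reseller")
print(ledger.is_consistent())
```

Because each event stores the hash of its predecessor, changing any event other than the most recent one invalidates every later link. Protecting the latest event as well requires that the current head hash be replicated or agreed upon by the other parties, which is the role consensus plays in an actual blockchain.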
8.4.5 CYBERSECURITY Data stored by blockchain technology are immutable. This makes it a tamper-proof technology for storing any contract, decision, transaction, or piece of information. The Pentagon and the Washington Times stated that the US military sees this technology as a cybersecurity shield. Blockchain technology stores data in a distributed way, and this decentralized way of storing information can reduce data manipulation. Nugent et al. [68] described clinical data distribution and sharing using blockchain technology. Matanovic [69] reported that blockchain is a secure technology because of its hash algorithm, its consensus algorithm, and data immutability. A study by Maddux [70] described the opportunities for blockchain technology in the healthcare big data sector and mentioned that data portability and distribution can be made more secure using this technology. Blockchain stores every detail of a data distribution, so communication between parties (data owners and researchers) develops in a context of information validation, proof of time, identity verification etc. According to IBM and Ponemon, a healthcare data breach costs approximately $380 USD per record, whereas the average across industry sectors is only $141 USD per record. To reduce this cybercrime, blockchain can extend its secure system [71]. An overview of the benefits offered by blockchain technology is given in Fig. 8.14. The healthcare big data challenges and blockchain opportunities are shown in Fig. 8.15. Table 8.4 explains in detail how blockchain technology is capable of meeting the demands of healthcare big data.
Examples of blockchain use: Several examples of how blockchain technology is used in the healthcare sector are given below:
Medical devices, IoT, and big data: Several studies [35–39] have highlighted the privacy and security concerns regarding existing IoT systems, and blockchain is a well-organized technology that can reduce that security threat. Medical and IoT devices used for healthcare and medical purposes produce a huge amount of clinical data. On one side, blockchain can create a smart ecosystem in their production lifecycle (supply chain). On the other side, the data produced can be tracked while moving from one party to another. A longitudinal health record linking platform can be created across various healthcare organizations.
[Figure 8.14: benefits of blockchain technology. Immutability: immune to tampering; preserves authenticity of the information; file integrity. Disintermediation: removes third party intervention; data manipulation rate is reduced. Cost reduction: drastic reduction in operational costs; compensation is unavoidable. Decentralization: open to all with a higher degree of security; concurrent data processing; collaborative version control. Security: a tamper-proof audit log; easy identification of malicious data and users; data access management.]
FIG. 8.14 Benefits of blockchain technology.
Health claims: Insurance money, compensation for clinical data breaches, penalties for wrong treatment etc. will be easy to recover. By removing third parties and providing trustless digital systems with smart contracts, blockchain can readily ensure compensation for the victims.
Medication adherence: Patient behavior partially influences medication integrity. Few patients are fully aware of the medicine, therapy, or diet plan given by their doctors. Blockchain technology can provide incentives once a patient has fulfilled a particular task or order from the physician. A smart medical device can act interactively to remind patients of their do’s and don’ts. Blockchain can introduce a medication smart contract between the IoT device and the patient.
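The incentive logic of such a medication contract can be sketched as follows. This is a plain Python stand-in and not code for any real smart contract platform: the class `AdherenceContract`, its fields, the 80% threshold, and the token amounts are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Set


@dataclass
class AdherenceContract:
    """Toy stand-in for a medication smart contract (hypothetical design)."""
    patient_id: str
    start: date
    days: int                      # length of the prescription in days
    reward_tokens: int             # incentive released on success
    required_ratio: float = 0.8    # assumed adherence target
    confirmed_days: Set[date] = field(default_factory=set)
    paid: bool = False

    def confirm_dose(self, day: date) -> None:
        """Called by the (hypothetical) smart device when a dose is taken."""
        end = self.start + timedelta(days=self.days - 1)
        if not (self.start <= day <= end):
            raise ValueError("dose is outside the prescription period")
        self.confirmed_days.add(day)  # a set ignores duplicate confirmations

    def adherence(self) -> float:
        return len(self.confirmed_days) / self.days

    def settle(self) -> int:
        """Pay the incentive once, and only if the adherence target is met."""
        if self.paid or self.adherence() < self.required_ratio:
            return 0
        self.paid = True
        return self.reward_tokens


contract = AdherenceContract("patient-42", date(2019, 1, 1), days=10, reward_tokens=50)
for offset in range(8):  # the device confirms doses on 8 of the 10 days
    contract.confirm_dose(date(2019, 1, 1) + timedelta(days=offset))
print(contract.adherence(), contract.settle(), contract.settle())
```

On a blockchain, the confirmations would be transactions signed by the device, and the settlement rule would run on every node, so neither the patient nor the provider could change the outcome unilaterally.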
8.4.6 INTEROPERABILITY AND DATA SHARING Research and development are all about the interconnectivity of entities and data sharing. Data includes OMICS, drug success rates, IoT device feedback, patients’ lab reports, medication status etc. All this information is shared among laboratories, businesses, researchers, patients, blood banks, clinics etc. However, the interconnectivity and interoperability of existing medical records are not adequate. Blockchain technology can increase data security and transparency in order to gain trust, and interconnectivity can be increased many times over. By achieving consensus from all parties, blockchain technology can ensure data quality, data reliance, and stakeholders’ opinions on the same platform [35–39]. Healthcare providers, insurance providers, and patients create policy through a blockchain smart contract.
[Figure 8.15 labels: healthcare big data; blockchain technology; secure analysis; system interoperability; scalability; transparency; secure distribution; B2B communication; decentralization; data consistency; personal data security; privacy monitoring.]
FIG. 8.15 Healthcare big data challenges versus blockchain opportunities.
8.4.7 IMPROVING RESEARCH AND DEVELOPMENT (R&D) Large healthcare data sets, healthcare market analysis data, and genomic data are frequently analyzed. Blockchain technology can ensure that the quality of genomic and health-linked information is tracked from the generator to the analyzer. Currently, biological and health data are sometimes manipulated and faked, which affects research quality. Collaborative research can gain more attention if stakeholders can guarantee the quality of the information. Blockchain technology is increasingly used for genomic information sharing. Hahnel [72] proposed a technique that can ensure the security of genome data from the moment it is sequenced.
Table 8.4 Capability of Blockchain With Respect to Challenges Posed by Healthcare Big Data
Factors | Healthcare Big Data Challenge | Blockchain Solution
Data fragmentation | Data are produced in a fragmented, decentralized way. Patients, doctors, clinics, therapists, and analysts produce data separately. | Computer networks and the connected decentralized system can create a network among all groups (blockchain nodes).
Concurrent and timely access | Manual collaboration, privacy policy, and negotiation hamper timely access and concurrent analysis. | Distributed blockchain can secure concurrent analysis on the same set of data without any risk of data manipulation (blockchains are immutable).
System scalability | Trust issues (data and person) generate unrest in collaboration among parties. | Blockchain verifies participating nodes and associated data, which eliminates all risk (node and data authentication through the consensus algorithm).
Sensor (IoT) data handling | Thousands of IoT devices are in use to collect and distribute data, which are difficult to track and handle. | Blockchain can secure device-to-device communication and data among IoT devices by creating a private blockchain network among them.
Data and access consistency | Data handlers should not be open and there must be some regulations for distribution. | Blockchain can use its three types of user scope (private, public, and consortium) to define the owner’s selected data processors and regulators.
Data processing cost | Currently, several third parties along with all individuals analyze the data, which increases costs. | Blockchain’s concurrent architecture without middlemen can reduce the operation cost of data analysis and distribution.
Data owner’s privacy | Currently, owners’ identities are open and processors are hidden. | Blockchain technology offers a secret way of data handling where data processing identities are open to all. Blockchain can even secure data flow and permissions (smart contract).
Business to business communication | Currently, middlemen and third parties manipulate and distribute data to make a profit. | Blockchain can remove every third party to create a territory-less open boundary for data distribution in a secure way.
8.4.8 FIGHTING COUNTERFEIT DRUGS Fake and low-quality drugs are widespread. Blockchain technology can help to maintain medicine standards by tracking drugs from production to end user feedback. According to a World Health Organization report covered by the Guardian [73], “One in ten medical products circulating in low- and middle-income countries is either substandard or falsified.” Drug production and distribution supply chains can be tracked by blockchain technology. WHO is spending 30 billion USD in the fight against counterfeit medicine.
8.4.9 COLLABORATIVE PATIENT ENGAGEMENT Patients can form a common community or group for taking care of one another. Blockchain technology can maintain the overall monitoring system. Patients may also feel uneasy about dealing with some diseases and the information associated with them. All of this information and the doctors’ decisions can be handled with private blockchain technology. For any sensitive decision, doctors can gain consent from the patient’s community and family. A medical team spread across several locations can cooperate to conduct clinical trials without any trust issues.
8.4.10 ONLINE ACCESS TO LONGITUDINAL DATA BY PATIENT Blockchain provides a double-layer secured platform for managing online medical records. The main advantage is that patients can manage their records without the involvement of third parties. Broderson et al. [74] mentioned three key capabilities that make blockchain a strong clinical data management tool. Firstly, blockchain is capable of developing a trusted health database. Secondly, blockchain is capable of creating a network among data owners and processing data with anonymity. Finally, automatic consent can be obtained before any decision is made.
8.4.11 OFF-CHAIN DATA STORAGE DUE TO PRIVACY AND DATA SIZE Due to restrictions on personal data privacy, off-chain data storage facilities are gaining popularity. Generally, blockchain is unable to store huge amounts of data because of its decentralized architecture. By contrast, healthcare and biological data are naturally massive in volume. To solve this issue, off-chain data storage backed by data hashing has become a good option. The next section will provide a detailed description of this.
8.5 BLOCKCHAIN CHALLENGES AND SOLUTIONS FOR HEALTHCARE BIG DATA Challenges: Although blockchain technology offers advantages and opportunities, there are still a few fundamental drawbacks that create domain-specific challenges. Storage, modification, scalability, and privacy and regulations are the four key challenges that this technology faces in healthcare big data.
Storage: Healthcare and medical records involve an enormous amount of EMR and sensor data from patients and wearable IoT devices. By contrast, blockchain architecture supports very limited on-chain data storage. Blockchain’s decentralized and hashed architecture makes data storage too costly. Similarly, blockchain data access, management, and operations can also be costly when the data size is large. Therefore, blockchain applications must be designed with this factor in mind.
Modification: On one side, the immutability of blockchain data secures the system, but on the other side, it gives no option for modifying or deleting data, even though data modification and changes are unavoidable. Either a new block must be created by consensus from all nodes or a new chain must be generated. These two methods are costly and not feasible. Therefore, blockchain applications must be developed in such a way that the need for data modification is kept to a minimum.
Scalability: Due to the decentralized architecture, the scalability issue is less serious. However, private clinics, healthcare centers, rural hospitals, enterprise research organizations, insurance companies, individual patients, IoT startups etc. have millions of users with different infrastructure. It is highly unlikely that all of them are capable of maintaining the same decentralized blockchain architecture. Blockchain technology also needs higher computation power, which demands higher electricity consumption by network equipment [75]. The scalability issue for healthcare big data must be taken seriously in order to make blockchain popular.
Privacy and regulations: Blockchain greatly strengthens the security of its content. Its cryptographic, decentralized, independent, and immutable architecture can ensure the highest security of its contents. Healthcare big data is all about sensitive information of the patient, by the patient, and for the patient. Therefore, it can be risky to keep a copy of those data in every node. The most critical issue for blockchain technology as currently practiced is that PII and EHRs are stored forever. Several countries and standards organizations do not permit this practice. Let’s discuss the case of the General Data Protection Regulation (GDPR) and blockchain as an example.
8.5.1 GDPR VERSUS BLOCKCHAIN 8.5.1.1 Problem statement and key factors of GDPR The focus of the recently enforced GDPR is to secure individuals’ information, so organizations must pay particular attention to both the individual’s consent and data sharing. Consent needs to be obtained before any private data is analyzed, and there is also an obligation to ensure that this data can be withdrawn or deleted (i.e., “the right to be forgotten”). Blockchain is based on the “immutability” of data, whereas GDPR demands that all personal data or PII be modifiable or erasable by any organization at the user’s request. Article 17 of GDPR refers to “the obligation to erase personal data without undue delay” and establishes “the right to be forgotten.” At this moment, blockchain data storage follows the CRAB principle (Create, Retrieve, Append, Burn). The interesting part is the last one, burn, which means throwing away the encryption key needed to access the blockchain data. Yet GDPR does not accept this as “erasure of data.” Key GDPR changes are:
• Territorial scope of personal data. Every kind of personal data should be gathered, stored, and processed within the territorial boundary of the European Union.
• GDPR can fine up to 4% of a company’s annual global turnover or 20 million Euro.
• Consent must be taken from the user for any kind of personal data collection. Consent should be understandable and simple.
• Three rights are ensured: right to access, right to be forgotten, right to breach notification.
Possible solution: We list a few possible solutions below:
1. Do not store personal information on the blockchain.
2. Record personal information pseudonymously.
3. Store information in a referenced, local, encrypted database.
8.5.1.2 Solutions Above all, the blockchain must comply with GDPR in order to work in the EU and with EU citizens. Several studies have proposed a modified blockchain architecture in order to satisfy GDPR. Humbeeck [76] proposed an off-chain blockchain architecture that complies with GDPR. That study proposed a two-layer data storage mechanism. The local databases, database 1 and database 2 (Fig. 8.16), store (off-chain) every kind of GDPR-sensitive data. With the help of an associated application, this system stores the link to the data and the hash of the data in the blockchain (on-chain). This system can delete data from the local database at any time. Once the data has been deleted, the remaining hash is of no use on its own.
[Figure 8.16 labels: Database 1; Database 2; application back-end (shown twice); blockchain.]
FIG. 8.16 Off-chain blockchain architecture.
8.5.1.3 Off-chain blockchain advantages The aforementioned approach complies with GDPR because personal data can be completely deleted. Similarly, since the hash of the data is stored on-chain, data security is also preserved.
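The two-layer mechanism described above can be sketched as follows. This is a minimal in-memory illustration and not the implementation from [76]: `OffChainStore` stands in for the local database, `OnChainIndex` for the blockchain, and all names and the sample record are assumptions.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def record_hash(record: dict) -> str:
    """SHA-256 digest of a record serialized with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class OffChainStore:
    """Stand-in for the local database that holds the personal data."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, link: str, record: dict) -> None:
        self._rows[link] = record

    def get(self, link: str) -> Optional[dict]:
        return self._rows.get(link)

    def erase(self, link: str) -> bool:
        """Delete the personal data; returns True if something was removed."""
        return self._rows.pop(link, None) is not None


class OnChainIndex:
    """Stand-in for the blockchain: an append-only list of (link, hash) pairs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    def append(self, link: str, digest: str) -> None:
        self._entries.append((link, digest))

    def digest_for(self, link: str) -> Optional[str]:
        for entry_link, digest in reversed(self._entries):
            if entry_link == link:
                return digest
        return None


def store(db: OffChainStore, chain: OnChainIndex, link: str, record: dict) -> None:
    db.put(link, record)                      # personal data stays off-chain
    chain.append(link, record_hash(record))   # only link and hash go on-chain


def verify(db: OffChainStore, chain: OnChainIndex, link: str) -> bool:
    record = db.get(link)
    return record is not None and record_hash(record) == chain.digest_for(link)


db, chain = OffChainStore(), OnChainIndex()
store(db, chain, "db1/patient/42", {"name": "Alice Kim", "diagnosis": "J45"})
print(verify(db, chain, "db1/patient/42"))   # record matches its on-chain hash
db.erase("db1/patient/42")                   # erasure request: delete off-chain
print(db.get("db1/patient/42"), chain.digest_for("db1/patient/42") is not None)
```

After the erasure, the on-chain entry still exists but no longer points to any retrievable record. One caveat of this simplified sketch is that a plain hash of a small or predictable record could be guessed by trying candidate values, so a practical design would typically hash the record together with a random secret that is deleted along with it.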
8.5.1.4 Off-chain blockchain disadvantages The major disadvantage of the aforementioned system is that anybody linked with a particular local database can access the data without consensus from the other nodes. This compromises the data owners’ rights. The local database is also open to more attack vectors. In addition, managing the hash through an API adds management-level complexity. Overall, GDPR versus blockchain is still a paradox. A few questions about the off-chain data storage mechanism remain open:
• Who is the owner of the off-chain data?
• Is it possible to encrypt the off-chain data?
• Is data access controllable or not?
• What if off-chain data is copied illegally?
We will now explore several blockchain technology solutions available for healthcare big data.
Blockchain Health: “Blockchain Health” is a US-based healthcare data handling software company. By using their service, “Pokitdoc,” developed by Smith [77], patients and researchers can share information. They use an on-chain and off-chain data storage infrastructure based on their “DoKchain.” Each node has a set of key pairs for identity verification. This system uses the Proof of Elapsed Time (PoET) consensus algorithm. Longitudinal auditability and security are ensured through a strong identity and a permanent notary of the stored information. This is a HIPAA-compliant platform.
Burst IQ: This is also a US-based research organization that develops blockchain technology. They offer several services, mainly blockchain-based, data-driven healthcare platforms. They offer the user a data management ecosystem. They develop individual life graphs from user data and store these in a health wallet. The user can share, manage, sell, and donate individual information from that wallet. Burst IQ provides an interactive platform for big data distribution among individuals, researchers, and organizations. It is HIPAA, GDPR, and NIST compliant and also supports a larger volume of health data. Burst IQ [78] also uses machine learning for big data handling.
MedRec: MedRec by Azaria et al. [79] provides a complete platform for patient data authentication and sharing among stakeholders. It uses Proof of Work (PoW) mining to reach consensus among researchers, public health authorities, and patients. This blockchain system also introduced incentives based on successful data sharing and authentication, and it provides a log of data sharing for transparent auditability. MIT Media Lab and Beth Israel Deaconess Medical Center jointly developed this blockchain-based model to handle EHRs using a decentralized model that covers the relevant safety aspects and guarantees data integrity between endpoints. MedRec was created to manage aspects such as accountability, authentication, and confidentiality, which are necessary given the current situation regarding data breaches, violations of codes of conduct, and other potential crimes related to healthcare data. MedRec stores healthcare data differently from the traditional way of storing data in an EHR. It stores the signature of the record in a blockchain instead of the health record itself and then notifies the patient, who is in charge of the record and determines its movement as required. The signature assures that the stored copy of the record is unchanged. In this model, the patient is in charge of the data; if the patient is not interested in taking care of the data, a service organization takes on this role. Instead of making different user interfaces for different institutions, the MedRec system uses simple features to work with multiple institutions and make interaction with patients easier.
Gem Health: The main purpose of this US-based blockchain company is to manage the revenue earned through data permission and sharing. Gem Health provides a real-time and transparent system for any kind of health claim. Prisco [80] reported that, in cooperation with the giant company Philips, this compensation-claiming platform can also transfer healthcare data in real time.
ModelChain: Kuo and Ohno-Machado [81] proposed the ModelChain architecture, in which cross-institutional researchers can collaborate on healthcare-related data. This study introduced a private blockchain network to manage the metadata of health-related information. It used both machine learning and blockchain technology to facilitate Patient-Centered Outcomes Research (PCOR) and inter-institution collaboration.
Guard time: This is an Estonia-based blockchain company and the world’s largest blockchain platform. They recently joined the Estonian eHealth [82] foundation to bring transparency and auditability to patients’ records. This platform used Oracle and a blockchain database to manage government-owned electronic patient information.
HealthCombix: The main purpose of this platform is to create a decentralized healthcare ecosystem by initiating peer-to-peer communication in real time.
HealthCombix [83] is a token-based, privacy-preserving patient data management platform. In addition, this platform also offers disease prediction based on big data analysis and transparent data asset monetization with associated risk management. A private blockchain is used to initiate a collaborative research environment.
IBM Blockchain: Along with other blockchain applications, IBM also offers several facilities in the healthcare domain. The main features of the IBM healthcare blockchain are automatic clinical trials and transparent health record sharing. The detailed advantages are discussed in IBM’s blog titled “Blockchain in healthcare” [84].
Universal Health Coin: This is a blockchain and AI-based cryptocurrency for exchanging data among stakeholders, proposed by Gordon [85]. The users of this system can communicate directly with each data owner or processor to buy and sell data with this coin, UHC. All these transactions and data are encrypted and secured by a public-private blockchain key pair.
Genomes: Hahnel [72] proposed another off-chain and on-chain platform for storing gene information. Genomes also provides a platform for securely sharing biological information among third parties. GENE tokens were introduced by this study as a medium of exchange.
Youbase: Josh [86] proposed a hierarchical deterministic (HD) based wallet. This wallet controls access to personal information and contains a tree-like structure of keys. The key advantage is that, because it has several branches (parent and child chains), Youbase can store data separately depending on the specific type of information. A data anonymization technique is applied here to comply with personal data protection regulations.
Peterson et al. [87] proposed a community-based network architecture for a health information exchange mechanism. This study proposed a system in which data on a particular node can only be accessed if the data structure and semantics are understood and approved by community members. Patients ultimately control the privacy and regulation of the shared data. However, direct storing of personal data is its main drawback.
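Several of the platforms above rely on a consensus algorithm; MedRec, for instance, is described as using Proof of Work (PoW) mining. The basic idea of PoW can be sketched in Python as below. This is a generic illustration and not MedRec’s actual mining code: the function names, the block data string, and the difficulty value are assumptions chosen to keep the example fast.

```python
import hashlib
from typing import Tuple


def pow_digest(block_data: str, nonce: int) -> str:
    """SHA-256 digest of the block data combined with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int = 3) -> Tuple[int, str]:
    """Search for the smallest nonce whose digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_digest(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def is_valid(block_data: str, nonce: int, difficulty: int = 3) -> bool:
    """Verification needs one hash, whereas mining needs many attempts."""
    return pow_digest(block_data, nonce).startswith("0" * difficulty)


# Hypothetical block content: a pointer to a record plus its signature hash.
block = "record-pointer:clinic-a/42;signature:ab12cd34"
nonce, digest = mine(block, difficulty=3)
print(nonce, digest, is_valid(block, nonce, difficulty=3))
```

Each additional leading zero multiplies the expected number of attempts by 16, which illustrates why the computation power and electricity consumption discussed under scalability grow quickly with the difficulty.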
An overall picture of a future architecture for big data handling and medical decision making is given in Fig. 8.17. It presents a blockchain architecture based on both off-chain and on-chain storage.
FIG. 8.17 A GDPR compliant blockchain collaboration in big data healthcare.
8.6 CONCLUSION AND DISCUSSION In this chapter, two pressing challenges of healthcare big data were discussed. Firstly, the chapter discussed the existing privacy regulations and, secondly, it discussed how blockchain addresses those privacy regulations with the different solutions available. This chapter provided a detailed elaboration of blockchain opportunities, challenges, and healthcare solutions. While discussing the available regulations, we identified the major design principles and prerequisites needed for blockchain system development. From our study, we noticed that regulations such as GDPR and HIPAA do not support on-chain storage of personal data. However, off-chain, encrypted private storage is a popular alternative blockchain architecture for addressing the existing issues. Overall, blockchain has a wide range of potential uses in medical data, which invites numerous research opportunities in this sector. This chapter highlights the impact of blockchain on the privacy of healthcare big data and proposes a solution to decrease that impact in the long term. With the help of this chapter, the technical, strategic, economic, and regulatory aspects of blockchain and healthcare big data privacy can be understood and implemented. Our future goal is to develop a fully fledged healthcare data sharing and identity sharing platform.
REFERENCES [1] D. Munro, Data Breaches in Healthcare Totaled Over 112 Million Records in 2015, vol. 31, Forbes, New York, NY, 2015. [2] S. Banerjee, T. Hemphill, P. Longstreet, Wearable devices and healthcare: data sharing and privacy, Inf. Soc. 34 (1) (2018) 49–57. [3] A.E. Hassanien, N. Dey, S. Borra (Eds.), Medical Big Data and Internet of Medical Things: Advances, Challenges and Applications, Taylor & Francis, 2019. [4] M. Ahmed, A.S.B. Ullah, Infrequent pattern mining in smart healthcare environment using data summarization. J. Supercomput. 74 (2018) 5041, https://doi.org/10.1007/s11227-018-2376-8. [5] N. Dey, A.E. Hassanien, C. Bhatt, A.S. Ashour, S.C. Satapathy (Eds.), Internet of Things and Big Data Analytics Toward Next-Generation Intelligence, Springer International Publishing, 2018. [6] M.S. Kamal, S. Parvin, A.S. Ashour, F. Shi, N. Dey, De-Bruijn graph with MapReduce framework towards metagenomic data classification, Int. J. Inf. Technol. 9 (1) (2017) 59–75. [7] B.S.P. Mishra, H. Das, S. Dehuri, A.K. Jagadev (Eds.), Cloud Computing for Optimization: Foundations, Applications, and Challenges, In: vol. 39, Springer International Publishing, 2018. [8] C. Holotescu, Understanding blockchain opportunities and challenges, in: The International Scientific Conference eLearning and Software for Education, vol. 4, Carol I National Defence University, 2018, pp. 275–283. [9] M.A. Sahi, H. Abbas, K. Saleem, X. Yang, A. Derhab, M.A. Orgun, … A. Yaseen, Privacy preservation in e-healthcare environments: state of the art and future directions, IEEE Access 6 (2018) 464–478. [10] P.K. Pattnaik, S.S. Rautaray, H. Das, J. Nayak (Eds.), Progress in computing, analytics and networking, Proceedings of ICCAN 2017, In: vol. 710, Springer, 2018. [11] J.D. Halamka, A. Lippman, A. Ekblaw, The potential for blockchain to transform electronic health records, Harv. Bus. Rev. 3 (2017). 
Retrieved from: https://hbr.org/2017/03/the-potential-for-blockchain-totransform-electronic-health-records. [12] X. Yue, H. Wang, D. Jin, M. Li, Healthcare data gateways: found healthcare intelligence on blockchain with novel privacy risk control, J. Med. Syst. 40 (10) (2016) 218.
[13] D. Laney, 3D data management: controlling data volume, velocity and variety, META Group Res. Note 6 (70) (2001) 1. [14] C. Lee, Z. Luo, K.Y. Ngiam, M. Zhang, K. Zheng, G. Chen, W.L.J. Yip, Big healthcare data analytics: challenges and applications, in: Handbook of Large-Scale Distributed Computing in Smart Healthcare, Springer, Cham, 2017, pp. 11–41. [15] S. Aich, M. Sain, J. Park, K.W. Choi, H.C. Kim, A text mining approach to identify the relationship between gait-Parkinson’s disease (PD) from PD based research articles, in: International Conference on Inventive Computing and Informatics (ICICI), IEEE, 2017, pp. 481–485. [16] S. Aich, P.M. Pradhan, J. Park, H.C. Kim, A machine learning approach to distinguish Parkinson’s disease (PD) patient’s with shuffling gait from older adults based on gait signals using 3D motion analysis, Int. J. Eng. Technol. 7 (3.29) (2018) 153–156. [17] H. Das, A.K. Jena, J. Nayak, B. Naik, H.S. Behera, A novel PSO based back propagation learning-MLP (PSO-BP-MLP) for classification, in: Computational Intelligence in Data Mining, vol. 2, Springer, New Delhi, 2015, pp. 461–471. [18] H. Das, B. Naik, H.S. Behera, Classification of diabetes mellitus disease (DMD): a data mining (DM) approach, in: Progress in Computing, Analytics and Networking, Springer, Singapore, 2018, pp. 539–549. [19] R. Sahani, C. Rout, J.C. Badajena, A.K. Jena, H. Das, Classification of intrusion detection using data mining techniques, in: Progress in Computing, Analytics and Networking, Springer, Singapore, 2018, pp. 753–764. [20] S. Aich, H.C. Kim, Auto detection of Parkinson’s disease based on objective measurement of gait parameters using wearable sensors, Artif. Intell. 117 (2018) 103–112. [21] Stanford Medicine, Retrieved from: https://med.stanford.edu/content/dam/sm/sm-news/documents/StanfordMedicineHealthTrendsWhitePaper2017.pdf, 2017.
[22] Healthcare Analytics, Medical Analytics Market by Type (predictive, prescriptive) Application (Clinical, RCM, Claim, Fraud, Waste, Supply Chain, PHM) Component (Service, Software) Delivery (On demand, Cloud) End User (Payer, Hospital, ACO)—Global Forecast to 2022, Retrieved from: https://www.marketsandmarkets.com/Market-Reports/healthcare-data-analytics-market-905.html, 2017. [23] C. Pradhan, H. Das, B. Naik, N. Dey, Handbook of Research on Information Security in Biomedical Signal Processing, IGI Global, Hershey, PA, 2018, pp. 1–414, https://doi.org/10.4018/978-1-5225-5152-2. [24] K.H.K. Reddy, H. Das, D.S. Roy, A data aware scheme for scheduling big-data applications with SAVANNA hadoop, in: Futures of Network, CRC Press, 2017. [25] C.R. Panigrahi, M. Tiwary, B. Pati, H. Das, Big data and cyber foraging: future scope and challenges, in: Techniques and Environments for Big Data Analysis, Springer, Cham, 2016, pp. 75–100. [26] Digital Guardian, Top 10 Biggest Healthcare Data Breaches of All Time, Retrieved from: https://digitalguardian.com/blog/top-10-biggest-healthcare-data-breaches-all-time, 2018. [27] E. Snell, 41% of Health Data Breaches Stem from Unintended Disclosure, Retrieved from: https://healthitsecurity.com/news/41-of-health-data-breaches-stem-from-unintended-disclosure, 2017. [28] M. Swan, Blockchain: Blueprint for a New Economy, O’Reilly Media, Inc, 2015. [29] R. Hackett, Wait, What Is Blockchain?, Retrieved from: http://fortune.com/2016/05/23/blockchaindefinition/, 2016. Accessed 15 December 2017. [30] Z. Zheng, S. Xie, H. Dai, X. Chen, H. Wang, An overview of blockchain technology: architecture, consensus, and future trends, in: 2017 IEEE International Congress on Big Data (BigData Congress), IEEE, 2017, pp. 557–564. [31] Report Buyer, Blockchain Market by Provider, Application, Organization Size, Industry Vertical And Region—Global Forecast to 2022, Available from: https://www.reportbuyer.com/product/4226790, 2017. Accessed 24 January 2018. [32] S.
Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, (2008). [33] M. Crosby, P. Pattanayak, S. Verma, V. Kalyanaraman, Blockchain technology: beyond bitcoin, Appl. Innov. Rev. 2 (2016) 6–10.
[34] M.H. Miraz, M. Ali, Applications of Blockchain Technology Beyond Cryptocurrency, 2018. arXiv preprint arXiv:1801.03528. [35] M.M.H. Onik, M.H. Miraz, C.S. Kim, A recruitment and human resource management technique using blockchain technology for industry 4.0, in: Smart Cities Symposium 2018 (SCS’18), 22–23rd April, University of Bahrain, IET, Bahrain, 2018, pp. 11–16. [36] M.M.H. Onik, N. Al-Zaben, H.P. Hoo, C.S. Kim, A Novel Approach for Network Attack Classification Based on Sequential Questions, 2018. arXiv preprint arXiv:1804.00263. [37] M.M.H. Onik, N. Al-Zaben, J. Yang, C.S. Kim, Privacy of Things (PoT): personally identifiable information monitoring system for smart homes, in: Proceedings of the Korean Institute of Communication Sciences Conference, KICS, Jeju, Korea, 2018, pp. 256–257. [38] M.M.H. Onik, N. Al-Zaben, J. Yang, N.Y. Lee, C.S. Kim, Risk identification of personally identifiable information from collective mobile app data, in: Proceeding of International Conference on Computing, Electronics & Communications Engineering 2018 (iCCECE ’18), University of Essex, Southend, 2018, pp. 71–76. [39] N. Al-Zaben, M.M.H. Onik, J. Yang, N.Y. Lee, C.S. Kim, General data protection regulation complied blockchain architecture for personally identifiable information management, in: Proceeding of International Conference on Computing, Electronics & Communications Engineering 2018 (iCCECE ’18), University of Essex, Southend, 2018, pp. 77–82. [40] PwC, Global FinTech Report 2017, Available from: https://www.pwc.com/gx/en/industries/financialservices/assets/pwc-global-fintech-report-2017.pdf, 2017. Accessed 24 January 2018. [41] C. Cachin, Architecture of the Hyperledger blockchain fabric, in: Workshop on Distributed Cryptocurrencies and Consensus Ledgers, July, 2016. [42] NISTIR, Blockchain Technology Overview, National Institute of Standards and Technology, 2018.
Available from: https://csrc.nist.gov/CSRC/media/Publications/nistir/8202/draft/documents/nistir8202-draft.pdf. Accessed 27 January 2018. [43] D. Johnson, A. Menezes, S. Vanstone, The elliptic curve digital signature algorithm (ECDSA), Int. J. Inf. Secur. 1 (1) (2001) 36–63. [44] A. Castor, A (Short) Guide to Blockchain Consensus Protocols, CoinDesk, 2017. [45] L.A. Linn, M.B. Koo, Blockchain for health data and its potential use in health it and health care related research, in: ONC/NIST Use of Blockchain for Healthcare and Research Workshop, ONC/NIST, Gaithersburg, MD, 2016. [46] A. Pfitzmann, M. Hansen, A Terminology for Talking about Privacy by Data Minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, 2010. [47] E. McCallister, T. Grance, K.A. Scarfone, Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), 2010. No. Special Publication (NIST SP)-800-122. [48] M. Kayaalp, Patient privacy in the era of big data, Balkan Med. J. 35 (1) (2018) 8–17. [49] S. Berinato, There’s No Such Thing as Anonymous Data, Retrieved from: https://hbr.org/2015/02/theres-nosuch-thing-as-anonymous-data, 2015. Accessed 10 June 2018. [50] A. Abdalaal, M.E. Nergiz, Y. Saygin, Privacy-preserving publishing of opinion polls, Comput. Secur. 37 (2013) 143–154. [51] L. Sweeney, k-anonymity: a model for protecting privacy, Int. J. Uncertainty Fuzziness Knowledge Based Syst. 10 (05) (2002) 557–570. [52] C.C. Aggarwal, On k-anonymity and the curse of dimensionality, in: Proceedings of the 31st International Conference on Very Large Data Bases, VLDB Endowment, 2005, pp. 901–909. [53] J. Ali, Mechanism for the prevention of password reuse through Anonymized Hashes, PeerJ Prepr. 5 (2017) e3322v1, https://doi.org/10.7287/peerj.preprints.3322v1. [54] T.S. Gal, T.C. Tucker, A. Gangopadhyay, Z. Chen, A data recipient centered de-identification method to retain statistical attributes, J. Biomed. Inform.
50 (2014) 32–45.
[55] G. Ghinita, Y. Tao, P. Kalnis, On the anonymization of sparse high-dimensional data, in: ICDE 2008. IEEE 24th International Conference on Data Engineering, April, 2008, IEEE, 2008, pp. 715–724. [56] H.K. Patil, R. Seshadri, Big data security and privacy issues in healthcare, in: 2014 IEEE International Congress on Big Data (BigData Congress), IEEE, 2014, pp. 762–765. [57] C.S. Kruse, B. Smith, H. Vanderlinden, A. Nealand, Security techniques for the electronic health records, J. Med. Syst. 41 (8) (2017) 127. [58] K. Abouelmehdi, A. Beni-Hessane, H. Khaloufi, Big healthcare data: preserving security and privacy, J. Big Data 5 (1) (2018) 1. [59] J. Mattila, The Blockchain Phenomenon, 2016 ed., Berkeley Roundtable of the International Economy, 2016. [60] B.W. Hesse, D.E. Nelson, G.L. Kreps, R.T. Croyle, N.K. Arora, B.K. Rimer, K. Viswanath, Trust and sources of health information: the impact of the Internet and its implications for health care providers: findings from the first Health Information National Trends Survey, Arch. Intern. Med. 165 (22) (2005) 2618–2624. [61] M. Jirotka, R. Procter, M. Hartswood, R. Slack, A. Simpson, C. Coopmans, … A. Voss, Collaboration and trust in healthcare innovation: the eDiaMoND case study, Comput. Support Coop. Work 14 (4) (2005) 369–398. [62] D.E. O’Leary, Artificial intelligence and big data, IEEE Intell. Syst. 28 (2) (2013) 96–99. [63] H.M. Kim, M. Laskowski, Toward an ontology-driven blockchain design for supply-chain provenance, Intell. Syst. Account. Financ. Manag. 25 (1) (2018) 18–27. [64] F. Tian, An agri-food supply chain traceability system for China based on RFID & blockchain technology, in: 2016 13th International Conference on Service Systems and Service Management (ICSSSM), IEEE, 2016, pp. 1–6. [65] R. Aitken, IBM & Walmart Launching Blockchain Food Safety Alliance in China with Fortune 500’s JD. com, 2017. [66] Z. Shae, J.J. 
Tsai, On the design of a blockchain platform for clinical trial and precision medicine, in: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), IEEE, 2017, pp. 1972–1980. [67] S.R. Chhetri, S. Faezi, N. Rashid, M.A. Al Faruque, Manufacturing supply chain and product lifecycle security in the era of industry 4.0, J. Hardw. Syst. Secur. 2 (1) (2018) 51–68. [68] T. Nugent, D. Upton, M. Cimpoesu, Improving data transparency in clinical trials using blockchain smart contracts, F1000Res. 5 (2016) 2541, https://doi.org/10.12688/f1000research.9756.1. [69] A. Matanovic, Blockchain/cryptocurrencies and cybersecurity, threats and opportunities, in: The 9th International Conference on Business Information Security, 2017, pp. 11–15. [70] Maddux, D. Vice President, Kidney Disease Initiatives, Cybersecurity and Blockchain in Health Care (Acumen Physical Solutions, 24 April 2017) HibBucket, Artificial Intelligence Blockchain Research and Development (HubBucket Healthcare Blockchain-HubBlockchain), 2017. [71] E. Snell, Healthcare Data Breach Costs Highest for 7th Straight Year, Retrieved from: https://healthitsecurity.com/news/healthcare-data-breach-costs-highest-for-7th-straight-year, 2017. Accessed 13 June 2018. [72] M. Hahnel, Blockchain Enabled Genome Security From the Moment It Is Sequenced, Retrieved from: https://www.genomes.io/wp-content/uploads/2018/03/The-genomes.io-Whitepaper-V-1.1.4.pdf, 2018. Accessed 14 June 2018. [73] Guardian (WHO), 10% of Drugs in Poor Countries are Fake, Says WHO, Retrieved from: https://www.theguardian.com/global-development/2017/nov/28/10-of-drugs-in-poor-countries-are-fake-says-who, 2017. Accessed 13 June 2018. [74] C. Broderson, B. Kalis, C. Leong, E. Mitchell, E. Pupo, A. Truscott, Blockchain: Securing a New Health Interoperability Experience, Retrieved from: https://www.healthit.gov/sites/default/files/2-49-accenture_onc_blockchain_challenge_response_august8_final.pdf, 2016. [75] M.M.H. Onik, N. Al-Zaben, H.
Phan Hoo, C.S. Kim, MUXER—a new equipment for energy saving in ethernet, Technologies 5 (4) (2017) 74.
[76] A.V. Humbeeck, The Blockchain-GDPR Paradox—Wearetheledger—Medium, Retrieved from: https://medium.com/wearetheledger/the-blockchain-gdpr-paradox-fc51e663d047, 2017. Accessed 14 June 2018. [77] B. Smith, DokChain, Retrieved from: https://pokitdok.com/dokchain/, 2016. Accessed 14 June 2018. [78] Burst IQ, Retrieved from: https://www.burstiq.com/, 2015. Accessed 14 June 2018. [79] A. Azaria, A. Ekblaw, T. Vieira, A. Lippman, Medrec: using blockchain for medical data access and permission management, in: International Conference on Open and Big Data (OBD), IEEE, 2016, pp. 25–30. [80] G. Prisco, The Blockchain for Healthcare: Gem Launches Gem Health Network With Philips Blockchain Lab, Retrieved from: https://bitcoinmagazine.com/articles/the-blockchain-for-heathcare-gem-launchesgem-health-network-with-philips-blockchain-lab-1461674938/, 2016. [81] T.T. Kuo, L. Ohno-Machado, ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks, 2018. arXiv preprint arXiv:1802.01746. [82] Estonian eHealth, Health Care Blockchain, Retrieved from: https://guardtime.com/blog/estonian-ehealthpartners-guardtime-blockchain-based-transparency, 2016. Accessed 14 June 2018. [83] Healthcombix, Retrieved from: https://healthcombix.com/, 2016. Accessed 14 June 2018. [84] Blockchain in healthcare, Patient Benefits and More—Blockchain Unleashed: IBM Blockchain Blog, Retrieved from: https://www.ibm.com/blogs/blockchain/2017/10/blockchain-in-healthcare-patient-benefitsand-more/, 2018. Accessed 14 June 2018. [85] G. Jones, Universal Health Coin (UHC), Retrieved from: https://www.universalhealthcoin.com/, 2017. Accessed 14 June 2018. [86] J. Robinson, M. Leonard Kish, YouBase Whitepaper, Retrieved from: https://legacy.gitbook.com/book/joshrobinson/youbase/details, 2016. Accessed 14 June 2018. [87] K. Peterson, R. Deeduvanu, P. Kanjamala, K. Boles, A blockchain-based approach to health information exchange networks, in: Proc.
NIST Workshop Blockchain Healthcare, vol. 1, 2016, pp. 1–10.
FURTHER READING G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, A. Zhu, Approximation algorithms for k-anonymity, J. Priv. Technol. 62 (2005) 797–812. N. Dey, A.S. Ashour, C. Bhatt, Internet of things driven connected healthcare, in: Internet of Things and Big Data Technologies for Next Generation Healthcare, Springer, Cham, 2017, pp. 3–12. GDPR-Regulation (EU), GDPR-Regulation (EU) 2016/679 (General Data Protection Regulation), Retrieved from: https://gdpr-info.eu/art-4-gdpr, 2016. Accessed 14 June 2018. M. Nofer, P. Gomber, O. Hinz, D. Schiereck, Blockchain, Bus. Inform. Syst. Eng. 59 (3) (2017) 183–187.