Artificial Intelligence and Machine Learning in Cardiovascular Healthcare

Journal Pre-proof

PII: S0003-4975(19)31612-1
DOI: https://doi.org/10.1016/j.athoracsur.2019.09.042
Reference: ATS 33183
To appear in: The Annals of Thoracic Surgery
Received: 29 August 2019; Accepted: 8 September 2019

Please cite this article as: Kilic A, Artificial Intelligence and Machine Learning in Cardiovascular Healthcare, The Annals of Thoracic Surgery (2019), doi: https://doi.org/10.1016/j.athoracsur.2019.09.042. © 2019 by The Society of Thoracic Surgeons

Artificial Intelligence and Machine Learning in Cardiovascular Healthcare Running Title: Machine Learning in Cardiovascular Care

Arman Kilic, M.D.1

From the 1 Division of Cardiac Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA

Key Words: artificial intelligence; machine learning; cardiovascular

Word Count: 5,631

Correspondence and Reprint Requests: Arman Kilic, MD Division of Cardiac Surgery University of Pittsburgh Medical Center 200 Lothrop Street Suite C-700 Pittsburgh, PA 15213 Email: [email protected]

Abstract

Background: This review article provides an overview of artificial intelligence (AI) and machine learning (ML) as they relate to cardiovascular healthcare.

Methods: An overview of the terminology and algorithms used in ML as it relates to healthcare is provided by the author. Articles published up to August 1, 2019 in the field of AI and ML in cardiovascular medicine are also reviewed and placed in the context of the potential role these approaches will have in clinical practice in the future.

Results: AI is a broader term referring to the ability of machines to perform intelligent tasks, and ML is a subset of AI that refers to the ability of machines to learn independently and make accurate predictions. An expanding body of literature has been published using ML in cardiovascular healthcare. ML has been applied in the settings of automated imaging interpretation, natural language processing and data extraction from electronic health records, and predictive analytics. Examples include automated interpretation of chest x-rays, electrocardiograms, echocardiograms, and angiography; identification of patients with early heart failure from clinical notes evaluated by ML; and prediction of mortality or complications following percutaneous or surgical cardiovascular procedures.

Conclusions: Although there is an expanding body of literature on AI and ML in cardiovascular medicine, the role these fields will play in clinical practice remains to be defined. In particular, there is a promising role in automated imaging interpretation, automated data extraction and quality control, and clinical risk prediction, although these techniques require further refinement and evaluation.

Abstract Word Count: 250

Artificial intelligence (AI) is a broader term that refers to technologies or systems being able to demonstrate human-like intelligence. Machine learning (ML) is a subset of AI that refers to the ability of machines to learn from data, improve at tasks with experience, and make predictions. AI and ML have been applied to a variety of industries. In healthcare, there is an expanding body of literature on ML based algorithms and their potential clinical utility. In this review, an introduction and overview of ML techniques will be provided. In addition, the author will review prior literature on ML in cardiovascular medicine and surgery, and highlight potential future applications of ML in automated imaging interpretation, electronic medical record extraction and quality control, and predictive analytics as it applies to cardiovascular healthcare.

Methods

A search was conducted on PubMed and results were reviewed for articles published up until August 1, 2019. Search terms included “artificial intelligence”, “machine learning”, “deep learning”, “cardiovascular”, “cardiac”, “echocardiography”, “angiography”, “risk modeling”, “cardiology”, and “cardiac surgery”, or any combination thereof. Only studies published in English were reviewed. Otherwise, there were no additional exclusion criteria, and all article types, including original full-length articles, meta-analyses, prior reviews, and case series, were included.

Results

Overview of Select ML Algorithms

There are 2 major types of ML algorithms (Table 1). Supervised learning refers to cases where there are inputs and known outputs; the goal of the algorithm is to most accurately map the inputs to the outputs. Regression is the type of supervised learning that involves continuous outputs, whereas classification involves non-continuous or categorical outputs. In unsupervised learning, there are no labeled outputs, and the goal of those algorithms is to learn about the inherent structure of the data itself.

An example of an unsupervised ML algorithm is K-means clustering (Figure 1). In this approach, the user defines a target number k, which refers to the number of centroids, or centers of the clusters, in the database. Each data point is assigned to a cluster in such a manner as to optimally reduce the in-cluster sum of squares. The algorithm starts by randomly selecting centroids and then performs iterative calculations to optimally position them. Once the position of the centroids is relatively stable with subsequent iterations, or the predefined number of iterations has been reached, the model is complete.

Another useful unsupervised ML technique is dimensionality reduction, which aims to reduce the number of features or dimensions in a dataset. For example, if 80 variables are collected within a database, dimensionality reduction may aim to shrink the number of features to a few that accurately represent the samples. This approach is advantageous in that it addresses the problem of overfitting. Overfitting exists when a model is formed from many features and is overly complex, thereby becoming increasingly dependent on the data upon which it was trained; this results in poor performance when the model is evaluated in external datasets. The simpler the model and the fewer the assumptions made, the more likely overfitting will be minimized. Dimensionality reduction also reduces computing time, requires less computing power or storage space, and removes redundant or “noisy” features.

There are many examples of supervised learning ML algorithms. A popular algorithm is decision tree analysis (Figure 2), in which there is a condition, or internal node, that is split into branches. For example, the first condition may ask whether the patient is over or under the age of 70 years, upon which separate branches will arise depending on the answer. The end of a branch that no longer has any splits is known as the decision, or leaf, of the decision tree. The decision represents the outcome that is being studied (e.g., mortality). Important aspects of

decision tree analysis include knowing what conditions to utilize for splitting and knowing when to stop growing the tree. There are several methods to execute these tasks, including recursive binary splitting, where all features are evaluated and different splits of the data are iteratively attempted using a cost function. The cost function allows the algorithm to identify how much accuracy each split will cost the model, and thereby to choose the splits that cost the least. The user can also predefine the maximum depth of the model, which sets the greatest distance from the root to a leaf in the decision tree. Model performance can be further optimized in a process called pruning, in which features or branches of the tree with low importance are removed. The major advantage of decision trees is their ease of visual interpretation, although there exists the potential for overfitting, particularly with more complex trees.

Neural networks are another form of supervised ML (Figure 3). Neural networks consist of inputs and outputs with a series of hidden layers in between. The hidden layers contain a varying number of neurons, with connections between the inputs and the first hidden layer, the first and second hidden layers, and so forth until the final set of connections between the final hidden layer and the outputs. Each neuron represents its own individual model, with different sets of incoming features and different weights depending on importance. In this manner, models are plugged into other models, and the summed predictive capability is greater than that of the individual components.

A more recently developed algorithm that our group has been interested in utilizing is extreme gradient boosting, or XGBoost. This is an ensemble learning method, meaning that it aggregates the predictive power of multiple algorithms. The model iteratively builds a strong model using a collection of weaker models that typically represent short decision trees.
With each iteration, the algorithm takes the difference between the strong model's predictions and the ground truth and trains a new weak model to predict these residual values. The weak models are then added to the strong model, and the contribution of each weak model to the overall strong model is weighted based on the number of outcomes it mislabeled.
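The residual-fitting loop described above can be sketched in a few lines of code. The following toy example boosts one-feature decision stumps against a continuous outcome; it is illustrative only and omits the regularization, second-order gradients, and other refinements that distinguish the actual XGBoost implementation.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-feature decision stump (a one-split tree) to the residuals
    by choosing the threshold that minimizes the squared error."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue  # a split must leave data on both sides
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((residual - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1], best[2], best[3]  # threshold, left value, right value

def boost(x, y, n_rounds=50, lr=0.1):
    """Iteratively build a strong model from weak stumps: at each round,
    fit a stump to the difference between the current predictions and the
    ground truth, then add its (learning-rate-weighted) contribution."""
    strong = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - strong                       # what the strong model still gets wrong
        t, lo, hi = fit_stump(x, residual)          # weak model trained on the residuals
        strong += lr * np.where(x <= t, lo, hi)     # weak model added to the strong model
        stumps.append((t, lo, hi))
    return strong, stumps
```

Fitting this toy booster to a simple step-shaped outcome shows the residuals shrinking with every round, which is the essence of the gradient boosting idea.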

Evaluation of the Performance of ML Models

In general, risk models can be developed and evaluated in a similar manner regardless of the specific approach. Popular methods include randomly splitting a dataset in a 2:1, 3:1, or 4:1 fashion into training and validation cohorts (Figure 4). If an outcome occurs rarely, it can be advantageous to split the dataset in such a way that the percentage of patients experiencing that outcome is the same between the validation and training sets. This avoids a situation whereby the random split overly skews the outcome toward the training set and makes it challenging to evaluate the model in a validation set in which very few events may have occurred. In cases where the sample size is limited, k-fold cross-validation is a useful technique (Figure 5). In this method, the dataset is randomly split into k folds. A single fold is held out as a testing set and the remaining folds are used as the training set. The process repeats k times, and the average performance of the model across the unique testing sets is then calculated.

Once the initial ML algorithms are developed, they can be further optimized. This process is known as hyperparameter optimization, or tuning. Hyperparameters are factors that control the learning process of the algorithm. These factors can be altered manually by the user, or default settings can be used or adjusted by the software program. Examples of hyperparameters include the regularization and kernel parameters of a support vector machine, the learning rate and number of hidden layers of an artificial neural network, and the number of leaves or maximum depth of a decision tree.

Some metrics used to evaluate the performance of ML algorithms are similar to those used in evaluating risk models built with traditional methods such as logistic regression. These include measures of the discriminatory power and calibration of models. Discriminatory power refers to the ability of the model to distinguish those who have a particular outcome from those who do not. This is typically evaluated with the area under the receiver-operating-characteristic curve, or c-index. A c-index of 0.50 indicates no discriminatory power (equivalent to flipping a coin), whereas 1.0 indicates perfect discrimination. Risk models with a c-index over 0.70 demonstrate reasonable discriminatory power, and those over 0.80 are considered to demonstrate strong discriminatory power.

Calibration refers to the ability of a model to accurately assign an average risk of an outcome to a population; in a well-calibrated model, the observed outcomes are similar to the expected outcomes. Calibration is evaluated using the Hosmer-Lemeshow test, where a p-value greater than 0.05 suggests that there are no significant differences between the observed and expected outcomes and that the model therefore has good overall calibration. This can be visually depicted by plotting the observed versus expected outcomes on a graph, with perfect correlation corresponding to a 45-degree line. Calibration is also evaluated using the slope and calibration-in-the-large metrics from the calibration plot; perfect calibration is represented by a slope of 1 and a calibration-in-the-large (or y-intercept) of 0.

Other metrics used to evaluate ML models include precision, recall, and accuracy. Precision is the ratio of true positives to the sum of true positives and false positives; it answers the question: when the model predicts a positive event, what percentage of the time is it correct? Recall, the ratio of true positives to the sum of true positives and false negatives, is the same as the sensitivity of a model; it answers the question: what percentage of true positives were identified correctly by the model? Precision-recall graphs can be plotted with precision on the y-axis and recall on the x-axis. An F1 score can be calculated, which equals 2 multiplied by precision multiplied by recall, divided by the sum of precision and recall. The optimal F1 score is achieved at the break-even point, where the precision-recall curve intersects a 45-degree line. Accuracy equals the sum of true positives and true negatives divided by the sum of true positives, true negatives, false positives, and false negatives; it therefore reflects how well the model identified both true positives and true negatives.
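These definitions map directly onto the four cells of a confusion matrix. A minimal sketch (the function name is illustrative):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall (sensitivity), F1 score, and accuracy computed
    from confusion-matrix counts, following the definitions above."""
    precision = tp / (tp + fp)   # of predicted positives, fraction truly positive
    recall = tp / (tp + fn)      # of actual positives, fraction correctly identified
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy
```

For example, a model with 80 true positives, 20 false positives, 90 true negatives, and 10 false negatives has a precision of 0.80 and an accuracy of 0.85.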

Automated Imaging Interpretation

One potential application of ML in cardiovascular healthcare is automated imaging interpretation (Table 2). For example, chest radiographs are routinely obtained in patients with cardiovascular disease. In a study utilizing convolutional neural networks, ML algorithms were able to diagnose 14 different pathologies on chest x-rays at a performance level similar to that of practicing radiologists, but with a much shorter average interpretation time (1.5 versus 240 minutes).[1] Another similar application of ML is with electrocardiograms. A deep neural network was used to classify 12 rhythm classes from 91,232 electrocardiograms from 53,549 patients.[2] The ML algorithm had a c-index of 0.97 with an F1 score of 0.837, which was superior to that of cardiologists reading the same electrocardiograms (0.780). Another analysis utilized the GoogLeNet deep neural network architecture to classify 5 types of rhythms and demonstrated an accuracy of 96%.[3] A study combining convolutional neural networks and long short-term memory algorithms showed high classification performance in electrocardiogram interpretation, with an accuracy of 98%, sensitivity of 98%, and specificity of 99%.[4]

ML has been applied to coronary computed tomographic angiography as well. A multicenter study of 351 patients demonstrated that an ML-based deep learning algorithm was able to outperform visual interpretation in estimating fractional flow reserve.[5] Another analysis of 1,052 patients undergoing coronary computed tomographic angiography showed that a deep learning model achieved a c-index of 0.78 for the detection of an abnormal fractional flow reserve, compared with 0.56 for visual interpretation.[6]

Other imaging modalities that involve video-based data present unique challenges to ML. For instance, developing automated ML-based interpretation of invasive coronary angiography is challenged by non-rigid deformations, vessel segment identification, and the asynchronous nature of multiple-view videos. A prior study that aimed to automate quantification of stenosis from invasive coronary angiograms was based on still images.[7] Although others have developed algorithms to perform tasks such as tracking correspondence across angiographic frames, the vast majority of studies in invasive angiography have operated on still images.[8-10] Similar challenges arise in ML-based interpretation of echocardiography. Pipeline supervised models were used to achieve an accuracy of 94% in 15-view still-image echocardiograms.[11] A recent study used convolutional neural networks to train and validate ML algorithms using 14,035 echocardiograms.[12] The automated measurements were comparable or superior to manual measurements and the c-indices in diagnosing hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary artery hypertension were 0.93, 0.87, and 0.85, respectively.
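The convolutional networks used in the imaging studies above are all built from one core operation: a small filter slid across the image, followed by a nonlinearity. A bare-bones numpy sketch of a single convolution-plus-ReLU step is shown below; real networks stack many such layers and, critically, learn the filter weights from labeled images rather than using hand-set ones.

```python
import numpy as np

def conv2d_relu(image, kernel):
    """Valid 2-D convolution (as used in deep learning, i.e. cross-correlation)
    of a single-channel image with one filter, followed by a ReLU activation."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # response of the filter at this position of the sliding window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU: keep only positive responses
```

With a hand-set vertical-edge filter such as [[-1, 1], [-1, 1]], the output activates exactly where a dark-to-bright vertical transition occurs in the image, which is the kind of low-level feature the early layers of these networks learn on their own.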

Natural Language Processing from the Electronic Health Record

Natural language processing is a subset of AI that refers to the ability of computers to read and understand natural human language. In healthcare, this offers the opportunity to comb through millions of documents and automate the extraction and interpretation of data. One study used natural language processing to accurately identify and label affirmations and denials of Framingham heart failure diagnostic criteria in primary care clinical notes in the electronic medical record, with a precision of 0.925, recall of 0.896, and F1 score of 0.910.[13] The authors noted that this approach could be useful in the automated early detection of heart failure. Another analysis similarly showed that 85% of patients who were eventually diagnosed with heart failure had met at least 1 criterion in the year preceding their formal diagnosis.[14] The risk of stroke and major bleeding in patients with atrial fibrillation has also been predicted using clinical notes and structured data.[15]

Natural language processing can also be used for quality control. One study found substantial discordance between patient-reported symptoms and clinical documentation, with Kappa statistics of 0.52 for chest pain, 0.46 for dyspnea, and 0.38 for cough.[16]
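To make the affirmation/denial task concrete, the toy sketch below labels mentions of a few symptoms in free text using a crude fixed-window negation check. The criteria list and negation rule here are illustrative stand-ins; the cited study used the full Framingham criteria and a far more sophisticated NLP pipeline.

```python
import re

# Toy subset of criteria for illustration only (not the full Framingham set).
CRITERIA = ["orthopnea", "ankle edema", "dyspnea on exertion"]
NEGATION = re.compile(r"\b(no|denies|without)\b", re.IGNORECASE)

def label_criteria(note):
    """Label each criterion mentioned in a clinical note as affirmed or
    denied, by scanning a short window of text before the mention for a
    negation cue. Real clinical NLP handles far more linguistic variation."""
    labels = {}
    for criterion in CRITERIA:
        for match in re.finditer(re.escape(criterion), note, re.IGNORECASE):
            window = note[max(0, match.start() - 20):match.start()]
            labels[criterion] = "denied" if NEGATION.search(window) else "affirmed"
    return labels
```

For instance, the note "Reports orthopnea but denies ankle edema." yields orthopnea affirmed and ankle edema denied, which is the kind of structured label the study extracted at scale from primary care notes.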

Predictive Analytics

Predictive analytics, a cornerstone of care in both cardiovascular medicine and surgery, is simply the forecasting of future outcomes based on current data. In cardiac surgery, the Society of Thoracic Surgeons (STS) risk models for operative mortality, major morbidity, and length of stay are well-established forms of predictive analytics that use logistic regression. They have profound implications in the field, including surgeon and hospital evaluation, quality improvement, facilitation of discussions with patients and their families, definition of inclusion criteria for clinical trials, and optimal therapy selection. Similarly, the American College of Cardiology Foundation's National Cardiovascular Data Registry (NCDR) uses statistical models for similar purposes in percutaneous coronary intervention (PCI).

ML algorithms have been used to predict outcomes in cardiovascular healthcare, and in some instances their performance has been compared with the societal risk models in current use. In a study of the NCDR database, an XGBoost ML algorithm predicted major bleeding after PCI with a c-index of 0.82, improved from the 0.78 of the model currently used by the NCDR.[17] Another study utilizing the NCDR database found that ML algorithms were better able to predict acute kidney injury after PCI.[18] In cardiac surgery, one study demonstrated that an ML algorithm was able to outperform both the EuroSCORE II and a separate logistic regression model in predicting in-hospital mortality after elective open-heart operations, with a c-index of 0.795.[19] A study of 2,010 patients undergoing cardiac surgery evaluated several models to predict the risk of postoperative acute kidney injury and found that the best performance was achieved by a gradient boosting machine (c-index 0.78, compared with 0.69 for logistic regression).[20] Our group has demonstrated, in a single-center analysis of 11,190 patients undergoing index cardiac operations, that XGBoost was able to achieve modest but significant improvements over the STS risk models in predicting operative mortality.[21]
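The c-index figures quoted in these comparisons can be computed directly from a model's predicted risks and observed outcomes. A simple pairwise sketch is shown below; it is quadratic in the number of patients, so production code typically uses a sorting-based method instead.

```python
def c_index(risks, outcomes):
    """Fraction of (event, non-event) patient pairs in which the patient
    who had the event received the higher predicted risk; ties count 0.5.
    Equivalent to the area under the receiver-operating-characteristic curve."""
    pairs = concordant = 0.0
    for r_event, o_event in zip(risks, outcomes):
        if o_event != 1:
            continue
        for r_other, o_other in zip(risks, outcomes):
            if o_other == 0:
                pairs += 1
                if r_event > r_other:
                    concordant += 1.0
                elif r_event == r_other:
                    concordant += 0.5
    return concordant / pairs
```

A model that ranks every event patient above every non-event patient scores 1.0 (perfect discrimination), while one that assigns everyone the same risk scores 0.50, equivalent to flipping a coin.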

Comment

According to a recent analytic report, the AI in healthcare market is expected to grow from 2.1 billion dollars to 36.1 billion dollars by 2025.[22] This impact will undoubtedly extend to the cardiovascular healthcare arena. As highlighted above, there are numerous potential applications of AI and ML in cardiovascular medicine and surgery. These fields are particularly amenable to the introduction of such technology, as risk modeling has played a vital role in the NCDR, the STS, and other similar cardiovascular data registries for decades.

Although there is much promise for ML in the cardiovascular realm, there is a need to be cautiously optimistic, as several criteria are essential for the transition of such algorithms into routine clinical practice. Foremost, the ML algorithm needs to be accurate. As a society and community, our threshold for acceptable accuracy will likely depend on the particular scenario at hand. For instance, there will likely be a very small margin for error in detecting life-threatening events such as tension pneumothorax on chest x-rays. In these situations, it will be more acceptable for the ML algorithm to overcall events and have a higher false positive rate rather than a higher false negative rate. In situations where the event is common and of little consequence to a patient's condition, missing events for the sake of higher accuracy will likely be more acceptable.

Also, in defining our thresholds for acceptability, the reference standard to which the ML model is being compared is of utmost importance. In situations where ML is compared with physicians, such as in imaging interpretation, there may be reluctance among the public to trust centers using such technology, particularly if it is used to fully replace the human aspect of interpretation. More likely, centers will use the technology to triage reads by radiologists or cardiologists, and to supplement interpretation rather than replace it fully. Other instances where ML may supplement rather than replace existing methods include natural language processing, where the ML algorithm may, for example, alert a physician to more closely examine the electronic medical record for treatment candidacy rather than dictate care independently. ML may also be supplementary in predictive analytics, where certain subsets of patients and events may be more accurately predicted with ML and others more accurately predicted with traditional approaches such as logistic regression.

Some ML algorithms have also been described as a “black box,” offering little insight into how the model bases its prediction. This brings into question how physicians can modify certain risk factors to make patients better candidates for therapy without knowing what is impacting their outcome. There are some ML algorithms, however, such as XGBoost, for which the relative importance of variables in predicting a particular outcome can be calculated and displayed. This provides a level of insight similar to that of a logistic regression model with regard to individual risk factors and their predictive importance.

In addition to these challenges, there are ethical issues that would need to be addressed before widely implementing ML into clinical practice. Privacy is one such ethical issue.
Potentially sensitive personal health information, such as human immunodeficiency virus status, intravenous drug use, smoking history, or other high-risk behavior, could be imputed, or its patterns recognized and assumed, by ML algorithms in cases where the patient does not disclose the information. Whether disclosed or not, if these factors are found to be predictive of an outcome by an ML algorithm, they could potentially bias a treatment recommendation against such groups of patients. Similarly, if a “black box” ML algorithm that sheds very little insight into its individual predictive factors identifies race, age, or sex, for example, as a significant predictor, the algorithm may then bias recommendations against certain groups of patients.

Once an ML model is developed and validated, its use in clinical practice would still require continual evaluation. This could take the form of safety monitoring to ensure that mistakes made by the ML algorithm do not lead to deleterious effects on patients. The impact an ML algorithm has on practice can also be evaluated in a clinical trial format, where standard of care is compared with ML-supported care. Even after validation and the establishment of potential utility in clinical practice, software based on ML algorithms will need to successfully make it to market. Aside from the usual financial and regulatory issues surrounding new software, the U.S. Food and Drug Administration is considering a new framework for regulating AI- and ML-based software as medical devices.[23] Developers will also need a plan for updating the models: ML models are known to be “data hungry”; that is, as they accumulate more data for analysis, they continually refine their predictive algorithms in a way that makes them more accurate.

Despite these challenges, there is certainly strong appeal for ML-based medical imaging interpretation, natural language processing, and clinical decision support. This appeal extends to global healthcare as well. There is the possibility of training algorithms to interpret images and to support or guide clinical decision-making based on expert-level ground truths, and then transferring such expertise to underserved areas.
This is attractive from both a quality-of-care standpoint and an efficiency standpoint in areas of the world where the ratio of physicians to the general populace is very low, or where there is an undersupply of expert medical professionals. Whether those areas have the ability to then implement and deliver the recommended care is another question.

Conclusions

The application of AI and ML to healthcare is in an exciting phase. This is somewhat of a “blank canvas,” and how this technology will be shaped, used, and implemented in clinical practice remains to be determined. Prior analyses using a variety of ML techniques to interpret medical imaging in an automated way, to process electronic medical records, and to provide predictive models have been promising. Although the science is developing at an exponential pace, translating it into actual practice will require thoughtful discussion among multiple stakeholders. The ultimate goal will be to utilize this technology to provide more informed and effective care, and to do so in a more efficient and cost-effective way.

References

1. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz CP, Patel BN, Yeom KW, Shpanskaya K, Blankenberg FG, Seekins J, Amrhein TJ, Mong DA, Halabi SS, Zucker EJ, Ng AY, Lungren MP. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018 Nov 20;15(11):e1002686.

2. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019 Jan;25(1):65-69.

3. Kim JH, Seo SY, Song CG, Kim KS. Assessment of Electrocardiogram Rhythms by GoogLeNet Deep Neural Network Architecture. J Healthc Eng. 2019 Apr 28;2019:2826901.

4. Oh SL, Ng EYK, Tan RS, Acharya UR. Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats. Comput Biol Med. 2018 Nov 1;102:278-287.

5. Coenen A, Kim YH, Kruk M, Tesche C, De Geer J, Kurata A, Lubbers ML, Daemen J, Itu L, Rapaka S, Sharma P, Schwemmer C, Persson A, Schoepf UJ, Kepka C, Hyun Yang D, Nieman K. Diagnostic Accuracy of a Machine-Learning Approach to Coronary Computed Tomographic Angiography-Based Fractional Flow Reserve: Result From the MACHINE Consortium. Circ Cardiovasc Imaging. 2018 Jun;11(6):e007217.

6. Kumamaru KK, Fujimoto S, Otsuka Y, Kawasaki T, Kawaguchi Y, Kato E, Takamura K, Aoshima C, Kamo Y, Kogure Y, Inage H, Daida H, Aoki S. Diagnostic accuracy of 3D deep-learning-based fully automated estimation of patient-level minimum fractional flow reserve from coronary computed tomography angiography. Eur Heart J Cardiovasc Imaging. 2019 Jun 23.

7. Au B, Shaham U, Dhruva S, Bouras G, Cristea E, Lanksy A, Coppi A, Warner F, Li S, Krumholz H. Automated Characterization of Stenosis in Invasive Coronary Angiography Images with Convolutional Neural Networks. arXiv 2018 (accepted, in press).

8. Shin SY, Lee S, Noh KJ, Yun ID, Lee KM. Extraction of Coronary Vessels in Fluoroscopic X-Ray Sequences Using Vessel Correspondence Optimization. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W (eds). Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. Lecture Notes in Computer Science, vol 9902. Springer, Cham; 2016.

9. Vlontzos A, Mikolajczyk K. Deep segmentation and registration in x-ray angiography video. arXiv 2018 (accepted, in press).

10. Moccia S, De Momi E, El Hadji S, Mattos LS. Blood vessel segmentation algorithms: Review of methods, datasets and evaluation metrics. Comput Methods Programs Biomed. 2018;158:71-91.

11. Madani A, Ong JR, Tibrewal A, Mofrad MRK. Deep echocardiography: data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease. NPJ Digit Med. 2018 Oct 18;1:59.

12. Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink-Nelson L, Lassen MH, Fan E, Aras MA, Jordan C, Fleischmann KE, Melisko M, Qasim A, Shah SJ, Bajcsy R, Deo RC. Fully Automated Echocardiogram Interpretation in Clinical Practice. Circulation. 2018 Oct 16;138(16):1623-1635.

13. Byrd RJ, Steinhubl SR, Sun J, Ebadollahi S, Stewart WF. Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. Int J Med Inform. 2014 Dec;83(12):983-92.

14. Vijayakrishnan R, Steinhubl SR, Ng K, Sun J, Byrd RJ, Daar Z, Williams BA, deFilippi C, Ebadollahi S, Stewart WF. Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record. J Card Fail. 2014 Jul;20(7):459-64.

15. Wang SV, Rogers JR, Jin Y, Bates DW, Fischer MA. Use of electronic healthcare records to identify complex patients with atrial fibrillation for targeted intervention. J Am Med Inform Assoc. 2017;24:339-44.

16. Pakhomov SV, Jacobsen SJ, Chute CG, Roger VL. Agreement between patient-reported symptoms and their documentation in the medical record. Am J Manag Care. 2008 Aug;14(8):530-9.

17. Mortazavi BJ, Bucholz EM, Desai NR, Huang C, Curtis JP, Masoudi FA, Shaw RE, Negahban SN, Krumholz HM. Comparison of Machine Learning Methods With National Cardiovascular Data Registry Models for Prediction of Risk of Bleeding After Percutaneous Coronary Intervention. JAMA Netw Open. 2019 Jul 3;2(7):e196835.

18. Huang C, Murugiah K, Mahajan S, Li SX, Dhruva SS, Haimovich JS, Wang Y, Schulz WL, Testani JM, Wilson FP, Mena CI, Masoudi FA, Rumsfeld JS, Spertus JA, Mortazavi BJ, Krumholz HM. Enhancing the prediction of acute kidney injury risk after percutaneous coronary intervention using machine learning techniques: A retrospective cohort study. PLoS Med. 2018 Nov 27;15(11):e1002703.

19. Allyn J, Allou N, Augustin P, Philip I, Martinet O, Belghiti M, Provenchere S, Montravers P, Ferdynus C. A Comparison of a Machine Learning Model with EuroSCORE II in Predicting Mortality after Elective Cardiac Surgery: A Decision Curve Analysis. PLoS One. 2017 Jan 6;12(1):e0169772.

20. Lee HC, Yoon HK, Nam K, Cho YJ, Kim TK, Kim WH, Bahk JH. Derivation and Validation of Machine Learning Approaches to Predict Acute Kidney Injury after Cardiac Surgery. J Clin Med. 2018 Oct 3;7(10).

21. Kilic A, Goyal A, Miller JK, Gjekmarkaj E, Lam Tam W, Gleason TG, Sultan I, Dubrawski A. Predictive Utility of a Machine Learning Algorithm in Estimating Mortality Risk in Cardiac Surgery. Ann Thorac Surg (under review).

22. Artificial Intelligence in Healthcare Market by Offering, Technology, End-Use Application, End User and Geography - Global Forecast to 2025. Available at: http://reportlinker.com. Accessed August 2, 2019.

23. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). Available at: https://www.fda.gov/media/122535/download. Accessed August 5, 2019.

Table 1. Examples of Machine Learning Algorithms.

K-means clustering (unsupervised): Creates k centroids that are used to define clusters of data. Each data point is assigned to its closest centroid.

Hierarchical clustering (unsupervised): A hierarchy of clusters is built by merging the clusters closest to each other. The distances between clusters are then recalculated, and eventually a tree, or dendrogram, of clusters is built.

Principal component analysis (unsupervised): A form of dimensionality reduction that makes data more compressible by reducing the number of dimensions. The aim is to reduce complexity while maintaining the structure of the data.

Singular value decomposition (unsupervised): Decomposes a large matrix of data into a product of smaller matrices.

Naïve Bayes (supervised): A simple and efficient classifier, based on Bayes' theorem, that makes an assumption of independence among predictors.

K-nearest neighbors (supervised): Stores all available cases and assigns new cases based on a similarity measure.

Support vector machine (supervised): A discriminative classifier defined by a separating hyperplane that categorizes new examples.

Random forest (supervised): Consists of a large number of decision trees that operate as an ensemble. Each individual tree provides a class prediction, and the prediction with the most votes becomes the overall model's prediction.

Extreme gradient boosting (supervised): An ensemble method that builds a strong model from weaker models that are short decision trees. Each new weak model is created to predict the residuals between the ground truth and the current strong model, and these weak models are added to the overall strong model in an iterative fashion.

Decision tree (supervised): Condition, or internal, nodes determine where the tree splits into branches, or edges. Branches that do not split any further are the decisions, or leaves, of the tree, which represent the predicted outcomes. Recursive binary splitting is one method of determining how the data are split; it is based on a cost function that identifies which splits cost the least in model accuracy.
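To make one of the entries in the table concrete, the K-nearest neighbors classifier can be written in a few lines of plain Python. This is a minimal sketch, not code from any cited study; the feature vectors and risk labels below are hypothetical, chosen only to illustrate the "stored cases plus similarity measure" idea.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among the k stored cases
    closest to it in Euclidean distance."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy cases: (age decade, creatinine) -> outcome label
train_X = [(5, 0.9), (6, 1.0), (7, 2.1), (8, 2.4)]
train_y = ["low risk", "low risk", "high risk", "high risk"]

print(knn_predict(train_X, train_y, (7, 2.0)))  # -> high risk
```

No model is "trained" here in the usual sense: all cases are simply stored, and the new case inherits the majority label of its nearest neighbors.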

Table 2. Summary of Potential Applications of Machine Learning in Cardiovascular Healthcare.

Automated Imaging Interpretation:
- Automated interpretation of chest x-rays, including identification of pneumothoraces, pleural effusions, pulmonary masses, tuberculosis, atelectasis, pneumonia, and emphysema
- Detection of arrhythmias from electrocardiograms
- Identification of significant coronary stenosis using coronary computed tomographic angiography
- Automated interpretation of invasive coronary angiograms
- Calculation of echocardiographic parameters, including left ventricular and valvular dimensions, in addition to diagnosis of structural diseases such as hypertrophic cardiomyopathy and cardiac amyloidosis

Natural Language Processing:
- Identifying patients meeting Framingham heart failure diagnostic criteria from primary care clinical notes
- Predicting risk of stroke or major bleeding in patients with atrial fibrillation
- Identifying patients at risk for progression of coronary or valvular disease from electronic health records

Predictive Analytics:
- Prediction of major bleeding after percutaneous coronary intervention
- Predicting operative mortality after open-heart surgery
- Predicting acute kidney injury risk after cardiac surgery
- Estimating the risk of major complications of cardiac surgery
- Predicting procedural mortality and complication profiles in transcatheter aortic valve replacement
- Predicting early and late outcomes of left ventricular assist device implantation or heart transplantation

Figure Legends

Figure 1. An example of K-means clustering, where the differently colored dots represent different clusters of data points and each asterisk represents the centroid associated with that cluster.
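The alternating assign-and-update loop behind a plot like Figure 1 can be sketched in plain Python. One-dimensional points, k = 2, and the starting centroids are simplifying assumptions made here for brevity.

```python
def kmeans(points, centroids, iters=10):
    """Alternate between assigning each point to its closest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in clusters.items()]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]   # two obvious groups
centroids, clusters = kmeans(points, centroids=[0.0, 10.0])
print(centroids)   # converges to approximately [1.0, 8.0]
```

Even from poor starting centroids (0 and 10), the loop settles on the two group means after the first iteration, which is the behavior the colored clusters and asterisks in Figure 1 depict.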

Figure 2. An example of a decision tree algorithm that predicts mortality or survival based on condition, or internal, nodes that lead to splits, or branches, and terminate in leaves, which are the predictions of the tree.
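The splitting step behind a tree like the one in Figure 2 can be illustrated by scoring candidate thresholds with a Gini cost function, as in the recursive binary splitting described in Table 1. The code and the toy ejection-fraction values below are illustrative assumptions, not the method of any cited study.

```python
def gini(groups):
    """Weighted Gini impurity of a candidate split (lower is better)."""
    n = sum(len(g) for g in groups)
    cost = 0.0
    for g in groups:
        if not g:
            continue
        p1 = sum(g) / len(g)                    # proportion of positive labels
        cost += (1.0 - p1 ** 2 - (1 - p1) ** 2) * len(g) / n
    return cost

def best_split(values, labels):
    """Try each observed value as a threshold and keep the cheapest split."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        cost = gini([left, right])
        if cost < best[1]:
            best = (t, cost)
    return best

# Hypothetical data: ejection fraction vs. a binary outcome (1 = event)
values = [15, 20, 25, 50, 55, 60]
labels = [1, 1, 1, 0, 0, 0]
print(best_split(values, labels))   # -> (50, 0.0): a perfect split
```

A full tree repeats this search recursively on each resulting branch until a stopping rule is met, yielding the leaves that Figure 2 shows as predictions.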

Figure 3. A neural network that consists of an input layer, 2 hidden layers, and an output layer.
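A forward pass through a network shaped like Figure 3 (an input layer, two hidden layers, and one output) can be written out directly. The weight matrices below are arbitrary placeholders, not trained values; ReLU hidden units and a sigmoid output are common choices assumed here.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def layer(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]

def forward(x, params):
    """Apply the hidden layers with ReLU, then a sigmoid output unit."""
    for weights, biases in params[:-1]:
        x = relu(layer(x, weights, biases))
    weights, biases = params[-1]
    out = layer(x, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-out))     # squash to a 0-1 risk score

# 2 inputs -> 3 hidden -> 3 hidden -> 1 output; placeholder weights
params = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, 0.0]),
    ([[0.2, 0.2, 0.2], [0.6, -0.1, 0.3], [0.0, 0.5, -0.5]], [0.0, 0.0, 0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
risk = forward([1.0, 2.0], params)
print(round(risk, 3))   # some value strictly between 0 and 1
```

Training would adjust the weights (typically by backpropagation) so that this output approximates the observed outcome; the sketch shows only the layered computation that Figure 3 diagrams.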

Figure 4. An example of randomly dividing a dataset of 1,000 patients in a 2:1 fashion into training and testing sets.
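The division in Figure 4 amounts to shuffling patient indices and cutting the list at two-thirds. A fixed random seed is an assumption added here only so the sketch is reproducible.

```python
import random

def train_test_split(n_patients, train_fraction=2 / 3, seed=42):
    """Randomly divide patient indices into training and testing sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    cut = round(n_patients * train_fraction)
    return indices[:cut], indices[cut:]

train, test = train_test_split(1000)
print(len(train), len(test))   # -> 667 333
```

Because the cut happens after shuffling, each patient lands in exactly one of the two sets, and the testing set never overlaps the data used to fit the model.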

Figure 5. An example of 10-fold cross-validation where the sample is randomly split into 10 folds, with 1 fold being held out as a testing set, and the process repeating 10 times with average performance metrics measured and reported across the 10 folds.
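The procedure in Figure 5 reduces to index bookkeeping: shuffle once, slice into 10 folds, hold each fold out in turn, and average the metric. The `evaluate` callback below is a hypothetical stand-in for whatever model fitting and scoring is being averaged.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate(n, evaluate, k=10):
    """Average the metric returned by `evaluate` across the k held-out folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)

# With 1,000 samples and k=10, every sample is held out exactly once,
# and each test fold contains 100 samples.
mean_test_size = cross_validate(1000, lambda train, test: len(test))
print(mean_test_size)   # -> 100.0
```

Replacing the lambda with a function that fits a model on `train` and scores it on `test` gives exactly the averaged performance estimate that Figure 5 describes.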