Learning physical properties in complex visual scenes: An intelligent machine for perceiving blood flow dynamics from static CT angiography imaging


Journal Pre-proof

Zhifan Gao (a,∗), Xin Wang (b,∗), Shanhui Sun (b), Dan Wu (b), Junjie Bai (b), Youbing Yin (b,∗∗), Xin Liu (c), Heye Zhang (d,∗∗), Victor Hugo C. de Albuquerque (e)

a Western University, London, Canada
b Shenzhen Keya Medical Technology Corporation, China
c Guangdong Academy Research on VR Industry, Foshan University, China
d School of Biomedical Engineering, Sun Yat-sen University, China
e Graduate Program in Applied Informatics, University of Fortaleza, Fortaleza/CE, Brazil

PII: S0893-6080(19)30376-4
DOI: https://doi.org/10.1016/j.neunet.2019.11.017
Reference: NN 4325

To appear in: Neural Networks

Received date: 26 March 2019. Revised date: 22 October 2019. Accepted date: 19 November 2019.

Please cite this article as: Z. Gao, X. Wang, S. Sun et al., Learning physical properties in complex visual scenes: An intelligent machine for perceiving blood flow dynamics from static CT angiography imaging. Neural Networks (2019), doi: https://doi.org/10.1016/j.neunet.2019.11.017.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier Ltd. All rights reserved.

Abstract


Humans perceive physical properties such as motion and elastic force by observing objects in visual scenes. Recent research has shown that computers, like humans, can infer physical properties from camera images. However, few studies address the perception of physical properties in more complex environments, where humans have difficulty estimating physical quantities directly from visual observation or visualizing the physical process mentally on the basis of daily experience. A pertinent example is fractional flow reserve (FFR), which measures the blood pressure difference across a vessel stenosis and has become an important physical quantity for determining the likelihood of myocardial ischemia in clinical coronary intervention procedures. In this study, we propose a novel deep neural network solution (TreeVes-Net) that allows machines to perceive FFR values directly from static coronary CT angiography images. Our framework combines a tree-structured recurrent neural network (RNN) with a coronary representation encoder. The encoder captures coronary geometric information, providing a blood-fluid-related representation. The tree-structured RNN builds long-distance spatial dependencies of blood flow information inside the coronary tree. Experiments on 13000 synthetic coronary trees and 180 real coronary trees from clinical patients show that the area under the ROC curve (AUC) is 0.92 and 0.93 under two clinical criteria. These results demonstrate the effectiveness of our framework and its superiority over seven machine-learning-based FFR computation methods.

Keywords: Learning Physical Properties, Tree-structured RNN, LSTM, Fractional Flow Reserve, CT angiography

∗ Zhifan Gao and Xin Wang are co-first authors. ∗∗ Corresponding authors: Heye Zhang and Youbing Yin.

Preprint submitted to Elsevier, November 23, 2019.

1. Introduction

Exploring the intrinsic physical knowledge hidden in visual scenes has become an emerging research hotspot in machine learning, as it helps investigate how people understand the world (Ullman et al., 2014). In recent studies, this machine-aided exploration has frequently been performed in visual scenes governed by certain physical laws, such as a pillow falling (law of gravity) (Stewart and Ermon, 2007), an object sliding down (law of friction) (Wu et al., 2015), or pucks colliding (conservation of momentum) (Ullman et al., 2014), as illustrated in Figure 1. Similarly, computer-aided medical image processing is another way of understanding the physiological knowledge hidden in visual scenes, i.e. medical images. In current studies, most medical image indices are related to anatomical structures (Tseng et al., 2017; Zhang et al., 2017; Zhou et al., 2017; Zhu et al., 2017; Gao et al., 2015; Xu et al., 2018; Zhao et al., 2018), which experienced clinicians can easily identify by visual observation. For example, clinicians can assess carotid intima-media thickness from the visualization of arteries in ultrasound imaging to diagnose carotid atherosclerotic disease (Shin et al., 2016). However, some medical indices, such as the mechanical properties of tissue (Gao et al., 2017b, 2018a) and the pressure inside vessels (Ho et al., 2013), are difficult to quantify by visual inspection of medical images, even for experienced clinicians. We believe that the complexities behind these scenes make medical indices related to physical properties very difficult to understand. Before addressing this challenge, we first characterize these complexities. First, the physical law in some environments is counterintuitive and complex, either computationally or logically.

Figure 1: Visual scenes constrained by different physical principles: (a) falling, (b) sliding down, (c) bouncing, (d) rebounding, and (e) fluid flowing (fluid dynamics governed by the Navier-Stokes equations). Our target environment involves more complex and counterintuitive physical knowledge.

Thus, humans have difficulty imagining the physical process on the basis of previous experience. Second, the target physical property relies on high-level representations of the visual observation that humans cannot perceive directly; consequently, the human visual system perceives this property poorly. Finally, the target physical properties vary over space rather than over time, i.e. their values may differ from one location to another. In this study, we propose a new machine learning technology for discovering physical knowledge from visual observations with such complexities. Fractional flow reserve (FFR) is a key medical index with exactly these complexities. FFR, defined as the ratio of distal coronary pressure to aortic pressure obtained during maximal coronary hyperemia (Toth et al., 2016) (see Figure 2d for details), is used to evaluate myocardial ischemia and has the following characteristics: clinicians cannot quantify it directly by observing the coronary appearance in CT angiography images; FFR inherently measures the variation of blood flow constrained by a complex physical process, i.e. the Navier-Stokes equations (Xiao et al., 2014); and FFR values vary across regions within a coronary artery according to the vascular geometric structure. From the perspective of computer vision, the task of computing FFR from CT angiography images can be understood as an intelligent machine perceiving physical properties from visual scenes. Moreover, sample imbalance and annotation scarcity are very common in medical image recognition tasks such as Alzheimer's disease detection (Zhou et al., 2013; Zhu et al., 2014; Hwang et al., 2016), brain tumor classification (Tseng et al., 2017; Subbanna et al., 2014; Li et al., 2018), and bone mineral density computation (Lin et al., 2017). In FFR computation, the situation is even worse, because acquiring true FFR values requires an invasive catheterization procedure in clinic, which makes it difficult to collect a large number of real FFR measurements for machine learning. In this paper, we learn physical knowledge from complex visual scenes by proposing an intelligent machine, which we call TreeVes-Net, that computes FFR values from coronary CT angiography images. Specifically, blood pressure information (physical knowledge) is inferred from vessel morphology (visual scenes). TreeVes-Net uses a recurrent neural network (RNN) that enforces interaction between the upstream and downstream information of a coronary tree. Local and global fluid-related coronary geometric features along the vessel centerline are encoded as the network input. We use long short-term memory (LSTM) units as the RNN units (Greff et al., 2017); LSTM further leverages the long-range spatial dependencies between the features of the centerline points. Taking the coronary tree structure into account, we further propose a tree-structured LSTM to improve the accuracy of FFR inference. To address overfitting caused by the lack of training data and annotations, we synthesize a large number of coronary trees and apply hemodynamic simulation to generate the corresponding data labels from the coronary morphology. The contributions of our framework can be summarized as follows:

• We enable the inference of physical properties of fluid dynamics constrained by the Navier-Stokes equations from complex visual scenes, i.e. coronary CT angiography images. In particular, we develop a tree-structured RNN framework that computes FFR values from CT angiography images for the diagnosis of myocardial ischemia. This framework addresses the learning of long-term spatial dependencies and resolves the inconsistency of FFR computation exhibited by networks without the tree structure. To the best of our knowledge, this is the first work to study noninvasive FFR inference from CT angiography images using a deep recurrent neural network.

• We introduce a physics-induced strategy to increase the amount of labeled training data (a total of 13000 coronary trees). This strategy successfully generates valuable labeled data through physics-based numerical simulation, and it can be transferred efficiently to other fields to facilitate similar physical knowledge inference.

• We validate TreeVes-Net on 180 clinical subjects from three hospitals. The experimental results show that the FFR values calculated by our framework agree well with the invasive pressure wire-based measurements in clinic (ground truth). Our framework has great potential for clinical diagnosis because it is noninvasive and outperforms seven other machine-learning-based FFR computation methods.

2. Related work

2.1 Physical knowledge discovery based on visual observation

Previous studies on visual understanding with physical knowledge aim to discover the physical properties of objects that humans can observe, such as velocity, distance, materials, and support (Ullman et al., 2014; Wu et al., 2015; Gupta et al., 2010; Bouman et al., 2013; Bell et al., 2015; Wu et al., 2016; Gupta et al., 2011; Silberman et al., 2012; Zheng et al., 2013; Jia et al., 2013, 2015; Yu et al., 2015; Ehrhardt et al., 2019). These studies commonly focus on intuitive physical laws and require data labels to be generated manually by observers. They have further motivated the estimation of the temporal variation of physical properties in specially designed scenes, including objects falling and sliding (Stewart and Ermon, 2007; McIntyre et al., 2001; Zago et al., 2009; Mottaghi et al., 2016), bouncing (Nusseck et al., 2007; Kyriazis et al., 2011), and colliding (Fragkiadaki et al., 2016). In addition, possible changes in static environments, such as the collapse of stacked objects, have been studied to evaluate potential danger (Battaglia et al., 2013; Zhang et al., 2016; Lerer et al., 2016). In contrast to these studies, the novelty of our study lies in learning non-intuitive physical properties that cannot be observed directly from visual scenes. Specifically, we aim to learn blood-pressure-related information from static coronary CT angiography image data, where the blood pressure inside the coronary vessel varies in the spatial domain rather than the time domain. Furthermore, the physical law governing blood flow dynamics is counterintuitive and highly complex, and the target physical property may not be representable by simple or low-level features of the visual observation. Thus, clinicians have difficulty making correct inferences from their previous experience when observing CT angiography images, and the human visual system performs poorly at evaluating blood flow from CT angiography alone. Our framework for inferring physical properties may seem similar to the studies by Wu et al. (2015, 2016), but those studies inferred unobservable properties from observable ones through intuitive physics-based relationships, such as an object sliding down, which differs from the target of our study.

[Figure 2 panels: (a) label generation, hemodynamics simulation by lumped model and FEM; (b) training, encoder, tree-structured bidirectional RNN, and decoder; (c) FFR computation; (d) ground truth, pressure wire-based measurement with FFR = P_d / P_a, where P_a is the proximal (aortic) coronary pressure and P_d is the distal coronary pressure; (e) sample results shown on an FFR colormap from 0.6 to 1.0.]

Figure 2: Overview of the proposed framework and results. (a) Annotation generation (FFR values along the coronary centerline) using a one-dimensional model. (b) Training process. (c) FFR inference process. (d) Definition of FFR: the ratio of distal coronary pressure to aortic pressure obtained during maximal coronary hyperemia. (e) Example results; the tables compare the FFR values of our results ("Our") with the ground truth ("GT").

Figure 3: Connections and structures of the LSTM units. (a) Bidirectional connection between LSTM units within a coronary branch. (b) Bidirectional connection between the LSTM unit at a coronary bifurcation and the units at its two sibling centerline points. (c) Inner structure of the LSTM units in (a), formulated in Equation (3). (d) Inner structure of the LSTM units in (b), formulated in Equation (4). For simplicity, the figure shows only a two-branch bifurcation; our network architecture can also handle bifurcations with more than two branches, such as trifurcations.

2.2 FFR measurement

Invasive pressure wire-based FFR is the gold-standard clinical technology for evaluating myocardial ischemia during coronary intervention (Toth et al., 2016). However, invasive FFR carries operational risks, such as a sudden heart attack caused by catheter fracture and allergic reactions to the vasodilator used for coronary expansion. In addition, patients face high medical costs and radiation exposure. These limitations motivate the development of noninvasive, pressure wire-free FFR computation technology (van de Hoef et al., 2013; Liu et al., 2019). Previous FFR computation approaches mainly use computational fluid dynamics simulation. With patient-specific boundary conditions (Zhang et al., 2014; Liu et al., 2016), the flow and pressure distributions within the coronary vessel can be computed, yielding the distribution of FFR values along the coronary centerline. Existing FFR computation methods can use coronary CT angiography (Lu et al., 2017), 3D quantitative X-ray angiography (Tu et al., 2014), or optical coherence tomography images (Zafar et al., 2014; Seike et al., 2017). CT-based FFR computation is a promising technique, as its clinical effectiveness has been demonstrated in various clinical trials (Lu et al., 2017; Douglas et al., 2016; Miyoshi et al., 2015). Nevertheless, CT-based methods suffer from high computational cost owing to the numerical computation over hundreds of thousands of grid elements in hemodynamic simulation. Exploiting the tree structure of coronary morphology in the network design, we propose TreeVes-Net to extract FFR values from individual CT angiography images. Our framework achieves high performance in myocardial ischemia diagnosis and reduces the computational requirement to an ordinary personal computer rather than a high-performance computer. To the best of our knowledge, only Itu et al. (2016) have proposed an FFR computation method based on machine learning. However, that study simply computed the FFR values of independent points along the coronary centerline from arterial geometric features, so the hidden physical constraint of blood flow dynamics (the Navier-Stokes equations) was not considered in their model. In contrast, we model the spatial dependency of fluid information along the coronary vessel and apply physics-based numerical simulation to generate the training set, so that TreeVes-Net can learn information constrained by this physical principle. This is the main contribution of our study with respect to the previous machine-learning-based FFR computation method.

3. Methodology

In this section, we explain the methodology for inferring the target physical property (i.e. FFR) from complex medical visual scenes (coronary CT angiography). An overview of our framework is illustrated in Figure 2. In Section 3.1, we present the network architecture of TreeVes-Net. The input of TreeVes-Net is the hydrodynamics-related geometric feature vector (denoted by $x$) for a position on the coronary artery centerline, and the output is the corresponding FFR value. The feature vector $x$ is defined in Section 3.3. Note that we adopt different strategies for network training and testing. In training, we use only simulation data to learn the network parameters: we first generate a large number of synthetic coronary arteries, then extract the feature vector $x$ from them and compute the training labels (FFR values) by hemodynamic simulation. The generation of the training data is described in Section 3.2. In testing, by contrast, we use real clinical data: the feature vector $x$ is extracted from coronary CT angiography (CCTA) images (see Section 3.3), and the ground truth is the FFR value collected by the pressure sensor on the intracoronary guidewire during percutaneous coronary intervention.

3.1 Network architecture of TreeVes-Net

In TreeVes-Net, a fully-connected multi-layer neural network serves as a feature encoder that embeds the input image features and infers their high-level representation at each centerline point. The tree-structured bidirectional LSTM then uses these high-level representations to compute FFR values, leveraging long-term dependencies among fluid states at different points on the coronary centerlines. Our network can handle bifurcations with multiple branches; for simplicity, the following architectural details correspond to two branches. The proposed RNN is illustrated in Figure 3. For a centerline point within a coronary branch, the computation of its FFR value is influenced by the states of both its nearest upstream point and its nearest downstream point. For a centerline point at an arterial bifurcation, the computation is influenced by its nearest upstream point and the two nearest downstream points on different downstream branches. Denote by $D$ the set of centerline points at coronary bifurcations. For any centerline point not in $D$, its state is updated as (see Figure 3a):

$$[h^d_m, c^d_m] = \phi_1(h^d_{m-1}, c^d_{m-1}, z_m), \qquad [h^u_m, c^u_m] = \phi_1(h^u_{m+1}, c^u_{m+1}, z_m) \tag{1}$$

where $h$ is the hidden vector, $c$ is the cell vector, and $z$ is the feature representation of $x$ produced by the encoder. The subscript $m$ indexes the centerline point in the current coronary branch, and the superscripts $d$ and $u$ denote the downstream and upstream directions, respectively. $\phi_1$ is the RNN unit formulated in Equation (3). The two equations together form a bidirectional RNN unit for centerline points within a coronary branch, where the former corresponds to the downstream direction and the latter to the upstream direction. For any centerline point $p$ in $D$, its state is updated as (see Figure 3b):

$$[h^d_{out}, c^d_{out}] = \phi_1(h^d_{in}, c^d_{in}, z_p), \qquad [h^p, c^p] = \phi_2(h^{sl}, h^{sr}, c^{sl}, c^{sr}, z_p) \tag{2}$$

where the subscript $in$ denotes the input of the parent point at the coronary bifurcation along the downstream direction, and $out$ denotes the corresponding output. The superscripts $sl$ and $sr$ denote the two inputs of the parent point along the upstream direction, and $p$ denotes the corresponding output; these two inputs equal the outputs of the two sibling points. $\phi_2$ is the RNN unit formulated in Equation (4). The two equations together form a bidirectional RNN unit for centerline points at coronary bifurcations, where the former corresponds to the downstream direction and the latter to the upstream direction.

Another important aspect is the inner structure of the RNN unit. Because the blood flow dynamics at two distant points on the coronary centerline can influence each other, we use LSTM as the RNN unit to avoid the long-term dependency problem in the spatial domain. This influence differs between centerline points at and not at coronary bifurcations. Thus $\phi_1$ is formulated as (see Figure 3c):

$$\begin{aligned}
\tilde{c} &= \tanh(W_c z + R_c h + b_c), & i &= \sigma(W_i z + R_i h + P_i c + b_i),\\
f &= \sigma(W_f z + R_f h + P_f c + b_f), & c^* &= \tilde{c} \odot i + c \odot f,\\
o &= \sigma(W_o z + R_o h + P_o c^* + b_o), & h^* &= \tanh(c^*) \odot o
\end{aligned} \tag{3}$$

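where $z$, $h$, and $c$ are the inputs of this LSTM unit, $h^*$ and $c^*$ are its outputs, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W$, $R$, $P$, and $b$ (with any subscript) are network parameters. $\phi_2$ is formulated as (see Figure 3d):

$$\begin{aligned}
\tilde{c}^p &= \tanh(W^p_c z_p + R^l_c h^{sl} + R^r_c h^{sr} + b^p_c)\\
i^p &= \sigma(W^p_i z_p + R^l_i h^{sl} + R^r_i h^{sr} + P^l_i c^{sl} + P^r_i c^{sr} + b^p_i)\\
f^l &= \sigma(W^p_{f1} z_p + R^l_{f1} h^{sl} + R^r_{f1} h^{sr} + P^l_{f1} c^{sl} + P^r_{f1} c^{sr} + b^l_f)\\
f^r &= \sigma(W^p_{f2} z_p + R^l_{f2} h^{sl} + R^r_{f2} h^{sr} + P^l_{f2} c^{sl} + P^r_{f2} c^{sr} + b^r_f)\\
c^p &= \tilde{c}^p \odot i^p + c^{sl} \odot f^l + c^{sr} \odot f^r\\
o^p &= \sigma(W^p_o z_p + R^{sl}_o h^{sl} + R^{sr}_o h^{sr} + P^p_o c^p + b^p_o)\\
h^p &= \tanh(c^p) \odot o^p
\end{aligned} \tag{4}$$

where $h^{sl}$, $h^{sr}$, $c^{sl}$, $c^{sr}$, and $z_p$ are the inputs of this LSTM unit, $h^p$ and $c^p$ are its outputs, and $W$, $R$, $P$, and $b$ (with any superscript and subscript) are network parameters.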
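As a minimal, non-authoritative sketch of Equations (3) and (4), the two cells can be written in dependency-free Python as below. It assumes small vector dimensions, treats each peephole weight $P$ as a vector applied element-wise (equivalently, a diagonal matrix), and uses hypothetical helper names (`init_phi1`, `init_phi2`, `matvec`, and so on). The authors trained their network in PyTorch; this is an illustration of the equations, not their implementation.

```python
import math
import random

# --- small vector helpers on lists of floats ---
def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def mul(a, b):  # element-wise (Hadamard) product, the "⊙" in Eqs. (3)-(4)
    return [x * y for x, y in zip(a, b)]

def rand_mat(rows, cols, rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def rand_vec(n, rng, scale):
    return [rng.gauss(0.0, scale) for _ in range(n)]

# --- phi1, Eq. (3): peephole LSTM step inside a coronary branch ---
def init_phi1(D, H, rng, scale=0.1):
    p = {}
    for g in "cifo":
        p["W" + g] = rand_mat(H, D, rng, scale)
        p["R" + g] = rand_mat(H, H, rng, scale)
        p["b" + g] = rand_vec(H, rng, scale)
    for g in "ifo":
        p["P" + g] = rand_vec(H, rng, scale)  # diagonal peephole weights (assumption)
    return p

def phi1(p, h, c, z):
    c_tilde = tanh(add(matvec(p["Wc"], z), matvec(p["Rc"], h), p["bc"]))
    i = sigmoid(add(matvec(p["Wi"], z), matvec(p["Ri"], h), mul(p["Pi"], c), p["bi"]))
    f = sigmoid(add(matvec(p["Wf"], z), matvec(p["Rf"], h), mul(p["Pf"], c), p["bf"]))
    c_new = add(mul(c_tilde, i), mul(c, f))
    o = sigmoid(add(matvec(p["Wo"], z), matvec(p["Ro"], h), mul(p["Po"], c_new), p["bo"]))
    return mul(tanh(c_new), o), c_new  # (h*, c*)

# --- phi2, Eq. (4): merges the two sibling states at a bifurcation ---
def init_phi2(D, H, rng, scale=0.1):
    p = {}
    for g in ["c", "i", "f1", "f2", "o"]:
        p["W" + g] = rand_mat(H, D, rng, scale)
        p["Rl" + g] = rand_mat(H, H, rng, scale)
        p["Rr" + g] = rand_mat(H, H, rng, scale)
        p["b" + g] = rand_vec(H, rng, scale)
    for g in ["i", "f1", "f2"]:
        p["Pl" + g] = rand_vec(H, rng, scale)
        p["Pr" + g] = rand_vec(H, rng, scale)
    p["Po"] = rand_vec(H, rng, scale)
    return p

def phi2(p, h_l, h_r, c_l, c_r, z):
    def pre(g, peephole):
        terms = [matvec(p["W" + g], z), matvec(p["Rl" + g], h_l),
                 matvec(p["Rr" + g], h_r), p["b" + g]]
        if peephole:
            terms += [mul(p["Pl" + g], c_l), mul(p["Pr" + g], c_r)]
        return add(*terms)
    c_tilde = tanh(pre("c", False))
    i = sigmoid(pre("i", True))
    f_l = sigmoid(pre("f1", True))  # forget gate for the left sibling's cell
    f_r = sigmoid(pre("f2", True))  # forget gate for the right sibling's cell
    c_new = add(mul(c_tilde, i), mul(c_l, f_l), mul(c_r, f_r))
    o = sigmoid(add(pre("o", False), mul(p["Po"], c_new)))
    return mul(tanh(c_new), o), c_new  # (h^p, c^p)

# Example: one step along a branch, then a merge at a bifurcation.
rng = random.Random(0)
D, H = 4, 3
p1, p2 = init_phi1(D, H, rng), init_phi2(D, H, rng)
z = [0.2, -0.1, 0.5, 0.0]
h, c = phi1(p1, [0.0] * H, [0.0] * H, z)
h_p, c_p = phi2(p2, h, h, c, c, z)
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate cell to 0, so $\phi_2$ simply averages the two sibling cells with weight 0.5 each; this makes the left/right forget-gate structure of Equation (4) easy to check by hand.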

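The order in which Equations (1) and (2) visit the coronary tree can be sketched separately from the cells. In this illustration the tree is a hypothetical dictionary mapping each centerline point to its downstream neighbors (two at a bifurcation point in $D$), and `phi1`/`phi2` are deliberately trivial scalar stand-ins, not the LSTM cells of Equations (3) and (4); only the data flow is meant to be faithful. Multi-branch bifurcations are not handled in this sketch.

```python
import math

# Hypothetical scalar stand-ins for phi1/phi2 (NOT Eqs. (3)-(4));
# they only make the data flow of Eqs. (1)-(2) visible.
def phi1(h, c, z):
    c_new = 0.5 * c + z
    return math.tanh(c_new), c_new

def phi2(h_l, h_r, c_l, c_r, z):
    c_new = 0.5 * (c_l + c_r) + z
    return math.tanh(c_new), c_new

def preorder(children, root):
    """Iterative depth-first order in which every parent precedes its children."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    return order

def bidirectional_tree_pass(children, z, root):
    """Return (downstream, upstream) states {point: (h, c)} for a binary tree."""
    order = preorder(children, root)
    parent = {kid: node for node in order for kid in children.get(node, [])}

    down = {}
    for node in order:  # root -> leaves; a bifurcation's output feeds both children
        h_in, c_in = down[parent[node]] if node in parent else (0.0, 0.0)
        down[node] = phi1(h_in, c_in, z[node])

    up = {}
    for node in reversed(order):  # leaves -> root; children are processed first
        kids = children.get(node, [])
        if not kids:  # distal end of a branch: zero initial state
            up[node] = phi1(0.0, 0.0, z[node])
        elif len(kids) == 1:  # point inside a branch (Eq. 1, upstream half)
            h_next, c_next = up[kids[0]]
            up[node] = phi1(h_next, c_next, z[node])
        elif len(kids) == 2:  # bifurcation point in D (Eq. 2, upstream half)
            (h_l, c_l), (h_r, c_r) = up[kids[0]], up[kids[1]]
            up[node] = phi2(h_l, h_r, c_l, c_r, z[node])
        else:
            raise ValueError("this sketch only handles two-branch bifurcations")
    return down, up

# Root 0 -> point 1 (bifurcation) -> points 2 and 3.
children = {0: [1], 1: [2, 3]}
z = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
down, up = bidirectional_tree_pass(children, z, root=0)
features = {p: (down[p][0], up[p][0]) for p in z}
```

Each point ends up with one hidden state from each direction; concatenating them (the `features` dictionary) gives a bidirectional representation that a decoder could map to an FFR value. The iterative traversal avoids Python's recursion limit on long centerlines.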
Figure 4: Two sample results of coronary lumen segmentation by our framework. In the 3D visualization of the lumen segmentation, the red, green, and blue regions represent the aorta, the left coronary artery, and the right coronary artery, respectively. We also select six representative cross-sectional images of the 3D coronary arteries (their locations are shown by the yellow dashed lines) and show the detected lumen borders as green closed curves in these images.

3.2 Training data generation

To generate the synthetic training set, we build a dataset of 13000 synthetic coronary trees for training TreeVes-Net based on coronary geometric parameters. To cover a variety of representative coronary morphologies of subjects with suspected myocardial ischemia, we randomly prespecify the values of these parameters, such as vessel radius, branch length, and bifurcation angle, within appropriate ranges. The ranges of some parameters are derived from published clinical studies, and the others are determined by experienced clinicians. After generating the synthetic coronary arteries, we invite experienced clinicians to check, based on their professional experience, whether each synthetic coronary tree agrees with trees that may arise in clinical practice, and we discard all synthetic arteries that are pathophysiologically infeasible. In the first step of coronary generation, we initialize the skeleton of the coronary tree using the skeleton parameters, including the number of branches, the spatial trends of the branches, the branch lengths, and the bifurcation angles.
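Random prespecification of skeleton parameters followed by a feasibility check can be sketched as rejection sampling. All numeric ranges below are hypothetical placeholders (the paper does not list them in this section), only a subset of the skeleton parameters is sampled (the spatial trends of the branches are omitted), and `is_plausible` is an assumed rule-based stand-in for the clinicians' manual review.

```python
import random

# Hypothetical parameter ranges for illustration only.
RANGES = {
    "num_bifurcations": (1, 4),            # integer range, inclusive
    "branch_length_mm": (10.0, 60.0),
    "bifurcation_angle_deg": (30.0, 90.0),
}

def sample_skeleton(rng):
    """Sample skeleton parameters uniformly within the prespecified ranges."""
    k = rng.randint(*RANGES["num_bifurcations"])
    num_branches = 2 * k + 1  # a binary tree with k bifurcations has 2k + 1 branches
    return {
        "num_branches": num_branches,
        "branch_lengths_mm": [rng.uniform(*RANGES["branch_length_mm"])
                              for _ in range(num_branches)],
        "bifurcation_angles_deg": [rng.uniform(*RANGES["bifurcation_angle_deg"])
                                   for _ in range(k)],
    }

def is_plausible(skeleton, max_total_length_mm=250.0):
    """Hypothetical stand-in for the clinicians' feasibility review."""
    return sum(skeleton["branch_lengths_mm"]) <= max_total_length_mm

def generate_skeletons(n, seed=0):
    """Keep sampling until n skeletons pass the feasibility filter."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        candidate = sample_skeleton(rng)
        if is_plausible(candidate):
            accepted.append(candidate)
    return accepted

skeletons = generate_skeletons(50, seed=1)
```

Seeding the generator makes a synthetic dataset reproducible; the wall and stenosis parameters described next would be sampled per branch in the same way.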


Based on the coronary skeleton, we then generate the vessel border using the wall parameters, including the vessel radius at the inlet and outlet cross sections of every branch. Finally, we generate coronary stenosis regions within the vessel border using the stenosis parameters, including the center location, the stenosis ratio, and the lengths of the narrowing region, the widening region, and the constant stenotic region.

The data labels (i.e. FFR values along the coronary centerline) for this training set are computed by hemodynamic simulation, i.e. numerical methods for solving fluid dynamics problems. In the hemodynamic simulation, the vessel wall is assumed to be rigid, and blood is assumed to be an incompressible, axisymmetric Newtonian fluid with constant viscosity and a no-slip wall boundary condition. The kinematics of blood flow in the coronary arteries can then be described by a one-dimensional formulation of the Navier-Stokes equations under two assumptions (Xiao et al., 2014). The first assumption is that the dominant component of the flow is axisymmetric and runs along the axial direction. Let $U(x, t)$ denote the blood flow field in the coronary arteries. Under this assumption, the continuity equation can be formulated as

$$\frac{1}{r(x)}\frac{\partial\, r(x)\,U_r(x,t)}{\partial r(x)} + \frac{\partial U_x(x,t)}{\partial x} = 0 \tag{5}$$

where $x$ is the distance from the centerline point to the coronary root, $t$ is time, $r(x)$ is the lumen radius, and $U_r$ and $U_x$ are the radial and longitudinal components of $U$, respectively. Integrating Equation (5) over the lumen cross section yields

$$\frac{\partial A(x)}{\partial t} + \frac{\partial\, A(x)\,U(x,t)}{\partial x} = 0 \tag{6}$$

where $A(x)$ is the lumen cross-sectional area and $U(x,t)$ is the axial blood flow velocity averaged over the cross section. The second assumption is that the blood pressure is constant within each lumen cross section but varies along the longitudinal direction. Accordingly, the axial momentum equation can be formulated as

$$\frac{\partial U_x(x,t)}{\partial t} + U_x(x,t)\frac{\partial U_x(x,t)}{\partial x} + U_r(x,t)\frac{\partial U_x(x,t)}{\partial r} + \frac{1}{\rho}\frac{\partial P(x,t)}{\partial x} = \frac{\beta}{r}\frac{\partial}{\partial r}\left(r\,\frac{\partial U_x(x,t)}{\partial r}\right) \tag{7}$$

where $P(x,t)$ is the blood pressure, $\rho$ is the blood density, and $\beta$ is the (kinematic) viscosity of blood. Integrating Equation (7) over the cross-sectional area gives

$$\frac{\partial U(x,t)}{\partial t} + U(x,t)\frac{\partial U(x,t)}{\partial x} + \frac{1}{\rho}\frac{\partial P(x,t)}{\partial x} = \frac{F(x,t)}{\rho A(x)} \tag{8}$$

where $F(x,t)$ is the frictional force per unit length along the coronary centerline. Equations (6) and (8) describe the blood flow dynamics in the coronary arteries. To solve them, the boundary conditions must be known in advance. The inlet and outlet boundary conditions (i.e. the blood flow information at the two ends of the coronary artery) are provided by a lumped model (Liu et al., 2016; Westerhof et al., 2009), a zero-dimensional electrical-circuit analogue describing the relationship between blood pressure and flow in the coronary arteries. The geometric boundary conditions are given by the coronary morphology. Finally, the distribution of blood pressure along the coronary centerline is computed by the finite element method (Bathe, 2006), from which the distribution of FFR values follows. The generation of the training set is illustrated in Figure 2(a).
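To illustrate how stenosis parameters translate into an FFR profile, the sketch below uses a heavily simplified special case of Equations (6) and (8) rather than the lumped model plus finite element solver described above. Assumptions: steady flow in one straight branch, so Equation (6) makes the flow rate $Q = AU$ constant and Equation (8) reduces to $dP/dx = F/A - \rho U\,dU/dx$; Poiseuille friction $F = -8\pi\mu U$ with dynamic viscosity $\mu = \rho\beta$; a trapezoidal stenosis whose ratio is read as a diameter reduction; and hypothetical values for the inlet flow (3 mL/s), aortic pressure (100 mmHg), vessel radius (1.5 mm), and blood properties.

```python
import math

MMHG_TO_PA = 133.322

def radius_profile(x, r0, center, ratio, len_narrow, len_const, len_widen):
    """Trapezoidal stenosis: linear narrowing, constant stenotic segment, linear widening."""
    r_min = r0 * (1.0 - ratio)  # ratio read as diameter (= radius) reduction
    start_const, end_const = center - len_const / 2.0, center + len_const / 2.0
    start, end = start_const - len_narrow, end_const + len_widen
    if x <= start or x >= end:
        return r0
    if x < start_const:
        return r0 - (r0 - r_min) * (x - start) / len_narrow
    if x <= end_const:
        return r_min
    return r_min + (r0 - r_min) * (x - end_const) / len_widen

def steady_pressure(radii, dx, q, p_in, rho=1060.0, mu=0.0035):
    """March steady Eq. (8) with constant Q = A*U and friction F = -8*pi*mu*U (SI units)."""
    p = [p_in]
    for k in range(1, len(radii)):
        a_prev = math.pi * radii[k - 1] ** 2
        a_cur = math.pi * radii[k] ** 2
        a_mid = 0.5 * (a_prev + a_cur)
        viscous = 8.0 * math.pi * mu * q / a_mid ** 2 * dx  # (-F/A) * dx
        convective = 0.5 * rho * q ** 2 * (1.0 / a_cur ** 2 - 1.0 / a_prev ** 2)  # rho*U dU, integrated exactly
        p.append(p[-1] - viscous - convective)
    return p

def ffr_profile(ratio, length_mm=60.0, n=601, r0_mm=1.5, q_ml_s=3.0, pa_mmhg=100.0):
    """FFR(x) = P(x) / P_a along one branch with a 5 mm + 5 mm + 5 mm stenosis centered at 30 mm."""
    dx = length_mm * 1e-3 / (n - 1)
    xs = [k * dx for k in range(n)]
    radii = [radius_profile(x, r0_mm * 1e-3, 30e-3, ratio, 5e-3, 5e-3, 5e-3) for x in xs]
    p = steady_pressure(radii, dx, q_ml_s * 1e-6, pa_mmhg * MMHG_TO_PA)
    return [pk / p[0] for pk in p]

for ratio in (0.0, 0.3, 0.7):
    print(ratio, round(ffr_profile(ratio)[-1], 3))
```

Because the convective term integrates exactly to a Bernoulli difference, the pressure lost in the narrowing is fully recovered in the widening here, so the net drop comes only from viscous friction; practical 1D solvers often add empirical stenosis loss terms, which is one more reason this should not be read as the paper's simulation.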


3.3 Coronary segmentation and feature extraction

The prerequisite for geometric feature extraction in network testing is segmenting the coronary artery border from CCTA images. We first detect the aorta in order to locate the coronary artery. Inspired by Gao et al. (2017a), we apply the Hough transform to detect an object with a circle-like shape and take it as the aorta. We then use a region growing algorithm to find the coronary root (i.e. the junction between the aorta and the coronary artery) within a circular region of diameter 1.2r concentric with the aorta, where r is the aortic radius. After extracting an image patch with a three-dimensional (3D) window centered at the coronary root, we apply U-Net (Ronneberger et al., 2015) to detect the coronary artery region and a dynamic programming algorithm to extract the corresponding segment of the coronary centerline within this patch. We then move the 3D window along the direction of the detected centerline segment and repeat the detection of the coronary region and centerline until the entire coronary artery is segmented.

To demonstrate that TreeVes-Net can learn from relatively raw features, we select coronary geometric features related to the dynamics of the blood flow field according to hemodynamic simulation and use these features as the input of TreeVes-Net. For any point $p$ on the coronary centerline, the input feature vector for $p$ is divided into three parts, $x = [x_v, x_s, x_g]$. The local vascular features $x_v$ represent the vascular structure near $p$, including the coronary cross-sectional area, the maximum and minimum coronary radius at $p$, and the distance between $p$ and the nearest upstream coronary bifurcation. The local stenotic features $x_s$ represent the geometric characteristics of the stenosis on the coronary branch where $p$ is located, including the stenosis length and the smallest 50% coronary radius along the stenosis. The global features $x_g$ consist of global vascular features and global stenosis features. The former represent the upstream and downstream vascular morphology with respect to $p$, including the distance between the coronary root and the proximal coronary bifurcation, the length of the coronary branch where $p$ is located, and the number and total area of all upstream/downstream bifurcations connected to this branch. The latter represent the geometric characteristics of the six stenoses nearest to $p$; the features of each stenosis are the same as those in $x_s$.

4. Experiments and Results

Jo

urn a

4.1 Experimental Setup

Datasets. We generate 13000 synthetic coronary trees with FFR values for training TreeVes-Net. We perform the validation on 180 real coronary trees with invasive pressure wire-based FFR measurements. The real coronary trees are derived from the CT angiography images of 180 subjects with suspected myocardial ischemia. All CT angiography images (acquired with a Siemens 64-slice CT scanner) and FFR measurements were collected by experienced radiologists from four public general hospitals, with 60, 31, 42, and 47 subjects, respectively. In the CT images, the in-plane image size is 512 × 512, with in-plane voxel spacing from 0.25 × 0.25 mm² to 0.75 × 0.75 mm², and the slice thickness ranges from 0.25 mm to 0.7 mm. FFR is measured with the VOLCANO instrument and a pressure wire (PrimeWire PRESTIGE Plus pressure guide wire), following the clinical guideline (Levine et al., 2016). After calibration and equalization, the pressure wire is advanced distal to the stenosis. Hyperemia is induced by an intravenous infusion of adenosine (140 µg kg⁻¹ min⁻¹). As the gold standard in the current clinical workflow, the invasive pressure wire-based FFR measurement serves as the ground truth for validating our framework. FFR measurement is used to determine the likelihood of myocardial ischemia in interventional coronary procedures. The measurement procedure, developed by Pijls et al. (1995), requires inserting a catheter into the coronary arteries via the femoral (groin) or radial (wrist) artery; a small sensor on the tip of the wire within the catheter measures the blood pressure. During a pullback of the pressure wire, the blood pressure along the coronary artery is recorded. The FFR value is then computed, by definition, as the ratio of the distal coronary pressure to the aortic pressure. In clinical practice, FFR is usually recorded once on each coronary branch with suspected ischemia.
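The definition of FFR as a pressure ratio translates directly into code. The sketch below assumes a single constant mean aortic pressure for the whole pullback and pressures in mmHg; both are simplifications for illustration, since in practice the aortic and distal pressures are recorded simultaneously.

```python
from typing import List, Sequence


def ffr(distal_pressure: float, aortic_pressure: float) -> float:
    """FFR = Pd / Pa, with both pressures measured under hyperemia in the same unit."""
    if aortic_pressure <= 0:
        raise ValueError("aortic pressure must be positive")
    return distal_pressure / aortic_pressure


def pullback_ffr(distal_pressures: Sequence[float], aortic_pressure: float) -> List[float]:
    """FFR at each wire position recorded during the pullback (constant Pa assumed)."""
    return [ffr(pd, aortic_pressure) for pd in distal_pressures]
```

For example, a distal pressure of 72 mmHg with an aortic pressure of 90 mmHg gives FFR = 0.8, exactly at the upper clinical cut-off.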

Implementation Details. The encoder is a fully-connected network with four layers (64, 32, 16, and 8 neuron units), and the decoder has one layer with eight neuron units. To initialize the weights of TreeVes-Net, we apply a calibrated Gaussian distribution to the encoder, which keeps the variance of the initialized neurons from growing with the number of inputs: $w \sim \mathcal{N}(0.001, 1/n)$, where $\mathcal{N}$ is the Gaussian distribution, n is the number of inputs, and w is a network weight. The activation function is the rectified linear unit (ReLU). In the encoder, we apply dropout with a rate q = 0.5 (Srivastava et al., 2014). The loss function is the summed squared error, defined by

$$J = \frac{1}{2}\sum_{n_1=1}^{N_1}\sum_{n_2=1}^{N_2}\left(\mathrm{FFR}^{1}_{n_1,n_2} - \mathrm{FFR}^{2}_{n_1,n_2}\right)^2 \tag{9}$$

where $\mathrm{FFR}^1$ and $\mathrm{FFR}^2$ are the FFR values computed by TreeVes-Net and by the lumped model (see Section 3.2 for details), respectively, $N_1$ is the number of subjects, and $N_2$ is the number of coronary centerline points of the $n_1$-th subject. We use the Adam optimizer to train TreeVes-Net. Adam is a first-order gradient-based algorithm for optimizing stochastic objective functions based on adaptive estimates of lower-order moments (Kingma and Ba, 2015); we set the step size to 0.0001 and the exponential decay rates for the first- and second-moment estimates to 0.9 and 0.999, respectively. TreeVes-Net computes FFR values only along the coronary centerline, because its aim is to produce FFR values comparable to the invasive pressure wire-based FFR used in the clinic. TreeVes-Net is trained on an NVIDIA M40 GPU with PyTorch v0.2. The pseudocode of the training process is shown in Algorithm 1.
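The initialization, loss, and optimizer settings above can be sketched in plain Python, without a deep learning framework. This is an illustrative stand-in, not the authors' PyTorch v0.2 code: bias-free layers, inverted dropout after every encoder layer, and Adam's epsilon of 1e-8 are assumptions that the text does not state.

```python
import math
import random
from typing import List, Sequence

ENCODER_WIDTHS = (64, 32, 16, 8)  # encoder layer sizes given in the text
Matrix = List[List[float]]


def calibrated_gaussian(n_in: int, n_out: int, rng: random.Random, mean: float = 0.001) -> Matrix:
    """Weights w ~ N(mean, 1/n_in): the variance shrinks as the fan-in n_in grows."""
    std = math.sqrt(1.0 / n_in)
    return [[rng.gauss(mean, std) for _ in range(n_in)] for _ in range(n_out)]


def init_encoder(n_inputs: int, rng: random.Random) -> List[Matrix]:
    weights, n_in = [], n_inputs
    for width in ENCODER_WIDTHS:
        weights.append(calibrated_gaussian(n_in, width, rng))
        n_in = width
    return weights


def encoder_forward(x: Sequence[float], weights: List[Matrix], rng: random.Random,
                    drop_rate: float = 0.5, training: bool = True) -> List[float]:
    h = list(x)
    keep = 1.0 - drop_rate
    for w in weights:
        # Bias-free linear map followed by ReLU (biases omitted as a simplification)
        h = [max(0.0, sum(wi * hi for wi, hi in zip(row, h))) for row in w]
        if training:
            # Inverted dropout: drop each unit with probability drop_rate, rescale the rest
            h = [hi / keep if rng.random() < keep else 0.0 for hi in h]
    return h


def summed_squared_error(ffr_net: Sequence[Sequence[float]],
                         ffr_sim: Sequence[Sequence[float]]) -> float:
    """Eq. (9): half the squared error summed over subjects and their centerline points."""
    return 0.5 * sum((a - b) ** 2
                     for pts_net, pts_sim in zip(ffr_net, ffr_sim)
                     for a, b in zip(pts_net, pts_sim))


class Adam:
    """Adam update (Kingma and Ba, 2015) on a flat list of parameters."""

    def __init__(self, lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.999,
                 eps: float = 1e-8):  # eps is an assumed default, not stated in the text
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = []
        self.v: List[float] = []
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g      # first-moment estimate
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g  # second-moment estimate
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias corrections
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated
```

On the first step, Adam moves each parameter by roughly the step size (0.0001) regardless of the gradient's magnitude, which is one reason a small step size is a safe default here.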


Algorithm 1: Training process of the proposed framework
Input: geometry structure of the coronary tree T and the ground truth
Output: network parameters Θ after training
1   Initialize Θ;
2   for n = 1 to N do   // traverse the N coronary trees in the dataset
3       begin Downstream Direction
4           begin Forward Propagation
5               for m = 1 to M do   // traverse the M centerline points in the coronary tree
6                   Extract the feature vector x_m from T_n at centerline point m;
7                   Get the high-level representation z_m of x_m with the fully-connected neural network;
8                   if point m is not in a coronary fork then
9                       Compute the output of the LSTM unit by Equations (1) and (3);
10                  end
11                  else if point m is in a coronary fork then
12                      Compute the output of the LSTM unit by Equations (2) and (4);
13                  end
14              end
15              Compute the loss function J in Equation (9);
16          end
17          begin Backward Propagation
18              Use the Adam optimizer to update Θ based on the loss function J;
19          end
20      end
21      begin Upstream Direction
22          Same as the Downstream Direction, except that the centerline points are traversed from M to 1;
23      end
24  end
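Algorithm 1 visits the centerline points in two opposite orders. The sketch below shows one way to derive both orders from a tree, assuming (hypothetically) that the centerline is stored as a parent-index array with -1 marking the coronary root. Parents precede children in the downstream pass and follow them in the upstream pass. The `cell` callback is a placeholder for the LSTM updates of Equations (1) to (4), which are not reproduced here; it receives the fork flag tested in lines 8 and 11 of the algorithm.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

Cell = Callable[[float, List[float], bool], float]  # (feature, predecessor states, is_fork) -> state


def downstream_order(parent: Sequence[int]) -> Tuple[List[int], List[List[int]]]:
    """Breadth-first order from the root, so every parent precedes its children."""
    children: List[List[int]] = [[] for _ in parent]
    root = -1
    for node, p in enumerate(parent):
        if p < 0:
            root = node
        else:
            children[p].append(node)
    order: List[int] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    return order, children


def bidirectional_pass(parent: Sequence[int], features: Sequence[float],
                       cell: Cell) -> Tuple[Dict[int, float], Dict[int, float]]:
    order, children = downstream_order(parent)
    down: Dict[int, float] = {}
    for m in order:  # Downstream Direction: root toward the distal ends
        prev = [down[parent[m]]] if parent[m] >= 0 else []
        down[m] = cell(features[m], prev, len(children[m]) > 1)
    up: Dict[int, float] = {}
    for m in reversed(order):  # Upstream Direction: distal ends toward the root
        prev = [up[c] for c in children[m]]  # a fork point receives one state per child branch
        up[m] = cell(features[m], prev, len(children[m]) > 1)
    return down, up


# Usage: point 1 is a fork whose children are points 2 and 3
parent = [-1, 0, 1, 1, 2]
toy_cell: Cell = lambda x, prev, is_fork: x + sum(prev)
down, up = bidirectional_pass(parent, [1.0] * 5, toy_cell)
```

With the toy cell, each downstream state counts the points on the path from the root, and each upstream state counts the points in the subtree below, which makes the two propagation directions easy to check by hand.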


Method         Bias       LOA
SOM            0.0502     [-0.1423, 0.2427]
RF             0.0483     [-0.1056, 0.2022]
SVM            0.0498     [-0.1150, 0.2146]
FC             0.0431     [-0.1495, 0.2357]
LSTM           0.0318     [-0.1404, 0.2041]
TreeVes-D      0.0446     [-0.0996, 0.1888]
TreeVes-U      0.0456     [-0.0980, 0.1893]
TreeVes-Net   -0.0019     [-0.1205, 0.1166]

Table 1: The results of the Bland-Altman analysis between our FFR computation framework (TreeVes-Net) and the ground truth. We also evaluate the agreement of other machine learning based methods, including SOM, RF, SVM, FC, LSTM, TreeVes-D, and TreeVes-U. The bias and LOA are defined in Equation (10), where LOA denotes the 95% limits of agreement of the difference between the FFR values computed by each method and the ground truth.

Evaluation Metrics. We measure the algorithm accuracy with Bland-Altman analysis (Bland and Altman, 1986). Two indices of the Bland-Altman analysis are used: the mean difference (bias) and its limits of agreement (LOA). They are formulated as

$$\text{bias} = \operatorname{mean}(\mathrm{FFR}_{\mathrm{diff}}), \qquad \text{LOA} = \left[\text{bias} - 1.96 \times \operatorname{std}(\mathrm{FFR}_{\mathrm{diff}}),\ \text{bias} + 1.96 \times \operatorname{std}(\mathrm{FFR}_{\mathrm{diff}})\right] \tag{10}$$

where $\mathrm{FFR}_{\mathrm{diff}}$ is the difference between the FFR values computed by our framework and the ground truth, the bias is the mean of $\mathrm{FFR}_{\mathrm{diff}}$, and the LOA is the range within which about 95% of the values of $\mathrm{FFR}_{\mathrm{diff}}$ are expected to lie. In addition, clinical practice uses a standard criterion: a subject is classified as myocardial ischemia positive when the FFR value is less than a cut-off, and as negative otherwise. The cut-off FFR value is either 0.75 or 0.8 (Berry et al., 2015; Pijls and Sels, 2012), and we use both values in our experiments. We take specificity and sensitivity as the indices of diagnostic performance for myocardial ischemia:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{FP + TN} \tag{11}$$

where TP is the number of subjects with myocardial ischemia correctly identified, TN is the number of subjects without myocardial ischemia correctly identified, FP is the number of subjects without myocardial ischemia incorrectly identified, and FN is the number of subjects with myocardial ischemia incorrectly identified. Based on specificity and sensitivity, the area under the ROC curve (AUC) is computed to evaluate the performance of our framework in myocardial ischemia diagnosis.
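The metrics in Equations (10) and (11) can be computed as sketched below. The sketch assumes the sample standard deviation (the text does not say sample or population), treats a subject as ischemia-positive when its FFR is below the cut-off, and computes the AUC in its rank (Mann-Whitney) form with lower computed FFR meaning "more likely ischemic". These are illustrative choices, not the authors' evaluation code.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(pred: Sequence[float], truth: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Eq. (10): bias and 95% limits of agreement of FFR_diff = pred - truth."""
    diff = [p - t for p, t in zip(pred, truth)]
    bias = statistics.mean(diff)
    sd = statistics.stdev(diff)  # sample standard deviation (assumption)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def sensitivity_specificity(pred: Sequence[float], truth: Sequence[float],
                            cutoff: float = 0.8) -> Tuple[float, float]:
    """Eq. (11): a subject is ischemia-positive when its FFR is below the cut-off."""
    tp = fn = fp = tn = 0
    for p, t in zip(pred, truth):
        pred_pos, true_pos = p < cutoff, t < cutoff
        if true_pos and pred_pos:
            tp += 1
        elif true_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (fp + tn)


def roc_auc(pred: Sequence[float], truth: Sequence[float], cutoff: float = 0.8) -> float:
    """AUC as P(a positive subject has a lower computed FFR than a negative one); ties count 0.5."""
    pos = [p for p, t in zip(pred, truth) if t < cutoff]
    neg = [p for p, t in zip(pred, truth) if t >= cutoff]
    score = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

The functions assume that both positive and negative subjects are present at the chosen cut-off; otherwise the denominators are zero.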


                  Specificity     Sensitivity     AUC
Cut-off           0.75    0.80    0.75    0.80    0.75    0.80
Hospital 1        0.76    0.85    0.86    0.84    0.91    0.91
Hospital 2        0.75    0.86    0.88    0.82    0.90    0.91
Hospital 3        0.78    0.87    0.87    0.85    0.89    0.92
Hospital 4        0.80    0.90    0.92    0.86    0.93    0.94
Total             0.77    0.89    0.91    0.84    0.92    0.93

Table 2: The specificity, sensitivity and AUC of our framework (TreeVes-Net) tested on the subjects from four public general hospitals. The column headers 0.75 and 0.80 denote the two clinical cut-off FFR values.

Comparative Methods. We compare TreeVes-Net with seven other machine learning models for FFR computation: random forest (RF), self-organizing map (SOM), support vector machine (SVM), fully-connected neural network (FC), LSTM, TreeVes-D, and TreeVes-U. In RF, SOM, SVM and FC, the FFR value at each centerline point is predicted independently of the other centerline points. Bayesian optimization is used to select the hyper-parameters of the comparative methods (Shahriari et al., 2016). It is a sequential model-based approach that updates the hyper-parameters of a learning model through Bayesian posterior estimation, combining the observed data with a prior placed on the unknown objective function. RF is an ensemble learning method that combines multiple decision trees (Murphy, 2012). Its parameters include the number of trees, the impurity function, and the decision tree algorithm; the number of features considered for the best split is set to the square root of the number of inputs. SVM is a supervised machine learning algorithm that finds the maximum-margin separating hyperplane, i.e., the hyperplane that maximizes the distance between the positive and negative margin hyperplanes (Scholkopf and Smola, 2001); for FFR computation, its regression form (SVR) is used. Its parameters include the kernel function, the penalty parameter of the error term, and the kernel coefficient. SOM is a neural network that uses competitive learning to construct a feature space that preserves the topological properties of the input space (Kohonen, 1984). In the experiments, we construct SOM as a two-layer network: a feature representation layer and a linear regression layer. The feature representation layer is one-dimensional, since the network input is a feature vector describing the coronary structure. A linear regression layer is then added after this feature representation layer as the regressor for FFR computation. During training, the network weights are randomly initialized. The parameters of SOM include the learning rate, the neighborhood radius, and the number of nodes. FC is the multi-layer perceptron proposed by Itu et al. (2016). Its parameters include the number of hidden layers and the number of nodes in each layer. LSTM is a single-path RNN, in which the entire coronary tree is split into multiple vessels, each corresponding to a path from the coronary root to the end of a vessel branch. TreeVes-D is the tree-structured architecture with unidirectional connections from the proximal end of the coronary artery to the distal end. TreeVes-U is the tree-structured architecture with unidirectional connections from the distal end to the proximal end. LSTM, TreeVes-D and TreeVes-U use the same model parameters as the proposed framework; only the network architecture differs.
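The SOM baseline described above combines a one-dimensional competitive layer with a linear regressor. A minimal Kohonen-style sketch of the competitive layer follows. The linearly decaying learning rate and radius, the Gaussian neighborhood, and the use of node distances as the representation passed to the regression layer are assumptions for illustration; the regression layer itself is omitted.

```python
import math
import random
from typing import List, Sequence


def train_som_1d(data: Sequence[Sequence[float]], n_nodes: int = 8, epochs: int = 20,
                 lr0: float = 0.5, radius0: float = 2.0, seed: int = 0) -> List[List[float]]:
    """Competitive learning on a 1-D chain of nodes."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]  # random initialization
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            # Competitive step: the best-matching unit is the node closest to x
            bmu = min(range(n_nodes), key=lambda j: math.dist(nodes[j], x))
            for j in range(n_nodes):
                # Gaussian neighborhood measured along the 1-D node index
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                nodes[j] = [w + lr * h * (xi - w) for w, xi in zip(nodes[j], x)]
            step += 1
    return nodes


def som_representation(x: Sequence[float], nodes: List[List[float]]) -> List[float]:
    """Distances from x to every node; a linear regression layer would map these to FFR."""
    return [math.dist(node, x) for node in nodes]
```

Because neighboring nodes are pulled toward the same inputs, nearby nodes on the chain end up representing similar coronary feature vectors, which is the topology-preserving property the text refers to.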


4.2 Performance of coronary lumen segmentation

On the in vivo CCTA images of all 180 subjects, an experienced expert segmented the coronary artery tree with semi-automatic software. The segmentation was then reviewed by two other independent experts. After the three experts reached consensus, the segmentation mask served as the reference result, against which our coronary artery segmentation method is compared. The average unsigned surface distance between our framework and the manual delineation is 0.47 mm, which indicates that the accuracy of our coronary lumen segmentation is at the state-of-the-art level (Kirişli et al., 2013). Figure 4 shows two sample results of coronary artery segmentation by our framework.
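The text does not define the average unsigned surface distance precisely. One common symmetric form, assumed here, averages the distance from every point on one surface to the closest point on the other, in both directions. A brute-force sketch on point clouds (coordinates in mm) follows; a practical implementation on full coronary surfaces would replace the quadratic nearest-neighbor search with a spatial index such as a k-d tree.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _nearest_distance(p: Point, surface: Sequence[Point]) -> float:
    """Distance from p to the closest point of the other surface (brute force)."""
    return min(math.dist(p, q) for q in surface)


def average_unsigned_surface_distance(surface_a: Sequence[Point],
                                      surface_b: Sequence[Point]) -> float:
    """Symmetric mean of point-to-surface distances, pooled over both surfaces."""
    d_ab = [_nearest_distance(p, surface_b) for p in surface_a]
    d_ba = [_nearest_distance(q, surface_a) for q in surface_b]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))
```

Pooling both directions penalizes a segmentation that misses part of the vessel (large distances from the reference to the result) as well as one that leaks outside it (large distances from the result to the reference).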


4.3 Performance of FFR computation

Accuracy. Figure 2(e) shows representative results of FFR inference using our TreeVes-Net. Table 1 shows the results of the Bland-Altman analysis between our framework and the invasive FFR measurement (ground truth). The small mean bias (-0.0019) and the narrow 95% limits of agreement ([-0.1205, 0.1166]) indicate that our framework agrees well with this clinically available technique. Table 2 shows the performance of our framework in myocardial ischemia diagnosis based on the computed FFR values. With either cut-off value, 0.75 or 0.8 (two different clinical criteria), our framework achieves high specificity, sensitivity and AUC on the subjects from all four hospitals (AUC between 0.89 and 0.94). These results demonstrate the effectiveness of our framework for myocardial ischemia diagnosis.

                  Specificity     Sensitivity     AUC
Cut-off           0.75    0.80    0.75    0.80    0.75    0.80
SOM               0.68    0.72    0.77    0.70    0.82    0.80
RF                0.69    0.75    0.78    0.71    0.85    0.82
SVM               0.67    0.73    0.75    0.70    0.83    0.80
FC                0.71    0.78    0.78    0.72    0.83    0.84
LSTM              0.66    0.75    0.93    0.85    0.85    0.86
TreeVes-D         0.60    0.74    0.92    0.94    0.86    0.88
TreeVes-U         0.60    0.74    0.92    0.93    0.86    0.90
TreeVes-Net       0.77    0.89    0.91    0.84    0.92    0.93

Table 3: Comparison of specificity, sensitivity and AUC between our framework (TreeVes-Net) and other machine learning based methods. The column headers 0.75 and 0.80 denote the two clinical cut-off FFR values.


Comparison with the state-of-the-art methods. Table 1 compares our framework with these models in terms of the Bland-Altman analysis. Our framework has the smallest absolute mean bias and the narrowest LOA, which indicates closer agreement with the ground truth. Table 3 shows that our framework achieves the highest specificity and AUC at both cut-off values, although LSTM, TreeVes-D and TreeVes-U reach higher sensitivity. This indicates the potential clinical value of our framework for diagnosing myocardial ischemia from FFR values. Moreover, we compare the ROC curves of these methods in Figure 5, where our framework attains the highest AUC at both cut-off values (0.92 and 0.93). These results support the effectiveness of the key components of our model: the LSTM-type RNN units, the bidirectional information propagation, and the tree-structured unit connection.

[Figure 5: ROC curves, sensitivity versus specificity. Legend AUC values at cut-off 0.75 (left panel): SOM 0.82, RF 0.85, SVM 0.83, FC 0.83, LSTM 0.85, TreeVes-D 0.86, TreeVes-U 0.86, TreeVes-Net 0.92. At cut-off 0.80 (right panel): SOM 0.80, RF 0.82, SVM 0.80, FC 0.84, LSTM 0.86, TreeVes-D 0.88, TreeVes-U 0.90, TreeVes-Net 0.93.]

Figure 5: The ROC curves of our TreeVes-Net (red) and other network architectures, including SOM (cyan), RF (brown), SVM (purple), FC (orange), LSTM (blue), TreeVes-D (magenta), and TreeVes-U (green). The left and right panels correspond to the FFR cut-off values of 0.75 and 0.80, respectively.


Ablation study. In addition, we perform an ablation study to evaluate our framework under different network configurations (the depth and width of the encoder, and the dropout rate). Each AUC below is given for the 0.75 cut-off, followed in parentheses by the value for the 0.80 cut-off. First, changing the depth of the encoder from four layers to two, three, five, and six layers changes the AUC from 0.92 (0.93) to 0.85 (0.85), 0.87 (0.86), 0.90 (0.91), and 0.89 (0.89), respectively. Then, doubling and halving the width of the encoder change the AUC to 0.91 (0.91) and 0.90 (0.92), respectively. Finally, changing the dropout rate from 0.5 to 0.3, 0.4, 0.6, and 0.7 changes the AUC to 0.84 (0.85), 0.88 (0.88), 0.90 (0.89), and 0.90 (0.90), respectively. The current configuration (four encoder layers, the original widths, and a dropout rate of 0.5) yields the highest AUC among all tested settings, indicating that it is suitable for FFR computation.

5. Discussion

In this section, we discuss why our framework is effective in inferring fluid properties, as well as its contributions to the machine learning and computer vision community and to the clinical community.

5.1 Analysis of our framework

Effectiveness. We attribute the effectiveness of our framework to a learning strategy designed to capture the physical laws of fluid flow. The model design pursues three goals toward this end. First, the learning model should account for the geometric boundary of the fluid flow. To achieve this, we design the network as a tree-structured architecture that imitates the geometric morphology of the entire coronary artery, including its main branches. Second, the model should capture the interaction between upstream and downstream flow. An RNN with LSTM units can handle long-range spatial dependencies and can thus model this relationship. Finally, the model should use explicit information from the governing physical law (i.e., the Navier-Stokes equations). For this goal, we use hemodynamic simulation to produce the training labels (i.e., FFR values). This scheme ensures that the labels are consistent with the Navier-Stokes equations, so that the model can gradually learn the underlying physical mechanism of fluid flow during training. Together, these three design considerations enable our model to capture the physical mechanism of the flow.

Comparison with other methods. Our framework (TreeVes-Net) performs better overall than seven other machine learning methods (SOM, RF, SVM, FC, LSTM, TreeVes-U and TreeVes-D) for the following reasons. In contrast with RF, SVM, SOM and FC, our framework addresses a significantly different learning problem. These methods predict the FFR values at different locations within the same coronary artery independently, whereas our framework treats the FFR values at different locations as dependent. Therefore, these four methods can hardly be regarded as being informed by the Navier-Stokes equations. Moreover, to enforce the blood dynamics constraint along the coronary centerline, we construct a tree-like structure and apply a bi-directional RNN to it to learn the hidden structural (soft) dependencies. These soft long- and short-range dependencies are learnt directly from data, which could avoid the bias introduced by feature engineering with hard context dependencies in these learning methods. With regard to LSTM, our tree-structured architecture helps to avoid the estimation error of its single-branch model. The tree structure inherently distributes the flow influence from an upstream artery to all of its downstream child branches. In contrast, the single-branch model of the coronary artery in LSTM leads to inconsistent FFR results, because different downstream coronary morphologies exert different influences on the flow within the same upstream artery (e.g., the left main artery in the coronary tree). Compared with TreeVes-U and TreeVes-D, the bi-directional information propagation in our tree-structured RNN is more reasonable, because morphological changes in both the upstream and the downstream coronary arteries affect the blood flow dynamics, but their influences propagate in opposite directions. The lower performance of TreeVes-U and TreeVes-D supports the effectiveness of the bi-directional architecture in our network.

5.2 Contributions to the machine learning and computer vision community

We perform a pilot study on inferring complex physical properties from static visual scenes using machine learning. Although a few studies in other fields, such as meteorology (Cuzol and Mémin, 2009) and particle image velocimetry (Qian et al., 2017), have investigated inferring physical properties of fluids using machine learning, this research direction has not been well established in perception-based computer vision. As one instance of this research direction, we propose a novel algorithm to infer blood flow properties from medical images. Our work is essentially a general machine learning problem applied in a computer vision scenario: the target physical property (the blood pressure ratio, i.e., FFR) is constrained by a complex and non-intuitive physical law (the Navier-Stokes equations), and it cannot be quantified directly from the visual scene (CT angiography images). This scenario is considerably more complex than those in most previous studies on machine learning and computer vision (Karimpouli and Tahmasebi, 2019; Grabec, 2013; Gao et al., 2014, 2018b; Su et al., 2017; Zhang et al., 2018a,b,c; Mammone et al., 2018, 2019; Ieracitano et al., 2019; Yu et al., 2019; Tang et al., 2019; Nie et al., 2019; Muhammad et al., 2016, 2018a,b). In addition, to the best of our knowledge, our work is the first effort to derive a highly abstract physical property from images using a graphical representation solved by a bi-directional tree-structured recurrent neural network. Our work demonstrates promising potential for handling a valuable but unsolved problem in computer vision. Note that tree-structured RNNs have mainly been used in natural language processing and have not yet been well studied in computer vision. Our pilot study shows the feasibility of applying a large bi-directional tree-structured RNN to graphical representation problems that require long-range context reasoning.

5.3 Contributions to the clinical community

We have taken a step forward in the development of noninvasive FFR measurement technology. The current clinical FFR measurement procedure is still invasive: it carries procedural risks during PCI (such as sudden heart attack) and is prohibitive for patients allergic to the vasodilator used to induce coronary hyperemia. Noninvasive FFR measurement is therefore needed. However, existing noninvasive FFR measurement technologies (based on either CT or X-ray angiography imaging, such as Lu et al. (2017) and Tu et al. (2014)) require 4 to 6 hours per computation and thus cannot provide FFR results in real time. This implies that these technologies can only be used for preoperative and postoperative examination. In contrast, the low computational cost of our framework (less than 10 seconds) makes it feasible to perform noninvasive FFR measurement intraoperatively. Because of its real-time performance and higher accuracy, our framework can assist the clinician's decisions during PCI surgery, and thus brings more clinical value. Our framework also facilitates the wider adoption of FFR measurement. High medical expense is one of the main barriers to promoting this technology: the invasive method consumes a disposable catheter and pressure sensors, and the previous noninvasive methods require a high-performance workstation or server. Moreover, a lack of experienced clinicians increases the surgical risk and reduces the accuracy of FFR measurement. These disadvantages hinder the adoption of FFR measurement in non-specialist hospitals. In contrast, our framework requires only a CT machine and a home computer, which could make FFR measurement accessible to a larger number of general hospitals and clinics and benefit more people.

6. Conclusion

In this study, we explore the inference of complex physical knowledge from visual scenes; specifically, we make a machine perceive FFR values from static CT angiography images. As an important hemodynamic index for the clinical diagnosis of myocardial ischemia, FFR reflects a physical property of the flow field that is constrained by the Navier-Stokes equations, and it is difficult to infer directly from visual observations such as coronary CT angiography. We propose a recurrent neural network based system (an intelligent machine) to solve the FFR inference problem. The system extracts high-level feature representations from coronary vessel trees and learns the long-distance spatial dependencies in the flow information along the coronary vessels. The proposed tree-structured RNN, which is inspired by coronary morphology, avoids the inconsistent FFR estimates in upstream coronary branches that arise with a single-path RNN architecture. In addition, to address data scarcity in the medical field, we propose an efficient strategy to generate sufficient training labels using physics-based numerical simulation. Experimental results demonstrate the effectiveness of our framework for FFR computation. Our framework may also be transferable to other applications that require inferring useful physical properties.

Acknowledgement

This study was supported by the Brazilian National Council for Research and Development (CNPq, Grant #304315/2017-6 and #430274/20181), Science and Technology Planning Project of Guangdong Province, China (2018A050506031, 2019B010110001), Guangdong Provincial Key Laboratory of Sensor Technology and Biomedical Instrument, Guangdong Natural Science Funds for Distinguished Young Scholar (2019B151502031), National Natural Science Foundation of China (61771464, U1801265), Shenzhen Overseas High Level Talent (Peacock Plan) Project (KQTD2016112809330877), the Project of Shenzhen International Cooperation Foundation (GJHZ20180926165402083), the Science Technology and Innovation Committee of Shenzhen for Research Projects (JCYJ20170413114916687 and SGLH20161212104605195), and the Fundamental Research Funds for the Central Universities.

References

Bathe, K.J., 2006. Finite Element Procedures. Prentice Hall.

Battaglia, P.W., Hamrick, J.B., Tenenbaum, J.B., 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences of the United States of America 110, 18327–18332.

Bell, S., Upchurch, P., Snavely, N., Bala, K., 2015. Material recognition in the wild with the materials in context database, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3479–3487.

Berry, C., Corcoran, D., Hennigan, B., Watkins, S., Layland, J., Oldroyd, K.G., 2015. Fractional flow reserve-guided management in stable coronary disease and acute myocardial infarction: recent developments. European Heart Journal 36, 3155–3164.

Bland, J.M., Altman, D.G., 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327, 307–310.

Bouman, K.L., Xiao, B., Battaglia, P., Freeman, W.T., 2013. Estimating the material properties of fabric from video, in: IEEE International Conference on Computer Vision (ICCV), pp. 1984–1991.

Cuzol, A., Mémin, E., 2009. A stochastic filtering technique for fluid flow velocity fields tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 1278–1293.

Douglas, P.S., Bruyne, B.D., Pontone, G., Patel, M.R., Norgaard, B.L., Byrne, R.A., Curzen, N., Purcell, I., Gutberlet, M., Rioufol, G., Hink, U., Schuchlenz, H.W., Feuchtner, G., Gilard, M., Andreini, D., Jensen, J.M., Hadamitzky, M., Chiswell, K., Cyr, D., Wilk, A., Wang, F., Rogers, C., Hlatky, M.A., 2016. 1-year outcomes of FFRCT-guided care in patients with suspected coronary disease: the PLATFORM study. Journal of the American College of Cardiology 68, 435–445.

Ehrhardt, S., Monszpart, A., Mitra, N.J., Vedaldi, A., 2019. Taking visual motion prediction to new heightfields. Computer Vision and Image Understanding (available online).

Fragkiadaki, K., Agrawal, P., Levine, S., Malik, J., 2016. Learning visual predictive models of physics for playing billiards, in: International Conference on Learning Representations (ICLR).

Gao, Z., Guo, W., Liu, X., Huang, W., Zhang, H., Tan, N., Hau, W.K., Zhang, Y.T., Liu, H., 2014. Automated detection framework of the calcified plaque with acoustic shadowing in IVUS images. PLOS One 9, e109997.

Gao, Z., Hau, W.K., Lu, M., Huang, W., Zhang, H., Li, C., Liu, X., 2015. An automated framework for detecting lumen and media-adventitia borders in intravascular ultrasound images. Ultrasound in Medicine & Biology 41, 2001–2021.

Gao, Z., Li, Y., Sun, Y., Yang, J., Xiong, H., Zhang, H., Liu, X., Wu, W., Liang, D., Li, S., 2018a. Motion tracking of the carotid artery wall from ultrasound image sequences: A nonlinear state-space approach. IEEE Transactions on Medical Imaging 37, 273–283.

Gao, Z., Liu, X., Qi, S., Wu, W., Hau, W.K., Zhang, H., 2017a. Automatic segmentation of coronary tree in CT angiography images. International Journal of Adaptive Control and Signal Processing 31, 1–9.

Gao, Z., Xiong, H., Liu, X., Zhang, H., Ghista, D., Wu, W., Li, S., 2017b. Robust estimation of carotid artery wall motion using the elasticity-based state-space approach. Medical Image Analysis 37, 1–21.

Gao, Z., Zhang, H., Wang, D., Guo, M., Liu, H., Zhuang, L., Shi, P., 2018b. Robust recovery of myocardial kinematics using dual H∞ criteria. Multimedia Tools and Applications 77, 23043–23071.

Grabec, I., 2013. Autonomous learning derived from experimental modeling of physical laws. Neural Networks 41, 51–58.

Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J., 2017. LSTM: a search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28, 2222–2232.

Gupta, A., Efros, A.A., Hebert, M., 2010. Blocks world revisited: image understanding using qualitative geometry and mechanics, in: European Conference on Computer Vision (ECCV), pp. 482–496.

Gupta, A., Hebert, M., Kanade, T., Blei, D.M., . Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces, in: Annual Conference on Neural Information Processing Systems (NIPS).

Gupta, A., Satkin, S., Efros, A.A., Hebert, M., 2011. From 3D scene geometry to human workspace, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1961–1968.

Ho, H., Mithraratne, K., Hunter, P., 2013. Numerical simulation of blood flow in an anatomically-accurate cerebra. IEEE Transactions on Medical Imaging 32, 85–91.

van de Hoef, T.P., Meuwissen, M., Escaned, J., Davies, J.E., Siebes, M., Spaan, J.A.E., Piek, J.J., 2013. Fractional flow reserve as a surrogate for inducible myocardial ischaemia. Nature Reviews 10, 439–452.

Hwang, S.J., Adluru, N., Collins, M.D., Ravi, S.N., Bendlin, B.B., Johnson, S.C., Singh, V., 2016. Coupled harmonic bases for longitudinal characterization of brain networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2517–2525.

Ieracitano, C., Mammone, N., Bramanti, A., Hussain, A., Morabito, F.C., 2019. A convolutional neural network approach for classification of dementia stages based on 2D-spectral representation of EEG recordings. Neurocomputing 323, 96–107.

Itu, L., Rapaka, S., Passerini, T., Georgescu, B., Schwemmer, C., Schoebinger, M., Flohr, T., Sharma, P., Comaniciu, D., 2016. A machine-learning approach for computation of fractional flow reserve from coronary computed tomography. Journal of Applied Physiology 121, 42–52.

Jia, Z., Gallagher, A., Saxena, A., Chen, T., 2013. 3D-based reasoning with blocks, support, and stability, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8.

Jia, Z., Gallagher, A.C., Saxena, A., Chen, T., 2015. 3D reasoning from blocks to stability. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 905–918.

Karimpouli, S., Tahmasebi, P., 2019. Image-based velocity estimation of rock using convolutional neural networks. Neural Networks 111, 89–97.

Journal Pre-proof

Kingma, D.P., Ba, J.L., 2015. Adam: A method for stochastic optimization, in: International Conference on Learning Representations (ICLR).

re-

pro of

Kiri¸sli, H.A., Schaap, M., Metz, C., Dharampal, A.S., Meijboom, W.B., Papadopoulou, S.L., Dedic, A., Nieman, K., de Graaf, M.A., Meijs, M.F.L., Cramer, M.J., Broersen, A., Cetin, S., Eslami, A., Fl´orez-Valencia, L., ¨ uz, I., Shahzad, R., Lor, K.L., Matuszewski, B., Melki, I., Mohr, B., Oks¨ Wang, C., Kitslaar, P.H., Unal, G., Katouzian, A., Orkisz, M., Chen, C., ¨ Precioso, F., Najman, L., Masood, S., Unay, D., van Vliet, L., Moreno, R., Goldenberg, R., Vu¸cini, E., Krestin, G.P., Niessen, W.J., van Walsum, T., 2013. Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in computed tomography angiography. Medical Image Analysis 17, 859–876. Kohonen, T., 1984. Self-Organization and Associative Memory. Springer.

lP

Kyriazis, N., Oikonomidis, I., Argyros, A., 2011. Binding vision to physics based simulation: the case study of a bouncing ball, in: British Machine Vision Conference (BMVC), pp. 43.1–43.11. Lerer, A., Gross, S., Fergus, R., 2016. Learning physical intuition of block towers by example, in: International Conference on Machine Learning (ICML), pp. 430–438.

urn a

Levine, G.N., Bates, E.R., Blankenship, J.C., et al., 2016. 2015 ACC/AHA/SCAI focused update on primary percutaneous coronary intervention for patients with ST-elevation myocardial infarction: An update of the 2011 ACCF/AHA/SCAI guideline for percutaneous coronary intervention and the 2013 ACCF/AHA guideline for the management of ST-elevation myocardial infarction. Circulation 133, 1135–1147.

Jo

Li, Q., Gao, Z., Wang, Q., Xia, J., Zhang, H., Zhang, H., Liu, H., Li, S., 2018. Glioma segmentation with a unified algorithm in multimodal MRI images. IEEE Access 6, 9543–9553. Lin, H.H., Peng, S.L., Wu, J., Shih, T.Y., Chuang, K.S., Shih, C.T., 2017. A novel two-compartment model for calculating bone volume fractions and bone mineral densities from computed tomography images. IEEE Transactions on Medical Imaging 36, 1094–1105. 29

Liu, X., Wang, Y., Zhang, H., Yin, Y., Cao, K., Gao, Z., Liu, H., Hau, W.K., Gao, L., Chen, Y., Cao, F., Huang, W., 2019. Evaluation of fractional flow reserve in patients with stable angina: can CT compete with angiography? European Radiology.

Liu, X., Zhang, H., Ren, L., Xiong, H., Gao, Z., Xu, P., Huang, W., Wu, W., 2016. Functional assessment of the stenotic carotid artery by CFD-based pressure gradient evaluation. American Journal of Physiology. Heart and Circulatory Physiology 311, H645–H653.

Lu, M.T., Ferencik, M., Roberts, R.S., Lee, K.L., Ivanov, A., Adami, E., Mark, D.B., Jaffer, F.A., Leipsic, J.A., Douglas, P.S., Hoffmann, U., 2017. Noninvasive FFR derived from coronary CT angiography: management and outcomes in the PROMISE trial. JACC: Cardiovascular Imaging.

Mammone, N., Ieracitano, C., Adeli, H., Bramanti, A., Morabito, F.C., 2018. Permutation jaccard distance-based hierarchical clustering to estimate EEG network density modifications in MCI subjects. IEEE Transactions on Neural Networks and Learning Systems 29, 5122–5135.

Mammone, N., Salvo, S.D., Bonanno, L., Ieracitano, C., Marino, S., Marra, A., Bramanti, A., Morabito, F.C., 2019. Brain network analysis of compressive sensed high-density EEG signals in AD and MCI subjects. IEEE Transactions on Industrial Informatics 15, 527–536.

McIntyre, J., Zago, M., Berthoz, A., Lacquaniti, F., 2001. Does the brain model Newton’s laws? Nature Neuroscience 4, 693–694.

Miyoshi, T., Osawa, K., Ito, H., Kanazawa, S., Kimura, T., Shiomi, H., Kuribayashi, S., Jinzaki, M., Kawamura, A., Bezerra, H., Achenbach, S., Nørgaard, B.L., 2015. Non-invasive computed fractional flow reserve from computed tomography (CT) for diagnosing coronary artery disease: Japanese results from NXT trial (analysis of coronary blood flow using CT angiography: Next steps). Circulation Journal 79, 406–412.

Mottaghi, R., Bagherinezhad, H., Rastegari, M., Farhadi, A., 2016. Newtonian image understanding: Unfolding the dynamics of objects in static images, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3521–3529.

Muhammad, K., Hamza, R., Ahmad, J., Lloret, J., Wang, H., Baik, S.W., 2018a. Secure surveillance framework for IoT systems using probabilistic image encryption. IEEE Transactions on Industrial Informatics 14, 3679–3689.

Muhammad, K., Hussain, T., Baik, S.W., 2018b. Efficient CNN based summarization of surveillance videos for resource-constrained devices. Pattern Recognition Letters.

Muhammad, K., Sajjad, M., Mehmood, I., Rho, S., Baik, S.W., 2016. A novel magic LSB substitution method (M-LSB-SM) using multi-level encryption and achromatic component of an image. Multimedia Tools and Applications 75, 14867–14893.

Murphy, K.P., 2012. Machine Learning: A Probabilistic Perspective. The MIT Press.

Nie, D., Wang, L., Adeli, E., Lao, C., Lin, W., Shen, D., 2019. 3-D fully convolutional networks for multimodal isointense infant brain image segmentation. IEEE Transactions on Cybernetics 49, 1123–1136.

Nusseck, M., Fleming, R., Lagarde, J., Bardy, B., Bülthoff, H.H., 2007. Perception and prediction of simple object interactions, in: 4th Symposium on Applied Perception in Graphics and Visualization (APGV), pp. 27–34.

Pijls, N.H.J., Gelder, B.V., Voort, P.V.D., Peels, K., Bracke, F.A.L.E., Bonnier, H.J.R.M., Gamal, M.I.H.E., 1995. Fractional flow reserve: a useful index to evaluate the influence of an epicardial coronary stenosis on myocardial blood flow. Circulation 92, 3183–3193.

Pijls, N.H.J., Sels, J.E.M., 2012. Functional measurement of coronary stenosis. Journal of the American College of Cardiology 59, 1045–1057.

Qian, Y., Gong, M., Yang, Y.H., 2017. Stereo-based 3D reconstruction of dynamic fluid surfaces by global optimization, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1269–1278.

Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 234–241.

Scholkopf, B., Smola, A.J., 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.

Seike, F., Uetani, T., Nishimura, K., Kawakami, H., Higashi, H., Aono, J., Nagai, T., Inoue, K., Suzuki, J., Kawakami, H., Okura, T., Yasuda, K., Higaki, J., Ikeda, S., 2017. Intracoronary optical coherence tomography-derived virtual fractional flow reserve for the assessment of coronary artery disease. The American Journal of Cardiology 120, 1772–1779.

Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., de Freitas, N., 2016. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE 104, 148–175.

Shin, J.Y., Tajbakhsh, N., Hurst, R.T., Kendall, C.B., Liang, J., 2016. Automating carotid intima-media thickness video interpretation with convolutional neural networks, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2526–2535.

Silberman, N., Hoiem, D., Kohli, P., Fergus, R., 2012. Indoor segmentation and support inference from RGBD images, in: European Conference on Computer Vision (ECCV), pp. 746–760.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1929–1958.

Stewart, R., Ermon, S., 2017. Label-free supervision of neural networks with physics and domain knowledge, in: AAAI Conference on Artificial Intelligence (AAAI), pp. 2576–2582.

Su, S., Hu, Z., Lin, Q., Hau, W.K., Gao, Z., Zhang, H., 2017. An artificial neural network method for lumen and media-adventitia border detection in IVUS. Computerized Medical Imaging and Graphics 57, 29–39.

Subbanna, N., Precup, D., Arbel, T., 2014. Iterative multilevel MRF leveraging context and voxel information for brain tumour segmentation in MRI, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 400–405.

Tang, Z., Yap, P.T., Shen, D., 2019. A new multi-atlas registration framework for multimodal pathological images using conventional monomodal normal atlases. IEEE Transactions on Image Processing 28, 2293–2304.

Toth, G.G., Johnson, N.P., Jeremias, A., Pellicano, M., Vranckx, P., Fearon, W.F., Barbato, E., Kern, M.J., Pijls, N.H.J., Bruyne, B.D., 2016. Standardization of fractional flow reserve measurements. Journal of the American College of Cardiology 68, 742–753.

Tseng, K.L., Lin, Y.L., Hsu, W., Huang, C.Y., 2017. Joint sequence learning and cross-modality convolution for 3D biomedical segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6393–6400.

Tu, S., Barbato, E., Köszegi, Z., Yang, J., Sun, Z., Holm, N.R., Tar, B., Li, Y., Rusinaru, D., Wijns, W., Reiber, J.H., 2014. Fractional flow reserve calculation from 3-dimensional quantitative coronary angiography and TIMI frame count: a fast computer model to quantify the functional significance of moderately obstructed coronary arteries. JACC: Cardiovascular Interventions 7, 768–777.

Ullman, T., Stuhlmüller, A., Goodman, N., Tenenbaum, J., 2014. Learning physical theories from dynamical scenes, in: Annual Meeting of the Cognitive Science Society (CogSci), pp. 1640–1645.

Westerhof, N., Lankhaar, J.W., Westerhof, B.E., 2009. The arterial Windkessel. Medical & Biological Engineering & Computing 47, 131–141.

Wu, J., Lim, J., Zhang, H., Tenenbaum, J., Freeman, W., 2016. Physics 101: learning physical object properties from unlabeled videos, in: British Machine Vision Conference (BMVC), pp. 39.1–39.12.

Wu, J., Yildirim, I., Lim, J.J., Freeman, B., Tenenbaum, J., 2015. Galileo: perceiving physical object properties by integrating a physics engine with deep learning, in: Annual Conference on Neural Information Processing Systems (NIPS), pp. 1–9.

Xiao, N., Alastruey, J., Figueroa, C.A., 2014. A systematic comparison between 1-D and 3-D hemodynamics in compliant arterial models. International Journal for Numerical Methods in Biomedical Engineering 30, 204–231.

Xu, C., Xu, L., Gao, Z., Zhao, S., Zhang, H., Zhang, Y., Du, X., Zhao, S., Ghista, D., Liu, H., Li, S., 2018. Direct delineation of myocardial infarction without contrast agents using a joint motion feature learning architecture. Medical Image Analysis 50, 82–94.

Yu, L.F., Duncan, N., Yeung, S.K., 2015. Fill and transfer: A simple physics-based approach for containability reasoning, in: IEEE International Conference on Computer Vision (ICCV), pp. 711–719.

Yu, R., Qiao, L., Chen, M., Lee, S.W., Fei, X., Shen, D., 2019. Weighted graph regularized sparse brain network construction for MCI identification. Pattern Recognition 90, 220–231.

Zafar, H., Sharif, F., Leahy, M.J., 2014. Feasibility of intracoronary frequency domain optical coherence tomography derived fractional flow reserve for the assessment of coronary artery stenosis. International Heart Journal 55, 307–311.

Zago, M., McIntyre, J., Senot, P., Lacquaniti, F., 2009. Visuo-motor coordination and internal models for object interception. Experimental Brain Research 192, 571–604.

Zhang, J.M., Zhong, L., Su, B., Wan, M., Yap, J.S., Tham, J.P.L., Chua, L.P., Ghista, D.N., Tan, R.S., 2014. Perspective on CFD studies of coronary artery disease lesions and hemodynamics: a review. International Journal for Numerical Methods in Biomedical Engineering 30, 659–680.

Zhang, R., Wu, J., Zhang, C., Freeman, W.T., Tenenbaum, J.B., 2016. A comparative evaluation of approximate probabilistic simulation and deep neural networks as accounts of human physical scene understanding, in: Annual Meeting of the Cognitive Science Society (CogSci), pp. 1781–1786.

Zhang, Y.D., Hou, X.X., Chen, Y., Chen, H., Yang, M., Yang, J., Wang, S.H., 2018a. Voxelwise detection of cerebral microbleed in CADASIL patients by leaky rectified linear unit and early stopping. Multimedia Tools and Applications 77, 21825–21845.

Zhang, Y.D., Pan, C., Sun, J., Tang, C., 2018b. Multiple sclerosis identification by convolutional neural network with dropout and parametric relu. Journal of Computational Science 28, 1–10.

Zhang, Y.D., Zhang, Y., Hou, X.X., Chen, H., Wang, S.H., 2018c. Seven-layer deep neural network based on sparse autoencoder for voxelwise detection of cerebral microbleed. Multimedia Tools and Applications 77, 1–18.

Zhang, Z., Xie, Y., Xing, F., McGough, M., Yang, L., 2017. MDNet: a semantically and visually interpretable medical image diagnosis network, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6428–6436.

Zhao, S., Gao, Z., Zhang, H., Xie, Y., Luo, J., Ghista, D., Wei, Z., Bi, X., Xiong, H., Xu, C., Li, S., 2018. Robust segmentation of intima-media borders with different morphologies and dynamics during the cardiac cycle. IEEE Journal of Biomedical and Health Informatics 22, 1571–1582.

Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C., 2013. Beyond point clouds: scene understanding by reasoning geometry and physics, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3127–3134.

Zhou, L., Wang, L., Liu, L., Ogunbona, P., Shen, D., 2013. Discriminative brain effective connectivity analysis for Alzheimer’s disease: a kernel learning approach upon sparse Gaussian Bayesian network, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2243–2250.

Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M., Liang, J., 2017. Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7340–7351.

Zhu, X., Suk, H.I., Shen, D., 2014. Matrix-similarity based loss function and feature selection for Alzheimer’s disease diagnosis, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3089–3096.

Zhu, X., Yao, J., Zhu, F., Huang, J., 2017. WSISA: making survival prediction from whole slide histopathological images, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7234–7242.

Conflict of Interest Statement

The authors declare that there is no conflict of interest.

Sincerely yours,

Prof. Heye Zhang (On behalf of all the authors)