Medical Image Analysis 26 (2015) 108–119
Fully automated 2D–3D registration and verification

Andreas Varnavas a,∗, Tom Carrell a,b, Graeme Penney a

a Department of Biomedical Engineering, Division of Imaging Sciences and Biomedical Engineering, King’s College London, King’s Health Partners, St. Thomas’ Hospital, London, UK
b Department of Vascular Surgery, Guy’s and St Thomas’ NHS Foundation Trust, King’s Health Partners, London, UK
Article history: Received 22 September 2014; Revised 17 July 2015; Accepted 20 August 2015; Available online 2 September 2015

Keywords: 2D–3D registration; Registration verification; Hough transform
Abstract

Clinical application of 2D–3D registration technology often requires a significant amount of human interaction during initialisation and result verification. This is one of the main barriers to more widespread clinical use of this technology. We propose novel techniques for automated initial pose estimation of the 3D data and verification of the registration result, and show how these techniques can be combined to enable fully automated 2D–3D registration, particularly in the case of a vertebra based system. The initialisation method is based on preoperative computation of 2D templates over a wide range of 3D poses. These templates are used to apply the Generalised Hough Transform to the intraoperative 2D image, and the sought 3D pose is selected through the combined use of the generated accumulator arrays and a Gradient Difference Similarity Measure. On the verification side, two algorithms are proposed: one using normalised features based on the similarity value, the other based on the pose agreement between multiple vertebra based registrations. The proposed methods are employed here for CT to fluoroscopy registration and are trained and tested with data from 31 clinical procedures with 417 low dose (i.e. low quality, high noise) interventional fluoroscopy images. When similarity value based verification is used, the fully automated system achieves a 95.73% correct registration rate, whereas a “no registration” result is produced for the remaining 4.27% of cases (i.e. the incorrect registration rate is 0%). The system also automatically detects input images outside its operating range.

© 2015 Elsevier B.V. All rights reserved.
1. Introduction

2D–3D registration can provide critical guidance in a wide range of clinical procedures such as Image Guided Radiotherapy, Minimally Invasive Surgery, Interventional Radiology and Image Guided Endoscopy (Markelj et al. (2012)). However, as discussed by Chintalapani and Chinnadurai (2012): although many 2D–3D registration algorithms have been published, their use in actual clinical settings is very limited due to “their complexity, context sensitivity and user interactivity”. This paper addresses this last factor, “user interactivity”, and proposes methods to enable fully automatic registration. In a typical registration algorithm user interaction is required for:

1. initial pose estimation (before registration), and
2. result verification (after registration).

The aim of initial pose estimation is to provide the registration algorithm with a starting position within its capture range. The aim of result verification is to ensure that only correct results are presented
∗ Corresponding author. E-mail address: [email protected] (A. Varnavas).
http://dx.doi.org/10.1016/j.media.2015.08.005
to the clinicians. For routine clinical use it is highly desirable for both these processes to require little, or preferably no, user interaction. Although a large number of 2D–3D registration algorithms have been proposed (Markelj et al. (2012)), very little attention has been paid to reducing their required level of user interaction. Concerning initial pose estimation, the typical approach in the literature is the use of manually selected anatomical landmarks (Hamadeh et al. (1998); Penney et al. (2011); Weese et al. (1997)). Gong et al. (2013) propose the use of a tool which is virtually inserted in the preoperative CT scan and is placed in a similar pose close to the patient intraoperatively. A similar concept is proposed by Varnavas et al. (2013), where a virtual fiducial marker, present in the intraoperative scene, is semi-automatically inserted in the preoperative CT. In both methods the tool or the marker needs to be identified manually in the 2D images by a user. Other methods propose the use of fiducial markers placed intraoperatively to relate different images to each other (Otake et al. (2012a)), or the use of previous registrations as future initial estimates in applications where the C-arm is not expected to move much (Gendrin et al. (2012)). To avoid the need for manual interaction, a number of methods have been proposed which aim to expand the capture range of 2D–3D registration so that no initialisation information is needed from a user. van der Bom et al. (2010) propose the use of a spectrum based
Fig. 1. Example fluoroscopy images from complex aortic aneurysm procedures. Note how similar vertebrae appear in (a), and how vertebral features can have low contrast (b) and high noise (c). Note also the regular presence of interventional instruments partially occluding the vertebrae.

Fig. 2. Overview of the entire automated pipeline for vertebra based CT to fluoroscopy registration.
similarity measure for this purpose and report a success rate of 68.6% for starting positions displaced by less than 21 mm. This, however, would be unacceptable for many clinical systems. Otake et al. (2012b) propose the use of a 2 Degrees of Freedom (DoF) search parallel to the fluoroscopy plane, prior to a covariance matrix adaptation evolution optimiser, to achieve a 100 mm × 180 mm in-plane capture range in 4.7 s. These characteristics make the method practical for clinical use, but only in cases when a good estimate for patient out-of-plane translation and C-arm orientation is available. In later work, Otake et al. (2013) extend this method with the use of a full 6 DoF global multi-start strategy in order to improve robustness against initialisation errors and anatomical deformation. The extended method still requires rotations to be known to within 10°, and computation times vary between 6.3 s and 54 s depending on initialisation accuracy. Concerning result verification, most reported work in the literature uses fiducial markers attached to phantoms or cadavers (Gendrin et al. (2011)). For clinical application, the use of accurate (i.e. bone implanted) fiducial markers purely for validation would not be ethically acceptable. Typically, with patient images, result verification is achieved by visual inspection (Hamadeh et al. (1998); Livyatan et al. (2003); Penney et al. (2011)). However, this manual process requires expertise and affects clinical workflow. We propose novel algorithms to enable fully automated 2D–3D registration, without requiring specialised hardware (e.g. tracking devices or high end fluoroscopy sets). The proposed registration pipeline relates to a vertebra based, CT to fluoroscopy, rigid registration system. Initialisation is based on the use of the Generalised Hough Transform (GHT) (Ballard (1981)), which is applied to the 2D fluoroscopy image to perform a full 6 DoF global search for a number of considered vertebrae.
A very wide capture (operating) range is employed (see Section 3) so no information about the translation or the orientation of the fluoroscopy set is required. The produced pose estimate for each vertebra is used as input to a 2D–3D registration algorithm (Penney et al. (2011)). The resultant registrations are automatically verified using two algorithms: the first uses similarity value features and is an adaptation of the algorithm proposed by Varnavas et al. (2013). The second uses geometric features and is based on the pose agreement between at least two vertebra based registrations. The proposed registration pipeline is evaluated using data from 31 complex aortic aneurysm procedures. The registrations were based on six (lower thoracic and lumbar) vertebrae. As seen in Fig. 1, vertebrae are highly similar objects and our aim is to robustly distinguish between them. This is a major difference between our method for initial pose estimation and the traditional use of Hough-like voting schemes in the literature, which typically aim to detect multiple objects of a generic type. Another major difference is the low feature, high noise nature of vertebrae in fluoroscopy when compared
to man-made objects under normal lighting conditions (see Fig. 1(b) and (c)). To overcome these challenges we propose the use of the GHT with patient, vertebra and pose specific templates rendered from the CT scans of the patients. This is the first time that such an approach has been used for pose estimation and specific vertebra identification in medical imaging.

2. Methods

Fig. 2 provides an overview of the entire fully automated 2D–3D registration pipeline. The methods are presented in the context of a system that performs vertebra based, rigid registration between intraoperative fluoroscopy and preoperative CT data. Boxes labelled A and B in Fig. 2 depict the two areas of algorithm development proposed in this paper: initial pose estimation (A) and result verification (B). These are fully described in Sections 2.1 and 2.2 respectively.

2.1. Initial pose estimation

The proposed initial pose estimation method is based upon the use of the GHT to perform a global search over a desired registration capture range. Our motivation for using the GHT, as opposed to a more standard intensity based similarity measure, is to reduce computational complexity. We refer to Appendix A for a high level comparison of the complexities of the two approaches. As shown in Fig. 2, the initial pose estimation takes as input a fluoroscopy image, a CT image (including sub-images containing single vertebrae) and a set of R-tables, which are precomputed for each vertebra from renderings of the CT data. Each R-table represents a different pose position, and the range of poses is selected to encompass the required capture range. After applying the GHT to the fluoroscopy image with respect to each R-table, the poses are evaluated using the produced accumulator arrays and a Gradient Difference Similarity Measure (GDSM) (Penney et al. (2001)), resulting in a pose estimate for each vertebra. We initially describe the preoperative processing and then the intraoperative algorithm.

2.1.1. Preoperative R-table computation

Fig. 3 shows the preoperative workflow. Input is the CT image, from which N subvolumes, each containing a single vertebra, are extracted. This is achieved by picking a single point in the centre of each vertebra and defining a mean subvolume size. M Digitally Reconstructed Radiographs (DRRs) are produced for each of the N vertebrae, picturing each vertebra under M corresponding rigid transformations. These transformations define the capture (operating) range of the algorithm, and so need to span the range of movements of the fluoroscopy set plus estimates for variations in patient position. Edge points are extracted from the DRRs using Phase Congruency (Kovesi (2003)) and are used to construct the GHT R-tables (Ballard (1981)). Each R-table contains vectors from each edge point to the centre of the corresponding DRR. The vectors are grouped with respect to each
point’s quantised local orientation. For this quantisation, bins which are φ = 30° wide are used.

Fig. 3. Preoperative construction of vertebra based R-tables, used in the GHT/GDSM Initial Pose Estimation.

Fig. 4. GHT/GDSM Initial Pose Estimation: the pipeline of processes for computing an initial estimate of the 3D pose.

2.1.2. Intraoperative pose aware vertebra detection

Fig. 4 shows the intraoperative pipeline. Let us note that the pipeline is shown for just a single vertebra; therefore, this pipeline needs to be carried out N times to obtain an initial pose for each vertebra. The pipeline begins by applying the GHT to the fluoroscopy image using the M preoperatively produced R-tables. Edge points are extracted from the fluoroscopy image, again using Phase Congruency (Kovesi (2003)). The GHT output is M accumulator arrays, one for each R-table. In the traditional use of the GHT, a single R-table is used to produce a single accumulator array, which is a two dimensional voting space of the size of the fluoroscopy image. Each edge point in the R-table casts a vote for each edge point in the fluoroscopy image with the same local orientation. The vote increases a counter in the accumulator array at a location indicated by the coordinates of the edge point in the image and the vector corresponding to the edge point in the R-table. The accumulator array is then thresholded to provide potential locations for the target object. Our accumulator based selection algorithm aims to locate the single best location from multiple (M) accumulator arrays. Simply finding the global maximum would not work, as this would bias the results towards arrays produced with R-tables from DRRs which contained more edge features. Our solution is to normalise the maximum of each accumulator array, I1, using the values of the next five local maxima¹, Ii, i = 2, …, 6. A neighbourhood around each chosen Ii value was excluded from subsequent searches, to ensure that selected local maxima are at least a minimum distance apart. The M accumulator arrays were then ranked based on the following normalised metric:

V = I1 / (I2 + I3 + I4 + I5 + I6)    (1)

¹ The values of the normalising maxima are typically of the same order, so the formula is not sensitive to the exact number of maxima used.
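The R-table voting and the normalised ranking metric can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function names, the exclusion radius and the vote bookkeeping are assumptions based on the description in the text.

```python
import numpy as np
from collections import defaultdict

PHI_BIN = 30  # orientation quantisation bin width in degrees (phi = 30 deg in the text)

def build_r_table(edge_points, orientations, centre):
    """R-table: vectors from each DRR edge point to the DRR centre,
    grouped by the edge point's quantised local orientation."""
    r_table = defaultdict(list)
    for (x, y), theta in zip(edge_points, orientations):
        r_table[int(theta) % 360 // PHI_BIN].append((centre[0] - x, centre[1] - y))
    return r_table

def ght_accumulate(r_table, fluoro_edges, fluoro_orients, shape):
    """Each fluoroscopy edge point votes with every R-table vector sharing
    its orientation bin; the accumulator has the size of the fluoroscopy image."""
    acc = np.zeros(shape, dtype=np.int32)
    for (x, y), theta in zip(fluoro_edges, fluoro_orients):
        for dx, dy in r_table[int(theta) % 360 // PHI_BIN]:
            cx, cy = x + dx, y + dy
            if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                acc[cy, cx] += 1
    return acc

def normalised_peak(acc, exclusion=10, n_maxima=6):
    """Metric V of Eq. (1): the global maximum I1 divided by the sum of the
    next five local maxima, each found after excluding a neighbourhood
    around previously chosen maxima."""
    work = acc.astype(float).copy()
    maxima = []
    for _ in range(n_maxima):
        iy, ix = np.unravel_index(np.argmax(work), work.shape)
        maxima.append(work[iy, ix])
        work[max(0, iy - exclusion):iy + exclusion + 1,
             max(0, ix - exclusion):ix + exclusion + 1] = -np.inf
    return maxima[0] / sum(maxima[1:])
```

Each of the M accumulator arrays would be scored with `normalised_peak` and the arrays ranked by the resulting V.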
For most fluoroscopy images the 3D pose associated with the highest rank (largest value of V) is within the capture range of the subsequent 2D–3D registration algorithm. However, the method is not as robust as using a specialised 2D–3D similarity measure. Therefore, to improve accumulator array selection we adapt the method as follows: the accumulator arrays are ranked according to metric V, and the poses associated with the top K = 100 ranked arrays are each used to produce a DRR². Each DRR is then compared to the fluoroscopy image using the Gradient Difference Similarity Measure (GDSM) (Penney et al. (2001)). The pose producing the maximum similarity value is chosen as the initial pose for the 2D–3D registration algorithm. Using this method the GHT, which requires much less computation than the intensity based similarity measure, is used to filter a large number (M) of possible poses down to a small number (K) of highly likely poses. Then the more robust similarity measure determines the best pose from the K candidates. As shown in Fig. 2 (overview of the entire 2D–3D registration pipeline), the output of the initial pose estimation is a single best pose for each of the N vertebrae. These are then used as starting positions for a 2D–3D registration algorithm which outputs N updated (registered) pose positions, one for each vertebra. These pose positions each need to be checked for registration accuracy, which is the subject of the second main area of algorithm development proposed in this paper, result verification, described in Section 2.2. Let us briefly mention here a few details about the 2D–3D registration algorithm used (Penney et al. (2011)). This algorithm first uses a global search over a specified range of rigid transformations which defines its capture range (represented by the dotted lines in Fig. 6). The position found by the global search is then further optimised with the use of a hill climbing search strategy.
² The value of K = 100 produces a set of top ranked poses where at least one of them is within the capture range of the subsequent 2D–3D registration algorithm approximately 99% of the time. See Section 4.1.1 for further details.

Both searches use the same Gradient Difference Similarity Measure presented in Penney et al. (2001).

2.2. Result verification

Result verification is critical for any registration algorithm which is less than 100% robust. Our proposed registration pipeline (Fig. 2) is guaranteed to produce incorrect registrations for all of the N vertebrae not visible in the fluoroscopy image. Even when corresponding vertebrae are present, misregistrations still occur (1.8% reported by Penney et al. (2011)). This section presents three methods to detect and remove misregistrations, to enable automatic selection of a correct registration as output from the registration pipeline. The first method is based on similarity value features and has been presented in earlier work (Varnavas et al. (2013)). For completeness, an overview is presented in Section 2.2.1. The second method is based on features describing the geometric difference between different vertebra based registrations and is presented in Section 2.2.2. Both methods produce two features. Registrations to a set of training data (with manually verified registrations) are used to populate a feature space and define a classification rule separating correct and incorrect registrations. The third method, presented in Section 2.2.3, combines the two classification rules of the previous methods in a way that enhances the detection of incorrect registrations. This method can also be incorporated into the registration pipeline in a computationally efficient way. Often more than one vertebra will be successfully registered to a fluoroscopy image. In such situations the successfully registered
vertebra closest to the centre of the image is chosen, and the pose of this vertebra is the overall pipeline output (final pose).

2.2.1. Similarity value based verification

The first verification method uses two features produced from the GDSM. These features are produced by applying the GDSM to an area of interest in the fluoroscopy image, automatically calculated by projecting a bounding box of the candidate vertebra from CT into the fluoroscopy. The first feature is the value of the GDSM (registration value) at the registration position. The registration value provides a good indication of a successful registration, but it is affected by a number of factors. Varnavas et al. (2013) describe methods to reduce the effect that variations in the image resolution and the gradient value distribution have on the registration value. However, the registration value on its own does not enable good separation between correct and incorrect registrations. This is because the registration value can be greatly affected by the presence of interventional instruments in the area of interest. Conversely, due to the similar shape of vertebrae, a reasonably high registration value can be obtained when an incorrect vertebra is matched. To account for these factors a second, normalisation value is calculated as follows: the registration process is repeated on the same fluoroscopy area of interest, but using different (and therefore probably incorrect) CT vertebrae. For the experiments carried out in this paper the T = 4 closest vertebrae to the candidate vertebra were used. The initial pose for these registrations is computed by adapting the initial pose which was used for the registration under verification, to account for the translation between the candidate and the new CT vertebra. The normalisation value is set to the best (i.e. largest) GDSM value from registering the T closest vertebrae. The second verification feature, named the similarity ratio, is defined as:
similarity ratio = registration value / normalisation value    (2)
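A minimal sketch of the two features follows. The gradient difference form below only follows the spirit of Penney et al. (2001): the published measure's exact scaling and normalising constants are not reproduced here, and all names are illustrative.

```python
import numpy as np

def gdsm(fluoro, drr):
    """Illustrative gradient difference style similarity: rewards pixels where
    the vertical/horizontal image gradients of fluoroscopy and DRR agree.
    Using the gradient variances as the constants a_v, a_h is an assumption."""
    dfy, dfx = np.gradient(fluoro.astype(float))
    ddy, ddx = np.gradient(drr.astype(float))
    a_v, a_h = np.var(dfy), np.var(dfx)
    return (np.sum(a_v / (a_v + (dfy - ddy) ** 2)) +
            np.sum(a_h / (a_h + (dfx - ddx) ** 2)))

def similarity_ratio(registration_value, neighbour_values):
    """Eq. (2): the GDSM value at the registration position divided by the best
    GDSM value obtained when registering the T (= 4 in the paper) nearest,
    deliberately wrong, CT vertebrae to the same area of interest."""
    return registration_value / max(neighbour_values)
```

A correct registration is then expected to combine a high registration value with a similarity ratio comfortably above 1.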
The classification rule is computed by combining a Support Vector Machine (SVM) (Cortes and Vapnik (1995)), which is applied jointly on both features, with scalar thresholds, which are based on each feature individually. A linear SVM is first trained to produce a straight line separating correct from incorrect registrations. Let us note here that large similarity value features represent correct registrations. Therefore, under a linear separating rule, if any one feature is very large, a correct registration result can be returned, even if the value of the other feature is very small. Since we are particularly interested in rejecting incorrect registrations (i.e. minimising False Positives), single feature classification rules were additionally produced, to enforce that both features must be sufficiently large to return a correct registration result. These rules were defined using a standard method for non-parametric outlier rejection: when a feature takes a value lower than the lower quartile minus the inter-quartile range, the corresponding registration is classified as incorrect. Therefore, a registration result must pass the two single feature rules and the SVM rule to be classified as correct.

2.2.2. Inter-vertebrae pose agreement based verification

The second method for automatic result verification makes use of the multiple independent registrations to different vertebrae. Correct registrations should give rise to similar poses for the CT data, as intervertebral movement, for patients lying in a similar position, is only a few degrees/mm (especially between adjacent vertebrae). Therefore, features based on the difference between poses produced by different vertebrae can be used to discriminate between correct and incorrect registrations. Two features are used. The first feature is a measure of the minimum rotational difference between the candidate pose of a given vertebra and the poses found for all other vertebrae. Using the N pose positions
found, Pi = (θx, θy, θz, X, Y, Z)i, i = 1, …, N, the minimum rotational difference is defined as:
mRDi = min_j max(|θx_i − θx_j|, |θy_i − θy_j|, |θz_i − θz_j|),  j = 1, …, N, j ≠ i    (3)
The second feature is a measure of the minimum translational difference between the candidate vertebra pose and the poses of all other vertebrae. Translation is measured using a reprojection distance dreproj(Pi, Pj, x), which is the minimum distance between a 3D position x and a back projection line calculated as follows: we first project x into 2D using the projection matrix corresponding to the candidate vertebra pose Pi, to get x2D, and then back project from x2D using the projection matrix corresponding to the pose Pj of a different vertebra. The minimum translational difference feature mTD is calculated using the mean reprojection distance of 8 points, xk, k = 1, …, 8, which form a bounding box around the vertebra whose pose is to be verified. The vertebra position is compared with all the other vertebrae poses and the minimum distance is calculated:
mTDi = min_j (1/8) Σ_{k=1…8} dreproj(Pi, Pj, xik),  j = 1, …, N, j ≠ i    (4)
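The two geometric features can be computed directly from the registered poses and their 3 × 4 projection matrices. A sketch, under the assumption that each pose supplies its projection matrix and bounding box corners (all names are illustrative):

```python
import numpy as np

def min_rotational_difference(poses, i):
    """Eq. (3): min over j != i of max(|d theta_x|, |d theta_y|, |d theta_z|),
    where each pose is (theta_x, theta_y, theta_z, X, Y, Z)."""
    ri = np.asarray(poses[i][:3], dtype=float)
    return min(np.max(np.abs(ri - np.asarray(p[:3], dtype=float)))
               for j, p in enumerate(poses) if j != i)

def reproj_distance(P_i, P_j, x):
    """d_reproj(P_i, P_j, x): project 3D point x with the 3x4 matrix P_i,
    back project the resulting 2D point under P_j, and return the distance
    from x to that back projection line."""
    u = P_i @ np.append(x, 1.0)
    x2d = u[:2] / u[2]
    M, p4 = P_j[:, :3], P_j[:, 3]
    centre = -np.linalg.solve(M, p4)             # camera centre: M c + p4 = 0
    d = np.linalg.solve(M, np.append(x2d, 1.0))  # back projection ray direction
    d /= np.linalg.norm(d)
    v = x - centre
    return np.linalg.norm(v - np.dot(v, d) * d)  # point-to-line distance

def min_translational_difference(proj_mats, corners, i):
    """Eq. (4): min over j != i of the mean reprojection distance of the
    8 bounding box corners of vertebra i."""
    return min(np.mean([reproj_distance(proj_mats[i], proj_mats[j], x)
                        for x in corners[i]])
               for j in range(len(proj_mats)) if j != i)
```

When two poses agree, the back projection line under Pj passes close to each corner point, so both features approach zero.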
A necessary condition for these pose agreement features to be able to detect correct registrations is that at least two of the N vertebrae have been registered correctly. Randomly coinciding wrong registrations will create False Positives, but the probability of such cases is expected to be small given the size of the six dimensional space of extrinsic parameters. The classification rule using the two features is computed using a linear SVM applied jointly on both features. Let us note here that, in contrast to the similarity value based features, the pose agreement based features are bounded by the classification rule and the coordinate axes for the subspace of correct registrations. This means that both features must be close to zero to return a correct registration result. Therefore, in contrast to Section 2.2.1, no additional single feature based rules were employed.

2.2.3. Combined verification scheme

The two verification methods proposed in the previous sections are to a large degree independent, and they can therefore be combined to enhance the classification results. As mentioned in Section 2.2.1, we are particularly interested in minimising the False Positive Rate, as it is of primary importance to avoid showing wrong registrations to clinicians. For this reason we propose a combined verification scheme which classifies a registration as correct only if it is classified as such by both methods of Sections 2.2.1 and 2.2.2. If a registration is classified as incorrect by at least one of the two methods, the combined scheme classifies it as incorrect. This classification scheme, apart from enhancing the detection of incorrect registrations, can be incorporated into the registration pipeline in a computationally efficient way. The similarity value based verification has an increased computational cost, as it requires additional registrations to be produced.
On the other hand, the pose agreement based verification is computationally inexpensive, as it requires only the computation of two simple metrics comparing vertebra poses which have already been produced. We propose incorporating the combined scheme into the registration pipeline in two steps: first, the produced registration for each vertebra is checked with the pose agreement verification and the set of registrations classified as correct is kept. Then the registration corresponding to the vertebra closest to the centre of the image is checked with the similarity value based verification. The result of this check is the result of the
complete registration pipeline. In this way the combined verification scheme achieves a computational reduction by applying the expensive similarity value based verification to only one of the produced registrations.

3. Experimental methods

Experiments were carried out using real patient data and “gold standard” data of a cadaveric lumbar spine (Tomaževič et al. (2002)). The real patient data were used to evaluate the performance of: the initial pose estimation (Section 3.1); the result verification (Section 3.2); and the entire fully automated registration pipeline (Section 3.3). The “gold standard” data were also used to evaluate the performance of the fully automated registration pipeline (Section 3.3) and to quantify the ability of visual inspection to discriminate between correct and incorrect 2D–3D registrations (Section 3.4). The real patient data were acquired from 31 operations over two periods of one year, following requests by the clinicians to run the presented or earlier (Penney et al. (2011)) versions of the registration system intraoperatively. They consist of 31 preoperative CT scans and 417 intraoperative fluoroscopy (nearly all low dose screening) images, acquired at approximately equal intervals during the operations or upon the clinicians’ requests. The dimensions of the fluoroscopy images were 480 × 480 pixels, and the fluoroscopy set had four magnification settings with the pixel size varying between 0.308 and 0.744 mm. For each fluoroscopy image the magnification setting and orientation of the fluoroscopy set were recorded. This information was used to estimate the intrinsic parameters, and to determine whether the image was within the operating range o of the registration algorithm. The preoperative CT data came from a variety of scanners with an average and maximum slice thickness of 0.94 mm and 1 mm respectively. All the fluoroscopy data were acquired using a Siemens Axiom Artis DTA system at St Thomas’ Hospital.
Use of these data was approved by the National Research Ethics Committee (09/H0707/64) with informed patient consent. The data were split into three groups:

• Group A: 19 preoperative CT scans; 224 intraoperative fluoroscopy images (within operating range).
• Group B: 12 preoperative CT scans; 164 intraoperative fluoroscopy images (within operating range).
• Group C: 29 intraoperative fluoroscopy images (outside operating range); CT scans contained in Groups A and B.
The initial pose estimation operating range o was defined as follows:

• The rotational range was defined about the axes of the CT scan, with the origin in the centre. The ranges about the axes perpendicular to the transverse, sagittal and coronal planes were set to [−40°, 40°], [−30°, 30°] and [−8°, 8°], respectively. Since intraoperatively the fluoroscopy plane is approximately parallel to the CT coronal plane under the Anterior-Posterior (AP) position of the C-arm, we refer to these rotations as lateral, cranial-caudal and in-plane, respectively.
• The out-of-plane translational range was set to [700 mm, 900 mm]. This range describes the distance between the C-arm source and the centroid of the CT scan.
• The in-plane translational range is defined over the entire fluoroscopy image and requires at least two vertebrae from T11 to L4 to be visible in it.
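As a sanity check, discretising the operating range above with the step sizes reported later in Section 3 (a 4° rotational step and a 50 mm out-of-plane translational step; in-plane translation needs no sampling because the GHT accumulator array searches it exhaustively) reproduces the stated number of R-tables:

```python
def n_samples(lo, hi, step):
    """Number of grid samples covering [lo, hi] inclusive with the given step."""
    return int(round((hi - lo) / step)) + 1

lateral      = n_samples(-40, 40, 4)    # 21 lateral rotations
cranial      = n_samples(-30, 30, 4)    # 16 cranial-caudal rotations
in_plane_rot = n_samples(-8, 8, 4)      # 5 in-plane rotations
depth        = n_samples(700, 900, 50)  # 5 out-of-plane translations

# R-tables per vertebra and per magnification setting: 21 * 16 * 5 * 5 = 8400
M = lateral * cranial * in_plane_rot * depth
```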
The operating range needs to encompass variations in patient position, and the range of movement of the fluoroscopy set during the operation. Variations in how reproducibly patients lie supine are expected to be small, approximately 3–4° (Penney et al. (2011)). Patient movement during procedures has been reported (Varnavas et al. (2013)) but is expected to be mostly translational. The operating
range used here is expected to cover movements of the fluoroscopy set for standard endovascular aortic repair. It is trivial to extend the operating range, but this will result in longer registration times. R-tables were produced for the six vertebrae from T11 to L4, as depicted in Fig. 3 and described in Section 2.1.1³. DRRs were produced to cover the operating range with a rotational step of 4° and an out-of-plane translational step of 50 mm. R-tables were produced for each of the four fluoroscopy magnification settings. This resulted in M = 8400 R-tables per vertebra and per magnification setting. A “ground truth” CT pose for each of the images in Groups A and B was obtained using the GDSM based 2D–3D registration algorithm (Penney et al. (2011)). The registration algorithm was initialised manually using the method described by Penney et al. (2011). The initialisation was completed in under a minute for each image, but since manual vertebra identification was required, knowledgeable input from an experienced user was needed. Registrations were carried out to the vertebra closest to the centre of the image and completed in 1.25 s in each case. The results were verified using visual inspection. An experiment to quantify the accuracy of this visual inspection process is described in Section 3.4. In addition to the real patient data, we also used the publicly available dataset presented in Tomaževič et al. (2003) to enable comparison of our results with other methods, and with an independently calculated “gold standard”. The dataset consists of a CT scan and 18 fluoroscopy images of a cadaveric lumbar spine (L1 to L5). The “gold standard” registrations have been calculated using fiducial markers (Tomaževič et al. (2002)). The dataset comes with 450 starting positions for the CT data (Tomaževič et al. (2003)), and eight points around each vertebra which are used to calculate the mean Target Registration Error (TRE).

3.1.
Experiments for initial pose estimation

The initial pose estimation method, described in Section 2.1, estimates the 3D pose of a single vertebra. The target vertebra for this set of experiments was the vertebra closest to the image centre for each of the 224 fluoroscopy images in Group A. For each image the target vertebra was manually identified, and the set of R-tables corresponding to the identified vertebra and the magnification setting of the fluoroscopy set were used as input to the pose estimation method. The difference between the found pose and the “ground truth” pose was calculated, and compared with the known capture range of the subsequent 2D–3D registration algorithm (Penney et al. (2011)).

3.2. Experiments for registration result verification

To evaluate the result verification algorithms (Section 2.2), the data in Group A were used for training and the data in Group B for testing purposes. For all the images in both groups the registration pipeline (see Fig. 2) was run up to the result verification stage (box B). This resulted in a large number of successful and unsuccessful registrations. The main cause of the latter is attempts to register a CT vertebra to a fluoroscopy image which does not contain the chosen vertebra. Each produced pose was labelled as correct or incorrect using visual inspection (comparing DRRs constructed at the produced pose position with the fluoroscopy). The produced poses in the training set were used to compute the classification rules as described in Sections 2.2.1, 2.2.2 and 2.2.3. Whenever an SVM was employed, a constraint parameter C = 1 was used. The classification performance of the produced rules is measured with respect to both the training and the testing set. For the training and testing of the similarity value based verification algorithm all data could be used.
³ For two patients in Group A, we used 5 out of the 6 vertebrae in the range T11–L4 for R-table production, as only 5 were in the field of view of the preoperative CT scans.

For the inter-vertebrae pose
Table 1
Number of 3D poses in the training and testing sets for the three verification algorithms.

                                      Training set (Group A)   Testing set (Group B)
Similarity value based verification   1325                     984
Pose agreement based verification     1266                     900
Combined verification scheme          1266                     900

Table 2
Number of Gold Standard data registrations within and outside operating range o, for three ranges of initial displacement from the gold standard position.

                            0–6 mm   6–12 mm   12–18 mm   Total
Within operating range o    2625     1371      429        4425
Outside operating range o   304      1634      1737       3675
agreement algorithm, only images where at least two correct registrations occurred were used, as this is the condition under which we want to train and test the algorithm. For the same reason, only these images were used for the combined verification scheme as well. Table 1 shows the total number of poses in the training and testing sets for each algorithm.

3.3. Experiments for fully automated registration

To evaluate the complete registration pipeline (Fig. 2), both the patient and the gold standard data were used. For the patient data experiments, images from Groups B and C were used. The entire registration pipeline was run on each of these images, each time using one of the three verification methods presented in Section 2.2. When the similarity value and the pose agreement based methods were used, they were applied to each of the vertebra-based poses produced at the output of the 2D–3D registration stage. When the combined verification scheme was used, it was applied in the way described in Section 2.2.3, i.e. first selecting the poses which pass the pose agreement method and then applying the similarity value based method to the pose corresponding to the vertebra closest to the centre of the image. When the pipeline produced a registration result, its correctness was manually checked using visual inspection. There are therefore three possible results of the registration pipeline:

• Correct registration: the pipeline produces a registration result and it is verified as correct by the visual inspection.
• Incorrect registration: the pipeline produces a registration result but the visual inspection shows that it is incorrect.
• No registration: the pipeline produces no result (i.e. no vertebra based registration is accepted by the result verification module).
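The three outcome categories above can be tallied with a small helper (our own sketch; the names are illustrative):

```python
from enum import Enum

class Outcome(Enum):
    CORRECT = "correct registration"
    INCORRECT = "incorrect registration"
    NONE = "no registration"

def pipeline_outcome(accepted_pose, visually_correct):
    """Map one pipeline run to the three reported outcome categories.
    `accepted_pose` is the verified pose or None; `visually_correct`
    is the expert inspection verdict (only meaningful when a pose
    was produced)."""
    if accepted_pose is None:
        return Outcome.NONE
    return Outcome.CORRECT if visually_correct else Outcome.INCORRECT
```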
The complete registration pipeline, with combined verification, was also evaluated using the gold standard dataset. For each of the 18 fluoroscopy images, registrations were started from the 450 provided start positions (8100 registrations in total), which cover a wide range of transformations (up to 51.6°) from the gold standard positions. In the original publication (Tomaževič et al., 2003), the starting positions are split into three ranges based on distance to the gold standard position (0–6 mm, 6–12 mm and 12–18 mm), computed as a mean TRE using the provided eight target points. These starting positions were separated into two sets: those which fell within the operating range (o, defined in Section 3) of our initial pose estimation algorithm, and those which were outside this range. Table 2 shows the number of registrations in each set. The main reason for starting positions falling outside the operating range (particularly in the 0–6 mm bracket) was a large in-plane rotation (patient rotation around an axis perpendicular to the bed for an anterior–posterior view). Large rotations around this axis have not been observed during our clinical procedures, and so our algorithm only searches between [−8°, 8°] for this parameter.

To test our algorithm with multiple starting positions, R-tables were produced to cover the complete range of starting positions, with the same rotational (4°) and translational (50 mm) step size as used with the patient data. For each individual starting position, a subset of these R-tables was extracted which just covered our method's operating range o. In addition, for each individual starting position, the 16 R-tables which represented the positions closest to the gold standard position were altered so that they were an integer multiple of the step size from the starting position. These 16 R-tables are expected to be the most critical for the initial pose identification, so we changed them to be identical to the ones we would produce if our operating range was sampled exactly about the given starting position. The complete registration pipeline (Fig. 2) was tested using the five lumbar vertebrae. As mentioned in Section 2.2, the final pose for our method is the pose of the registered vertebra closest to the centre of the image. We compared our final pose with ground truth by calculating the Target Registration Reprojection Error (TRRE) using the eight target points around L3 provided with the dataset.

3.4. Experiment for evaluation of visual verification

The aim of this experiment was to evaluate the size of registration error that could be robustly detected using expert visual inspection. The CT scan and an anterior–posterior fluoroscopy image from the gold standard dataset were used. DRRs were produced from the CT scan at poses close to the ground truth pose.
The mean reprojection error of each pose was calculated, and a set of 80 DRRs with error uniformly distributed between 0 and 4 mm was selected. Twenty DRRs produced at the gold standard position were added to this set, to produce a test data set of 100 DRRs. Each of these 100 DRRs was presented to the two experts who carried out the visual inspection verification to obtain the patient data "ground truth" CT poses, as described in Section 3. The DRRs were presented in random order, overlaid on the fluoroscopy image, and were classified by each expert as either "correct" (good registration) or "incorrect" (misregistration).

4. Results

4.1. Initial pose estimation performance

An adequate initial pose (i.e. within the capture range of the 2D–3D registration algorithm (Penney et al., 2011)) was produced for 211 out of 224 images in Group A. Therefore, our initialisation method has a 94.2% chance of successfully finding an initial 3D pose for a single vertebra. The distances between the 211 successfully found initial poses and the corresponding "ground truths" are illustrated in Fig. 6. The vast majority of estimated poses fall well within the capture range of the subsequent 2D–3D registration algorithm, which is indicated in Fig. 6 by the horizontal dotted lines.

The success of the initial pose estimation method lies in its ability to produce very high rankings for accumulator arrays which correspond to poses close to the registration position. To illustrate this, Fig. 5 shows a fluoroscopy image and the accumulator arrays of the finally selected poses for the L2 and L3 vertebrae. The global maximum for each vertebra is clearly visible, and correctly indicates the position of the vertebra in the image (i.e. the in-plane translation parameters). The distinctiveness of the global maximum, for two vertebrae which are partly occluded by interventional instruments in a noisy, low-dose fluoroscopy image, is indicative of the robustness of our method.
Note also that, although vertebrae have very similar shapes, their simultaneous presence in the fluoroscopy image does not create strong local maxima in the accumulator array: the method is able to identify specific vertebrae.

Let us also give some statistics on the "adequacy" of the poses indicated by the accumulator array ranking, in order to illustrate the contribution of further evaluating the top K poses with the GDSM. When only the top pose indicated by the accumulator array ranking using Eq. (1) is considered, it is within the capture range of the 2D–3D registration algorithm for ∼85% of the 224 images in Group A. This percentage increases to ∼95% and ∼99% when the top 10 and top 100 poses are considered respectively. Therefore, our choice of K = 100, combined with the efficiency of the GDSM for pose evaluation, significantly increases the chances of producing an adequate initial estimate.

Table 3
Similarity value based verification method results.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   734 (99.32%)    0 (0%)           586 (100%)      5 (0.68%)        1320 (99.62%)
Testing set    429 (93.67%)    0 (0%)           526 (100%)      29 (6.33%)       955 (97.05%)

Fig. 5. Fluoroscopy image (with ground truth rotations) and accumulator arrays corresponding to the estimated pose for the pictured vertebrae L2 and L3.

Fig. 6. Box and whisker plot of errors for each extrinsic parameter after the initial pose estimation, for the 211 successful cases in Group A. Box shows upper quartile, median and lower quartile; whiskers indicate percentile range 5–95%. Horizontal dotted lines indicate the capture range of the subsequent 2D–3D registration algorithm.
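The two-stage selection — a coarse ranking by accumulator score, followed by re-evaluation of the top K candidates with the similarity measure — can be sketched as follows, with `gdsm` a hypothetical callable standing in for the Gradient Difference Similarity Measure:

```python
import numpy as np

def select_initial_pose(acc_scores, poses, gdsm, K=100):
    """Rank candidate poses by accumulator-array score, then re-evaluate
    the top K with the (more expensive) similarity measure and return
    the best; higher gdsm values are assumed to be better."""
    top_k = np.argsort(acc_scores)[::-1][:K]         # best K by GHT votes
    best = max(top_k, key=lambda i: gdsm(poses[i]))  # refine with GDSM
    return poses[best]
```

Evaluating the similarity measure only for the K highest-voted candidates is what keeps the refinement affordable while raising the ∼85% single-candidate adequacy towards the ∼99% quoted above.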
Fig. 7. Scatter plot of similarity value based features for training (a) and testing (b) set.
4.2. Registration result verification performance
4.2.1. Results with similarity value based verification

Fig. 7 shows scatter plots of the similarity value based features using the training (Fig. 7(a)) and testing (Fig. 7(b)) data. The solid line shows the two-feature classification rule produced by the SVM; the dotted lines show the two single-feature classification rules. Let us note again that all classification rules were produced using just the training data (Group A). Table 3 shows a summary of the classification performance on the training (Group A) and testing (Group B) datasets respectively. In both cases the True Positive Rate is high (99.32% and 93.67% respectively), whereas the False Positive Rate is 0%. Let us note here that we use the terms (True/False) Positive and Negative for registrations which have been (correctly/incorrectly) classified as correct and incorrect by the proposed verification algorithms respectively.
4.2.2. Results with inter-vertebrae pose agreement based verification

Fig. 8 shows scatter plots of the pose agreement based features using the training (Fig. 8(a)) and testing (Fig. 8(b)) data. Table 4 summarises the classification performance. The separability of correct from incorrect poses is good, with correct poses of the same image being similar to within a translational and rotational tolerance of 5 mm and 5° respectively. On the other hand, most incorrect poses for a given image are very different to each other, resulting in an overall classification performance of 98.58% (training data) and 98.89% (testing data). However, a few incorrect poses can be seen to have resulted in good agreement with each other, producing some False Positive results: 3.35% (training data) and 1.75% (testing data).

Table 4
Pose agreement based verification method results.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   729 (100%)      18 (3.35%)       519 (96.65%)    0 (0%)           1248 (98.58%)
Testing set    442 (99.55%)    8 (1.75%)        448 (98.25%)    2 (0.45%)        890 (98.89%)

Fig. 8. Scatter plot of inter-vertebrae pose agreement based features for training (a) and testing (b) set.

4.2.3. Results comparing the verification methods

To compare the performance of the similarity value based and the inter-vertebrae pose agreement based verification methods, we present in Table 5 the 95% confidence intervals for the False Positive and False Negative Rates. These binomial confidence intervals were computed using the Clopper–Pearson method on the data of the testing sets. As these intervals do not overlap, they show that:

1. the similarity value based verification method is statistically significantly better than the pose agreement based method at correctly classifying incorrect registrations (fewer False Positives);
2. the pose agreement based verification method is statistically significantly better than the similarity value based method at correctly classifying correct registrations (fewer False Negatives).

Table 5
95% confidence intervals for the False Positive and False Negative Rates of the similarity value based and the pose agreement based verification methods (computed from the data in the testing sets).

                                      False Positive Rate   False Negative Rate
Similarity value based verification   0%–0.7%               4.28%–8.97%
Pose agreement based verification     0.76%–3.43%           0.05%–1.62%

Correctly identifying incorrect registrations is most important for medical applications; therefore the similarity value based algorithm is preferable. However, since the proposed registration pipeline produces a number of (vertebra based) registrations from which only one correct registration needs to be selected, the two methods can be combined in a computationally efficient way that retains the high specificity of the similarity value based algorithm, as described in Section 2.2.3.
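Clopper–Pearson intervals of this kind can be computed from the raw counts with the standard beta-quantile formulation (a sketch, assuming scipy is available):

```python
from scipy.stats import beta  # assumes scipy is available

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a
    binomial proportion with k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi
```

For example, the similarity value method produced 0 False Positives out of 526 incorrect registrations in the testing set (Table 3), for which this formulation gives an upper bound of about 0.7%.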
4.2.4. Results with combined verification scheme

Table 6 summarises the results for the verification scheme that combines the similarity value and the inter-vertebrae pose agreement based classification algorithms. Due to the way the algorithms are combined (both need to pass for a produced pose to be classified as correct), the False Positive Rate will be decreased, with a corresponding increase in the False Negative Rate. Note that a direct comparison can be made only between Tables 4 and 6, as it is in these cases that the underlying datasets are exactly the same. The combined verification scheme reduced the False Positive Rate to 0%, with an increase in the False Negative Rate to 0.69% and 5.41% in the training and testing sets respectively.

Table 6
Results for combined use of similarity value and pose agreement based verification methods.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   724 (99.31%)    0 (0%)           537 (100%)      5 (0.69%)        1261 (99.61%)
Testing set    420 (94.59%)    0 (0%)           456 (100%)      24 (5.41%)       876 (97.33%)

4.3. Automated registration performance

Table 7 summarises the performance of the automated registration pipeline using the test patient data set (i.e. Group B). All of these images are within the operating range of the algorithm, so a perfect result would be 100% correct registrations. Best performance is achieved with the similarity value based verification: 95.73% success, with 0% incorrect registrations classified as correct. A lower success rate of 89.63% is achieved with the pose agreement verification method, and now a few (3.66%) incorrect registrations are classified as being correct. Combined use of both classifiers results in a slightly lower overall success rate (87.8%) and 0% incorrect registrations. It should be noted that although the combined approach produces a slightly lower success rate, its time requirements, compared to similarity value based verification, are significantly reduced (see Section 4.5).

Table 7
Performance of the registration pipeline for images within the operating range of interest (Group B).

Type of verification used in the system   Correct registration   Incorrect registration   No registration
Similarity value based verification       157 (95.73%)           0 (0%)                   7 (4.27%)
Pose agreement based verification         147 (89.63%)           6 (3.66%)                11 (6.71%)
Combined verification                     144 (87.8%)            0 (0%)                   20 (12.2%)

The performance of the automated registration pipeline using images outside the operating range (i.e. Group C) is presented in Table 8. Correct registration is not feasible for these images, and so the desired output is a "no registration" result. Using the similarity value based verification or the combined verification, all registrations are correctly rejected, yielding 0% incorrect registrations. The pose agreement based verification has a slightly lower performance, with a 3.45% incorrect registration rate.

Table 8
Performance of the registration pipeline for images outside the operating range of interest (Group C).

Type of verification used in the system   Incorrect registration   No registration
Similarity value based verification       0 (0%)                   29 (100%)
Pose agreement based verification         1 (3.45%)                28 (96.55%)
Combined verification                     0 (0%)                   29 (100%)

The proposed registration pipeline, with combined verification, was used during complex aortic repairs of 12 patients in Group B. This enabled 3D renderings of the aorta and visceral vessels from CT to be overlaid onto the fluoroscopy during the procedure. We show in Fig. 9 four examples of such overlays. The images on the left are typical examples of the low dose fluoroscopy images that were used in this study. Note also the good agreement of the aortic overlays on the right with the underlying fluoroscopy images at anatomical points critical for these procedures. These points are the ostia of the visceral vessels, which agree well with the cannulating wires on the fluoroscopy images, indicating accurate registrations.

Fig. 9. Overlay examples: typical low dose fluoroscopy images (left) and CT rendered aortic overlays produced after registration (right). The arrows indicate the ostia of visceral vessels (key clinical landmarks), which agree well with the cannulating wires on the fluoroscopy images, indicating accurate registrations. Note: due to deformation, renal arteries do not agree with wire positions distal to the aorta.

Table 9 shows results using the gold standard data for the 4425 cases where the initial position was inside the operating range. A registration was defined to be successful if the Target Registration Reprojection Error (TRRE) for all eight target points was smaller than 2 mm. The results show very accurate registrations and a 100% success rate for all three starting position ranges. For the 3675 cases where the initial position was outside the operating range, a no registration result was produced for 3657 (99.5%) of them. Incorrect registrations were produced for the remaining 18
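The per-image success criterion (every one of the eight target points reprojecting within 2 mm) can be sketched as follows; the projection callables are hypothetical placeholders for a calibrated projection model:

```python
import numpy as np

def registration_successful(targets, project_est, project_gold, tol_mm=2.0):
    """Return whether all target points reproject within tol_mm of their
    gold standard reprojections (TRRE criterion), plus the per-point
    errors. `project_est`/`project_gold` map a 3D point (mm) to 2D
    detector coordinates (mm)."""
    errors = [float(np.linalg.norm(project_est(p) - project_gold(p)))
              for p in targets]
    return max(errors) < tol_mm, errors
```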
cases, which corresponded to strong local extrema due to the reflection symmetry of a vertebra in the sagittal plane. This results in DRRs produced at an angle ±x° from the true lateral view appearing similar (Penney (2000)). Such cases were not observed in our patient data, as lateral rotations larger than 90° are not generally used in clinical practice. These failures can be avoided in a clinical situation as long as the operating range of the algorithm is not extended to be close to true lateral. Our use of a global search strategy, which limits the search algorithm to clinically used positions, could help avoid such local extrema, which may be found using other search strategies.

Table 9
Target Registration Reprojection Error (TRRE) and rotation error (γ)ᵃ before and after 2D/3D registration, produced with the Gold Standard data, and percentage of successful registrations for three ranges of initial displacement from the gold standard position within the algorithm's operating range.

                      TRRE (mm)        γ (°)
                      RMS     Max      RMS     Max
Before registration   5.9     23.4     14.2    43.7
After registration    0.21    0.62     0.24    0.69

Successful registrations (%)
0–6 mm (γ < 16.7°)     100
6–12 mm (γ < 32.7°)    100
12–18 mm (γ < 43.7°)   100

ᵃ The rotation γ between two poses was computed by decomposing the pose difference into translation and rotation components, using the same approach as Tomaževič et al. (2003).

Fig. 10. Number of DRRs classified as "correct" and "incorrect" as a function of their distance from the gold standard position. Classification took place through visual inspection by two experts, (a) and (b).
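For reference, the decomposition behind a rotation error such as γ in Table 9 — splitting a relative rigid-body transform into a translation magnitude and a single rotation angle — can be sketched as follows (our own helper, in the spirit of the approach of Tomaževič et al. (2003)):

```python
import numpy as np

def pose_difference(T_a, T_b):
    """Decompose the relative pose inv(T_b) @ T_a into a translation
    magnitude (mm) and a rotation angle (degrees). T_a and T_b are
    4x4 rigid-body matrices."""
    D = np.linalg.inv(T_b) @ T_a
    t = float(np.linalg.norm(D[:3, 3]))
    # single rotation angle from the trace of the 3x3 rotation block
    cos_theta = np.clip((np.trace(D[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return t, float(np.degrees(np.arccos(cos_theta)))
```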
4.4. Evaluation of visual verification

Fig. 10 shows the results of the experiment to evaluate the size of registration error which could be robustly detected using visual inspection. Both experts correctly classified all 20 DRRs which had been produced at the gold standard position; therefore with this data visual inspection had a sensitivity of 100%. Only 5 of the 2 × 80 DRRs produced away from the gold standard position were classified as correct (specificity of 96.9%). The mean and maximum errors of these 5 incorrectly classified DRRs were 0.72 mm and 1.8 mm. All DRRs with an error larger than 2 mm were correctly classified. These results show our ability to discriminate between correct and incorrect registrations through visual inspection, and confirm the validity of using visual inspection to verify the ground truth used in the patient data experiments.

4.5. Time requirements for automated registration

To enable registration in a clinically acceptable time, a computer with two NVidia GTX 690 graphics cards was used. Each card contains two GPUs, so in total four GPUs were used simultaneously. We implemented a parallel version of the three computationally expensive parts of the registration pipeline: production of the accumulator arrays, DRR rendering and similarity value computation.

Table 10
Representative computation time (in seconds) for the three stages of the automated registration pipeline.

                                      Initial pose estimation   2D–3D registration   Result verification   Total
Similarity value based verification   13                        7.5                  30                    50.5
Pose agreement based verification     13                        7.5                  0                     20.5
Combined verification                 13                        7.5                  5                     25.5

Computation times for the three stages of the automated registration pipeline are summarised in Table 10. The exact computation time varies depending on the magnification setting; reported times correspond to an intermediate magnification. The initial pose estimation algorithm can produce the 8400 accumulator arrays corresponding to a single vertebra in 2 s; therefore, for all six vertebrae, the complete initialisation stage is achieved in 13 s. A single vertebra based 2D–3D registration is completed in 1.25 s; thus, the 2D–3D registration stage of the automated pipeline requires 7.5 s (6 vertebrae × 1.25 s). Finally, for result verification, similarity value based verification requires four additional 2D–3D registrations (i.e. 5 s) per vertebra, and therefore 30 s for all 6 vertebrae. The computational cost of pose agreement based verification is insignificant. Combined verification makes use of the similarity value based verification only for a single vertebra and so is completed in 5 s. Therefore, the time required for the complete automated pipeline when the similarity value, the pose agreement
and the combined verification is used, is 50.5, 20.5 and 25.5 s respectively.

5. Discussion

We have proposed methods to automate 2D–3D registration. The only manual input in our experiments was the magnification setting of the fluoroscopy set, information which could be transferred automatically via the image header. Therefore, provided with an interventional fluoroscopy image and its magnification setting, our proposed methods are able to carry out 2D–3D registration and verification with no additional manual input.

For clinical use a system should have a high correct registration rate, a low (preferably zero) incorrect registration rate, and be fast, preferably enabling registration and validation in just a few seconds. Our system, using similarity value based verification with the patient data sets, achieved a 95.73% correct registration rate and a 0% incorrect registration rate. Registrations to the phantom (ground truth) data set showed the algorithm to be accurate (0.21 mm RMS error) and robust (100% successful for images in the operating range). A very small number (0.5%) of incorrect registrations were not detected; however, as long as the operating range is not extended close to a true lateral view, these failures would require view directions which are very unlikely to occur in a clinical situation. The algorithm took 50 s to register. Although 50 s is much longer than clinically desired, use of combined verification reduced this time to 25 s, which is a feasible time scale for clinical use. Additional code optimisation is ongoing and further computational advances should reduce this time to the desired few seconds in the near future.

Our initial pose estimation method relies heavily on the ability of the GHT and GDSM to detect specific vertebrae. Although vertebrae visually appear very similar, the sensitivity of the GHT and GDSM to their small differences in shape enables the robust detection of specific vertebrae.
The standard use of the GHT is to detect multiple, reasonably generically shaped objects, and variation in shape due to view angle is normally a confounding factor. By producing vertebra specific and pose specific R-tables we are able to achieve a high signal-to-noise ratio using the GHT, as illustrated in Fig. 5. This is achieved even with low dose fluoroscopy images which often contain many overlying interventional instruments (see Figs. 1 and 9).

Our verification methods are statistical, and so are not guaranteed to work. Although a 0% incorrect registration rate was achieved using the similarity value based verification on patient data, and was validated here using a large number of images, if enough data is tested it is likely that some incorrect registrations would not be detected. How large an incorrect registration rate would be clinically acceptable is a subject for further research. However, it should be noted that guaranteeing a 0% incorrect registration rate is unlikely to be feasible even when manual inspection of registrations is used, as this is still subject to human error. Combined use of multiple verification methods has the potential to greatly reduce the incorrect registration rate if the methods are "not strongly correlated in their misclassifications" (Kittler et al. (1998)). This appears to be the case with our proposed verification methods: out of the 55 registrations in Section 4.2 which were classified by both verification methods and for which at least one of them failed, only one registration was jointly misclassified by both methods. Another strategy is for clinicians to never rely on a single registration; an easy to use (i.e. fast and automatic) registration algorithm will enable additional registrations to be carried out, and confidence can be increased by low variability in registration position. However, care should be taken with such an approach that the separate registrations have some degree of independence.
Let us also mention that although the proposed methods make specific use of vertebrae, there is potentially a wide range of applications for them. Firstly, many medical procedures take place in the area of the thorax and/or abdomen, where vertebrae can be used as landmarks. Secondly, our methods could be adapted to work with single bony structures by dividing them into sub-volumes. Each sub-volume would need to contain enough features to enable an accurate registration, and there would need to be a high probability of features from multiple sub-volumes appearing in the same fluoroscopy view. Such an adaptation has the potential to extend our proposed methods for use in the cranial and pelvic regions.

Finally, we would like to discuss how the proposed approach compares to image guided systems that use mechanical (Ruijters et al. (2011)) or optical (Livyatan et al. (2003)) tracking of the C-arm movement. C-arm tracking enables the maintenance of a 3D overlay based on an initially produced and verified registration between intraoperative and preoperative data. Although such systems remove the computational needs of continuous 2D–3D registration, they are associated with a number of limitations which our proposed methods are able to overcome. Firstly, the initial registration stage typically involves extended manual interaction, as manual registration or identification of the initial position is used. A high radiation dose is also associated with systems where interventional 3D imaging is used to establish correspondence with the preoperative data (Ruijters et al. (2011)). The 2D–3D registration method presented here can remove the above obstacles from the initialisation process, as it has a wide capture range and uses only 2D interventional fluoroscopy. The second limitation of C-arm tracking based methods comes from the fact that intraoperative patient movement can introduce errors into the initially established correspondence. Commercially available image guided systems with C-arm tracking currently require the manual identification of patient motion and repetition of the cumbersome initialisation process (Ruijters et al. (2011); Rossitti and Pfister (2009)).
Automated correction for patient motion has been proposed with the use of 3D motion compensation methods (Wang et al. (2014)), whereas fast 2D–3D registration techniques with a limited capture range could also be employed, based on the fact that a good initial position is available in such cases. However, such techniques will at times fail to provide an accurate registration; challenging scenarios include a highly magnified view with few bony features, views with large instruments obscuring bony detail, large patient movement, or a combination of these effects. Therefore, methods for automatic result verification, like the ones presented here, could significantly improve automatic motion correction algorithms. Finally, it should be noted that C-arm mechanical tracking is only provided by high end fluoroscopy sets, limiting its use to sites where such equipment is available.

6. Conclusions

We presented novel techniques and demonstrated their use in fully automating the 2D–3D registration process. The proposed registration pipeline requires no manual input (apart from the fluoroscopy magnification setting) in either the initialisation or the verification process, and it can cope with a wide range of movements of the fluoroscopy set in a clinically acceptable time. The complete registration process was extensively tested in real clinical conditions, enabling automatic CT-to-fluoroscopy image registration. The system was very robust, producing correct registrations for the vast majority of images within the operating range of interest. Moreover, no incorrect registrations were produced for the remaining images, as the results were successfully rejected by the verification algorithm. We believe that the proposed techniques can have significant impact on the practical use of 2D–3D registration technology in clinical practice and facilitate its widespread use in endovascular procedures.
Acknowledgment

The authors would like to thank the Guy's and St Thomas' Charity for funding this work. The authors also acknowledge financial support from the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London and King's College Hospital NHS Foundation Trust. We would also like to thank the Laboratory of Imaging Technologies, Faculty of Electrical Engineering, University of Ljubljana, Slovenia, for providing the CT, X-ray images and gold standard registrations used in the cadaver experiments presented in this work. Finally, we would like to thank all clinical staff at St Thomas' who aided this research and all of the patients who participated in this study.

Appendix A. Complexity comparison between use of GHT and use of similarity measure

Our main motivation for using the GHT is to enable fast registrations. We present the following simplified registration example to demonstrate the greatly reduced computational complexity of the GHT compared to a more standard intensity based similarity measure. Assume two images with equal pixel sizes are to be registered using a global strategy which searches for the optimum in-plane translation. The images are a DRR with R1 × R2 pixels and a fluoroscopy image with L1 × L2 pixels. When the intensity-based similarity measure is used, the DRR is translated across the fluoroscopy image using a step size s, and the similarity value is computed at each step. The complexity of the similarity value computation at each step is proportional to the number of pixels in the DRR. Therefore, the overall complexity is given by:
C1 = (L1 L2 R1 R2 n1) / s^2,    (5)
where n1 is the number of operations, required by the similarity measure, per pair of fluoroscopy and DRR pixels. When the GHT is used, a set of edge points is found in each image, the orientation of each edge point is calculated, and each point is placed in one of b equally sized orientation bins. Let us assume we have p and p′ points per orientation bin in the fluoroscopy image and the DRR respectively. Each point in the DRR casts a vote for each point in the fluoroscopy image which belongs to the same orientation bin. Therefore, the GHT complexity is given by:
C2 = b p p′ n2 = b (L1 L2 r / b)(R1 R2 r / b) n2 = (L1 L2 R1 R2 r^2 / b) n2,    (6)
where r is the fraction of pixels defined as edge points in the fluoroscopy image and the DRR, and n2 is the number of operations required to cast a single GHT vote: a process involving just two summations, one to compute an index variable and one to update the indexed element in the accumulator space. Using Eqs. (5) and (6), the ratio ρ of the two complexities is given by:
ρ = C1 / C2 = b n1 / (r^2 s^2 n2).    (7)
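As a sanity check on Eqs. (5) and (6), the two operation counts can be reproduced by explicitly simulating the translation scan and the GHT voting. All sizes and per-operation costs in the sketch below are illustrative assumptions (chosen so the bin splits are exact), not the values from the experiments:

```python
# Toy operation counts for Eq. (5) (translation scan) and Eq. (6) (GHT voting).
L1, L2 = 120, 120   # fluoroscopy image size (pixels) -- illustrative
R1, R2 = 48, 30     # DRR size (pixels) -- illustrative
s = 10              # translation step size (pixels)
n1, n2 = 8, 2       # assumed operations per pixel pair / per GHT vote
b = 12              # number of orientation bins
r = 0.1             # fraction of pixels that are edge points

# Eq. (5): slide the DRR over the fluoroscopy image; each similarity
# evaluation touches every DRR pixel once.
C1 = 0
for tx in range(0, L1, s):
    for ty in range(0, L2, s):
        C1 += R1 * R2 * n1
assert C1 == L1 * L2 * R1 * R2 * n1 // s**2

# Eq. (6): each DRR edge point votes for every fluoroscopy edge point in the
# same orientation bin, under the idealised uniform split p = L1*L2*r/b and
# p' = R1*R2*r/b points per bin.
p, p_prime = round(L1 * L2 * r / b), round(R1 * R2 * r / b)
C2 = 0
for _bin in range(b):
    C2 += p_prime * p * n2
assert C2 == round(L1 * L2 * R1 * R2 * r**2 / b) * n2

print(C1, C2, C1 / C2)   # 1658880 34560 48.0
```

The final ratio C1/C2 = 48 matches b n1 / (r^2 s^2 n2) from Eq. (7) for these assumed values.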
To estimate ρ we use the following approximate typical values from the images used in our experiments: r = 1/10 (depends on image content); s = 10 (depends on pixel size); b = 12; and, dependent on the similarity measure used, n1 is typically several times larger than n2. Using these values, ρ is estimated to be between 10 and 100.
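The estimate follows directly by substituting the quoted values into Eq. (7); the n1/n2 ratios below are assumptions spanning "several times larger":

```python
# rho = b*n1 / (r^2 * s^2 * n2) with the typical values quoted above.
b, r, s = 12, 1 / 10, 10
for n1_over_n2 in (1, 4, 8):            # assumed range of n1/n2
    rho = b * n1_over_n2 / (r**2 * s**2)
    print(n1_over_n2, rho)              # rho is approximately 12, 48, 96
```

so ρ stays between 10 and 100 across this assumed range.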
Therefore, we expect our GHT method to be at least a factor of 10 faster than using a standard intensity based similarity measure.

References

Ballard, D.H., 1981. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13 (2), 111–122.
van der Bom, M.J., Bartels, L.W., Viergever, M.A., Pluim, J.P.W., 2010. Robust initialization of 2D–3D image registration using the projection-slice theorem and phase correlation. Med. Phys. 37 (4), 1884–1892.
Chintalapani, G., Chinnadurai, P., 2012. Role of 3D/3D and 2D/3D registration in computer-assisted stenting of aneurysms. In: Lee, Demirci, Unal, Radeva (Eds.), Proceedings of MICCAI-STENT 2012, pp. 112–121.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learning 20, 273–297.
Gendrin, C., Furtado, H., Weber, C., Bloch, C., Figl, M., Pawiro, S.A., Bergmann, H., Stock, M., Fichtinger, G., Georg, D., Birkfellner, W., 2012. Monitoring tumor motion by real time 2D/3D registration during radiotherapy. Radiother. Oncol. 102, 274–280.
Gendrin, C., Markelj, P., Pawiro, S.A., Spoerk, J., Bloch, C., Weber, C., Figl, M., Bergmann, H., Birkfellner, W., Likar, B., Pernuš, F., 2011. Validation for 2D/3D registration. II: the comparison of intensity- and gradient-based merit functions using a new gold standard data set. Med. Phys. 38 (3), 1491–1502.
Gong, R.H., Guler, O., Kurkluoglu, M., Lovejoy, J., Yaniv, Z., 2013. Interactive initialization of 2D/3D rigid registration. Med. Phys. 40 (12), 121911.
Hamadeh, A., Lavallee, S., Cinquin, P., 1998. Automated 3-dimensional computed tomographic and fluoroscopic image registration. Comput. Aided Surg. 3 (1), 11–19.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20 (3), 226–239.
Kovesi, P., 2003. Phase congruency detects corners and edges. In: The Australian Pattern Recognition Society Conference: DICTA 2003, pp. 309–318.
Livyatan, H., Yaniv, Z., Joskowicz, L., 2003. Gradient-based 2-D/3-D rigid registration of fluoroscopic X-ray to CT. IEEE Trans. Med. Imaging 22 (11), 1395–1406.
Markelj, P., Tomaževič, D., Likar, B., Pernuš, F., 2012. A review of 3D/2D registration methods for image-guided interventions. Med. Image Anal. 16 (3), 642–661.
Otake, Y., Armand, M., Armiger, R., Kutzer, M.D., Basafa, E., Kazanzides, P., Taylor, R.H., 2012a. Intraoperative image-based multiview 2D/3D registration for image-guided orthopaedic surgery: incorporation of fiducial-based C-arm tracking and GPU acceleration. IEEE Trans. Med. Imaging 31 (4), 948–962.
Otake, Y., Schafer, S., Stayman, J.W., Zbijewski, W., Kleinszig, G., Graumann, R., Khanna, A.J., Siewerdsen, J.H., 2012b. Automatic localization of vertebral levels in X-ray fluoroscopy using 3D-2D registration: a tool to reduce wrong-site surgery. Phys. Med. Biol. 57 (17), 5485–5508.
Otake, Y., Wang, A.S., Stayman, J.W., Uneri, A., Kleinszig, G., Vogt, S., Khanna, A.J., Gokaslan, Z.L., Siewerdsen, J.H., 2013. Robust 3D-2D image registration: application to spine interventions and vertebral labeling in the presence of anatomical deformation. Phys. Med. Biol. 58, 8535–8553.
Penney, G.P., 2000. Registration of Tomographic Images to X-ray Projections for Use in Image Guided Interventions. Doctoral Thesis, pp. 158–159.
Penney, G.P., Varnavas, A., Dastur, N., Carrell, T., 2011. An image-guided surgery system to aid endovascular treatment of complex aortic aneurysms: description and initial clinical experience. In: Information Processing in Computer-Assisted Interventions. Springer, pp. 13–24.
Penney, G.P., Weese, J., Batchelor, P.G., Hill, D.L.G., Hawkes, D.J., 2001. Validation of a two- to three-dimensional registration algorithm for aligning preoperative CT images and intraoperative fluoroscopy images. Med. Phys. 28 (6), 1024–1032.
Rossitti, S., Pfister, M., 2009. 3D road-mapping in the endovascular treatment of cerebral aneurysms and arteriovenous malformations. Interv. Neuroradiol. 15, 283–290.
Ruijters, D., Homan, R., Mielekamp, P., van de Haar, P., Babic, D., 2011. Validation of 3D multimodality roadmapping in interventional neuroradiology. Phys. Med. Biol. 56, 5335–5354.
Tomaževič, D., Likar, B., Pernuš, F., 2002. “Gold standard” 2D/3D registration of X-ray to CT and MR images. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI ’02, pp. 461–468.
Tomaževič, D., Likar, B., Slivnik, T., Pernuš, F., 2003. 3-D/2-D registration of CT and MR to X-ray images. IEEE Trans. Med. Imaging 22 (11), 1407–1416.
Varnavas, A., Carrell, T., Penney, G.P., 2013. Increasing the automation of a 2D-3D registration system. IEEE Trans. Med. Imaging 32 (2), 387–399.
Wang, J., Borsdorf, A., Heigl, B., Köhler, T., Hornegger, J., 2014. Gradient-based differential approach for 3-D motion compensation in interventional 2-D/3-D image fusion. In: 2nd International Conference on 3D Vision (3DV), 2014, pp. 293–300.
Weese, J., Penney, G.P., Desmedt, P., Buzug, T.M., Hill, D.L.G., Hawkes, D.J., 1997. Voxel-based 2-D/3-D registration of fluoroscopy images and CT scans for image-guided surgery. IEEE Trans. Inf. Technol. Biomed. 1 (4), 284–293.