Medical Image Analysis 26 (2015) 108–119
Fully automated 2D–3D registration and verification

Andreas Varnavas a,∗, Tom Carrell a,b, Graeme Penney a

a Department of Biomedical Engineering, Division of Imaging Sciences and Biomedical Engineering, King’s College London, King’s Health Partners, St. Thomas’ Hospital, London, UK
b Department of Vascular Surgery, Guy’s and St Thomas’ NHS Foundation Trust, King’s Health Partners, London, UK
Article history: Received 22 September 2014; Revised 17 July 2015; Accepted 20 August 2015; Available online 2 September 2015

Keywords: 2D–3D registration; Registration verification; Hough transform
Abstract

Clinical application of 2D–3D registration technology often requires a significant amount of human interaction during initialisation and result verification. This is one of the main barriers to more widespread clinical use of this technology. We propose novel techniques for automated initial pose estimation of the 3D data and verification of the registration result, and show how these techniques can be combined to enable fully automated 2D–3D registration, particularly in the case of a vertebra based system. The initialisation method is based on preoperative computation of 2D templates over a wide range of 3D poses. These templates are used to apply the Generalised Hough Transform to the intraoperative 2D image, and the sought 3D pose is selected through the combined use of the generated accumulator arrays and a Gradient Difference Similarity Measure. On the verification side, two algorithms are proposed: one using normalised features based on the similarity value, the other based on the pose agreement between multiple vertebra based registrations. The proposed methods are employed here for CT to fluoroscopy registration and are trained and tested with data from 31 clinical procedures with 417 low dose (i.e. low quality, high noise) interventional fluoroscopy images. When similarity value based verification is used, the fully automated system achieves a 95.73% correct registration rate, whereas a “no registration” result is produced for the remaining 4.27% of cases (i.e. the incorrect registration rate is 0%). The system also automatically detects input images outside its operating range.

© 2015 Elsevier B.V. All rights reserved.
1. Introduction

2D–3D registration can provide critical guidance in a wide range of clinical procedures such as Image Guided Radiotherapy, Minimally Invasive Surgery, Interventional Radiology and Image Guided Endoscopy (Markelj et al. (2012)). However, as discussed by Chintalapani and Chinnadurai (2012): although many 2D–3D registration algorithms have been published, their use in actual clinical settings is very limited due to “their complexity, context sensitivity and user interactivity”. This paper addresses this last factor, “user interactivity”, and proposes methods to enable fully automatic registration. In a typical registration algorithm user interaction is required for:

1. initial pose estimation (before registration), and
2. result verification (after registration).

The aim of initial pose estimation is to provide the registration algorithm with a starting position within its capture range. The aim of result verification is to ensure that only correct results are presented
∗ Corresponding author. E-mail address: [email protected] (A. Varnavas).
http://dx.doi.org/10.1016/j.media.2015.08.005
to the clinicians. For routine clinical use it is highly desirable for both these processes to require little, or preferably no, user interaction. Although a large number of 2D–3D registration algorithms have been proposed (Markelj et al. (2012)), very little attention has been paid to reducing their required level of user interaction. Concerning initial pose estimation, the typical approach in the literature is the use of manually selected anatomical landmarks (Hamadeh et al. (1998); Penney et al. (2011); Weese et al. (1997)). Gong et al. (2013) propose the use of a tool which is virtually inserted in the preoperative CT scan and is placed in a similar pose close to the patient intraoperatively. A similar concept is proposed by Varnavas et al. (2013), where a virtual fiducial marker, present in the intraoperative scene, is semi-automatically inserted in the preoperative CT. In both methods the tool or the marker needs to be identified manually in the 2D images by a user. Other methods propose the use of fiducial markers placed intraoperatively to relate different images to each other (Otake et al. (2012a)), or the use of previous registrations as future initial estimates in applications where the C-arm is not expected to move much (Gendrin et al. (2012)). To avoid the need for manual interaction, a number of methods have been proposed which aim to expand the capture range of 2D–3D registration so that no initialisation information is needed from a user. van der Bom et al. (2010) propose the use of a spectrum based
Fig. 1. Example fluoroscopy images from complex aortic aneurysm procedures. Note how similar vertebrae appear in (a), and how vertebral features can have low contrast (b) and high noise (c). Note also the regular presence of interventional instruments partially occluding the vertebrae.

Fig. 2. Overview of the entire automated pipeline for vertebra based CT to fluoroscopy registration.
similarity measure for this purpose and report a success rate of 68.6% for starting positions displaced by less than 21 mm. This, however, would be unacceptable for many clinical systems. Otake et al. (2012b) propose the use of a 2 Degrees of Freedom (DoF) search parallel to the fluoroscopy plane, prior to a covariance matrix adaptation evolution optimiser, to achieve a 100 mm × 180 mm in-plane capture range in 4.7 s. These characteristics make the method practical for clinical use, but only in cases when a good estimate for patient out-of-plane translation and C-arm orientation is available. In later work, Otake et al. (2013) extend this method with the use of a full 6 DoF global multi-start strategy in order to improve robustness against initialisation errors and anatomical deformation. The extended method still requires rotations to be known to within 10°, and computation times vary between 6.3 s and 54 s depending on initialisation accuracy. Concerning result verification, most reported work in the literature uses fiducial markers attached to phantoms or cadavers (Gendrin et al. (2011)). For clinical application, the use of accurate (i.e. bone implanted) fiducial markers purely for validation would not be ethically acceptable. Typically, with patient images, result verification is achieved by visual inspection (Hamadeh et al. (1998); Livyatan et al. (2003); Penney et al. (2011)). However, this manual process requires expertise and affects clinical workflow. We propose novel algorithms to enable fully automated 2D–3D registration, without requiring specialised hardware (e.g. tracking devices or high end fluoroscopy sets). The proposed registration pipeline relates to a vertebra based, CT to fluoroscopy, rigid registration system. Initialisation is based on the use of the Generalised Hough Transform (GHT) (Ballard (1981)), which is applied to the 2D fluoroscopy image to perform a full 6 DoF global search for a number of considered vertebrae.
A very wide capture (operating) range is employed (see Section 3) so no information about the translation or the orientation of the fluoroscopy set is required. The produced pose estimate for each vertebra is used as input to a 2D–3D registration algorithm (Penney et al. (2011)). The resultant registrations are automatically verified using two algorithms: the first uses similarity value features and is an adaptation of the algorithm proposed by Varnavas et al. (2013). The second uses geometric features and is based on the pose agreement between at least two vertebra based registrations. The proposed registration pipeline is evaluated using data from 31 complex aortic aneurysm procedures. The registrations were based on six (lower thoracic and lumbar) vertebrae. As seen in Fig. 1, vertebrae are highly similar objects and our aim is to robustly distinguish between them. This is a major difference between our method for initial pose estimation and the traditional use of Hough-like voting schemes in the literature, which typically aim to detect multiple objects of a generic type. Another major difference is the low feature, high noise nature of vertebrae in fluoroscopy when compared
to man-made objects under normal lighting conditions (see Fig. 1(b) and (c)). To overcome these challenges we propose the use of the GHT with patient, vertebra and pose specific templates rendered from the CT scans of the patients. This is the first time that such an approach has been used for pose estimation and specific vertebra identification in medical imaging.

2. Methods

Fig. 2 provides an overview of the entire fully automated 2D–3D registration pipeline. The methods are presented in the context of a system that performs vertebra based, rigid registration between intraoperative fluoroscopy and preoperative CT data. Boxes labelled A and B in Fig. 2 depict the two areas of algorithm development proposed in this paper: initial pose estimation (A) and result verification (B). These are fully described in Sections 2.1 and 2.2 respectively.

2.1. Initial pose estimation

The proposed initial pose estimation method is based upon the use of the GHT to perform a global search over a desired registration capture range. Our motivation for using the GHT, as opposed to a more standard intensity based similarity measure, is to reduce computational complexity. We refer to Appendix A for a high level comparison of the complexities of the two approaches. As shown in Fig. 2, the initial pose estimation takes as input a fluoroscopy image, a CT image (including sub-images containing single vertebrae) and a set of R-tables, which are precomputed for each vertebra from renderings of the CT data. Each R-table represents a different pose position, and the range of poses is selected to encompass the required capture range. After applying the GHT to the fluoroscopy image with respect to each R-table, the poses are evaluated using the produced accumulator arrays and a Gradient Difference Similarity Measure (GDSM) (Penney et al. (2001)), resulting in a pose estimate for each vertebra. We initially describe the preoperative processing and then the intraoperative algorithm.

2.1.1. Preoperative R-table computation

Fig. 3 shows the preoperative workflow. Input is the CT image, from which N subvolumes, each containing a single vertebra, are extracted. This is achieved by picking a single point in the centre of each vertebra and defining a mean subvolume size. M Digitally Reconstructed Radiographs (DRRs) are produced for each of the N vertebrae, picturing each vertebra under M corresponding rigid transformations. These transformations define the capture (operating) range of the algorithm, and so need to span the range of movements of the fluoroscopy set plus estimates for variations in patient position. Edge points are extracted from the DRRs using Phase Congruency (Kovesi (2003)) and are used to construct the GHT R-tables (Ballard (1981)). Each R-table contains vectors from each edge point to the centre of the corresponding DRR. The vectors are grouped with respect to each
point’s quantised local orientation. For this quantisation, bins which are φ = 30° wide are used.

Fig. 3. Preoperative construction of vertebra based R-tables, used in the GHT/GDSM Initial Pose Estimation.

Fig. 4. GHT/GDSM Initial Pose Estimation: the pipeline of processes for computing an initial estimate of the 3D pose.

2.1.2. Intraoperative pose aware vertebra detection

Fig. 4 shows the intraoperative pipeline. Let us note that the pipeline is shown for just a single vertebra; therefore, this pipeline needs to be carried out N times to obtain an initial pose for each vertebra. The pipeline begins by applying the GHT to the fluoroscopy image using the M preoperatively produced R-tables. Edge points are extracted from the fluoroscopy image, again using Phase Congruency (Kovesi (2003)). The GHT output is M accumulator arrays, one for each R-table. In the traditional use of the GHT, a single R-table is used to produce a single accumulator array, which is a two dimensional voting space of the size of the fluoroscopy image. Each edge point in the R-table casts a vote for each edge point in the fluoroscopy image with the same local orientation. The vote increases a counter in the accumulator array at a location indicated by the coordinates of the edge point in the image and the vector corresponding to the edge point in the R-table. The accumulator array is then thresholded to provide potential locations for the target object. Our accumulator based selection algorithm aims to locate the single best location from multiple (M) accumulator arrays. Simply finding the global maximum would not work, as this would bias the results towards arrays produced with R-tables from DRRs which contained more edge features. Our solution is to normalise the maximum of each accumulator array, I1, using the values of the next five local maxima¹, Ii, i = 2, …, 6. A neighbourhood around each chosen Ii value was excluded from subsequent searches, to ensure that selected local maxima are at least a minimum distance apart. The M accumulator arrays were then ranked based on the following normalised metric:

V = I1 / (I2 + I3 + I4 + I5 + I6)    (1)

¹ The values of the normalising maxima are typically of the same order, so the formula is not sensitive to the exact number of maxima used.
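The R-table voting and the normalised ranking metric can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function names, the exclusion radius and the vote bookkeeping are assumptions based on the description in the text.

```python
import numpy as np
from collections import defaultdict

PHI_BIN = 30  # orientation quantisation bin width in degrees (phi = 30 deg in the text)

def build_r_table(edge_points, orientations, centre):
    """R-table: vectors from each DRR edge point to the DRR centre,
    grouped by the edge point's quantised local orientation."""
    r_table = defaultdict(list)
    for (x, y), theta in zip(edge_points, orientations):
        r_table[int(theta) % 360 // PHI_BIN].append((centre[0] - x, centre[1] - y))
    return r_table

def ght_accumulate(r_table, fluoro_edges, fluoro_orients, shape):
    """Each fluoroscopy edge point votes with every R-table vector sharing
    its orientation bin; the accumulator has the size of the fluoroscopy image."""
    acc = np.zeros(shape, dtype=np.int32)
    for (x, y), theta in zip(fluoro_edges, fluoro_orients):
        for dx, dy in r_table[int(theta) % 360 // PHI_BIN]:
            cx, cy = x + dx, y + dy
            if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                acc[cy, cx] += 1
    return acc

def normalised_peak(acc, exclusion=10, n_maxima=6):
    """Metric V of Eq. (1): the global maximum I1 divided by the sum of the
    next five local maxima, each found after excluding a neighbourhood
    around previously chosen maxima."""
    work = acc.astype(float).copy()
    maxima = []
    for _ in range(n_maxima):
        iy, ix = np.unravel_index(np.argmax(work), work.shape)
        maxima.append(work[iy, ix])
        work[max(0, iy - exclusion):iy + exclusion + 1,
             max(0, ix - exclusion):ix + exclusion + 1] = -np.inf
    return maxima[0] / sum(maxima[1:])
```

Each of the M accumulator arrays would be scored with `normalised_peak` and the arrays ranked by the resulting V.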
For most fluoroscopy images the 3D pose associated with the highest rank (largest value of V) is within the capture range of the subsequent 2D–3D registration algorithm. However, the method is not as robust as using a specialised 2D–3D similarity measure. Therefore, to improve accumulator array selection we adapt the method as follows: the accumulator arrays are ranked according to metric V, and the poses associated with the top K = 100 ranked arrays are each used to produce a DRR². Each DRR is then compared to the fluoroscopy image using the Gradient Difference Similarity Measure (GDSM) (Penney et al. (2001)). The pose producing the maximum similarity value is chosen as the initial pose for the 2D–3D registration algorithm. Using this method the GHT, which requires much less computation than the intensity based similarity measure, is used to filter a large number (M) of possible poses down to a small number (K) of highly likely poses. Then the more robust similarity measure determines the best pose from the K candidates. As shown in Fig. 2 (overview of the entire 2D–3D registration pipeline), the output of the initial pose estimation is a single best pose for each of the N vertebrae. These are then used as starting positions for a 2D–3D registration algorithm which outputs N updated (registered) pose positions, one for each vertebra. These pose positions each need to be checked for registration accuracy, which is the subject of the second main area of algorithm development proposed in this paper, result verification, described in Section 2.2. Let us briefly mention here a few details about the 2D–3D registration algorithm used (Penney et al. (2011)). This algorithm first uses a global search over a specified range of rigid transformations which defines its capture range (represented by the dotted lines in Fig. 6). The position found by the global search is then further optimised with the use of a hill climbing search strategy.
² The value of K = 100 produces a set of top ranked poses where at least one of them is within the capture range of the subsequent 2D–3D registration algorithm approximately 99% of the time. See Section 4.1.1 for further details.

Both searches use the same Gradient Difference Similarity Measure presented in Penney et al. (2001).

2.2. Result verification

Result verification is critical for any registration algorithm which is less than 100% robust. Our proposed registration pipeline (Fig. 2) is guaranteed to produce incorrect registrations for all of the N vertebrae not visible in the fluoroscopy image. Even when corresponding vertebrae are present, misregistrations still occur (1.8% reported by Penney et al. (2011)). This section presents three methods to detect and remove misregistrations, to enable automatic selection of a correct registration as output from the registration pipeline. The first method is based on similarity value features and has been presented in earlier work (Varnavas et al. (2013)). For completeness, an overview is presented in Section 2.2.1. The second method is based on features describing the geometric difference between different vertebra based registrations and is presented in Section 2.2.2. Both methods produce two features. Registrations to a set of training data (with manually verified registrations) are used to populate a feature space and define a classification rule separating correct and incorrect registrations. The third method, presented in Section 2.2.3, combines the two classification rules of the previous methods in a way that enhances the detection of incorrect registrations. This method can also be incorporated into the registration pipeline in a computationally efficient way. Often more than one vertebra will be successfully registered to a fluoroscopy image. In such situations the successfully registered
vertebra closest to the centre of the image is chosen, and the pose of this vertebra is the overall pipeline output (final pose).

2.2.1. Similarity value based verification

The first verification method uses two features produced from the GDSM. These features are produced by applying the GDSM to an area of interest in the fluoroscopy image, automatically calculated by projecting a bounding box of the candidate vertebra from CT into the fluoroscopy. The first feature is the value of the GDSM (registration value) at the registration position. The registration value provides a good indication of a successful registration, but it is affected by a number of factors. Varnavas et al. (2013) describe methods to reduce the effect that variations in the image resolution and the gradient value distribution have on the registration value. However, the registration value on its own does not enable good separation between correct and incorrect registrations. This is because the registration value can be greatly affected by the presence of interventional instruments in the area of interest. Conversely, due to the similar shape of vertebrae, a reasonably high registration value can be obtained when an incorrect vertebra is matched. To account for these factors a second, normalisation value is calculated as follows: the registration process is repeated on the same fluoroscopy area of interest, but using different (and therefore probably incorrect) CT vertebrae. For the experiments carried out in this paper the T = 4 closest vertebrae to the candidate vertebra were used. The initial pose for these registrations is computed by adapting the initial pose which was used for the registration under verification, to account for the translation between the candidate and the new CT vertebra. The normalisation value is set to the best (i.e. largest) GDSM value from registering the T closest vertebrae. The second verification feature, named the similarity ratio, is defined as:
similarity ratio = registration value / normalisation value    (2)
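A minimal sketch of the two features follows. The gradient difference form below only follows the spirit of Penney et al. (2001): the published measure's exact scaling and normalising constants are not reproduced here, and all names are illustrative.

```python
import numpy as np

def gdsm(fluoro, drr):
    """Illustrative gradient difference style similarity: rewards pixels where
    the vertical/horizontal image gradients of fluoroscopy and DRR agree.
    Using the gradient variances as the constants a_v, a_h is an assumption."""
    dfy, dfx = np.gradient(fluoro.astype(float))
    ddy, ddx = np.gradient(drr.astype(float))
    a_v, a_h = np.var(dfy), np.var(dfx)
    return (np.sum(a_v / (a_v + (dfy - ddy) ** 2)) +
            np.sum(a_h / (a_h + (dfx - ddx) ** 2)))

def similarity_ratio(registration_value, neighbour_values):
    """Eq. (2): the GDSM value at the registration position divided by the best
    GDSM value obtained when registering the T (= 4 in the paper) nearest,
    deliberately wrong, CT vertebrae to the same area of interest."""
    return registration_value / max(neighbour_values)
```

A correct registration is then expected to combine a high registration value with a similarity ratio comfortably above 1.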
The classification rule is computed by combining a Support Vector Machine (SVM) (Cortes and Vapnik (1995)), which is applied jointly on both features, with scalar thresholds, which are based on each feature individually. A linear SVM is first trained to produce a straight line separating correct from incorrect registrations. Let us note here that large similarity value features represent correct registrations. Therefore, under a linear separating rule, if any one feature is very large, a correct registration result can be returned, even if the value of the other feature is very small. Since we are particularly interested in rejecting incorrect registrations (i.e. minimising False Positives), single feature classification rules were additionally produced, to enforce that both features must be sufficiently large to return a correct registration result. These rules were defined using a standard method for non-parametric outlier rejection: when a feature takes a value lower than the lower quartile minus the inter-quartile range, the corresponding registration is classified as incorrect. Therefore, a registration result must pass the two single feature rules and the SVM rule to be classified as correct.

2.2.2. Inter-vertebrae pose agreement based verification

The second method for automatic result verification makes use of the multiple independent registrations to different vertebrae. Correct registrations should give rise to similar poses for the CT data, as intervertebral movement, for patients lying in a similar position, is only a few degrees/mm (especially between adjacent vertebrae). Therefore, features based on the difference between poses produced by different vertebrae can be used to discriminate between correct and incorrect registrations. Two features are used. The first feature is a measure of the minimum rotational difference between the candidate pose of a given vertebra and the poses found for all other vertebrae. Using the N pose positions
found, Pi = (θx, θy, θz, X, Y, Z)i, i = 1, …, N, the minimum rotational difference is defined as:
mRDi = min_j max(|θx_i − θx_j|, |θy_i − θy_j|, |θz_i − θz_j|),  j = 1, …, N, j ≠ i    (3)
The second feature is a measure of the minimum translational difference between the candidate vertebra pose and the poses of all other vertebrae. Translation is measured using a reprojection distance dreproj(Pi, Pj, x), which is the minimum distance between a 3D position x and a back projection line calculated as follows: we first project x into 2D using the projection matrix corresponding to the candidate vertebra pose Pi, to get x2D, and then back project from x2D using the projection matrix corresponding to the pose Pj of a different vertebra. The minimum translational difference feature mTD is calculated using the mean reprojection distance of 8 points, xk, k = 1, …, 8, which form a bounding box around the vertebra whose pose is to be verified. The vertebra position is compared with all the other vertebrae poses and the minimum distance is calculated:
mTDi = min_j (1/8) Σ_{k=1…8} dreproj(Pi, Pj, xik),  j = 1, …, N, j ≠ i    (4)
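The two geometric features can be computed directly from the registered poses and their 3 × 4 projection matrices. A sketch, under the assumption that each pose supplies its projection matrix and bounding box corners (all names are illustrative):

```python
import numpy as np

def min_rotational_difference(poses, i):
    """Eq. (3): min over j != i of max(|d theta_x|, |d theta_y|, |d theta_z|),
    where each pose is (theta_x, theta_y, theta_z, X, Y, Z)."""
    ri = np.asarray(poses[i][:3], dtype=float)
    return min(np.max(np.abs(ri - np.asarray(p[:3], dtype=float)))
               for j, p in enumerate(poses) if j != i)

def reproj_distance(P_i, P_j, x):
    """d_reproj(P_i, P_j, x): project 3D point x with the 3x4 matrix P_i,
    back project the resulting 2D point under P_j, and return the distance
    from x to that back projection line."""
    u = P_i @ np.append(x, 1.0)
    x2d = u[:2] / u[2]
    M, p4 = P_j[:, :3], P_j[:, 3]
    centre = -np.linalg.solve(M, p4)             # camera centre: M c + p4 = 0
    d = np.linalg.solve(M, np.append(x2d, 1.0))  # back projection ray direction
    d /= np.linalg.norm(d)
    v = x - centre
    return np.linalg.norm(v - np.dot(v, d) * d)  # point-to-line distance

def min_translational_difference(proj_mats, corners, i):
    """Eq. (4): min over j != i of the mean reprojection distance of the
    8 bounding box corners of vertebra i."""
    return min(np.mean([reproj_distance(proj_mats[i], proj_mats[j], x)
                        for x in corners[i]])
               for j in range(len(proj_mats)) if j != i)
```

When two poses agree, the back projection line under Pj passes close to each corner point, so both features approach zero.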
A necessary condition for these pose agreement features to be able to detect correct registrations is that at least two of the N vertebrae have been registered correctly. Randomly coinciding wrong registrations will create False Positives, but the probability of such cases is expected to be small given the size of the six dimensional space of extrinsic parameters. The classification rule using the two features is computed using a linear SVM applied jointly on both features. Let us note here that, in contrast to the similarity value based features, the pose agreement based features are bounded by the classification rule and the coordinate axes for the subspace of correct registrations. This means that both features must be close to zero to return a correct registration result. Therefore, in contrast to Section 2.2.1, no additional single feature based rules were employed.

2.2.3. Combined verification scheme

The two verification methods proposed in the previous sections are to a large degree independent, and they can therefore be combined to enhance the classification results. As mentioned in Section 2.2.1, we are particularly interested in minimising the False Positive Rate, as it is of primary importance to avoid showing wrong registrations to clinicians. For this reason we propose a combined verification scheme which classifies a registration as correct only if it is classified as such by both methods of Sections 2.2.1 and 2.2.2. If a registration is classified as incorrect by at least one of the two methods, the combined scheme classifies it as incorrect. This classification scheme, apart from enhancing the detection of incorrect registrations, can be incorporated into the registration pipeline in a computationally efficient way. The similarity value based verification has an increased computational cost, as it requires additional registrations to be produced.
On the other hand, the pose agreement based verification is computationally inexpensive, as it requires only the computation of two simple metrics comparing vertebra poses which have already been produced. We propose incorporating the combined scheme into the registration pipeline in two steps: first, the produced registration for each vertebra is checked with the pose agreement verification and the set of registrations classified as correct is kept. Then the registration corresponding to the vertebra closest to the centre of the image is checked with the similarity value based verification. The result of this check is the result of the
complete registration pipeline. In this way the combined verification scheme achieves a computational reduction by applying the expensive similarity value based verification to only one of the produced registrations.

3. Experimental methods

Experiments were carried out using real patient data and “gold standard” data of a cadaveric lumbar spine (Tomaževič et al. (2002)). The real patient data were used to evaluate the performance of: the initial pose estimation (Section 3.1); the result verification (Section 3.2); and the entire fully automated registration pipeline (Section 3.3). The “gold standard” data were also used to evaluate the performance of the fully automated registration pipeline (Section 3.3) and to quantify the ability of visual inspection to discriminate between correct and incorrect 2D–3D registrations (Section 3.4). The real patient data were acquired from 31 operations over two periods of one year, following requests by the clinicians to run the presented or earlier (Penney et al. (2011)) versions of the registration system intraoperatively. They consist of 31 preoperative CT scans and 417 intraoperative fluoroscopy (nearly all low dose screening) images, acquired at approximately equal intervals during the operations or upon the clinicians’ requests. The dimensions of the fluoroscopy images were 480 × 480 pixels, and the fluoroscopy set had four magnification settings with the pixel size varying between 0.308 and 0.744 mm. For each fluoroscopy image the magnification setting and orientation of the fluoroscopy set were recorded. This information was used to estimate the intrinsic parameters, and to determine whether the image was within the operating range o of the registration algorithm. The preoperative CT data came from a variety of scanners with an average and maximum slice thickness of 0.94 mm and 1 mm respectively. All the fluoroscopy data were acquired using a Siemens Axiom Artis DTA system at St Thomas’ Hospital.
Use of these data was approved by the National Research Ethics Committee (09/H0707/64) with informed patient consent. The data were split into three groups:

• Group A: 19 preoperative CT scans; 224 intraoperative fluoroscopy images (within operating range).
• Group B: 12 preoperative CT scans; 164 intraoperative fluoroscopy images (within operating range).
• Group C: 29 intraoperative fluoroscopy images (outside operating range); CT scans contained in Groups A and B.
The initial pose estimation operating range o was defined as follows:

• The rotational range was defined about the axes of the CT scan, with the origin in the centre. The ranges about the axes perpendicular to the transverse, sagittal and coronal planes were set to [−40°, 40°], [−30°, 30°] and [−8°, 8°], respectively. Since intraoperatively the fluoroscopy plane is approximately parallel to the CT coronal plane under the Anterior-Posterior (AP) position of the C-arm, we refer to these rotations as lateral, cranial-caudal and in-plane, respectively.
• The out-of-plane translational range was set to [700 mm, 900 mm]. This range describes the distance between the C-arm source and the centroid of the CT scan.
• The in-plane translational range is defined over the entire fluoroscopy image and requires at least two vertebrae from T11 to L4 to be visible in it.
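As a sanity check, discretising the operating range above with the step sizes reported later in Section 3 (a 4° rotational step and a 50 mm out-of-plane translational step; in-plane translation needs no sampling because the GHT accumulator array searches it exhaustively) reproduces the stated number of R-tables:

```python
def n_samples(lo, hi, step):
    """Number of grid samples covering [lo, hi] inclusive with the given step."""
    return int(round((hi - lo) / step)) + 1

lateral      = n_samples(-40, 40, 4)    # 21 lateral rotations
cranial      = n_samples(-30, 30, 4)    # 16 cranial-caudal rotations
in_plane_rot = n_samples(-8, 8, 4)      # 5 in-plane rotations
depth        = n_samples(700, 900, 50)  # 5 out-of-plane translations

# R-tables per vertebra and per magnification setting: 21 * 16 * 5 * 5 = 8400
M = lateral * cranial * in_plane_rot * depth
```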
The operating range needs to encompass variations in patient position, and the range of movement of the fluoroscopy set during the operation. Variations in how reproducibly patients lie supine are expected to be small, approximately 3–4° (Penney et al. (2011)). Patient movement during procedures has been reported (Varnavas et al. (2013)) but is expected to be mostly translational. The operating
range used here is expected to cover movements of the fluoroscopy set for standard endovascular aortic repair. It is trivial to extend the operating range, but this will result in longer registration times. R-tables were produced for the six vertebrae from T11 to L4, as depicted in Fig. 3 and described in Section 2.1.1³. DRRs were produced to cover the operating range with a rotational step of 4° and an out-of-plane translational step of 50 mm. R-tables were produced for each of the four fluoroscopy magnification settings. This resulted in M = 8400 R-tables per vertebra and per magnification setting. A “ground truth” CT pose for each of the images in Groups A and B was obtained using the GDSM based 2D–3D registration algorithm (Penney et al. (2011)). The registration algorithm was initialised manually using the method described by Penney et al. (2011). The initialisation was completed in under a minute for each image, but since manual vertebra identification was required, knowledgeable input from an experienced user was needed. Registrations were carried out to the vertebra closest to the centre of the image and completed in 1.25 s in each case. The results were verified using visual inspection. An experiment to quantify the accuracy of this visual inspection process is described in Section 3.4. In addition to the real patient data, we also used the publicly available dataset presented in Tomaževič et al. (2003) to enable comparison of our results with other methods, and with an independently calculated “gold standard”. The dataset consists of a CT scan and 18 fluoroscopy images of a cadaveric lumbar spine (L1 to L5). The “gold standard” registrations have been calculated using fiducial markers (Tomaževič et al. (2002)). The dataset comes with 450 starting positions for the CT data (Tomaževič et al. (2003)), and eight points around each vertebra which are used to calculate the mean Target Registration Error (TRE).

3.1.
Experiments for initial pose estimation

The initial pose estimation method, described in Section 2.1, estimates the 3D pose of a single vertebra. The target vertebra for this set of experiments was the vertebra closest to the image centre for each of the 224 fluoroscopy images in Group A. For each image the target vertebra was manually identified, and the set of R-tables corresponding to the identified vertebra and the magnification setting of the fluoroscopy set were used as input to the pose estimation method. The difference between the found pose and the “ground truth” pose was calculated, and compared with the known capture range of the subsequent 2D–3D registration algorithm (Penney et al. (2011)).

3.2. Experiments for registration result verification

To evaluate the result verification algorithms (Section 2.2), the data in Group A were used for training and the data in Group B for testing purposes. For all the images in both groups the registration pipeline (see Fig. 2) was run up to the result verification stage (box B). This resulted in a large number of successful and unsuccessful registrations. The main cause of the latter is attempts to register a CT vertebra to a fluoroscopy image which does not contain the chosen vertebra. Each produced pose was labelled as correct or incorrect using visual inspection (comparing DRRs constructed at the produced pose position with the fluoroscopy). The produced poses in the training set were used to compute the classification rules as described in Sections 2.2.1, 2.2.2 and 2.2.3. Whenever an SVM was employed, a constraint parameter C = 1 was used. The classification performance of the produced rules is measured with respect to both the training and the testing set. For the training and testing of the similarity value based verification algorithm all data could be used.
³ For two patients in Group A, we used 5 out of the 6 vertebrae in the range T11–L4 for R-table production, as only 5 were in the field of view of the preoperative CT scans.

For the inter-vertebrae pose
Table 1
Number of 3D poses in the training and testing sets for the three verification algorithms.

                                      Training set (Group A)   Testing set (Group B)
Similarity value based verification   1325                     984
Pose agreement based verification     1266                     900
Combined verification scheme          1266                     900

Table 2
Number of Gold Standard data registrations within and outside operating range o, for three ranges of initial displacement from the gold standard position.

                            0–6 mm   6–12 mm   12–18 mm   Total
Within operating range o    2625     1371      429        4425
Outside operating range o   304      1634      1737       3675
agreement algorithm, only images where at least two correct registrations occurred were used, as this is the condition under which we want to train and test the algorithm. For the same reason, only these images were used for the combined verification scheme as well. Table 1 shows the total number of poses in the training and testing sets for each algorithm.

3.3. Experiments for fully automated registration

To evaluate the complete registration pipeline (Fig. 2), both the patient and the gold standard data were used. For the patient data experiments, images from Groups B and C were used. The entire registration pipeline was run on each of these images, each time using one of the three verification methods presented in Section 2.2. When the similarity value and the pose agreement based methods were used, they were applied to each of the vertebra-based poses produced at the output of the 2D–3D registration stage. When the combined verification scheme was used, it was applied in the way described in Section 2.2.3, i.e. first selecting the poses which pass the pose agreement method and then applying the similarity value based method to the pose corresponding to the vertebra closest to the centre of the image. When the pipeline produced a registration result, its correctness was manually checked using visual inspection. There are therefore three possible results of the registration pipeline:

• Correct registration: the pipeline produces a registration result and it is verified as correct by the visual inspection.
• Incorrect registration: the pipeline produces a registration result but the visual inspection shows that it is incorrect.
• No registration: the pipeline produces no result (i.e. no vertebra based registration is accepted by the result verification module).
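The three outcome categories above can be tallied with a small helper (our own sketch; the names are illustrative):

```python
from enum import Enum

class Outcome(Enum):
    CORRECT = "correct registration"
    INCORRECT = "incorrect registration"
    NONE = "no registration"

def pipeline_outcome(accepted_pose, visually_correct):
    """Map one pipeline run to the three reported outcome categories.
    `accepted_pose` is the verified pose or None; `visually_correct`
    is the expert inspection verdict (only meaningful when a pose
    was produced)."""
    if accepted_pose is None:
        return Outcome.NONE
    return Outcome.CORRECT if visually_correct else Outcome.INCORRECT
```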
The complete registration pipeline, with combined verification, was also evaluated using the gold standard dataset. For each of the 18 fluoroscopy images, registrations were started from the 450 provided start positions (8100 registrations in total), which cover a wide range of transformations (up to 51.6°) from the gold standard positions. In the original publication (Tomaževič et al., 2003), the starting positions are split into three ranges based on distance to the gold standard position (0–6 mm, 6–12 mm and 12–18 mm), computed as a mean TRE using the provided eight target points. These starting positions were separated into two sets: those which fell within the operating range (o, defined in Section 3) of our initial pose estimation algorithm, and those which were outside this range. Table 2 shows the number of registrations in each set. The main reason for starting positions falling outside the operating range (particularly in the 0–6 mm bracket) was a large in-plane rotation (patient rotation around an axis perpendicular to the bed for an anterior–posterior view). Large rotations around this axis have not been observed during our clinical procedures, and so our algorithm only searches between [−8°, 8°] for this parameter.

To test our algorithm with multiple starting positions, R-tables were produced to cover the complete range of starting positions, with the same rotational (4°) and translational (50 mm) step size as used with the patient data. For each individual starting position, a subset of these R-tables was extracted which just covered our method's operating range o. In addition, for each individual starting position, the 16 R-tables which represented the positions closest to the gold standard position were altered so that they were an integer multiple of the step size from the starting position. These 16 R-tables are expected to be the most critical for the initial pose identification, so we changed them to be identical to the ones we would produce if our operating range was sampled exactly about the given starting position. The complete registration pipeline (Fig. 2) was tested using the five lumbar vertebrae. As mentioned in Section 2.2, the final pose for our method is the pose of the registered vertebra closest to the centre of the image. We compared our final pose with ground truth by calculating the Target Registration Reprojection Error (TRRE) using the eight target points around L3 provided with the dataset.

3.4. Experiment for evaluation of visual verification

The aim of this experiment was to evaluate the size of registration error that could be robustly detected using expert visual inspection. The CT scan and an anterior–posterior fluoroscopy image from the gold standard dataset were used. DRRs were produced from the CT scan at poses close to the ground truth pose.
The mean reprojection error of each pose was calculated, and a set of 80 DRRs with error uniformly distributed between 0 and 4 mm was selected. Twenty DRRs produced at the gold standard position were added to this set, to produce a test data set of 100 DRRs. Each of these 100 DRRs was presented to the two experts who carried out the visual inspection verification to obtain the patient data "ground truth" CT poses, as described in Section 3. The DRRs were presented in random order, overlaid on the fluoroscopy image, and were classified by each expert as either "correct" (good registration) or "incorrect" (misregistration).

4. Results

4.1. Initial pose estimation performance

An adequate initial pose (i.e. within the capture range of the 2D–3D registration algorithm (Penney et al., 2011)) was produced for 211 out of 224 images in Group A. Therefore, our initialisation method has a 94.2% chance of successfully finding an initial 3D pose for a single vertebra. The distances between the 211 successfully found initial poses and the corresponding "ground truths" are illustrated in Fig. 6. The vast majority of estimated poses fall well within the capture range of the subsequent 2D–3D registration algorithm, which is indicated in Fig. 6 by the horizontal dotted lines.

The success of the initial pose estimation method lies in its ability to produce very high rankings for accumulator arrays which correspond to poses close to the registration position. To illustrate this, Fig. 5 shows a fluoroscopy image and the accumulator arrays of the finally selected poses for the L2 and L3 vertebrae. The global maximum for each vertebra is clearly visible, and correctly indicates the position of the vertebra in the image (i.e. the in-plane translation parameters). The distinctiveness of the global maximum, for two vertebrae which are partly occluded by interventional instruments in a noisy, low-dose fluoroscopy image, is indicative of the robustness of our method.
Note also that, although vertebrae have very similar shapes, their simultaneous presence in the fluoroscopy image does not create strong local maxima in the accumulator array: the method is able to identify specific vertebrae.

Let us also give some statistics on the "adequacy" of the poses indicated by the accumulator array ranking, in order to illustrate the contribution of further evaluating the top K poses with the GDSM. When only the top pose indicated by the accumulator array ranking using Eq. (1) is considered, it is within the capture range of the 2D–3D registration algorithm for ∼85% of the 224 images in Group A. This percentage increases to ∼95% and ∼99% when the top 10 and top 100 poses are considered respectively. Therefore, our choice of K = 100, combined with the efficiency of the GDSM for pose evaluation, significantly increases the chances of producing an adequate initial estimate.

Table 3
Similarity value based verification method results.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   734 (99.32%)    0 (0%)           586 (100%)      5 (0.68%)        1320 (99.62%)
Testing set    429 (93.67%)    0 (0%)           526 (100%)      29 (6.33%)       955 (97.05%)

Fig. 5. Fluoroscopy image (with ground truth rotations) and accumulator arrays corresponding to the estimated pose for the pictured vertebrae L2 and L3.

Fig. 6. Box and whisker plot of errors for each extrinsic parameter after the initial pose estimation, for the 211 successful cases in Group A. Box shows upper quartile, median and lower quartile; whiskers indicate percentile range 5–95%. Horizontal dotted lines indicate the capture range of the subsequent 2D–3D registration algorithm.
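The two-stage selection — a coarse ranking by accumulator score, followed by re-evaluation of the top K candidates with the similarity measure — can be sketched as follows, with `gdsm` a hypothetical callable standing in for the Gradient Difference Similarity Measure:

```python
import numpy as np

def select_initial_pose(acc_scores, poses, gdsm, K=100):
    """Rank candidate poses by accumulator-array score, then re-evaluate
    the top K with the (more expensive) similarity measure and return
    the best; higher gdsm values are assumed to be better."""
    top_k = np.argsort(acc_scores)[::-1][:K]         # best K by GHT votes
    best = max(top_k, key=lambda i: gdsm(poses[i]))  # refine with GDSM
    return poses[best]
```

Evaluating the similarity measure only for the K highest-voted candidates is what keeps the refinement affordable while raising the ∼85% single-candidate adequacy towards the ∼99% quoted above.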
Fig. 7. Scatter plot of similarity value based features for training (a) and testing (b) set.
4.2. Registration result verification performance
4.2.1. Results with similarity value based verification

Fig. 7 shows scatter plots of the similarity value based features using the training (Fig. 7(a)) and testing (Fig. 7(b)) data. The solid line shows the two-feature classification rule produced by the SVM; the dotted lines show the two single-feature classification rules. Let us note again that all classification rules were produced using just the training data (Group A). Table 3 shows a summary of the classification performance on the training (Group A) and testing (Group B) datasets respectively. In both cases the True Positive Rate is high (99.32% and 93.67% respectively), whereas the False Positive Rate is 0%. Let us note here that we use the terms (True/False) Positive and Negative for registrations which have been (correctly/incorrectly) classified as correct and incorrect by the proposed verification algorithms respectively.
4.2.2. Results with inter-vertebrae pose agreement based verification

Fig. 8 shows scatter plots of the pose agreement based features using the training (Fig. 8(a)) and testing (Fig. 8(b)) data. Table 4 summarises the classification performance. The separability of correct from incorrect poses is good, with correct poses of the same image being similar to within a translational and rotational tolerance of 5 mm and 5° respectively. On the other hand, most incorrect poses for a given image are very different to each other, resulting in an overall classification performance of 98.58% (training data) and 98.89% (testing data). However, a few incorrect poses can be seen to have resulted in good agreement with each other, producing some False Positive results: 3.35% (training data) and 1.75% (testing data).

Table 4
Pose agreement based verification method results.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   729 (100%)      18 (3.35%)       519 (96.65%)    0 (0%)           1248 (98.58%)
Testing set    442 (99.55%)    8 (1.75%)        448 (98.25%)    2 (0.45%)        890 (98.89%)

Fig. 8. Scatter plot of inter-vertebrae pose agreement based features for training (a) and testing (b) set.

4.2.3. Results comparing the verification methods

To compare the performance of the similarity value based and the inter-vertebrae pose agreement based verification methods, we present in Table 5 the 95% confidence intervals for the False Positive and False Negative Rates. These binomial confidence intervals were computed using the Clopper–Pearson method on the data of the testing sets. As these intervals do not overlap, they show that:

1. the similarity value based verification method is statistically significantly better than the pose agreement based method at correctly classifying incorrect registrations (fewer False Positives);
2. the pose agreement based verification method is statistically significantly better than the similarity value based method at correctly classifying correct registrations (fewer False Negatives).

Table 5
95% confidence intervals for the False Positive and False Negative Rates of the similarity value based and the pose agreement based verification methods (computed from the data in the testing sets).

                                      False Positive Rate   False Negative Rate
Similarity value based verification   0%–0.7%               4.28%–8.97%
Pose agreement based verification     0.76%–3.43%           0.05%–1.62%

Correctly identifying incorrect registrations is most important for medical applications; therefore the similarity value based algorithm is preferable. However, since the proposed registration pipeline produces a number of (vertebra based) registrations from which only one correct registration needs to be selected, the two methods can be combined in a computationally efficient way that retains the high specificity of the similarity value based algorithm, as described in Section 2.2.3.
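Clopper–Pearson intervals of this kind can be computed from the raw counts with the standard beta-quantile formulation (a sketch, assuming scipy is available):

```python
from scipy.stats import beta  # assumes scipy is available

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a
    binomial proportion with k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi
```

For example, the similarity value method produced 0 False Positives out of 526 incorrect registrations in the testing set (Table 3), for which this formulation gives an upper bound of about 0.7%.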
4.2.4. Results with combined verification scheme

Table 6 summarises the results for the verification scheme that combines the similarity value and the inter-vertebrae pose agreement based classification algorithms. Due to the way the algorithms are combined (both need to pass for a produced pose to be classified as correct), the False Positive Rate will be decreased, with a corresponding increase in the False Negative Rate. Note that a direct comparison can be made only between Tables 4 and 6, as it is in these cases that the underlying datasets are exactly the same. The combined verification scheme reduced the False Positive Rate to 0%, with an increase in the False Negative Rate to 0.69% and 5.41% in the training and testing sets respectively.

Table 6
Results for combined use of similarity value and pose agreement based verification methods.

               True Positive   False Positive   True Negative   False Negative   Correctly classified
Training set   724 (99.31%)    0 (0%)           537 (100%)      5 (0.69%)        1261 (99.61%)
Testing set    420 (94.59%)    0 (0%)           456 (100%)      24 (5.41%)       876 (97.33%)

4.3. Automated registration performance

Table 7 summarises the performance of the automated registration pipeline using the test patient data set (i.e. Group B). All of these images are within the operating range of the algorithm, so a perfect result would be 100% correct registrations. Best performance is achieved with the similarity value based verification: 95.73% success, with 0% incorrect registrations classified as correct. A lower success rate of 89.63% is achieved with the pose agreement verification method, and now a few (3.66%) incorrect registrations are classified as being correct. Combined use of both classifiers results in a slightly lower overall success rate (87.8%) and 0% incorrect registrations. It should be noted that although the combined approach produces a slightly lower success rate, its time requirements, compared to similarity value based verification, are significantly reduced (see Section 4.5).

Table 7
Performance of the registration pipeline for images within the operating range of interest (Group B).

Type of verification used in the system   Correct registration   Incorrect registration   No registration
Similarity value based verification       157 (95.73%)           0 (0%)                   7 (4.27%)
Pose agreement based verification         147 (89.63%)           6 (3.66%)                11 (6.71%)
Combined verification                     144 (87.8%)            0 (0%)                   20 (12.2%)

The performance of the automated registration pipeline using images outside the operating range (i.e. Group C) is presented in Table 8. Correct registration is not feasible for these images, and so the desired output is a "no registration" result. Using the similarity value based verification or the combined verification, all registrations are correctly rejected, yielding 0% incorrect registrations. The pose agreement based verification has a slightly lower performance, with a 3.45% incorrect registration rate.

Table 8
Performance of the registration pipeline for images outside the operating range of interest (Group C).

Type of verification used in the system   Incorrect registration   No registration
Similarity value based verification       0 (0%)                   29 (100%)
Pose agreement based verification         1 (3.45%)                28 (96.55%)
Combined verification                     0 (0%)                   29 (100%)

The proposed registration pipeline, with combined verification, was used during complex aortic repairs of 12 patients in Group B. This enabled 3D renderings of the aorta and visceral vessels from CT to be overlaid onto the fluoroscopy during the procedure. We show in Fig. 9 four examples of such overlays. The images on the left are typical examples of the low dose fluoroscopy images that were used in this study. Note also the good agreement of the aortic overlays on the right with the underlying fluoroscopy images at anatomical points critical for these procedures. These points are the ostia of the visceral vessels, which agree well with the cannulating wires on the fluoroscopy images, indicating accurate registrations.

Fig. 9. Overlay examples: typical low dose fluoroscopy images (left) and CT rendered aortic overlays produced after registration (right). The arrows indicate the ostia of visceral vessels (key clinical landmarks), which agree well with the cannulating wires on the fluoroscopy images, indicating accurate registrations. Note: due to deformation, renal arteries do not agree with wire positions distal to the aorta.

Table 9 shows results using the gold standard data for the 4425 cases where the initial position was inside the operating range. A registration was defined to be successful if the Target Registration Reprojection Error (TRRE) for all eight target points was smaller than 2 mm. The results show very accurate registrations and a 100% success rate for all three starting position ranges. For the 3675 cases where the initial position was outside the operating range, a no registration result was produced for 3657 (99.5%) of them. Incorrect registrations were produced for the remaining 18
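The per-image success criterion (every one of the eight target points reprojecting within 2 mm) can be sketched as follows; the projection callables are hypothetical placeholders for a calibrated projection model:

```python
import numpy as np

def registration_successful(targets, project_est, project_gold, tol_mm=2.0):
    """Return whether all target points reproject within tol_mm of their
    gold standard reprojections (TRRE criterion), plus the per-point
    errors. `project_est`/`project_gold` map a 3D point (mm) to 2D
    detector coordinates (mm)."""
    errors = [float(np.linalg.norm(project_est(p) - project_gold(p)))
              for p in targets]
    return max(errors) < tol_mm, errors
```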
cases, which corresponded to strong local extrema due to the reflection symmetry of a vertebra in the sagittal plane. This results in DRRs produced at an angle ±x° from the true lateral view appearing similar (Penney (2000)). Such cases were not observed in our patient data, as lateral rotations larger than 90° are not generally used in clinical practice. These failures can be avoided in a clinical situation as long as the operating range of the algorithm is not extended to be close to true lateral. Our use of a global search strategy, which limits the search algorithm to clinically used positions, could help avoid such local extrema, which may be found using other search strategies.

Table 9
Target Registration Reprojection Error (TRRE) and rotation error (γ)ᵃ before and after 2D/3D registration, produced with the Gold Standard data, and percentage of successful registrations for three ranges of initial displacement from the gold standard position within the algorithm's operating range.

                      TRRE (mm)        γ (°)
                      RMS     Max      RMS     Max
Before registration   5.9     23.4     14.2    43.7
After registration    0.21    0.62     0.24    0.69

Successful registrations (%)
0–6 mm (γ < 16.7°)     100
6–12 mm (γ < 32.7°)    100
12–18 mm (γ < 43.7°)   100

ᵃ The rotation γ between two poses was computed by decomposing the pose difference into translation and rotation components, using the same approach as Tomaževič et al. (2003).

Fig. 10. Number of DRRs classified as "correct" and "incorrect" as a function of their distance from the gold standard position. Classification took place through visual inspection by two experts, (a) and (b).
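For reference, the decomposition behind a rotation error such as γ in Table 9 — splitting a relative rigid-body transform into a translation magnitude and a single rotation angle — can be sketched as follows (our own helper, in the spirit of the approach of Tomaževič et al. (2003)):

```python
import numpy as np

def pose_difference(T_a, T_b):
    """Decompose the relative pose inv(T_b) @ T_a into a translation
    magnitude (mm) and a rotation angle (degrees). T_a and T_b are
    4x4 rigid-body matrices."""
    D = np.linalg.inv(T_b) @ T_a
    t = float(np.linalg.norm(D[:3, 3]))
    # single rotation angle from the trace of the 3x3 rotation block
    cos_theta = np.clip((np.trace(D[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return t, float(np.degrees(np.arccos(cos_theta)))
```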
4.4. Evaluation of visual verification

Fig. 10 shows the results of the experiment to evaluate the size of registration error which could be robustly detected using visual inspection. Both experts correctly classified all 20 DRRs which had been produced at the gold standard position; therefore with this data visual inspection had a sensitivity of 100%. Only 5 of the 2 × 80 DRRs produced away from the gold standard position were classified as correct (specificity of 96.9%). The mean and maximum errors of these 5 incorrectly classified DRRs were 0.72 mm and 1.8 mm. All DRRs with an error larger than 2 mm were correctly classified. These results show our ability to discriminate between correct and incorrect registrations through visual inspection, and confirm the validity of using visual inspection to verify the ground truth used in the patient data experiments.

4.5. Time requirements for automated registration

To enable registration in a clinically acceptable time, a computer with two NVidia GTX 690 graphics cards was used. Each card contains two GPUs, so in total four GPUs were used simultaneously. We implemented a parallel version of the three computationally expensive parts of the registration pipeline: production of the accumulator arrays, DRR rendering and similarity value computation.

Table 10
Representative computation time (in seconds) for the three stages of the automated registration pipeline.

                                      Initial pose estimation   2D–3D registration   Result verification   Total
Similarity value based verification   13                        7.5                  30                    50.5
Pose agreement based verification     13                        7.5                  0                     20.5
Combined verification                 13                        7.5                  5                     25.5

Computation times for the three stages of the automated registration pipeline are summarised in Table 10. The exact computation time varies depending on the magnification setting; reported times correspond to an intermediate magnification. The initial pose estimation algorithm can produce the 8400 accumulator arrays corresponding to a single vertebra in 2 s; therefore, for all six vertebrae, the complete initialisation stage is achieved in 13 s. A single vertebra based 2D–3D registration is completed in 1.25 s; thus, the 2D–3D registration stage of the automated pipeline requires 7.5 s (6 vertebrae × 1.25 s). Finally, for result verification, similarity value based verification requires four additional 2D–3D registrations (i.e. 5 s) per vertebra, and therefore 30 s for all 6 vertebrae. The computational cost of pose agreement based verification is insignificant. Combined verification makes use of the similarity value based verification only for a single vertebra and so is completed in 5 s. Therefore, the time required for the complete automated pipeline when the similarity value, the pose agreement
and the combined verification is used, is 50.5, 20.5 and 25.5 s respectively.

5. Discussion

We have proposed methods to automate 2D–3D registration. The only manual input in our experiments was the magnification setting of the fluoroscopy set, information which could be transferred automatically via the image header. Therefore, provided with an interventional fluoroscopy image and its magnification setting, our proposed methods are able to carry out 2D–3D registration and verification with no additional manual input.

For clinical use a system should have a high correct registration rate, a low (preferably zero) incorrect registration rate, and be fast, preferably enabling registration and validation in just a few seconds. Our system, using similarity value based verification with the patient data sets, achieved a 95.73% correct registration rate and a 0% incorrect registration rate. Registrations to the phantom (ground truth) data set showed the algorithm to be accurate (0.21 mm RMS error) and robust (100% successful for images in the operating range). A very small number (0.5%) of incorrect registrations were not detected; however, as long as the operating range is not extended close to a true lateral view, these failures would require view directions which are very unlikely to occur in a clinical situation. The algorithm took 50 s to register. Although 50 s is much longer than clinically desired, use of combined verification reduced this time to 25 s, which is a feasible time scale for clinical use. Additional code optimisation is ongoing and further computational advances should reduce this time to the desired few seconds in the near future.

Our initial pose estimation method relies heavily on the ability of the GHT and GDSM to detect specific vertebrae. Although vertebrae visually appear very similar, the sensitivity of the GHT and GDSM to their small differences in shape enables the robust detection of specific vertebrae.
The standard use of the GHT is to detect multiple, reasonably generically shaped objects, and variation in shape due to view angle is normally a confounding factor. By producing vertebra specific and pose specific R-tables we are able to achieve a high signal-to-noise ratio using the GHT, as illustrated in Fig. 5. This is achieved even with low dose fluoroscopy images which often contain many overlying interventional instruments (see Figs. 1 and 9).

Our verification methods are statistical, and so are not guaranteed to work. Although a 0% incorrect registration rate was achieved using the similarity value based verification on patient data, and was validated here using a large number of images, if enough data is tested it is likely that some incorrect registrations would not be detected. How large an incorrect registration rate would be clinically acceptable is a subject for further research. However, it should be noted that guaranteeing a 0% incorrect registration rate is unlikely to be feasible even when manual inspection of registrations is used, as this is still subject to human error. Combined use of multiple verification methods has the potential to greatly reduce the incorrect registration rate if the methods are "not strongly correlated in their misclassifications" (Kittler et al. (1998)). This appears to be the case with our proposed verification methods: out of the 55 registrations in Section 4.2 which were classified by both verification methods and for which at least one of them failed, only one registration was jointly misclassified by both methods. Another strategy is for clinicians to never rely on a single registration; an easy to use (i.e. fast and automatic) registration algorithm will enable additional registrations to be carried out, and confidence can be increased by low variability in registration position. However, care should be taken with such an approach that the separate registrations have some degree of independence.
Let us also mention that although the proposed methods make specific use of vertebrae, there is potentially a wide range of applications for them. Firstly, many medical procedures take place in the area of the thorax and/or abdomen, where vertebrae can be used as landmarks. Secondly, our methods could be adapted to work with single bony structures by dividing them into sub-volumes. Each sub-volume would need to contain enough features to enable an accurate registration, and there would need to be a high probability of features from multiple sub-volumes appearing in the same fluoroscopy view. Such an adaptation has the potential to extend our proposed methods for use in the cranial and pelvic regions.

Finally, we would like to discuss how the proposed approach compares to image guided systems that use mechanical (Ruijters et al. (2011)) or optical (Livyatan et al. (2003)) tracking of the C-arm movement. C-arm tracking enables the maintenance of a 3D overlay based on an initially produced and verified registration between intraoperative and preoperative data. Although such systems remove the computational needs of continuous 2D–3D registration, they are associated with a number of limitations which our proposed methods are able to overcome. Firstly, the initial registration stage typically involves extended manual interaction, as manual registration or identification of the initial position is used. A high radiation dose is also associated with systems where interventional 3D imaging is used to establish correspondence with the preoperative data (Ruijters et al. (2011)). The 2D–3D registration method presented here can remove the above obstacles from the initialisation process, as it has a wide capture range and uses only 2D interventional fluoroscopy. The second limitation of C-arm tracking based methods comes from the fact that intraoperative patient movement can introduce errors into the initially established correspondence. Commercially available image guided systems with C-arm tracking currently require the manual identification of patient motion and repetition of the cumbersome initialisation process (Ruijters et al. (2011); Rossitti and Pfister (2009)).
Automated correction for patient motion has been proposed with the use of 3D motion compensation methods (Wang et al. (2014)), whereas fast 2D–3D registration techniques with a limited capture range could also be employed, based on the fact that a good initial position is available in such cases. However, such techniques will at times fail to provide an accurate registration; challenging scenarios include a highly magnified view with few bony features, views with large instruments obscuring bony detail, large patient movement, or a combination of these effects. Therefore, methods for automatic result verification, like the ones presented here, could significantly improve automatic motion correction algorithms. Finally, it should be noted that C-arm mechanical tracking is only provided by high end fluoroscopy sets, limiting its use to sites where such equipment is available.

6. Conclusions

We presented novel techniques and demonstrated their use in fully automating the 2D–3D registration process. The proposed registration pipeline requires no manual input (apart from the fluoroscopy magnification setting) in either the initialisation or the verification process, and it can cope with a wide range of movements of the fluoroscopy set in a clinically acceptable time. The complete registration process was extensively tested in real clinical conditions, enabling automatic CT-to-fluoroscopy image registration. The system was very robust, producing correct registrations for the vast majority of images within the operating range of interest. Moreover, no incorrect registrations were produced for the remaining images, as the results were successfully rejected by the verification algorithm. We believe that the proposed techniques can have significant impact on the practical use of 2D–3D registration technology in clinical practice and facilitate its widespread use in endovascular procedures.
Acknowledgment

The authors would like to thank the Guy's and St Thomas' Charity for funding this work. The authors also acknowledge financial support from the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London and King's College Hospital NHS Foundation Trust. We would also like to thank the Laboratory of Imaging Technologies, Faculty of Electrical Engineering, University of Ljubljana, Slovenia, for providing the CT, X-ray images and gold standard registrations used in the cadaver experiments presented in this work. Finally, we would like to thank all clinical staff at St Thomas' who aided this research and all of the patients who participated in this study.

Appendix A. Complexity comparison between use of GHT and use of similarity measure

Our main motivation for using the GHT is to enable fast registrations. We present the following simplified registration example to demonstrate the greatly reduced computational complexity of the GHT compared to a more standard intensity based similarity measure. Assume two images with equal pixel sizes are to be registered using a global strategy which searches for the optimum in-plane translation. The images are a DRR with R1 × R2 pixels and a fluoroscopy image with L1 × L2 pixels. When the intensity-based similarity measure is used, the DRR is translated across the fluoroscopy image using a step size s, and the similarity value is computed at each step. The complexity of the similarity value computation at each step is proportional to the number of pixels in the DRR. Therefore, the overall complexity is given by:
C1 = (L1 L2 R1 R2 n1) / s^2,    (5)
where n1 is the number of operations, required by the similarity measure, per pair of fluoroscopy and DRR pixels. When the GHT is used, a set of edge points is found in each image, the orientation of each edge point is calculated, and each point is placed in one of b equally sized orientation bins. Let us assume we have p and p′ points per orientation bin in the fluoroscopy image and the DRR respectively. Each point in the DRR casts a vote for each point in the fluoroscopy image which belongs to the same orientation bin. Therefore, the GHT complexity is given by:
C2 = b p p′ n2 = b (L1 L2 r / b)(R1 R2 r / b) n2 = (L1 L2 R1 R2 r^2 / b) n2,    (6)
where r is the fraction of pixels defined as edge points in the fluoroscopy image and the DRR, and n2 is the number of operations required to cast a single GHT vote: a process involving just two summations, one to compute an index variable and one to update the indexed element in the accumulator space. Using Eqs. (5) and (6), the ratio ρ of the two complexities is given by:
ρ = C1 / C2 = b n1 / (r^2 s^2 n2).    (7)
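As a sanity check on Eqs. (5) and (6), the two operation counts can be reproduced by explicitly simulating the translation scan and the GHT voting. All sizes and per-operation costs in the sketch below are illustrative assumptions (chosen so the bin splits are exact), not the values from the experiments:

```python
# Toy operation counts for Eq. (5) (translation scan) and Eq. (6) (GHT voting).
L1, L2 = 120, 120   # fluoroscopy image size (pixels) -- illustrative
R1, R2 = 48, 30     # DRR size (pixels) -- illustrative
s = 10              # translation step size (pixels)
n1, n2 = 8, 2       # assumed operations per pixel pair / per GHT vote
b = 12              # number of orientation bins
r = 0.1             # fraction of pixels that are edge points

# Eq. (5): slide the DRR over the fluoroscopy image; each similarity
# evaluation touches every DRR pixel once.
C1 = 0
for tx in range(0, L1, s):
    for ty in range(0, L2, s):
        C1 += R1 * R2 * n1
assert C1 == L1 * L2 * R1 * R2 * n1 // s**2

# Eq. (6): each DRR edge point votes for every fluoroscopy edge point in the
# same orientation bin, under the idealised uniform split p = L1*L2*r/b and
# p' = R1*R2*r/b points per bin.
p, p_prime = round(L1 * L2 * r / b), round(R1 * R2 * r / b)
C2 = 0
for _bin in range(b):
    C2 += p_prime * p * n2
assert C2 == round(L1 * L2 * R1 * R2 * r**2 / b) * n2

print(C1, C2, C1 / C2)   # 1658880 34560 48.0
```

The final ratio C1/C2 = 48 matches b n1 / (r^2 s^2 n2) from Eq. (7) for these assumed values.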
To estimate ρ we use the following approximate typical values from the images used in our experiments: r = 1/10 (depends on image content); s = 10 (depends on pixel size); b = 12; and, dependent on the similarity measure used, n1 is typically several times larger than n2. Using these values, ρ is estimated to be between 10 and 100.
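The estimate follows directly by substituting the quoted values into Eq. (7); the n1/n2 ratios below are assumptions spanning "several times larger":

```python
# rho = b*n1 / (r^2 * s^2 * n2) with the typical values quoted above.
b, r, s = 12, 1 / 10, 10
for n1_over_n2 in (1, 4, 8):            # assumed range of n1/n2
    rho = b * n1_over_n2 / (r**2 * s**2)
    print(n1_over_n2, rho)              # rho is approximately 12, 48, 96
```

so ρ stays between 10 and 100 across this assumed range.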
Therefore, we expect our GHT method to be at least a factor of 10 faster than using a standard intensity based similarity measure.

References

Ballard, D.H., 1981. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn. 13 (2), 111–122.
van der Bom, M.J., Bartels, L.W., Viergever, M.A., Pluim, J.P.W., 2010. Robust initialization of 2D–3D image registration using the projection-slice theorem and phase correlation. Med. Phys. 37 (4), 1884–1892.
Chintalapani, G., Chinnadurai, P., 2012. Role of 3D/3D and 2D/3D registration in computer-assisted stenting of aneurysms. In: Lee, Demirci, Unal, Radeva (Eds.), Proceedings of MICCAI-STENT 2012, pp. 112–121.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learning 20, 273–297.
Gendrin, C., Furtado, H., Weber, C., Bloch, C., Figl, M., Pawiro, S.A., Bergmann, H., Stock, M., Fichtinger, G., Georg, D., Birkfellner, W., 2012. Monitoring tumor motion by real time 2D/3D registration during radiotherapy. Radiother. Oncol. 102, 274–280.
Gendrin, C., Markelj, P., Pawiro, S.A., Spoerk, J., Bloch, C., Weber, C., Figl, M., Bergmann, H., Birkfellner, W., Likar, B., Pernuš, F., 2011. Validation for 2D/3D registration. II: the comparison of intensity- and gradient-based merit functions using a new gold standard data set. Med. Phys. 38 (3), 1491–1502.
Gong, R.H., Guler, O., Kurkluoglu, M., Lovejoy, J., Yaniv, Z., 2013. Interactive initialization of 2D/3D rigid registration. Med. Phys. 40 (12), 121911.
Hamadeh, A., Lavallee, S., Cinquin, P., 1998. Automated 3-dimensional computed tomographic and fluoroscopic image registration. Comput. Aided Surg. 3 (1), 11–19.
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20 (3), 226–239.
Kovesi, P., 2003. Phase congruency detects corners and edges. In: The Australian Pattern Recognition Society Conference: DICTA 2003, pp. 309–318.
Livyatan, H., Yaniv, Z., Joskowicz, L., 2003. Gradient-based 2-D/3-D rigid registration of fluoroscopic X-ray to CT. IEEE Trans. Med. Imaging 22 (11), 1395–1406.
Markelj, P., Tomaževič, D., Likar, B., Pernuš, F., 2012. A review of 3D/2D registration methods for image-guided interventions. Med. Image Anal. 16 (3), 642–661.
Otake, Y., Armand, M., Armiger, R., Kutzer, M.D., Basafa, E., Kazanzides, P., Taylor, R.H., 2012a. Intraoperative image-based multiview 2D/3D registration for image-guided orthopaedic surgery: incorporation of fiducial-based C-arm tracking and GPU acceleration. IEEE Trans. Med. Imaging 31 (4), 948–962.
Otake, Y., Schafer, S., Stayman, J.W., Zbijewski, W., Kleinszig, G., Graumann, R., Khanna, A.J., Siewerdsen, J.H., 2012b. Automatic localization of vertebral levels in X-ray fluoroscopy using 3D-2D registration: a tool to reduce wrong-site surgery. Phys. Med. Biol. 57 (17), 5485–5508.
Otake, Y., Wang, A.S., Stayman, J.W., Uneri, A., Kleinszig, G., Vogt, S., Khanna, A.J., Gokaslan, Z.L., Siewerdsen, J.H., 2013. Robust 3D-2D image registration: application to spine interventions and vertebral labeling in the presence of anatomical deformation. Phys. Med. Biol. 58, 8535–8553.
Penney, G.P., 2000. Registration of Tomographic Images to X-ray Projections for Use in Image Guided Interventions. Doctoral Thesis, pp. 158–159.
Penney, G.P., Varnavas, A., Dastur, N., Carrell, T., 2011. An image-guided surgery system to aid endovascular treatment of complex aortic aneurysms: description and initial clinical experience. In: Information Processing in Computer-Assisted Interventions. Springer, pp. 13–24.
Penney, G.P., Weese, J., Batchelor, P.G., Hill, D.L.G., Hawkes, D.J., 2001. Validation of a two- to three-dimensional registration algorithm for aligning preoperative CT images and intraoperative fluoroscopy images. Med. Phys. 28 (6), 1024–1032.
Rossitti, S., Pfister, M., 2009. 3D road-mapping in the endovascular treatment of cerebral aneurysms and arteriovenous malformations. Interv. Neuroradiol. 15, 283–290.
Ruijters, D., Homan, R., Mielekamp, P., van de Haar, P., Babic, D., 2011. Validation of 3D multimodality roadmapping in interventional neuroradiology. Phys. Med. Biol. 56, 5335–5354.
Tomaževič, D., Likar, B., Pernuš, F., 2002. “Gold standard” 2D/3D registration of X-ray to CT and MR images. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI ’02, pp. 461–468.
Tomaževič, D., Likar, B., Slivnik, T., Pernuš, F., 2003. 3-D/2-D registration of CT and MR to X-ray images. IEEE Trans. Med. Imaging 22 (11), 1407–1416.
Varnavas, A., Carrell, T., Penney, G.P., 2013. Increasing the automation of a 2D-3D registration system. IEEE Trans. Med. Imaging 32 (2), 387–399.
Wang, J., Borsdorf, A., Heigl, B., Köhler, T., Hornegger, J., 2014. Gradient-based differential approach for 3-D motion compensation in interventional 2-D/3-D image fusion. In: 2nd International Conference on 3D Vision (3DV), 2014, pp. 293–300.
Weese, J., Penney, G.P., Desmedt, P., Buzug, T.M., Hill, D.L.G., Hawkes, D.J., 1997. Voxel-based 2-D/3-D registration of fluoroscopy images and CT scans for image-guided surgery. IEEE Trans. Inf. Technol. Biomed. 1 (4), 284–293.