Parallelized collision detection with applications in virtual bone machining

Parallelized collision detection with applications in virtual bone machining

Computer Methods and Programs in Biomedicine 188 (2020) 105263 Contents lists available at ScienceDirect Computer Methods and Programs in Biomedicin...

1MB Sizes 0 Downloads 35 Views

Computer Methods and Programs in Biomedicine 188 (2020) 105263

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine journal homepage: www.elsevier.com/locate/cmpb

Parallelized collision detection with applications in virtual bone machining Mohammadreza Faieghi a, O. Remus Tutunea-Fatan a,b,∗, Roy Eagleson a,c a

Biomedical Engineering, Western University, London, Ontario N6A 5B9, Canada Mechanical and Materials Engineering, Western University, London, Ontario N6A 5B9, Canada c Electrical and Computer Engineering, Western University, London, Ontario N6A 5B9, Canada b

a r t i c l e

i n f o

Article history: Received 23 May 2019 Revised 18 October 2019 Accepted 5 December 2019

Keywords: Collision detection Haptics Virtual reality Surgery simulation Orthopedic procedures

a b s t r a c t Background and objectives: Virtual reality surgery simulators have been proved effective for training in several surgical disciplines. Nevertheless, this technology is presently underutilized in orthopaedics, especially for bone machining procedures, due to the limited realism in haptic simulation of bone interactions. Collision detection is an integral part of surgery simulators and its accuracy and computational efficiency play a determinant role on the fidelity of simulations. To address this, the primary objective of this study was to develop a new algorithm that enables faster and more accurate collision detection within 1 ms (required for stable haptic rendering) in order to facilitate the improvement of the realism of virtual bone machining procedures. Methods: The core of the developed algorithm is constituted by voxmap point shell method according to which tool and osseous tissue geometries were sampled by points and voxels, respectively. The algorithm projects tool sampling points into the voxmap coordinates and compute an intersection condition for each point-voxel pair. This step is massively parallelized using Graphical Processing Units and it is further accelerated by an early culling of the unnecessary threads as instructed by the rapid estimation of the possible intersection volume. A contiguous array was used for implicit definition of voxmap in order to guarantee a fast access to voxels and thereby enable efficient material removal. A sparse representation of tool points was employed for efficient memory reductions. The effectiveness of the algorithm was evaluated at various bone sampling resolutions and was compared with prior relevant implementations. Results: The results obtained with an average hardware configuration have indicated that the developed algorithm is capable to reliably maintain < 1 ms running time in severe tool-bone collisions, both sampled at 10243 resolutions. The results also showed the algorithm running time has a low sensitivity to bone sampling resolution. The comparisons performed suggested that the proposed approach is significantly faster than comparable methods while relying on lower or similar memory requirements. Conclusions: The algorithm proposed through this study enables a higher numerical efficiency and is capable to significantly enlarge the maximum resolution that can be used by high fidelity/high realism haptic simulators targeting surgical orthopaedic procedures. © 2019 Elsevier B.V. All rights reserved.

1. Introduction Virtual reality (VR) simulators are widely accepted as one of the useful technologies for training and pre-operative planning of the surgeries. For this reason, numerous VR simulators have been developed for different procedures [1–8], some of which have already adopted simulators in their core training programs [9]. However,

∗ Corresponding author at: Biomedical Engineering, Western University, London, Ontario N6A 5B9, Canada. E-mail address: [email protected] (O.R. Tutunea-Fatan).

https://doi.org/10.1016/j.cmpb.2019.105263 0169-2607/© 2019 Elsevier B.V. All rights reserved.

despite the recent advances, one surgical discipline that is somewhat lagging behind the others is orthopaedics, primarily due to lack of validated simulators in this field [10]. This paucity stems from the limited realism that was achieved so far towards the simulation of orthopaedic operations. Examples of such operations include different bone machining procedures such as drilling, reaming, sawing, burring that are commonly required in orthopaedic surgeries. One approach that has been widely used to generate force feedback during bone machining simulation is the well-known voxmap-point shell (VPS) method [11]. As depicted in Fig. 1, this

2

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

PointShell

Tool Objects Boundary

Voxmap

Bone

Fig. 1. Voxelized representations of tool and bone in the VPS technique

method relies on sampled geometries of objects, namely a point cloud of surgery tool (termed point shell) and a voxelized model of bone (termed voxmap) that was originally derived from computed tomography (CT). The main advantage of VPS is that its performance is independent from the geometrical complexities of objects. By contrast, sampling resolution plays a determining role on the quality of the haptic feedback since – for instance – coarse resolutions could mean that any changes in the intersection volume might result in abrupt changes of force feedback that in turn will deteriorate the haptic stability [12]. Furthermore, human bone is characterized by a heterogeneous structure whose stiffness varies spatially [13] and as such, coarse voxels may miss capturing these variations and thus will ultimately fail to incorporate them in downstream force computations. On the other hand, even though low resolutions cannot accurately capture some of the sharp geometrical features of objects – potentially leading to known “deep penetration” issues (Fig. 1) – they remain more appropriate for high fidelity haptic simulations. While these issues can be partially corrected through finer resolutions, it is worth emphasizing here that the maximum attainable sampling resolution is limited by the stringent 1 kHz refresh rate that is required by the haptic devices [14]. As such, “limitless” increases of the sampling resolution results in a drastically higher number of point-voxel collision queries, thus making collision detection a major bottleneck in the haptic rendering loop. Therefore, there is a need to develop more efficient collision detection algorithms that are capable to handle larger data sets in shorter amounts of time. To address this, the current study presents a new VPS collision detection technique that was purposely developed for the future generations of real-time bone removal simulators. Since parallel processing has become a more or less de facto standard in high performance computing [15], the algorithm proposed in this study is based on large scale parallelization to be accomplished by the many-core architecture of the graphics processing unit (GPU). 2. Background 2.1. Prior Work Voxmap-point shell method has been introduced by McNeely et al. in 1999 [11] and since then constitute the basis of many haptic applications [16–19]. This method had also been employed by several studies involving orthopaedic surgery simulators [20–23] in which the common approach was to transform tool points to voxmap coordinates in order to determine voxels that enclose the transformed points. Following this, the elemental forces are to be computed for intersecting point-voxel pairs which to be

subsequently aggregated to generate the resultant force feedback. Because the identification of the intersected voxels is performed in a serial manner, collision detection performance is poor and sampling resolutions tend to be rather limited. To address this, Vankipuram et al. [24] relied on bounding volume hierarchy for voxmap representations. While this approach can reduce the number of collision checks, numerous traverses over deep hierarchies will pose significant challenges for high resolution datasets. In the broader field of haptics, the use of precomputed distance fields for voxmap proved to be able to bring VPS computational efficiency increases [25–27] and additional improvements have been achieved by integrating distance fields with octrees [28]. Unfortunately, such approaches are not suitable for bone machining simulation because distance field must be reconstructed every time bone undergoes material removal. These updates constitute a significant overhead for collision detection that is typically incapable to adhere to the 1 kHz haptic framerate target, particularly in case of extended intersections. To address collision detections between deformable objects, Heidelberger et al. [29,30] have developed an algorithm similar to VPS but by relying on layered depth image (LDI) techniques instead. In this approach, the segment of each object falling within the intersection volume is voxelized and then collision is being detected through Boolean operations performed between the newly constructed voxels. To account for deformations, voxelization step has to be recomputed in every frame and its calculation is affected by the complexity of objects. In other words, the geometrical complexity of the objects also affects the performance of the LDI-based algorithm, as opposed to VPS whose efficiency is primarily dependent on object sampling resolution. More or less similar approaches were developed for machining simulations as integrated in Computer-Aided Manufacturing (CAM) software tools [31]. For instance, Hong et al. [32] have approximated tool geometry with implicit shapes and involved analytical equations to describe tool surface. Following that, collision detection has been determined by verifying the geometric location of the bone voxel coordinates against tool surface. However, the serial implementation of this algorithm tends to be rather slow and thus primarily suitable for offline simulation tasks. Evidently, concurrent collision detections are might be able to lead to significant performance increases. In contrast with this serial technique, Zheng et al. have reported a GPU-based parallelized variant of VPS for a tooth drilling simulator [33]. In this implementation, each GPU thread queries the collision of a single tool point against voxmap while a 3D grid constitutes the foundation of an implicit description of voxmap. The access to grid cells is provided by means of a look-up table while an

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

3

octree-inspired top-down division is used to traverse the 3D grid to find the voxels in an intersecting condition. Then, once the elemental forces acting on each tool point are calculated, the resultant force feedback is computed by means of a parallel reduction algorithm. Although this method proved to be capable to accelerate collision detection by up to 12 times when compared with an analogous CPU-based implementation, repeated grid traverses makes it sensitive to the inherent size of the voxmap. As a result, it can be speculated that algorithm performance will degrade significantly when applied to bone machining simulation, particularly since the size of typical skeletal human bone is much larger than that of a tooth which in turn will result in a voxmap that is likely to fall outside the capabilities of the method in discussion.

in the commercial CAM software since this implies a significant rewriting of its core kernel. It is rather important to underscore the “platform-independent” keywords since that implies a certain degree of decoupling from the widely-used CUDA-programming platform that almost inevitably ends up limiting the type of graphics hardware to be used. To address the aforementioned challenges, the current study aims to develop a new cross-platform GPU grid-based variant of the VPS technique to further reduce memory usage as well as running time in an attempt to efficiently handle the increased computational demands associated with the large datasets involved in real-time material removal as a component of orthopaedic surgery simulators.

2.2. Motivation an objective

3. Computational platform

The surveyed literature seems to suggest that the vast majority of orthopaedic surgery simulators that were developed so far have made use of VPS variants that are generally characterized by a poor computational performance that limits the sampling resolution and in turn leads to a low quality of the downstream haptic feedback. Furthermore, almost none of the prior studies focused on collision detection has specifically addressed real-time bone removal operations as performed for instance in upper or lower limb orthopaedic surgeries. The use of GPU grid-based approaches appears to be promising as it has been successfully applied in tooth surgery recently [33]. However, bone removal operations would normally require a more efficient algorithm that should be capable to handle larger voxmaps. Moreover, some orthopaedic procedures – such as glenoid and acetabular reaming – require large and complex tools and therefore it might be necessary to sample their geometry in high resolution in order to accurately capture their intricate cutting features. Consequently, the algorithm must be capable to handle well increasingly high resolutions of point shell. Along this line of thoughts, since the capacity of GPU memory is limited, low memory consumption will become critical for the ability of the developed technique to store and handle the large datasets that are anticipated. At this point, few comments are worth to be made with respect to the decision to use voxel-based over other virtual material removal approaches. Evidently, while in the biomedical context this paradigm is rarely disputed since voxel-based methods have been ubiquitously regarded as the natural extension of the CT scan data format in which the original geometry is often initially acquired/represented, this was not the case of offline CAM simulations that are widely used in the industrial machining context. In this case, while early voxel-based virtual machining attempts have been reported almost two decades ago [34], commercial CAM software has more or less “standardized” on polygonal mesh representations for material removal purposes. That being said, the advantages of voxel-based representations compared to others - even in the traditional machining context - are fairly clear [35] and while different enhancement routes can be taken to improve the accuracy/smoothness of the post-machined surface, it is almost impossible to refute the utility of voxel-based approaches when it comes to cutting force estimations, particularly when the workpiece material has a non-homogenous/heterogeneous structure. Clearly, while this is rarely the case in conventional machining, it represents the most common instance in biomedical applications. Beyond this, it is important to note here that while the importance of parallelized computations to make use of the large pool of cores that are available in modern graphic cards has already been noted in the past [36–38], platform-independent GPU-based algorithms are still in their infancy [39] and are yet to be included

3.1. Overview Fig. 2 depicts the main components of the proposed collision detection platform. The input to the algorithm consists of the volumetric CT data of the bone as well as the 3D model of tool, geometries to be represented as voxmap and point shell data structures, respectively. An initialization step will reconstruct and passes them to the GPU global memory. Following that, oriented bounding boxes (OBBs) are derived for each object, since they are required in the early phases of collision detection technique. The algorithm is comprised of two main phases: 1 Broad-phase will initially perform a quick intersection test between the OBBs of both objects in order to determine whether the two objects are about to contact or not. After that, a volume of interest (VOI) will be also determined in order to precalculate the possible intersection zone(s) between two objects. The use of VOI allows a rapid culling of all out-of-VOI voxels since they will be in a clear non-intersecting condition. Broadphase computations are performed in a single-thread CPU manner. 2 Narrow-phase will initially complete an exact intersection test between all elements located inside VOI, will determine all intersecting points and voxels, and then will compute the resultant force feedback. This phase will also update the voxmap geometry in order to account for bone removal. In contrast with its broad-phase counterpart, narrow-phase relies on GPU parallelization in a sense that each thread remains focused on collision of one point shell element against voxmap. Throughout the subsequent developments, unless otherwise mentioned, bold-face lowercase letters denote vectors and boldface capital letters indicate matrices. Vectors and matrices, if not explicitly stated, are assumed to have appropriate dimensions. The local coordinates frame of tool and bone are denoted by F T and FB , respectively. The notation BT T refers to the transformation matrix in homogenous coordinates from FB to FT , and TB T from FT to FB . 3.2. Voxmap data structure To construct the voxmap, digital imaging and communications in medicine (DICOM) files acquired through the CT of the bony geometry are initially loaded into a medical image processing software to allow noise filtering, thresholding and segmentation. The obtained CT voxels were then used to derive a uniform 3D grid to implicitly define geometry and material characteristics of bone data. Each occupied cell of the grid provides the coordinates as well as the Hounsfield unit (HU) value associated with a CT voxel. HU values provide a measure for spatial stiffness of bone

4

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

Fig. 2. Overview of the proposed platform

Fig. 3. Point cloud construction for tool geometry: a) triangular mesh model, b) surface voxelization, and c) point cloud derived from centroid of surface voxels

and they are important in the computation of the elemental cutting forces. For the time being, grid resolution will be primarily determined by the size of CT voxels. However, different resolutions can be achieved by merging/dividing CT voxels and averaging/interpolating their HU values, as described in [40]. An effective approach to store the grid is to use a contiguous array with length n, where n is the total number of cells in the grid and perform an invertible 3D to 1D mapping between cell coordinates and array indices. If a grid dimension is given by d1 × d2 × d3 , a simple way to perform mapping comes down to establishing associations between cells, integer coordinates (i1 , i2 , i3 ) and the element e = i1 + d1 i2 + d1 d2 i3 of the array. Based on this, the inverse mapping can be obtained according to the fole−i3 d1 d2 lowing computations i3 = d ed , i2 = , i1 = e − i3 d1 d2 − i2 d1 . d 1 2

1

In this array, the index of each element and the value stored in it correspond to the coordinates and HU values associated with each cell, respectively. Therefore, it is sufficient to allocate a float variable for each element. In practice, however, the distribution of HU values is usually much wider than those required for force modeling purposes. As a result, it is sufficient to map HU values to the 0–255 range, where 0 indicates a void (i.e., air) and 255 represents the highest HU value. Consequently, only an 8-bit integer is required for each element. In this manner, material removal can be easily implemented by simply setting the density of the removed elements/voxels to 0. This approach outlined above has several advantages. Firstly, – because arrays are known to respond fast to random access – each

element can be accessed rapidly without the need for expensive grid travels. Secondly, only 8 bits are required to store heterogeneous bone data and this favors low memory needs without compromising the beneficial random access performance. Finally, this implementation allows the assimilation of material removal with simple write memory operations that in turn will avoid expensive data reconstruction overheads. 3.3. Point shell data structure As pointed out in [11], one convenient way to construct point shell relies on the voxelization of tool surface followed by the extraction of centroid for the constructed voxels (Fig. 3). Surface voxelization can be done by computing the intersection between a triangle mesh model of tool and a uniform 3D grid encompassing the mesh domain. This process is well-addressed in the existing literature and can be accelerated using GPU computing [41–45]. For a point shell with m points, a contiguous array of 3m floats is sufficient to effectively store it in memory. However, a more compact representation can be obtained by computing the 3D integer coordinates of points on the grid in which surface voxelization has been computed and encode the three integers into a 32-bit integer via Morton encoding [46]. Using the method described in [47], 10 bits are required for each dimension of grid. Therefore, PointShells with a resolution as high as 10243 can be stored using 32 bits per point, resulting in

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

5

Fig. 4. Pseudocode for the broad-phase approach

VOImin T

VOImax B

Fig. 5. 2D representation of VOI computation: green-dashed box indicates the AABB inferred from bone OBB represented in tool coordinates while red-dashed box indicates VOI

66% memory savings compared to storing all the floats. In addition, decoding Morton codes requires only a few bit operations and therefore, the added cost of deriving points coordinates is negligible.

3.4. Broad-Phase The broad phase serves as an optimization step whose job is to determine a VOI that estimates the intersection volume between two objects and culls all the elements outside of this volume. The computations involved in this phase rely on oriented and axisaligned bounding boxes of objects. An axis-aligned bounding box (AABB) can be represented by its center c and its limits in each direction e, or – alternatively – through its minimum and maximum corners. An OBB can be represented in a similar manner, but three additional vectors are required to record the direction of its axes in order to account for arbitrary orientations.

As illustrated in Fig. 4, broad-phase starts by updating the position and orientation of objects OBBs based on user interactions and constructs the updated the transformation matrix from bone to tool coordinate system BT T. After that, it performs an exact OBBOBB Boolean intersection test based according to the implementation described in [48, pag. 103]. This test is derived from Separating Axis Theorem (SAT) [49] and relies on the transformation of one OBB into the coordinate system of the other, primarily for enhanced performance purposes. Once an intersection is detected, an AABB of the bone expressed in tool coordinates is inferred from the OBB of the bone. The resulting AABB will be aligned with tool OBB and therefore, VOI can be estimated by comparing the limits of the two aligned boxes. A graphical illustration the aforementioned approach is provided in Fig. 5. It is worth noting that the intersection volume itself is an AABB aligned with FT and thereby it can be identified by its minimum and maximum corners. This method is fast and always results in an overestimation of the intersection volume such that no intersecting element is missed.

6

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

Fig. 6. Pseudocode for the narrow-phase GPU kernel

3.5. Narrow-Phase

ting forces typically depend on the geometry of tool, stiffness of bone and machining condition and also vary from one machining/cutting procedure to the other. As indicated above, the goal of the present study is not related to the determination of these forces, but rather the development of a generic computational platform capable to efficiently identify tool-bone intersection and compute the resultant cutting force.

The narrow-phase queries the exact collision of points and voxels using a GPU kernel whose pseudocode is given by Fig. 6. Inputs to the kernel include the point shell and voxmap data structures which are denoted by P and V, respectively and are transferred to GPU global memory once during initialization. The other inputs are the transformation matrix from tool to bone coordinate frames represented by TB T and also the VOI boundary. These parameters are required to be updated in running time and delivered to GPU before every kernel launch. The kernel launches one thread per point shell element. Each thread takes one element from the point shell array, uses Morton decoding to derive the coordinates of a tool point, and checks the obtained coordinates against the VOI boundary. If the point lies outside the VOI, the thread terminates immediately. Otherwise, it continues by transferring the tool point into the bone coordinates system followed by the computation of its location within the voxmap array. If the array returns a non-zero value, it means that the tested point is located within the bone volume (= penetration condition). Therefore, the thread decreases the array element value or sets it to 0 (depending on a given erosion logic) in order to account for material removal. Moreover, the thread assigns an elemental force to the intersecting tool point. The resultant force can then be computed by aggregating these forces by means of atomic operations. Since atomic operations work with integers only, the elemental forces are multiplied by a large number N and casted to integer type prior to the operation. The determination of adequate values for elemental forces requires physical bone removal experiments in which cutting forces will be measured. Examples of such experimental trials/protocols can be found in [50–53]. According to these studies, elemental cut-

(a)

4. Results To assess its performance, the proposed framework was tested by means of a simulated bone machining procedure involving glenoid reaming. This surgical technique represents a challenging task in total shoulder arthroplasty and is well-suited for VR simulation [54]. As depicted in Fig. 7, the surgical tool – commonly termed as reamer – is utilized to resurface the glenoid surface, typically for better implant fixation purposes. The computational algorithm was developed by means of OpenCL v1.2 programming platform. The hardware involved in testing include a GeForce GTX 970 GPU with 4GB video memory and an Intel Core i7 6700K CPU with 16 GB RAM. The main metric to evaluate the algorithm was its running time, as measured in nanoseconds by means of the “chrono” clock available in the C++ standard library [55]. The measured running time was averaged over multiple runs in order to minimize the confounding effect of the noise caused by the relative interference with the kernel of the operating system (Windows 10, in this case). 4.1. Performance for different sampling resolutions Since the fidelity of VPS-based methods is strongly dependent on the sampling resolution, it is important to examine the per-

(b)

(c)

Fig. 7. Details of the glenoid reaming procedure: a) glenoid surface outlining the pilot hole used to guide the reaming tool; b) reaming tool; and c) reaming kinematics

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263 Table 1 Average running time in milliseconds for the proposed method as evaluated for various voxmap and point shell resolutions

Table 3 Average running time in milliseconds for Zheng et al. method [33] as evaluated for various voxmap and point shell resolutions

Point shell resolution 643 Voxmap resolution

3

128 2563 5123 10243

0.19 0.19 0.20 0.20

1283 0.20 0.21 0.21 0.21

2563 0.23 0.24 0.24 0.27

7

Point shell resolution 5123 0.34 0.34 0.34 0.36

10243 0.76 0.79 0.79 0.81

formance of the algorithm in the context of different resolutions in order to identify the maximum resolution capable to guarantee the target 1 kHz refresh rate required by the haptic module. To this end, voxmaps with different resolutions were derived from a glenoid CT image by means of the voxel resizing method described in the past [40]. To construct point shell at different resolutions, the 3D mesh model of the reamer was obtained through laser scanning and then passed to a surface voxelization program [41]. This enabled generation of different voxelized representations for the reamer, the centroids of the associated voxels were used to derive point shells. Table 1 presents typical results obtained at different sampling resolutions. The timings presented represent average computing times as measured in various positions and orientations of the reamer typically subjected to severe collision scenarios. Quite notably, the data in Table 1 suggest that the developed algorithm is capable to maintain execution times of under 1 ms for point shells and voxmaps with resolutions as fine as 10243 . Clearly, the running time of the algorithm increases as point shell resolution grows due to the added number of GPU threads to be launched by the kernel. Interestingly, changes in voxmap resolutions have minor effects on overall computing time, most likely because an increase in the number of voxels results in just a slight increase of the potential read/write operations to be performed by each thread on the voxmap array. These operations are inexpensive and are usually accelerated by coalesced memory access and the use of cache line and therefore their additional overhead tends to be insignificant. As such, it is possible to reach significantly higher resolutions for bone representations as long as enough computing memory is available for voxel storage. 4.2. Duration of running time phases To better understand the functionality of the proposed technique, the time associated with different processing phases was quantified for a severe collision scenario. The results presented in Table 2 suggest that a significant fraction of the algorithm running time is devoted to kernel execution. While this phase is the only one affected by variations in the resolution of the two colliding objects, the other processing steps will be data size-independent. The data in Table 2 also implies that the proposed implementation is capable to perform CPU-GPU data transfers in approximately 24%

Table 2 Running time breakdown of the proposed algorithm for point shell and voxmap resolution of 10243 and over 103,0 0 0 contact points Step

Time [μs]

OBB-OBB intersection test Computing VOI Data transfer from CPU to GPU Kernel execution Data transfer from GPU to CPU Total

0.77 0.26 169.9 568.72 71.03 810.67

Voxmap Resolution

3

128 2563 5123

643

1283

2563

5123

10243

0.33 0.86 5.20

0.34 0.89 5.14

0.40 1.03 5.29

0.68 1.22 5.52

1.70 1.86 6.70

of the allowable 1 ms interval. This was achieved by wrapping the exchange data into one structure followed by its passing through a single buffer which turned out to be much faster than a similar approach involving separate buffers. Interestingly, broad-phase computations were found to last for only 1 μs that is negligible in the context of the general 1 ms time scale. To assess the effect of the broad-phase – especially with respect to VOI – the running time was measured both with and without involving VOI while setting both point shell and voxmap resolutions at 10243 . The results presented in Fig. 8 prove the effectiveness of the VOI in improving the collision detection performance, particularly in case of small intersection volumes between the colliding objects. In practical material removal scenarios, toolworkpiece collisions are bound to occur at object boundaries and this in turn implies that the intersection volume will generally remain small. Consequently, it can be inferred that the use of VOI can accelerate collision detection in glenoid reaming by almost 50% and this further enhances the efficacy of the proposed method. 4.3. Comparisons with other implementations 4.3.1. Material removal simulator for dental applications One of the comparable techniques proposed so far is the GPUbased VPS method introduced by Zheng et al. in the context of a tooth drilling simulator [33]. Since the original source code was not made publicly available, it was virtually impossible to mimic with high fidelity absolutely all details of the previous implementation. However, since its main features such as the octree-inspired grid traversal, force computation using a reduction algorithm as well as data structure were well-explained, it is believed that a reasonably close functionality was achieved on the OpenCL platform that was consistently used for development throughout the current study. The results presented in Table 3 were obtained while testing Zheng et al.’s method [33] on glenoid reaming scenarios identical with those used for Table 1. The data in Table 3 suggests that the previously-proposed technique is considerably slower than the one developed in this study and fails to fall under the 1 ms target for higher resolutions. One possible explanation for the visible underperformance of Zheng et al.’s method could reside in that their method employs an octree-like recursive subdivision of the grid in order to determine the intersecting voxels. As a result, the workload of each thread becomes highly sensitive to grid resolution and eventually leads to poor performance in case of finer grids. By contrast, the algorithm proposed in the current study computes the location of tool points directly in the voxmap array. The search for intersecting voxels is much faster when performed in this manner and is associated with a constant computational complexity that in turn guarantees a superior performance for large datasets. Furthermore, since Zheng et al.’s method relies on reduction algorithms for computation of the resultant force, this approach is prone to launch unnecessary threads for points that have not (yet) immersed into bone. By contrast, the current implementation relies on atomic operations that are invoked only for intersecting points and thereby warrants an increased computational efficiency. One other metric with significance in context of GPU-based algorithms is represented by memory usage, particularly since most

8

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

Running Time (ms)

1 0.8 0.6 0.4 0.2

With VOI Without VOI

0 10

20

30

40

50

60

70

80

90

100

VOI Volume / Tool OBB Volume (%) Fig. 8. The effect of VOI on collision detection performance Table 4 Comparison between memory requirements of point shell and voxmap data structures Point shell required memory [MB]

Voxmap required memory [MB]

Resolution

Proposed Approach

Zheng et al. [33] Approach

Proposed Approach

Zheng et al. [33] Approach

643 1283 2563 5123 10243

0.02 0.07 0.30 1.19 4.75

0.05 0.22 0.89 3.56 14.26

0.26 2.10 16.78 134.22 1073.74

1.31 10.49 83.89 671.09 –

common video cards have a relatively limited storage capacity. To address this, the data presented in Table 4 compares the memory requirements of the algorithms used to store point shell and voxmap. Clearly, the framework developed in the present study has much lower memory needs compared to Zheng et al.’s method. As pointed out in Section 3, the developed method requires only an 8-bit integer per voxel and a 32-bit integer for each tool point whereas Zheng et al. stores each voxel in a structure constituted by a 32-bit float for its HU value and a Boolean variable to flag whether the voxel is in a collision condition or not. Since the later approach also required three 32-bit floats for the 3D coordinates of each tool point, it was practically impossible to run it for voxmap at 10243 resolutions because its memory requirements (5.3 GB) exceeded the 4 GB limit available in the hardware used for testing. The proposed algorithm uses approximately 1.07 GB for 10243 voxmaps and hence the resolution of the model can be further increased. However, the 8 bits/voxel characteristic implies that the maximum number of voxels that could be fit in the current hardware (4 GB video memory) would be somewhere in the vicinity of the 4 billion mark. Overall, the tests above support the idea that while Zheng et al.’s approach [33] can be successfully used for material removal simulators with dental applications, the size and complexity of the osseous geometry involved in orthopaedic procedures might exceed its capabilities. 4.3.2. Material removal simulator for machining applications Another method comparable with the one proposed in the current study is the one developed by Yau et al. [32] in the context of machining simulations. However, since the original CPU-based implementation of the method was found be too slow when tested on the higher resolution orthopaedic samples used throughout this study, a GPU-based variant of it was developed for comparison purposes. The voxmap structure of the GPU-based variant of Yau et al.’s algorithm replicates the one outlined in Section 3. Furthermore, for a rotating reamer like the one used in glenoid reaming operations, the surface swept by the cutting lips has a hemispher-

Table 5 Average running time in milliseconds for Yau et al. method [32] as evaluated for various voxmap resolutions Voxmap resolution

Average running time (ms)

1283 0.23

5123 2.19

2563 0.5

10243 13.67

ical shape that can be described by an implicit equation. This representation replaces the point shell descriptor used in our algorithm and can be easily updated in each computed frame according to the posture (i.e., position and orientation) of the reaming tool. Under this modified approach, the GPU kernel needs to allocate one thread per voxmap element in order to query the collision of a voxel against the analytical formulation of reamer edges. To decrease the computation time, a VOI in voxmap coordinates was used to cull threads associated with voxels located outside of the intersection volume of OBBs. Table 5 presents the average running time associated with this method when tested under scenarios that were similar to the ones illustrated in Table 1. While Yau et al.’s method is characterized by an acceptable performance at low resolutions, its running time increases significantly for higher resolutions, primarily due to the increased number of threads to be launched by the kernel. As such, while Yau et al.’s method remains slower than the proposed algorithm, its memory requirements are less rigorous since it does not make use of point shell. At this point, it is worth mentioning here that the reason for comparing to Yau et al.’s method was primarily selected to provide a comparison baseline with respect to a different approach that could be potentially employed in collision detection in a virtual surgery simulation context. This being said, it should also be emphasized that Yau et al.’s method was originally developed for five-axis machining simulations in which – typically – accuracy is favored over computational efficiency/speed. Then, since the size of the machining models could be significantly larger than that of the common osseous geometries to appear in surgical simulations,

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

the associated voxmap size and required sampling resolutions required by the virtual machining context might end up exceeding the rather limited memory of the GPUs. Nevertheless, if the approach proposed in this current study was to be become the core engine of a virtual machining application, one possible solution would be to stream on demand chunks of voxmap from RAM to GPU global memory, an approach that might degrade the computational performance reported in this study. 5. Conclusions This work presented in this study was focused on the development of a new VPS collision method capable to handle the large datasets that are often involved in virtual bone removal surgical simulators. The effectiveness of the proposed technique is a consequence of several efficient data structures and methods that were included in the framework of the developed algorithm such as: GPU-acceleration, VOI calculation, atomic operations as well as grid computations that avoided the need for grid transversals. The several tests performed suggested that the proposed method is effective both in terms of running time and memory usage, such that it exhibits performance characteristics that are comparable if not better than prior approaches that were developed for similar purposes. To the best of authors’ knowledge, the current study represents one of the first attempts demonstrating the effectiveness of platform-independent GPU-based parallelization in the context of real-time material removal and/or collision detection required by the large geometric datasets that are often involved in orthopaedic surgery simulations. Future work will attempt to further enhance the performance of the developed technique, possibly through more accurate VOI estimations, real-time updates of bone OBB as well as continuous collision detection techniques that are more suitable for deep penetration scenarios. Evidently, one important extension direction of the current work will be constituted by the coupling of the simulator with experimental glenoid removal trials performed on cadaveric specimens, a step that is a mandatory to realistically model cutting forces in glenoid reaming. The algorithm and the force model can then be utilized for high-fidelity haptic simulation of the operation. One important question that would be answered with haptic implementation would be finding the best way to account for tool rotational speed during collision detection and realistically reflect that in the haptic feedback. As final notes, it is anticipated that the advantages of the proposed technique will increase the level of realism associated with the future generations of surgical training simulators and it will likely also assist other applications in which collision detection between complex geometries continues to remain the main obstacle hindering their computational performance. Declaration of Competing Interest The authors declare no conflict of interest. Acknowledgments The authors would like to acknowledge the financial support provided in part by Natural Sciences and Engineering Research Council (NSERC) of Canada and Canadian Institutes of Health Research (CIHR) that was received under the framework of the Collaborative Health Research Projects (CHRP) program. References [1] H. Medellín-Castillo, E. Govea-Valladares, C. Pérez-Guerrero, J. Gil-Valladares, T. Lim, J.M. Ritchie, The evaluation of a novel haptic-enabled virtual reality approach for computer-aided cephalometry, Comput. Methods Programs Biomed. 130 (2016) 46–53.

9

[2] M.S. Yin, P. Haddawy, S. Suebnukarn, P. Rhienmora, Automated outcome scoring in a virtual reality simulator for endodontic surgery, Comput. Methods Programs Biomed. 153 (2018) 53–59. [3] C. Monserrat, U. Meier, M. Alcaniz, F. Chinesta, M.C. Juan, A new approach for the real-time simulation of tissue deformations in surgery simulation, Comput. Methods Programs Biomed. 64 (2001) 77–85. [4] R. Olszewski, M.B. Villamil, D.G. Trevisan, L.P. Nedel, C.M. Freitas, H. Reychler, B. Macq, Towards an integrated system for planning and assisting maxillofacial orthognathic surgery, Comput. Methods Programs Biomed. 91 (2008) 13– 21. [5] P. Wang, A.A. Becker, I.A. Jones, A. Glover, S. Benford, C. Greenhalgh, M. Vloeberghs, A virtual reality surgery simulation of cutting and retraction in neurosurgery with force-feedback, Comput. Methods Programs Biomed. 84 (2006) 11–18. [6] J. Wang, S. Liao, X. Zhu, Y. Wang, C. Ling, X. Ding, Y. Fang, X. Zhang, Real time 3D simulation for nose surgery and automatic individual prosthesis design, Comput. Methods Programs Biomed. 104 (2011) 472–479. [7] J.-R. Wu, M.-L. Wang, K.-C. Liu, M.-H. Hu, P.-Y. Lee, Real-time advanced spinal surgery via visible patient model and augmented reality system, Comput. Methods Programs Biomed. 113 (2014) 869–881. [8] T.-Y. Fang, P.-C. Wang, C.-H. Liu, M.-C. Su, S.-C. Yeh, Evaluation of a haptics-based virtual reality temporal bone simulator for anatomy and surgery training, Comput. Methods Programs Biomed. 113 (2014) 674–681. [9] D.J. Scott, Patient safety, competency, and the future of surgical simulation, Simul. Healthc. 1 (2006) 164–170. [10] G.W. Thomas, B.D. Johns, J.L. Marsh, D.D. Anderson, A review of the role of simulation in developing and assessing orthopaedic surgical skills, Iowa Orthop. J. 34 (2014) 181. [11] W.A. McNeely, K.D. Puterbaugh, J.J. Troy, Six degree-of-freedom haptic rendering using voxel sampling, in: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co, 1999, pp. 401–408. [12] F. Zheng, W.F. Lu, Y. San Wong, K.W.C. Foong, Graphic processing units (GPUs)-based haptic simulator for dental implant surgery, J. Comput. Inf. Sci. Eng. 13 (2013) 041005. [13] M. Marco, M. Rodríguez-Millán, C. Santiuste, E. Giner, M.H. Miguélez, A review on recent advances in numerical modelling of bone cutting, J. Mech. Behav. Biomed. Mater. 44 (2015) 179–201. [14] B. Siciliano, O. Khatib, Springer Handbook of Robotics, Springer Science & Business Media, 2008. [15] G. Pratx, L. Xing, GPU computing in medical physics: a review, Med. Phys. 38 (2011) 2685–2697. [16] M. Renz, C. Preusche, M. Pötke, H.-P. Kriegel, G. Hirzinger, Stable haptic interaction with virtual environments using an adapted voxmap-pointshell algorithm, in: In Proc. Eurohaptics, Citeseer, 2001. [17] W.A. McNeely, K.D. Puterbaugh, J.J. Troy, Voxel-based 6-dof haptic rendering improvements, (2006). [18] M. Sagardia, T. Hulin, C. Preusche, G. Hirzinger, Improvements of the Voxmap– Pointshell Algorithm-Fast Generation of Haptic Data-Structures, 53rd IWK-Internationales Wissenschaftliches Kolloquium, Ilmenau, Germany, 2008. [19] H. Xu, J. Barbicˇ , Adaptive 6-dof haptic contact stiffness using the gauss map, IEEE Trans. Haptics 9 (2016) 323–332. [20] M.-S. Hsieh, M.-D. Tsai, Y.-D. Yeh, An amputation simulator with bone sawing haptic interaction, Biomed. Eng. 18 (2006) 229–236. [21] M.-D. Tsai, M.-S. Hsieh, C.-H. Tsai, Bone drilling haptic interaction for orthopedic surgical simulator, Comput. Biol. Med. 37 (2007) 1709–1718. [22] M.-D. Tsai, M.-S. Hsieh, Accurate visual and haptic burring surgery simulation based on a volumetric model, J. X-Ray Sci. Technol. 18 (2010) 69–85. [23] M. Arbabtafti, M. Moghaddam, A. Nahvi, M. Mahvash, B. Richardson, B. Shirinzadeh, Physics-based haptic simulation of bone machining, IEEE Trans. Haptics 4 (2011) 39–50. [24] M. Vankipuram, K. Kahol, A. McLaren, S. Panchanathan, A virtual reality simulator for orthopedic basic skills: a design and validation study, J. Biomed. Inform. 43 (2010) 661–668. [25] J. Barbicˇ , D.L. James, Six-dof haptic rendering of contact between geometrically complex reduced deformable models, IEEE Trans. Haptics 1 (2008). [26] K. Moustakas, 6DoF haptic rendering using distance maps over implicit representations, Multimed. Tools Appl. 75 (2016) 4543–4557. [27] H. Xu, Y. Zhao, J. Barbicˇ , Implicit multibody penalty-baseddistributed contact, IEEE Trans. Vis. Comput. Graph. 20 (2014) 1266–1279. [28] H. Xu, J. Barbicˇ , 6-DoF haptic rendering using continuous collision detection between points and signed distance fields, IEEE Trans. Haptics 10 (2017) 151–161. [29] B. Heidelberger, M. Teschner, M. Gross, Real-time volumetric intersections of deforming objects, in: Vision, Modeling, and Visualization: Proceedings, AKA, 2003, p. 461. [30] B. Heidelberger, M. Teschner, M. Gross, Detection of collisions and selfcollisions using image-space techniques, (2004). [31] T.D. Tang, Algorithms for collision detection and avoidance for five-axis NC machining: a state of the art review, Comput.-Aided Des. 51 (2014) 1–17. [32] H.T. Yau, L.S. Tsou, Y.C. Tong, Adaptive NC simulation for multi-axis solid machining, Comput.-Aided Des. Appl. 2 (2005) 95–104. [33] F. Zheng, W.F. Lu, Y. San Wong, K.W.C. Foong, An analytical drilling force model and GPU-accelerated haptics-based simulation framework of the pilot drilling procedure for micro-implants surgery training, Comput. Methods Programs Biomed. 108 (2012) 1170–1184.

10

M. Faieghi, O.R. Tutunea-Fatan and R. Eagleson / Computer Methods and Programs in Biomedicine 188 (2020) 105263

[34] D. Jang, K. Kim, J. Jung, Voxel-based virtual multi-axis machining, Int. J. Adv. Manuf. Technol. 16 (20 0 0) 709–713. [35] J. Joy, H.-Y. Feng, Frame-sliced voxel representation: An accurate and memory– efficient modeling method for workpiece geometry in machining simulation, Comput.-Aided Des. 88 (2017) 1–13. [36] R. Lynn, D. Contis, M. Hossain, N. Huang, T. Tucker, T. Kurfess, Voxel model surface offsetting for computer-aided manufacturing using virtualized high-performance computing, J. Manuf. Syst. 43 (2017) 296–304. [37] D. Konobrytskyi, T. Kurfess, J. Tarbutton, T. Tucker, GPGPU accelerated 3-axis CNC machining simulation, (2013) V0 01T0 01A025. [38] F. Abecassis, S. Lavernhe, C. Tournier, P.A. Boucard, Performance evaluation of CUDA programming for 5-axis machining multi-scale simulation, Comput. Ind. 71 (2015) 1–9. [39] X. Shao, W. Xu, L. Lin, F. Zhang, A multi-GPU accelerated virtual-reality interaction simulation framework, PLoS One 14 (2019) e0214852-e0214852. [40] M. Faieghi, N.K. Knowles, O.R. Tutunea-Fatan, L.M. Ferreira, Fast generation of cartesian meshes from micro-computed tomography data, Comput. Aided Des. Appl. 16 (2019) 161–171. [41] M. Faieghi, O.R. Tutunea-Fatan, R. Eagleson, Fast and cross-vendor OpenCL-based implementation for voxelization of triangular mesh models, Comput. Aided Des. Appl. 15 (2018) 852–862. [42] Y. Fei, B. Wang, J. Chen, Point-tessellated voxelization, in: Proceedings of Graphics Interface 2012, Canadian Information Processing Society, 2012, pp. 9–18. [43] J. Pantaleoni, VoxelPipe: a programmable pipeline for 3D voxelization, in: Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics, ACM, 2011, pp. 99–106.

[44] W. Li, S. McMains, A GPU-based voxelization approach to 3D Minkowski sum computation, in: Proceedings of the 14th ACM Symposium on Solid and Physical Modeling, ACM, 2010, pp. 31–40. [45] M. Schwarz, H.-P. Seidel, Fast parallel surface and solid voxelization on GPUs, ACM Trans. Graph. 29 (2010) 179. [46] G.M. Morton, A computer oriented geodetic data base and a new technique in file sequencing, (1966). [47] J. Baert, A. Lagae, P. Dutré, in: Out-of-Core Construction of Sparse Voxel Octrees, Computer Graphics Forum, Wiley Online Library, 2014, pp. 220–227. [48] C. Ericson, Real-Time Collision Detection, CRC Press, 2004. [49] S. Gottschalk, A hierarchical structure for rapid interference detection, in: Proc. of ACM Siggraph, 1996. [50] A. Pourkand, N. Zamani, D. Grow, Mechanical Model of Orthopaedic Drilling for Augmented-Haptics-Based Training, Computers in biology and medicine, 2017. [51] A. Danda, M.A. Kuttolamadom, B.L. Tai, A mechanistic force model for simulating haptics of hand-held bone burring operations, Med. Eng. Phys. 49 (2017) 7–13. [52] J. Sui, N. Sugita, K. Ishii, K. Harada, M. Mitsuishi, Mechanistic modeling of bone-drilling process with experimental validation, J. Mater. Process. Technol. 214 (2014) 1018–1026. [53] J. Lee, B.A. Gozen, O.B. Ozdoganlar, Modeling and experimentation of bone drilling forces, J. Biomech. 45 (2012) 1076–1083. [54] G.M. Gartsman, T.B. Edwards, Shoulder Arthroplasty E-Book, Elsevier Health Sciences2008. [55] N.M. Josuttis, The C++ Standard Library: a Tutorial and Reference, Addison-Wesley, 2012.