Protuberance of Depth: Detecting interest points from a depth image

Yuseok Ban, Sangyoun Lee

Computer Vision and Image Understanding (2020). doi: https://doi.org/10.1016/j.cviu.2020.102927
Received 26 March 2019; revised 29 January 2020; accepted 3 February 2020.


Graphical Abstract

[Figure: input data as depth images are converted into Protuberance of Depth (PoD) levels (low to high), from which the interest point detection results are obtained.]
Research Highlights

• We propose an interest point detection method for depth images based on a local feature of a depth region, namely Protuberance of Depth (PoD).
• PoD consistently extracts depth protuberance in the presence of isometric deformation and varying orientation of a depth region.
• The proposed method effectively detects distinctive interest points with high repeatability and shows good tolerance to the rotation of a depth region.
• The method is efficient enough for real-time applications.

Computer Vision and Image Understanding
journal homepage: www.elsevier.com

Protuberance of Depth: Detecting interest points from a depth image

Yuseok Ban^a, Sangyoun Lee^b,**

^a Agency for Defense Development, Republic of Korea
^b School of Electrical & Electronic Engineering, Yonsei University, Seoul 120-749, Republic of Korea

** Corresponding author: tel.: +82-2-2123-5768; fax: +82-2-313-2879; e-mail: [email protected] (Sangyoun Lee)

ABSTRACT

Detecting distinctive interest points in a scene or an object allows us to estimate in advance which details a human would find interesting for understanding the scene or the object. It also forms an important basis for a variety of subsequent tasks related to visual detection and tracking. In this paper, we propose a simple but effective approach to extract a feature from a depth image, namely Protuberance of Depth (PoD). The proposed approach semantically explores the inherent feature representing three-dimensional protuberance by using depth, which contains only two-dimensional distance information. Our approach directly allows detecting consistent interest points in a depth image. The experimental results show that our method is effective against the isometric deformation and rotation of a depth region and is applicable to real-time applications.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction


Detecting repeatable and distinctive interest points has recently been an intensive research topic. Interest points provide a compact description of an object or a scene and thus a preliminary solution for understanding it. They form the basis of a variety of detection- and tracking-related tasks such as stereo matching, visual odometry, and 3D object recognition. These tasks share the initial requirement of fast and precise visual tracking based on the locations of interest points, which can be detected by describing local portions of an object or a scene. Interest point detection methods are divided into two categories, i.e., fixed-scale detection and adaptive-scale detection. Both categories share the selection of interest points as local extrema of a saliency measurement (refer to Tombari et al.'s work (Tombari et al., 2013)). To begin with, fixed-scale detection provides distinctive interest points at a specific constant scale given as a preliminary parameter to the algorithm. Fixed-scale detection can be roughly abstracted into two steps. The first step prunes the input data by thresholding a data quality measure at each point, mainly to reduce the number of points under consideration. The second step is Non-Maxima Suppression (NMS) using a conspicuity measurement calculated at each point.

On the other hand, adaptive-scale detection takes into account both the location and scale of an interest point and can be abstracted into four steps. The first step builds a scale space for a given range image, where an embedding of the data onto a two-dimensional plane may be adopted. The second step selects a characteristic scale at each point, generally based on the saliency measure. The third step is Non-Maxima Suppression (NMS) of the conspicuity at the characteristic scale of each point. The fourth step is optional pruning, in which additional points can be dismissed based on predefined constraints.

By virtue of the development of efficient three-dimensional sensing technology, studies examining how to incorporate 3D information into interest point detection have recently been drawing a significant amount of research attention in computer vision. These studies explore 3D information, which is categorized into depth, point cloud, and mesh (as shown in Figure 1). In particular, depth information has advantages such as ease of acquisition, low computational cost, robustness to illumination, and unambiguous encoding of surface metric dimensions, i.e., pure geometry and shape cues (Guo et al., 2014; Cheng et al., 2015). Depth provides important information on behaviorally relevant objects in a visual field (Jansen et al., 2009), and it can be used as relevant content information to indicate the degree of conspicuity of an interest point as well as to estimate the spatial structures of objects (Yu et al., 2018). However, depth contains only limited information, namely the two-dimensional distance from a single viewpoint. Furthermore, most previous approaches have focused on point cloud and mesh, while depth has been treated either as prior information to preliminarily assist existing methods or as a cue handled as an extension of color, ignoring the different physical characteristics of this type of information (Yu et al., 2018; Zhang and Tian, 2015). These approaches also have the limitation that they only take depth data into account without fully exploiting the structural information of a depth region. Relatively few studies have delved deeply into extracting the salient features of depth perception based on depth itself.

The main issues in detecting interest points include the isometric deformation, rotation, and scale variation of a depth region, which make it difficult to extract consistent interest points. To address these issues, the performance of the feature extraction used for the conspicuity measurement at each point is crucial in order to selectively search out meaningful interest points from a depth image. In this paper, a novel approach to extract the feature of a depth region is proposed. Protuberance of Depth (PoD) is our proposed method, which facilitates interest point detection in a depth image by describing how visually conspicuous a depth region is. The challenge in calculating protuberance in a depth image is that we must mimic the protuberance of actual 3D data by leveraging only grayscale depth values carrying two-dimensional distance information from a sensor. To address this challenge and the abovementioned issues, PoD effectively indicates how much a depth region stands out from its surroundings by mimicking the three-dimensional interpretation. As such, PoD provides a simple but powerful method to find interest points in a depth image, which is also efficient for real-time applications.

The organization of the rest of this paper is as follows: Section 2 reviews related work on three-dimensional interest point detection. Section 3 describes the proposed method in detail. Section 4 presents the comparative experiments and the discussion. Finally, our conclusion is given in Section 5.

Fig. 1. Categories of 3D information, (left) depth, (middle) point cloud, and (right) mesh data.

2. Background

Two recent reviews on three-dimensional interest point detection provide elaborate guidelines for the state of the art and the performance evaluation criteria (Tombari et al., 2013; Dutagaci et al., 2012). Lee et al. (MS) (Lee et al., 2005) introduced mesh saliency as a scale-dependent measure of regional importance for three-dimensional data, which uses center-surround filters with Gaussian-weighted curvatures. They designed an intuitive method of extracting a feature to identify regions that are distinguishable from their surrounding context, and showed that it captures interesting features at all perceptually important scales. Sun et al. (HKS) (Sun et al., 2009) proposed the Heat Kernel Signature, which uses the properties of the heat diffusion process on a shape. Their method organizes the geometric information of a shape in a deformation-invariant way, capturing the rich information contained in the heat kernel while preserving the intrinsic geometry of a shape under perturbation. The authors presented results on a non-rigid task of repeatedly detecting interest points on perturbed shapes. Godil et al. (3D SIFT) (Godil and Wagan, 2011) suggested a formulation of a three-dimensional local conspicuity feature on a voxel grid inspired by the Scale Invariant Feature Transform (SIFT). Their local feature has been used to detect the salient interest points of rigid models as well as articulated and deformable models. Guo et al. (COV) used local shape descriptors based on covariance matrices and employed Principal Component Analysis (PCA) in a feature space constructed by the sigma-point technique (Guo et al., 2018). Covariance matrices naturally model the nonlinear correlation of different low-level, compact, rotation-invariant features of local geometric information, and their method showed the advantage of being structure sensitive. Steder et al. (NARF) suggested the Normal Aligned Radial Feature (NARF) to extract the interest points of an object (Steder et al., 2011). Their method works effectively on a low-resolution depth image and produces interest points having a stable normal and a significant change in depth (Kiforenko et al., 2018). Rosten et al. (FAST) (Rosten and Drummond, 2006) proved that machine learning can be used to derive a feature detector that operates at real-time frame rates. This feature detector, namely FAST, has been widely used in various later applications. Rosten et al. (FAST-ER) (Rosten et al., 2008) optimized the FAST detector directly to improve its repeatability, creating the FAST-ER detector. They obtained not only considerable efficiency but also good consistency under variation in corner density.

All of the comparing methods have strength in generally indicating the structural information of a perturbed shape. MS and HKS operate robustly across multiple scales, while 3D SIFT, COV, NARF, FAST, and FAST-ER are vulnerable to scale variation. MS and COV are relatively strong against the rotation of an object, while 3D SIFT, FAST, and FAST-ER show limitations against rotation. Moreover, MS, 3D SIFT, and COV are not efficient enough to be applied to real-time applications. On the other hand, FAST and FAST-ER can be computed efficiently.

3. Protuberance of Depth (PoD)

The definition of protuberance can be explained by protuberant objects with protuberant shapes (e.g., the fingers of a hand, the legs of a chair, or the horn of a bull) or by objects containing protuberant parts (e.g., edges and corners). A protuberant region varies with its isometric deformation and orientation, resulting in inconsistently extracted features in the region. It is even more difficult to extract protuberance from a low-resolution depth image. The purpose of this method is to consistently extract the protuberance regardless of these factors. To do so, a novel way of leveraging feature responses to calculate the protuberance of a depth region, namely Protuberance of Depth (PoD), is introduced. In order to express a measure of the spatial importance of a region within an object, PoD is extracted as a rotation-invariant local feature (see the illustrations in Figure 2).

Fig. 2. Representative illustrations of selecting the dropping points around a center point with (a-b) high protuberance and (c-d) low protuberance.

Firstly, a center point (x_c, y_c) is located in a depth image, and N equally spaced rays are set around the center point to search for a dropping point along each ray (we assume N = 8 by default). The first point along the i-th ray that has a larger depth value than the depth value d_c of the center point is selected as the dropping point (x_i, y_i). (See Figure 3 for an illustration of a center point and its dropping points, and Figure 4 for examples in a depth image.)

Secondly, each dropping distance from the center point to a dropping point is calculated, and the inverted magnitude of the scalar summation, P_{c,s}, is obtained by accumulating all the dropping distances as in Equation 1. (See Figure 5 for the illustration and Figure 6 for the examples.)

P_{c,s} = 1 - \sum_{i=1}^{N} \frac{s_i}{\eta_s}    (1)

s_i = \left\| (x_i - x_c, \, y_i - y_c) \right\|    (2)

\eta_s = w + h + 2\sqrt{w^2 + h^2}    (3)

(s_i is the i-th dropping distance. η_s is a normalizer ensuring that P_{c,s} falls within the range [0, 1], and the width and height of the image are denoted by w and h, respectively.)

Thirdly, P_{c,v}, the magnitude of the vector summation, is obtained by adding up all the dropping vectors as in Equation 4. The vector summation captures the destructive interference between vectors of opposite directions. (See Figure 5 for the illustration and Figure 6 for the examples.)

P_{c,v} = \frac{1}{\eta_v} \left\| \sum_{i=1}^{N} \vec{v}_i \right\|    (4)

\vec{v}_i = (x_i - x_c, \, y_i - y_c)    (5)

\eta_v = \sqrt{w^2 + h^2}    (6)

(\vec{v}_i is the i-th dropping vector. Likewise, η_v is a normalizer ensuring that P_{c,v} falls within the range [0, 1], with w and h the width and height of the image.)

Finally, the protuberance value at the center point, P_c, is produced by the addition of P_{c,s} and P_{c,v} as in Equation 7.

P_c = P_{c,s} + P_{c,v}    (7)

The calculation of PoD is guided by two intuitions. The first intuition is that, for the same magnitude of the vector summation, the smaller the magnitude of the scalar summation, the larger the protuberance. The second intuition is that, for the same magnitude of the scalar summation, the larger the magnitude of the vector summation, the larger the protuberance. The two intuitions mutually provide confidence to each other in indicating the degree of protuberance, which describes how much a depth region stands out from its immediate surroundings. Besides, the degree of protuberance does not depend on the direction of the uplift of a depth region because of the rotation invariance of the non-orientable summations. Eventually, the protuberance values at all pixels are collected into a resulting image, a two-dimensional topographic representation of visual conspicuity per pixel. As a result, regions on smooth or almost planar sections, which draw little interest, receive small PoD values, while a protuberant bump is highlighted by a large PoD value.
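To make the computation above concrete, here is a minimal Python/NumPy sketch of the per-pixel PoD computation under the default N = 8 rays. The function and variable names are ours, and the handling of a ray that reaches the image border without finding a deeper point is our assumption (the text above leaves this case unspecified); this is an illustration, not the authors' released implementation.

```python
import numpy as np

# The N = 8 equally spaced ray directions: 4 axis-aligned and 4 diagonal steps.
RAYS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def pod_at(depth, xc, yc):
    """PoD level P_c at the center point (xc, yc), following Equations 1-7."""
    h, w = depth.shape
    dc = depth[yc, xc]                      # depth value d_c of the center point
    eta_s = w + h + 2.0 * np.hypot(w, h)    # normalizer of Eq. 3
    eta_v = np.hypot(w, h)                  # normalizer of Eq. 6

    sum_s, sum_v = 0.0, np.zeros(2)
    for dx, dy in RAYS:
        px, py = xc, yc                     # last in-bounds point on the ray
        x, y = xc + dx, yc + dy
        # Walk along the ray to the first point deeper than the center.
        while 0 <= x < w and 0 <= y < h and depth[y, x] <= dc:
            px, py = x, y
            x, y = x + dx, y + dy
        if 0 <= x < w and 0 <= y < h:
            xi, yi = x, y                   # dropping point (x_i, y_i)
        else:
            xi, yi = px, py                 # no deeper point before the border:
                                            # fall back to the last point (our choice)
        vi = np.array([xi - xc, yi - yc], dtype=float)
        sum_s += np.linalg.norm(vi)         # dropping distance s_i (Eq. 2)
        sum_v += vi                         # dropping vector v_i (Eq. 5)

    p_s = 1.0 - sum_s / eta_s               # inverted scalar summation (Eq. 1)
    p_v = np.linalg.norm(sum_v) / eta_v     # vector summation magnitude (Eq. 4)
    return p_s + p_v                        # P_c = P_{c,s} + P_{c,v} (Eq. 7)
```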

π’™πŸ , π’šπŸ

π’™πŸ– , π’š πŸ–

π’™πŸ , π’šπŸ

𝒙𝒄 , π’šπ’„

π’™πŸ• , π’šπŸ• π’™πŸ” , π’šπŸ”

π’™πŸ‘ , π’š πŸ‘

π’™πŸ’ , π’šπŸ’ π’™πŸ“ , π’šπŸ“

Fig. 3. Illustration of (red) center point and (green) dropping points of PoD.

Fig. 4. Examples of the center point and the dropping points for PoD.

Fig. 5. Illustrations of (left) the scalar summation and (right) the vector summation for PoD.

Fig. 6. Examples of (left) the dropping points, (middle) the scalar summation, and (right) the vector summation for PoD.

Fig. 7. (a) An input depth image, (b) the PoD result, and (c) the corresponding interest point detection result, where red and black colors indicate large and small PoD levels, respectively.

PoD can be directly used for finding interest points in a depth image, as it determines the visual attention of a local area and helps in interpreting the spatial depth region. As shown in Figure 7, high levels of PoD are computed on conspicuous depth regions, indicating them as protuberant regions. The points in a depth image with richly informative content are identified as interest points, whose quality can be defined in terms of repeatability and informativeness (Guo et al., 2014). Since the richness of discriminative information at interest points is important for later use, it is necessary to detect interest points according to their distinctiveness (Mian et al., 2010). As such, the locations of interest points should be found with respect to the levels of PoD in a depth region, and they can be obtained as local maxima among the levels of PoD. Local maxima are distinctive on the graph of a function and are therefore essential in understanding the shape of the graph (Sun et al., 2009). A scalar-valued function f has a local maximum at x_0 if there exists some positive number r > 0, thought of as a radius, such that statement 8 is true:

f(x) \le f(x_0) \quad \text{for all } x \text{ such that } \| x - x_0 \| < r    (8)

Therefore, an efficient implementation using a searching window of size 2r has been adopted to find local maxima. A local maximum is found whenever the point having the highest level in a searching window corresponds to the center of the searching window (see Figure 8). Accordingly, the size of the searching window controls a trade-off between robustness to noise and the sparsity of detected interest points. Still, for any fixed searching window size, it enables evaluating the repeatability of interest points detected via PoD under varying conditions of an object or a scene. As presented in Figure 7, the PoD levels provide a set of local maxima (x^*, y^*) as in Equation 9. (Note that P(x, y) is the PoD level at position (x, y), R_i is the i-th searching window with a local maximum at its center, and M is the number of detected interest points.)

(x^*, y^*) = \begin{pmatrix} (x_1^*, y_1^*) \\ (x_2^*, y_2^*) \\ \vdots \\ (x_M^*, y_M^*) \end{pmatrix} = \begin{pmatrix} \arg\max_{(x,y) \in R_1} P(x, y) \\ \arg\max_{(x,y) \in R_2} P(x, y) \\ \vdots \\ \arg\max_{(x,y) \in R_M} P(x, y) \end{pmatrix}    (9)

In summary, we provide pseudo code for detecting interest points based on PoD from a depth image in Table 1.
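The window test of Equations 8 and 9 also admits a direct sketch. The name `local_maxima` is illustrative, and the tie-breaking rule (the center must be the first maximum in scan order) is our simplification:

```python
import numpy as np

def local_maxima(P, r):
    """Local maxima of a PoD map P with a (2r+1) x (2r+1) searching window.

    A pixel is kept iff the argmax of its window falls at the window center,
    i.e. it holds the highest PoD level among its neighbors (Eqs. 8-9).
    """
    h, w = P.shape
    center = r * (2 * r + 1) + r              # flat index of the window center
    maxima = []
    for n in range(r, h - r):
        for m in range(r, w - r):
            window = P[n - r:n + r + 1, m - r:m + r + 1]
            if np.argmax(window) == center:   # center is the (first) maximum
                maxima.append((m, n))         # record (x*, y*)
    return maxima
```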


Fig. 8. Illustration of finding local maxima among PoD levels, simplified in one dimension.

Table 1. The pseudo code for detecting interest points based on PoD.

Pseudo Code:
Input: Depth image I, Width of image w, Height of image h
Output: Protuberance image P, Positions of interest points (x*, y*)
Parameter: Half length of searching window r

/* Calculate PoD at every pixel */
for m = 1 : w do
  for n = 1 : h do
    1: Consider the point at (m, n) as center point (x_c, y_c) with depth level I(m, n)
    2: Find dropping points (x_i, y_i) along each of the equally spaced rays around (x_c, y_c)
    3: Calculate the inverted magnitude of the scalar summation P_{c,s}
    4: Calculate the magnitude of the vector summation P_{c,v}
    5: Add up the two magnitudes to produce the PoD level, P(m, n) = P_c = P_{c,s} + P_{c,v}
  end for
end for

/* Find local maxima among PoD levels */
for m = 1 + r : w - r do
  for n = 1 + r : h - r do
    1: Set ROI R as (m - r : m + r, n - r : n + r)
    2: Find the location of the maximum level in R, (m*, n*) = arg max_{(m,n) in R} P(m, n)
    if (m*, n*) corresponds to the center of R do
      3: Keep (m*, n*) as a local maximum (x_i*, y_i*)
    end if
  end for
end for
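Combining the two sketches above reproduces the Table 1 procedure end to end. The toy input below is ours, and the detected positions depend on the border-fallback assumption noted earlier:

```python
import numpy as np

def detect_interest_points(depth, r=5):
    """Table 1 pipeline: compute the PoD map, then find its local maxima."""
    h, w = depth.shape
    P = np.empty((h, w))
    for n in range(h):                       # first double loop of Table 1
        for m in range(w):
            P[n, m] = pod_at(depth, m, n)
    return P, local_maxima(P, r)             # second double loop of Table 1

# Toy usage: a flat scene with one protuberant bump (smaller depth = closer).
yy, xx = np.mgrid[:64, :64]
depth = 100.0 - 30.0 * np.exp(-((xx - 32.0)**2 + (yy - 32.0)**2) / 50.0)
P, points = detect_interest_points(depth, r=5)
print(points)   # detections cluster around the bump; exact positions depend
                # on the border fallback, which the paper leaves unspecified
```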

4. Experiment and Discussion

4.1. Experimental Setup

4.1.1. Dataset
Three datasets have been used. They comprise three-dimensional objects and scenes of different shapes. Experiments with varying rotation angles and scales as well as horizontal flips are extensively carried out: rotation is sampled in increments of 15°, and scales of 1, 2, 4, and 8 are used to address the rotation and scale issues, respectively. The 3D data of each dataset has been projected onto a plane to generate the corresponding depth data, so that both the depth image and the point cloud of a single view can be used (a minimal sketch of such a projection follows the dataset list below).

• Benchmark-SHREC dataset (Dutagaci et al., 2012): This dataset is a set of 3D models chosen from the SHREC2007 dataset (Giorgi et al., 2007) and The Stanford 3D Scanning Repository (Stanford Computer Graphics Laboratory). It contains 3D models from the two datasets that are widely used for the shape representation of three-dimensional objects, such as the Utah teapot and David's head. (The number of data: 43 objects × 13 angles × 4 scales × 2 flips)

• Stanford-Random Views dataset (Tombari et al., 2013): This dataset is based on the three-dimensional models in The Stanford 3D Scanning Repository (Stanford Computer Graphics Laboratory), which include the bunny and buddha models, among others. The first set of models in the repository was scanned with a swept-stripe laser triangulation range scanner (Cyberware 3030 MS scanner or Stanford Large Statue Scanner), while the second set was acquired at an XY scan resolution of 100 microns using the XYZ RGB auto-synchronized camera, based on technology developed in the Visual Information Technology group of the Canadian National Research Council (NRC). The models were aligned with a modified Iterative Closest Point (ICP) algorithm, surfaces were reconstructed using zippering or volumetric merging, and holes were filled using a diffusion-based hole filler. (The number of data: 6 objects × 13 angles × 4 scales × 2 flips)

• ScanNet dataset (Dai et al., 2017a): This dataset contains richly annotated RGB-D video data of real-world environments captured from 707 distinct spaces using an easy-to-use, portable, low-cost RGB-D capture system. 1,513 scans have been performed to acquire 1,513 image sequences of synchronized depth and color images captured at 30 Hz, comprising 4,242,051 depth images in total. The video data have been downsampled by leaving an interval of 500 frames between two sampled frames to avoid redundancy between consecutive, similar views. Dense reconstruction combined BundleFusion (Dai et al., 2017b) for pose alignment, VoxelHashing (Nießner et al., 2013) for volumetric integration, and the Marching Cubes algorithm for extracting a high-resolution surface mesh. Missing values (depth holes) have been filled by an image inpainting technique. (The number of data: 8,250 images × 13 angles × 4 scales × 2 flips)
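As a companion to the projection step mentioned above, the following is a minimal sketch of projecting a point cloud orthographically onto a depth image with a z-buffer. The function name, the orthographic camera, and the constant far-value hole filling are our simplifications (the paper fills holes by image inpainting):

```python
import numpy as np

def project_to_depth(points, width, height):
    """Orthographic projection of an (N, 3) point cloud to a depth image.

    x and y are scaled into pixel coordinates and z (distance along the
    viewing axis) becomes the depth value; the nearest point wins per pixel.
    """
    depth = np.full((height, width), np.inf)
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    scale = (np.array([width, height]) - 1) / (hi - lo)  # fit cloud to image
    cols, rows = ((xy - lo) * scale).astype(int).T
    for c, r, z in zip(cols, rows, points[:, 2]):
        depth[r, c] = min(depth[r, c], z)    # z-buffer: keep the closest point
    far = points[:, 2].max()
    depth[np.isinf(depth)] = far             # crude hole filling (the paper
    return depth                             # uses inpainting instead)
```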

Table 2. Dataset specification.

Dataset   | Benchmark-SHREC | Stanford-Random Views | ScanNet
# of data | 4,472           | 624                   | 858,000
Size      | 640 by 640      | 1,280 by 1,280        | 640 by 480
Type      | Mesh to depth   | Point cloud to depth  | Point cloud to depth

4.1.2. Comparison
MS (Lee et al., 2005), HKS (Sun et al., 2009), 3D SIFT (Godil and Wagan, 2011), COV (Guo et al., 2018), NARF (Steder et al., 2011), FAST (Rosten and Drummond, 2006), and FAST-ER (Rosten et al., 2008) are considered as the detectors for the comparative experiments. MS, HKS, 3D SIFT, NARF, FAST, and FAST-ER are based on the authors' open code with default parameter settings, while COV is reproduced based on the details of its paper.

4.1.3. Evaluation Metrics
The most common trait of the performance of interest point detection is the repeatability of detected points (refer to the details described in (Tombari et al., 2013)). Given a model, this trait accounts for how repeatedly a method can find the same interest points on different instances, typically generated by changing the scale and rotation of a depth region. When an interest point k_h^i is detected from the model M_h, the interest point transformed according to the ground-truth rotation and translation (R_{hl}, t_{hl}) is considered repeatable if the distance to its nearest neighbor k_l^j in the set of interest points detected from the scene S_l is less than a threshold ε, as shown in Equation 10. (ε can be defined as a distance of 2 mesh resolution units (mr) (Johnson and Hebert, 1999).)

\left\| R_{hl} k_h^i + t_{hl} - k_l^j \right\| < \epsilon    (10)

Given the set of repeatable interest points P_{hl}, the relative repeatability r_{rel} is defined as in Equation 11. (The operator |·| counts the number of points in a set, and O_{hl} is the set of all interest points detected from M_h that are not occluded in S_l.)

r_{rel} = \frac{|P_{hl}|}{|O_{hl}|}    (11)

The relative repeatability serves as an important trait indicating the overall ratio of repeatable interest points that the detector is able to provide to a subsequent application.
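A minimal sketch of this repeatability measure, assuming keypoints are given as arrays and that the model keypoints have already been restricted to the non-occluded set O_hl; the names and the brute-force nearest-neighbor search are ours:

```python
import numpy as np

def relative_repeatability(model_kps, scene_kps, R_hl, t_hl, eps):
    """Relative repeatability r_rel = |P_hl| / |O_hl| (Eqs. 10-11).

    model_kps: (M, 3) interest points detected on the model M_h (assumed
               already restricted to those not occluded in the scene, O_hl).
    scene_kps: (S, 3) interest points detected on the scene S_l.
    A model point is repeatable if its transformed position lies within
    eps of its nearest neighbor among the scene points (Eq. 10).
    """
    transformed = model_kps @ R_hl.T + t_hl           # R_hl k_h^i + t_hl
    # Distance from each transformed model point to every scene point.
    d = np.linalg.norm(transformed[:, None, :] - scene_kps[None, :, :], axis=2)
    repeatable = (d.min(axis=1) < eps).sum()          # |P_hl|
    return repeatable / len(model_kps)                # r_rel = |P_hl| / |O_hl|
```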

4.1.4. Hardware Configuration
Experiments have been carried out on a desktop computer with an Intel(R) Core(TM) i7 CPU @ 3.20 GHz and 18.0 GB of RAM.

4.2. Result and Discussion

4.2.1. The Rotation of an Object
Firstly, Figure 9 visually compares the interest point detection results of the eight methods (MS, HKS, 3D SIFT, COV, NARF, FAST, FAST-ER, and PoD) on the Benchmark-SHREC dataset under the challenge of object rotation. An important note for this experiment is that a powerful detector finds spatially identical interest points in the presence of variations of an object; a high ratio of repeatable interest points to the total number of interest points yields high relative repeatability. As can be seen, although the detection result of MS fairly covers all the corners and convex edges, unnecessary interest points are included as well (e.g., interest points on the planar region of an object, as shown in Figure 9(a)). The result of HKS is rather biased toward corners, i.e., extremely convex regions, as shown in the second row, and its performance is also highly influenced by the rotation of an object. 3D SIFT frequently misses interest points on corners, though it reliably finds interest points on convex edges (as shown in the third row). COV effectively detects convex regions by leveraging low-level, rotation-invariant features. NARF was originally designed to make use of the borders of an object and works on regions with a stable surface and sufficient changes in the vicinity, and it detects interest points well on protuberant regions; nevertheless, the repeatability of points detected by NARF is rather negatively affected by a rotating object. FAST and FAST-ER effectively detect interest points on corners, but they show poor repeatability on edges when a depth region rotates. As the authors have already pointed out in (Rosten and Drummond, 2006), the result can be a 1-pixel-wide line at certain angles, when the quantisation of the circle misses the line. On the other hand, PoD better detects consistent interest points on the convex structure of a rotating object (see the last row). These observations are supported by Figure 10, which presents the evaluation of relative repeatability against rotation on Benchmark-SHREC. As the test is based on 2.5D depth images (Werghi et al., 2015), the repeatability has been calculated over rotations from −90° to 90°. Overall, the number of repeatable interest points drops for all methods in the presence of object rotation.

Fig. 9. The results of interest point detection against the challenge of rotation on Benchmark-SHREC, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 10. The relative repeatabilities of interest point detection against the challenge of rotation on Benchmark-SHREC. [Plot: relative repeatability versus rotated angle (−90° to 90°) for MS, HKS, 3D SIFT, COV, NARF, FAST, FAST-ER, and PoD.]

Fig. 11. Three examples of interest point detection against the challenge of rotation on Benchmark-SHREC using PoD, (a) glasses, (b) cup, and (c) table.

Specifically, it is worth observing that PoD predominantly outperforms the other methods in terms of relative repeatability, showing its strength against rotation. PoD detects a geometric set of interest points from a rotated object that is highly consistent with that of the original object. It should be pointed out that this result has been achieved by constructing a rotation-invariant local feature to calculate PoD. Figure 11 demonstrates three examples against the challenge of rotation using our method. Each example shows that PoD extracts consistent protuberance in the presence of a rotating object, i.e., it interprets the inherent meaning of depth.

Secondly, experiments have been carried out using Stanford-Random Views. Figure 12 compares the resilience of the eight methods to the rotation of an object. As can be seen, MS finds many interest points on convex edges and corners, yet dispensable interest points are detected as well. HKS detects interest points on corners relatively well but produces some non-repetitive interest points on convex edges. As demonstrated in Figure 13, 3D SIFT produces few repetitive interest points under rotation. COV detects repetitive interest points on both concave and convex regions while an object rotates. NARF finds meaningful interest points, but the repeatability of its results is low. FAST and FAST-ER detect a substantial number of interest points on distinct edges but miss points on protuberant regions popping out from the surroundings, such as the fingers of the armadillo model. However, PoD detects repetitive interest points better under the condition of a rotating object. Figure 14 demonstrates three examples showing that PoD consistently extracts protuberance in rotating depth regions, such as the fingers, snout, and ears of the armadillo model (see the last row).

Thirdly, Figure 15 compares the experimental results of detecting interest points on the ScanNet dataset. The dataset includes depth images captured from various real-world environments, and multiple objects are present in a scene. COV and NARF effectively detect interest points on convex edges against the rotation of a depth image. Despite its decent robustness to rotation, NARF tends to focus only on distinct edges. Although MS finds a majority of the interest points, unnecessary points are included as well. The result of HKS is biased toward regions with abrupt depth change. 3D SIFT interprets the structure of a depth region well, whereas it shows relatively poor repeatability when the depth region is rotated, as presented in Figure 16.

Fig. 12. The results of interest point detection against the challenge of rotation on Stanford-Random Views dataset, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 13. The relative repeatabilities of interest point detection against the challenge of rotation on Stanford-Random Views dataset. [Plot: relative repeatability versus rotated angle (−90° to 90°) for the eight methods.]

Fig. 14. Three examples of interest point detection against rotation on Stanford-Random Views using PoD, (a) duck, (b) octopus, and (c) armadillo.

FAST and FAST-ER detect interest points only on depth regions with abrupt change, for example, the distinct edge where a far background and a close foreground meet. Many depth images obtained in real-world environments contain smooth edges rather than distinct edges because of the resolution of the capturing sensor. As FAST and FAST-ER were originally proposed for grayscale intensity images, whose brightness values do not express distance, they miss many interest points in a depth image. Accordingly, Figure 16 shows that FAST and FAST-ER are vulnerable to the rotation of a depth region. However, PoD finds interest points useful for interpreting a depth structure, such as the guitar and wall-closet, including not only convex edges and corners but also protuberant depth regions (e.g., the washstand and toilet in Figure 17(a)). Figure 17 demonstrates three examples showing that PoD effectively calculates the degree of protuberance in a depth region. It should be pointed out that the direction of protuberance can be horizontal, in the x-y plane, as well as vertical, along the z axis, i.e., the forward direction from the image plane (further discussed in Section 4.2.3).

The rotation of a depth region is a challenging issue when detecting interest points in a depth image: positionally identical depth regions can appear totally different when rotated. As can be seen in Figure 16, PoD shows better repeatability of interest point detection in the presence of the rotation of a depth region than the comparing methods do. The robustness of PoD against rotation comes from its radial form, which consistently calculates the protuberance of a depth region. As presented in Section 5.2.1 of our supplementary material, Equations 15, 16, and 17 also theoretically verify the robustness of PoD to rotation. In general, the results of an interest point detector that uses anisotropic discrete derivative filters vary with image rotation (Schmid et al., 2000; Grabner et al., 2006). To obtain rotation invariance, many approaches estimate the dominant orientation in a local neighborhood centered at the location of an interest point (Alcantarilla et al., 2012; Darom and Keller, 2012). Transforming the data of a local region from Cartesian to polar coordinates also enables using the radial feature of the local region (Maver, 2010). However, PoD, which is based on flexible dropping points, introduces a more spatially adaptive radial form than methods based on discrete signals on the circumference of a fixed radius.

Fig. 15. The results of interest point detection against the challenge of rotation on ScanNet dataset, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 16. The relative repeatabilities of interest point detection against the challenge of rotation on ScanNet dataset. [Plot: relative repeatability versus rotated angle (−90° to 90°) for the eight methods.]

Fig. 17. Three examples of interest point detection against rotation on ScanNet using PoD, (a) bathroom, (b) kitchen, and (c) bedroom.
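The rotation tolerance discussed above can also be sanity-checked numerically: under a 90° rotation, which is exact on the pixel grid, the PoD map of the rotated image should agree with the rotated PoD map up to border effects. A small sketch of ours, reusing the illustrative `detect_interest_points` from Section 3:

```python
import numpy as np

def rotation_consistency(depth, margin=10):
    """Mean absolute difference between PoD-after-rotation and rotated-PoD.

    np.rot90 needs no resampling, so any interior discrepancy would come
    from the detector itself rather than from interpolation artifacts.
    """
    P, _ = detect_interest_points(depth)
    P_rot, _ = detect_interest_points(np.rot90(depth))
    diff = np.abs(np.rot90(P) - P_rot)
    # Ignore a border band, where ray lengths to the image boundary differ.
    return diff[margin:-margin, margin:-margin].mean()
```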

4.2.2. The Scale Variation of an Object
It is important to look into the scale dependency of a method with regard to the spatial extents of local geometric structures (Novatnack and Nishino, 2007). As can be observed in Figure 18 and verified on the Benchmark-SHREC dataset in Figure 19, MS and HKS are robust to scale variation, maintaining relatively high repeatability across changing scales. The performance of MS results from its design, which operates at multiple scales, and the heat distribution over the feature point neighborhood in HKS is stable towards scale change (Kiforenko et al., 2018). However, the repeatability of 3D SIFT is more strongly affected by the increasing scale of a depth region. COV suffers performance degradation under scale change, as it is based on calculating features at certain scales. NARF is highly influenced by the scale factor and achieves poor repeatability for interest point detection because it is based on the support size parameter, the diameter of the sphere around the interest point (Steder et al., 2011). FAST and FAST-ER are based on a fixed set of offset positions for determining a corner, which makes them vulnerable to scale change. Although PoD also shows dependency on scale, it produces a higher level of repeatability than 3D SIFT, COV, NARF, FAST, and FAST-ER do.

Fig. 18. The results of interest point detection against the challenge of scale on Benchmark-SHREC dataset, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 19. The relative repeatabilities of interest point detection against the challenge of scale on Benchmark-SHREC dataset. [Plot: relative repeatability versus scale (1, 2, 4, 8) for the eight methods.]

Fig. 20. Three examples of interest point detection against the challenge of scale on Benchmark-SHREC using PoD, (a) glasses, (b) cup, and (c) table.

Figure 20 demonstrates three examples against the challenge of changing scale when using our method. Despite the scale dependency discussed above, PoD still extracts meaningful protuberance through the scale variation of an object.

Furthermore, the Stanford-Random Views dataset has been used to compare the repeatabilities of the methods against varying scale. Similarly to the previous experiment, MS and HKS are robust to scale, whereas 3D SIFT, COV, NARF, FAST, and FAST-ER are strongly affected by altering scales. PoD achieves better performance than 3D SIFT, COV, NARF, FAST, and FAST-ER, as shown in Figure 22. For instance, at a scale of 8, PoD produces more consistent interest points than 3D SIFT does, as can be seen in (c) and (h) of the fourth column of Figure 21. Figure 23 demonstrates three further examples of the proposed method under varying scale. Qualitatively, PoD effectively detects repeatable interest points in the presence of the scale variation of an object.

Lastly, ScanNet has been used to test the comparing methods on the scale issue using data taken from real-world environments. MS and HKS show robustness in finding consistent interest points against scale changes. Still, the result of MS includes unnecessary interest points, and a number of important points are missed by HKS.

Fig. 21. The results of interest point detection against the challenge of scale on Stanford-Random Views dataset, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 22. The relative repeatabilities of interest point detection against the challenge of scale on Stanford-Random Views. [Plot: relative repeatability versus scale (1, 2, 4, 8) for the eight methods.]

Fig. 23. Three examples of interest point detection against the challenge of scale on Stanford-Random Views using PoD, (a) duck, (b) octopus, and (c) armadillo.

3D SIFT, COV, and NARF are negatively influenced by scale changes. Also, FAST and FAST-ER show weakness against scale variation, as they rely on a segment test with a fixed radius. PoD also shows a dependency on scale, which is further discussed in Section 5.2.2.

Overall, MS was originally designed to operate at multiple scales and thus shows robustness against varying scales, as described in (Lee et al., 2005). However, Lee et al. admitted that MS takes a long time to compute and left as future work speeding it up with a multiresolution hierarchy that accelerates filtering at coarser scales (Lee et al., 2005). Also, HKS was proposed as a computable method to encode the shape information around the neighborhood of a given point by recording heat diffusion over time (Sun et al., 2009). Accordingly, MS and HKS show strength in finding what appears interesting in an object under scale variation. On the other hand, the scale of 3D SIFT is represented by a convolution with a three-dimensional Gaussian filter whose size grows cubically (Godil and Wagan, 2011). This correspondingly brings about the encoding of non-invariant information related to the voxels and results in weakness toward the changing scale of an object. COV calculates covariance descriptors at certain scales, resulting in its scale dependency. Because the scale parameter of NARF defines the diameter of the sphere around an interest point, the results of NARF depend on scale. FAST and FAST-ER are based on a segment test over a neighborhood of fixed radius, so they are vulnerable to scale variation. PoD is negatively affected by scale change, though it fairly consistently interprets protuberance in the presence of scale variation (see Figures 20, 23, and 26).

Fig. 24. The results of interest point detection against the challenge of scale on ScanNet dataset, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 25. The relative repeatabilities of interest point detection against the challenge of scale on ScanNet. [Plot: relative repeatability versus scale (1, 2, 4, 8) for the eight methods.]

Fig. 26. Three examples of interest point detection against the challenge of scale on ScanNet using PoD, (a) bathroom, (b) kitchen, and (c) bedroom.

4.2.3. Interpreting Depth Structure
As shown in Figure 27, PoD detects interest points that help in understanding the structure of a depth region by leveraging the quantitative interpretation of depth as a three-dimensional distance value perpendicular to the image plane. The detection result of PoD covers most of the representative areas, including the headstock, neck, and body of the guitar presented in the first column. It also fairly detects interest points on smoothly protuberant edges, such as that of the wall-closet presented in the second column. As shown in Figure 28, PoD effectively detects the fingers of the armadillo, which are not distinctly expressed as corners in a depth image.

Fig. 27. The comparison of results in exemplar areas having protuberant depth regions from ScanNet, (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

Fig. 28. The comparison of results in an exemplar area having multiple protuberant depth regions from Stanford-Random Views (top row) and its 3D mesh representation in various views (bottom row), (a) MS, (b) HKS, (c) 3D SIFT, (d) COV, (e) NARF, (f) FAST, (g) FAST-ER, and (h) PoD.

4.2.4. Computational Efficiency
Table 3 lists the average execution time of each method for detecting interest points in terms of CPU time. A fixed image scale has been applied for a fair comparison, so each image has been resized to 240 by 240. The demand for selectively prioritizing and processing the rapidly growing graphics datasets for image synthesis and analysis will increase for depth data (Lee et al., 2005). As such, interest point detection in a depth image needs to be efficient, because it is generally adopted as a preliminary step of a subsequent application. The experimental results on computational time show that PoD operates very efficiently. Two aspects of the feature extraction contribute to this efficiency: flexibly selecting the dropping points around a center point instead of performing an exhaustive pyramid search, and using simple summations instead of encoding a rich feature code. In addition, PoD can be computed more rapidly with a C-based implementation, as shown in Table 4, and the code can be further optimized into a fast version of PoD that is 7.4 times faster at the cost of a minimal amount of performance degradation. Nevertheless, FAST and FAST-ER, as grayscale intensity image detectors, are much faster than PoD as a depth image detector; more research is needed to improve the efficiency of depth image detectors. It is also worth noting that the decisive reason for the strong performance of PoD, both in repetitive interest point detection on a rotated object and in efficient computation, is that PoD is designed to perform a specific task rather than to be a general descriptor, using simple calculations (scalar and vector summations) based on spatially adaptive feature computation in a flexible radial form.

Table 3. The compared computational times of the interest point detectors (unit: milliseconds).

Rank | Method  | Language | CPU Time Mean | CPU Time STD
1    | FAST    | Matlab   |      2.4      |      0.4
2    | FAST-ER | Matlab   |      5.4      |      0.7
3    | PoD     | Matlab   |    549.8      |    198.5
4    | HKS     | Matlab   |  1,761.2      | 10,159.4
5    | NARF    | Matlab   |  1,893.2      |  3,976.7
6    | COV     | Matlab   | 16,823.3      | 11,027.0
7    | MS      | Matlab   | 22,849.9      |  7,872.5
8    | 3D SIFT | Matlab   | 59,080.4      | 39,811.6

Table 4. The computational time of PoD when implemented in C++ (unit: milliseconds).

Method    | Language | CPU Time Mean | CPU Time STD
PoD       | C++      | 54.2          | 199.2
Boost PoD | C++      |  7.3          |   5.6
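For reference, the timing protocol can be replicated in spirit with a simple harness; this sketch is our code, using the 240-by-240 input size applied in the comparison above (the pure-Python sketch of PoD is, of course, far slower than the reported optimized implementations):

```python
import time
import numpy as np

def time_detector(detector, depth, runs=10):
    """Return (mean, std) wall-clock time of a detector in milliseconds."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        detector(depth)                      # run the detector once
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.mean(times)), float(np.std(times))

depth = np.random.rand(240, 240) * 255.0     # stand-in for a real depth image
mean_ms, std_ms = time_detector(lambda d: detect_interest_points(d, r=5), depth)
print(f"PoD sketch: {mean_ms:.1f} ms (std {std_ms:.1f} ms)")
```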

5. Conclusion

In this paper, an effective and efficient interest point detection method based on the local feature of a depth region has been introduced. The proposed method, namely Protuberance of Depth (PoD), provides a simple yet effective way to detect distinctive interest points in a depth image, while most previous interest point detectors have focused on point cloud and mesh. Relatively few studies on interest point detection have delved deeply into extracting the salient feature of a depth region based on depth data itself. PoD, however, intuitively exploits the inherent protuberance information of depth data. It is worth observing that PoD consistently extracts protuberance in the presence of isometric deformation and varying orientation of a depth region. Based on its rotation invariance, PoD outperforms the comparing methods in detecting interest points against the challenge of the rotation of a depth region, i.e., PoD consistently detects interest points with high repeatability. Furthermore, it is important to note that PoD is efficient enough for real-time applications. Repeatability is important because the same scene viewed from a different position should yield interest points corresponding to the same real-world three-dimensional locations, and efficiency is important because the detector should operate at frame rate when combined with further processing (Rosten et al., 2008). We propose our method in an effort to find the balance between the repeatability and efficiency of an interest point detector for depth images. In the future, we intend to explore other potential architectures for describing the feature of a depth structure, as well as to deal with the scale dependency by adopting a multi-scale technique. Moreover, by leveraging other advanced detection and tracking techniques, it will be important to study the benefits of PoD in more complicated applications using depth data, for instance, depth saliency detection, object tracking based on depth information, and 3D object reconstruction and recognition.

References

Alcantarilla, P.F., Bartoli, A., Davison, A.J., 2012. KAZE features, in: European Conference on Computer Vision, Springer. pp. 214–227.
Cheng, Y., Zhao, X., Huang, K., Tan, T., 2015. Semi-supervised learning and feature evaluation for RGB-D object recognition. Computer Vision and Image Understanding 139, 149–160.
Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M., 2017a. ScanNet: Richly-annotated 3D reconstructions of indoor scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839.
Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C., 2017b. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration, in: ACM Transactions on Graphics (ToG), ACM. p. 24.
Darom, T., Keller, Y., 2012. Scale-invariant features for 3-D mesh models. IEEE Transactions on Image Processing 21, 2758–2769.
Dutagaci, H., Cheung, C.P., Godil, A., 2012. Evaluation of 3D interest point detection techniques via human-generated ground truth. The Visual Computer 28, 901–917.
Giorgi, D., Biasotti, S., Paraboschi, L., 2007. Shape retrieval contest 2007: Watertight models track. SHREC competition 8.
Godil, A., Wagan, A.I., 2011. Salient local 3D features for 3D shape retrieval, in: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics. pp. 78640S–78640S.
Grabner, M., Grabner, H., Bischof, H., 2006. Fast approximated SIFT, in: Asian Conference on Computer Vision, Springer. pp. 918–927.
Guo, Y., Bennamoun, M., Sohel, F., Lu, M., Wan, J., 2014. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 2270–2287.
Guo, Y., Wang, F., Xin, J., 2018. Point-wise saliency detection on 3D point clouds via covariance descriptors. The Visual Computer, 1–14.
Jansen, L., Onat, S., König, P., 2009. Influence of disparity on fixation and saccades in free viewing of natural scenes. Journal of Vision 9, 29–29.
Johnson, A.E., Hebert, M., 1999. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 21, 433–449.
Kiforenko, L., Drost, B., Tombari, F., Kruger, N., Buch, A.G., 2018. A performance evaluation of point pair features. Computer Vision and Image Understanding 166, 66–80.
Lee, C.H., Varshney, A., Jacobs, D.W., 2005. Mesh saliency, in: ACM SIGGRAPH, ACM. pp. 659–666.
Maver, J., 2010. Self-similarity and points of interest. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 1211–1226.
Mian, A., Bennamoun, M., Owens, R., 2010. On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. International Journal of Computer Vision 89, 348–361.
Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M., 2013. Real-time 3D reconstruction at scale using voxel hashing. ACM Transactions on Graphics (ToG) 32, 169.
Novatnack, J., Nishino, K., 2007. Scale-dependent 3D geometric features, in: 2007 IEEE 11th International Conference on Computer Vision, IEEE. pp. 1–8.
Rosten, E., Drummond, T., 2006. Machine learning for high-speed corner detection, in: European Conference on Computer Vision, Springer. pp. 430–443.
Rosten, E., Porter, R., Drummond, T., 2008. Faster and better: A machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 105–119.
Schmid, C., Mohr, R., Bauckhage, C., 2000. Evaluation of interest point detectors. International Journal of Computer Vision 37, 151–172.
Stanford Computer Graphics Laboratory. The Stanford 3D Scanning Repository dataset. URL: http://graphics.stanford.edu/data/3Dscanrep/.
Steder, B., Rusu, R.B., Konolige, K., Burgard, W., 2011. Point feature extraction on 3D range scans taking into account object boundaries, in: International Conference on Robotics and Automation, IEEE. pp. 2601–2608.
Sun, J., Ovsjanikov, M., Guibas, L., 2009. A concise and provably informative multi-scale signature based on heat diffusion, in: Computer Graphics Forum, Wiley Online Library. pp. 1383–1392.
Tombari, F., Salti, S., Stefano, L.D., 2013. Performance evaluation of 3D keypoint detectors. International Journal of Computer Vision 102, 198–220.
Werghi, N., Tortorici, C., Berretti, S., Del Bimbo, A., 2015. Local binary patterns on triangular meshes: Concept and applications. Computer Vision and Image Understanding 139, 161–177.
Yu, Q., Liang, J., Xiao, J., Lu, H., Zheng, Z., 2018. A novel perspective invariant feature transform for RGB-D images. Computer Vision and Image Understanding 167, 109–120.
Zhang, C., Tian, Y., 2015. Histogram of 3D facets: A depth descriptor for human action and hand gesture recognition. Computer Vision and Image Understanding 139, 29–39.