Information Sciences 510 (2020) 33–49
On shortened 3D local binary descriptors

Siwen Quan a, Jie Ma b,∗

a School of Electronics and Control, Chang’an University, Xi’an 710064, China
b National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
Article history: Received 23 March 2019; Revised 10 September 2019; Accepted 14 September 2019; Available online 14 September 2019

Keywords: 3D point cloud; Local feature description; Bit-selection; Feature matching; Binary descriptor
Abstract

The widespread mobile systems of today demand ultra-lightweight local geometric features to accomplish tasks that rely on correspondences. Nonetheless, most existing 3D local feature descriptors, though shown to be distinctive and robust, are still real-valued and/or high-dimensional. Accordingly, this paper conducts a comparative study of current bit-selection methods with a focus on shortening 3D local binary descriptors. By analyzing several bit-selection techniques, we develop and evaluate various approaches to obtain a shortened version of a state-of-the-art feature that remains discriminative and robust. Through extensive experiments on four standard datasets with different data modalities (e.g., LiDAR and Kinect) and application scenarios (e.g., 3D object retrieval, 3D object recognition, and point cloud registration), we show that a small subset of representative bits is sufficient to achieve feature matching results comparable to those of the initial descriptor. Moreover, the shortened binary descriptors still hold competitive or better distinctiveness and robustness compared to several state-of-the-art real-valued descriptors, e.g., spin image, SHOT, and RoPS, albeit being dramatically more efficient to match and store. Key to the foreseen research trend of local geometric feature description is dealing with compact binary descriptors; thus, our work may pave the way for this new research direction.

© 2019 Published by Elsevier Inc.
1. Introduction

Local shape description for rigid data, such as 3D point clouds and meshes, is a pervasive problem in the realms of computer vision, computer graphics, and robotics. The objective is to use a feature vector to fully parameterize the geometric information contained in a local shape [44]. It has been applied to numerous real-world applications, e.g., 3D object recognition, point cloud registration, simultaneous localization and mapping (SLAM), and reconstruction. A large corpus of research on 3D local shape descriptors has been conducted in the past two decades [13,16,18,32,38,44,45,48]. The existing 3D local descriptors are either hand-crafted or learned. According to the taxonomy in [37], hand-crafted features can be categorized into three classes: histogram, signature, and hybrid. Histogram-type descriptors, e.g., fast point feature histograms (FPFH) [32] and local feature statistics histograms (LFSH) [44], represent the local shape geometry by calculating statistical histograms of point attributes, such as normal deviations and curvatures. Signature-type descriptors, e.g., 3D shape context (3DSC) [4], mainly resort to the spatial distribution of point attributes for shape description. Hybrid-type descriptors combine the traits of the former two categories to achieve a good balance
∗ Corresponding author. E-mail addresses: [email protected] (S. Quan), [email protected] (J. Ma).
https://doi.org/10.1016/j.ins.2019.09.028 0020-0255/© 2019 Published by Elsevier Inc.
Fig. 1. Illustration of bit-selection on 3D binary descriptors. Binary descriptors are first extracted for keypoints detected on a 2.5D/3D model. After performing bit-selection on the initial bits (red), reasonable bits (green) are concatenated as a new descriptor. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
between descriptiveness and robustness, with typical examples including the signature of histograms of orientations (SHOT) [38] and rotational projection statistics (RoPS) [13]. Learned features, e.g., 3DMatch [48] and CGF [18], have achieved a clear performance gain over hand-crafted ones when trained on large datasets. These works have made great progress toward crafting discriminative and robust geometric features. However, most existing 3D local features are real-valued and high-dimensional, and incur low efficiency in both storage and feature matching. With the rise of portable and commodity range sensors, e.g., the Microsoft Kinect and Intel RealSense, there is a pressing need for ultra-lightweight feature descriptors.

Similar to the trend in the 2D image domain [6,31], designing binary descriptors has recently become a new direction in the 3D domain. Compared with float strings, bit strings greatly reduce the storage volume and the time cost of feature matching. Typical examples of 3D binary features include B-SHOT [27] and the local voxelized structure (LoVS) [29]. B-SHOT converts the SHOT descriptor using a quantization algorithm, and is shown to hold limited descriptiveness due to information loss [29]. LoVS achieves superior performance when benchmarked on a series of datasets against several real-valued descriptors [29]. Nonetheless, LoVS, which is composed of 729 bits, suffers from low efficiency in terms of storage and feature matching. Although 3D local descriptors have been studied since the 1990s, the development of binary descriptors only began in 2015 [27], and existing binary descriptors are relatively high-dimensional [29]. We notice that in the 2D image domain, effective feature description can be achieved with as few as 32 bits, i.e., the D-BRIEF descriptor [42]. This motivates us to explore the possibility of crafting low-dimensional 3D local binary descriptors.

Dimension reduction is a common practice for achieving lower-dimensional descriptors. However, it is mostly performed on float feature descriptors. Popular dimension reduction approaches include principal component analysis (PCA) [16] and linear discriminant embedding (LDE) [15]. However, they are not suitable for binary descriptors because the resulting descriptors are real-valued. There are also methods that shorten a descriptor by converting real-valued descriptors to binary ones, e.g., thresholding [2] and non-linear neighborhood component analysis [40]. Yet, these methods require float descriptors as input. For the problem on which this paper focuses, we wish to select a few reasonable bits from an initial binary descriptor to achieve the goal of shortening binary descriptors with little loss of discriminative power. We notice the “binary test selection” technique in the 2D image domain [31,46,47] and borrow ideas from it. A binary test compares the values of two image pixels and labels the result with ‘0’ or ‘1’ [31,47]. Binary test selection then refers to the selection of a portion of all binary tests within a local image patch. Because binary test selection allows binary input and binary output, we apply it to shortening binary feature descriptors and dub it bit-selection in our context. The distinction is that we treat feature bins as input instead of pairs of image pixels. This paper, to the best of our knowledge, presents the first attempt to compute very low-dimensional 3D local binary descriptors.
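To make the matching-cost argument above concrete, here is a minimal C++ sketch (our own illustration, not code from the paper): matching two binary descriptors reduces to an XOR plus a population count per machine word, whereas a real-valued descriptor requires a floating-point accumulation over every dimension of a much larger representation.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Hamming distance between two binary descriptors packed into 64-bit words:
// one XOR and one popcount per word (12 words cover 729 bits).
inline int hammingDistance(const std::vector<std::uint64_t>& a,
                           const std::vector<std::uint64_t>& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
    return d;
}

// Squared L2 distance between two real-valued descriptors (e.g., 352-D SHOT):
// one subtraction and one multiply-add per dimension.
inline float l2DistanceSq(const std::vector<float>& a, const std::vector<float>& b) {
    float d = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}
```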
Directly computing compact binary descriptors from raw data relies heavily on choosing salient pixels or feature bins, which is somewhat subjective [47]. A more prevalent solution is to first use a high-dimensional vector to fully characterize the information and then perform dimension reduction [15,31]. Similar to existing works that build advanced variants on top of mature features, e.g., PCA-SIFT over SIFT [17] and compressed SHOT over SHOT [22], we design distinctive, robust, and ultra-lightweight binary features based upon a state-of-the-art 3D descriptor, i.e., LoVS [29]. LoVS is a binary descriptor and exhibits competitive performance with existing real-valued features. For a given application task, we believe that not all bits provide positive contributions to feature matching. For instance, when performing 3D registration on partially overlapping data, bits extracted spatially closer to the keypoint should be more reliable, as boundary regions are usually incomplete [45]. To this end, we perform bit-selection on LoVS. To provide a better understanding of bit-selection, we illustrate its concept in Fig. 1. This work specifically studies eight bit-selection algorithms: two baselines, i.e., random sampling and uniform sampling, and six published ones, i.e., Correlation [31], Entropy [46], Coding [30], Matching
[30], Coding+Matching [30], and AdaBoost [47]. Remarkably, there has been no study to date in either the 2D or 3D domain that comprehensively analyzes or evaluates existing bit-selection algorithms. Systematic experiments are carried out on four standard datasets that address various application scenarios including 3D object retrieval, 3D object recognition, and point cloud registration. Both LiDAR and Kinect data are included in the four datasets, with a variety of perturbations including synthetic noise, real noise, partial overlap, clutter, and occlusion. To demonstrate the effectiveness of the binary descriptors generated after bit-selection, three renowned float 3D local descriptors, i.e., spin image [16], SHOT [38], and RoPS [13], are tested on the experimental datasets for comparative evaluation. The results turn out quite satisfactory: 96 (or even fewer) bits of the LoVS descriptor are able to deliver better distinctiveness and robustness than the compared state-of-the-art float descriptors, particularly on challenging real-world datasets. To summarize, this paper presents the following contributions:

• A survey and abstraction of existing bit-selection techniques with their core computational steps, helpful to researchers for either proposing effective improvements or ground-breaking new methods. The code of these bit-selection methods implemented by us will be made publicly available at https://sites.google.com/view/siwenquanshomepage.

• Toward the foreseen research topic of dealing with compact binary local shape descriptors, we suggest utilizing proper bit-selection methods to achieve lightweight features that, on the one hand, preserve the major discriminative power and, on the other, remove redundant/noisy bits. This avoids redesigning new features heuristically and instead accomplishes the goal in a data-driven manner based on existing mature descriptors.

• The resulting binary descriptors after bit-selection show many appealing traits: they are low-dimensional, distinctive, and robust to various perturbations including noise, partial overlap, clutter, and occlusion. This makes them well suited to applications on mobile platforms.

The remainder of this paper is structured as follows. Section 2 gives a brief review of existing methods for shortening a descriptor and of the LoVS descriptor. Section 3 gives a survey of existing bit-selection methods and abstracts their core computational steps. Section 4 presents the experimental evaluation of the descriptors generated after bit-selection on four standard datasets. Section 5 gives a summary of the findings of this study together with necessary discussions. Finally, Section 6 draws conclusions and describes potential future research directions.

2. Related work

This section first gives a brief review of the existing literature on local binary descriptors and on shortening descriptors. Then, the technical approach for computing the LoVS descriptor is introduced.

2.1. Local binary descriptors

In the 2D domain, to act as accelerated versions of SIFT [19], a set of local binary descriptors have been proposed in the past decade. Calonder et al. [7] presented the BRIEF descriptor, which uses simple intensity difference tests within a local image patch; they showed that BRIEF achieves performance comparable to several popular float descriptors. Rublee et al. [31] proposed the ORB descriptor based on BRIEF; different from BRIEF, ORB is rotation invariant thanks to an orientation computed from the intensity centroid. Alahi et al.
[1] computed a cascade of binary strings by efficiently comparing image intensities over a retinal sampling pattern inspired by the human visual system, and proposed the FREAK descriptor. Recently, Balntas et al. [3] introduced binary online learned descriptors (BOLD), a patch-adapted feature built online from a subset of features leading to low intra-class and large inter-class distances. Duan et al. [10] proposed an unsupervised feature learning method with multi-quantization for visual matching; they reduced the quantization loss by applying a K-AutoEncoders network to jointly learn parameters and binarization functions. In addition to image matching, there also exist local binary descriptors for other vision tasks. Examples include the local binary pattern (LBP) descriptor [26] for texture classification and learned binary descriptors [9,20,21] for face recognition. In the 3D domain, the study of local binary descriptors began with the B-SHOT descriptor [27], which converts the SHOT feature [38] to binary strings with a heuristic quantization approach. Later, Quan et al. [29] proposed the LoVS descriptor, which voxelizes the local surface and describes the shape geometry with voxel labels. Unfortunately, there are still only a few 3D local binary descriptors at present, and existing methods suffer from limited descriptiveness and/or compactness.

2.2. Shortening a descriptor

Dimension reduction. PCA is arguably the most commonly used technique for descriptor dimension reduction; it learns a projection matrix in an unsupervised manner. It has been successfully applied to the well-known SIFT [19] image descriptor [17]. Many 3D local shape descriptors also use PCA to remove redundant dimensions. Johnson and Hebert [16] applied PCA to compress the spin image descriptor for more efficient object recognition in clutter. Guo et al. [14] proposed the TriSI descriptor by performing PCA on a raw feature vector concatenated from three spin image signatures. Recently, Prakhya et al. [28] studied the use of PCA to create lower-dimensional SHOT [38], fast point feature histograms (FPFH) [32], and RoPS [13] descriptors, showing that descriptors with much fewer dimensions can achieve comparable performance to the original descriptors. Besides PCA, Hua et al. [15] proposed linear discriminant embedding (LDE), which uses a lower-dimensional embedding to optimize the descriptors’ discriminative power. The descriptors generated by these methods are lower-dimensional yet still real-valued.
Binarization. To generate binary descriptors from float descriptors, Baber et al. [2] transformed the 128-byte SIFT descriptor into a 128-bit descriptor by thresholding each element of the gradient orientation histograms. Zhou et al. [49] proposed B-SIFT by calculating the median of the SIFT vector and comparing the median with each element in SIFT to obtain a binary vector. Tra et al. [41] leveraged the binary-coded position of the cumulative maximum value in each sub-histogram of SIFT to generate a 48-bit Dominant-SIFT descriptor. Prakhya et al. [27] generated a binary version of SHOT [38], called B-SHOT, by splitting the float vector into quadruples and comparing the values within each quadruple. Besides hand-crafted methods, there are also learning-based approaches for descriptor binarization such as linear discriminant analysis hashing (LDAHash) [36] and discriminative projection [42]. Different from thresholding-based methods, learning-based methods re-project float vectors into the Hamming space by minimizing a matching cost function. These methods are particularly designed for float descriptors, with the purpose of reducing the descriptor dimension and the time cost of matching.

Binary test selection. The seminal work on binary test selection is [31], where binary tests with high variance and low correlation are selected from the initial binary tests to form the 256-bit ORB descriptor. The later FREAK [1] descriptor also relies on this approach to achieve compact binary features. Yang and Cheng [46] proposed an entropy-based method that prefers binary tests with high Shannon entropy. Redondi et al. [30] developed three binary test selection methods that use conditional entropy, matching cost, or both terms to rank bits. Yang and Cheng [47] applied AdaBoost to the problem of bit-selection, where bits serve as independent classifiers and a matching loss is calculated repeatedly to determine the best combination of classifiers. This approach is also adopted in [11] to shorten a raw feature descriptor. Unfortunately, existing studies have only compared a small portion of the existing methods, on very limited experimental data, and with a focus on image descriptors. Although binary test selection was originally performed on local image patches, the concept can also be applied to dimension reduction on binary descriptors (we call it bit-selection). We will specifically study the effectiveness of all of the above-mentioned bit-selection methods for the problem of shrinking 3D local binary descriptors.

2.3. The LoVS descriptor

The LoVS descriptor [29] resorts to voxel grids to encode the local geometry around a keypoint. For a keypoint, its radius neighbors in an r-ball are first employed to calculate a local reference frame (LRF) as in [45]. Second, to achieve a more uniform partition of the local volume, a cubic volume centered at the keypoint and aligned with the LRF is computed. Third, the local surface intersected by the cube is transformed with respect to the LRF to achieve rotation invariance. By performing voxelization of the cubic volume, 729 voxels (9 × 9 × 9) are generated. Then, voxels with points inside are labeled with 1, and 0 otherwise. Finally, all these labels are concatenated as the LoVS descriptor. Because the LoVS descriptor leverages the spatial information of points rather than normals and curvatures, which are shown to be sensitive to noise [13], it manages to behave more robustly on noisy data.
In addition, because no projection is performed during feature encoding, unlike in many prior methods [13,16], LoVS exhibits decent discriminative power. A drawback of the LoVS descriptor, as mentioned in Section 1, is the low efficiency in storage and matching that arises from its high dimensionality.
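To make the encoding step concrete, the following is a minimal C++ sketch of the voxel-labeling stage under our own simplifications: the LRF is assumed to have been computed already, the neighboring points are assumed to be expressed in that LRF, and the cube size and bit ordering are only illustrative rather than the exact choices of [29].

```cpp
#include <bitset>
#include <cmath>
#include <vector>

struct Point { float x, y, z; };

// Label a 9 x 9 x 9 voxel grid spanning a cube of half side length L centered at the
// keypoint (the origin of the LRF): a voxel is set to 1 if at least one point falls inside.
std::bitset<729> lovsLikeDescriptor(const std::vector<Point>& patchInLRF, float L) {
    std::bitset<729> bits;                         // all voxels start as 0
    const float voxelSize = 2.f * L / 9.f;
    for (const Point& p : patchInLRF) {
        const int ix = static_cast<int>(std::floor((p.x + L) / voxelSize));
        const int iy = static_cast<int>(std::floor((p.y + L) / voxelSize));
        const int iz = static_cast<int>(std::floor((p.z + L) / voxelSize));
        if (ix < 0 || ix > 8 || iy < 0 || iy > 8 || iz < 0 || iz > 8)
            continue;                              // point lies outside the cubic volume
        bits.set(ix + 9 * iy + 81 * iz);           // concatenation order is illustrative
    }
    return bits;
}
```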
3. Bit selection for 3D binary descriptors

Formally, a binary descriptor can be denoted by f = {bi}, i = 1, 2, ···, N, with bi ∈ {0, 1}. We wish to select Nbs bits from the initial N bits, with Nbs < N, to generate a more compact descriptor fbs = {bt}, t = 1, 2, ···, Nbs, with bt ∈ f, where {bt} denotes a permutation of the selected bits.
This section describes the bit-selection algorithms we have considered to achieve this goal. They are either supervised or unsupervised, specifically including Correlation [31], Entropy [46], Coding [30], Matching [30], Coding+Matching [30], and AdaBoost [47]. In addition, uniform sampling and random sampling are considered as two baselines. Random sampling is straightforward to implement. To perform uniform sampling, we evenly select bits by keeping the interval between any two neighboring selected bits at ⌊N/Nbs⌋ (⌊·⌋ being the round-down operator).
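As a small illustration of the uniform-sampling baseline (our own sketch; the function name is not from the paper), the kept indices are simply multiples of the round-down interval:

```cpp
#include <cstddef>
#include <vector>

// Indices kept by the uniform-sampling baseline: bits 0, s, 2s, ..., with s = floor(N / Nbs).
std::vector<std::size_t> uniformSamplingIndices(std::size_t N, std::size_t Nbs) {
    const std::size_t step = N / Nbs;      // integer division = round-down interval
    std::vector<std::size_t> indices;
    for (std::size_t t = 0; t < Nbs; ++t)
        indices.push_back(t * step);
    return indices;
}
```

Random sampling would instead draw Nbs distinct indices, e.g., from a shuffled index vector.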
We consider these bit-selection methods because they include both supervised and unsupervised techniques, which may satisfy the requirements of various tasks. In the following, we concretely describe the main steps of these bit-selection algorithms.

3.1. Unsupervised algorithms

Unsupervised algorithms tend to evaluate the significance and/or correlation of each bit with statistics gathered from a large number of extracted binary features.

Correlation [31]. This method selects bits based on the rule that the selected bits should have high variance and low correlation with each other. First, a raw binary descriptor (the LoVS descriptor in our case) is extracted for each sample in the training data. Second, the bits of the raw descriptor are ordered by the distance of their mean from 0.5, where the mean of a particular bit is calculated over all training descriptors, and the first bit is put into fbs. Third, the next bit is compared against all elements already in fbs: if its correlation with any of them is greater than a threshold τcorr, it is discarded; otherwise, it is added to fbs. The third step is repeated until there are Nbs bits in fbs. If the candidate bits are exhausted first, τcorr is raised and the procedure is run again.
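A compact C++ sketch of this greedy procedure follows (our own illustration under the assumption that the training descriptors are available as a dense 0/1 matrix; the correlation measure and the threshold increment are implementation choices, not prescribed by the paper):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using BitMatrix = std::vector<std::vector<int>>;   // data[s][i] = bit i of training sample s

static double bitMean(const BitMatrix& d, std::size_t i) {
    double m = 0.0;
    for (const auto& row : d) m += row[i];
    return m / d.size();
}

// Pearson correlation between two bit columns of the training matrix.
static double bitCorrelation(const BitMatrix& d, std::size_t i, std::size_t j) {
    const double mi = bitMean(d, i), mj = bitMean(d, j);
    double num = 0.0, di = 0.0, dj = 0.0;
    for (const auto& row : d) {
        num += (row[i] - mi) * (row[j] - mj);
        di  += (row[i] - mi) * (row[i] - mi);
        dj  += (row[j] - mj) * (row[j] - mj);
    }
    return (di > 0.0 && dj > 0.0) ? num / std::sqrt(di * dj) : 0.0;
}

// ORB-style greedy selection: rank bits by variance (mean closest to 0.5), then keep a
// bit only if its correlation with every already-kept bit stays below tau.
std::vector<std::size_t> correlationSelect(const BitMatrix& d, std::size_t Nbs, double tau) {
    const std::size_t N = d.front().size();
    std::vector<std::size_t> order(N);
    for (std::size_t i = 0; i < N; ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return std::abs(bitMean(d, a) - 0.5) < std::abs(bitMean(d, b) - 0.5);
    });
    while (true) {
        std::vector<std::size_t> kept;
        for (std::size_t i : order) {
            bool tooCorrelated = false;
            for (std::size_t k : kept)
                if (std::abs(bitCorrelation(d, i, k)) > tau) { tooCorrelated = true; break; }
            if (!tooCorrelated) kept.push_back(i);
            if (kept.size() == Nbs) return kept;
        }
        tau += 0.05;   // not enough weakly correlated bits: relax the threshold and retry
    }
}
```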
Entropy [46]. Entropy simply ranks bits based on the entropy calculated for each bit over a training set of descriptors. In particular, the entropy of a bit bi is defined as:
H(b_i) = -\sum_{x \in \{0,1\}} p_i(x) \log_2 p_i(x),    (1)
where pi(x) represents the percentage of bi being x, obtained by analyzing a training set of descriptors. Then, the elements of f are ranked in decreasing order of entropy, and the final fbs is composed of the top Nbs elements after ordering.

Coding [30]. Rather than using the entropy as the selection rule as in Entropy, Coding considers the conditional entropy. Following Eq. (1), the entropy of the i-th bit can be calculated as H(bi). The conditional entropy H(bi1|bi2) is defined as:
H(b_{i_1} \mid b_{i_2}) = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} p_{i_1 i_2}(x, y) \log_2 \frac{p_{i_1}(x)}{p_{i_1 i_2}(x, y)},    (2)
This method further models the descriptor after bit-selection as a first-order Markov source, i.e., H(bt | bt−1). Therefore, the bits can be selected iteratively. Choosing the first bit as the one with the maximum entropy (we experimentally confirmed that starting from the bit with the maximum entropy is more reasonable), the following bits are located using the rule:

b_t = \arg\max_{b_i} H(b_i \mid b_{t-1}).    (3)
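The unsupervised ranking of Eq. (1) is simple to realize; the sketch below (our own illustration, reusing the BitMatrix layout from the previous snippet) ranks bits by their Shannon entropy and keeps the top Nbs. Coding differs only in that the independent score is replaced by the conditional entropy of Eq. (2), with bits picked greedily one after another.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using BitMatrix = std::vector<std::vector<int>>;   // data[s][i] = bit i of training sample s

// Shannon entropy of bit i over a training set of descriptors (Eq. (1)).
static double bitEntropy(const BitMatrix& d, std::size_t i) {
    double p1 = 0.0;
    for (const auto& row : d) p1 += row[i];
    p1 /= d.size();
    const double p0 = 1.0 - p1;
    double h = 0.0;
    if (p0 > 0.0) h -= p0 * std::log2(p0);
    if (p1 > 0.0) h -= p1 * std::log2(p1);
    return h;
}

// Entropy-based selection: keep the Nbs bits with the highest entropy.
std::vector<std::size_t> entropySelect(const BitMatrix& d, std::size_t Nbs) {
    const std::size_t N = d.front().size();
    std::vector<double> H(N);
    for (std::size_t i = 0; i < N; ++i) H[i] = bitEntropy(d, i);
    std::vector<std::size_t> order(N);
    for (std::size_t i = 0; i < N; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return H[a] > H[b]; });
    order.resize(Nbs);
    return order;
}
```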
3.2. Supervised algorithms

Supervised algorithms opt for a feature-matching-guided strategy to select reasonable bits: bits that boost the feature matching accuracy are more likely to be selected.

Matching [30]. The idea behind Matching is to take into consideration the joint distribution of bits computed over matching and non-matching local patches. Given a set M of matching pairs and a set N of non-matching pairs, we can compute the mutual information of each bit bi in f as:
I^M(b_i) = \sum_{x \in \{0,1\}} \sum_{y \in \{0,1\}} p_i^M(x, y) \log_2 \frac{p_i^M(x, y)}{p_i(x)\, p_i(y)},    (4)
where pM_i(x, y) measures the joint probability of zeros and ones in the descriptors of matching pairs. For instance, pM_i(0, 0) indicates the probability that the i-th bit of both descriptors of a matching pair is zero. Analogously, the mutual information of each bit over the non-matching pair set N can be calculated as IN(bi). Accordingly, the following scoring function can be computed to measure the quality of bit bi:
J_i = I^M(b_i) - I^N(b_i).    (5)

The eventual descriptor fbs is generated by concatenating the top-Nbs bits according to Ji.
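A possible realization of this scoring is sketched below (our own C++ illustration; in particular, estimating the marginals pi(x) and pi(y) from the same pair set is one reading of Eq. (4), and the data layout and names are ours):

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Descriptor = std::vector<int>;                                 // one bit per element
using PairSet = std::vector<std::pair<Descriptor, Descriptor>>;      // descriptor pairs

// Mutual information of bit i over a set of descriptor pairs (Eq. (4)); the marginals
// are estimated from the same pair set, which is one possible interpretation.
static double bitMutualInfo(const PairSet& pairs, std::size_t i) {
    std::array<std::array<double, 2>, 2> joint{};                    // zero-initialized
    for (const auto& pr : pairs)
        joint[pr.first[i]][pr.second[i]] += 1.0 / pairs.size();
    const std::array<double, 2> px{joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]};
    const std::array<double, 2> py{joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]};
    double I = 0.0;
    for (int x = 0; x < 2; ++x)
        for (int y = 0; y < 2; ++y)
            if (joint[x][y] > 0.0 && px[x] > 0.0 && py[y] > 0.0)
                I += joint[x][y] * std::log2(joint[x][y] / (px[x] * py[y]));
    return I;
}

// Matching-based selection: rank bits by Ji = IM(bi) - IN(bi) (Eq. (5)), keep the top Nbs.
std::vector<std::size_t> matchingSelect(const PairSet& matches, const PairSet& nonMatches,
                                        std::size_t Nbs) {
    const std::size_t N = matches.front().first.size();
    std::vector<double> J(N);
    for (std::size_t i = 0; i < N; ++i)
        J[i] = bitMutualInfo(matches, i) - bitMutualInfo(nonMatches, i);
    std::vector<std::size_t> order(N);
    for (std::size_t i = 0; i < N; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return J[a] > J[b]; });
    order.resize(Nbs);
    return order;
}
```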
Coding+Matching [30]. This method is a hybrid of Coding and Matching that aims at achieving a trade-off between the two. The Nbs bits are determined by the following greedy approach:
b_t = \arg\max_{b_i} \left[ \alpha \left( I^M(b_i) - I^N(b_i) \right) + (1 - \alpha) H(b_i \mid b_{t-1}) \right],    (6)
where α ∈ (0, 1) controls the weight of each term. Notably, the coding term, i.e., H(bi | bt−1), is replaced by H(bi) when selecting the first bit.

AdaBoost [47]. AdaBoost, as originally proposed by Viola and Jones [43], was designed to select a small set of face features that can optimally discriminate face from non-face images. Because the local feature matching problem can also be seen as differentiating matches from non-matches, Yang and Cheng [47] presented the first application of AdaBoost to bit-selection, shortening a binary descriptor while maintaining its quality. Specifically, each bit in f serves as a classifier, and the best subset of classifiers is determined by the following strategy. Given a training set T = {Xj, Yj}, j = 1, 2, ···, M, where Xj is a pair of point cloud patches and Yj is the label of Xj (Yj = 1 indicates a matching pair and Yj = 0 otherwise), we first compute the N-bit descriptor f for all patches in T. Second, set an equal weight dj = 1/M for all training pairs for initialization and compute a matching error for each bit as:
\epsilon(b_i) = \frac{1}{M} \sum_{j=1}^{M} \left| Y_j - \hat{Y}_j \right|,    (7)
where Ŷj ∈ {0, 1} is the predicted label of bit bi and equals 1 only if the values of the i-th bit of both descriptors extracted from training sample Xj are identical. Third, for t = 1 to Nbs, find the bit bt with the minimum accumulated classification error:
b_t = \arg\min_{b_i} \epsilon_{\mathrm{accu}}(t),    (8)
where

\epsilon_{\mathrm{accu}}(t) = \epsilon_{\mathrm{accu}}(t-1) + \epsilon(b_i)    (9)
and ε_accu(0) = 0. If ε(bt) < 0.5, then update the weight of each training sample for the next bit-selection round as:
d_{t+1,\, j} = \frac{d_{t,\, j}\; e^{\varphi_{t,\, j} \sigma_t}}{Z_t},    (10)
= Y and ϕ = −1 otherwise. If ( where Zt is a normalizing factor, σt = 0.5In 1−(bt ) , and ϕt, j = 1 if Y bt ) ≥ 0.5, which j j t, j
(bt )
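The sketch below illustrates this round-by-round procedure in C++ (our own simplification: it uses the standard discrete-AdaBoost convention of weighting the per-bit error by the sample weights and up-weighting misclassified pairs, which may differ in sign convention from Eq. (10), and it simply keeps the current weights in a degenerate round instead of switching to a new training set):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Descriptor = std::vector<int>;
struct TrainPair { Descriptor a, b; int label; };        // label: 1 = match, 0 = non-match

// Weak classifier of bit i: predict "match" iff the i-th bits of both descriptors agree.
static int predictBit(const TrainPair& p, std::size_t i) {
    return p.a[i] == p.b[i] ? 1 : 0;
}

// AdaBoost-style bit-selection: each round picks the unused bit with the lowest weighted
// classification error and re-weights the training pairs before the next round.
std::vector<std::size_t> adaboostSelect(const std::vector<TrainPair>& T, std::size_t N,
                                        std::size_t Nbs) {
    std::vector<double> d(T.size(), 1.0 / T.size());     // uniform initial weights d_j = 1/M
    std::vector<bool> used(N, false);
    std::vector<std::size_t> selected;
    for (std::size_t t = 0; t < Nbs; ++t) {
        std::size_t best = 0;
        double bestErr = 2.0;
        for (std::size_t i = 0; i < N; ++i) {
            if (used[i]) continue;
            double err = 0.0;
            for (std::size_t j = 0; j < T.size(); ++j)
                if (predictBit(T[j], i) != T[j].label) err += d[j];
            if (err < bestErr) { bestErr = err; best = i; }
        }
        used[best] = true;
        selected.push_back(best);
        if (bestErr <= 0.0 || bestErr >= 0.5) continue;  // degenerate round: keep weights
        const double sigma = 0.5 * std::log((1.0 - bestErr) / bestErr);
        double Z = 0.0;
        for (std::size_t j = 0; j < T.size(); ++j) {
            const bool correct = predictBit(T[j], best) == T[j].label;
            d[j] *= std::exp(correct ? -sigma : sigma);  // raise the weight of mistakes
            Z += d[j];
        }
        for (double& w : d) w /= Z;                      // normalize by Z_t
    }
    return selected;
}
```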
4. Experiments

This section presents the experiments performed to verify the effectiveness of bit-selection algorithms when applied to 3D local shape description, and quantitatively assesses the quality of the compact LoVS variants generated after bit-selection. We first describe our experimental setup, including datasets, training details, criteria, parameter settings of the bit-selection algorithms, and the compared state-of-the-art local shape descriptors. Next, results regarding feature matching, compression, and efficiency are enumerated and discussed. Finally, some visual matching examples are given to illustrate the effectiveness of compressed LoVS descriptors in real-world applications. We implemented our method using Visual C++ on a PC with a 3.3 GHz CPU and 8 GB of RAM.

4.1. Setup

4.1.1. Datasets

Four standard datasets are selected for the experiments: the Bologna 3D Retrieval (B3R) [39] dataset, the UWA 3D Object Recognition (U3OR) dataset [23,24], the UWA 3D Registration (U3M) [24] dataset, and the Bologna Mesh Registration (BMR) [38] dataset. They were collected for different application scenarios, including 3D retrieval, object recognition, and point cloud/mesh registration. Moreover, the four considered datasets possess various data modalities, scanned by either LiDAR or Kinect. Therefore, we are able to thoroughly investigate the performance of different bit-selection algorithms under different contexts and perturbations. More details, including application context, acquisition, challenges, number of source and target shapes, and matching pairs, are given in Table 1. Fig. 2 also visualizes some samples taken from the four experimental datasets. The ground truth transformations for the B3R and U3OR datasets are provided by the publishers. For the U3M and BMR datasets, we use the ground truth data given by the authors of [45], which were obtained via manual alignment followed by ICP [5] refinement.

4.1.2. Training

The bit-selection algorithms investigated in this paper are either supervised or unsupervised. For supervised methods, i.e., Matching, Coding+Matching, and AdaBoost, we use 50% of the matching shapes in each dataset for training and the rest for testing. To generate positive training samples, 3000 points are first randomly selected from the source shape and their corresponding points (if they exist) are located using the ground truth transformation. Then, the corresponding matching pairs between the source shape and the target shape serve as positive training samples. To generate negative training samples, we set the number of non-corresponding pairs to Nneg/pos times the number of corresponding pairs. We determine the value of Nneg/pos based on a tuning experiment on the U3OR dataset. Specifically, we test the performance of 96-bit LoVS descriptors compressed by two typical supervised methods, i.e., Matching and AdaBoost, on the U3OR dataset when varying Nneg/pos. The result, as shown in Fig. 3, suggests that Nneg/pos > 1.5 is adequate for supervised methods to achieve good performance. We therefore set Nneg/pos to 2 in our experiments.

Table 1. Properties of the four experimental datasets. The symbol ‘∗’ indicates a synthetic dataset.
Each dataset is composed of a set of source shapes (denoted by PS) and a set of target shapes (denoted by PT), where source shapes are matched against some of the target shapes. In the U3M and BMR datasets, which consist of several sets of views of different objects, a view is matched against all other views of the same object. For the U3M and BMR datasets, we denote the number of source shapes as the number of views and use the symbol ‘−’ for the number of target shapes. The number of matching shapes (PS, PT) in these two datasets refers to the number of view pairs whose overlap ratio [25] is greater than 0.1.

No. | Datasets | Acquisition | Challenges | # PS | # PT | # (PS, PT)
1 | B3R [39] | LiDAR∗ | Gaussian noise | 6 | 18 | 54
2 | U3OR [23,24] | LiDAR | Clutter and occlusion | 5 | 50 | 188
3 | U3M [24] | LiDAR | Partial overlap | 85 | − | 496
4 | BMR [38] | Kinect | Partial overlap and real noise | 95 | − | 485
Fig. 2. One sample source shape and two sample target shapes (from left to right per sub-figure) from the B3R [39], U3OR [23,24], U3M [24], and BMR [38] datasets. All shapes are visualized in the mesh representation after triangulation.
Fig. 3. AUC performance of 96-bit LoVS descriptors compressed by Matching and AdaBoost on the U3OR dataset when varying Nneg/pos .
For unsupervised methods, i.e., Correlation, Entropy, and Coding, we compute the percentages of each bit being 0 or 1 over a large set of local patches, as is done in the 2D domain [30,31]. Specifically, we consider 10 publicly available models, including Angel, Birds, Cranium, Rabbit, Pump house, Psu, Thermostat, White plastic wheel, Cobber disc, and Galvanized fork, from a recent large-scale dataset [35], as shown in Fig. 4. On each model, 30k points are sampled, resulting in 300k LoVS descriptors for off-line training.

4.1.3. Criteria

As suggested by many previous studies [13,38,45], we use the 1-Precision vs. Recall Curve (PRC) to measure the feature matching quality of a descriptor. To compute the PRC, 1000 points are randomly sampled on the source shape and their corresponding points (if they exist) are found using the ground truth transformation. Then, each source feature is matched against all target features. If the ratio of the nearest feature distance to the second nearest feature distance is smaller than a threshold, the source feature and its nearest neighbor in the target feature set form a match. If the match is geometrically consistent with the ground truth transformation, it is judged as correct. Here, precision refers to the ratio of the number of descriptor-identified correct matches to the number of matches, and recall refers to the ratio of the number of descriptor-identified correct matches to the total number of correct matches. By varying the threshold, a curve is generated. In addition to the PRC, we also consider the AUC metric [8], i.e., the area under the PRC, to aggregately measure the precision and recall performance of a descriptor. An ideal descriptor would achieve an AUC value of 1. To assess the compression performance of different bit-selection algorithms, we utilize the Normalized AUC [12] metric, i.e., the ratio of the AUC after bit-selection to the AUC of the raw descriptor. By drawing Nbs vs. Normalized AUC figures, we can observe the quantitative compression performance of a bit-selection algorithm. A Normalized AUC value of 1 with Nbs bits indicates that the compressed Nbs-bit descriptor after bit-selection obtains feature matching performance identical to that of the original descriptor. An ideal bit-selection algorithm is supposed to attain a high Normalized AUC value using only a few bits.
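The following sketch computes one (precision, recall) point of the PRC for a fixed ratio threshold (our own illustration: the distance-matrix interface and names are ours, and the geometric-consistency check of the text is simplified to an index comparison against the ground-truth correspondence):

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// dist[s][t] is the feature distance between source feature s and target feature t;
// gt[s] is the ground-truth target index of source feature s, or -1 if it has none.
std::pair<double, double> prcPoint(const std::vector<std::vector<double>>& dist,
                                   const std::vector<int>& gt, double ratio) {
    std::size_t matches = 0, correct = 0, totalCorrect = 0;
    for (std::size_t s = 0; s < dist.size(); ++s) {
        if (gt[s] >= 0) ++totalCorrect;                 // this source feature has a true match
        std::size_t nearest = 0;
        double d1 = std::numeric_limits<double>::max(), d2 = d1;
        for (std::size_t t = 0; t < dist[s].size(); ++t) {
            if (dist[s][t] < d1) { d2 = d1; d1 = dist[s][t]; nearest = t; }
            else if (dist[s][t] < d2) { d2 = dist[s][t]; }
        }
        if (d2 > 0.0 && d1 / d2 < ratio) {              // nearest / second-nearest ratio test
            ++matches;
            if (gt[s] >= 0 && static_cast<std::size_t>(gt[s]) == nearest) ++correct;
        }
    }
    const double precision = matches ? static_cast<double>(correct) / matches : 0.0;
    const double recall = totalCorrect ? static_cast<double>(correct) / totalCorrect : 0.0;
    return {precision, recall};   // sweeping `ratio` traces out the full curve
}
```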
Fig. 4. The 10 models taken from the dataset provided in [35], including both geometry-rich and geometry-poor shapes, which are used for descriptor analysis by the unsupervised bit-selection algorithms.
4.1.4. Parameters and compared methods

There are two parameters in the considered bit-selection algorithms, i.e., τcorr in Correlation and α in Coding+Matching. We use a value of 0.65 for τcorr, and set α identical to the setting in the original paper [30], i.e., 0.75. For Nbs, i.e., the number of bits to be selected from the 729-bit LoVS feature, we consider the following range: {16, 32, 64, 96, 128, 256, 384, 512}. In addition to comparing different bit-selection algorithms, we also compare the compressed LoVS descriptors with several state-of-the-art 3D local shape descriptors, including spin image [16], SHOT [38], RoPS [13], and LoVS [29]. Spin image is the most cited real-valued descriptor in this field; SHOT and RoPS are two real-valued descriptors that have been demonstrated to achieve superior feature matching performance on a set of datasets [13,33]; LoVS is considered here to measure the performance variation after bit-selection. Their parameters are reported in Table 2. Because the descriptors resulting from bit-selection are generated from the raw LoVS descriptors, they also have the same support length (i.e., the scale of a local surface patch) as the other compared descriptors, making the comparison fair.

4.2. Feature matching performance

In the following, we provide the PRC and AUC performances of the tested descriptors on each experimental dataset. Because of space limitations, we consider Nbs ∈ {32, 96, 256, 512} in this section. More detailed and aggregated results are provided in Section 4.3.

4.2.1. The B3R dataset

Fig. 5 gives the feature matching results on the B3R dataset. We can see that either AdaBoost or Matching achieves the best performance among all considered bit-selection algorithms for 32, 96, and 256 selected bits. A common trait of both methods is that matching supervision is enforced during training. Regarding unsupervised methods, Coding and Correlation achieve better performance than the other unsupervised ones with Nbs = 32 and Nbs = 96, respectively. For larger values of Nbs, all compressed LoVS descriptors show similar performance except Random. This indicates the effectiveness of bit-selection even in cases that discard only a few bits. It is interesting to note that Uniform also delivers good performance in many cases, e.g., Nbs being 96 and 256. This is because LoVS is a signature-based descriptor, and uniform sampling can be treated as a coarsely sampled version of LoVS, which is also shown to be discriminative. We can also find that LoVS, after discarding approximately 87% of its bits, is able to outperform all compared real-valued descriptors with only 96 bits, as shown in Fig. 5(b). When compared with the raw 729-bit LoVS descriptor, comparable performance is achieved with 512 bits.
Table 2. Parameters of the compared local shape descriptors. pr denotes the point cloud resolution. All descriptors are computed within a spherical volume except LoVS, which is computed in a cubic volume. The support radius of the spherical volume coincides with the support length of the cubic volume (1/2 of the cubic side length) as suggested in [29].

Descriptor | Type | Dimension | Storage (bit) | Scale (pr)
Spin image [16] | Real-valued | 225 (15 × 15) | 1800 (225 × 8) | 15
SHOT [38] | Real-valued | 352 (8 × 2 × 2 × 11) | 2816 (352 × 8) | 15
RoPS [13] | Real-valued | 135 (3 × 3 × 3 × 5) | 1080 (135 × 8) | 15
LoVS [29] | Binary | 729 (9 × 9 × 9) | 729 (729 × 1) | 15
Fig. 5. PRC and AUC (shown in square brackets) performance of tested descriptors on the B3R dataset. PRCs of binary and real-valued descriptors are depicted here and hereinafter using solid lines and dashed lines, respectively.
4.2.2. The U3OR dataset

For the results on the U3OR dataset, as shown in Fig. 6, one can observe that AdaBoost achieves the best performance for all values of Nbs. Moreover, AdaBoost-compressed LoVS manages to exceed spin image and SHOT with only 32 bits. For 96 selected bits, AdaBoost-compressed LoVS outperforms all compared real-valued descriptors. One can also observe that the gap between AdaBoost and the other bit-selection algorithms becomes more obvious as Nbs gets smaller. The above findings demonstrate two facts. One is that AdaBoost is the best bit-selection algorithm for LoVS compression on the U3OR dataset, which addresses the 3D object recognition scenario. The other is that LoVS holds the potential of using very few bits (32 bits on the U3OR dataset) to achieve acceptable performance for 3D object recognition. Different from the results on the B3R dataset, 256 bits are adequate to achieve comparable performance with the original LoVS descriptor with the help of proper bit-selection algorithms. This also reveals the information redundancy contained in the original LoVS descriptor when performing feature matching in object recognition scenarios.
4.2.3. The U3M dataset

Fig. 7 presents the results of the tested descriptors on the U3M dataset. Three main observations can be made from the figure. First, the performance of the same descriptor on different datasets varies significantly. For example, spin image is inferior to most other descriptors on the B3R dataset but exhibits the best performance among real-valued descriptors on the U3M dataset. This reflects the challenge of achieving satisfactory performance across different application and perturbation contexts. However, LoVS and its variants after bit-selection still behave satisfactorily. Second, the three supervised bit-selection algorithms, i.e., Matching, Coding+Matching, and AdaBoost, show better feature matching performance than the unsupervised methods. The relative margin is obvious for small values of Nbs, e.g., 32 and 96, indicating that bits with a low matching loss during training are more discriminative than those with a high information variance, particularly in the point cloud registration scenario. Third, 256 bits are sufficient to achieve performance similar to the raw LoVS descriptor. One can also find that the 96-bit Matching-compressed LoVS already outperforms SHOT and RoPS by a large margin, and the 256-bit AdaBoost-compressed as well as Matching-compressed LoVS descriptors exceed all compared real-valued descriptors on this dataset.
Fig. 6. PRC and AUC (shown in square brackets) performance of tested descriptors on the U3OR dataset.
4.2.4. The BMR dataset

The BMR dataset is more challenging than the former three datasets because of the severe real noise introduced by the low-cost Kinect sensor. The results on the BMR dataset are shown in Fig. 8. An interesting phenomenon is that bit-selection is especially effective at selecting reasonable bits on more challenging data. Specifically, 96 bits suffice to achieve better performance than the three real-valued descriptors, as suggested by Fig. 8(b). Moreover, even better performance than the raw LoVS descriptor is achieved when selecting 512 bits using any of the supervised bit-selection algorithms, as witnessed by Fig. 8(d). This verifies our opinion that not every bit in a binary descriptor makes a positive contribution to feature matching. Besides redundant bits (as already demonstrated by the results on the former three datasets), there are also noisy bits. This is potentially because existing local geometric feature descriptors hold the assumption that the local surface to be encoded is complete [45]. However, in applications such as point cloud registration and 3D object recognition, local surfaces are not guaranteed to be complete due to partial overlap and occlusion. Thus, some bits may bring side effects. Removing those redundant and noisy bits endows a feature descriptor with better distinctiveness and compactness. It is also worth noting that the unsupervised Coding algorithm achieves the best performance among the compressed descriptors with 32 selected bits. However, as Nbs increases, the supervised methods again exhibit better performance than the unsupervised methods, which is consistent with the results on the other three datasets.

4.2.5. Overall feature matching performance

Table 3 summarizes the performance shown in Figs. 5 to 8 by listing the top-ranked bit-selection methods on each dataset. Based on the overall feature matching performance of the tested descriptors on the four datasets, we give the following summary. First, supervised bit-selection algorithms are generally superior to unsupervised ones. This is not surprising, as the ultimate goal of descriptors in the feature matching context is generating correct matches, and matching-supervised methods focus on selecting bits that better distinguish matches from non-matches. By contrast, unsupervised methods concentrate on the information variance of each bit; nevertheless, bits with high variance are not guaranteed to be appropriate for feature matching. Second, bit-selection is shown to be effective for 3D local binary descriptors, because satisfactory performance is retained with few selected bits on all four datasets. In particular, we find that 96 bits are sufficient to achieve comparable or even better performance than the state-of-the-art real-valued descriptors. In
Fig. 7. PRC and AUC (shown in square brackets) performance of tested descriptors on the U3M dataset.
addition, 512, 256, 256, and 256 bits are required to achieve performance similar to the raw 729-bit binary descriptor on the B3R, U3OR, U3M, and BMR datasets, respectively. We have also witnessed a performance gain even when discarding bits on the BMR dataset. This highlights that performing bit-selection on 3D local binary descriptors is advantageous for achieving compact and distinctive binary descriptors.

4.3. Compression performance

As we hope to use fewer bits to achieve comparable or even better performance than the raw binary descriptor, we examine the compression performance of each bit-selection algorithm on the four experimental datasets. The results are shown in Fig. 9. On the B3R dataset, Coding achieves the best compression performance with Nbs = 16, followed by Entropy. As Nbs increases, AdaBoost becomes the best one. On the U3OR dataset, AdaBoost consistently exceeds the other bit-selection algorithms for all values of Nbs. The margin is dramatic with Nbs being 32 and 64. AdaBoost is also the best algorithm on the U3M dataset with Nbs being 16 and 32. When Nbs gets larger than 32, Matching, Coding+Matching, and AdaBoost generally outperform the other methods.
Table 3. Top-ranked bit-selection methods on the experimental datasets. US1–US5: five unsupervised methods, i.e., Random, Uniform, Correlation, Entropy, and Coding; S1–S3: three supervised methods, i.e., Matching, Coding+Matching, and AdaBoost.

Dataset | 32 bits | 96 bits | 256 bits | 512 bits
B3R [39] | S3, US5 | S1, S3 | S1, US2 | US2–US5, S1–S3
U3OR [23,24] | S3 | S3, US3 | S1, S3, US2 | US3–US5, S1–S3
U3M [24] | S3 | S1–S3 | S1–S3 | S1, S3
BMR [38] | US5, S3 | S1–S3 | S1, S3 | S1–S3
Fig. 8. PRC and AUC (shown in square brackets) performance of tested descriptors on the BMR dataset.
It is remarkable that Matching and AdaBoost yield a Normalized AUC value larger than 1 with 512 selected bits, indicating that better performance is achieved than with the original LoVS descriptor. This finding can also be seen on the BMR dataset for these two methods when Nbs equals 384 and 512. A salient observation on the BMR dataset is that the best performance of Matching and AdaBoost is achieved with 384 bits, rather than 512 bits. However, the performance of the other methods generally improves as Nbs increases. This is because noisy bits are contained in the raw LoVS descriptor, and adding more bits is not guaranteed to bring a performance gain.
4.4. Efficiency

The efficiency of a feature descriptor is critical to applications on mobile devices. Here, we list the storage and matching efficiency results of eight descriptors on the experimental datasets, with the AUC performance taken into consideration as well. The eight descriptors include the three compared real-valued descriptors, the raw LoVS descriptor, and the LoVS variants after bit-selection with 32, 96, 256, and 512 bits, respectively. On each dataset, we consider the compressed LoVS descriptor that achieves the best performance among all compressed ones. Regarding feature matching efficiency, the average brute-force matching time for one shape pair over a whole dataset is collected for each descriptor. The results for storage efficiency and feature matching efficiency are displayed in Figs. 10 and 11, respectively. Fig. 10 suggests that the LoVS descriptors after bit-selection are much more compact than the real-valued descriptors. Additionally, better AUC performance is also achieved by the compressed LoVS descriptors, mostly with a mere 96 bits, indicating that the binary descriptors newly generated by performing bit-selection on LoVS are both compact and distinctive. Moreover, the shortened LoVS descriptors attain almost comparable performance to the initial LoVS with 256 bits on all datasets. In Fig. 11, one can see that the compressed LoVS descriptors require less time to perform feature matching than the real-valued descriptors. The 135-byte RoPS descriptor is also efficient to match, with a time cost similar to that of the 512-bit compressed LoVS descriptor; nonetheless, the AUC performance of RoPS is dramatically inferior. The above observations demonstrate that employing proper bit-selection algorithms on the LoVS descriptor can generate binary descriptors that simultaneously satisfy the requirements of distinctiveness, efficient storage, and efficient matching.
Fig. 9. Compression performance of eight bit-selection algorithms in terms of different numbers of selected bits on the four experimental datasets.
Fig. 10. Storage v.s. AUC performance of four LoVS variants (respectively with 32, 96, 256, and 512 bits) generated by performing bit-selection on the initial LoVS descriptor and four competitors on four experimental datasets.
4.5. Visual matching examples

Finally, we present some visual matching examples obtained with compressed LoVS descriptors after bit-selection. Four 2.5D scene point cloud pairs with partial overlap were taken from the Microsoft 7-Scenes dataset [34] (obtained via the Kinect sensor), and we use the 96-bit Matching-based LoVS descriptor trained on the U3M dataset to perform local feature description. To conduct registration, we follow the pipeline in [29] with a drop-in descriptor replacement. The results are shown in Fig. 12. The figure suggests that the 96-bit compressed descriptor is able to produce a sufficient number of reasonable point-to-point correspondences between scene fragments with only partial overlap and real noise, which eventually contributes to successful registration. This clearly demonstrates the effectiveness of the compact variants of LoVS generated via bit-selection in practical applications. We also note that we did not train the descriptor on the Microsoft 7-Scenes dataset, which shows a certain generalization ability of the compressed descriptor after bit-selection.
Fig. 11. Average matching time (per shape pair) v.s. AUC performance of four LoVS variants (respectively with 32, 96, 256, and 512 bits) generated by performing bit-selection on the initial LoVS descriptor and four competitors on four experimental datasets.
Fig. 12. Visual matching examples by the 96-bit Matching-based compressed LoVS descriptor trained on the U3M dataset on four 2.5D scene point cloud pairs from the Microsoft 7-Scenes dataset [34]. The pipeline in [29] is employed to perform registration. From left to right: two initial point clouds in red and blue, consistent feature correspondences generated by the compressed descriptor, and two point clouds after registration. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
5. Summary and discussion

Given the outcomes presented in Section 4, this section first summarizes the core experimental findings with relevant analysis. Then, we summarize the performance and peculiarities of each tested bit-selection approach.

Results. The following findings of our study are particularly valuable.

(1) There exist redundant/noisy bits in initial 3D local binary descriptors. First, we find that fewer bits can achieve comparable performance on all tested datasets. This indicates that redundancy exists in raw descriptors. Second, on the U3M and BMR datasets, shortened descriptors obtained with proper bit-selection methods even outperform the initial descriptor, showing that some bits are noisy and have adverse impacts. The reason is that the initial descriptor is designed in a heuristic manner without considering the data properties of a particular application, and concatenating all bits as the descriptor is unsuitable for coping with different challenges. This is because different
datasets have various geometric patterns. Therefore, it is reasonable that data-driven approaches (e.g., bit-selection) can identify less distinctive or noisy bits and achieve compact yet distinctive binary descriptors.

(2) Bit selection is more effective on challenging datasets. The results in Fig. 9 suggest that 512 bits are required to achieve performance comparable to the initial descriptor on the B3R dataset, but fewer (256 or 384) bits are sufficient on the other datasets, which are more challenging due to partial overlap, clutter, and occlusion. This is because the corresponding patches in the B3R dataset, which addresses the shape retrieval scenario, have larger overlapping areas; thus, more bits can theoretically better reflect the similarity of two patches. By contrast, corresponding patches in the other datasets have lower overlap ratios, making some bits representative while others contribute little to feature matching. Such cases appear more frequently in real-world applications [44,45], highlighting the practicability of bit-selection on 3D local binary descriptors.

Methods. For each tested bit-selection method, we summarize its behavior on the problem studied in this paper and provide the necessary explanations.

(1) Random and Uniform. As two baselines, Random performs bit-selection without assessing the quality of each bit and is inferior to the other methods; Uniform sometimes achieves better performance than the other unsupervised methods because LoVS is a signature-type descriptor, and uniformly sampling bits can serve as a “coarse” approximation of the initial descriptor. This preserves the basic structure of LoVS.

(2) Correlation, Entropy, and Coding. All three unsupervised methods assign a confidence score to each bit, yet based on different rules. Entropy assesses bits independently, whereas Correlation and Coding take other bits into consideration. As demonstrated previously, redundant bits exist, so some bits in the initial descriptor are closely correlated, and it is more reasonable to consider other bits while selecting a particular bit. The experimental results confirm this analysis, because either Correlation or Coding achieves the best performance among the three methods in most cases.

(3) Matching and AdaBoost. Both methods achieve top-ranked performance on the U3M and BMR datasets, and AdaBoost exceeds Matching on the B3R and U3OR datasets. Because the eventual goal of local descriptors is finding correct feature correspondences, the two methods with matching supervision significantly outperform the unsupervised ones. Between the two, AdaBoost behaves better because it additionally considers the previously selected bits when judging the quality of the current bit, while Matching ignores the effect of other bits. As the final shortened descriptor is the concatenation of all selected bits, evaluating the significance of each bit should take the already selected bits into consideration.

(4) Coding+Matching. The idea of combining variance and feature matching scores does not boost the performance compared to considering the latter alone. This is because bits with high variance are not guaranteed to attain a performance gain in terms of feature matching. As a result, the feature matching performance on the experimental datasets generally deteriorates when using Coding+Matching as compared to Matching.
6. Conclusion and future work

This paper has investigated several bit-selection approaches for attaining compact representations of 3D local binary descriptors. Specifically, eight bit-selection methods are studied and employed to shrink the LoVS descriptor, a distinctive yet high-dimensional binary shape descriptor. The promising experimental results on four datasets addressing a variety of application scenarios have demonstrated the validity of achieving lightweight 3D local descriptors with bit-selection. In addition, the obtained descriptors are able to behave better than several renowned float descriptors with dramatically less storage and feature matching time cost. Research efforts have characterized the field of 3D local geometric description since as early as the 1990s, and over the last two decades a great quantity of 3D local descriptors have been proposed. However, the era of crafting binary descriptors only arrived in 2015 [27], catering to applications with strict demands on efficiency. Recognizing that existing 3D local binary descriptors are still high-dimensional, we suggest handling this issue with bit-selection, which may foster a new research direction and inspire subsequent research. In our future work, we expect to further shrink the binary descriptor’s length with less loss of distinctiveness by developing more advanced bit-selection algorithms for 3D local binary descriptors. We have observed that most existing bit-selection algorithms tend to select bits independently, whereas the ultimate goal is selecting the bits that yield the best combinational performance. Seeking such a combination may be achieved by resorting to deep neural networks that assign labels to all bits simultaneously. We also expect bit-selection-based low-dimensional descriptors to be applied to efficient point cloud registration and 3D object recognition on mobile devices, such as phones and drones.

Declaration of competing interest

None.

Acknowledgments

The authors would like to acknowledge the Stanford 3D Scanning Repository, the University of Bologna, the University of Western Australia, the Technical University of Denmark, and Microsoft Research for making their datasets publicly available
to us. Our work is supported by the Wisdom of Marine Science and Technology Foundation (Grant no. 2015HUST) and the Shanghai Aerospace Science and Technology Foundation (Grant no. sast2016063).

References

[1] A. Alahi, R. Ortiz, P. Vandergheynst, FREAK: fast retina keypoint, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 510–517.
[2] J. Baber, M.N. Dailey, S. Satoh, N. Afzulpurkar, M. Bakhtyar, BIG-OH: binarization of gradient orientation histograms, Image Vis. Comput. 32 (11) (2014) 940–953.
[3] V. Balntas, L. Tang, K. Mikolajczyk, Binary online learned descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 40 (3) (2017) 555–567.
[4] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. 24 (4) (2002) 509–522.
[5] P.J. Besl, N.D. McKay, A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–256.
[6] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, P. Fua, BRIEF: computing a local binary descriptor very fast, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1281–1298.
[7] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proceedings of the European Conference on Computer Vision, Springer, 2010, pp. 778–792.
[8] J. Davis, M. Goadrich, The relationship between precision-recall and ROC curves, in: Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, pp. 233–240.
[9] Y. Duan, J. Lu, J. Feng, J. Zhou, Context-aware local binary feature learning for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 40 (5) (2018) 1139–1153.
[10] Y. Duan, J. Lu, Z. Wang, J. Feng, J. Zhou, Learning deep binary descriptor with multi-quantization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1183–1192.
[11] Y. Gao, W. Huang, Y. Qiao, Local multi-grouped binary descriptor with ring-based pooling configuration and optimization, IEEE Trans. Image Process. 24 (12) (2015) 4820–4833.
[12] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, N.M. Kwok, A comprehensive performance evaluation of 3D local feature descriptors, Int. J. Comput. Vis. 116 (1) (2016) 66–89.
[13] Y. Guo, F. Sohel, M. Bennamoun, M. Lu, J. Wan, Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comput. Vis. 105 (1) (2013) 63–86.
[14] Y. Guo, F. Sohel, M. Bennamoun, et al., A novel local surface feature for 3D object recognition under clutter and occlusion, Inf. Sci. (Ny) 293 (2015) 196–213.
[15] G. Hua, M. Brown, S. Winder, Discriminant embedding for local image descriptors, in: Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2007, pp. 1–8.
[16] A.E. Johnson, M. Hebert, Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Pattern Anal. Mach. Intell. 21 (5) (1999) 433–449.
[17] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2, IEEE, 2004, p. II.
[18] M. Khoury, Q.-Y. Zhou, V. Koltun, Learning compact geometric features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 153–161.
[19] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[20] J. Lu, V.E. Liong, J. Zhou, Simultaneous local binary feature learning and encoding for homogeneous and heterogeneous face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 40 (8) (2017) 1979–1993.
[21] J. Lu, V.E. Liong, X. Zhou, J. Zhou, Learning compact binary face descriptor for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 37 (10) (2015) 2041–2056.
[22] F. Malaguti, F. Tombari, S. Salti, D. Pau, L. Di Stefano, Toward compressed 3D descriptors, in: Proceedings of the International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 2013, pp. 176–183.
[23] A. Mian, M. Bennamoun, R. Owens, On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes, Int. J. Comput. Vis. 89 (2–3) (2010) 348–361.
[24] A.S. Mian, M. Bennamoun, R. Owens, Three-dimensional model-based object recognition and segmentation in cluttered scenes, IEEE Trans. Pattern Anal. Mach. Intell. 28 (10) (2006) 1584–1601.
[25] A.S. Mian, M. Bennamoun, R.A. Owens, A novel representation and feature matching algorithm for automatic pairwise registration of range images, Int. J. Comput. Vis. 66 (1) (2006) 19–40.
[26] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[27] S.M. Prakhya, B. Liu, W. Lin, B-SHOT: a binary feature descriptor for fast and efficient keypoint matching on 3D point clouds, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2015, pp. 1929–1934.
[28] S.M. Prakhya, B. Liu, W. Lin, K. Li, Y. Xiao, On creating low dimensional 3D feature descriptors with PCA, in: Proceedings of the IEEE Region Ten Conference, IEEE, 2017, pp. 315–320.
[29] S. Quan, J. Ma, F. Hu, B. Fang, T. Ma, Local voxelized structure for 3D binary feature representation and robust registration of point clouds from low-cost sensors, Inf. Sci. (Ny) 444 (2018) 153–171.
[30] A. Redondi, L. Baroffio, J. Ascenso, M. Cesano, M. Tagliasacchi, Rate-accuracy optimization of binary descriptors, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2013, pp. 2910–2914.
[31] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: Proceedings of the IEEE International Conference on Computer Vision, 2011, pp. 2564–2571.
[32] R.B. Rusu, N. Blodow, M. Beetz, Fast point feature histograms (FPFH) for 3D registration, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2009, pp. 3212–3217.
[33] S. Salti, F. Tombari, L. Di Stefano, SHOT: unique signatures of histograms for surface and texture description, Comput. Vision Image Understand. 125 (2014) 251–264.
[34] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, A. Fitzgibbon, Scene coordinate regression forests for camera relocalization in RGB-D images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2930–2937.
[35] T. Solund, A.G. Buch, N. Kruger, H. Aanas, A large-scale 3D object recognition dataset, in: Proceedings of the International Conference on 3D Vision, IEEE, 2016, pp. 73–82.
[36] C. Strecha, A. Bronstein, M. Bronstein, P. Fua, LDAHash: improved matching with smaller descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 34 (1) (2012) 66–78.
[37] F. Tombari, L. Di Stefano, Object recognition in 3D scenes with occlusions and clutter by Hough voting, in: Proceedings of the Fourth Pacific-Rim Symposium on Image and Video Technology, IEEE, 2010, pp. 349–355.
[38] F. Tombari, S. Salti, L. Di Stefano, Unique signatures of histograms for local surface description, in: Proceedings of the European Conference on Computer Vision, 2010, pp. 356–369.
[39] F. Tombari, S. Salti, L. Di Stefano, Performance evaluation of 3D keypoint detectors, Int. J. Comput. Vis. 102 (1–3) (2013) 198–220.
[40] A. Torralba, R. Fergus, Y. Weiss, Small codes and large image databases for recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008, pp. 1–8.
[41] A.T. Tra, W. Lin, A. Kot, Dominant SIFT: a novel compact descriptor, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2015, pp. 1344–1348.
[42] T. Trzcinski, V. Lepetit, Efficient discriminative projections for compact binary descriptors, in: Proceedings of the European Conference on Computer Vision, Springer, 2012, pp. 228–242.
[43] P. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2) (2004) 137–154.
[44] J. Yang, Z. Cao, Q. Zhang, A fast and robust local descriptor for 3D point cloud registration, Inf. Sci. (Ny) 346 (2016) 163–179.
[45] J. Yang, Q. Zhang, Y. Xiao, Z. Cao, TOLDI: an effective and robust approach for 3D local shape description, Pattern Recogn. 65 (2017) 175–187.
[46] X. Yang, K.-T. Cheng, LDB: an ultra-fast feature for scalable augmented reality on mobile devices, in: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, IEEE, 2012, pp. 49–57.
[47] X. Yang, K.-T. Cheng, Local difference binary for ultrafast and distinctive feature description, IEEE Trans. Pattern Anal. Mach. Intell. 36 (1) (2014) 188–194.
[48] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, T. Funkhouser, 3DMatch: learning local geometric descriptors from RGB-D reconstructions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 199–208.
[49] W. Zhou, H. Li, R. Hong, Y. Lu, Q. Tian, BSIFT: toward data-independent codebook for large scale image search, IEEE Trans. Image Process. 24 (3) (2015) 967–979.