Point Encoder GAN: A Deep Learning Model for 3D Point Cloud Inpainting
Yikuan Yu a,b, Zitian Huang a,b, Fei Li c,d, Haodong Zhang a,b, Xinyi Le a,b,∗

a School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
b Shanghai Key Laboratory of Advanced Manufacturing Environment, Shanghai 200240, China
c State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China
d Beijing Complex Product Advanced Manufacturing Research Center, Beijing Simulation Center, Beijing 100854, China

∗ Corresponding author. Email addresses: [email protected] (Yikuan Yu), [email protected] (Zitian Huang), [email protected] (Fei Li), [email protected] (Haodong Zhang), [email protected] (Xinyi Le).

✩ The work described in this paper was jointly sponsored by the Startup Fund for Youngman Research at SJTU (SFYR at SJTU), the Open Fund of the State Key Laboratory of Intelligent Manufacturing System Technology, the Natural Science Foundation of Shanghai (18ZR1420100), and the National Natural Science Foundation of China (61703274).

Abstract

In this paper, we propose a Point Encoder GAN for 3D point cloud inpainting. Unlike other 3D object inpainting networks, our network can process point cloud data directly, without any labeling or assumptions. We use a max-pooling layer to handle the unordered nature of point clouds during learning, and we add two T-Nets (from PointNet) to the encoder-decoder pipeline, which yield a better feature representation of the input point cloud and a more suitable rotation of the output point cloud. We then propose a hybrid reconstruction loss function to measure the difference between two sets of unordered points. Trained only on small sample models from ModelNet40, the proposed Point Encoder GAN yields surprisingly good end-to-end inpainting results. Experiments show a high success rate, and several quantitative measures are used to assess the quality of the generated models.

Keywords: Point cloud, neural network, inpainting, encoder, generative adversarial nets (GANs)

1. Introduction

Nowadays, 3D laser scanners or photo scanners are frequently used for point cloud acquisition [1]. This data structure is widely applied in engineering design [2], geographical mapping [3], and scene recognition [4, 5]. However, due to the limits of instrument precision, point cloud sets are defective and incomplete most of the time. The missing information hinders the use of point clouds, which leads to an urgent need for point cloud inpainting. Neural networks have great information processing ability [6] and can help us complete this task.

During the past few years, several inpainting methods have been proposed and proven effective for image and 3D object processing. For instance, inpainting methods based on CNNs (Convolutional Neural Networks) and GANs (Generative Adversarial Networks) have achieved great performance on 2D images [7–11] and 3D objects [12–16]. Effective as they are, these methods for 3D object inpainting still require 3D voxel data rather than first-hand point cloud data. As shown in Figure 2(c), and in contrast to the previous works in Figures 2(a) and 2(b), inpainting directly on point cloud sets is the main target of this paper. Because point clouds are irregularly distributed in Euclidean space, it is difficult to apply typical convolutional architectures directly to point cloud data.
Quite a few previous works transform the 3D point cloud into regular 3D voxel data [17–21], but voxelization often causes information loss. In this paper, Point Encoder GAN, an original network structure, is proposed based on PointNet and GAN. The network directly takes a defective point cloud as input and generates the missing part. Our contributions are as follows:

• The network is able to process raw defective point cloud data without voxelization, which avoids the extra information loss incurred by voxel-based methods [12–14].

• The proposed Point Encoder GAN is trained on a small data set of randomly corrupted point clouds constructed from ModelNet40 [22, 23], and it demonstrates strong generalization capability.

• The Point Encoder GAN needs no structural or categorical information about the objects, such as symmetry or class labels. It is an authentic end-to-end model for point cloud inpainting.
Figure 1: An example of point cloud inpainting for an airplane under two views.
Figure 1 shows an inpainting example of an airplane via Point Encoder GAN. Given a point cloud set with missing points, our inpainting algorithm can generate the same number of points and combine the output with the defective point cloud.

The paper is organized as follows. Section 2 introduces work related to our model, including GANs, inpainting models, and point cloud learning. Section 3 describes the task in detail. Section 4 states the network structure of Point Encoder GAN and the joint loss function, with mathematical derivation. Section 5 presents the experimental procedure and the visualized results, together with a quantitative evaluation. Finally, conclusions and future work are given in Section 6.

Figure 2: Different input data and representative inpainting networks. (a) Image inpainting by Context Encoders. (b) 3D object inpainting using voxel data by 3D-RecGAN. (c) 3D object inpainting using point cloud data by Point Encoder GAN.

2. Related Work

2.1. Generative Adversarial Network

GAN (Generative Adversarial Network), proposed by Goodfellow [24, 25], consists of two deep networks, a generator G and a discriminator D. The generator generates fake samples, and the discriminator tries to distinguish real samples from the overall data. G and D are trained jointly until the discriminator cannot tell whether the generated samples are real or fake. The generator and discriminator that constitute the GAN are trained alternately; in other words, GAN training can be regarded as a game theory problem, and an appropriate learning rate must be selected to obtain a GAN with good performance.

GANs can be applied in multiple areas [26]. Some researchers use GANs for natural language processing [27–30] and achieve great performance. Image generation methods based on GANs have given a huge boost to computer vision [7, 14, 31–33]. Using GANs to process medical images is another research hotspot [34], and GANs can also be applied to anomaly detection [35]. One of the most important applications of GANs in computer vision is object generation, including images and 3D point clouds. LGAN [36] introduces the first deep-learning-based network for point cloud completion by utilizing an encoder-decoder framework. [37] provides a point cloud generation method based on graph convolution. GANs can also generate a 3D point cloud from a single 2D RGB image [38], which proves the feasibility and effectiveness of 3D point cloud generation by GANs. As described below, a GAN can assist inpainting model training through the alternating improvement of its two networks.

2.2. Inpainting Model

Inpainting is a hot spot in computer vision. Both 2D image inpainting and 3D object inpainting can be implemented by deep learning methods.

Image Inpainting. In recent years, deep learning methods have achieved better results than traditional algorithms. Pathak [7] proposed context encoders combined with a GAN to predict the missing parts from their surroundings. Yang [8] enhanced the performance of the encoder with a texture network. Gao [9] applied a fully connected channel to improve the network. Similar to image inpainting, encoder-based networks can also be utilized to solve 3D object inpainting problems.

3D Object Inpainting. Wang [14] used a hybrid framework combining a 3D encoder-decoder with a GAN to rebuild missing 3D voxel data in low resolution. Meanwhile, they applied a long-term recurrent convolutional network to minimize GPU memory usage and transformed the 3D model into an object model with higher resolution. In addition, [13] proposed a novel 3D-RecGAN approach to fill the missing region in a 3D occupancy grid, which only takes the voxel grid representation as input. Although these methods show good results, the input of their networks is voxel data rather than point cloud data. Different from these papers, Point Encoder GAN can directly fill in the missing point cloud data.

2.3. Point Cloud Learning

Because of the diversity and specificity of point cloud data, it is difficult to use regular deep learning networks directly for point cloud learning. To solve these problems, researchers have proposed networks that take point clouds directly as input. These networks usually have delicate structures. PointNet [39] uses max-pooling and a T-Net to obtain global features of a point cloud. PointNet++ [40] can perceive local features thanks to its hierarchical structure built on PointNet. PointCNN [41] proposes an X-Conv operation for feature acquisition from point clouds. They respectively achieve accuracies of 89.2%, 90.7%, and 91.7% on the ModelNet40 classification task. Some structures of these networks are therefore worth adopting as references for our design.
3. Task Statement

Different from 2D images, a 3D point cloud has the following unique features: it is unordered and should be handled with rotational invariance.

Unordered. In essence, a point cloud is a set of points in 3D space. The overall shape of the point cloud has nothing to do with the order of the points. In other words, different orderings of the points in the input set should theoretically result in the same output of the network.

Rotational Invariance. For the same object, the coordinates of a given point in the point cloud vary with rotation. In our method for 3D point clouds, rotations of a point cloud should not alter classification results.

In our model, the primary input and output are unordered point cloud sets. A 3D point cloud of size n can be represented as {Pi | i = 1, . . . , n}, where Pi is a vector (xi, yi, zi) in Euclidean space. Assume N and M are the numbers of points in the initial point cloud and the erased point cloud, respectively. The goal of the proposed Point Encoder GAN is to output the generated missing point cloud of size M. We initialize the missing points as a zero point cloud with coordinates (0, 0, 0). In other words, the initial input of Point Encoder GAN is a defective point cloud of size (N − M) together with a zero point cloud {(xi, yi, zi) = (0, 0, 0) | i = 1, . . . , M} of size M. During training, the zero point cloud gradually converges towards the erased point cloud. Thus, the trained network can generate the missing point cloud with the same number of points as were erased. We use ModelNet40 for training and validation of Point Encoder GAN, and we refer to this task as point cloud inpainting throughout the paper. There are two difficulties: the unique properties of point clouds and the definition of a loss function between two point cloud sets. Our solution and its mathematical derivation are given in Section 4.
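To make the input convention above concrete, the following is a minimal sketch (not the authors' released code) of assembling the network input from a defective cloud of size N − M and an all-zero placeholder of size M; the array names and the use of NumPy are our own assumptions.

```python
import numpy as np

def build_network_input(defective_points: np.ndarray, num_missing: int) -> np.ndarray:
    """Concatenate a defective cloud of shape (N - M, 3) with an all-zero
    placeholder of shape (M, 3), giving the (N, 3) network input."""
    assert defective_points.ndim == 2 and defective_points.shape[1] == 3
    zero_part = np.zeros((num_missing, 3), dtype=defective_points.dtype)
    return np.concatenate([defective_points, zero_part], axis=0)

# Example with the paper's sizes: N = 1024, M = 256, so the defective cloud has 768 points.
defective = np.random.rand(768, 3).astype(np.float32)  # stand-in for a real scan
net_input = build_network_input(defective, num_missing=256)
print(net_input.shape)  # (1024, 3)
```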
4. Point Encoder GAN

Figure 3: The architecture of Point Encoder GAN. The specific explanation is given in Section 4. The network is trained with both a (hybrid) reconstruction loss and an adversarial loss.

4.1. Network Architecture Overview

As illustrated in Figure 3, the proposed Point Encoder GAN consists of a generator network (G-Net) and a discriminator network (D-Net). The whole framework is inspired by Context Encoders [7]. The encoder of the G-Net transforms point clouds into a compact feature representation, and the decoder of the G-Net generates the missing point cloud data from this representation. The D-Net is introduced to help the G-Net predict the missing points from the latent feature representation. T-Net is a data-dependent spatial transformer that helps to transform the input data optimally in PointNet [39], so we add a T-Net to both the G-Net and the D-Net to address the rotational invariance of point cloud data.

We bring in the GAN model to promote the training of the encoder-decoder network (G-Net). The essence of the GAN training procedure is a game theory problem, whose objective is a G-Net that learns the data distribution of the training samples. The addition of the GAN encourages the output of the encoder-decoder to be more realistic; in other words, during the incessant "frauds" between the G-Net and the D-Net, the output of G becomes more plausible. To conclude, Point Encoder GAN enjoys the advantages of PointNet [39] for dealing with point clouds, Context Encoders [7] for auto-encoding, and GANs [24] for discrimination and generation, thus delivering satisfactory results.

T-Net Structure. We use a max-pooling layer to handle the unordered nature of point clouds, and a T-Net to address rotational invariance, following the structure of PointNet [39]. As shown in Figure 3, a T-Net combines serial layers of a shared 64-MLP (multi-layer perceptron), a shared 128-MLP, a shared 1024-MLP, a max-pooling layer, a 256-FCL (fully connected layer), and a 9-FCL to obtain a 3 × 3 matrix. Its output is the matrix product of the input point cloud matrix and this 3 × 3 matrix.
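As an illustration of the T-Net just described, here is a minimal PyTorch sketch following our reading of the layer list above; it is not the authors' released code, and details such as activation functions and initializing the transform to the identity (a common PointNet convention) are assumptions.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a 3x3 transform and applies it to an input point cloud of shape (B, N, 3)."""
    def __init__(self):
        super().__init__()
        # Shared MLPs implemented as 1D convolutions over the point dimension.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )
        # Start from the identity transform (assumption, common in PointNet implementations).
        nn.init.zeros_(self.fc[-1].weight)
        self.fc[-1].bias.data = torch.eye(3).flatten()

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        feat = self.mlp(points.transpose(1, 2))      # (B, 1024, N)
        feat = torch.max(feat, dim=2).values         # symmetric max-pool over points
        transform = self.fc(feat).view(-1, 3, 3)     # (B, 3, 3)
        return torch.bmm(points, transform)          # transformed points (B, N, 3)
```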
G-Net Structure. An encoder-decoder pipeline constitutes the G-Net. We use part of the PointNet structure [39] as the encoder. As shown in Figure 3, the decoder consists of several fully connected layers and a T-Net. Suppose the size of the input data is N × 3 and the size of the output data is M × 3. The specific structure is a serial stack of a 3 × 3 T-Net, a shared 64-MLP, a shared 64-MLP, a shared 128-MLP, a shared 512-MLP, a shared 1024-MLP, a max-pooling layer, a 512-FCL, an M-FCL, an M-channel deconvolution, and a 3 × 3 T-Net.

D-Net Structure. A point cloud classification network constitutes the D-Net, which is also illustrated in Figure 3. It takes input of size M × 3 and outputs a real/fake score. The specific structure is a serial stack of a 3 × 3 T-Net, a shared 64-MLP, a shared 64-MLP, a shared 256-MLP, a max-pooling layer, a 128-FCL, a 16-FCL, and a Sigmoid classifier.
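A condensed PyTorch sketch of the G-Net layer stack as we read it from the description above, reusing the TNet module sketched earlier. This is an illustrative approximation only; in particular, the activation functions and the exact form of the "M-channel deconvolution" are not specified in the text and are assumptions here.

```python
import torch
import torch.nn as nn

class GNet(nn.Module):
    """Encoder-decoder generator: (B, N, 3) defective cloud -> (B, M, 3) missing part."""
    def __init__(self, num_missing: int = 256):
        super().__init__()
        self.input_tnet = TNet()
        self.encoder = nn.Sequential(                 # shared MLPs over points
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 512, 1), nn.ReLU(),
            nn.Conv1d(512, 1024, 1), nn.ReLU(),
        )
        self.decoder_fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_missing), nn.ReLU(),
        )
        # "M-channel deconvolution": one transposed convolution mapping the
        # M-dimensional code to M points with 3 coordinates (our interpretation).
        self.deconv = nn.ConvTranspose1d(num_missing, num_missing, kernel_size=3)
        self.output_tnet = TNet()

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        x = self.input_tnet(points)                   # align input (B, N, 3)
        feat = self.encoder(x.transpose(1, 2))        # (B, 1024, N)
        code = torch.max(feat, dim=2).values          # global feature (B, 1024)
        code = self.decoder_fc(code)                  # (B, M)
        pts = self.deconv(code.unsqueeze(2))          # (B, M, 3)
        return self.output_tnet(pts)                  # aligned output (B, M, 3)
```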
4.2. Loss Function

The loss function of this network has two parts: an adversarial loss and a reconstruction loss. The former is defined by the whole GAN model, and the latter measures the difference between the real point cloud and the generated point cloud. The loss function is also visualized in Figure 3. The overall loss of our network is

$$L = \lambda_{adv} L_{adv} + \lambda_{rec} L_{rec}, \qquad (1)$$

where λadv and λrec are the weights of the adversarial loss and the reconstruction loss, respectively, and satisfy λadv + λrec = 1.

Adversarial Loss. This loss is rooted in the GAN model [24]. We regard the G-Net and the D-Net as parametric functions. G : X → Y is the mapping from input samples X to real samples Y, which approximates G0 : X → Y0, the mapping from input samples X to the data distribution Y0. The D-Net tries to distinguish the data generated by the G-Net from the authentic samples. The adversarial loss function is defined as

$$L_{adv} = \sum_{1 \le i \le S} \ln\big(D(y_i)\big) + \sum_{1 \le i \le S} \ln\big(1 - D(G(x_i))\big), \qquad (2)$$

where xi ∈ X, yi ∈ Y, i = 1, . . . , S, and S is the sample size of X and Y.
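The following PyTorch fragment sketches how the adversarial term of Eq. (2) can be realized in practice with the usual binary cross-entropy formulation; variable names and the training-step structure are our own, and this illustrates the standard GAN update rather than the authors' exact training code.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # the D-Net ends with a Sigmoid, so plain BCE applies

def discriminator_step(d_net, g_net, defective, real_missing, d_optimizer):
    """One D-Net update: push D(real) towards 1 and D(G(defective)) towards 0."""
    d_optimizer.zero_grad()
    real_score = d_net(real_missing)
    fake_score = d_net(g_net(defective).detach())     # do not backpropagate into G here
    loss_d = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    loss_d.backward()
    d_optimizer.step()
    return loss_d.item()

def generator_adversarial_loss(d_net, g_net, defective):
    """Adversarial part of the G-Net loss: make D label the generated points as real."""
    fake_score = d_net(g_net(defective))
    return bce(fake_score, torch.ones_like(fake_score))
```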
Reconstruction Loss. Pixel data (images) and voxel data (3D grids) are both organized data. The loss function for such organized data is easy to define because the correspondence between elements is one-to-one. For example, the loss between picture A and picture B of the same size N × N can be written as

$$L(A, B) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} L(A_{i,j}, B_{i,j}), \qquad (3)$$

where Ai,j and Bi,j are the pixels of pictures A and B at location (i, j), respectively.

Figure 4: Reconstruction loss difference between (a) organized data and (b) unordered data. It is improper to use a one-to-one loss for unordered data.

However, the one-to-one relationship does not hold for two point cloud sets, as Figure 4 shows. The loss function L between point cloud Â and point cloud B̂ must be symmetric under interchange:

$$L(\hat{A}, \hat{B}) = L(\hat{B}, \hat{A}). \qquad (4)$$

Inspired by [42], we define the hybrid loss function of point clouds Â and B̂ with the same length N as the symmetric expression

$$L(\hat{A}, \hat{B}) = \frac{\omega_{\hat{A}:\hat{B}}}{N} \sum_{i=1}^{N} L(\hat{A}_i, \hat{B}) + \frac{\omega_{\hat{B}:\hat{A}}}{N} \sum_{i=1}^{N} L(\hat{B}_i, \hat{A}), \qquad (5)$$

where ω_{Â:B̂} is the weight of point cloud Â with respect to B̂, and the weights satisfy ω_{Â:B̂} + ω_{B̂:Â} = 1. We then use the Chamfer Distance [43] to define the loss between one point P and a point cloud Ŝ of length K; note that the Chamfer Distance is an L2-norm value:

$$L(P, \hat{S}) = \min_{1 \le i \le K} \lVert P - \hat{S}_i \rVert_2. \qquad (6)$$

Combining Eq. (5) and Eq. (6), the loss function becomes

$$L_2(\hat{A}, \hat{B}) = \frac{\omega_{\hat{A}:\hat{B}}}{N} \sum_{i=1}^{N} \min_{1 \le j \le N} \lVert \hat{A}_i - \hat{B}_j \rVert_2 + \frac{\omega_{\hat{B}:\hat{A}}}{N} \sum_{j=1}^{N} \min_{1 \le i \le N} \lVert \hat{A}_i - \hat{B}_j \rVert_2. \qquad (7)$$
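To make Eqs. (5)-(7) concrete, here is a minimal PyTorch sketch of the hybrid Chamfer-based reconstruction loss; the equal weighting ω = 0.5 and the brute-force pairwise distance computation are our own simplifications for illustration, not the authors' implementation.

```python
import torch

def hybrid_chamfer_loss(a: torch.Tensor, b: torch.Tensor,
                        w_ab: float = 0.5, w_ba: float = 0.5) -> torch.Tensor:
    """Hybrid reconstruction loss of Eq. (7) for two clouds of shape (B, N, 3).

    For every point in A we take the L2 distance to its nearest neighbour in B,
    and vice versa; the two directions are combined with weights w_ab + w_ba = 1.
    """
    dists = torch.cdist(a, b, p=2)                 # pairwise distances, (B, N, N)
    a_to_b = dists.min(dim=2).values.mean(dim=1)   # nearest neighbour in B for each point of A
    b_to_a = dists.min(dim=1).values.mean(dim=1)   # nearest neighbour in A for each point of B
    return (w_ab * a_to_b + w_ba * b_to_a).mean()

# Combined objective of Eq. (1), with purely illustrative weights (not from the paper):
# total_loss = 0.01 * adversarial_loss + 0.99 * hybrid_chamfer_loss(generated, ground_truth)
```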
5. Experimental Validation
5.1. Model Training

We use PyTorch as our deep learning framework to implement Point Encoder GAN. Our data set is composed of 12,308 defective point clouds generated from ModelNet40. To build the training set, we take initial point clouds from ModelNet40 (1024 points each) and erase 256 points around a random kernel in each one. Thus, each point cloud in our data set contains 768 points, represented as coordinates (xi, yi, zi). The data set is split into two subsets, 9,840 samples for training and 2,468 samples for testing, and both subsets cover all 40 categories of ModelNet40.
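The random-kernel erasure described above can be sketched as follows. This is a NumPy illustration under our own interpretation: the "kernel" is taken to be a randomly chosen point and the 256 points nearest to it are removed; the paper does not spell out this choice.

```python
import numpy as np

def erase_around_random_kernel(points: np.ndarray, num_erase: int = 256, seed=None):
    """Split a (1024, 3) cloud into a (768, 3) defective cloud and the (256, 3) erased part."""
    rng = np.random.default_rng(seed)
    kernel = points[rng.integers(len(points))]        # randomly chosen kernel point (assumption)
    dists = np.linalg.norm(points - kernel, axis=1)   # distance of every point to the kernel
    erased_idx = np.argsort(dists)[:num_erase]        # the num_erase closest points
    mask = np.ones(len(points), dtype=bool)
    mask[erased_idx] = False
    return points[mask], points[erased_idx]           # defective cloud, ground-truth missing part

cloud = np.random.rand(1024, 3).astype(np.float32)    # stand-in for a ModelNet40 sample
defective, missing = erase_around_random_kernel(cloud, 256, seed=0)
print(defective.shape, missing.shape)                 # (768, 3) (256, 3)
```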
5.2. Evaluation Measures

Since point cloud inpainting is a frontier topic in computer vision, quantitative indexes for it are scarce, so we establish several reasonable indexes for point cloud evaluation.

Regression Ratio. Our first goal is to generate point clouds that are as similar as possible to the original ones. To quantitatively evaluate the disparity between the M missing points generated by the G-Net and the M erased points, we introduce the Regression Ratio:

$$R_{reg} = \left(1 - \frac{L(\hat{S}_{Gen}, \hat{S}_{Real})}{L(\hat{S}_{Zero}, \hat{S}_{Real})}\right) \times 100\%, \qquad (8)$$

where Ŝ_Gen, Ŝ_Real, and Ŝ_Zero denote the generated, real, and zero point clouds of the same size M, respectively. Rreg ∈ [0%, 100%] indicates the reconstruction degree of the inpainting process. Consider the two extreme situations: Rreg = 100% if Ŝ_Gen = Ŝ_Real, and Rreg = 0% if Ŝ_Gen = Ŝ_Zero. We use both the L1 and L2 losses for this evaluation, which gives two indexes: the Regression Ratio under the L1-norm, Rreg,L1, and under the L2-norm, Rreg,L2.

Matching Distance Ratio. In some cases, the generated point cloud is not similar to the original one but still makes sense. For this kind of plausible generation, the matching effect is good even though the regression ratio is not high. We therefore define the Matching Distance Ratio (MDR):

$$MDR = 10 \times \left| \log_{10} \frac{D_M}{D_S} \right| \ \mathrm{(dB)}, \qquad (9)$$

where D_M is the mean point distance in the inpainting matching margin, and D_S is the point cloud density. If the density of the ground truth is taken as D_S, we call this value the Fixed Matching Distance Ratio (FMDR); if the density of the generated point cloud is taken as D_S, we call it the Variable Matching Distance Ratio (VMDR). The optimal value of MDR is 0 dB, attained when D_M = D_S.

Earth Mover Ratio. To evaluate the aggregation of the generated points compared with the ground truth, we define another ratio based on the Earth Mover Distance [44]:

$$EMR = 10 \times \left| \log_{10} \frac{EMD_G}{EMD_T} \right| \ \mathrm{(dB)}, \qquad (10)$$

where EMD_G and EMD_T are the Earth Mover Distances of the generated point cloud and the ground truth, respectively. EMR measures the density difference between generated samples and true samples. The optimal value of EMR is 0 dB, attained when EMD_G = EMD_T.
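A small sketch of how these dB-style ratios can be computed once the underlying distances are available; the Chamfer-based regression ratio reuses the hybrid loss sketched in Section 4.2, and the EMD values are assumed to come from an external solver (they are inputs here, not computed).

```python
import math

def regression_ratio(loss_gen_real: float, loss_zero_real: float) -> float:
    """Eq. (8): reconstruction degree in percent, given two reconstruction losses."""
    return (1.0 - loss_gen_real / loss_zero_real) * 100.0

def db_ratio(numerator: float, denominator: float) -> float:
    """Shared form of Eqs. (9) and (10): 10 * |log10(x / y)| in dB."""
    return 10.0 * abs(math.log10(numerator / denominator))

# Illustrative usage with hypothetical, precomputed quantities:
# r_l2 = regression_ratio(hybrid_chamfer_loss(gen, real).item(),
#                         hybrid_chamfer_loss(zero_cloud, real).item())
# emr  = db_ratio(emd_generated, emd_ground_truth)   # EMD values from an external solver
```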
5.3. Test Results

In our experiments, we test the model on 2,468 samples within ModelNet40 and also examine it on data outside ModelNet40. The visualized results and the evaluation comparisons are reported below, followed by the relevant analysis.

Inpainting Results on ModelNet40. Some of the validation results on ModelNet40 are shown in Figure 5. The highlighted parts represent the erased points of the initial point cloud and the missing points generated by our model. Generally, the inpainting results of most categories meet our expectations. Compared with the ground-truth point sets, the generated missing points match the defective point clouds well, both visually and perceptually. In the two presented views, the generated points show sound similarity to the erased point cloud.

Evaluation Indexes of Different Models. After visualization, quantitative indexes are calculated to evaluate the inpainting quality of different models, which substantiates the effectiveness of our model. Rreg denotes the regression ratio of the missing points based on the loss function (Rreg,L1 and Rreg,L2). MDR (FMDR and VMDR) reflects the quality of the generated point cloud, including its completeness and homogeneity. EMR quantifies the density difference between the generated point cloud and the ground truth. A model with higher Rreg, lower MDR, and lower EMR is preferred. We calculate the indexes of four models trained for 1, 5, 7, and 10 epoch(s). All indexes of these models are listed in Table 1, and the corresponding visualizations are given in Figure 6.

Epoch    Rreg,L1 (%)    Rreg,L2 (%)    FMDR (dB)    VMDR (dB)    EMR (dB)
1        45.01          36.47          3.218        2.991        1.522
5        61.81          55.79          2.328        1.814        1.005
7        53.82          45.05          2.785        2.563        1.367
10       51.15          43.07          2.996        2.565        1.416

Table 1: Results of the evaluation indexes on ModelNet40 for different numbers of epochs.
Figure 5: Examples of inpainting results with two views in ModelNet40. Our network achieves an end-to-end inpainting task without label-based data preprocessing.

According to Table 1, it is clear that the 5-epoch model achieves the best results on all indexes, significantly better than the others. This model attains higher Rreg and lower EMR, which is also confirmed by the visualized results in Figure 6. After 5 epochs, the model becomes increasingly overfitted, and the shape of the inpainted point cloud becomes divergent.

Figure 6: Inpainting results for a cone and a bottle after 1, 3, 5, 7, and 10 epoch(s). The 5-epoch model has the best performance.

Influence of T-Net on the Whole Network. Figure 7 reflects the capability of the T-Net to stabilize and smooth the output. The T-Net structure is able to extract more features from the training data and adjust the rotation angles. Compared with the model with T-Net, the model without T-Net has lower structural complexity and extracts less distinctive features.

Figure 7: Inpainting results for the cone and the bottle with different network structures. The network with T-Net performs better than the one without.

Comparison with Other Methods. We compare Point Encoder GAN with PCN (Point Completion Network) [42], one of the best networks in the point cloud completion field. PCN combines the advantages of a fully connected network and FoldingNet [45] to generate the whole complete point cloud from a partial point cloud. However, this architecture may change the detailed shape of the original undamaged region. In addition, PCN takes more time to train, and our model occupies 8 MB of memory, which is smaller than PCN (22.3 MB in PyTorch). As Figure 8 shows, our model needs less training and achieves better visual quality. It is worth mentioning that our inpainting model retains the initial shape of the incomplete point cloud.

Figure 8: Visual comparison with PCN. Our model shows higher robustness and smoothness of generation.

Generation Capability. Figure 9 shows another experimental phenomenon. For the point cloud of this plant, the erased points are mainly distributed around the stalk on the right. Since some points around the middle stalk are also erased, the generated points are mainly distributed around the middle stalk, which differs from the ground truth but still makes sense. During the experiments, point clouds of this type are observed in a notable number. Although the generated points differ from the ground truth, we still consider the inpainting successful. Thus, as shown in this example, if some key features are removed, Point Encoder GAN may generate the point cloud in a different but reasonable way, which enables the network to produce new point clouds.

Figure 9: Inpainting results for a plant and a bottle. Although the generated point cloud is not exactly the same as the ground truth, the inpainting is still reasonable (it resembles another authentic sample).

Generalization Performance. We also test the trained network on datasets outside ModelNet40. Figure 10 depicts the inpainting results for the Stanford bunny and horse. The back of the bunny and the belly of the horse are filled with generated points in an analogous way, but the foreleg of the horse remains incomplete, which reflects a weakness in local feature learning. Generally speaking, the cross-dataset results are acceptable, showing a promising generalization capability.

Figure 10: Inpainting results for a bunny and a horse outside ModelNet40. The model performs well despite some problems.
6. Conclusion

In this paper, we propose a novel network structure named Point Encoder GAN for end-to-end point cloud inpainting, which takes point cloud data directly as input without label-based data preprocessing. Trained on ModelNet40, Point Encoder GAN shows strong performance, and the experimental results confirm its capability for point cloud inpainting. Its main advantage is that the network trains quickly while enjoying good generalization capability, which makes it a potential tool for enlarging 3D point cloud data sets. Another potential application is the rapid conversion of 2.5D point clouds into 3D point clouds, which may lower the cost of 3D point cloud acquisition. However, the local feature learning ability of Point Encoder GAN still needs to be enhanced; we aim to build a hierarchical network structure for deeper feature representation in the future.

Declaration of Competing Interest

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue. The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

[1] F. Bosche, C. T. Haas, Automated retrieval of 3D CAD model objects in construction range images, Automation in Construction 17 (4) (2008) 499–512.
[2] R. J. Urbanic, H. A. Elmaraghy, W. Elmaraghy, A reverse engineering methodology for rotary components from point cloud data, The International Journal of Advanced Manufacturing Technology 37 (11) (2008) 1146–1167.
[3] T. Santos, N. Gomes, S. Freire, M. C. Brito, L. Santos, J. A. Tenedorio, Applications of solar mapping in the urban environment, Applied Geography 51 (2014) 48–57.
[4] R. B. Gomes, B. Silva, L. Rocha, R. V. Aroca, L. Velho, L. M. G. Goncalves, Efficient 3D object recognition using foveated point clouds, Computers and Graphics 37 (5) (2013) 496–508.
[5] W. Liu, S. Li, D. Cao, S. Su, R. Ji, Detection based object labeling of 3d point cloud for indoor scenes, Neurocomputing 174 (174) (2016) 1101–1106.
[6] X. Le, S. Chen, Z. Yan, J. Xi, A neurodynamic approach to distributed optimization with globally coupled constraints, IEEE Transactions on Cybernetics 48 (11) (2018) 3149–3158.
[7] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, A. A. Efros, Context encoders: Feature learning by inpainting, in: The IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[8] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, H. Li, High-resolution image inpainting using multi-scale neural patch synthesis, Computer Vision and Pattern Recognition (2017) 4076–4084.
[9] R. Gao, K. Grauman, On-demand learning for deep image restoration, International Conference on Computer Vision.
[10] S. Iizuka, E. Simo-Serra, H. Ishikawa, Globally and locally consistent image completion, ACM Transactions on Graphics 36 (4) (2017) 107.
[11] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, B. Catanzaro, Image inpainting for irregular holes using partial convolutions, in: Proceedings of the European Conference on Computer Vision, 2018, pp. 85–100.
[12] J. Varley, C. DeChant, A. Richardson, J. Ruales, P. Allen, Shape completion enabled robotic grasping, in: International Conference on Intelligent Robots and Systems, IEEE, 2017, pp. 2442–2447.
[13] B. Yang, H. Wen, S. Wang, R. Clark, A. Markham, N. Trigoni, 3D object reconstruction from a single depth view with adversarial learning, in: The IEEE International Conference on Computer Vision Workshops, 2017.
[14] W. Wang, Q. Huang, S. You, C. Yang, U. Neumann, Shape inpainting using 3D generative adversarial network and recurrent convolutional networks, in: The IEEE International Conference on Computer Vision, 2017.
[15] X. Chen, Y. Chen, K. Gupta, J. Zhou, H. Najjaran, Slicenet: A proficient model for real-time 3d shape-based recognition, Neurocomputing 316 (2018) 144–155.
[16] Z. Liu, G. Song, J. Cai, T.-J. Cham, J. Zhang, Conditional adversarial synthesis of 3d facial action units, Neurocomputing 355 (2019) 200–208.
[17] D. Maturana, S. Scherer, Voxnet: A 3D convolutional neural network for real-time object recognition, in: International Conference on Intelligent Robots and Systems, 2015.
[18] A. Brock, T. Lim, J. M. Ritchie, N. Weston, Generative and discriminative voxel modeling with convolutional neural networks, Computer Science.
[19] M. Nießner, M. Zollhöfer, S. Izadi, M. Stamminger, Real-time 3D reconstruction at scale using voxel hashing, ACM Transactions on Graphics 32 (6) (2013) 169.
[20] Y. Li, S. Pirk, H. Su, C. R. Qi, L. J. Guibas, Fpnn: Field probing neural networks for 3D data, in: Advances in Neural Information Processing Systems, 2016, pp. 307–315.
[21] D. Z. Wang, I. Posner, Voting for voting in online point cloud object detection, in: Robotics: Science and Systems XI, Sapienza University of Rome, Rome, Italy, 2015.
[22] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, J. Xiao, 3D ShapeNets: A deep representation for volumetric shapes, in: The IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[23] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al., Shapenet: An information-rich 3D model repository, arXiv preprint arXiv:1512.03012.
[24] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, X. Bing, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: International Conference on Neural Information Processing Systems, 2014.
[25] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, arXiv preprint arXiv:1701.07875.
[26] M. Fridadar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification, Neurocomputing 321 (2018) 321–331.
[27] J. Li, W. Monroe, T. Shi, S. Jean, A. Ritter, D. Jurafsky, Adversarial learning for neural dialogue generation, Empirical Methods in Natural Language Processing (2017) 2157–2169.
[28] S. Subramanian, S. Rajeswar, F. Dutil, C. Pal, A. C. Courville, Adversarial generation of natural language, Meeting of the Association for Computational Linguistics (2017) 241–251.
[29] L. Yu, W. Zhang, J. Wang, Y. Yu, SeqGAN: Sequence generative adversarial nets with policy gradient, National Conference on Artificial Intelligence (2017) 2852–2858.
[30] Y. Zhang, Z. Gan, K. Fan, Z. Chen, R. Henao, D. Shen, L. Carin, Adversarial feature matching for text generation, International Conference on Machine Learning (2017) 4006–4015.
[31] P. Isola, J. Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks, Computer Vision and Pattern Recognition (2017) 5967–5976.
[32] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, Computer Vision and Pattern Recognition (2017) 105–114.
[33] J. Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, International Conference on Computer Vision (2017) 2242–2251.
[34] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo, et al., DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction, IEEE Transactions on Medical Imaging 37 (6) (2017) 1310–1321.
[35] H. Zenati, C. Foo, B. Lecouat, G. Manek, V. Chandrasekhar, Efficient GAN-based anomaly detection, arXiv: Learning.
[36] P. Achlioptas, O. Diamanti, I. Mitliagkas, L. J. Guibas, Learning representations and generative models for 3D point clouds, International Conference on Learning Representations.
[37] D. Valsesia, G. Fracastoro, E. Magli, Learning localized generative models for 3D point clouds via graph convolution, International Conference on Learning Representations.
[38] P. M. Chu, Y. Sung, K. Cho, Generative adversarial network-based method for transforming single rgb image into 3D point cloud, IEEE Access 7 (2018) 1021–1029.
[39] C. R. Qi, H. Su, K. Mo, L. J. Guibas, Pointnet: Deep learning on point sets for 3D classification and segmentation, in: The IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[40] C. R. Qi, L. Yi, H. Su, L. J. Guibas, Pointnet++: Deep hierarchical feature learning on point sets in a metric space, in: Advances in Neural Information Processing Systems 30, Curran Associates, Inc., 2017, pp. 5099–5108.
[41] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, B. Chen, PointCNN: Convolution on X-transformed points, in: Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 820–830.
[42] W. Yuan, T. Khot, D. Held, C. Mertz, M. Hebert, PCN: Point completion network, International Conference on 3D Vision.
[43] G. Borgefors, Hierarchical chamfer matching: A parametric edge matching algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (6) (1988) 849–865.
[44] Y. Rubner, C. Tomasi, L. J. Guibas, The earth mover's distance as a metric for image retrieval, International Journal of Computer Vision 40 (2) (2000) 99–121.
[45] Y. Yang, C. Feng, Y. Shen, D. Tian, FoldingNet: Point cloud auto-encoder via deep grid deformation (2018) 206–215.
Yikuan Yu: writing, review and editing, formal analysis. Zitian Huang: visualization, validation. Fei Li: resources, funding acquisition, investigation. Haodong Zhang: paper revision. Xinyi Le: conceptualization, project administration, supervision.
Authors’ Bios Yikuan Yu, Zitian Huang, Fei Li, Haodong Zhang, Xinyi Le
Yikuan Yu received the B.E. and B.Ec. degrees from Shanghai Jiao Tong University, Shanghai, China, in 2017. He is currently pursuing the master's degree at the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include computer vision and deep learning.
Zitian Huang received the B.E. degree in mechanical engineering from South China University of Technology, Guangzhou, China, in 2018. He is currently pursuing the master's degree at the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include computer vision and deep learning.
Fei Li received the B.E. and Ph.D. degrees in mechanical and electronic engineering from Zhejiang University, Hangzhou, China, in 2011. He is a Staff Researcher with the State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing, China. His current research interests include intelligent manufacturing, Internet of Things, and big data science and application.
Haodong Zhang received the B.E. degree in mechanical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2018. He is currently pursuing the master's degree at the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. His current research interests include intelligent manufacturing, defect detection, and neural networks.
Xinyi Le (S’13, M’17) received the B.E. and B.S. degrees from Tsinghua University, Beijing, China, in 2012, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2016. She is a Lecturer with the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China. Her current research interests include neural networks, distributed optimization, robust control, and intelligent manufacturing. Dr. Le was a recipient of Shanghai Pujiang Talent Plan.