PGNet: A Part-based Generative Network for 3D object reconstruction

Yang Zhang, Kai Huo, Zhen Liu, Yu Zang, Yongxiang Liu, Xiang Li, Qianyu Zhang, Cheng Wang

PII: S0950-7051(20)30060-5
DOI: https://doi.org/10.1016/j.knosys.2020.105574
Reference: KNOSYS 105574

To appear in: Knowledge-Based Systems

Received date: 25 July 2019
Revised date: 24 January 2020
Accepted date: 25 January 2020

© 2020 Published by Elsevier B.V.

Author Contributions

Yang Zhang: Conceptualization, Methodology, Software, Writing - Original draft preparation
Kai Huo: Data curation and Supervision
Zhen Liu: Investigation and Supervision
Yu Zang: Writing - Reviewing and Editing
Yongxiang Liu: Validation and Supervision
Xiang Li: Validation and Supervision
Qianyu Zhang: Investigation and Editing
Cheng Wang: Investigation and Supervision

Conflict of Interest Statement

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed.

Signed by all authors.


PGNet: A Part-based Generative Network for 3D Object Reconstruction

Yang Zhang a,b, Kai Huo b, Zhen Liu b, Yu Zang a,∗, Yongxiang Liu b, Xiang Li b, Qianyu Zhang c, Cheng Wang a

a School of Information Science and Technology, Xiamen University, Xiamen 361005, China
b College of Electronic Science, National University of Defense Technology, Changsha 410073, China
c College of Geography, University of Leeds, Leeds LS2 9JT, UK

Abstract

Deep-learning generative methods have developed rapidly; various single- and multi-view generative methods for meshes, voxels, and point clouds have been introduced. However, most 3D single-view reconstruction methods generate whole objects at one time, or in a cascaded way for dense structures, and therefore miss the local details of fine-grained structures. Such methods are also of little use when the generative model is required to provide semantic information for individual parts. This paper proposes an efficient part-based recurrent generative network that generates object parts sequentially from a single-view image and its semantic projection. The advantage of our method is its awareness of part structures; hence it generates more accurate models with fine-grained structures. Experiments show that our method attains higher accuracy than other point set generation methods, particularly for local details.

Keywords: 3D reconstruction, point cloud generation, part-based, semantic reconstruction

∗ Corresponding author.
Email address: [email protected] (Yu Zang)


1. Introduction

As an important branch of computer vision, 3D reconstruction has made great progress. At the same time, deep learning has been widely applied to object generation and other fields [1, 2, 3, 4]. However, single-view reconstruction based on deep learning still faces many challenges, such as low accuracy and missing fine-grained parts.

In this paper, we aim to reconstruct objects from single-view images while retaining more fine-grained details, based on a recurrent generative network. Our method is distinguished by its use of semantic object projections, which instruct object part generation in a sequential manner. Consequently, the generated objects preserve more details of local structures, with a high degree of consistency with the actual models. With the part-based generative approach, we can also obtain each semantic part of a model, which has been difficult for previous methods.

In summary, we propose a part-based recurrent generative network, the first part-based single-view generative network. Our method achieves high accuracy for both partial and integral similarity, using a partial Chamfer distance and a repulsion loss.

2. Related Work

Single-view reconstruction is a longstanding issue in the 3D vision field, due to the ill-posed one-to-many mapping from 2D images to 3D models. Early work utilized a priori knowledge such as texture, specularity, and shadow to complete the target [5, 6, 7, 8]. However, the requirement of such priors on natural images has greatly limited the range of applications.

Following the establishment of ShapeNet [9], a large-scale dataset of 3D CAD models, many deep-learning methods have been proposed in 3D vision. A turning point was PointNet [10], which handles the irregular, unordered structure of point clouds with a max-pooling operation and achieves better performance on classification and segmentation than previous methods. Its scores were further improved by PointNet++ [11], which uses a hierarchical structure to perceive larger receptive fields.

Single-view 3D reconstruction has developed rapidly. Early work concentrated on generating objects in a grid representation (voxels), derived from 2D convolution by using 3D convolutional layers as the basic unit. 3D-GAN [12] utilizes a generative adversarial network to form objects. 3D-R2N2 [13] proposes a recurrent architecture that combines multi-view images to construct 3D models. To reduce storage requirements, OGN [14] represents and operates on an octree within the network. However, the unnatural form of these voxel-based methods and their large storage footprint have kept them from scaling to larger and finer 3D models.

Later work has tended toward point-based networks for 3D reconstruction. PSGN [15] utilizes a 2D generative network to encode natural images and obtains relatively good results; one of its contributions is applying the Chamfer distance (CD) as the loss function to update the network. Afterwards, other cues were introduced to improve the results. An unsupervised network [16] utilizes multi-view point projections to obtain 3D point positions. RealPoint3D [17] reconstructs 3D models from natural images with complex backgrounds by retrieving the nearest shape from ShapeNet to guide generation. A multi-view projection network [18] generates dense point clouds by supervising multiple projections.

Although many point cloud generative networks address image-based generation, some problems remain: local partial structures are usually blurred, and the generated models tend toward the average shape of their category. Our method aims instead to generate semantic part-aware models.


3. Methods

Network Architecture. The proposed network consists of three parts: an encoder, a recurrent feature fusing block, and several part-specialized decoders. The input of PGNet includes the original image and partial projection images, which can be obtained from the semantic projection. Although this increases the amount of input data, such data are already available online (Sec. 4).

[Figure 1: Network architecture. The original image and the partial projection images pass through the encoder (convolutional layers plus a shared SqueezeNet); an LSTM block fuses the features sequentially; part-specialized decoders (deconvolution, convolution, and fully connected layers) output N × 3 point coordinates for each part. The data generation procedure is circled on the left.]

First, the original image and the partial projection images are fed into the encoder, which contains four convolutional layers and a pre-trained SqueezeNet [19]. The benefit of SqueezeNet is its small number of parameters while remaining competitive with larger networks such as VGG [20]. The chair with four parts in Fig. 1 is analogous to a four-word sentence: it is fed into the network four times. In each iteration, the original RGB image and one corresponding partial projection image are fed into the network, and their features are extracted by the same pre-trained SqueezeNet.

For the first part, the partial projection features are concatenated with the features of the original image, forming a 512-dimensional vector that is fed into the LSTM block. For each subsequent part, its own extracted features are concatenated with the previous output of the LSTM block. This design captures the inner association between the different parts of an object, using the fusing function of the recurrent network to integrate the parts sequentially according to their spatial and structural relationships.

Finally, we construct several part-specialized decoders to produce the partial point clouds. By separating the final decoding layers, the network generates more targeted partial structures and exploits part-level similarities across different objects. The generated part models are then concatenated to form the whole model. The details of PGNet are shown in Fig. 1.
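To make the data flow concrete, the following is a minimal PyTorch sketch of the forward pass described above. The 512-dimensional fused vector, the 256-dimensional LSTM features, the shared SqueezeNet encoder, and the per-part decoders follow the text; the exact layer shapes, the feature-reduction head, and all names (PGNetSketch, reduce, etc.) are our assumptions, not the authors' released code:

# Minimal sketch of the PGNet forward pass (assumptions noted in the text).
import torch
import torch.nn as nn
from torchvision.models import squeezenet1_1

class PGNetSketch(nn.Module):
    def __init__(self, num_parts=4, points_per_part=500):
        super().__init__()
        self.backbone = squeezenet1_1(pretrained=True).features   # shared encoder
        self.reduce = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(512, 256))          # 256-d image feature
        self.lstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
        # one specialized decoder head per part (an MLP stand-in for the
        # deconv/conv/FC stack shown in Fig. 1)
        self.decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(256, 1024), nn.ReLU(),
                          nn.Linear(1024, points_per_part * 3))
            for _ in range(num_parts)])
        self.points_per_part = points_per_part

    def forward(self, rgb, projections):
        # rgb: (B, 3, H, W); projections: list of num_parts tensors (B, 3, H, W)
        prev = self.reduce(self.backbone(rgb))       # first fusion uses the RGB feature
        state, parts = None, []
        for proj, decoder in zip(projections, self.decoders):
            proj_feat = self.reduce(self.backbone(proj))
            fused = torch.cat([prev, proj_feat], dim=1)        # (B, 512) fused vector
            out, state = self.lstm(fused.unsqueeze(1), state)  # carry LSTM state
            prev = out.squeeze(1)                    # later parts fuse with LSTM output
            pts = decoder(prev)                      # (B, N * 3)
            parts.append(pts.view(-1, self.points_per_part, 3))
        return parts                                 # concatenate for the whole model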

Loss functions. The loss function of our network has two parts: the part-based Chamfer distance (CD) and the repulsion loss. First, to measure the similarity between the generated parts and the ground truth, we use the CD for each part, which is widely used in single-view reconstruction networks [15, 17]. It can be written as

$$L_{P_i}(g, r) = \frac{1}{n_g}\sum_{x \in g}\min_{y \in r}\|x - y\| + \frac{1}{n_r}\sum_{y \in r}\min_{x \in g}\|y - x\|, \qquad (1)$$

where $P_i$ denotes the $i$-th part of the object, $g$ and $r$ denote the generated model and the reference (ground truth), $n_g$ and $n_r$ denote the numbers of points of $g$ and $r$, and $\|\cdot\|$ denotes the $\ell_2$ norm. The CD loss of the whole object is the average over all $n$ parts:

$$L_{CD}(g, r) = \frac{1}{n}\sum_{i=1}^{n} L_{P_i}(g, r). \qquad (2)$$

The second part is the repulsion loss, introduced by PU-Net [21]. During experiments, we find that the generated points tend to cluster in some local positions. Inspired by PU-Net, we use the repulsion loss to eliminate clustering and form more uniform models. It can be written as

$$L_R(g) = -\sum_{i=1}^{n_g}\sum_{x_j \in K(x_i)} \|x_j - x_i\|\, e^{-\|x_j - x_i\|^2 / h^2}, \qquad (3)$$

where $K(x_i)$ denotes the $k$-nearest neighborhood of $x_i$, $h$ is a constant parameter set to 0.02, and $k$ is set to 5. The objective function is then the sum of the CD loss and the repulsion loss:

$$L = L_{CD}(g, r) + \alpha L_R(g), \qquad (4)$$

where $\alpha$ denotes the weight of the repulsion loss, which is set to 0.02.
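A plain-PyTorch sketch of Eqs. (1)-(4) is given below for reference. The paper does not publish an implementation, so the function names and shapes are our assumptions; the functions accept single (n, 3) clouds or (B, n, 3) batches:

# Sketch of the losses in Eqs. (1)-(4); all names are ours, not the authors'.
import torch

def chamfer_distance(g, r):
    """Eq. (1): symmetric Chamfer distance between two point sets."""
    d = torch.cdist(g, r)                        # (..., n_g, n_r) pairwise l2 distances
    return d.min(dim=-1).values.mean() + d.min(dim=-2).values.mean()

def part_cd_loss(gen_parts, ref_parts):
    """Eq. (2): Chamfer distance averaged over the object's parts."""
    losses = [chamfer_distance(g, r) for g, r in zip(gen_parts, ref_parts)]
    return sum(losses) / len(losses)

def repulsion_loss(g, k=5, h=0.02):
    """Eq. (3): penalize clustered points using the k nearest neighbors."""
    d = torch.cdist(g, g)
    knn = d.topk(k + 1, dim=-1, largest=False).values[..., 1:]  # drop self-distance
    return -(knn * torch.exp(-knn ** 2 / h ** 2)).sum()

def total_loss(gen_parts, ref_parts, alpha=0.02):
    """Eq. (4): part-based CD plus weighted repulsion on the whole model."""
    whole = torch.cat(gen_parts, dim=-2)         # concatenate parts into one cloud
    return part_cd_loss(gen_parts, ref_parts) + alpha * repulsion_loss(whole)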


4. Experiments

Dataset. As mentioned above, our method requires labeled semantic parts; fortunately, there are several approaches to semantic part labeling of object meshes [22, 23, 24]. We construct our training and testing data from the dataset of [23], which contains shape models in OFF format with ShapeNetCore labels. In our experiments, we choose three categories (airplane, car, chair) to evaluate the effectiveness of our approach. Since the data format is mesh, we first sample points uniformly on the surface of each part of each object. We then render these models to obtain the original images and the partial projection images; note that the number of points is equal for each part. The data generation procedure is circled on the left of Fig. 1. For each model, we obtain a rendered RGB image and several partial projection images, rendered from the labeled models, as well as a whole-object point cloud and several partial point clouds, sampled from the labeled meshes.
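As an illustration of the per-part sampling step, the sketch below uses the trimesh library (our choice; the paper does not name its tooling) and assumes per-face part labels are available for each mesh:

# Hypothetical per-part surface sampling for labeled meshes, using trimesh.
import numpy as np
import trimesh

def sample_part_points(mesh, face_labels, part_id, n_points=500):
    """Uniformly sample n_points on the faces of one semantic part."""
    part_faces = mesh.faces[face_labels == part_id]
    part_mesh = trimesh.Trimesh(vertices=mesh.vertices, faces=part_faces)
    points, _ = trimesh.sample.sample_surface(part_mesh, n_points)
    return np.asarray(points, dtype=np.float32)   # (n_points, 3)

# Equal point counts per part, as required above; the whole-object cloud is
# simply the concatenation of the partial clouds.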

Implementation Details. We use PyTorch as our programming framework and Adam [25] as the optimizer, with a decay rate of 10−4. We train the network on a Titan X GPU with a batch size of 32 for a total of 100 epochs. The input image size is 227 × 227, and the number of points for each part is 500. In the encoder, the kernel sizes of the first two convolutional layers are 1 and those of the last two are 3. The dimensions of the input and latent features in the LSTM block are 512 and 256, respectively. In the decoder, one deconvolutional layer and one convolutional layer, with kernel sizes of 5 and 1, are mainly used to adjust the point number; the last two fully connected layers adjust the channel dimension to produce the final three coordinates. ReLU is used as the activation function.

image-to-model mapping and misses many individual details, hence the generated models for two similar images are also very similar. Our proposed network

urn a

is designed to learn the local partial structures and individual details. To evaluate the part-aware function of PGNet, we compare the part-based and object-level CDs. Specifically, our CD loss is the average of each part, 115

which learns the object structure in a partwise way and produces semantic partial structures, as shown in Eq. 1 and Eq. 2. This owes to the ability of the network to learn the inner association of each part within an object. The result can be seen in Fig. 2. Due to the part-based CD, PGNet can generate

Jo

semantic parts of an object. But the object-level CD misses small structures,

120

resulting in some clusters, scattered parts, and confusing relations of different local structures. The explanation can be derived from the principle of CD scores: for each point in one model, we calculate the shortest distance to another model by searching for the nearest point and calculating the straight-line distance between them. However, there are no explicit correspondences for these points, 7

Journal Pre-proof

125

and the situation worsens when the number of points increases and the two

pro of

shapes differ greatly. So, it is reasonable that the object-based CD has more difficulty finding the nearest point, as the search scope is the whole model, while for the part-based CD, it is easier to find corresponding points and the CD score

(a)

lP

re-

is more accurate.

(b)

(c)

(d)

Figure 2: Comparison of object- and part-level CDs. (a) Input image; (b) Ground truth; (c)

130

urn a

Result of object-level CD; (d) Result of part-level CD.

To demonstrate the function of the repulsion loss, we remove the loss to see a comparison. We can see from Fig. 3 that without the repulsion loss, many clusters appear in the generated models. The repulsion loss acts as a “disperser”, penalizing the situation when some points gather together, and the loss increases as they become closer to each other, which can be seen from Eq. 3. In fact, the

135

repulsion loss is the total of the sums of local weighted distances of each point.

Jo

Here, the fast-decaying weight uses the exponential function based on e, and the local weighted distance decreases with increasing distance. Since the model is normalized, the total summation cannot be very big. From the results, we find that the small-scale parts, such as armrests and legs, tend to form local

140

gatherings, as it is harder to find the corresponding points when calculating the

8

Journal Pre-proof

CD scores, and some points many deem the same point in another model to be

pro of

their nearest corresponding points. When the repulsion loss is removed, some

(b)

lP

(a)

re-

apparent local gatherings come out in the armrest and the leg.

(c)

(d)

Figure 3: Comparison of repulsion loss. (a) Input image; (b) Ground truth; (c) Result without repulsion loss; (d) Result with repulsion loss.

145

urn a

Comparison to other methods. The proposed method is also compared to the recent single-view approach, PSGN [15]. Although some later networks have appeared and achieved slight improvements, considering that our method focuses on part-aware object generation, the CD score is used as the measurement in our work. So, in terms of accuracy, we choose to compare with PSGN to demonstrate the effectiveness of our method. The results can been seen in Fig. 4, 150

Fig. 5, and Fig. 6. We find that PSGN tends to generate the average shape

Jo

of a category and misses many local details, and some point clusters appear, which is similar performance to object-level CD loss. In fact, PSGN, which reconstructs models through the designed deep network all at once, uses the object-level CD to measure the discrepancy between the generated objects and

155

ground truth, while PGNet reconstructs different parts sequentially, retaining semantic and individual partial features. We also find that the performance of 9

Journal Pre-proof

PSGN is worse than the original work [15], since the amount of training and

pro of

testing data is much smaller than in the original ShapeNetCore for certain categories (several hundred versus several thousand), and this also results in clusters 160

in the generated models.

For PGNet, the generated objects have clear local structures and accurate semantic distinctions. As seen in Table 1, PGNet can also perform better than PSGN and PGNet with the object-level CD (PGNet-OCD). PGNet-OCD has similar performance to PSGN, since the object-level CD loss is used in both. 165

PGNet-OCD achieves slightly better results than PSGN, possibly because of its

re-

sequential generative manner with individual partial projection to instruct the

urn a

lP

reconstruction, with a smaller network architecture than PSGN.

(a)

(b)

(c)

(d)

Figure 4: Comparison to PSGN [15] on airplanes. (a) Input image; (b) Ground truth; (c) Result of PSGN; (d) Result of PGNet.

The results of CD scores of different parts can be seen in Table 2, and we

Jo

find that the reconstruction accuracy varies as the categories and parts change.

170

The average CD score reflects the reconstruction quality of different categories, which indicates the reconstructed chair models are not as accurate as those of airplanes and cars. This is because the parts of chairs are more complex, with various spatial structures and volumes, than those of airplanes and cars, while airplanes and cars achieve considerable results. 10

(a)

pro of

Journal Pre-proof

(b)

(c)

(d)

urn a

lP

of PSGN; (d) Result of PGNet.

re-

Figure 5: Comparison to PSGN [15] on cars. (a) Input image; (b) Ground truth; (c) Result

(a)

(b)

(c)

(d)

Figure 6: Comparison to PSGN [15] on chairs. (a) Input image; (b) Ground truth; (c) Result

Jo

of PSGN; (d) Result of PGNet.

175

For airplanes, the four parts are the fuselage, wing, empennage, and power-

plant. We can see that the empennage has the lowest CD score of the four parts, while the powerplant has the highest CD score, which means the powerplant is the most inaccurate part. The intuitive presentation is shown in Fig. 4. In the generation procedure, clear and intact exhibition of the part is the premise to

11

Journal Pre-proof

Table 1: CD scores of different methods. PGNet-OCD denotes PGNet with the object-level

180

pro of

CD loss.

Category

PSGN [15]

PGNet-OCD

PGNet

Airplane

0.0218

0.0205

0.0141

Car

0.0242

0.0228

0.0156

Chair

0.0341

0.0358

0.0315

reconstruct the model accurately. The projection of empennage is more intact

re-

than that of the other parts, with more distinct border lines from the other parts and backgrounds. However, the powerplant is smaller and the right portion is always blocked by the fuselage. As for the other two bigger parts, i.e., the fuselage and wings, due to the larger size and more intersecting lines, they are more easily confused with the other parts, which may lower the reconstruction accuracy.

lP

185

For cars in Fig. 5, the four parts are the roof, hood, tire, and body. The roof is the most accurate part and the tire the least accurate. The roof sits on top of the car with a similar size, position, and simple shape across models, whereas half of each tire is not visible, and because of its small size a slight deviation leads to a higher CD score. The larger parts, i.e., the roof and hood, occupy more space, making it more difficult to find and locate corresponding points.

For chairs in Fig. 6, the four parts are the backrest, seat, leg, and armrest. A chair is hard to reconstruct because it has more complex structures, especially the armrest, which is the least accurate part: with its varied structures and positions, the armrest is difficult to reconstruct, while the backrest, with similar shapes across models, is relatively easy. Equipped with thin, complex structures and varied spatial distributions, the parts of the chair obtain higher CD scores than those of the other two categories; hence the chair is generally the hardest to reconstruct.

Table 2: CD scores of different parts. Each object consists of four parts.

Category    part1     part2     part3     part4     average
Airplane    0.0106    0.0144    0.0099    0.0214    0.0141
Car         0.0089    0.0103    0.0220    0.0211    0.0156
Chair       0.0186    0.0262    0.0395    0.0415    0.0315

5. Conclusion

We have designed a part-based recurrent generative network that is well suited to fine-grained 3D reconstruction, guided by part-projection images. To make the network perceive local structures, three designs are involved: sequential processing based on recurrent networks, the partial Chamfer distance, and the repulsion loss. These designs consider the inner association between the different parts of an object, integrating the parts sequentially according to their potential associations, such as spatial and structural relationships. The generated objects have more uniform point distributions and preserve more details of local structures, together with partial semantic labels.

6. Acknowledgments

This work was supported by the National Natural Science Foundation of China (Projects No. 61971363 and No. 61701191).

References

[1] P. Gao, R. Yuan, F. Wang, L. Xiao, H. Fujita, Y. Zhang, Siamese attentional keypoint network for high performance visual tracking, Knowledge-Based Systems (2019) 105448. doi:10.1016/j.knosys.2019.105448.
[2] P. Gao, Q. Zhang, F. Wang, L. Xiao, H. Fujita, Y. Zhang, Learning reinforced attentional representation for end-to-end visual tracking, Information Sciences 517 (2020) 52–67. doi:10.1016/j.ins.2019.12.084.
[3] T. Lai, H. Fujita, C. Yang, Q. Li, R. Chen, Robust model fitting based on greedy search and specified inlier threshold, IEEE Transactions on Industrial Electronics 66 (10) (2019) 7956–7966.
[4] T. Lai, R. Chen, C. Yang, Q. Li, H. Fujita, A. Sadri, H. Wang, Efficient robust model fitting for multistructure data using global greedy search, IEEE Transactions on Cybernetics (2019) 1–13. doi:10.1109/TCYB.2019.2900096.
[5] G. Healey, T. O. Binford, Local shape from specularity, Computer Vision, Graphics, and Image Processing 42 (1) (1988) 62–86.
[6] J. Malik, R. Rosenholtz, Computing local surface orientation and shape from texture for curved surfaces, International Journal of Computer Vision 23 (2) (1997) 149–168.
[7] S. Savarese, M. Andreetto, H. Rushmeier, F. Bernardini, P. Perona, 3d reconstruction by shadow carving: Theory and practical evaluation, International Journal of Computer Vision 71 (3) (2007) 305–336.
[8] R. Zhang, P. Tsai, J. E. Cryer, M. Shah, Shape-from-shading: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999) 690–706.
[9] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, Shapenet: An information-rich 3d model repository, arXiv preprint arXiv:1512.03012.
[10] C. R. Qi, H. Su, K. Mo, L. J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[11] C. R. Qi, L. Yi, H. Su, L. J. Guibas, Pointnet++: Deep hierarchical feature learning on point sets in a metric space, in: Advances in Neural Information Processing Systems, 2017.
[12] J. Wu, C. Zhang, T. Xue, W. T. Freeman, J. B. Tenenbaum, Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling, in: Advances in Neural Information Processing Systems, 2016, pp. 82–90.
[13] C. B. Choy, D. Xu, J. Y. Gwak, K. Chen, S. Savarese, 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction, in: European Conference on Computer Vision, 2016, pp. 628–644.
[14] M. Tatarchenko, A. Dosovitskiy, T. Brox, Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs, arXiv preprint arXiv:1703.09438.
[15] H. Fan, H. Su, L. J. Guibas, A point set generation network for 3d object reconstruction from a single image, in: CVPR, Vol. 2, 2017, p. 6.
[16] E. Insafutdinov, A. Dosovitskiy, Unsupervised learning of shape and pose with differentiable point clouds, in: Advances in Neural Information Processing Systems (NeurIPS), 2018.
[17] Y. Zhang, Z. Liu, T. Liu, B. Peng, X. Li, Realpoint3d: An efficient generation network for 3d object reconstruction from a single image, IEEE Access PP (2019) 1–1. doi:10.1109/ACCESS.2019.2914150.
[18] C.-H. Lin, C. Kong, S. Lucey, Learning efficient point cloud generation for dense 3d object reconstruction, in: AAAI Conference on Artificial Intelligence (AAAI), 2018.
[19] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer, Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5mb model size, arXiv preprint arXiv:1602.07360.
[20] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR), 2015.
[21] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, P.-A. Heng, Pu-net: Point cloud upsampling network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[22] E. Kalogerakis, A. Hertzmann, K. Singh, Learning 3d mesh segmentation and labeling, ACM Transactions on Graphics 29 (4) (2010) 102.
[23] E. Kalogerakis, M. Averkiou, S. Maji, S. Chaudhuri, 3d shape segmentation with projective convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6630–6639.
[24] L. Yi, L. J. Guibas, A. Hertzmann, V. G. Kim, H. Su, E. Yumer, Learning hierarchical shape segmentation and labeling from online repositories, ACM Transactions on Graphics 36 (4) (2017) 70.
[25] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: International Conference on Learning Representations (ICLR), 2015.
