SHREC’19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval

SHREC’19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval • Juefei Yuan, Hameed Abdul-Rashid, Bo Li, Yijuan Lu, Tobias Schreck, Ngoc-Minh Bui, Trong-Le Do, Khac-Tuan Nguyen, Thanh-An Nguyen, Vinh-Tiep Nguyen, Minh-Triet Tran, Tianyang Wang

Outline • Introduction • Benchmark • Methods • Results • Conclusions and Future Work

Introduction • 2D Scene Sketch-Based 3D Scene Retrieval • Focuses on retrieving relevant 3D scene models • Using scene sketches as input • Motivation • Vast applications: 3D scene reconstruction, autonomous driving, 3D geometry video retrieval, and 3D AR/VR entertainment • Challenges • 2D sketches lack 3D scene information • Semantic gap between iconic 2D scene sketches and accurate 3D scene models

Introduction (Cont.) • 2D Scene Sketch-Based 3D Scene Retrieval • Brand new research topic in sketch-based 3D object retrieval: • A query sketch contains several objects • Objects may overlap with each other • Relative context configurations among the objects • Our previous work • SHREC’18 track: 2D Scene Sketch-Based 3D Scene Retrieval • Built the SceneSBR2018 [1] benchmark: 10 scene classes, each with 25 sketches and 100 3D models • Good performance called for a more comprehensive dataset • We build the SceneSBR2019 benchmark • To further promote this challenging research direction • Most comprehensive and largest 2D scene sketch-based 3D scene retrieval benchmark • [1] J. Yuan et al. SHREC’18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.

SceneSBR2019 Benchmark Overview • Overview • We have substantially extended SceneSBR2018 with 20 additional classes • Building process • Voting method among three individuals • Scene labels chosen from Places88 [2] • Data collected from Flickr, Google Images, and 3D Warehouse • [2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.

SceneSBR2019 Benchmark • 2D Scene Sketch Query Dataset • 750 2D scene sketches • 30 classes, each with 25 sketches • 3D Scene Model Target Dataset • 3,000 3D scene models • 30 classes, each with 100 models • To evaluate learning-based 3D scene retrieval • Table 1. Training and testing dataset information of our SceneSBR2019 benchmark

2D Scene Sketch Query Dataset Fig. 1 Example 2D scene query sketches (1 per class)

3D Scene Model Target Dataset Fig. 2 Example 3D target scene models (1 per class)

Evaluation • Seven performance metrics commonly adopted in 3D model retrieval [3]: • Precision-Recall plot (PR) • Nearest Neighbor (NN) • First Tier (FT) • Second Tier (ST) • E-Measure (E) • Discounted Cumulated Gain (DCG) • Average Precision (AP) • We have also developed the code to compute them: • http://orca.st.usm.edu/~bli/SceneSBR2019/data.html • [3] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck, et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
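As a rough illustration of the six scalar metrics listed above, the sketch below scores one ranked list for one query. It assumes the conventional Princeton Shape Benchmark style definitions (FT and ST as recall within the top C and 2C results, E-Measure over the top 32 results, DCG normalized by the ideal ranking) and is not the track's released evaluation code, which may differ in details such as how AP is integrated. The function name `retrieval_metrics` and its parameters are hypothetical.

```python
import math

def retrieval_metrics(relevance, num_relevant, e_cutoff=32):
    """Score one ranked list.

    relevance: list of 0/1 flags, best match first (1 = same class as query).
    num_relevant: C, the number of relevant target models for this query.
    e_cutoff: how many top results the E-Measure looks at (assumed 32).
    """
    C = num_relevant

    # Nearest Neighbor: is the top-ranked model relevant?
    nn = float(relevance[0])

    # First/Second Tier: recall within the top C and top 2C results.
    ft = sum(relevance[:C]) / C
    st = sum(relevance[:2 * C]) / C

    # E-Measure: harmonic mean of precision and recall over the top results.
    top = relevance[:e_cutoff]
    hits = sum(top)
    if hits == 0:
        e = 0.0
    else:
        precision = hits / len(top)
        recall = hits / C
        e = 2.0 / (1.0 / precision + 1.0 / recall)

    # Discounted Cumulated Gain, normalized by the best achievable value.
    dcg = relevance[0] + sum(
        r / math.log2(i) for i, r in enumerate(relevance[1:], start=2)
    )
    ideal = 1.0 + sum(1.0 / math.log2(i) for i in range(2, C + 1))

    # Average Precision: mean of the precision at each relevant hit.
    found = 0
    precision_sum = 0.0
    for rank, r in enumerate(relevance, start=1):
        if r:
            found += 1
            precision_sum += found / rank
    ap = precision_sum / C

    return {"NN": nn, "FT": ft, "ST": st, "E": e, "DCG": dcg / ideal, "AP": ap}

# Toy query with two relevant models among four ranked results.
metrics = retrieval_metrics([1, 0, 1, 0], num_relevant=2)
```

On the benchmark itself, each query would have C = 100 relevant models if the whole target set is searched, and the per-query values would be averaged over all query sketches.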

Methods • ResNet50-Based Sketch Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNSRAP) • View and Majority Vote Based 3D Scene Retrieval Algorithm (VMV)

RNSRAP: Sketch Recognition with ResNet50 Encoding and Adapting Place Classification for 3D Models Using Adversarial Training • Ngoc-Minh Bui¹,², Trong-Le Do¹,², Khac-Tuan Nguyen¹, Minh-Triet Tran¹, Van-Tu Ninh¹, Tu-Khiem Le¹, Vinh Ton-That¹, Vinh-Tiep Nguyen², Minh N. Do³, Anh-Duc Duong² • ¹Faculty of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam • ²Software Engineering Lab, Vietnam National University - Ho Chi Minh City, Vietnam • ³University of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam

Two-Step 3D Scene Classification Fig. 3 Two-step process of the 3D scene classification method

Sketch Recognition with ResNet50 Encoding • (1) Use the ResNet50 output to encode a sketch image into a 2048-D feature vector • (2) Data augmentation: • Regular transformations: flipping, rotation, translation, and cropping • Saliency map-based image synthesis • (3) Use two types of fully connected neural networks • (4) Train multiple classification networks with different initializations for each of the two network types • (5) Fuse the results of those models with a majority-vote scheme to determine the label of a sketch query image
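Step (5) above can be illustrated with a minimal majority-vote fusion over the labels predicted by several classifiers. This is only a sketch: the function name `majority_vote` is hypothetical, and the tie-breaking rule (the earliest-listed label wins) is an assumption, since the slides do not say how ties are resolved.

```python
from collections import Counter

def majority_vote(predicted_labels):
    """Return the label predicted by the most classifiers.

    predicted_labels: one label per classifier, in a fixed classifier order.
    Ties are broken in favor of the label that appears first (an assumption).
    """
    counts = Counter(predicted_labels)
    best_count = max(counts.values())
    for label in predicted_labels:
        if counts[label] == best_count:
            return label

# Five hypothetical classifiers voting on one query sketch.
fused_label = majority_vote(["beach", "river", "beach", "beach", "city"])
```

Voting over several independently initialized networks tends to smooth out the errors of any single model, which is presumably why both network types are trained multiple times.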

Saliency-Based Selection of 2D Screenshots • Use multiple views of a 3D object for classification • Randomly capture multiple screenshots at three levels of detail: • (1) general views, (2) views focusing on a set of entities, and (3) detailed views of a specific entity • Use DHSNet [4] to generate the saliency map of each screenshot • Select promising screenshots of each 3D model for the place classification task • A 3D model can be classified with high accuracy (>92%) using no more than 5 information-rich screenshots • [4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016.
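The slides do not state how "promising" screenshots are chosen from the saliency maps, so the following is purely a hypothetical selection rule: score each screenshot by the fraction of pixels whose saliency exceeds a threshold and keep the top five. The names `saliency_score` and `select_screenshots`, the threshold of 0.5, and the scoring rule itself are all assumptions made for illustration.

```python
def saliency_score(saliency_map, threshold=0.5):
    """Fraction of pixels at or above the threshold (assumed scoring rule).

    saliency_map: 2D list of saliency values in [0, 1].
    """
    pixels = [value for row in saliency_map for value in row]
    salient = sum(1 for value in pixels if value >= threshold)
    return salient / len(pixels)

def select_screenshots(saliency_maps, k=5):
    """Keep the k screenshots with the highest saliency score.

    saliency_maps: dict mapping a screenshot name to its saliency map.
    """
    ranked = sorted(
        saliency_maps.items(),
        key=lambda item: saliency_score(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Three tiny 2x2 "saliency maps" standing in for real DHSNet outputs.
toy_maps = {
    "general_view": [[0.9, 0.8], [0.1, 0.0]],
    "detail_view": [[0.0, 0.0], [0.0, 0.6]],
    "entity_view": [[1.0, 1.0], [1.0, 1.0]],
}
selected = select_screenshots(toy_maps, k=2)
```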

Rank List Generation • Assign the one or two best labels to each sketch image, and retrieve all 3D models having those labels • The similarity between a sketch and a 3D model is the product of the prediction score of the query sketch and that of the 3D model on the same label • Append the remaining 3D models, which are considered irrelevant, to the tail of the rank list with a distance of infinity
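The rank list construction described above might look roughly like the following. The data layout (a score dictionary for the sketch, a single predicted label and score per 3D model), the conversion from similarity to distance as `1 - similarity`, and the name `build_rank_list` are assumptions; only the product-of-scores similarity and the infinite distance for unmatched models come from the slide.

```python
import math

def build_rank_list(sketch_scores, model_predictions, top_labels=2):
    """Rank 3D models for one query sketch.

    sketch_scores: dict label -> prediction score of the query sketch.
    model_predictions: dict model_id -> (predicted label, prediction score).
    top_labels: how many best sketch labels to keep (one or two).
    """
    best_labels = sorted(sketch_scores, key=sketch_scores.get, reverse=True)
    best_labels = set(best_labels[:top_labels])

    scored = []
    for model_id, (label, model_score) in model_predictions.items():
        if label in best_labels:
            # Similarity is the product of the two scores on the shared label.
            similarity = sketch_scores[label] * model_score
            distance = 1.0 - similarity  # assumed conversion, scores in [0, 1]
        else:
            # Models with other labels go to the tail of the list.
            distance = math.inf
        scored.append((distance, model_id))

    scored.sort()
    return [model_id for _, model_id in scored]

# Hypothetical scores for one sketch and four target models.
sketch = {"beach": 0.7, "river": 0.2, "city": 0.1}
models = {
    "m1": ("beach", 0.9),
    "m2": ("river", 0.8),
    "m3": ("city", 0.99),
    "m4": ("beach", 0.5),
}
rank_list = build_rank_list(sketch, models)
```

Keeping a second label gives the method a chance to recover when the top sketch label is wrong, at the cost of placing some models of the wrong class ahead of the irrelevant tail.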

VMV: View and Majority Vote Based 3D Scene Retrieval Algorithm • Juefei Yuan¹, Hameed Abdul-Rashid¹, Bo Li¹, Yijuan Lu², Tianyang Wang³ • ¹School of Computing Sciences and Computer Engineering, University of Southern Mississippi, USA • ²Department of Computer Science, Texas State University, USA • ³Department of Computer Science & Information Technology, Austin Peay State University, USA

VMV Architecture Fig. 4 VMV architecture

VMV Algorithm • VMV six steps • (1) Scene view sampling (Qmacro script) • (2) Data augmentation • Random rotations, reflections, or translations • (3) Pre-training and training on AlexNet1/VGG1 and AlexNet2/VGG2 • (4) Fine-tuning on scene sketches/views • (5) Sketch/view classification • (6) Majority vote-based label matching • Fig. 5 A set of 13 sample views of an apartment scene model
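Steps (5) and (6) can be pictured with a small sketch: each sampled view of a 3D scene model is classified, the model takes the label most of its views agree on, and models whose voted label matches the query sketch's label are ranked first. The ordering of matched models by vote share, and the names `vote_model_label` and `retrieve`, are assumptions for illustration; the slides only state that label matching is majority vote-based.

```python
from collections import Counter

def vote_model_label(view_labels):
    """Return (label, share of views) for the most frequent view label."""
    label, votes = Counter(view_labels).most_common(1)[0]
    return label, votes / len(view_labels)

def retrieve(sketch_label, models_views):
    """Rank models: label matches first, stronger vote share earlier.

    sketch_label: predicted class of the query sketch.
    models_views: dict model_id -> list of per-view predicted labels.
    """
    matched, others = [], []
    for model_id, view_labels in models_views.items():
        label, share = vote_model_label(view_labels)
        if label == sketch_label:
            matched.append((-share, model_id))
        else:
            others.append(model_id)
    return [model_id for _, model_id in sorted(matched)] + others

# Three hypothetical models, each with 13 classified views.
views = {
    "a": ["beach"] * 10 + ["river"] * 3,
    "b": ["river"] * 13,
    "c": ["beach"] * 13,
}
ranking = retrieve("beach", views)
```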

Precision-Recall • Fig. 6 Precision-Recall performance comparison on the testing dataset of our SceneSBR2019 benchmark for the two learning-based participating methods
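For readers unfamiliar with how such a diagram is obtained, the sketch below computes the precision-recall points of a single ranked list: one point each time a relevant model is encountered. The function name `precision_recall_points` is hypothetical, and a full plot like Fig. 6 would additionally average (and typically interpolate) these curves over all queries, which is not shown here.

```python
def precision_recall_points(relevance, num_relevant):
    """Return (recall, precision) pairs for one ranked list.

    relevance: list of 0/1 flags, best match first.
    num_relevant: number of relevant target models for this query.
    """
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / num_relevant, hits / rank))
    return points

# Toy query: relevant models found at ranks 1 and 3.
curve = precision_recall_points([1, 0, 1, 0], num_relevant=2)
```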

Other Six Performance Metrics • Table 2. Performance metrics comparison on our SceneSBR2019 benchmark for the two learning-based participating methods • More details about the retrieval performance of each individual query of every participating method are available on the SceneSBR2019 track homepage [5] • [5] SceneSBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneSBR2019/results.html

Discussions • Both submitted approaches utilized CNN models • CNNs contribute a lot to the achieved performance of the two learning-based approaches • Bui et al. utilized object-level semantic information for data augmentation and for refining retrieval results • Very promising to utilize both deep learning and scene semantic information to support large-scale scene retrieval • The overall performance achieved on the SceneIBR2019 track is better than that on the SceneSBR2019 track • SceneIBR2019 benchmark: • Replaced the sketch query dataset with query images: 1,000 for each class • Much larger 2D image query dataset for better training • More accurate 3D shape information in the query images • Much smaller semantic gap between images and models

Conclusions • Objective: To foster this challenging and interesting research direction: scene sketch-based 3D scene retrieval • Dataset: Built the current largest 2D scene sketch-based 3D scene retrieval benchmark • Participation: Though challenging, 2 groups successfully participated in the track and contributed 4 runs of 2 methods • Evaluation: Performed a comparative evaluation of the accuracy • Impact: Provided the largest and most comprehensive common evaluation platform for sketch-based 3D scene retrieval

Future Work • Build a large 2D scene-based 3D scene retrieval benchmark in terms of the number of categories and variations within each category • Build/search for other, more realistic 3D scene models • 2D scene sketch-based 3D scene retrieval by incorporating semantic information • Extend the feature vectors by incorporating geolocation estimation features • 2D scene-based 3D scene retrieval related applications • Deep learning models specifically designed for 3D scene retrieval

References • [1] J. Yuan et al. SHREC’18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018. • [2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018. • [3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015. • [4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016. • [5] Extended SceneSBR track homepage: http://orca.st.usm.edu/~bli/SceneSBR2019/results.html

Thank you! Q&A?
