SHREC’19 Track: Extended 2D Scene Image-Based 3D Scene Retrieval

Hameed Abdul-Rashid, Juefei Yuan, Bo Li, Yijuan Lu, Tobias Schreck, Ngoc-Minh Bui, Trong-Le Do, Mike Holenderski, Dmitri Jarnikov, Khiem T. Le, Vlado Menkovski, Khac-Tuan Nguyen, Thanh-An Nguyen, Vinh-Tiep Nguyen, Tu V. Ninh, Luis Armando Pérez Rey, Minh-Triet Tran, Tianyang Wang

Outline • Introduction • Benchmark • Methods • Results • Conclusions and Future Work

Introduction • 2D Scene Image-Based 3D Scene Retrieval • Focuses on retrieving relevant 3D scene models • Uses scene images as input • Motivation • Wide range of applications: autonomous driving cars (Fig. 1), multi-view 3D scene reconstruction, VR/AR scene content generation, and consumer electronics apps • Challenges • Little substantial research so far, due to the challenges involved • Lack of related retrieval benchmarks • Fig. 1 Renault SYMBIOZ concept

Introduction (Cont.) • 2D Scene Image-Based 3D Scene Retrieval • Brand new research topic in image-based 3D object retrieval: • A query image contains several objects • Objects may overlap with each other • Relative context configurations exist among the objects • Our previous work • SHREC’18 track: 2D Scene Image-Based 3D Scene Retrieval • Built the SceneIBR2018 [1] benchmark: 10 scene classes, each has 25 sketches and 100 3D models • Good performance on it called for a more comprehensive dataset • We built the SceneIBR2019 benchmark • To further promote this challenging research direction • Most comprehensive and largest 2D scene image-based 3D scene retrieval benchmark • [1] H. Abdul-Rashid et al. SHREC’18 track: 2D scene image-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.

SceneIBR2019 Benchmark Overview • Overview • We have substantially extended SceneIBR2018 with 20 additional classes • Building process • Scene labels chosen from Places88 [2] • Selected 30 of the 88 available category labels in Places88 • Selection made by voting among three individuals • 2D/3D scene data collected from • Flickr • Google Images • 3D Warehouse • [2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.

SceneIBR2019 Benchmark • 2D Scene Image Query Dataset • 30,000 2D scene images • 30 classes, each with 1,000 images • 3D Scene Model Target Dataset • 3,000 3D scene models • 30 classes, each with 100 models • To evaluate learning-based 3D scene retrieval • Table 1 Training and testing dataset information of our SceneIBR2019 benchmark.

2D Scene Image Query Dataset • Fig. 2 Example 2D scene query images (1 per class)

3D Scene Model Target Dataset • Fig. 3 Example 3D target scene models (1 per class)

Evaluation • Seven performance metrics commonly adopted in 3D model retrieval [3]: • Precision-Recall plot (PR) • Nearest Neighbor (NN) • First Tier (FT) • Second Tier (ST) • E-Measure (E) • Discounted Cumulated Gain (DCG) • Average Precision (AP) • We have also developed code to compute them: • http://orca.st.usm.edu/~bli/SceneIBR2019/data.html • [3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
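
The scalar metrics listed above can be illustrated with a minimal Python sketch for a single query. It is not the track's released evaluation code (that is available at the URL above). The function name `retrieval_metrics`, its arguments, the top-32 cutoff for the E-Measure, the normalization of DCG by its ideal value, and the rank-based form of AP are all assumptions that follow common conventions in 3D shape retrieval.

```python
import math

def retrieval_metrics(ranked_labels, query_label, class_size):
    """Per-query metrics from a ranked list of target class labels (best first).

    class_size is the number of target models in the query's class
    (100 per class in SceneIBR2019). All names here are illustrative.
    """
    rel = [1 if lab == query_label else 0 for lab in ranked_labels]
    c = class_size

    nn = float(rel[0])                 # Nearest Neighbor: is the top result relevant?
    ft = sum(rel[:c]) / c              # First Tier: recall within the top C results
    st = sum(rel[:2 * c]) / c          # Second Tier: recall within the top 2C results

    # E-Measure: harmonic mean of precision and recall over the top 32 results
    # (the cutoff of 32 is an assumed convention).
    k = 32
    hits_k = sum(rel[:k])
    p, r = hits_k / k, hits_k / c
    e = 0.0 if hits_k == 0 else 2 * p * r / (p + r)

    # Discounted Cumulated Gain, normalized by the ideal ranking's DCG.
    dcg = rel[0] + sum(rel[i] / math.log2(i + 1) for i in range(1, len(rel)))
    ideal = 1.0 + sum(1.0 / math.log2(i + 1) for i in range(1, c))
    ndcg = dcg / ideal

    # Average Precision: mean of the precision values at each relevant rank.
    hits, ap = 0, 0.0
    for rank, x in enumerate(rel, start=1):
        if x:
            hits += 1
            ap += hits / rank
    ap /= c

    return {"NN": nn, "FT": ft, "ST": st, "E": e, "DCG": ndcg, "AP": ap}
```

Averaging each value over all query images would give benchmark-level figures of the kind reported in the results table.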

Methods • ResNet50-Based Image Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNIRAP) • Conditional Variational Autoencoders for Image Based Scene Retrieval (CVAE) • View and Majority Vote Based 3D Scene Retrieval Algorithm (VMV)

CVAE: Conditional Variational Autoencoders for Image Based Scene Retrieval Luis Armando Pérez Rey, Mike Holenderski and Dmitri Jarnikov Eindhoven University of Technology, The Netherlands

CVAE Overview • Step 1: Render images from the 3D scenes and preprocess them • Step 2: Encode the images as probability distributions over classes and latent space with a Conditional Variational Autoencoder (CVAE) • Step 3: Calculate the similarity between the renderings and the query image • Fig. 5 Three steps of the Conditional Variational Autoencoders for Image Based Scene Retrieval method
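
Step 3 can be sketched as follows, assuming the encoder from Step 2 has already produced a class-probability vector for the query image and for each rendered view. The slides do not state which similarity measure or view aggregation the CVAE method uses, so the dot-product similarity, the max-over-views aggregation, and the names `rank_models`, `query_probs`, `view_probs`, and `view_to_model` are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_models(query_probs, view_probs, view_to_model):
    """Rank 3D models by similarity between a query and their rendered views.

    query_probs:   (C,) class probabilities of the query image
    view_probs:    (V, C) class probabilities of the V rendered views
    view_to_model: (V,) integer id of the 3D model each view was rendered from
    """
    query_probs = np.asarray(query_probs, dtype=float)
    view_probs = np.asarray(view_probs, dtype=float)
    view_to_model = np.asarray(view_to_model, dtype=int)

    # Assumed similarity: dot product of the two class distributions.
    sims = view_probs @ query_probs

    # Assumed aggregation: a model scores as well as its best-matching view.
    n_models = int(view_to_model.max()) + 1
    model_scores = np.full(n_models, -np.inf)
    np.maximum.at(model_scores, view_to_model, sims)

    # Highest score first; stable sort keeps ties in model-id order.
    order = np.argsort(-model_scores, kind="stable")
    return order, model_scores
```

For example, `rank_models([0.9, 0.1], [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7]], [0, 0, 1, 1])` ranks model 0 ahead of model 1, because its views put most of their probability mass on the same class as the query.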

Fig. 11 Precision-Recall diagram performance comparisons on the testing dataset of our SceneIBR2019 benchmark for three learning-based participating methods
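
For readers unfamiliar with how such a diagram is obtained, here is a minimal sketch of the precision-recall points for a single query. It assumes a ranked list of target class labels is available; the function name `precision_recall_points` and its arguments are illustrative and not taken from the track's evaluation code.

```python
def precision_recall_points(ranked_labels, query_label, class_size):
    """Return (recall, precision) pairs, one per relevant item in the ranking.

    ranked_labels: class labels of the retrieved 3D models, best match first
    class_size:    number of target models in the query's class
    """
    points = []
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # Recall grows with each relevant hit; precision is hits so far over rank.
            points.append((hits / class_size, hits / rank))
    return points
```

A benchmark-level curve would then typically be obtained by interpolating these per-query points onto fixed recall levels and averaging over all queries.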

Results: Performance Metrics • Table 2. Performance metrics comparison on our SceneIBR2019 benchmark for the three learning-based participating methods • More details about the retrieval performance of each individual query of every participating method are available on the SceneIBR2019 track homepage [5] • [5] SceneIBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneIBR2019/results.html

Discussions • All three methods are CNN-based deep learning methods • Most promising and popular approach in tackling this direction • Finer classifications • RNIRAP and VMV-VGG: CNN + classification-based approach • CVAE: VAE only • RNIRAP: utilized object-level semantic information for data augmentation and for refining retrieval results • Significant performance drop compared with SceneIBR2018 • 10 distinct scene categories in SceneIBR2018 • Introduction of many correlated categories in SceneIBR2019 • Better overall performance on the SceneIBR2019 track than on the SceneSBR2019 track • Same reason: a larger and information-rich query dataset

Conclusions and Future Work • Conclusions • Objective: to foster this challenging and interesting research direction: Scene Image-Based 3D Scene Retrieval • Dataset: built the currently largest 2D scene image-based 3D scene retrieval benchmark • Participation: though challenging, 3 groups successfully participated in the track and contributed 8 runs of 3 methods • Evaluation: performed a comparative evaluation of retrieval accuracy • Future work • Large-scale benchmarks supporting multiple modalities • 2D queries: images, sketches • 3D target models: meshes, RGB-D, LIDAR, range scans • Semantics-driven retrieval approaches • Classification-based retrieval

References • [1] H. Abdul-Rashid et al. SHREC’18 track: 2D scene image-based 3D scene retrieval. In 3DOR, pages 1–8, 2018. • [2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018. • [3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015. • [4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016. • [5] SceneIBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneIBR2019/results.html

Thank you! Q&A?
