Overview of Challenge
Aishwarya Agrawal (Virginia Tech), Stanislaw Antol (Virginia Tech), Larry Zitnick (Facebook AI Research), Dhruv Batra (Virginia Tech), Devi Parikh (Virginia Tech)
Outline
• Overview of Task and Dataset
• Overview of Challenge
• Winner Announcements
• Analysis of Results
VQA Task
Given an image and a free-form natural-language question (e.g., "What is the mustache made of?"), an AI system must produce an answer (e.g., "bananas").
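For concreteness, a minimal sketch of the task interface (the function name and signature are illustrative; the challenge does not prescribe an API):

```python
def answer_question(image, question: str) -> str:
    """VQA task: map an image and a free-form question to a short answer,
    e.g. (<image>, "What is the mustache made of?") -> "bananas"."""
    raise NotImplementedError  # each team plugs in its own model here
```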
Real images (from COCO)
Tsung-Yi Lin et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. http://mscoco.org/
Questions
Stump a smart robot! Ask a question that a human can answer, but a smart robot probably can't!
Dataset Stats
• >250K images (COCO + 50K abstract scenes)
• >750K questions (3 per image)
• ~10M answers (10 with image + 3 without image)
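The 10 human answers per question are what the standard VQA accuracy metric is defined over, and the challenge accuracies quoted later use it. A minimal sketch (the official evaluator additionally normalizes answers and averages over annotator subsets, which is omitted here):

```python
def vqa_accuracy(predicted: str, human_answers: list) -> float:
    """Standard VQA accuracy: a predicted answer counts as 100% correct
    if at least 3 of the 10 human annotators gave it."""
    matches = sum(ans == predicted for ans in human_answers)
    return min(matches / 3.0, 1.0)

# e.g. if 2 of 10 humans answered "bananas", "bananas" scores ~0.67
```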
Two Modalities of Answering
• Open Ended
• Multiple Choice (18 choices):
  • 1 correct answer
  • 3 plausible choices
  • 10 most popular answers
  • rest random answers
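A sketch of how an 18-way choice list could be assembled from these ingredients (an illustrative reconstruction, not the official generation code; all names are hypothetical):

```python
import random

def build_choices(correct: str, plausible: list, popular: list,
                  answer_pool: list, n_choices: int = 18) -> list:
    """1 correct answer + 3 plausible distractors + the 10 most popular
    dataset answers, padded to 18 with random answers, deduplicated.
    Assumes answer_pool contains enough distinct answers."""
    choices = [correct] + plausible[:3] + popular[:10]
    choices = list(dict.fromkeys(choices))  # drop duplicates, keep order
    while len(choices) < n_choices:
        candidate = random.choice(answer_pool)
        if candidate not in choices:
            choices.append(candidate)
    random.shuffle(choices)
    return choices
```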
Outline: Overview of Challenge
VQA Challenges on www.codalab.org
• Real Open Ended
• Real Multiple Choice
• Abstract Open Ended
• Abstract Multiple Choice
Real Image Challenges: Dataset
[figure: dataset split sizes; dataset size is approximate]
Real Image Challenges: Test Dataset
• 80K test images, in four splits of 20K images each:
• Test-dev (development): debugging and validation; unlimited submissions to the evaluation server
• Test-standard (publications): used to score entries for the Public Leaderboard
• Test-challenge (competitions): used to rank challenge participants
• Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released
Dataset size is approximate. Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015
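For illustration only, a sketch of a four-way split like the one above (the real splits are fixed once by the organizers, not re-sampled like this):

```python
import random

def split_test_set(image_ids: list, seed: int = 0) -> dict:
    """Partition ~80K test image ids into four equal ~20K splits."""
    rng = random.Random(seed)
    ids = image_ids[:]
    rng.shuffle(ids)
    quarter = len(ids) // 4
    names = ["test-dev", "test-standard", "test-challenge", "test-reserve"]
    return {name: ids[i * quarter:(i + 1) * quarter]
            for i, name in enumerate(names)}
```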
Outline: Winner Announcements
Abstract Scene Challenges
• Open-Ended Challenge: 5 teams, 5 institutions, 3 countries
• Multiple-Choice Challenge: 4 teams, 4 institutions, 3 countries
• The top 3 teams are the same for Open Ended and Multiple Choice
Abstract Scene Challenges: Winner
Team MIL-UT: Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada
Open-Ended Challenge accuracy: 67.39
Multiple-Choice Challenge accuracy: 71.18
Real Image Challenges
• Open-Ended Challenge: 25 teams, 26 institutions, 8 countries
• Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries
• The top 5 teams are the same for Open Ended and Multiple Choice
Real Image Challenges: Honorable Mention
Team Brandeis: Aaditya Prakash
Open-Ended Challenge accuracy: 62.80
Multiple-Choice Challenge accuracy: 65.17
Real Image Challenges: Runner-Up
Team Naver Labs: Hyeonseob Nam, Jeonghee Kim
Open-Ended Challenge accuracy: 64.89
Multiple-Choice Challenge accuracy: 69.37
Real Image Challenges: Winner
Team UC Berkeley & Sony: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Open-Ended Challenge accuracy: 66.90
Multiple-Choice Challenge accuracy: 70.52
Outline: Analysis of Results
Real Open-Ended Challenge
[results chart; baselines labeled "arXiv v6" and "ICCV15"; the top challenge entry is a +12.76% absolute improvement]
Statistical Significance
• 5000 bootstrap samples
• @ 99% confidence
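A sketch of such a bootstrap significance test (the exact protocol used for the challenge is not spelled out on the slide, so details such as paired resampling over questions are assumptions):

```python
import random

def significantly_different(acc_a: list, acc_b: list,
                            n_boot: int = 5000,
                            confidence: float = 0.99) -> bool:
    """Paired bootstrap: resample questions with replacement n_boot times
    and check whether the confidence interval of the mean per-question
    accuracy difference excludes zero.
    acc_a / acc_b: per-question accuracies of two methods, same order."""
    n = len(acc_a)
    diffs = []
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        diffs.append(sum(acc_a[i] - acc_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(n_boot * (1 - confidence) / 2)]            # 0.5th pct
    hi = diffs[int(n_boot * (1 - (1 - confidence) / 2)) - 1]  # 99.5th pct
    return lo > 0 or hi < 0  # CI excludes zero -> significant at 99%
```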
Easy vs. Difficult Questions (Real Open-Ended Challenge)
• 80.6% of questions can be answered by at least 1 method!
[figure panels: Easy Questions / Difficult Questions]
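A sketch of how such an easy-vs-difficult breakdown can be computed from per-method results (the data layout and the correctness threshold are assumptions):

```python
def difficulty_breakdown(results: dict, threshold: float = 1.0) -> dict:
    """results[method][question_id] = that method's VQA accuracy on that
    question (assumed layout). Returns, per question, how many methods
    answer it correctly; 'difficult' questions are those no method
    answers (cf. the 80.6% figure above)."""
    counts = {}
    for per_question in results.values():
        for qid, acc in per_question.items():
            counts[qid] = counts.get(qid, 0) + (acc >= threshold)
    return counts

# Example: fraction of questions answered by at least one method
# counts = difficulty_breakdown(results)
# answered = sum(c > 0 for c in counts.values()) / len(counts)
```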