Dhruv Batra (Virginia Tech)

An overview of the VQA Challenge from Virginia Tech and Facebook AI Research: the task and dataset, the challenge setup, the winning teams, and an analysis of results.

Presentation Transcript


  1. Overview of Challenge. Aishwarya Agrawal (Virginia Tech), Stanislaw Antol (Virginia Tech), Larry Zitnick (Facebook AI Research), Dhruv Batra (Virginia Tech), Devi Parikh (Virginia Tech)

  2. Outline • Overview of Task and Dataset • Overview of Challenge • Winner Announcements • Analysis of Results

  7. VQA Task

  8. VQA Task: "What is the mustache made of?"

  9. VQA Task: "What is the mustache made of?" → AI System

  10. VQA Task: "What is the mustache made of?" → AI System → "bananas"

  11. Real images (from COCO) Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in COntext.” ECCV 2014. http://mscoco.org/

  12. and abstract scenes.

  13. Questions: Stump a smart robot! Ask a question that a human can answer, but a smart robot probably can’t!

  14. VQA Dataset

  15. Dataset Stats • >250K images (COCO + 50K Abstract Scenes) • >750K questions (3 per image) • ~10M answers (10 w/ image + 3 w/o image)
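
For reference, these counts fit together: 3 questions per image over ~250K images gives ~750K questions, and 13 answers per question (10 collected while viewing the image plus 3 collected without it) over ~750K questions gives roughly 10M answers.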

  16. Two modalities of answering • Open-Ended • Multiple-Choice (18 choices: 1 correct answer, 3 plausible choices, 10 most popular answers, rest random answers)
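
A rough sketch of how an 18-way choice list like the one above could be assembled. The helper inputs here (plausible, popular, answer_pool) are hypothetical placeholders; the actual construction is the one described in the VQA dataset paper.

    import random

    def build_choices(correct, plausible, popular, answer_pool, k=18, seed=0):
        # Assemble k candidate answers for one question:
        # 1 correct + 3 plausible + 10 most popular + random fillers.
        rng = random.Random(seed)
        choices = [correct] + list(plausible[:3]) + list(popular[:10])
        choices = list(dict.fromkeys(choices))  # drop duplicates, keep order
        fillers = [a for a in answer_pool if a not in choices]
        rng.shuffle(fillers)
        choices += fillers[: k - len(choices)]
        rng.shuffle(choices)
        return choices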

  17. Accuracy Metric
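
The metric behind this slide is the standard VQA accuracy: a predicted answer scores min(#humans who gave that answer / 3, 1), so it counts as fully correct when at least 3 of the 10 annotators agree with it. A minimal sketch in Python; the official evaluation script additionally normalizes answers and averages over subsets of annotators, which is omitted here.

    def vqa_accuracy(predicted, human_answers):
        # An answer is fully correct if at least 3 of the 10 human
        # annotators gave exactly that answer.
        matches = sum(1 for a in human_answers if a == predicted)
        return min(matches / 3.0, 1.0)

    # Example: 4 of 10 annotators said "bananas".
    vqa_accuracy("bananas", ["bananas"] * 4 + ["fruit"] * 6)  # -> 1.0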

  18. Human Accuracy (Real)

  20. Human Accuracy (Abstract)

  22. Outline • Overview of Task and Dataset • Overview of Challenge • Winner Announcements • Analysis of Results

  23. VQA Challenges on www.codalab.org: Real Open-Ended, Real Multiple-Choice, Abstract Open-Ended, Abstract Multiple-Choice

  25. Real Image Challenges: Dataset (dataset size is approximate)

  28. Real Image Challenges: Test Dataset • 80K test images • Four splits of 20K images each • Test-dev (development): debugging and validation; unlimited submissions to the evaluation server • Test-standard (publications): used to score entries for the public leaderboard • Test-challenge (competitions): used to rank challenge participants • Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released. (Dataset size is approximate. Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015.)
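
The split protocol above, condensed into a small (hypothetical) lookup table of the kind one might keep next to submission scripts; the roles are exactly those listed on the slide.

    # Roles of the four ~20K-image test splits (sizes approximate).
    TEST_SPLITS = {
        "test-dev":       "debugging and validation; unlimited submissions to the eval server",
        "test-standard":  "scores entries for the public leaderboard; used in publications",
        "test-challenge": "ranks challenge participants",
        "test-reserve":   "estimates overfitting; scores never released",
    }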

  29. VQA Challenges on www.codalab.org: Real Open-Ended, Real Multiple-Choice, Abstract Open-Ended, Abstract Multiple-Choice

  30. Abstract Scene Challenges: Dataset

  33. Outline • Overview of Task and Dataset • Overview of Challenge • Winner Announcements • Analysis of Results

  34. Award GPUs!!!

  35. Abstract Scene Challenges • Open-Ended Challenge: 5 teams, 5 institutions, 3 countries • Multiple-Choice Challenge: 4 teams, 4 institutions, 3 countries • Top 3 teams are the same for Open-Ended and Multiple-Choice

  36. Abstract Scene Challenges Winner: Team MIL-UT (Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada) • Open-Ended Challenge accuracy: 67.39 • Multiple-Choice Challenge accuracy: 71.18

  37. Real Image Challenges • Open-Ended Challenge: 25 teams, 26 institutions, 8 countries • Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries • Top 5 teams are the same for Open-Ended and Multiple-Choice

  38. Real Image Challenges Honorable Mention: Brandeis (Aaditya Prakash) • Open-Ended Challenge accuracy: 62.80 • Multiple-Choice Challenge accuracy: 65.17

  39. Real Image Challenges Runner-Up: Team Naver Labs (Hyeonseob Nam, Jeonghee Kim) • Open-Ended Challenge accuracy: 64.89 • Multiple-Choice Challenge accuracy: 69.37

  40. Real Image Challenges Winner: Team UC Berkeley & Sony (Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach) • Open-Ended Challenge accuracy: 66.90 • Multiple-Choice Challenge accuracy: 70.52

  41. Outline • Overview of Task and Dataset • Overview of Challenge • Winner Announcements • Analysis of Results

  42. Real Open-Ended Challenge (results vs. arXiv v6 and ICCV15 baselines)

  43. Real Open-Ended Challenge: +12.76% absolute improvement

  44. Statistical Significance • Bootstrap samples 5000 times • @ 99% confidence
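
A sketch of the kind of bootstrap test the slide describes: resample the test questions 5000 times and check, at 99% confidence, whether the accuracy gap between two methods stays positive. The per-question score arrays are hypothetical inputs (e.g., per-question VQA accuracies for two entries).

    import numpy as np

    def bootstrap_compare(scores_a, scores_b, n_boot=5000, conf=0.99, seed=0):
        # Confidence interval of mean(scores_a - scores_b) under
        # bootstrap resampling of the test questions.
        rng = np.random.default_rng(seed)
        diffs = np.asarray(scores_a) - np.asarray(scores_b)
        n = len(diffs)
        boot_means = np.array([
            diffs[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)
        ])
        alpha = (1.0 - conf) / 2.0
        lo, hi = np.quantile(boot_means, [alpha, 1.0 - alpha])
        # Method A is significantly better than B at this confidence if lo > 0.
        return lo, hi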

  45. Real Open-Ended Challenge

  46. Easy vs. Difficult Questions (Real Open-Ended Challenge)

  48. Easy vs. Difficult Questions (Real Open-Ended Challenge) • 80.6% of questions can be answered by at least 1 method! (Figure: difficult questions)
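
The 80.6% figure reads as an "oracle" coverage number: the fraction of questions that at least one submitted method answers correctly. A sketch of that computation; the per-method, per-question accuracy matrix is a hypothetical input, and whether "answered" means full or partial credit is controlled by the threshold.

    import numpy as np

    def oracle_coverage(acc_matrix, threshold=1.0):
        # acc_matrix: methods x questions array of VQA accuracies.
        # Returns the fraction of questions some method gets right.
        best_per_question = np.asarray(acc_matrix).max(axis=0)
        return (best_per_question >= threshold).mean()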

  49. Easy vs. Difficult Questions (Real Open-Ended Challenge) (Figures: easy questions, difficult questions)

  50. Difficult Questions with Rare Answers
