GuessWhat?! is a dataset and methodology for acquiring natural language through interaction on a visual task. The task requires high-level image understanding, including spatial reasoning and language grounding. The dataset pairs images with dialogue and is presented as the first large-scale dataset of its kind.
GuessWhat?! Visual object discovery through multi-modal dialogue
Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Motivation
• Learn to acquire natural language by interaction on a visual task
• First large-scale dataset involving images and dialogue
• Requires high-level image understanding, like spatial reasoning and language grounding
Critique
- All information about the image is useless in the guesser model
- Two trained models are used to evaluate the question generator
- Access to the object list
- When to guess?
- More baselines
- Unseen object categories
- Unrealistic task