CS 8803 CVL: Vision and Language Devi Parikh School of Interactive Computing
Plan for today • Topic overview • Introductions • Course overview: • Logistics • Requirements • Lecture format • Please interrupt at any time with questions or comments
Computer Vision • Automatic understanding of images and video • Computing properties of the 3D world from visual data (measurement) • Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation) • Algorithms to mine, search, and interact with visual data (search and organization) Kristen Grauman
What does recognition involve? Fei-Fei Li
Object categorization • mountain • tree • building • banner • street lamp • vendor • people
Instance recognition • Potala Palace • A particular sign
Scene and context categorization • outdoor • city • …
Attribute recognition • gray • made of fabric • crowded • flat
It was a great event! It brought families out, and the whole community together.
AI: What a nice picture! What event was this? User: “Color College Avenue”. It was a lot of fun! AI: I am sure it was! Do they do this every year? User: I wish they would. I don’t think they’ve organized it again since 2012. …
Why Words and Pictures? 1 • Pictures are everywhere • Words are how we communicate
Why Words and Pictures? 1 Applications
Why Words and Pictures? 1 Applications Interact with, organize, and navigate visual data
Why Words and Pictures? 1 Applications Leverage multi-modal information on the web
Why Words and Pictures? 1 Applications Aid visually-impaired users Microsoft
Why Words and Pictures? 1 Applications Summarize visual data for analysts
Why Words and Pictures? 2 • Measuring and demonstrating AI capabilities • Image understanding • Language understanding
Why Words and Pictures? 3 • Beyond “bucket” recognition • Language is compositional “A steam engine is coming out of a fireplace.” René Magritte (1938)
Why Words and Pictures? 4 “Vision is our best sensor, and language is our best invention.” -- Viraj Prabhu
My goals (for you) • Be well-versed in the latest in vision + language • Critique research papers in vision + language • Identify interesting open questions and applications • Execute a research project in vision + language
Introductions • Devi Parikh • Ph.D., Carnegie Mellon University, 2009 • Research Assistant Professor, TTI-Chicago, 2013 • Assistant Professor, ECE, Virginia Tech, 2016 • Assistant Professor, School of Interactive Computing, Georgia Tech (currently) • Research Scientist, Facebook AI Research (currently)
Introductions • Arjun Chandrasekaran (your TA) • CS Ph.D. Student • Georgia Tech • CV, ML, NLP, AI • language and vision • making human-AI interaction more natural and efficient
Introductions • Larry He (your second TA) • CS MS Student • Georgia Tech
Introductions • Which program are you in? • How far along? • Have you taken a computer vision course before? • Have you taken a machine learning course before? • Do you know how CNNs and LSTMs work? • Have you used a deep learning package before? • What are you hoping to get out of this class?
This course • CS 8803 CVL • Klaus 2456, TR 1:30 pm to 2:45 pm • Course webpage: http://www.prism.gatech.edu/~arjun9/CS8803_CVL_Fall17/ • Piazza: https://piazza.com/gatech/fall2017/cs8803cvl/home • Focus on topics at the intersection of vision and language • Cutting-edge research
Requirements • Paper reviews each class [30%] • Leading discussion (~once) on papers [10%] • Project [60%] • No “Assignments”, Exams, etc.
Prerequisites • Course in computer vision • Course in machine learning • Basic knowledge of deep learning
Paper reviews • For each class, review one paper • Submit by midnight before class • Submission workflow: TBD • Skip the review for the class in which you are leading discussion • Late reviews will not be accepted • The three lowest review grades will be dropped
Paper review guidelines • One page • Detailed review: • Brief (2-3 sentences) summary • Main contribution • Strengths? Weaknesses? • How convincing are the experiments? Suggestions to improve them? • Extensions? Applications? • Additional comments, unclear points • Relationships observed between the papers we are reading • Pull out most interesting thought • Look at class webpage • Write in your own words • Write well, proofread
Leading Discussion • ~ One of you will be assigned to argue for the paper • ~ One of you will be assigned to argue against the paper • Come prepared with 5 points • Sign up here by August 29th: https://docs.google.com/spreadsheets/d/1E0uBxZ5gyKRzsrz2RJP9WTgCV5TYEby7EYTjvCwK-WM/edit?usp=sharing
Projects • First few lectures: introductory talks • Image captioning • Visual question answering • Visual dialog • By lead authors of representative works in this space
Projects • Possibilities: • Design and evaluate a novel approach • A novel application, use case • Extension of a technique studied in class • Be creative! • Think: research paper at a good conference • Work in teams of ~4 (at most 15 teams in the class) • Sign up for teams by September 8th: https://docs.google.com/spreadsheets/d/1n0aP3k7BwguFS0BNt5aUfMpo7-JPiHqQQ2K258YJPh0/edit?usp=sharing
Project timeline • Four in-class presentations (see class schedule) • Project ideas / proposal [10%] • Update 1 [10%] • Update 2 [10%] • Final presentation [15%] • Project video (1 minute) [15%] December 5th
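The bracketed percentages on the Requirements and Project timeline slides can be combined into one weighted grade. The following is only a minimal sketch of that arithmetic, not the instructor's grading procedure. The weights come from the slides. The 0-100 score scale, the equal weighting of individual reviews, and the names `WEIGHTS`, `review_average`, and `course_grade` are assumptions made for illustration.

```python
# Weights (in percent) as listed on the Requirements and Project timeline slides.
# The five project components add up to the 60% "Project" share.
WEIGHTS = {
    "paper_reviews": 30,
    "leading_discussion": 10,
    "project_proposal": 10,
    "project_update_1": 10,
    "project_update_2": 10,
    "final_presentation": 15,
    "project_video": 15,
}


def review_average(review_scores, dropped=3):
    """Average the review scores after dropping the `dropped` lowest ones.

    Assumption (not stated on the slides): every review counts equally
    and is scored on a 0-100 scale.
    """
    kept = sorted(review_scores)[dropped:]
    if not kept:
        raise ValueError("need more review scores than the number dropped")
    return sum(kept) / len(kept)


def course_grade(review_scores, component_scores):
    """Weighted course grade on a 0-100 scale.

    `component_scores` maps every key of WEIGHTS except "paper_reviews"
    to a 0-100 score (hypothetical input format).
    """
    scores = dict(component_scores)
    scores["paper_reviews"] = review_average(review_scores)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, `review_average([100, 90, 80, 0, 0, 0])` drops the three zeros and averages the remaining scores, so three missed reviews do not lower the review component under these assumptions.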
Tips • Make sure you are saying everything we need to know to understand what you are saying. • Make sure you know what you are talking about. • Think about your audience. • Make your talks visual, animated (images, video, not lots of text). • Stick to the time limit!
Tips • Clearly define the problem statement (input, output) • Place your work in the context of existing work you know of • Lay out the set of experiments you’ll conduct to demonstrate the efficacy of your approach • Present a timeline • Concrete goals for next update in ~2.5 weeks • Long shots • Present updates along this plan • See more details on class webpage • Stick to the time limit!
Implementation • Use any language / platform / package you like • No support for code / implementation issues will be provided • Possibility of consulting with lead authors who gave the introductory talks
Miscellaneous • Best presentation, best project and best discussion prizes! • We will vote • Feedback welcome and useful
Context • Deep Learning (CS 7643) • This course is complementary to it
Coming up • Read the class webpage • Schedule is up • Select 6 dates (topics) you would like to lead the discussion on (by August 29th) • Sign-up sheet shows how many people have already signed up for a topic • Select those that have fewer selections • Probability of dropping class? • Start thinking about project teams • Pointers to good presentations, reviews, etc. are on the class webpage.
Moving forward • No class on Thursday • Three lectures after that • No paper reading, no review, no discussion • Introductory talks covering spectrum of vision + language tasks
Each lecture after that • You will have read and summarized a paper the night before • ~15-minute discussion on the paper we read • Led by two students: “for” and “against” • 10-minute project presentation by each of 3 teams • 10-minute discussion on each presentation
Last two lectures • Final project presentations