TRECVID Evaluations Mei-Chen Yeh 03/27/2012
Introduction • Text REtrieval Conference (TREC) • Organized by the National Institute of Standards and Technology (NIST) • Support from government agencies • Annual evaluation (NOT a competition) • Different “tracks” over the years, e.g. • web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (standalone conference from 2001) • TREC Video Retrieval Evaluation (TRECVID)
Introduction • Objectives of TRECVID • Promote progress in content-based analysis and retrieval of digital video • Provide open, metrics-based evaluation • Model real-world situations
Introduction • Evaluation is driven by participants • The collection is fixed and available in the spring • 50% of the data is used for development, 50% for testing • Test queries are available in July, with one month until submissions are due • More details: http://trecvid.nist.gov/
TRECVID Video Collections • Test data • Broadcast news • TV programs • Surveillance videos • Video rushes provided by the BBC • Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009) • Gatwick Airport surveillance videos provided by the UK Home Office (2009) • Web videos (2010) • Languages • English • Arabic • Chinese
Collection History • 2011 • 19,200 online videos (150 GB, 600 hours) • 50 hours of airport surveillance videos • 2012 • 27,200 online videos (200 GB, 800 hours) • 21,000 equal-length short clips of BBC rush videos • airport surveillance videos (not yet announced) • ~4,000-hour collection of Internet multimedia
Tasks • Semantic indexing (SIN) • Known-item search (KIS) • Content-based copy detection (CCD) – through 2011 • Interactive surveillance event detection (SED) • Instance search (INS) • Multimedia event detection (MED) • Multimedia event recounting (MER) – new in 2012
Semantic indexing • System task: • Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept. • 500 concepts (since 2011) • “Concept pair” (2012)
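Ranked shot lists like these are typically scored with average precision, averaged over all concepts (TRECVID SIN actually uses an inferred variant computed over pooled, partially judged shots). A minimal sketch of plain average precision for one concept, assuming complete relevance judgments and hypothetical shot IDs:

```python
# Minimal sketch: average precision for one concept's ranked shot list.
# Assumes complete relevance judgments; the official SIN scoring uses
# (extended) inferred average precision over pooled judgments.

def average_precision(ranked_shot_ids, relevant_shot_ids):
    """ranked_shot_ids: up to 2000 shot IDs, best first.
    relevant_shot_ids: set of shots judged to contain the concept."""
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_shot_ids, start=1):
        if shot_id in relevant_shot_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_shot_ids) if relevant_shot_ids else 0.0

# Hypothetical run for the concept "Handshaking" (shot IDs are made up).
run = ["shot123_4", "shot200_1", "shot077_9", "shot123_7"]
truth = {"shot123_4", "shot123_7", "shot550_2"}
print(average_precision(run, truth))  # 0.5; the mean over all concepts gives MAP
```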
Examples • Boy (One or more male children) • Teenager • Scientists (Images of people who appear to be scientists) • Dark skinned people • Handshaking • Running • Throwing • Eaters (Putting food or drink in his/her mouth) • Sadness • Anger • Windy (Scenes showing windy weather) Full list
Example (concept pair) • Beach + Mountain • Old_People + Flags • Animal + Snow • Bird + Waterscape_waterfront • Dog + Indoor • Driver + Female_Human_Face • Person + Underwater • Table + Telephone • Two_People + Vegetation • Car + Bicycle
Known-item search • Models the situation in which someone knows of a video, has seen it before, believes it is contained in a collection, but doesn't know where to look. • Inputs • A text-only description of the video desired • A test collection of videos • Outputs • Top-ranked videos (automatic or interactive mode)
Examples • Find the video with the guy talking about how it just keeps raining. • Find the video about some guys in their apartment talking about some cleaning schedule. • Find the video where a guy talks about the FBI and Britney Spears. • Find the video with the guy in a yellow T-shirt with the big letter M on it. • … http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html
Surveillance event detection • Detects human behaviors in vast amounts of surveillance video, in real time! • For public safety and security • Event examples • Person runs • Cell to ear • Object put • People meet • Embrace • Pointing • …
Instance search • Finds video segments of a specific person, object, or place, given a visual example.
Instance search • Input • a collection of test clips • a collection of queries that delimit a person, object, or place entity in some example video • Output • for each query, up to the 1000 clips most likely to contain a recognizable instance of the entity
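The output mirrors the other retrieval tasks: one ranked, truncated list per query. A minimal sketch under assumed names (QueryEntity and clip_similarity are hypothetical placeholders, not part of the TRECVID specification):

```python
# Minimal sketch of producing an instance-search result list.
# QueryEntity and clip_similarity are hypothetical; a real system would
# extract and match visual features from the delimited region itself.
from dataclasses import dataclass

@dataclass
class QueryEntity:
    query_id: str
    example_video: str   # video containing the visual example
    region: tuple        # (x, y, width, height) delimiting the entity

def rank_clips(query, test_clips, clip_similarity, max_results=1000):
    """Return up to max_results clip IDs, most likely instances first."""
    scored = [(clip_similarity(query, clip), clip) for clip in test_clips]
    scored.sort(reverse=True)
    return [clip for _, clip in scored[:max_results]]

# Hypothetical usage with a canned similarity function.
top = rank_clips(QueryEntity("9023", "example.mp4", (40, 60, 120, 200)),
                 ["clip_a.mp4", "clip_b.mp4"],
                 lambda q, c: 0.9 if c == "clip_b.mp4" else 0.1)
print(top)  # ['clip_b.mp4', 'clip_a.mp4']
```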
Multimedia event detection • System task • Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos and give the strength of evidence for each such judgment. • In 2010 • Making a cake: one or more people make a cake • Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run • Assembling a shelter: one or more people construct a temporary or semi-permanent shelter for humans that could provide protection from the elements. • 15 new events were released for 2011; the 2012 events have not yet been announced.
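A submission therefore pairs every (event, video) combination with a yes/no decision plus a strength-of-evidence score. A minimal sketch, where score_event and the fixed 0.5 threshold are hypothetical stand-ins for a real detector and its calibrated decision threshold:

```python
# Minimal sketch of turning raw event scores into MED-style judgments.
# score_event and the 0.5 threshold are hypothetical placeholders.

def detect_events(test_videos, test_events, score_event, threshold=0.5):
    """Return, for every (event, video) pair, a present/absent decision
    plus the strength of evidence (here, the raw score)."""
    results = []
    for event in test_events:
        for video in test_videos:
            score = score_event(event, video)              # strength of evidence
            results.append((event, video, score >= threshold, score))
    return results

# Hypothetical usage with canned scores instead of a real detector.
scores = {("Making a cake", "clip_001.mp4"): 0.82,
          ("Making a cake", "clip_002.mp4"): 0.10}
judgments = detect_events(["clip_001.mp4", "clip_002.mp4"],
                          ["Making a cake"],
                          lambda e, v: scores.get((e, v), 0.0))
for event, video, present, score in judgments:
    print(f"{event}\t{video}\t{'yes' if present else 'no'}\t{score:.2f}")
```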
Multimedia event recounting • New in 2012 • Task • Once a multimedia event detection system has found an event in a video clip, it is useful for a human user to be able to examine the evidence on which the system's decision was based. An important goal is for that evidence to be semantically meaningful to a human. • Input • a clip and an event kit (name, definition, explication, i.e., a textual exposition of the terms and concepts, evidential descriptions, and illustrative video exemplars) • Output • a clear, concise, text-only (alphanumeric) recounting or summary of the key evidence that the event does in fact occur in the video
Schedule • Feb. call for participation • Apr. complete the guidelines • Jun.-Jul. release query data • Sep. submission due • Oct. return the results • Nov. paper submission due • Dec. workshop
Call for partners • Standardized evaluations and comparisons • Test on large collections • Failures are not embarrassing, and can be presented at the TRECVID workshop! • Anyone can participate! • A “priceless” resource for researchers