Vid2RSS: Proposed CLEF Task
Vid2RSS in a nutshell
• Task: Automatically generate topic-based RSS-feeds from video
• Data: Dual language TV programs
• Research challenge: Exploit speech recognition transcripts containing non-redundant spoken information in two languages
Examples of topic-based RSS-feeds for video
• RSS-feeds on 3 topics displayed in Netvibes
What is an RSS-feed?
• RSS-feeds are also called channels; they deliver content intended for a specific audience
• Widely used XML-based standard for distributing Internet content
• Used to deliver news and other dynamic information sources (blogs, podcasts, vodcasts, …)
• Encodes simple metadata (e.g., title, description, link) that allows users to decide whether to view/read the source
• An RSS-feed lists items syndicated by the same source or aggregated by topic/theme
• Originally text only, but increasingly containing images and video keyframes
Basic task
• Given
  • Video of a collection of episodes from a dual language television program
  • Speech recognition transcripts in both languages
  • Metadata such as title, description, keyframe and link
• Produce
  • Categorization assigning each episode to a subject/topic category
  • Two feeds for each category, one in each language
• Evaluate by
  • Comparing to ground-truth subject/topic categories
  • Visualizing the feeds in an RSS-reader
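The basic task can be sketched in a few lines: group categorized episodes into one feed per (category, language) pair and score the categorization against ground truth. This is only an illustration. The episode ids, the category names, the language codes and the choice of plain accuracy as the measure are all assumptions; the proposal does not fix an evaluation metric.

```python
from collections import defaultdict


def group_into_feeds(episodes, predicted):
    """Group episode ids into one feed per (category, language) pair."""
    feeds = defaultdict(list)
    for ep in episodes:
        category = predicted[ep["id"]]
        for lang in ep["languages"]:
            feeds[(category, lang)].append(ep["id"])
    return dict(feeds)


def accuracy(predicted, ground_truth):
    """Fraction of ground-truth episodes whose predicted category matches."""
    if not ground_truth:
        return 0.0
    correct = sum(
        1 for ep_id, cat in ground_truth.items() if predicted.get(ep_id) == cat
    )
    return correct / len(ground_truth)


# Hypothetical episodes and categories, for illustration only.
episodes = [
    {"id": "ep1", "languages": ["nl", "en"]},
    {"id": "ep2", "languages": ["nl", "en"]},
    {"id": "ep3", "languages": ["nl", "en"]},
]
predicted = {"ep1": "brain", "ep2": "climate", "ep3": "brain"}
ground_truth = {"ep1": "brain", "ep2": "climate", "ep3": "space"}

feeds = group_into_feeds(episodes, predicted)
score = accuracy(predicted, ground_truth)
```

Here two categories and two languages yield four feeds, and two of the three episodes are categorized correctly.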
Data
• Suggested data (“for starters”): Dutch language program Noorderlicht http://noorderlicht.vpro.nl/
• Dutch documentary show about scientific topics, using spoken Dutch and a large portion of spoken English (interviews with international scientists)
• Ca. 100 episodes included in the TRECVID 2007 data collection
• Provided by Beeld en Geluid (“Sound and Vision”)
• Transcribed by the University of Twente
• Spoken content was not fully exploited by TRECVID
• Other dual language programs can be added to Vid2RSS if they can be found
Output of basic task
• RSS-feeds with a <channel> element that is parent to a set of <item> elements
• Each <item> element represents an episode in the basic task (example on next slide)
• The automatically generated RSS-feeds output by the basic task should be as close as possible to the hand-generated RSS-feed used on the Noorderlicht site, cf. http://noorderlicht.vpro.nl/themasites/rss/weblog.jsp?rssnr=22236939
• Producing metadata of the same type that is already used to deliver the content gives progress achieved in the Vid2RSS task a good chance of direct take-up
Sample RSS-item
<item>
  <title>Stemmen zit tussen de oren</title>
  <link>http://noorderlicht.vpro.nl/noorderlog/bericht/36627903/</link>
  <description>Wat we stemmen houden we graag voor ons. Maar dat zou in de toekomst wel eens moeilijker kunnen worden. Want stemvoorkeuren blijken vast te liggen in ons brein, ontdekken Amerikaanse psychologen.</description>
  <enclosure>http://images.vpro.nl/img.db?36628306+s(400)</enclosure>
  <pubDate>Mon, 10 Sep 2007 17:42:00 +0200</pubDate>
</item>
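A minimal sketch of how such a feed could be generated with Python's standard `xml.etree.ElementTree` module. The channel title and description, the `build_feed` helper and the dictionary keys are hypothetical. One deliberate difference from the sample item: RSS 2.0 defines `<enclosure>` as an empty element with `url`, `length` and `type` attributes, so the sketch writes the keyframe URL as an attribute. The `length` of "0" and the `image/jpeg` type are placeholders, since the real values are not given in the source.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Serialize one RSS 2.0 channel holding one <item> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description", "pubDate"):
            if tag in entry:
                ET.SubElement(item, tag).text = entry[tag]
        if "keyframe_url" in entry:
            # Assumption: byte length and MIME type are unknown placeholders.
            ET.SubElement(
                item,
                "enclosure",
                url=entry["keyframe_url"],
                length="0",
                type="image/jpeg",
            )
    return ET.tostring(rss, encoding="unicode")


# Item values taken from the sample slide; channel values are hypothetical.
xml_text = build_feed(
    title="Vid2RSS example feed (nl)",
    link="http://noorderlicht.vpro.nl/",
    description="Automatically generated topic feed (illustration only)",
    items=[
        {
            "title": "Stemmen zit tussen de oren",
            "link": "http://noorderlicht.vpro.nl/noorderlog/bericht/36627903/",
            "pubDate": "Mon, 10 Sep 2007 17:42:00 +0200",
            "keyframe_url": "http://images.vpro.nl/img.db?36628306+s(400)",
        }
    ],
)
```

Calling `build_feed` once per (category, language) pair would give the two feeds per category that the basic task asks for.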
Vid2RSS basic task summary
• Vid2RSS is a new problem: how to best exploit dual language multi-modal content
• The basic task of Vid2RSS lays a basis for tasks that present increasingly challenging areas for scientific research (see next 4 slides)
• If B&G and UTwente support the task, data is already available
• Evaluation will be supported by the availability of human-generated metadata for the video episodes
• By generating a widely used metadata standard, task output can be directly used to deliver content
Advanced Task I: Summarization/Keywords
• Given
  • Video of a collection of episodes from a dual language television program
  • Speech recognition transcripts in both languages
• Produce
  • Automatically generated sets of keywords or summaries that can be used to fill the <title> and <description> elements
  • Two feeds for each category, one in each language
• Evaluate by
  • Exploiting existing metadata
  • Visualizing the feeds in an RSS-reader
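One simple baseline for the keyword part of this task is TF-IDF ranking over the transcripts. The sketch below is an assumption made for illustration, not a method prescribed by the proposal; the function names, the toy transcripts and the optional stopword list are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (Unicode-aware, no digits)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_keywords(transcripts, k=3, stopwords=frozenset()):
    """Return the k highest-scoring TF-IDF terms for each episode."""
    docs = {
        ep: [t for t in tokenize(text) if t not in stopwords]
        for ep, text in transcripts.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    keywords = {}
    for ep, tokens in docs.items():
        term_freq = Counter(tokens)
        scores = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[ep] = ranked[:k]
    return keywords


# Toy transcripts, for illustration only.
transcripts = {
    "ep1": "the brain decides the vote the brain knows",
    "ep2": "the climate warms the ice melts the climate shifts",
}
keywords = top_keywords(transcripts, k=3)
```

Terms that occur in every transcript, such as "the" here, get a score of zero and drop out of the top ranks. With real speech recognition output, each language would be processed separately.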
Advanced Task II: Keyframe selection
• Given
  • Video of a collection of episodes from a dual language television program
  • Speech recognition transcripts in both languages
  • Human-generated metadata
• Produce
  • Automatic selection of a keyframe to represent each episode and to fill the <enclosure> element
  • Two feeds for each category, one in each language
• Evaluate by
  • Human annotators
  • Visualizing the feeds in an RSS-reader
Advanced Task III: Exploiting metadata
• Given
  • Video of a collection of episodes from a dual language television program
  • Speech recognition transcripts in both languages
  • Human-generated metadata
• Produce
  • Sets of keywords or summaries that can be used to fill the <title> and <description> elements
  • Keyframe selection
  • Two feeds for each category, one in each language
• Evaluate by
  • Exploiting existing metadata
  • Visualizing the feeds in an RSS-reader
Advanced Task IV: Semantic segmentation
• Given
  • Collection of episodes from a dual language television program
  • Speech recognition transcripts in both languages
  • Human-generated metadata
• Produce
  • Division of each episode into segments, each relating to a sub-topic
  • Summarization and keyword generation on a segment level
  • Keyframe selection on a segment level
  • Two feeds for each episode, one in each language
• Evaluate by
  • Human annotators
  • Help from existing human-generated metadata
  • Visualizing the feeds in an RSS-reader
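A simplified lexical-cohesion segmenter, loosely in the spirit of TextTiling, shows what the segmentation step might look like. It places a boundary wherever the word overlap between the sentences before and after a gap falls below a threshold. The window size, the threshold and the toy sentences are assumptions chosen for illustration; the proposal does not specify a segmentation method.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase word tokens across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[^\W\d_]+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Split sentences into segments at gaps with low lexical cohesion."""
    boundaries = []
    for i in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, i - window):i])
        right = bag_of_words(sentences[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    segments, start = [], 0
    for boundary in boundaries:
        segments.append(sentences[start:boundary])
        start = boundary
    segments.append(sentences[start:])
    return segments


# Toy transcript sentences, for illustration only.
sentences = [
    "voting preferences are fixed in the brain",
    "the brain shows voting preferences early",
    "ice sheets melt as oceans warm",
    "warm oceans melt polar ice sheets",
]
segments = segment(sentences)
```

On this toy input the only gap with no shared vocabulary is between the second and third sentence, so the transcript splits into two segments of two sentences each. Each segment could then feed the keyword and keyframe steps of the earlier tasks.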
Vid2RSS
• One basic task
  • Mixing languages, content, media, metadata
• Many possible advanced tasks
  • Mixing focused IR, summarization, and ...
• See http://ilps.science.uva.nl/Vid2RSS/