The TREC2001 Video Track: Information Retrieval on Digital Video Information. Alan F. Smeaton, Centre for Digital Video Processing, Dublin City University, Ireland; Paul Over, National Institute of Standards and Technology, USA
The TREC2001 Video Track: Information Retrieval on Digital Video Information
Alan F. Smeaton, Centre for Digital Video Processing, Dublin City University, Ireland
Paul Over, National Institute of Standards and Technology, USA
Cash J. Costello, Applied Physics Laboratory, Johns Hopkins University, USA
Arjen P. de Vries, CWI, Amsterdam, The Netherlands
David Doermann, Laboratory for Language and Media Processing, University of Maryland, USA
Alexander Hauptmann, School of Computer Science, Carnegie Mellon University, USA
Mark E. Rorvig, School of Library and Information Sciences, University of North Texas, USA
John R. Smith, IBM T.J. Watson Research Center, USA
Lide Wu, Dept. of Computer Science, Fudan University, China
Presentation overview
• TREC2001
• TREC2001 Video Track
• TREC2001 Video Track Tasks
  • Shot Boundary Detection Task
  • Search Task
• Search Task
  • Participants in Search Task & Their Focus
  • Summary of approaches by participants
• Conclusion
TREC (Text REtrieval Conference)
• Annual activity (1992- ) to “benchmark the retrieval effectiveness of Information Retrieval tasks”
• Co-ordinator NIST (National Institute of Standards and Technology, US) defines & distributes:
  • Test document corpus
  • Topics (queries)
• Participating groups develop an IR system, run the Topics against the Test document corpus, and send the results to NIST
• NIST generates relevance assessments and calculates performance in terms of precision & recall
• Annual conference in Gaithersburg, Maryland
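The precision and recall figures mentioned above can be illustrated with a minimal sketch. The function name and the shot identifiers are hypothetical, and this is a set-based calculation for a single topic, not NIST's actual evaluation software.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one topic.

    retrieved: items a system returned for the topic
    relevant:  items the assessors judged relevant for the topic
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    # Precision: fraction of returned items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant items that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 shots returned, 3 judged relevant, 2 in common.
p, r = precision_recall(
    ["shot1", "shot2", "shot3", "shot4"],
    ["shot2", "shot4", "shot9"],
)
print(p, r)
```

Here precision is 2/4 and recall is 2/3, since two of the four returned shots are among the three relevant ones.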
“Tracks” in TREC
• Different streams, each introduced to focus on a particular sub-problem in Information Retrieval
• 15 different “tracks” have been introduced, some stopped, some continuing, e.g.:
  • Interactive track 1993-
  • Chinese language track 1995-1998
  • Web track 1998-
  • Question Answering track 1998-
  • Video track 2001-
Video Track in TREC2001
• 1st Video Track in 2001
• Promote progress in content-based retrieval from digital video via open, metrics-based evaluation
• 12 participating groups (5 USA, 2 Asia, 5 Europe), contributing to the definition of the corpus, topics and tasks via discussion, and to the running of the track
• Following the TREC framework, NIST co-ordinated and provided:
  • Video document corpus
  • Topic queries
Video Track in TREC2001
• Video document corpus - 11.2 hours in total (85 video files in MPEG-1 format; 6.3 Gbytes), mostly documentary in nature, varying in age, style and quality, e.g.:
  • “A New Horizon” (16 min; colour; documentary) - This Great Plains orientation tape explains the boundaries of the Great Plains Region, which is one of five regions that make up the Bureau of Reclamation
  • “Challenge at Glen Canyon” (26 min; colour; documentary) - Shows how the spillway damage caused by flooding along the Colorado River System was repaired
Video Track in TREC2001
• 74 topics (queries) - with multimedia examples (audio/image/video) along with each topic, e.g.:
  • Topic #8: “find clips showing the planet Jupiter” (with 2 images depicting Jupiter)
  • Topic #32: “find clips with a chopper landing” (with 3 audio clips of a helicopter sound)
  • Topic #54: “find clips showing Glen Canyon dam” (with a short video clip showing Glen Canyon dam)

Number of topics                                           74
No. topics with image examples / Avg. number of images     26 / 2.0
No. topics with audio examples / Avg. number of audio      10 / 4.3
No. topics with video examples / Avg. number of videos     51 / 2.4
Tasks in Video Track in TREC2001
• Two distinct tasks:
  • Shot Boundary Detection task: engineering exercise to evaluate the accuracy of automatically detecting camera shot boundaries in the video corpus
  • Facilitates higher-level video indexing/browsing (e.g. scene detection/navigation, news story segmentation…)
[Figure labels: Video file, Camera shot]
Tasks in Video Track in TREC2001
• Two distinct tasks:
  • Search task: running topic queries against the video corpus, searching for the video segments that answer the queries
    • Automatic
    • Interactive
  • Answer segments are submitted to NIST for evaluation
Participating Groups in Search Task
• Of the 12 participating groups in the TREC2001 Video Track:
  • all 12 groups took part in the Shot Boundary Task
  • 8 groups took part in the Search Task
• Participants in Search Task:
  • Carnegie Mellon University, USA
  • Dublin City University, Ireland
  • Fudan University, China
  • IBM Research, USA
  • Johns Hopkins University, USA
  • Lowlands Group, The Netherlands
  • University of Maryland, USA
  • University of North Texas, USA
Carnegie Mellon University (USA)
• Used Informedia Digital Video Library’s standard processing modules
  • Shot Boundary Detection (using color histogram comparison)
  • Keyframe extraction
  • Speech recognition (using Sphinx speech recogniser with 64,000 word vocabulary)
  • Face detection
  • Video OCR
• Image search based on color histogram features in different colour spaces and textures
• Informedia interface for Interactive track, users allowed to switch between multiple image search engines
• Image retrieval augmented to process I-frames (not only keyframes)
• Speaker identification component used to compare query audio example to the audio in the retrieved video segment
• Image retrieval & video OCR had the largest impact on performance
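Shot boundary detection by colour histogram comparison, as named in the module list above, can be sketched as follows. This is a generic illustration and not the Informedia implementation: the bin count, the distance measure, the threshold and all function names are assumptions, and frames are modelled as non-empty lists of (r, g, b) pixels.

```python
def colour_histogram(frame, bins=4):
    """Normalised joint RGB histogram of a frame.

    frame: non-empty list of (r, g, b) pixels with values in 0..255.
    bins:  assumed number of bins per channel (must divide 256).
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(frame)
    return [h / total for h in hist]


def histogram_difference(h1, h2):
    """Half the L1 distance, so the value lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    threshold is an assumed value; a real system would tune it on data.
    """
    hists = [colour_histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


# Toy video: two dark frames followed by two bright frames.
dark = [(10, 10, 10)] * 16
bright = [(240, 240, 240)] * 16
cuts = detect_cuts([dark, dark, bright, bright])
print(cuts)  # [2]
```

A fixed threshold like this only catches hard cuts; gradual transitions such as fades and dissolves generally need comparison over a longer window.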
Dublin City University (Ireland)
• Using Físchlár Digital Video System
• Shot boundary detection & Keyframe extraction
• Allowed users to browse through keyframes with different browsing interfaces including:
  • Timeline browser (linear, spatial keyframe presentation)
  • Slide Show browser (linear, temporal keyframe presentation)
  • Hierarchical browser (hierarchical, spatial keyframe presentation)
• 30 test users (final year undergrads & research students) interacted with the system in a controlled environment
  • 12 topic queries / user
  • 6 minutes / topic query
  • within-user setting (each user used all 3 browsers 4 times each, in round robin fashion)
• Timeline browser allowed largest number of answer submissions, with lowest precision, Slide Show vice versa
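The within-user, round robin design described above (12 topics per user, 3 browsers used 4 times each) can be sketched as a simple rotation. The slide does not give the actual assignment scheme, so the starting offset per user and the function name below are assumptions made for illustration.

```python
BROWSERS = ["Timeline", "Slide Show", "Hierarchical"]


def browser_schedule(user_index, n_topics=12):
    """Assign a browser to each of a user's topics in round robin order.

    The starting browser is rotated by user index (an assumed way of
    balancing order effects across users).
    """
    offset = user_index % len(BROWSERS)
    return [BROWSERS[(offset + t) % len(BROWSERS)] for t in range(n_topics)]


schedule = browser_schedule(user_index=1)
print(schedule[:3])  # ['Slide Show', 'Hierarchical', 'Timeline']
```

With 12 topics and 3 browsers, every user sees each browser exactly 4 times regardless of the starting offset.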
Fudan University (China)
• Tried 17 topics (including people searching, video text searching, camera motion, etc.)
• Feature extraction module:
  • qualitative camera motion analysis module
  • face detection/recognition module (skin color based segmentation + motion/shape filtering, use of a new optimal discrimination criterion)
  • video text detection/recognition module (vertical edge based methods to detect text blocks; improved logical level technique to binarize text blocks)
  • speaker recognition / speaker clustering module
  • Speech SDK (Microsoft) to get transcript
• Off-line indexing followed by on-line searching
IBM Research
• Members from IBM T.J. Watson Research Center & IBM Almaden Research Center
• Using IBM CueVideo System
  • Shot Boundary Detection & Keyframe extraction
• MPEG-7 visual descriptors for indexing keyframes & answering automatic searches
• Statistical model for classifying & generating labels/scores for:
  • events (fire, smoke, launch)
  • scenes (greenery, land, outdoors, rock, sand, sky, water)
  • objects (airplane, boat, rocket, vehicle, faces)
• Query/filter pipelines to cascade content- & model-based searching, e.g. “shots that have similar colour to this image, have label ‘outdoors’ and show a ‘boat’”
• Compared performance of content/model-based system vs. speech-based system: best results obtained by combining the two methods
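The cascaded query/filter idea in the example above (“similar colour to this image, label ‘outdoors’, shows a ‘boat’”) can be sketched as a chain of predicates applied in turn. This is not the CueVideo API: the `Shot` structure, the mean-colour feature, the score thresholds and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    colour: tuple  # hypothetical mean-colour feature (r, g, b)
    labels: dict = field(default_factory=dict)  # label -> classifier score in [0, 1]


def colour_filter(query_colour, max_distance):
    """Content-based step: keep shots whose colour is close to the query image's."""
    def keep(shot):
        dist = sum((a - b) ** 2 for a, b in zip(shot.colour, query_colour)) ** 0.5
        return dist <= max_distance
    return keep


def label_filter(label, min_score=0.5):
    """Model-based step: keep shots whose classifier score for a label is high enough."""
    def keep(shot):
        return shot.labels.get(label, 0.0) >= min_score
    return keep


def run_pipeline(shots, filters):
    """Apply each filter in order, passing survivors to the next stage."""
    for keep in filters:
        shots = [s for s in shots if keep(s)]
    return shots


shots = [
    Shot("s1", (30, 90, 200), {"outdoors": 0.9, "boat": 0.8}),
    Shot("s2", (35, 95, 190), {"outdoors": 0.7, "boat": 0.1}),
    Shot("s3", (220, 40, 30), {"outdoors": 0.9, "boat": 0.9}),
]
result = run_pipeline(
    shots,
    [colour_filter((32, 92, 195), 40.0), label_filter("outdoors"), label_filter("boat")],
)
result_ids = [s.shot_id for s in result]
print(result_ids)  # ['s1']
```

Shot s2 passes the colour and ‘outdoors’ stages but fails the ‘boat’ stage, and s3 is rejected at the colour stage, so only s1 survives the cascade.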
Johns Hopkins University (USA)
• Automatic searching:
  • Keyframes are indexed by color histogram & image texture
  • Query representation consists of the image & video portions of the information need
  • Similarity is measured as a weighted distance between the image features of the query representation and those of the indexed keyframes; the shots associated with the most similar keyframes are then retrieved
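A weighted distance over colour and texture features, followed by shot ranking, could look like the sketch below. The slide does not state the distance function or the weights, so the L1 distance, the equal weights, the rule of scoring a shot by its closest keyframe, and all names are assumptions.

```python
def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance(query, keyframe, w_colour=0.5, w_texture=0.5):
    """Weighted sum of colour-histogram and texture distances (assumed weights)."""
    return (
        w_colour * l1(query["colour"], keyframe["colour"])
        + w_texture * l1(query["texture"], keyframe["texture"])
    )


def rank_shots(query, index):
    """Rank shots by the distance of their closest keyframe to the query.

    index: mapping shot_id -> list of keyframe feature dicts.
    """
    scored = [
        (min(weighted_distance(query, kf) for kf in keyframes), shot_id)
        for shot_id, keyframes in index.items()
    ]
    return [shot_id for _, shot_id in sorted(scored)]


# Hypothetical toy features (2-bin colour histogram, 2-value texture vector).
query = {"colour": [0.5, 0.5], "texture": [0.2, 0.8]}
index = {
    "shotA": [{"colour": [0.5, 0.5], "texture": [0.3, 0.7]}],
    "shotB": [{"colour": [1.0, 0.0], "texture": [0.2, 0.8]}],
    "shotC": [{"colour": [0.9, 0.1], "texture": [0.9, 0.1]}],
}
ranking = rank_shots(query, index)
print(ranking)  # ['shotA', 'shotB', 'shotC']
```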
Lowlands Group (The Netherlands)
• Joint group formed from the database group of CWI, the multimedia group of TNO, the vision group of the University of Amsterdam and the language technology group of the University of Twente
• Retrieval engine based on:
  • face detection
  • camera motion detection (pan, tilt, zoom)
  • monologue detection
  • video OCR detection
• System heuristically selected a set of filters based on the detectors by analysing the query text with WordNet
• Compared performance with a transcript-based system (transcripts provided by CMU)
• Transcript-based system outperformed features-based system
University of Maryland (USA)
• Temporal Color Correlogram - to capture the spatio-temporal relationship of colors in a video shot
• Using MERIT system with VideoLogger video editing software (from Virage)
• Keyframe extraction (1st frame in the shot) => static image color correlogram calculation => temporal correlogram calculation (by shot segmentation in equal intervals, then shot features fed into CMRS retrieval system)
• TREC topic queries were translated into example videos/images
University of North Texas (USA)
• Keyframe extraction (frames every 5 seconds)
• Redundant keyframe removal (to ensure presence of frames outside the prescribed normal distribution limits)
• Resulting keyframes placed into UNT’s Brighton Image Searcher application (retrieval based on mathematical measures that correspond to primitive image features)
• 2 group members ran 13 topics to retrieve relevant keyframes
• Chosen keyframes were then used as exemplars to find other, similar keyframes
• Precision scores were better than expected because of the human judgement involved
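Sampling a frame every 5 seconds, the first step listed above, reduces to picking frame indices at a fixed stride. The sketch below assumes a constant frame rate; the function name and the example frame rate of 25 fps are illustrative and not taken from the UNT system.

```python
def keyframe_indices(total_frames, fps, interval_seconds=5.0):
    """Return the indices of frames sampled every interval_seconds.

    Assumes a constant frame rate; the stride is at least one frame.
    """
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


# A hypothetical 30-second clip at 25 frames per second (750 frames).
idx = keyframe_indices(total_frames=750, fps=25)
print(idx)  # [0, 125, 250, 375, 500, 625]
```

Fixed-interval sampling ignores shot boundaries, which is why a redundancy-removal step like the one on the slide is useful afterwards.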
Summary & Analysis of Approaches
• Varied approaches by different groups
  • Interactive searching vs. automatic searching
  • Speech recognition transcript vs. visual-only
  • Various combinations of different features for retrieval
  • Experienced groups vs. new groups in video retrieval
• Performance (Precision) results varied greatly:
  • Interactive: Best group 0.6 - Worst group 0.23 (across same 31 topics)
  • Automatic: 0.609 - 0.002
• The video track was still shaping itself in 2001 & not complete
  • only small-scale comparisons possible (within-topic, between closely related system variants)
  • cross-system comparison possible only after achieving better consistency in topic formulation, agreement on better measures, and larger numbers of data points
• Difficulties & unforeseen problems highlighted, tackled in 2nd Video track in TREC2002
Conclusions
• Revealed many issues to be addressed in evaluating the performance of retrieval on digital video information
• There are groups working in this area worldwide who have the capability and the systems to support real information retrieval on significant volumes of digital video content
• 2nd Video Track (2002)
  • more than 20 participating groups
  • 68.5 hours of video document corpus
  • a focused set of 25 topic queries
  • Tasks:
    • Shot Boundary Detection - as before
    • Semantic feature extraction task (face, indoor/outdoor, landscape/cityscape, speech/music/monologue, etc.)
    • Search - interactive or automatic as before
Conclusion
TREC2001 Video Track website with papers: http://www-nlpir.nist.gov/projects/t01v/t01v.html
Authors’ Note: The authors wish to extend their sympathies to the family and friends of our co-author, Mark E. Rorvig, who passed away shortly before this paper was submitted.