Video indexing and retrieval at TREC 2002

Christian Wolf (1), wolf@rfv.insa-lyon.fr
David Doermann (2), doermann@umiacs.umd.edu

(1) Laboratoire de Reconnaissance de Formes et Vision, Institut National des Sciences Appliquées de Lyon, Bât. Jules Verne, 20, Avenue Albert Einstein, 69621 Villeurbanne cedex, France
(2) Laboratory for Language and Media Processing, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742-3275, USA

Introduction | Features & Query types | Experimental Results | Impact of Features | Conclusion

Plan of the presentation
• Introduction - The TREC Competition
• Features & query techniques
• Experiments & Results
  • Run types
  • Example queries
  • The impact of speech/text/color
• Conclusion and Outlook

The NIST Text REtrieval Conference (TREC)
• The goal of the conference series is to encourage research in information retrieval from large amounts of text by providing
  • a large test collection
  • uniform scoring procedures
  • a forum for organizations interested in comparing their results
• The Video Retrieval Track aims at the investigation of content-based retrieval from digital video.

Data: 68.45 hours of MPEG-1 video from the Internet Archive and the Open Video Project.

• Feature development collection (23.6h)
• Feature test collection (5h)
• Search test collection (40.12h)

Aims and Tasks
Three sub-tasks are defined in the Video Track, and participants are free to choose the tasks for which they submit results:
• Shot boundary determination
• Feature extraction
• Search

Search: different query types
The competition supports two query types: manual and interactive queries.

Example search topics

The feature extraction task: overlay text
• Detection, multiple frame integration
• Binarization
• OCR: Scansoft (example: “Soukaina Oufkir”)
• Suppression of false alarms

Suppression of false alarms: a linear classifier trained with Fisher’s linear discriminant classifies the OCR output of each text box as text or non-text.

Text examples: TONY RIYERA ARNOLD GILLESPIE EUGENE PODDANY EMERY NAWKúN5 GEORGE GORDON GERALD NEYIU D i recto r TRUE BOAROMAN CARL URBAN Art Direction EMERY NAWKINS Music Score Director GEORGE GORDON l E W K E LLER PRODUCTION
Non-text examples: a yen Pu s1c~ . .a i ~ i a 7) E nAl~ 1 I. Mol, 6 I J'-N r ~v i r low r e,740~17-j F 00 Ii s !'/

The characters are separated into 4 types, and two features are computed from them:
$F_1 = \frac{\text{number of good characters (upper + lower + digits)}}{\text{number of characters}}$
$F_2 = \frac{\text{number of class changes}}{\text{number of characters}}$
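The two features can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the fourth character class ("other"), the decision to ignore whitespace, and the function names are assumptions made here.

```python
def char_class(ch):
    """Map a character to one of 4 classes (the 'other' class is an assumption)."""
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    if ch.isdigit():
        return "digit"
    return "other"


def text_features(ocr_output):
    """Return (F1, F2) for one OCR string; whitespace is ignored (assumption)."""
    chars = [c for c in ocr_output if not c.isspace()]
    if not chars:
        return 0.0, 0.0
    classes = [char_class(c) for c in chars]
    # F1: share of "good" characters (upper + lower + digits).
    good = sum(cls != "other" for cls in classes)
    # F2: number of class changes between neighbouring characters,
    # normalised by the number of characters.
    changes = sum(a != b for a, b in zip(classes, classes[1:]))
    return good / len(chars), changes / len(chars)
```

Clean text such as "EMERY NAWKINS" gives a high F1 and a low F2, while noisy strings mix classes and push F2 up; the two values would then be fed to the linear classifier.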

Features
• Search test collection (40h), with shot boundary definition (MPEG7-XML): 14524 shots
• Donated features (MPEG7-XML): 10 different binary features from different donors (32 detectors in all). A confidence is given for each shot. Examples: Outdoors (IBM, Mediamill, MSRA), Face (IBM, Mediamill, MSRA).
• Speech recognition (donated feature, MPEG7-XML): LIMSI, MSRA
• Detected and recognized text: developed by INSA de Lyon [Wolf and Jolion, 2002]
• Temporal color correlograms: developed by UMD in collaboration with the University of Oulu [Rautiainen and Doermann, 2002]

Query techniques
[Diagram: a query draws on temporal color features, text, speech, and binary features.]

Recognized text and speech
• For the actual retrieval we used the freely available Managing Gigabytes (MG) software (http://www.cs.mu.oz.au/mg). Two query metrics are available:
  • Boolean
  • Ranked, based on the cosine measure.
• MG was written for error-free documents, so it checks for exact matches on the stemmed words (e.g. “produced” matches “producer”). We added an inexact match feature using N-grams:
  Target: “Nick Chandler”
  Query: “chandler”
  N-grams: chand|handl|andle|ndler|chandl|handle|andler|chandle|handler|chandler
  Results: “ni ck l6 tia ndler”, “colleges cattlemen handlers of livestock”
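The N-gram expansion in the example can be reproduced with the sketch below. The minimum length of 5 is read off the slide's example; the substring test is a simple stand-in for the actual MG query and is an assumption, as are the function names.

```python
def ngrams(word, min_len=5):
    """All substrings of `word` with length min_len..len(word), shortest first."""
    word = word.lower()
    if len(word) < min_len:
        return [word]
    return [
        word[i:i + n]
        for n in range(min_len, len(word) + 1)
        for i in range(len(word) - n + 1)
    ]


def inexact_match(query, text, min_len=5):
    """Return the N-grams of `query` that occur in `text`.

    A plain substring test stands in for the retrieval engine here.
    """
    haystack = text.lower()
    return [gram for gram in ngrams(query, min_len) if gram in haystack]
```

With the query “chandler”, the noisy OCR string “ni ck l6 tia ndler” is found through the single N-gram “ndler”, and the unrelated phrase containing “handlers” is also returned, which shows why such result sets need further filtering.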

Binary features
The binary features specify the presence of a feature in each shot; the information is given as a confidence measure in [0, 1].
Detectors: People - IBM, People - Mediamill, People - MSRA, Outdoors - IBM, Outdoors - Mediamill, Outdoors - MSRA, ...

Combining the detector outputs:
• X Training the combining classifier
• The product rule: quantifies the true likelihood if the features are statistically independent. Bad if base classifiers are weakly trained or have high error rates.
• The sum rule: works well with base classifiers with independent noise behaviour.

C_ij ... output of classifier j for class i
Q_i ... output of the combined classifier for class i
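In their usual textbook form, the product rule multiplies the classifier outputs per class, Q_i = prod_j C_ij, and the sum rule adds them, Q_i = sum_j C_ij. The slide's own formulas did not survive extraction, so the sketch below assumes these standard definitions; the function names and the example values are hypothetical.

```python
import math


def product_rule(outputs):
    """outputs[j][i] = C_ij, the output of classifier j for class i.

    Returns Q_i = product over j of C_ij (standard form, assumed here).
    """
    return [math.prod(per_class) for per_class in zip(*outputs)]


def sum_rule(outputs):
    """Returns Q_i = sum over j of C_ij (standard form, assumed here)."""
    return [sum(per_class) for per_class in zip(*outputs)]


# Three hypothetical detectors, two classes; the third detector is badly wrong.
example = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
```

On `example`, the product rule scores the first class 0 because a single detector outputs 0, while the sum rule still ranks it first. This is the sensitivity to high error rates noted above.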

Binary features - ranked queries
Each shot is described by its vector of detector confidences (People - IBM, People - Mediamill, People - MSRA, Outdoors - IBM, Outdoors - Mediamill, Indoors - IBM) and is ranked by its distance to the query vector. Two distance functions are compared:
• Euclidean distance
• Mahalanobis distance, with Σ the covariance matrix for the complete data set

[Figure: 3-dimensional case with the query vector, Shot 1 and Shot 2. Values as extracted: 0.27 0.27 1.0, 0.87 0.23 1.0, 0.94 0.56 1.0; 0.15 0.15 1.0, 0.08 0.76 1.0, 0.65 0.07 0.0; 1 1 0.]
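A minimal sketch of ranking shots by the two distances, assuming NumPy. The covariance is estimated over all shot vectors, as stated above; using the pseudo-inverse to tolerate a singular covariance matrix and the function names are choices made for this illustration.

```python
import numpy as np


def euclidean(query, shot):
    diff = np.asarray(shot, dtype=float) - np.asarray(query, dtype=float)
    return float(np.sqrt(diff @ diff))


def mahalanobis(query, shot, cov):
    """sqrt((x - q)^T Sigma^-1 (x - q)); pinv guards against a singular Sigma."""
    diff = np.asarray(shot, dtype=float) - np.asarray(query, dtype=float)
    inv = np.linalg.pinv(np.asarray(cov, dtype=float))
    return float(np.sqrt(max(diff @ inv @ diff, 0.0)))


def rank_shots(query, shots):
    """Return shot indices ordered by Mahalanobis distance to the query."""
    shots = np.asarray(shots, dtype=float)
    cov = np.cov(shots, rowvar=False)  # covariance over the complete data set
    distances = [mahalanobis(query, s, cov) for s in shots]
    return [int(i) for i in np.argsort(distances, kind="stable")]
```

The Mahalanobis distance rescales each detector by its spread over the collection, so a detector whose confidences barely vary does not dominate the ranking.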

Temporal color features
For each shot, a temporal color correlogram is held [Rautiainen and Doermann, 2002]. It stores the probability that, given any pixel p1 of color c_i, a pixel p2 at distance d is of color c_j, taken over the shot's frames I_n. The distance is calculated using the L1 norm.
TREC: autocorrelogram, i.e. c_i = c_j.
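The autocorrelogram can be sketched as follows, assuming NumPy and frames that are already quantized to integer color indices. Two simplifications are assumptions of this sketch, not the published method: only neighbours d pixels to the right and below are examined, and the L1 norm is used to compare two correlograms.

```python
import numpy as np


def temporal_autocorrelogram(frames, n_colors, distances=(1,)):
    """Estimate P(p2 has color c | p1 has color c, p2 at offset d), over all frames.

    frames: iterable of 2-D integer arrays holding quantized color indices.
    Returns an array of shape (n_colors, len(distances)).
    """
    hits = np.zeros((n_colors, len(distances)))
    totals = np.zeros((n_colors, len(distances)))
    for frame in frames:
        frame = np.asarray(frame)
        for k, d in enumerate(distances):
            # Pixel pairs that are d steps apart horizontally and vertically.
            pairs = [(frame[:, :-d], frame[:, d:]), (frame[:-d, :], frame[d:, :])]
            for a, b in pairs:
                # Each pair is counted in both directions (a->b and b->a).
                totals[:, k] += np.bincount(a.ravel(), minlength=n_colors)
                totals[:, k] += np.bincount(b.ravel(), minlength=n_colors)
                same = a[a == b]
                hits[:, k] += 2 * np.bincount(same, minlength=n_colors)
    return np.divide(hits, totals, out=np.zeros_like(hits), where=totals > 0)


def l1_distance(corr_a, corr_b):
    """L1 norm of the difference between two correlograms."""
    return float(np.abs(corr_a - corr_b).sum())
```

A shot made of uniformly colored frames yields probability 1 for that color, while a checkerboard yields 0 for both colors at d = 1, so the two shots end up far apart under `l1_distance`.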

The query tool

Querying
• Keyword-based queries on text or speech or both together, with or without N-grams, Boolean or ranked.
• Ranked color queries.
• Ranked queries on binary features.
• Filters on binary features.
• AND, OR combination of query results, incl. weighted combinations of the ranking of both queries.
• Truncate queries.
• View the keyframes of queries.
• Export query results into Stardom, the graphical browsing tool.

[Figure: ranking scores for Query 1 to Query 4. Values as extracted: 1.00 1.00 1.00 1.00, 0.96 0.70 0.30 0.20, 0.00 0.00 0.00 0.00.]
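The AND/OR combination with weighted rankings could look like the sketch below. The slides do not specify the weighting scheme, so the linear mix of scores, the score of 0 for shots missing from one result set, and all names are assumptions for illustration only.

```python
def combine(results_a, results_b, mode="or", weight=0.5, limit=None):
    """Combine two ranked result sets given as {shot_id: score in [0, 1]}.

    mode="and" keeps shots returned by both queries, mode="or" by either.
    The combined score is weight * score_a + (1 - weight) * score_b
    (hypothetical scheme); `limit` truncates the result list.
    """
    if mode == "and":
        shots = results_a.keys() & results_b.keys()
    elif mode == "or":
        shots = results_a.keys() | results_b.keys()
    else:
        raise ValueError("mode must be 'and' or 'or'")
    scored = {
        shot: weight * results_a.get(shot, 0.0) + (1 - weight) * results_b.get(shot, 0.0)
        for shot in shots
    }
    # Highest score first; ties are broken by shot id for a stable order.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit] if limit is not None else ranked
```

For example, a speech query could be combined with a color query using `mode="and"` and a larger `weight` on whichever source is trusted more for the topic.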

Stardom


Experiments
• Manual run using all available features.
• Manual run without speech recognition.
• Interactive run using all available features.
The graphical tool was used to browse the data, but all submitted results came from queries submitted through the command line tool.

Example queries
• “Find additional shots of James H. Chandler”: manual query
• “Shots of rockets or missiles taking off”: manual & interactive
[Figure: the sub-queries of each example are combined with OR and AND.]

Manual vs. interactive queries
[Figure: sub-query combinations. Manual query: OR, AND. Interactive query: OR, AND, OR, OR.]

Ranked binary queries: distance functions
[Figures: full query vs. binary query only; an example false alarm; distributions of the 3 “people” detectors.]

Precision curves per topic
[Plots: precision / result set size and precision / recall, for the interactive, manual, and manual (no ASR) runs.]

Precision curves consolidated
[Plots: precision / result set size and precision / recall, for the manual, manual (no ASR), and interactive runs.]

Comparison with other teams
[Plot: average precision - manual]

Comparison with other teams
[Plot: average precision - interactive]

Speech
The quality of the speech queries depends strongly on the topic. In general, the return sets of speech queries are very heterogeneous and need to be filtered, e.g. by binary filters.
Example: “rocket missile”

Color
As expected, the color filters were very useful in cases where the query images were very different from other images in terms of low-level features, or where the relevant shots in the database share common color properties with the example query (e.g. shots taken in the same environment).
Query “living cells”: the results of the run without speech are better than those of the run including speech.

Color
Searching for “James Chandler” using the color features only.

Recognized text
The type of videos present in the collection does not favor the use of recognized text. In most of the documentaries, the only text present is the title at the beginning and the credits at the end.
Examples: “Oil”, “Air plane”, “Airline”, “Dance”, “Music”, “Energy Gas”

Conclusion and Outlook
• Exploit temporal continuities between the frames, as already proposed by the Dutch team during TREC 2001. This seems to be especially important for video OCR, since single shots containing only text sometimes “interrupt” content shots.
• Training of the combination of features.
• More research into the combination of the binary features (normalization, robust outlier detection, etc.).
• Browsing: the graphical viewing interface could be very promising if it is possible to integrate tiny (and enlargeable) keyframes into the grid.
• Use of additional features:
  • Explicit color filters and query by (sketched) example: define regions and color ranges.
  • Motion features.
• Usage of the internet to get example images (Google).
