Interactive Event Detection in Video and Audio

Rahul Sukthankar, Intel Research Pittsburgh & Carnegie Mellon University

Contributors
• Diamond team: L. Huston, Satya, L. Mummert, C. Helfrich, L. Fix
• Forensic video retrieval: J. Campbell, P. Pillai, Diamond team
• Volumetric video analysis: Y. Ke, M. Hebert
• Sound object detection in soundtracks: D. Hoiem, Y. Ke
• Interactive search-assisted diagnosis for breast cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic

Why Interactive Event Detection?
• Events of interest are often not known a priori
  • Data exploration: “find me more things like this”
• User’s requirements change based on partial results
  • Surveillance: “Alert me if you see X… hmm… actually I want Y”
• Challenges:
  • Limited training data: can we still learn good event detectors?
  • Efficiency: how best to organize, index, and pre-process the data?

Outline
• Event detection in audio
  • Sound object detection from a few examples
• Diamond
  • Efficient search of non-indexed data
• Event detection in video
  • Forensic video surveillance
  • Volumetric analysis for action detection

Example: Sound Object Detection
• Applications of sound object detection
  • “Alert me if you hear a gunshot.” (monitoring)
  • “Fast forward to the next swordfight in LotR” (search and retrieval)
• Approach:
  • Learn a boosted classifier from ~5-10 examples of the object
  • Scan the windowed classifier over all possible locations
[Figure: the audio stream is split into clips 1…N; the clip classifier labels each clip as object or non-object and returns the locations of the detected sound object]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
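The scanning step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `detect_sound_object`, `window`, and `hop` are hypothetical, and `energy_classifier` is a toy stand-in for the learned boosted classifier.

```python
from typing import Callable, List, Sequence, Tuple


def detect_sound_object(
    stream: Sequence[float],
    classify_clip: Callable[[Sequence[float]], bool],
    window: int,
    hop: int,
) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over the stream.

    Returns the (start, end) sample ranges of clips classified as the object.
    """
    hits = []
    for start in range(0, len(stream) - window + 1, hop):
        clip = stream[start:start + window]
        if classify_clip(clip):
            hits.append((start, start + window))
    return hits


def energy_classifier(clip: Sequence[float]) -> bool:
    # Toy stand-in (assumption): a real system would apply the boosted
    # classifier learned from the 5-10 labeled examples.
    return sum(x * x for x in clip) / len(clip) > 0.5


# Silence, a short loud burst, then silence again.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
detections = detect_sound_object(stream, energy_classifier, window=4, hop=2)
print(detections)
```

A smaller `hop` gives finer localization at the cost of more classifier evaluations.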

Sound Object Detection: Clip Classifier
• Feature extraction
• Weak classifier: small decision trees on features
• Learn classifier cascade using AdaBoost
[Figure labels: 138 features; decision nodes; leaf nodes]
[D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
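To illustrate how a cascade of boosted weak classifiers is evaluated, here is a minimal sketch. It assumes depth-1 trees (stumps) as the weak classifiers and hand-picked weights; the slide says only "small decision trees", and training is not shown. The names `Stump`, `Stage`, and `cascade_predict` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 decision tree on a single feature (an assumed weak learner)."""
    feature: int      # index into the feature vector
    threshold: float
    alpha: float      # boosting weight of this weak classifier

    def vote(self, x: Sequence[float]) -> float:
        return self.alpha if x[self.feature] > self.threshold else -self.alpha


@dataclass
class Stage:
    """One cascade stage: a weighted vote of weak classifiers."""
    stumps: List[Stump]
    bias: float = 0.0  # stage acceptance threshold

    def accepts(self, x: Sequence[float]) -> bool:
        return sum(s.vote(x) for s in self.stumps) >= self.bias


def cascade_predict(stages: List[Stage], x: Sequence[float]) -> bool:
    # all() short-circuits, so a clip rejected by an early stage
    # never pays for the later (typically larger) stages.
    return all(stage.accepts(x) for stage in stages)


# Hand-picked parameters for illustration only.
stages = [
    Stage([Stump(feature=0, threshold=0.5, alpha=1.0)]),
    Stage([Stump(feature=1, threshold=0.2, alpha=0.7),
           Stump(feature=2, threshold=0.9, alpha=0.4)]),
]
print(cascade_predict(stages, [0.8, 0.3, 0.1]))   # passes both stages
print(cascade_predict(stages, [0.1, 0.9, 0.95]))  # rejected by stage 1
```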

Sound Object Detection: Results
[Figure: results, with panels labeled “Best Performance” and “Worst Performance”]

Framework for Interactive Event Detection
• Interactive event detection =?= non-indexed search
• Search and indexing:
  • If queries can be predicted in advance, indexing is possible (e.g., Google for text data)
  • Alternative is brute-force search through non-indexed data
• How to perform efficient non-indexed search?
  • May need to execute arbitrary code (learned event detector)

Brute-Force Search
• Event detection: vast majority of the data is useless
• Brute-force search scales poorly with storage volume
[Figure: User, Search app, and Storage, with arrows labeled query, results, and discard]

Diamond: Early Discard
• Reject as close to storage as possible
• Reduce volume of data transferred
• Scales much better!
[Figure: User, Search app, and Storage, with arrows labeled query, query’, results, early discard, and late discard]
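The difference between the two designs can be sketched by counting how many objects leave storage. This is a toy model under stated assumptions, not Diamond code: the function names are hypothetical and a plain list stands in for a storage node.

```python
from typing import Callable, Iterable, List, Tuple


def brute_force_search(
    objects: Iterable[int], matches: Callable[[int], bool]
) -> Tuple[List[int], int]:
    """Ship every object to the search app, then discard there."""
    transferred = list(objects)  # everything crosses the network
    results = [o for o in transferred if matches(o)]
    return results, len(transferred)


def early_discard_search(
    objects: Iterable[int], matches: Callable[[int], bool]
) -> Tuple[List[int], int]:
    """Run the filter next to storage and ship only the survivors."""
    transferred = [o for o in objects if matches(o)]
    return transferred, len(transferred)


def is_event(o: int) -> bool:
    # Toy predicate: 1% of objects are "interesting".
    return o % 100 == 0


stored = range(1000)
results_bf, moved_bf = brute_force_search(stored, is_event)
results_ed, moved_ed = early_discard_search(stored, is_event)
print(moved_bf, moved_ed)
```

Both functions return the same results; only the volume of data transferred differs, which is the point of rejecting objects as close to storage as possible.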

Diamond Architecture
[Figure: a Linux host runs the search application on the Searchlet API and host runtime; each of several storage nodes runs a searchlet on the Filter API and storage runtime (boxes also labeled “Assoc DMA”). Legend: app code (proprietary or open); Diamond API (open); Diamond code (open); storage access protocol (open)]
Diamond is a collaborative project between Intel Research & CMU

Anatomy of a Diamond Searchlet
• Sequence of partially-ordered “filters”
  • Each filter can pass or drop an object
  • Filters share state through attributes
• Diamond determines an optimal filter order
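A searchlet of this shape can be sketched as below. Everything here is an assumption for illustration: the `Filter` class, its `cost` and `pass_rate` fields, and the greedy ordering rule (cheapest cost per rejected object first, subject to dependencies) are hypothetical and are not the Diamond API or Diamond's actual ordering algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Filter:
    name: str
    run: Callable[[Dict], bool]  # reads/writes attributes; False drops the object
    cost: float                  # assumed average cost per object
    pass_rate: float             # assumed fraction of objects that pass
    requires: Set[str] = field(default_factory=set)  # filters that must run first


def order_filters(filters: List[Filter]) -> List[Filter]:
    """Greedy heuristic: among filters whose dependencies are met,
    run the one with the lowest cost per rejected object."""
    ordered: List[Filter] = []
    done: Set[str] = set()
    remaining = list(filters)
    while remaining:
        ready = [f for f in remaining if f.requires <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable filter dependencies")
        best = min(ready, key=lambda f: f.cost / max(1.0 - f.pass_rate, 1e-9))
        ordered.append(best)
        done.add(best.name)
        remaining.remove(best)
    return ordered


def run_searchlet(filters: List[Filter], obj: Dict) -> Optional[Dict]:
    """Return the object's attributes if every filter passes, else None."""
    attrs = dict(obj)
    for f in order_filters(filters):
        if not f.run(attrs):
            return None  # dropped; later filters never run
    return attrs


def decode(attrs: Dict) -> bool:
    attrs["pixels"] = list(attrs["raw"])  # share state via an attribute
    return True


filters = [
    Filter("face", lambda a: max(a["pixels"]) > 8, cost=50, pass_rate=0.1,
           requires={"decode"}),
    Filter("decode", decode, cost=5, pass_rate=1.0),
    Filter("is_recent", lambda a: a["year"] >= 2004, cost=1, pass_rate=0.2),
]
order = [f.name for f in order_filters(filters)]
kept = run_searchlet(filters, {"year": 2005, "raw": [3, 9, 4]})
dropped = run_searchlet(filters, {"year": 1999, "raw": [9]})
print(order, kept, dropped)
```

In this toy example the cheap, selective `is_recent` filter runs first, so most objects are dropped before the expensive `decode` and `face` filters are reached.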

Example Application: Forensic Video Surveillance
• Timely reconstruction of a crime scene
• Large quantities of video surveillance data
• Current practice: gather & manually scan video tapes
• Obvious optimization: transfer data to central site
• Better solution: send your detector to the data
[Figure: several cameras (cam), App, and Host]
[J. Campbell et al., VSSN 2004]

Video Action Detection: Goal

Idea: Treat Video as a Volume
[Figure: video volume with axes X, Y, T]

Related work: Recognition using SVMs on Space-Time Interest Points
[Figure: space-time interest points. Figures courtesy: Schuldt et al., ICPR 2004]

Problem with Space-Time Interest Points: Too Sparse
[Figure: two examples of smooth motions where no stable space-time interest points are detected]

Volumetric Features on Optical Flow

Our Features: 3D Extension of Viola-Jones
Volumetric features can be efficiently computed using integral volumes, with only 8 memory accesses per feature. The sum of the volume is e - a - f - g + b + c + h - d, where a through h label the corners of the box in the figure.
[Figure: volumetric features and the integral volume at (x, y, t); axes X, Y, T]
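The 8-access box sum can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names and the zero-padded layout are assumptions, and the corner letters a through h from the slide's figure are not mapped onto the terms because the figure is not reproduced here.

```python
from typing import List

Volume = List[List[List[int]]]  # indexed as v[t][y][x]


def integral_volume(v: Volume) -> Volume:
    """iv[t][y][x] holds the sum of v over all t' < t, y' < y, x' < x.

    The result is zero-padded, so its shape is (T+1, Y+1, X+1).
    """
    T, Y, X = len(v), len(v[0]), len(v[0][0])
    iv = [[[0] * (X + 1) for _ in range(Y + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(Y):
            for x in range(X):
                iv[t + 1][y + 1][x + 1] = (
                    v[t][y][x]
                    + iv[t][y + 1][x + 1] + iv[t + 1][y][x + 1] + iv[t + 1][y + 1][x]
                    - iv[t][y][x + 1] - iv[t][y + 1][x] - iv[t + 1][y][x]
                    + iv[t][y][x]
                )
    return iv


def box_sum(iv: Volume, x0: int, y0: int, t0: int,
            x1: int, y1: int, t1: int) -> int:
    """Sum of v over x0 <= x < x1, y0 <= y < y1, t0 <= t < t1.

    Uses exactly 8 lookups (3D inclusion-exclusion): one corner added,
    three subtracted, three added back, one subtracted.
    """
    return (
        iv[t1][y1][x1]
        - iv[t0][y1][x1] - iv[t1][y0][x1] - iv[t1][y1][x0]
        + iv[t0][y0][x1] + iv[t0][y1][x0] + iv[t1][y0][x0]
        - iv[t0][y0][x0]
    )


# Small synthetic volume: 2 frames of 3 rows by 4 columns.
v = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(3)] for t in range(2)]
iv = integral_volume(v)
fast = box_sum(iv, 1, 1, 0, 4, 3, 2)
slow = sum(v[t][y][x] for t in range(0, 2) for y in range(1, 3) for x in range(1, 4))
print(fast == slow)
```

The integral volume is built once per video; after that every box sum costs the same 8 lookups regardless of box size, which is what makes evaluating many volumetric features affordable.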

Classifier cascade learned using Direct Feature Selection, Wu et al., NIPS, 2002
Millions of potential features for selection, so AdaBoost is too slow.
[Figure: an example of the features learned by the classifier to recognize the hand-wave action in a detection volume; axes X, Y, T]

Detection
• Use a sliding volume over the video sequence
• Model a true event as a cluster of detections with a Gaussian distribution

Generic Volumetric Features
• Processing non-indexed video is slow: lots of data
• Are there application-independent representations for video?
• Goal: pre-process video once, support multiple video event apps
[Y. Ke, unpublished 2006]

Related work: Space-Time Behavior Based Correlation
[Figures courtesy: Shechtman & Irani, CVPR 2005]

Interactive Search-Assisted Diagnosis (ISAD)
[Figure: ISAD results for a query image of a suspicious mass, with the label “CLOSE?”. Rank 1: benign biopsy. Rank 2: benign biopsy. Rank 3: malignant biopsy]
Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin

Query-adaptive Local Distance Learning
• Previously:
  • Various Lp norms: Euclidean distance is typically not the best
  • Global metric learning:
    • Learn a metric that best satisfies user-given pairwise data constraints
    • Fares poorly with multimodal data
  • Local metric learning:
    • Learn a metric that does the above, but weighs nearby constraints higher
    • Chicken & egg problem
• What’s new:
  • Learn a metric for the given query based on its neighborhood

Summary
• Many real applications require interactive event detection
• Good for ML algorithms that:
  • operate with limited training data
  • train quickly/incrementally
  • exploit unlabeled data
• Diamond: infrastructure for efficient non-indexed search
  http://diamond.cs.cmu.edu/
• Interactive event detection in video is still painful
  • Good general-purpose representation for event detection?
