
Interactive Event Detection in Video and Audio

An overview of interactive event detection in video and audio, covering sound object detection, video action analysis, the challenges involved, and the Diamond framework for efficient non-indexed search.


Presentation Transcript


  1. Interactive Event Detection in Video and Audio • Rahul Sukthankar, Intel Research Pittsburgh & Carnegie Mellon University

  2. Contributors • Diamond team: L. Huston, Satya, L. Mummert, C. Helfrich, L. Fix • Forensic video retrieval: J. Campbell, P. Pillai, Diamond team • Volumetric video analysis: Y. Ke, M. Hebert • Sound object detection in soundtracks: D. Hoiem, Y. Ke • Interactive search-assisted diagnosis for breast cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic

  3. Why Interactive Event Detection? • Events of interest are often not known a priori • Data exploration: “find me more things like this” • User’s requirements change based on partial results • Surveillance: “Alert me if you see X… hmm… actually I want Y” • Challenges: • Limited training data • can we still learn good event detectors? • Efficiency • how best to organize/index/pre-process the data?

  4. Outline • Event detection in audio • sound object detection from a few examples • Diamond • efficient search of non-indexed data • Event detection in video • forensic video surveillance • volumetric analysis for action detection

  5. Example: Sound Object Detection • Applications of sound object detection • “Alert me if you hear a gunshot.” (monitoring) • “Fast forward to the next swordfight in LotR” (search and retrieval) • Approach: • Learn boosted classifier from ~5-10 examples of the object • Scan windowed classifier over all possible locations • Pipeline (from the slide diagram): audio stream is split into clips 1 … N; the clip classifier classifies each clip as object or non-object; the locations of the detected sound object are returned [D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
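
The scanning step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `scan_stream` and `energy_classifier`, the window and hop sizes, and the toy energy threshold that stands in for the learned boosted classifier are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def scan_stream(
    samples: Sequence[float],
    classify_clip: Callable[[Sequence[float]], bool],
    window: int,
    hop: int,
) -> List[Tuple[int, int]]:
    """Slide a fixed-size window over the stream; return (start, end) of clips flagged as the object."""
    detections = []
    for start in range(0, len(samples) - window + 1, hop):
        clip = samples[start:start + window]
        if classify_clip(clip):
            detections.append((start, start + window))
    return detections


def energy_classifier(clip: Sequence[float]) -> bool:
    # Hypothetical stand-in for the learned clip classifier:
    # flags a clip whose mean energy exceeds a fixed threshold.
    return sum(x * x for x in clip) / len(clip) > 0.5


# Toy stream: silence, a short loud burst, silence.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = scan_stream(stream, energy_classifier, window=4, hop=2)
print(hits)
```

Any callable that maps a clip to a boolean can be passed as `classify_clip`, so a trained classifier could replace the toy threshold without changing the scan loop.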

  6. Sound Object Detection: Clip Classifier • Feature extraction • Weak classifier: small decision trees on features • Learn classifier cascade using AdaBoost … [diagram labels: 138 features, decision nodes, leaf nodes] [D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005]
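
To make the boosting step concrete, here is a small sketch of discrete AdaBoost. It simplifies the slide in two ways that are assumptions of this example: the weak learners are depth-1 decision stumps rather than small decision trees, and no cascade is built. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, polarity, vote weight alpha)
Stump = Tuple[int, float, int, float]


def stump_predict(x: Sequence[float], feature: int, threshold: float, polarity: int) -> int:
    """Return +polarity if the feature exceeds the threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity


def train_adaboost(X: List[Sequence[float]], y: List[int], rounds: int) -> List[Stump]:
    n = len(X)
    weights = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(
                        w for w, x, label in zip(weights, X, y)
                        if stump_predict(x, f, thr, pol) != label
                    )
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, thr, pol, alpha))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(x, f, thr, pol))
            for w, x, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], x: Sequence[float]) -> int:
    """Weighted vote of all stumps; returns +1 (object) or -1 (non-object)."""
    score = sum(alpha * stump_predict(x, f, thr, pol) for f, thr, pol, alpha in model)
    return 1 if score > 0 else -1


# Toy data: two features per clip, labels in {-1, +1}.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```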

  7. Sound Object Detection: Results [figure panels: Best Performance, Worst Performance]

  8. Framework for Interactive Event Detection • Interactive event detection =?= non-indexed search • Search and indexing: • If queries can be predicted in advance, indexing is possible (e.g., Google for text data) • Alternative is brute-force search through non-indexed data • How to perform efficient non-indexed search? • May need to execute arbitrary code (learned event detector)

  9. Brute-Force Search • Event detection: vast majority of the data is useless • BFS scales poorly with storage volume [diagram: User, Search app, Storage; labels: query, results, discard]

  10. Diamond: Early Discard • Reject as close to storage as possible • Reduce volume of data transferred • Scales much better! [diagram: User, Search app, Storage; labels: query, query’, results, early discard, late discard]

  11. Diamond Architecture [diagram labels: Search Application, Searchlet API, Host runtime, Linux; on each storage node: Storage Runtime, Filter API, Searchlet, Assoc DMA] • Legend: App Code (proprietary or open), Diamond API (open), Diamond code (open), Storage access protocol (open) • Diamond is a collaborative project between Intel Research & CMU

  12. Anatomy of a Diamond Searchlet • Sequence of partially-ordered “filters” • each filter can pass or drop an object • filters share state through attributes • Diamond determines an optimal filter order
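
The searchlet structure described on this slide can be illustrated with a short sketch. It is not the Diamond API: the `Filter` class, `order_filters`, `run_searchlet`, and the example filters are hypothetical, and the ordering rule (among filters whose dependencies are satisfied, run the one with the lowest cost per unit of drop probability first) is a common heuristic assumed here, not a description of how Diamond actually chooses its order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Filter:
    name: str
    func: Callable[[dict], bool]  # True = pass, False = drop; may write attributes
    cost: float                   # assumed average cost per object
    pass_rate: float              # assumed fraction of objects that pass
    deps: Set[str] = field(default_factory=set)  # filters that must run first


def order_filters(filters: List[Filter]) -> List[Filter]:
    """Greedy order that respects the partial order given by `deps`."""
    remaining: Dict[str, Filter] = {f.name: f for f in filters}
    done: Set[str] = set()
    ordered: List[Filter] = []
    while remaining:
        ready = [f for f in remaining.values() if f.deps <= done]
        if not ready:
            raise ValueError("filter dependencies contain a cycle")
        # Cheap, highly selective filters go first.
        best = min(ready, key=lambda f: f.cost / max(1.0 - f.pass_rate, 1e-9))
        ordered.append(best)
        done.add(best.name)
        del remaining[best.name]
    return ordered


def run_searchlet(ordered: List[Filter], obj: dict) -> Optional[dict]:
    """Apply filters in order; return shared attributes, or None if dropped."""
    attrs = dict(obj)  # attributes are the state shared between filters
    for f in ordered:
        if not f.func(attrs):
            return None  # early discard: later (costlier) filters never run
    return attrs


def decode(attrs: dict) -> bool:
    attrs["tokens"] = attrs["data"].split()  # publish an attribute for later filters
    return True


filters = [
    Filter("face", lambda a: "face" in a["tokens"], cost=50.0, pass_rate=0.1, deps={"decode"}),
    Filter("decode", decode, cost=5.0, pass_rate=1.0),
    Filter("color", lambda a: "red" in a["tokens"], cost=1.0, pass_rate=0.2, deps={"decode"}),
    Filter("size_check", lambda a: a["size"] > 10, cost=0.1, pass_rate=0.5),
]
ordered = order_filters(filters)
order_names = [f.name for f in ordered]
kept = run_searchlet(ordered, {"size": 100, "data": "red face"})
dropped = run_searchlet(ordered, {"size": 5, "data": "red face"})
print(order_names, kept is not None, dropped)
```

In this toy run the inexpensive `size_check` filter is scheduled before `decode`, so small objects are discarded before any decoding cost is paid.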

  13. Example Application: Forensic Video Surveillance • Timely reconstruction of a crime scene • Large quantities of video surveillance data • Current practice: gather & manually scan video tapes • Obvious optimization: transfer data to central site • Better solution: send your detector to the data [diagram: five cameras, App, Host] [J. Campbell et al., VSSN 2004]

  14. Video Action Detection: Goal

  15. Idea: Treat Video as a Volume [figure axes: X, Y, T]

  16. Related work: Recognition using SVMs on Space-Time Interest Points [figure: space-time interest points] Figures courtesy: [Schuldt et al., ICPR 2004]

  17. Problem with Space-Time Interest Points: Too Sparse • Two examples of smooth motions where no stable space-time interest points are detected.

  18. Volumetric Features on Optical Flow

  19. Our Features: 3D Extension of Viola-Jones • Volumetric features [figure axes: X, Y, T] • Integral Volume (x, y, t) • Volumetric features can be efficiently computed using integral volumes, with only 8 memory accesses per feature. The sum of the volume is e - a - f - g + b + c + h - d, where a through h are the integral-volume values at the eight corners of the box as labeled in the figure.
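
The eight-lookup box sum can be checked with a small sketch. This is an illustrative pure-Python version under assumptions of this example: the video is a nested list indexed as `video[t][y][x]`, the integral volume is zero-padded by one cell on each axis, and box bounds are half-open. The corner letters a through h from the slide's figure are not reproduced; the code names the corners by their coordinates instead.

```python
from typing import List

Volume = List[List[List[int]]]


def integral_volume(video: Volume) -> Volume:
    """Return iv where iv[t][y][x] is the sum of video over [0,t) x [0,y) x [0,x)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    iv = [[[0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # 3D inclusion-exclusion over the seven previously filled neighbors.
                iv[t + 1][y + 1][x + 1] = (
                    video[t][y][x]
                    + iv[t][y + 1][x + 1] + iv[t + 1][y][x + 1] + iv[t + 1][y + 1][x]
                    - iv[t][y][x + 1] - iv[t][y + 1][x] - iv[t + 1][y][x]
                    + iv[t][y][x]
                )
    return iv


def box_sum(iv: Volume, t0: int, t1: int, y0: int, y1: int, x0: int, x1: int) -> int:
    """Sum of the video over [t0,t1) x [y0,y1) x [x0,x1) using exactly 8 lookups."""
    return (
        iv[t1][y1][x1]
        - iv[t0][y1][x1] - iv[t1][y0][x1] - iv[t1][y1][x0]
        + iv[t0][y0][x1] + iv[t0][y1][x0] + iv[t1][y0][x0]
        - iv[t0][y0][x0]
    )


# Toy video: 2 frames of 3 x 4 pixels, all ones.
video = [[[1] * 4 for _ in range(3)] for _ in range(2)]
iv = integral_volume(video)
total = box_sum(iv, 0, 2, 0, 3, 0, 4)
print(total)
```

Four corner values enter with a plus sign and four with a minus sign, matching the sign pattern of the formula on the slide. The cost of a box sum is independent of the box size, which is what makes scanning many volumetric features affordable.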

  20. Classifier cascade learned using Direct Feature Selection, Wu et al., NIPS, 2002 • Millions of potential features for selection, so AdaBoost is too slow. • Figure (axes X, Y, T): an example of the features learned by the classifier to recognize the hand-wave action in a detection volume

  21. Detection • Use a sliding volume over video sequence • Model true event as a cluster of detections with Gaussian distribution.
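
The two detection steps can be sketched as follows. The slide does not say how detections are grouped, so the greedy distance-based grouping, the `radius` and `min_count` parameters, and the function names are all assumptions of this example; only the ideas of sliding a volume over the video and summarizing a cluster of detections by a Gaussian (mean and variance) come from the slide.

```python
from statistics import mean, pvariance
from typing import Iterator, List, Tuple

Point = Tuple[int, int, int]  # (x, y, t) origin of a detection volume


def sliding_volumes(shape: Point, size: Point, stride: Point) -> Iterator[Point]:
    """Yield the (x, y, t) origin of every volume of `size` that fits in `shape`."""
    W, H, T = shape
    w, h, d = size
    sx, sy, st = stride
    for t in range(0, T - d + 1, st):
        for y in range(0, H - h + 1, sy):
            for x in range(0, W - w + 1, sx):
                yield (x, y, t)


def cluster_detections(points: List[Point], radius: float, min_count: int):
    """Group nearby detections; describe each large group by per-axis mean and variance."""
    clusters: List[List[Point]] = []
    for p in points:
        for c in clusters:
            centre = tuple(mean(q[i] for q in c) for i in range(3))
            if sum((a - b) ** 2 for a, b in zip(p, centre)) <= radius ** 2:
                c.append(p)
                break
        else:
            clusters.append([p])  # no existing cluster is close enough
    events = []
    for c in clusters:
        if len(c) >= min_count:  # isolated detections are treated as noise
            mu = tuple(mean(q[i] for q in c) for i in range(3))
            var = tuple(pvariance([q[i] for q in c]) for i in range(3))
            events.append((mu, var))
    return events


origins = list(sliding_volumes(shape=(8, 8, 8), size=(4, 4, 4), stride=(2, 2, 2)))
detections = [(10, 10, 5), (11, 10, 5), (10, 11, 6), (50, 50, 40)]
events = cluster_detections(detections, radius=3.0, min_count=3)
print(len(origins), events)
```

In the toy run the three nearby detections form one event, while the isolated detection at (50, 50, 40) is discarded.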

  22. Generic Volumetric Features • Processing non-indexed video is slow – lots of data • Are there application-independent representations for video? • Goal: pre-process video once, support multiple video event apps. [Y. Ke, unpublished 2006]

  23. Related work: Space-Time Behavior Based Correlation • Figures courtesy: [Shechtman & Irani, CVPR 2005]

  24. Interactive Search-Assisted Diagnosis (ISAD) • Query: suspicious mass • ISAD results: Rank 1: benign biopsy; Rank 2: benign biopsy; Rank 3: malignant biopsy [figure label: CLOSE?] • Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin

  25. Query-adaptive Local Distance Learning • Previously: • Various Lp norms: Euclidean distance is typically not the best • Global metric learning: • Learn metric that best satisfies user-given pairwise data constraints • Fares poorly with multimodal data • Local metric learning: • Learn metric that does above, but weighs nearby constraints higher • Chicken & egg problem • What’s new: • Learn a metric for the given query based on neighborhood

  26. Summary • Many real applications require interactive event detection • Good for ML algorithms that: • operate with limited training data • train quickly/incrementally • exploit unlabeled data • Diamond – infrastructure for efficient non-indexed search http://diamond.cs.cmu.edu/ • Interactive event detection in video is still painful • Good general-purpose representation for event detection?
