
Sound Detection

Sound Detection. Derek Hoiem, Rahul Sukthankar (mentor). August 24, 2004. Objective: learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds. Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.



Presentation Transcript


  1. Sound Detection • Derek Hoiem • Rahul Sukthankar (mentor) • August 24, 2004

  2. Objective • Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds • Examples of sound classes: • Gunshots, screams, laughter, car horns, meows, dog barks, etc.

  3. Applications • “Tell me if you hear a gunshot.” (monitoring) • “Get me video clips containing dogs barking.” (search and retrieval) • “What’s going on?” (scene understanding)

  4. Why it's difficult • Sound classes have large variations • Sounds are often ambiguous without context • Overlaid “noise” obscures the sound

  5. Sound or not? • Which of these sounds are not from their named classes? • Car horn • Dog bark • Laser gun

  6. Previous work • Sound Classification (Wold 1996, Casey 2001, etc.) • Categorize short sound clips • Reasonable accuracy (5-20% error) • Sound Detection (Dufaux 2000, Piamsa-nga 1999) • Localize and recognize sound objects in long clips • Poor performance or assumption of unrealistic conditions (e.g., very quiet background)

  7. Detection via Windowed Search • Break the long audio track into short overlapping clips (Clip 1, Clip 2, …, Clip N) • Independently classify each short clip as object or non-object (clip classifier) • Return the locations of the detected sound object
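
The windowed search on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `clip_len` and `hop` parameters, and the idea of merging overlapping detections are all assumptions, and `classify` stands in for whatever clip classifier is used.

```python
from typing import Callable, List, Sequence, Tuple

def windowed_search(track: Sequence[float], clip_len: int, hop: int,
                    classify: Callable[[Sequence[float]], float],
                    threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over the track and keep the windows
    whose classifier score exceeds the threshold (hypothetical API)."""
    detections = []
    # Windows overlap whenever hop < clip_len.
    for start in range(0, len(track) - clip_len + 1, hop):
        clip = track[start:start + clip_len]
        if classify(clip) > threshold:
            detections.append((start, start + clip_len))
    return detections

def merge_overlaps(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, end) ranges into single events."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy usage: the "classifier" is just mean absolute amplitude minus 0.5.
track = [0.0] * 100 + [1.0] * 50 + [0.0] * 100
energy = lambda clip: sum(abs(x) for x in clip) / len(clip) - 0.5
hits = windowed_search(track, clip_len=50, hop=25, classify=energy)
```

Each clip is scored independently, so the search parallelizes trivially; the cost is that context outside the window is ignored.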

  8. Representation • Time-frequency analysis: windowed Fourier transform • Extract power percentage in each band over time and total power over time • Compute features used for classification (Figure: raw representation and extracted features for meows and phone rings)
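
As a rough sketch of this representation, the code below computes a windowed Fourier transform frame by frame and reports the share of power in each frequency band plus the total power per frame. The frame length, hop size, Hann window, and equal-width bands are assumptions chosen for illustration; the slides do not specify them. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math

def frame_power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def band_power_percentages(signal, frame_len=64, hop=32, n_bands=4):
    """Return (per-frame band power fractions, per-frame total power).

    Bands are equal-width groups of DFT bins (an assumption); at most
    n_bands groups are produced."""
    percentages, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = frame_power_spectrum(signal[start:start + frame_len])
        total = sum(power)
        band_size = math.ceil(len(power) / n_bands)
        bands = [sum(power[b:b + band_size])
                 for b in range(0, len(power), band_size)]
        percentages.append([p / total if total > 0 else 0.0 for p in bands])
        totals.append(total)
    return percentages, totals

# Toy usage: a pure tone at DFT bin 4 should land in the lowest band.
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(128)]
pct, tot = band_power_percentages(tone)
```

Dividing by total power separates "where the energy is" from "how loud it is", which is why total power is kept as its own time series.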

  9. Classification Features • Diverse feature set: • Different sound classes are distinctive in different ways • Means and standard deviations of power at different frequencies • Bandwidth, peaks, loudness, etc. • 138 features in all
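
To make the feature types concrete, here is a small sketch that summarizes per-frame band power fractions and total power into a handful of statistics. The feature names and the specific statistics are illustrative assumptions; they are not the 138 features used in the presentation.

```python
from statistics import mean, pstdev

def summarize_clip(band_percentages, total_power):
    """Collapse per-frame measurements into one feature dictionary.

    band_percentages: list of frames, each a list of band power fractions.
    total_power: list with one total-power value per frame."""
    features = {}
    n_bands = len(band_percentages[0])
    for b in range(n_bands):
        series = [frame[b] for frame in band_percentages]
        features[f"band{b}_mean"] = mean(series)
        features[f"band{b}_std"] = pstdev(series)
    features["loudness_mean"] = mean(total_power)
    features["loudness_range"] = max(total_power) - min(total_power)
    # Relative position (0..1) of the loudest frame within the clip.
    features["peak_time"] = total_power.index(max(total_power)) / len(total_power)
    return features

# Toy usage with two frames and two bands.
feats = summarize_clip([[0.5, 0.5], [1.0, 0.0]], [2.0, 4.0])
```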

  10. Classification by Decision Trees • Try to find simple rules that discriminate object from non-object • Each decision is based on a threshold of a feature value • Assign confidence based on likelihood of data for object and non-object classes at each leaf node (Diagram: decision nodes and leaf nodes)
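
A one-level tree (a decision stump) is enough to illustrate thresholded decisions with confidence-rated leaves. In the sketch below, the leaf confidence is half the log ratio of object to non-object weight reaching the leaf; that formula and the split criterion are assumptions borrowed from confidence-rated boosting, since the slides do not give the exact rule.

```python
import math

def leaf_confidence(w_pos, w_neg, eps=1e-6):
    """Positive when object examples dominate the leaf, negative otherwise."""
    return 0.5 * math.log((w_pos + eps) / (w_neg + eps))

def fit_stump(X, y, weights=None):
    """Pick the (feature, threshold) split whose leaves are purest.

    X: list of feature vectors; y: labels in {+1, -1}."""
    n = len(X)
    weights = weights or [1.0 / n] * n
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            mass = {(side, lab): 0.0 for side in (False, True) for lab in (-1, 1)}
            for row, lab, w in zip(X, y, weights):
                mass[(row[f] > thr, lab)] += w
            # Impurity: small when each leaf is dominated by one class.
            z = 2 * sum(math.sqrt(mass[(s, 1)] * mass[(s, -1)]) for s in (False, True))
            if best is None or z < best[0]:
                best = (z, f, thr,
                        leaf_confidence(mass[(False, 1)], mass[(False, -1)]),
                        leaf_confidence(mass[(True, 1)], mass[(True, -1)]))
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    _, f, thr, low, high = best
    return {"feature": f, "threshold": thr, "low": low, "high": high}

def stump_predict(stump, row):
    """Return the confidence of the leaf that the row falls into."""
    return stump["high"] if row[stump["feature"]] > stump["threshold"] else stump["low"]

# Toy usage: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5], [0.2, 1], [0.8, 4], [0.9, 2]]
y = [-1, -1, 1, 1]
stump = fit_stump(X, y)
```

Deeper trees apply the same split search recursively to the examples in each leaf.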

  11. Boosted Trees • Problem: One decision tree by itself may not be a great classifier • Solution: Use several trees, with each one focusing on the mistakes of previously learned trees • AdaBoost: • Weight training data uniformly • Learn a decision tree classifier on the weighted data • Re-weight the data, giving more weight to incorrectly classified examples • Repeat the learn and re-weight steps for each additional tree • Final classification is based on a linear combination of confidences from all learned decision trees
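
The AdaBoost loop on this slide can be sketched as below. To stay short, the weak learner is a one-level threshold rule rather than a full decision tree, and each learner outputs a hard +1/-1 vote scaled by its weight instead of a per-leaf confidence; both are simplifying assumptions, and all function names are hypothetical.

```python
import math

def fit_weighted_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best

def stump_output(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] > thr else -sign

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n                      # 1. weight training data uniformly
    ensemble = []
    for _ in range(rounds):
        stump = fit_weighted_stump(X, y, w)  # 2. learn on weighted data
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # 3. re-weight: misclassified examples (y * h = -1) gain weight
        w = [wi * math.exp(-alpha * yi * stump_output(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def score(ensemble, row):
    """Final decision value: linear combination of the weak learners."""
    return sum(alpha * stump_output(s, row) for alpha, s in ensemble)

# Toy usage: no single threshold separates this pattern, three rounds do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
```

The sign of `score` gives the predicted class and its magnitude serves as a confidence.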

  12. Examples of Decision Trees (Figure: example trees for Meow and Gunshot. Node labels: “Low percentage of power in low frequencies in mid-time of sound”, “High power amplitude range”, “Very high power amplitude range”. A second Gunshot tree is captioned: more complex tree that focuses on examples misclassified by the tree above.)

  13. Cascade of Classifiers • Goal: eliminate false positives with few false negatives in early stages • Advantages: • Allows use of a large set of negative training examples • Improves classification speed • Dangers: cannot recover from false negatives (Diagram: Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass, with Fail exits at each stage; pass rates labeled 5%, 2%, and 0.005%)
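
A cascade of this kind can be sketched as a list of (scoring function, threshold) stages in which a clip is rejected at the first stage it fails. The stage functions and the threshold-picking helper below are hypothetical; choosing each threshold from the scores of positive training examples is one plausible way to keep early-stage false negatives rare.

```python
def run_cascade(clip, stages):
    """Return (accepted, number of stages passed).

    stages: list of (score_fn, threshold). A clip that scores below a
    stage's threshold is rejected immediately and never re-examined."""
    for index, (score_fn, threshold) in enumerate(stages):
        if score_fn(clip) < threshold:
            return False, index
    return True, len(stages)

def pick_threshold(positive_scores, max_miss_rate=0.01):
    """Largest threshold that rejects at most max_miss_rate of the positives."""
    ordered = sorted(positive_scores)
    allowed_misses = int(len(ordered) * max_miss_rate)
    return ordered[allowed_misses]

# Toy usage: a cheap peak test first, then a mean-level test.
stages = [
    (lambda clip: max(clip), 0.5),
    (lambda clip: sum(clip) / len(clip), 0.3),
]
quiet = run_cascade([0.1, 0.2], stages)
loud = run_cascade([0.9, 0.0], stages)
spiky = run_cascade([0.6, 0.0, 0.0], stages)
```

Because most negatives exit at the first stage, later (more expensive) stages only run on a small fraction of clips.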

  14. Results: Classification Error (Figure: classification error, with best and worst performance marked)

  15. Results: ROC curves • Note: to approximate the negative error rate, divide FP (the false positive count) by 25,000
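
The conversion in the note is simple arithmetic, shown here alongside a small sketch of how ROC points with a false-positive-count axis could be computed. The 25,000 divisor comes from the slide; the function names, the toy scores, and the count of 50 false positives are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive count, true positive rate) pairs obtained by
    lowering the threshold one example at a time (ties are not grouped)."""
    n_pos = sum(1 for label in labels if label == 1)
    points, tp, fp = [], 0, 0
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp / n_pos))
    return points

def negative_error_rate(fp_count, n_negatives=25_000):
    """Approximate false positive rate from a raw false positive count."""
    return fp_count / n_negatives

# Toy usage with made-up numbers.
points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
rate = negative_error_rate(50)
```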

  16. Results: Anecdotal • Gunshots • Female Laugh • Male Laugh • Swords • Scream
