Classifying Motion Picture Audio

Eirik Gustavsen, 07.06.07

Presentation Transcript


  1. Classifying Motion Picture Audio Eirik Gustavsen 07.06.07

  2. Outline
     • Motivation
     • Thesis
     • State of the Art
     • Proposed system
     • Experimental setup
     • Results
     • Future work
     • Conclusion

  3. Motivation
     • Most projects classify clear classes or classes with noise.
     • Few clear boundaries in motion picture audio
     • Subjective descriptions of movies
     • Difficult to compare movie content

  4. Thesis
     It is possible to automatically create a table of contents of a motion picture, based on its audio track only.

  5. Research questions
     • Find the best low level descriptors (LLDs) to classify motion picture audio
     • Detect boundaries between audio classes within complex audio segments
     • Automatically create a table of contents (TOC) based on the audio track only

  6. Pre-Processing
     • 44100 Hz sample rate
     • Mono
     • 16 bits
     • 30 ms windows (LW)
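
The windowing step on this slide can be illustrated with a minimal sketch. It assumes non-overlapping windows and a signal that is already decoded to a list of 16-bit mono samples; the slide does not say whether the windows overlap, and the function names `frame_signal` and `pcm16_to_float` are hypothetical.

```python
def pcm16_to_float(samples):
    """Map signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def frame_signal(samples, sample_rate=44100, window_ms=30):
    """Split a mono signal into consecutive windows of window_ms milliseconds.

    Assumption: windows do not overlap, and a trailing partial window is dropped.
    """
    window_len = int(sample_rate * window_ms / 1000)  # 1323 samples at 44100 Hz
    n_frames = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_frames)]


if __name__ == "__main__":
    one_second = pcm16_to_float([0] * 44100)
    frames = frame_signal(one_second)
    print(len(frames), len(frames[0]))  # 33 windows of 1323 samples each
```

At 44100 Hz a 30 ms window holds 1323 samples, so one second of audio yields 33 full windows under these assumptions.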

  7. Low Level Descriptors
     • Time domain
     • Frequency domain

  8. Low Level Descriptors
     • Total of 23 low level descriptors
     • TIME DOMAIN
       • Audio Power
       • Audio Wave Form
       • Root-Mean Square
       • Short Time Energy
       • Low Short Time Energy Ratio
       • Zero-Crossing Rate
       • High Zero-Crossing Rate Ratio
     • FREQUENCY DOMAIN
       • Audio Spectrum Centroid
       • Fundamental Frequency
       • 10 Mel-Frequency Cepstral Coefficients
       • Spectrum Flux
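
Several of the time-domain descriptors listed above can be sketched per window as follows. The slides give no formulas, so these follow common textbook definitions and are assumptions: short time energy is taken as the mean of squared samples, and the two ratio descriptors use the thresholds 1.5 × mean ZCR and 0.5 × mean energy that often appear in the literature. All function names are hypothetical.

```python
import math


def short_time_energy(frame):
    """Mean of the squared sample values in one window (assumed definition)."""
    return sum(x * x for x in frame) / len(frame)


def root_mean_square(frame):
    """Square root of the mean squared sample value."""
    return math.sqrt(short_time_energy(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def high_zcr_ratio(zcrs, factor=1.5):
    """Share of windows whose ZCR exceeds factor * mean ZCR (assumed threshold)."""
    mean = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > factor * mean) / len(zcrs)


def low_ste_ratio(energies, factor=0.5):
    """Share of windows whose energy is below factor * mean energy (assumed threshold)."""
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < factor * mean) / len(energies)


if __name__ == "__main__":
    frame = [1.0, -1.0, 1.0, -1.0]
    print(root_mean_square(frame), zero_crossing_rate(frame))
```

The two ratio functions take a sequence of per-window values, for example all windows inside a longer segment, and return one number for that segment.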

  9. Dimensionality reduction
     Principal components analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis.
     f(1), f(2), f(3), f(4), f(5), ..., f(23) → PCA → d(1), d(2), d(3)
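
The reduction from a descriptor vector to three dimensions can be sketched in plain Python. This is one simple way to compute PCA (covariance matrix, power iteration, deflation) and is not necessarily the method used in the thesis; the function names, the fixed random seed, and the iteration count are assumptions. It also assumes the data has at least `n_components` directions with non-zero variance.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    dim = len(matrix)
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance left in any direction
            break
        vec = [x / norm for x in nxt]
    return vec


def pca(rows, n_components=3):
    """Project rows onto their first n_components principal directions."""
    centered = center(rows)
    cov = covariance(centered)
    dim = len(cov)
    components = []
    for _ in range(n_components):
        vec = top_eigenvector(cov)
        eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
                     for i in range(dim))
        components.append(vec)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(r[j] * c[j] for j in range(dim)) for c in components]
            for r in centered]
```

Calling `pca(rows, n_components=3)` on a list of 23-element descriptor vectors returns one 3-element vector per input row. The sign of each component is arbitrary, as is usual for PCA.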

  10. K Nearest Neighbors
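
A minimal sketch of k-nearest-neighbors classification follows, assuming Euclidean distance and a simple majority vote. The slide does not state the distance measure or the value of k, so both are assumptions, as are the function name and the toy training points.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 3-dimensional points standing in for PCA-reduced descriptor vectors.
    points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
              (1.0, 1.0, 1.0), (1.1, 1.0, 1.0)]
    labels = ["speech", "speech", "speech", "music", "music"]
    print(knn_classify(points, labels, (0.05, 0.0, 0.0)))
```

`math.dist` requires Python 3.8 or later.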

  11. Proposed system
     Diagram blocks: Pre-processing, LLD, Norm, PCA, KNN, Post-processing, TOC generation
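
The last two blocks of the diagram, post-processing and TOC generation, are not described further in the slides. One plausible reading is sketched here: smooth the per-window class labels with a sliding majority vote, then merge runs of identical labels into table-of-contents entries. The smoothing radius, the entry format, and the function names are all assumptions.

```python
from collections import Counter


def smooth_labels(labels, radius=2):
    """Replace each label by the majority label in a window around it."""
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius):i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def build_toc(labels, window_s=0.03):
    """Merge runs of identical labels into (start_s, end_s, label) entries."""
    toc = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            toc.append((round(start * window_s, 3), round(i * window_s, 3), labels[start]))
            start = i
    return toc


if __name__ == "__main__":
    raw = ["music"] * 3 + ["speech"] + ["music"] * 3 + ["speech"] * 4
    for entry in build_toc(smooth_labels(raw, radius=1)):
        print(entry)
```

In the example the single isolated "speech" window inside the music run is voted away, so the result contains one music entry followed by one speech entry. The points where consecutive entries meet are the detected class boundaries under this reading.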

  12. Classifying Audio
     • Music
     • Speech
     • Mixed audio classes
     • Noise (white)
     • "Silence"

  13. Class Boundary Detection

  14. Class Boundary Detection

  15. Class Boundary Detection

  16. Finding most suitable LLDs
     Most suitable: ASC (Audio Spectrum Centroid), AWF (Audio Wave Form), RMS (Root-Mean Square), HZCRR (High Zero-Crossing Rate Ratio)
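
As an illustration of the one frequency-domain descriptor in this shortlist, the sketch below computes a plain spectral centroid: the magnitude-weighted mean frequency of one window, in Hz. This is a simplification and an assumption. If ASC here refers to the MPEG-7 descriptor, that definition uses a log-frequency scale, which this sketch does not reproduce. The naive DFT is used only to keep the example free of dependencies, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0 .. n // 2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate=44100):
    """Magnitude-weighted mean frequency of one window, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:  # silent window: no meaningful centroid
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


if __name__ == "__main__":
    # A sine that falls exactly on DFT bin 8 of a 64-sample window.
    n, rate = 64, 6400
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
    print(spectral_centroid(tone, sample_rate=rate))
```

With a 6400 Hz sample rate and 64 samples each bin is 100 Hz wide, so the test tone sits at 800 Hz and the centroid comes out at approximately that value. For real 1323-sample windows an FFT routine would be the practical choice.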

  17. Sample Results
     • Music with low volume
     • Clear speech
     • Speech with background music
     • Speech with background environmental sounds
     • Jingle
     • "Some mistakes"
     • Fading between music and speech

  18. Future Work
     • To be done in this thesis
       • Post-processing
       • TOC
     • Open research questions for future work
       • New motion picture audio classes
       • Detecting sound objects
       • Speech recognition

  19. Conclusion
     • Pre-processing makes it possible to classify motion picture audio correctly
     • Using the right combination of LLDs enhances the result of the classification

  20. Questions?
