Timbre and Modulation Features for Music Genre/Mood Classification

J.-S. Roger Jang & Jia-Min Ren, Multimedia Information Retrieval Lab, Dept. of CSIE, National Taiwan University.


Presentation Transcript


  1. Timbre and Modulation Features for Music Genre/Mood Classification • J.-S. Roger Jang & Jia-Min Ren • Multimedia Information Retrieval Lab, Dept. of CSIE, National Taiwan University

  2. Outline Audio features and modulation spectral analysis MIREX 2011 method and its improvement Experimental setup and results Conclusions and future work

  3. Introduction – music genres/moods • Descriptions of music content *pictures from www.playonradio.com, brainpickings.org & mpac.ee.ntu.edu.tw

  4. Motivation • Rapid growth of digital music • Apple iTunes: 28 million songs; 7digital: 20 million tracks • Organization of large collections of audio music • Important but challenging • Manual labeling by tags: labor-intensive/time-consuming • Thus, machine learning for classification is called for! • Flowchart: music clips for training → feature extraction → classifier training; music clip for test → feature extraction → evaluation → result • Features: short-term (MFCC, OSC) and long-term (beat, tempo, pitch) • Classifiers: KNN, GMM, SVM

  5. System overview

  6. Performance evaluation • Dataset-dependent criteria for evaluation • GTZAN • 10-fold cross-validation • ISMIR2004Genre • Holdout test, the same as the one used in the ISMIR 2004 Genre Classification Contest, with 729 clips for training and 729 clips for test

  7. Audio features – short-term timbre features • Statistical spectrum descriptors (SSD) • Spectral centroid (SC) • Spectral flux (SF) • Spectral rolloff (SR) • Spectral skewness (SS) • Spectral kurtosis (SK) • MFCC • To model the subjective frequency content of audio signals • 21-dim (including energy)
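
The five SSD values can be computed per frame from a magnitude spectrogram. The sketch below is a minimal NumPy version under common textbook definitions, which are assumptions rather than the authors' exact formulas: each frame's normalized spectrum is treated as a distribution over frequency (centroid, skewness, kurtosis), flux is the squared difference between consecutive normalized spectra, and rolloff uses a hypothetical 85% threshold. `ssd_features` is a hypothetical helper name.

```python
import numpy as np

def ssd_features(mag, freqs, rolloff_pct=0.85):
    """Per-frame statistical spectrum descriptors (hypothetical helper).

    mag:   (n_frames, n_bins) magnitude spectrogram
    freqs: (n_bins,) bin center frequencies in Hz
    Returns (n_frames, 5): centroid, flux, rolloff, skewness, kurtosis.
    """
    eps = 1e-12
    # Treat each frame's normalized magnitude spectrum as a distribution over frequency.
    p = mag / (mag.sum(axis=1, keepdims=True) + eps)
    centroid = p @ freqs
    # Standardized central moments of that distribution around the centroid.
    dev = freqs[None, :] - centroid[:, None]
    std = np.sqrt((p * dev**2).sum(axis=1)) + eps
    skewness = (p * dev**3).sum(axis=1) / std**3
    kurtosis = (p * dev**4).sum(axis=1) / std**4
    # Rolloff: lowest bin frequency below which rolloff_pct of the magnitude lies.
    cum = np.cumsum(mag, axis=1)
    rolloff = freqs[np.argmax(cum >= rolloff_pct * cum[:, -1:], axis=1)]
    # Flux: squared change between consecutive normalized spectra (0 for the first frame).
    flux = np.zeros(mag.shape[0])
    flux[1:] = ((p[1:] - p[:-1]) ** 2).sum(axis=1)
    return np.stack([centroid, flux, rolloff, skewness, kurtosis], axis=1)

# Usage: a steady single-bin spectrum has its centroid and rolloff at that bin and zero flux.
freqs = np.linspace(0, 11025, 513)
mag = np.zeros((3, 513))
mag[:, 100] = 1.0
f = ssd_features(mag, freqs)
```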

  8. Audio features – short-term timbre features • Spectral contrast & valley (SCV) • Measures spectral contrast/valley in octave-based subbands of an audio frame's FFT • 8 frequency subbands (Hz): 1: [0,100) 2: [100,200) 3: [200,400) 4: [400,800) 5: [800,1600) 6: [1600,3200) 7: [3200,6400) 8: [6400,11025] • For each subband, compute the peak/valley by averaging the largest/smallest percentage of spectral values • Peak: harmonic components; valley: non-harmonic components/noise • Contrast = peak - valley, reflecting the relative spectral distribution
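
The per-subband peak/valley averaging can be sketched for a single frame as follows. Two details are assumptions not given on the slide: the averages are taken in the log domain (as in the common octave-based spectral contrast formulation), and `alpha=0.2` is a hypothetical choice for the averaged percentage. `scv_frame` and `BAND_EDGES` are hypothetical names; the band edges themselves come from the slide.

```python
import numpy as np

# Octave-based subband edges in Hz, as listed on the slide (22050 Hz sampling assumed).
BAND_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]

def scv_frame(mag, freqs, alpha=0.2):
    """Spectral contrast and valley of one frame's magnitude spectrum.

    mag:   (n_bins,) magnitude spectrum of one FFT frame
    freqs: (n_bins,) bin frequencies in Hz
    alpha: hypothetical fraction of bins averaged for the peak and the valley
    Returns 16 values: 8 contrasts followed by 8 valleys (log domain).
    """
    eps = 1e-12
    n_bands = len(BAND_EDGES) - 1
    contrast, valley = [], []
    for b in range(n_bands):
        lo, hi = BAND_EDGES[b], BAND_EDGES[b + 1]
        # Bands are [lo, hi) except the last, which is closed: [6400, 11025].
        upper = (freqs <= hi) if b == n_bands - 1 else (freqs < hi)
        vals = np.sort(mag[(freqs >= lo) & upper])
        k = max(1, int(round(alpha * vals.size)))
        log_peak = np.log(vals[-k:].mean() + eps)   # mean of the largest values
        log_valley = np.log(vals[:k].mean() + eps)  # mean of the smallest values
        contrast.append(log_peak - log_valley)
        valley.append(log_valley)
    return np.array(contrast + valley)

# Usage: a flat spectrum has zero contrast; a single strong peak raises one band's contrast.
freqs = np.linspace(0, 11025, 1025)
flat = scv_frame(np.ones(1025), freqs)
peaky = np.ones(1025)
peaky[93] = 100.0   # about 1001 Hz, inside subband 5: [800, 1600)
v = scv_frame(peaky, freqs)
```

The 8 contrasts plus 8 valleys give the 16 SCV dimensions counted later in the deck.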

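  9. Audio features – short-term timbre features • Spectral flatness measure (SFM) • Measures the noisiness (flatness) of the spectrum within a subband • ≈1: a similar amount of power is distributed across all frequency bins of the subband • ≈0: spectral power is concentrated in a relatively small number of bins • Spectral crest measure (SCM) • With $S_a(i)$ the i-th magnitude spectrum value in the a-th subband and $N_a$ the number of spectral values in the a-th subband: $\mathrm{SFM}_a = \dfrac{\left(\prod_{i=1}^{N_a} S_a(i)\right)^{1/N_a}}{\frac{1}{N_a}\sum_{i=1}^{N_a} S_a(i)}$, $\mathrm{SCM}_a = \dfrac{\max_{1 \le i \le N_a} S_a(i)}{\frac{1}{N_a}\sum_{i=1}^{N_a} S_a(i)}$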
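
A minimal NumPy sketch of SFM and SCM, assuming the standard definitions: in each octave subband, SFM is the geometric mean of the magnitudes divided by their arithmetic mean, and SCM is the maximum magnitude divided by the arithmetic mean. The subbands are assumed to be the same 8 octave bands used for SCV; `sfm_scm_frame` is a hypothetical name.

```python
import numpy as np

BAND_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]  # Hz, octave subbands

def sfm_scm_frame(mag, freqs):
    """Spectral flatness and crest measures of one frame, per octave subband.

    Returns 16 values: 8 SFMs followed by 8 SCMs.
    """
    eps = 1e-12  # avoids log(0) for silent bins
    n_bands = len(BAND_EDGES) - 1
    sfm, scm = [], []
    for b in range(n_bands):
        lo, hi = BAND_EDGES[b], BAND_EDGES[b + 1]
        upper = (freqs <= hi) if b == n_bands - 1 else (freqs < hi)
        s = mag[(freqs >= lo) & upper] + eps
        arith = s.mean()                 # arithmetic mean of the subband magnitudes
        geo = np.exp(np.log(s).mean())   # geometric mean, computed in the log domain
        sfm.append(geo / arith)          # near 1 for flat (noise-like) spectra
        scm.append(s.max() / arith)      # large when a few peaks dominate
    return np.array(sfm + scm)

# Usage: a flat spectrum gives SFM = SCM = 1; a strong peak lowers SFM and raises SCM.
freqs = np.linspace(0, 11025, 1025)
flat = sfm_scm_frame(np.ones(1025), freqs)
peaky = np.ones(1025)
peaky[93] = 100.0   # inside subband 5: [800, 1600)
w = sfm_scm_frame(peaky, freqs)
```

By the AM-GM inequality SFM never exceeds 1, and SCM is never below 1.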

  10. Audio features – short-term timbre features • For each feature dimension, we compute its mean and standard deviation over all frames • Total dimensions of short-term timbre features: 2*(5+21+16+16)=116, i.e., mean & std of the frame-based SSD, MFCC, SCV, and SFM/SCM (the last two computed in octave-based subbands)
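
The clip-level summary is a per-column mean and standard deviation over frames. This sketch assumes the frame features are concatenated column-wise in the order SSD, MFCC, SCV, SFM/SCM; `mustd` and `FRAME_DIMS` are hypothetical names.

```python
import numpy as np

# Frame-level dimensions listed on the slide.
FRAME_DIMS = {"SSD": 5, "MFCC": 21, "SCV": 16, "SFM/SCM": 16}  # 58 in total

def mustd(frame_features):
    """Clip-level mean/std vector from a (n_frames, 58) matrix of frame features."""
    expected = sum(FRAME_DIMS.values())
    if frame_features.shape[1] != expected:
        raise ValueError(f"expected {expected} columns, got {frame_features.shape[1]}")
    # First 58 values: per-dimension means; last 58: per-dimension standard deviations.
    return np.concatenate([frame_features.mean(axis=0), frame_features.std(axis=0)])

# Usage with placeholder frame features for a 200-frame clip.
rng = np.random.default_rng(0)
frames = rng.random((200, 58))
clip_vector = mustd(frames)
```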

  11. Modulation spectral analysis • MFCC, SC, SFM/SCM • Capture only short-time spectral properties of audio signals • Modulation spectral analysis • Captures long-term spectral dynamics within audio signals • Computes the spectrogram, then creates the modulation spectrogram by applying the FFT again along the time axis of the spectrogram • Low/high modulation frequency ↔ slow/fast spectral change
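
The second FFT along the time axis can be sketched in a few lines. Here the whole trajectory is transformed at once, which is a simplification (implementations often average over shorter texture windows), and `frame_rate` (frames per second, set by the hop size) is a hypothetical parameter. The input can be a spectrogram or any frame-level feature matrix such as MFCC.

```python
import numpy as np

def modulation_spectrogram(traj, frame_rate):
    """Magnitude modulation spectrum of each feature trajectory.

    traj: (n_frames, D) frame-level values over time (spectrogram bins, MFCCs, ...)
    frame_rate: frames per second (hypothetical parameter)
    Returns (mod_freqs, M) with M of shape (n_frames // 2 + 1, D).
    """
    M = np.abs(np.fft.rfft(traj, axis=0))                      # FFT along the time axis
    mod_freqs = np.fft.rfftfreq(traj.shape[0], d=1.0 / frame_rate)
    return mod_freqs, M

# Usage: a slowly varying (2 Hz) and a fast varying (8 Hz) trajectory over 10 s.
frame_rate = 40.0
t = np.arange(400) / frame_rate
traj = np.stack([np.sin(2 * np.pi * 2.0 * t), np.sin(2 * np.pi * 8.0 * t)], axis=1)
mod_freqs, M = modulation_spectrogram(traj, frame_rate)
```

Each column peaks at the modulation frequency of its trajectory, which is the slow/fast spectral change distinction on the slide.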

  12. Modulation spectral analysis of timbre features • Flowchart (figure) • 7 modulation frequency subbands (Hz): [0,0.33), [0.33,0.66), [0.66,1.32), [1.32,2.64), [2.64,5.28), [5.28,10.56), [10.56,21.03) • In each subband: modulation spectral peak (MSP), valley (MSV), and contrast (MSC) • MSP/MSV: the strength of rhythm in music • The same process is applied to MFCC and SFM/SCM
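
A sketch of the per-subband modulation features and their summary statistics, following our reading of Lee et al. (2009) rather than details stated on the slide: MSP is the maximum and MSV the minimum of the modulation spectrum inside each subband, MSC = MSP - MSV, and the MSC and MSV matrices are each summarized by mean/std across modulation bands (per feature dimension) and across feature dimensions (per band). This reading reproduces the dimensions 2*(D*2+7*2) used later in the deck. All function names are hypothetical.

```python
import numpy as np

# Modulation-frequency subband edges in Hz, as listed on the slide.
MOD_EDGES = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]

def modulation_contrast_valley(mod_freqs, M):
    """MSC and MSV matrices, each (D, 7), from a modulation spectrum M of shape (n_mod_bins, D).

    Every band must contain at least one modulation bin, so the analyzed segment must span
    at least a couple of seconds and the frame rate must exceed 2 * 10.56 Hz.
    """
    n_bands = len(MOD_EDGES) - 1
    msc = np.zeros((M.shape[1], n_bands))
    msv = np.zeros((M.shape[1], n_bands))
    for j in range(n_bands):
        band = M[(mod_freqs >= MOD_EDGES[j]) & (mod_freqs < MOD_EDGES[j + 1])]
        msp = band.max(axis=0)         # modulation spectral peak per feature dim
        msv[:, j] = band.min(axis=0)   # modulation spectral valley per feature dim
        msc[:, j] = msp - msv[:, j]    # modulation spectral contrast
    return msc, msv

def aggregate(msc, msv):
    """Summarize MSC and MSV by mean/std along both axes: 2 * (2*D + 2*7) values."""
    parts = []
    for m in (msc, msv):
        parts += [m.mean(axis=1), m.std(axis=1),  # per feature dim, across modulation bands
                  m.mean(axis=0), m.std(axis=0)]  # per modulation band, across feature dims
    return np.concatenate(parts)

# Usage on placeholder 21-dim MFCC trajectories (about 30 s at roughly 43 frames/s).
rng = np.random.default_rng(0)
n_frames, frame_rate = 1300, 43.0
mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
M = np.abs(np.fft.rfft(rng.random((n_frames, 21)), axis=0))
msc, msv = modulation_contrast_valley(mod_freqs, M)
mmfcc = aggregate(msc, msv)
```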

  13. Modulation spectral analysis of timbre features • Reference • C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, June 2009.

  14. Proposed joint acoustic frequency and modulation frequency features • Motivation • Averaging and mean/std computation smooth out MD information • Computation of joint frequency features (proposed) • Compute the modulation spectrogram from an entire music clip • Compute SCV (spectral contrast/valley) and SFM/SCM (spectral flatness/crest measure) within each joint acoustic-modulation (AM) frequency subband → AMSCV, AMSFM/AMSCM
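
The joint features can be sketched as follows: transform the whole clip's magnitude spectrogram along time, then split the result into 8 acoustic × 7 modulation subbands and compute contrast/valley and flatness/crest inside each block. How the values of a 2D block are pooled is not stated on the slide; this sketch assumes all values in the block are pooled and reuses the per-frame formulas (log-domain averages of a hypothetical `alpha` fraction for contrast/valley, geometric/arithmetic and max/arithmetic ratios for SFM/SCM). All names are hypothetical.

```python
import numpy as np

AC_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]    # acoustic subbands, Hz
MOD_EDGES = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]    # modulation subbands, Hz

def band_masks(freqs, edges, close_last):
    """Boolean masks for [lo, hi) bands; the last band is [lo, hi] if close_last."""
    n = len(edges) - 1
    masks = []
    for b in range(n):
        lo, hi = edges[b], edges[b + 1]
        upper = (freqs <= hi) if (close_last and b == n - 1) else (freqs < hi)
        masks.append((freqs >= lo) & upper)
    return masks

def am_features(spec, freqs, frame_rate, alpha=0.2):
    """AMSCV and AMSFM/AMSCM (8*7*2 = 112 values each) from a clip's magnitude spectrogram.

    spec: (n_frames, n_bins) magnitude spectrogram of the entire clip
    """
    eps = 1e-12
    M = np.abs(np.fft.rfft(spec, axis=0))            # (n_mod_bins, n_bins)
    mod_freqs = np.fft.rfftfreq(spec.shape[0], d=1.0 / frame_rate)
    contrast, valley, sfm, scm = [], [], [], []
    for am in band_masks(freqs, AC_EDGES, close_last=True):
        for mm in band_masks(mod_freqs, MOD_EDGES, close_last=False):
            vals = np.sort(M[np.ix_(mm, am)].ravel()) + eps   # all values in the AM block
            k = max(1, int(round(alpha * vals.size)))
            log_peak, log_valley = np.log(vals[-k:].mean()), np.log(vals[:k].mean())
            contrast.append(log_peak - log_valley)
            valley.append(log_valley)
            arith = vals.mean()
            sfm.append(np.exp(np.log(vals).mean()) / arith)
            scm.append(vals.max() / arith)
    return np.array(contrast + valley), np.array(sfm + scm)

# Usage on a placeholder 30-s spectrogram (1300 frames at about 43 frames/s, 1025 bins).
rng = np.random.default_rng(0)
spec = rng.random((1300, 1025))
freqs = np.linspace(0, 11025, 1025)
amscv, amsfm_amscm = am_features(spec, freqs, frame_rate=43.0)
```

Because the subband statistics are taken directly from the joint modulation spectrogram, no mean/std is computed across bands or dimensions, which is the smoothing the slide's motivation aims to avoid.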

  15. Audio features used in our study • All audio features considered • Extract SSD, MFCC, SCV, and SFM/SCM from audio frames, then compute mean/std → MuStd • MuStd: dim = 2*(5+21+16+16) = 116 • Perform modulation spectral analysis on MFCC, OSC, SFM/SCM • MMFCC: dim = 2*(21*2+7*2) = 112 • MSCV: dim = 2*(16*2+7*2) = 92 • MSFM/MSCM: dim = 2*(16*2+7*2) = 92 • Compute SCV and SFM/SCM within acoustic-modulation (AM) frequency subbands → AMSCV, AMSFM/AMSCM • AMSCV: dim = 8*7*2 = 112 • AMSFM/AMSCM: dim = 8*7*2 = 112
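
The dimension formulas can be checked mechanically. The split of the modulation formula into two matrices (read here as MSC and MSV), each summarized over D feature dimensions and 7 modulation bands, is our interpretation following Lee et al. (2009); the totals reproduce the 412- and 452-dim feature sets of the MIREX 2011 and improved methods.

```python
# Dimension bookkeeping for the feature sets (slides 15 and 16).
N_MOD_BANDS = 7   # modulation subbands
N_AC_BANDS = 8    # octave-based acoustic subbands

def mustd_dim(frame_dims):
    # Mean and std of every frame-level dimension.
    return 2 * sum(frame_dims)

def modulation_dim(d, j=N_MOD_BANDS):
    # Two matrices (read as MSC and MSV), each summarized by mean/std over the j bands
    # for every feature dim (2*d) and over the d feature dims for every band (2*j).
    return 2 * (d * 2 + j * 2)

def am_dim(n_ac=N_AC_BANDS, n_mod=N_MOD_BANDS):
    # One pair of values (contrast/valley or flatness/crest) per joint AM subband.
    return n_ac * n_mod * 2

dims = {
    "MuStd": mustd_dim([5, 21, 16, 16]),
    "MMFCC": modulation_dim(21),
    "MSCV": modulation_dim(16),
    "MSFM/MSCM": modulation_dim(16),
    "AMSCV": am_dim(),
    "AMSFM/AMSCM": am_dim(),
}
mirex2011 = sum(dims[k] for k in ("MuStd", "MMFCC", "MSCV", "MSFM/MSCM"))   # 412
improved = sum(dims[k] for k in ("MuStd", "MMFCC", "AMSCV", "AMSFM/AMSCM"))  # 452
```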

  16. Audio feature sets and classifier • Audio feature sets • MIREX 2011 method • MuStd+MMFCC+MSCV+MSFM/MSCM • dim = 116+112+92+92 = 412 • Improved method • MuStd+MMFCC+AMSCV+AMSFM/AMSCM • dim = 116+112+112+112 = 452 • Classifier construction with • RBF-kernel SVMs • Inner three-fold cross-validation to tune hyperparameters
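
The classifier setup (RBF-kernel SVM, hyperparameters tuned by inner three-fold cross-validation on the training data) can be sketched with scikit-learn. This is not necessarily the toolkit the authors used; the feature standardization step and the search grid over `C` and `gamma` are hypothetical choices, and `build_classifier` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_classifier():
    """RBF-kernel SVM whose hyperparameters are tuned by inner three-fold CV."""
    # Standardizing features first is an assumption, not stated on the slide.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Hypothetical search grid over the RBF SVM's C and gamma.
    grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 1e-3, 1e-2]}
    # cv=3 with a classifier gives stratified three-fold splits of the training data.
    return GridSearchCV(pipe, grid, cv=3, scoring="accuracy")

# Usage on synthetic placeholder data (two well-separated classes).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(3, 1, (30, 10))])
y = np.array([0] * 30 + [1] * 30)
clf = build_classifier().fit(X, y)
```

After fitting, `clf` is refit on all training data with the best grid point and can be applied to held-out clips with `clf.predict`.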

  17. Experimental setup and results of MIREX 2011 genre/mood classification tasks • Datasets • Genre classification: 10 genres, 700 30-second clips per genre • Mood classification: 5 categories, 120 30-second clips per category • Evaluation metric • Three-fold cross-validation; classification accuracy • Results (JR1 is our submission)

  18. Experimental results of MIREX 2008-2012 genre/mood classification tasks

  19. Extended experiments • Four datasets • Performance evaluation • Randomly stratified 10-fold cross-validation, repeated 10 times; the classification accuracy is averaged over all repetitions
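
The evaluation protocol (stratified 10-fold cross-validation with random fold assignment, repeated 10 times, accuracy averaged) maps directly onto scikit-learn's `RepeatedStratifiedKFold`. This is a sketch with placeholder data and an untuned SVM; the helper name and seed are hypothetical.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

def repeated_cv_accuracy(model, X, y, seed=0):
    """Stratified 10-fold CV repeated 10 times; returns mean accuracy and all 100 fold scores."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores

# Usage on synthetic placeholder data; in practice `model` could be the tuned SVM.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
mean_acc, fold_scores = repeated_cv_accuracy(SVC(kernel="rbf"), X, y)
```

Repeating the random fold assignment reduces the variance that a single 10-fold partition would add to the reported accuracy.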

  20. Extended experiments • Averaged classification accuracy (%) of combining different feature sets on four datasets

  21. Extended experiments • Comparison of our methods with other recent work

  22. Conclusions • Timbre & modulation features • Won 1st place in MIREX 2011 mood classification • Timbre & improved modulation features • Improve accuracy by 2.47%/2.08% on GTZAN/Unique • Achieve 2.50%/0.14% higher accuracy than the MIREX 2011 method on Soundtracks/MIR-Mood

  23. Thank you for listening. Questions and comments welcome!
