Timbre and Modulation Features for Music Genre/Mood Classification. J.-S. Roger Jang & Jia-Min Ren, Multimedia Information Retrieval Lab, Dept. of CSIE, National Taiwan University
Outline • Audio features and modulation spectral analysis • MIREX 2011 method and its improvement • Experimental setup and results • Conclusions and future work
Introduction – music genres/moods • Genres and moods as descriptions of music content (pictures from www.playonradio.com, brainpickings.org & mpac.ee.ntu.edu.tw)
Motivation • Rapid growth of digital music • Apple iTunes: 28 million songs; 7digital: 20 million tracks • Organization of large collections of audio music • Important but challenging • Manual labeling by tags: labor-intensive and time-consuming • Thus, machine learning for classification is called for! [Figure: classification pipeline. Training music clips → feature extraction (short-term: MFCC, OSC; long-term: beat, tempo, pitch) → classifier training (KNN, GMM, SVM); test music clip → feature extraction → evaluation with the trained classifiers → result]
Performance evaluation • Dataset-dependent evaluation criteria • GTZAN • 10-fold cross-validation • ISMIR2004Genre • Holdout test, the same as the one used in the ISMIR 2004 Genre Classification Contest: 729 clips for training and 729 clips for test
Audio features – short-term timbre features • Statistical spectrum descriptors (SSD) • Spectral centroid (SC) • Spectral flux (SF) • Spectral rolloff (SR) • Spectral skewness (SS) • Spectral kurtosis (SK) • MFCC • Models the perceptual (subjective) frequency content of audio signals • 21-dim (including energy)
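A minimal per-frame sketch of the five SSD values, assuming a magnitude spectrum with bins evenly spaced from 0 Hz to Nyquist. The power weighting, the 85% rolloff threshold and the L2-normalised flux are common textbook choices assumed here; the slides do not specify them.

```python
import numpy as np

def spectral_shape_descriptors(mag, sr, prev_mag=None, rolloff_pct=0.85):
    """Return [centroid, flux, rolloff, skewness, kurtosis] for one frame.

    mag: magnitude spectrum of the frame (length n_fft // 2 + 1).
    prev_mag: magnitude spectrum of the previous frame (for flux), or None.
    rolloff_pct: assumed energy fraction for the rolloff point.
    """
    freqs = np.linspace(0.0, sr / 2.0, num=len(mag))
    power = mag ** 2
    p = power / (power.sum() + 1e-12)  # treat the spectrum as a distribution over frequency
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p)) + 1e-12
    skewness = np.sum((freqs - centroid) ** 3 * p) / spread ** 3
    kurtosis = np.sum((freqs - centroid) ** 4 * p) / spread ** 4
    # Rolloff: lowest frequency below which rolloff_pct of the energy lies
    cum = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cum, rolloff_pct * cum[-1])]
    # Flux: squared difference between consecutive L2-normalised spectra
    if prev_mag is None:
        flux = 0.0
    else:
        a = mag / (np.linalg.norm(mag) + 1e-12)
        b = prev_mag / (np.linalg.norm(prev_mag) + 1e-12)
        flux = float(np.sum((a - b) ** 2))
    return np.array([centroid, flux, rolloff, skewness, kurtosis])
```

For a pure tone in one bin, the centroid and rolloff both land on that bin's frequency, and the flux against an identical frame is zero.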
Audio features – short-term timbre features • Spectral contrast & valley (SCV) • Measures spectral contrast/valley in octave-based subbands • 8 frequency subbands (Hz): 1: [0, 100), 2: [100, 200), 3: [200, 400), 4: [400, 800), 5: [800, 1600), 6: [1600, 3200), 7: [3200, 6400), 8: [6400, 11025] • For each subband, compute the peak/valley by averaging the largest/smallest fraction of its spectral values • Peak: harmonic components; valley: non-harmonic components/noise • Contrast = peak − valley, reflecting the relative spectral distribution [Figure: FFT of an audio frame with subband peaks and valleys]
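The peak/valley/contrast steps can be sketched as below, using the slide's octave edges and assuming a sampling rate of 22050 Hz. The averaging fraction `alpha = 0.2` and the log-domain averaging are assumptions, since the exact percentage is not given here. The FFT must be long enough that every subband (including [0, 100) Hz) contains at least one bin.

```python
import numpy as np

# Octave-based subband edges in Hz, as listed on the slide (sr = 22050 Hz assumed)
OCTAVE_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]

def spectral_contrast_valley(mag, sr=22050, alpha=0.2, edges=OCTAVE_EDGES):
    """Return (contrast, valley), one value per octave subband (8 + 8 = 16 dims).

    alpha: assumed fraction of the sorted subband values averaged for peak/valley.
    """
    freqs = np.linspace(0.0, sr / 2.0, num=len(mag))
    contrast, valley = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last subband is closed on the right so the Nyquist bin is included
        upper = freqs <= hi if hi == edges[-1] else freqs < hi
        band = np.sort(mag[(freqs >= lo) & upper])[::-1]  # descending order
        k = max(1, int(round(alpha * len(band))))
        peak = np.log(np.mean(band[:k]) + 1e-12)   # strongest values: harmonic peaks
        low = np.log(np.mean(band[-k:]) + 1e-12)   # weakest values: noise floor
        contrast.append(peak - low)
        valley.append(low)
    return np.array(contrast), np.array(valley)
```

A flat spectrum gives zero contrast in every subband; a single strong bin near 1 kHz raises the contrast only in subband 5, [800, 1600).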
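Audio features – short-term timbre features • Spectral flatness measure (SFM) • Measures the noisiness of the spectrum within a subband: $\mathrm{SFM}_a = \frac{\left(\prod_{i=1}^{N_a} S_a(i)\right)^{1/N_a}}{\frac{1}{N_a}\sum_{i=1}^{N_a} S_a(i)}$ • SFM ≈ 1: power is distributed roughly equally across the subband (noise-like) • SFM ≈ 0: power is concentrated in a relatively small number of bins (tonal) • Spectral crest measure (SCM): $\mathrm{SCM}_a = \frac{\max_{i} S_a(i)}{\frac{1}{N_a}\sum_{i=1}^{N_a} S_a(i)}$ • where $S_a(i)$ is the i-th magnitude spectrum in the a-th subband and $N_a$ is the number of spectra in the a-th subband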
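A short sketch of SFM/SCM per octave subband, following the geometric-mean/arithmetic-mean and max/arithmetic-mean ratios above. The subband edges reuse the SCV slide and sr = 22050 Hz is assumed. The small epsilon avoids log(0) and is an implementation choice, not part of the definition.

```python
import numpy as np

def flatness_and_crest(mag, sr=22050,
                       edges=(0, 100, 200, 400, 800, 1600, 3200, 6400, 11025)):
    """Return (sfm, scm), one value per octave subband (8 + 8 = 16 dims)."""
    freqs = np.linspace(0.0, sr / 2.0, num=len(mag))
    sfm, scm = [], []
    n_bands = len(edges) - 1
    for a in range(n_bands):
        lo, hi = edges[a], edges[a + 1]
        upper = freqs <= hi if a == n_bands - 1 else freqs < hi
        s = mag[(freqs >= lo) & upper] + 1e-12   # S_a(i), i = 1..N_a
        arith = np.mean(s)                        # (1/N_a) * sum_i S_a(i)
        geo = np.exp(np.mean(np.log(s)))          # geometric mean, computed in the log domain
        sfm.append(geo / arith)
        scm.append(np.max(s) / arith)
    return np.array(sfm), np.array(scm)
```

A flat spectrum yields SFM = SCM = 1 in every subband; a lone spectral line pushes that subband's SFM toward 0 and its SCM toward N_a.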
Audio features – short-term timbre features • For each feature dimension, we compute its mean and standard deviation over all frames • Total dimensions of short-term timbre features: 2 × (5 SSD + 21 MFCC + 16 SCV + 16 SFM/SCM) = 116 [Figure: frame-based features (SSD, MFCC) and octave-based subband features (SCV, SFM/SCM) → mean & std]
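The mean/std pooling and the 116-dim count can be checked with a few lines of NumPy. The random frame matrices below are placeholders for real features, and the frame count is a hypothetical value for a 30-s clip.

```python
import numpy as np

def mean_std_vector(frame_features):
    """(n_frames, n_dims) frame-level features -> (2 * n_dims,) vector: means, then stds."""
    return np.concatenate([frame_features.mean(axis=0), frame_features.std(axis=0)])

rng = np.random.default_rng(0)
n_frames = 1290  # hypothetical frame count for a 30-s clip
blocks = {"SSD": 5, "MFCC": 21, "SCV": 16, "SFM/SCM": 16}
# Placeholder frame features; in practice these come from the extractors above
frames = np.hstack([rng.standard_normal((n_frames, d)) for d in blocks.values()])
mustd = mean_std_vector(frames)
print(mustd.shape)  # (116,)
```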
Modulation spectral analysis • MFCC, SC, SFM/SCM • Capture only short-time spectral properties of audio signals • Modulation spectral analysis • Captures long-term spectral dynamics within audio signals • Computes the spectrogram, then creates a modulation spectrogram by applying an FFT again along the time axis of the spectrogram • Low/high modulation frequency ↔ slow/fast spectral change
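The "FFT again along the time axis" step can be sketched as follows for a matrix of frame-level feature trajectories. The texture-window length (512 frames), hop (256 frames), Hann window and averaging over windows are assumptions for illustration. The clip must span at least one full window.

```python
import numpy as np

def modulation_spectrogram(feature_traj, frame_rate, win_frames=512, hop_frames=256):
    """Modulation spectrum of each feature dimension.

    feature_traj: (n_frames, n_dims) frame-level features (e.g. MFCC or subband values).
    frame_rate: frames per second (e.g. sr / hop_size of the first FFT).
    Returns (mod_freqs, M), with M of shape (win_frames // 2 + 1, n_dims): the
    magnitude of the FFT along time, averaged over texture windows.
    """
    n_frames = feature_traj.shape[0]
    window = np.hanning(win_frames)[:, None]  # broadcast across feature dimensions
    specs = []
    for start in range(0, n_frames - win_frames + 1, hop_frames):
        seg = feature_traj[start:start + win_frames]
        specs.append(np.abs(np.fft.rfft(seg * window, axis=0)))
    mod_freqs = np.fft.rfftfreq(win_frames, d=1.0 / frame_rate)
    return mod_freqs, np.mean(specs, axis=0)
```

A feature that oscillates twice per second produces a modulation peak near 2 Hz, i.e. a "slow" spectral change in the sense of the slide.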
Modulation spectral analysis of timbre features • Flowchart [Figure: modulation spectral analysis producing MSV and MSC] • 7 modulation frequency subbands (Hz): [0, 0.33), [0.33, 0.66), [0.66, 1.32), [1.32, 2.64), [2.64, 5.28), [5.28, 10.56), [10.56, 21.03) • MSC: modulation spectral contrast • MSP/MSV (modulation spectral peak/valley): the strength of rhythm in music • The same process is applied to MFCC and SFM/SCM
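Given a modulation spectrum M (modulation frequency × feature dimension), MSV and MSC per modulation subband can be sketched as below. Taking the peak/valley as the maximum/minimum within each subband is an assumption made for this sketch. The modulation-frequency resolution must be fine enough that every subband, including [0, 0.33) Hz, contains at least one bin.

```python
import numpy as np

MOD_EDGES = [0.0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]  # Hz, from the slide

def modulation_contrast_valley(mod_freqs, M, edges=MOD_EDGES):
    """M: (n_mod_freqs, n_dims) modulation spectrum.

    Returns (msc, msv), each of shape (n_dims, 7): contrast and valley per modulation subband.
    """
    n_dims, n_bands = M.shape[1], len(edges) - 1
    msc = np.zeros((n_dims, n_bands))
    msv = np.zeros((n_dims, n_bands))
    for b in range(n_bands):
        in_band = (mod_freqs >= edges[b]) & (mod_freqs < edges[b + 1])
        band = M[in_band]             # (n_bins_in_band, n_dims)
        peak = band.max(axis=0)       # MSP (assumed: max within the subband)
        valley = band.min(axis=0)     # MSV (assumed: min within the subband)
        msc[:, b] = peak - valley     # MSC = MSP - MSV
        msv[:, b] = valley
    return msc, msv
```

A flat modulation spectrum gives zero contrast everywhere; a single peak at 2 Hz shows up only in subband [1.32, 2.64).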
Modulation spectral analysis of timbre features • Reference • C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670–682, June 2009.
Proposed joint acoustic frequency and modulation frequency features • Motivation • Averaging and mean/std computation smooth out MD information • Computation of joint frequency features (proposed) • Compute the modulation spectrogram from an entire music clip • Compute SCV (spectral contrast/valley) and SFM/SCM (spectral flatness/crest measure) within each joint acoustic-modulation (AM) frequency subband → AMSCV, AMSFM/AMSCM [Figure: modulation spectrogram partitioned into AM subbands, from which AMSCV, AMSFM and AMSCM are computed]
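A rough sketch of the joint AM computation for AMSCV, assuming sr = 22050 Hz, a hop of 512 samples, the 8 octave subbands and 7 modulation subbands from the earlier slides, and the same sorted-fraction contrast/valley rule as SCV (with an assumed `alpha = 0.2`). Within each AM block, values are pooled across acoustic and modulation bins; this pooling is an interpretation, not a confirmed detail of the method.

```python
import numpy as np

AC_EDGES = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]    # acoustic subbands (Hz)
MOD_EDGES = [0.0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]   # modulation subbands (Hz)

def band_masks(freqs, edges):
    """Boolean mask per subband; the last subband is closed on the right."""
    masks = []
    for j in range(len(edges) - 1):
        upper = freqs <= edges[j + 1] if j == len(edges) - 2 else freqs < edges[j + 1]
        masks.append((freqs >= edges[j]) & upper)
    return masks

def am_contrast_valley(spec, sr=22050, hop=512, alpha=0.2):
    """spec: (n_acoustic_bins, n_frames) magnitude spectrogram of the whole clip.

    Returns 8 * 7 contrasts followed by 8 * 7 valleys (112 values).
    """
    ac_freqs = np.linspace(0.0, sr / 2.0, num=spec.shape[0])
    # FFT along time for every acoustic bin -> joint (acoustic, modulation) plane
    mod_spec = np.abs(np.fft.rfft(spec, axis=1))
    mod_freqs = np.fft.rfftfreq(spec.shape[1], d=hop / sr)
    contrast = np.zeros((len(AC_EDGES) - 1, len(MOD_EDGES) - 1))
    valley = np.zeros_like(contrast)
    for a, ac_mask in enumerate(band_masks(ac_freqs, AC_EDGES)):
        for m, mod_mask in enumerate(band_masks(mod_freqs, MOD_EDGES)):
            vals = np.sort(mod_spec[np.ix_(ac_mask, mod_mask)].ravel())
            k = max(1, int(round(alpha * vals.size)))
            peak = np.log(vals[-k:].mean() + 1e-12)
            low = np.log(vals[:k].mean() + 1e-12)
            contrast[a, m] = peak - low
            valley[a, m] = low
    return np.concatenate([contrast.ravel(), valley.ravel()])
```

AMSFM/AMSCM would follow the same loop, replacing the contrast/valley rule with the flatness and crest ratios.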
Audio features used in our study • All audio features considered • Extract SSD, MFCC, SCV, and SFM/SCM from audio frames → mean/std computation → MuStd • MuStd dim = 2*(5+21+16+16) = 116 • Perform modulation spectral analysis on MFCC, OSC (i.e., SCV), and SFM/SCM • MMFCC dim = 2*(21*2+7*2) = 112 • MSCV dim = 2*(16*2+7*2) = 92 • MSFM/MSCM dim = 2*(16*2+7*2) = 92 • Compute SCV and SFM/SCM within acoustic-modulation (AM) frequency subbands → AMSCV, AMSFM/AMSCM • AMSCV dim = 8*7*2 = 112 • AMSFM/AMSCM dim = 8*7*2 = 112
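One reading consistent with the formula 2*(D*2+7*2): for each of the two D × 7 matrices (contrast and valley), take the mean/std along each row (2D values) and along each column (2 × 7 values). The sketch below assumes that reading and checks every dimension count on the slide, including the totals of the two feature sets. The random matrices are placeholders.

```python
import numpy as np

def summarize_modulation_matrix(mat):
    """(n_dims, n_mod_bands) -> row-wise and column-wise mean/std (2*n_dims + 2*n_mod_bands values)."""
    return np.concatenate([mat.mean(axis=1), mat.std(axis=1),
                           mat.mean(axis=0), mat.std(axis=0)])

def modulation_feature_vector(msc, msv):
    """Concatenate the summaries of the contrast and valley matrices."""
    return np.concatenate([summarize_modulation_matrix(msc), summarize_modulation_matrix(msv)])

rng = np.random.default_rng(0)
for name, n_dims in [("MMFCC", 21), ("MSCV", 16), ("MSFM/MSCM", 16)]:
    msc, msv = rng.random((n_dims, 7)), rng.random((n_dims, 7))  # placeholder matrices
    print(name, modulation_feature_vector(msc, msv).size)
# MMFCC 112, MSCV 92, MSFM/MSCM 92

print(116 + 112 + 92 + 92)            # MIREX 2011 feature set: 412
print(116 + 112 + 8*7*2 + 8*7*2)      # improved feature set: 452
```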
Audio feature sets and classifier • Audio feature sets • MIREX 2011 method • MuStd+MMFCC+MSCV+MSFM/MSCM • dim = 116+112+92+92 = 412 • Improved method • MuStd+MMFCC+AMSCV+AMSFM/AMSCM • dim = 116+112+112+112 = 452 • Classifier construction • RBF-kernel SVMs • Three-fold inner cross-validation to tune hyperparameters
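A minimal scikit-learn sketch of an RBF-kernel SVM whose hyperparameters are tuned by three-fold cross-validation inside the training data. The feature standardisation step and the power-of-two grids for C and gamma are assumptions; the slides state only the kernel and the three-fold inner CV.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_rbf_svm(X_train, y_train, seed=0):
    """Fit an RBF SVM, choosing C and gamma by 3-fold CV on the training data only."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # scaling is an assumption
    grid = {
        "svc__C": [2.0 ** k for k in range(-1, 11, 2)],        # hypothetical grid
        "svc__gamma": [2.0 ** k for k in range(-13, 0, 2)],    # hypothetical grid
    }
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, grid, cv=inner_cv)
    search.fit(X_train, y_train)
    return search  # refit=True (default): best setting retrained on all of X_train
```

Because the grid search sees only the training split, the test clips of each outer fold never influence the chosen C and gamma.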
Experimental setup and results of MIREX 2011 genre/mood classification tasks • Datasets • Genre classification: 10 genres, 700 30-sec clips each • Mood classification: 5 categories, 120 30-sec clips each • Evaluation metric • Three-fold cross-validation; classification accuracy • Results (JR1 is ours)
Experimental results of MIREX 2008-2012 genre/mood classification tasks
Extended experiments • Four datasets • Performance evaluation • Randomly stratified 10-fold cross-validation • Repeated 10 times to obtain the average result
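The repeated, randomly stratified 10-fold protocol can be sketched in plain NumPy. The `fit_predict` callback and the nearest-centroid classifier are hypothetical stand-ins for the SVM pipeline. Accuracy is pooled over all folds within a repetition, then averaged over the repetitions.

```python
import numpy as np

def stratified_folds(y, n_folds=10, rng=None):
    """Assign each sample a fold id so that every class is spread evenly across folds."""
    rng = np.random.default_rng() if rng is None else rng
    fold_ids = np.empty(len(y), dtype=int)
    offset = 0
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))  # random order within the class
        # Deal samples round-robin, continuing where the previous class stopped
        fold_ids[idx] = (offset + np.arange(len(idx))) % n_folds
        offset += len(idx)
    return fold_ids

def repeated_cv_accuracy(X, y, fit_predict, n_folds=10, n_repeats=10, seed=0):
    """fit_predict(X_tr, y_tr, X_te) -> predicted labels (hypothetical callback)."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        folds = stratified_folds(y, n_folds, rng)
        correct = 0
        for f in range(n_folds):
            te = folds == f
            pred = fit_predict(X[~te], y[~te], X[te])
            correct += int(np.sum(pred == y[te]))
        accs.append(correct / len(y))
    return float(np.mean(accs))

def nearest_centroid(X_tr, y_tr, X_te):
    """Toy classifier used only to exercise the protocol."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]
```

With 10 classes of 100 clips each (the GTZAN layout), every fold receives exactly 10 clips per class.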
Extended experiments • Averaged classification accuracy (%) of combining different feature sets on four datasets
Extended experiments • Comparison of our methods with other recent work
Conclusions • Timbre & modulation features • Won 1st place in MIREX 2011 mood classification • Timbre & improved modulation features • Improve accuracy by 2.47%/2.08% on GTZAN/Unique • Achieve 2.50%/0.14% higher accuracy than the MIREX 2011 method on Soundtracks/MIR-Mood
Thank you for listening. Questions and comments welcome!