Timbre and Modulation Features for Music Genre/Mood Classification

Timbre and Modulation Features for Music Genre/Mood Classification
J.-S. Roger Jang & Jia-Min Ren
Multimedia Information Retrieval Lab, Dept. of CSIE, National Taiwan University

Outline
• Audio features and modulation spectral analysis
• MIREX 2011 method and its improvement
• Experimental setup and results
• Conclusions and future work

Introduction – music genres/moods
Descriptions of music contents
*Pictures from www.playonradio.com, brainpickings.org & mpac.ee.ntu.edu.tw

Motivation
• Rapid growth of digital music
  • Apple iTunes: 28 million songs; 7digital: 20 million tracks
• Organization of large collections of audio music
  • Important but challenging
  • Manual labeling by tags: labor intensive/time consuming
• Thus, machine learning for classification is called for!
[Diagram labels: music clips for training; music clip for test; feature extraction; classifier training; classifiers (KNN, GMM, SVM); evaluation; result. Features: short-term (MFCC, OSC); long-term (beat, tempo, pitch).]

System overview

Performance evaluation
• Dataset-dependent criteria for evaluation
  • GTZAN: 10-fold cross-validation
  • ISMIR2004Genre: holdout test, same as the one used in the ISMIR 2004 Genre Classification Contest, with 729 clips for training and 729 clips for testing

Audio features – short-term timbre features
• Statistical spectrum descriptors (SSD)
  • Spectral centroid (SC)
  • Spectral flux (SF)
  • Spectral rolloff (SR)
  • Spectral skewness (SS)
  • Spectral kurtosis (SK)
• MFCC
  • Models the subjective frequency content of audio signals
  • 21-dim (including energy)
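
The five SSD values can be sketched per frame as follows. This is a minimal illustration, not the authors' implementation: the slides do not give the exact formulas, so the squared-difference form of spectral flux, the 0.85 rolloff fraction, and the function name `spectral_descriptors` are all assumptions.

```python
import numpy as np

def spectral_descriptors(mag, prev_mag, sample_rate, rolloff_fraction=0.85):
    """Return (centroid, flux, rolloff, skewness, kurtosis) for one frame.

    mag, prev_mag: magnitude spectra of the current and previous frame.
    rolloff_fraction is an assumed value; the slides do not state one.
    """
    n_bins = len(mag)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Treat the normalized spectrum as a distribution over frequency.
    p = mag / (mag.sum() + 1e-12)
    centroid = float((freqs * p).sum())
    spread = np.sqrt(((freqs - centroid) ** 2 * p).sum()) + 1e-12
    skewness = float((((freqs - centroid) / spread) ** 3 * p).sum())
    kurtosis = float((((freqs - centroid) / spread) ** 4 * p).sum())
    # Flux: squared change between consecutive spectra (one common variant).
    flux = float(((mag - prev_mag) ** 2).sum())
    # Rolloff: lowest frequency below which the chosen fraction of magnitude lies.
    cumulative = np.cumsum(mag)
    rolloff_bin = int(np.searchsorted(cumulative, rolloff_fraction * cumulative[-1]))
    rolloff = float(freqs[min(rolloff_bin, n_bins - 1)])
    return centroid, flux, rolloff, skewness, kurtosis
```

Calling this on every frame gives the 5-dimensional SSD trajectory that is later summarized by mean and standard deviation.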

Audio features – short-term timbre features
• Spectral contrast & valley (SCV)
  • Measures spectral contrast/valley in octave-based subbands
  • 8 frequency subbands (Hz): 1: [0, 100), 2: [100, 200), 3: [200, 400), 4: [400, 800), 5: [800, 1600), 6: [1600, 3200), 7: [3200, 6400), 8: [6400, 11025]
  • For each subband, compute the peak/valley by averaging the values in the largest/smallest percentage of spectra
  • Peak: harmonic components; valley: non-harmonic components/noise
  • Contrast = peak − valley: relative distribution
[Diagram labels: audio frame; FFT]
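
A minimal sketch of the per-frame SCV computation, using the eight subbands listed on the slide. The averaging fraction `alpha=0.2` and the use of log magnitudes are assumptions (the slide's own value and formula did not survive in the transcript), and every subband is assumed to contain at least one FFT bin.

```python
import numpy as np

# Octave-based subband edges in Hz, as listed on the slide.
SUBBAND_EDGES_HZ = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]

def spectral_contrast_valley(mag, sample_rate=22050, alpha=0.2):
    """Return (contrast, valley), each with one value per subband.

    alpha (fraction of bins averaged for peak/valley) is an assumed value.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, len(mag))
    contrast, valley = [], []
    for lo, hi in zip(SUBBAND_EDGES_HZ[:-1], SUBBAND_EDGES_HZ[1:]):
        is_last = hi == SUBBAND_EDGES_HZ[-1]
        upper_ok = freqs <= hi if is_last else freqs < hi
        band = np.sort(mag[(freqs >= lo) & upper_ok])[::-1]  # descending
        k = max(1, int(round(alpha * len(band))))
        peak = np.log(band[:k].mean() + 1e-12)   # average of the largest values
        val = np.log(band[-k:].mean() + 1e-12)   # average of the smallest values
        contrast.append(peak - val)
        valley.append(val)
    return np.array(contrast), np.array(valley)
```

With 8 contrast and 8 valley values, each frame contributes the 16 SCV dimensions counted in the totals.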

Audio features – short-term timbre features
• Spectral flatness measure (SFM)
  • Measures the noisiness of the spectrum within a subband
  • ≈1: a similar amount of power is distributed across all spectral bands
  • ≈0: spectral power is concentrated in a relatively small number of bands
• Spectral crest measure (SCM)
[Equation legend: the i-th magnitude spectrum in the a-th subband; the number of spectra in the a-th subband]
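
The slide's equations are not recoverable from the transcript, so the sketch below uses the textbook definitions as an assumption: flatness is the geometric mean divided by the arithmetic mean of the magnitudes in a subband, and crest is the maximum divided by the arithmetic mean. The function name is hypothetical.

```python
import numpy as np

def flatness_and_crest(band_mag):
    """Return (SFM, SCM) for the magnitude spectrum of one subband.

    Assumed definitions: SFM = geometric mean / arithmetic mean,
    SCM = maximum / arithmetic mean.
    """
    band_mag = np.asarray(band_mag, dtype=float) + 1e-12  # avoid log(0)
    arithmetic_mean = band_mag.mean()
    geometric_mean = np.exp(np.log(band_mag).mean())
    return geometric_mean / arithmetic_mean, band_mag.max() / arithmetic_mean
```

A flat subband gives values near 1 for both measures, while a single dominant bin drives flatness toward 0 and crest toward the number of bins in the subband.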

Audio features – short-term timbre features
• For each feature dimension, we compute its mean and standard deviation.
• Total dimensions for short-term timbre features: 2*(5+21+16+16)=116
[Diagram labels: frame-based features; octave-based subbands; SSD, MFCC, SCV, SFM/SCM; mean & std]
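
As a small illustration of the mean/std step, the sketch below assumes the frame-based features are stacked into an array with one row per frame and 5+21+16+16=58 columns. The function name is hypothetical.

```python
import numpy as np

def mean_std_summary(frame_features):
    """Collapse (n_frames, n_dims) frame features into a 2*n_dims clip vector."""
    frame_features = np.asarray(frame_features, dtype=float)
    return np.concatenate([frame_features.mean(axis=0),
                           frame_features.std(axis=0)])
```

For 58 frame-level dimensions this yields the 116-dimensional vector stated on the slide.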

Modulation spectral analysis
• MFCC, SC, SFM/SCM
  • Capture only short-time spectral properties of audio signals
• Modulation spectral analysis
  • Captures long-term spectral dynamics within audio signals
  • Computes the spectrogram, then creates the modulation spectrogram by applying the FFT again along the time axis of the spectrogram
  • Low/high modulation frequency → slow/fast spectral change
[Diagram label: FFT]
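
The two-stage FFT described above can be sketched as follows. The frame length, hop size, and Hann window are assumptions chosen for illustration; the slides do not state the analysis parameters.

```python
import numpy as np

def modulation_spectrogram(signal, sample_rate, frame_len=1024, hop=512):
    """Return (modulation, mod_freqs).

    modulation has shape (n_modulation_bins, n_acoustic_bins);
    mod_freqs gives the modulation frequency in Hz of each row.
    frame_len and hop are assumed values.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # First FFT: along each frame, giving the magnitude spectrogram.
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))
    # Second FFT: along the time axis of the spectrogram.
    modulation = np.abs(np.fft.rfft(spectrogram, axis=0))
    mod_freqs = np.fft.rfftfreq(n_frames, d=hop / sample_rate)
    return modulation, mod_freqs
```

For example, a 1000 Hz tone whose amplitude varies twice per second produces a peak near 2 Hz on the modulation axis in the acoustic bin closest to 1000 Hz.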

Modulation spectral analysis of timbre features
• Flowchart
  • 7 modulation frequency subbands (Hz): [0, 0.33), [0.33, 0.66), [0.66, 1.32), [1.32, 2.64), [2.64, 5.28), [5.28, 10.56), [10.56, 21.03)
  • MSC: modulation spectral contrast
  • MSP/MSV: the strength of rhythm in music
  • The same process is applied to MFCC and SFM/SCM.
[Diagram labels: MSV; MSC]
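
The flowchart itself is not in the transcript, so the following is a simplified sketch under stated assumptions: the peak (MSP) and valley (MSV) are taken as the maximum and minimum of the modulation spectrum within each of the seven subbands, MSC = MSP − MSV, and the resulting matrices are summarized by mean and standard deviation along rows and columns. That summary reproduces the slide's dimension count 2*(D*2+7*2), but any texture-window averaging in the original method is omitted.

```python
import numpy as np

# Modulation subband edges in Hz, as listed on the slide.
MOD_EDGES_HZ = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]

def modulation_contrast_features(trajectories, frame_rate):
    """Summarize (n_frames, D) feature trajectories as a 2*(2*D + 2*7) vector.

    Assumes the clip is long enough that every subband holds at least one bin.
    """
    trajectories = np.asarray(trajectories, dtype=float)
    n_frames, n_dims = trajectories.shape
    mod_spec = np.abs(np.fft.rfft(trajectories, axis=0))  # (n_mod_bins, D)
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    n_bands = len(MOD_EDGES_HZ) - 1
    msc = np.zeros((n_dims, n_bands))
    msv = np.zeros((n_dims, n_bands))
    for j in range(n_bands):
        in_band = (mod_freqs >= MOD_EDGES_HZ[j]) & (mod_freqs < MOD_EDGES_HZ[j + 1])
        peak = mod_spec[in_band].max(axis=0)    # MSP (assumed: maximum)
        valley = mod_spec[in_band].min(axis=0)  # MSV (assumed: minimum)
        msc[:, j] = peak - valley
        msv[:, j] = valley

    def row_col_stats(m):
        # Mean/std over subbands for each feature, then over features for each subband.
        return np.concatenate([m.mean(axis=1), m.std(axis=1),
                               m.mean(axis=0), m.std(axis=0)])

    return np.concatenate([row_col_stats(msc), row_col_stats(msv)])
```

With D=21 MFCC trajectories this gives 112 values, and with D=16 it gives 92.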

Modulation spectral analysis of timbre features
• Reference
  • C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, "Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features," IEEE Trans. Multimedia, vol. 11, no. 4, pp. 670-682, June 2009.

Proposed joint acoustic frequency and modulation frequency features
• Motivation
  • Averaging and mean/std computation smooth out MD info.
• Computation of joint frequency features (proposed)
  • Compute the modulation spectrogram from an entire music clip
  • Compute SCV (spectral contrast/valley) and SFM/SCM (spectral flatness/crest measure) within each joint acoustic-modulation (AM) frequency subband → AMSCV, AMSFM/AMSCM
[Diagram labels: FFT; compute AMSCV, AMSFM, AMSCM]
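
One way to read the joint computation is sketched below: the clip-level modulation spectrogram is cut into 8 acoustic × 7 modulation cells using the subband edges listed on earlier slides, and contrast/valley plus flatness/crest are computed inside each cell. The averaging fraction `alpha`, the log-based contrast, the geometric/arithmetic-mean flatness, and all function names are assumptions, not the authors' stated formulas.

```python
import numpy as np

ACOUSTIC_EDGES_HZ = [0, 100, 200, 400, 800, 1600, 3200, 6400, 11025]
MOD_EDGES_HZ = [0, 0.33, 0.66, 1.32, 2.64, 5.28, 10.56, 21.03]

def band_mask(freqs, edges, j):
    """Boolean mask of bins in the j-th band (last band includes its upper edge)."""
    is_last = j == len(edges) - 2
    upper_ok = freqs <= edges[j + 1] if is_last else freqs < edges[j + 1]
    return (freqs >= edges[j]) & upper_ok

def joint_am_features(mod_spec, mod_freqs, acoustic_freqs, alpha=0.2):
    """Return (AMSCV, AMSFM/AMSCM), each of length 8*7*2 = 112.

    mod_spec has shape (n_modulation_bins, n_acoustic_bins).
    Assumes every cell contains at least one bin.
    """
    amscv, amsfm_scm = [], []
    for a in range(len(ACOUSTIC_EDGES_HZ) - 1):
        cols = band_mask(acoustic_freqs, ACOUSTIC_EDGES_HZ, a)
        for m in range(len(MOD_EDGES_HZ) - 1):
            rows = band_mask(mod_freqs, MOD_EDGES_HZ, m)
            cell = np.sort(mod_spec[rows][:, cols].ravel())[::-1] + 1e-12
            k = max(1, int(round(alpha * cell.size)))
            peak, valley = np.log(cell[:k].mean()), np.log(cell[-k:].mean())
            amscv += [peak - valley, valley]
            mean = cell.mean()
            amsfm_scm += [np.exp(np.log(cell).mean()) / mean, cell.max() / mean]
    return np.array(amscv), np.array(amsfm_scm)
```

Because each cell keeps its own statistics, no averaging across acoustic or modulation subbands takes place, which is the motivation given on the slide.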

Audio features used in our study
• All possible audio features
  • Extract SSD, MFCC, SCV, and SFM/SCM from audio frames, then compute mean/std → MuStd
    • MuStd → dim=2*(5+21+16+16)=116
  • Perform modulation spectral analysis on MFCC, SCV, and SFM/SCM
    • MMFCC → dim=2*(21*2+7*2)=112
    • MSCV → dim=2*(16*2+7*2)=92
    • MSFM/MSCM → dim=2*(16*2+7*2)=92
  • Compute SCV and SFM/SCM within acoustic-modulation (AM) frequency subbands → AMSCV, AMSFM/AMSCM
    • AMSCV → dim=8*7*2=112
    • AMSFM/AMSCM → dim=8*7*2=112
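
The dimension counts on this slide can be checked with a few lines of arithmetic. The helper names are hypothetical; the formulas are copied from the slide.

```python
def mean_std_dim(*frame_dims):
    """Mean and std for every frame-level dimension."""
    return 2 * sum(frame_dims)

def modulation_dim(n_features, n_mod_bands=7):
    """Slide formula 2*(D*2 + 7*2) for modulation features of D trajectories."""
    return 2 * (n_features * 2 + n_mod_bands * 2)

def joint_dim(n_acoustic_bands=8, n_mod_bands=7):
    """Two values per joint acoustic-modulation subband."""
    return n_acoustic_bands * n_mod_bands * 2

dims = {
    "MuStd": mean_std_dim(5, 21, 16, 16),
    "MMFCC": modulation_dim(21),
    "MSCV": modulation_dim(16),
    "MSFM/MSCM": modulation_dim(16),
    "AMSCV": joint_dim(),
    "AMSFM/AMSCM": joint_dim(),
}
```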

Audio feature sets and classifier
• Audio feature sets
  • MIREX 2011 method
    • MuStd+MMFCC+MSCV+MSFM/MSCM
    • dim=116+112+92+92=412
  • Improved method
    • MuStd+MMFCC+AMSCV+AMSFM/AMSCM
    • dim=116+112+112+112=452
• Classifier construction
  • RBF-kernel SVMs
  • Inner three-fold cross-validation to tune hyper-parameters
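
A minimal sketch of the classifier stage using scikit-learn, which is an assumption (the slides do not name a toolkit). The feature standardization step and the grid values for `C` and `gamma` are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_rbf_svm(features, labels):
    """Fit an RBF-kernel SVM, tuning C and gamma by inner three-fold CV."""
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Hypothetical search grid; the slides do not list the values tried.
    grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}
    search = GridSearchCV(pipeline, grid, cv=3)
    search.fit(features, labels)
    return search
```

The returned object exposes `best_params_` for the selected hyper-parameters and `predict` for labeling test clips with the refitted model.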

Experimental setup and results of MIREX 2011 genre/mood classification tasks
• Datasets
  • Genre classification: 10 genres, 700 30-sec clips in each one
  • Mood classification: 5 categories, 120 30-sec clips in each one
• Evaluation metric
  • Three-fold cross-validation; classification accuracy
• Results (JR1 is ours)

Experimental results of MIREX 2008-2012 genre/mood classification tasks

Extended experiments
• Four datasets
• Performance evaluation
  • Randomly stratified 10-fold cross-validation
  • Repeat the above process 10 times to obtain the average result
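
The evaluation protocol can be sketched as below. The fold-assignment scheme and the `fit_predict` callback (a hypothetical function that trains on the first two arguments and returns predicted labels for the third) are assumptions made for illustration.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each clip to a fold so every class is spread evenly over the folds."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

def repeated_cv_accuracy(features, labels, fit_predict, n_folds=10, n_repeats=10):
    """Average accuracy over n_repeats runs of stratified n_folds-fold CV."""
    features, labels = np.asarray(features), np.asarray(labels)
    accuracies = []
    for repeat in range(n_repeats):
        fold_of = stratified_folds(labels, n_folds, seed=repeat)
        for fold in range(n_folds):
            test = fold_of == fold
            predicted = fit_predict(features[~test], labels[~test], features[test])
            accuracies.append(np.mean(predicted == labels[test]))
    return float(np.mean(accuracies))
```

Using a different seed for each repetition reshuffles the fold assignment, so the reported figure averages over 100 train/test splits.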

Extended experiments • Averaged classification accuracy (%) of combining different feature sets on four datasets

Extended experiments • Comparison of our methods with other recent work

Conclusions
• Timbre & modulation features
  • Won 1st place (MIREX 2011 mood classification)
• Timbre & improved modulation
  • Improves accuracy by 2.47%/2.08% on GTZAN/Unique
  • Achieves 2.50%/0.14% higher than the MIREX 2011 method on Soundtracks/MIR-Mood

Thank you for listening. Questions & comments welcome!
