Dive into the world of genre classification in music through this survey, covering feature extraction techniques, paradigms, results, and future directions. Explore the complexities, critical issues, and various methodologies involved in this nontrivial task.
Automatic Genre Classification of Music Content [A Survey] Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek, IEEE Signal Processing Magazine, March 2006. By Yi-Tang Wang
Outline • Introduction • Feature extraction techniques • Genre classification paradigms • Classification results • Future directions & Conclusion
Introduction • EMD (electronic music distribution) • Restoration of analog archives • New content • Music catalogues become huge • What do you want to listen to? • 1 million tracks online • Efficient ways to browse & organize
Introduction (cont.) • Music Genres • Categories to characterize similarities • Boundaries are fuzzy • Automatic Classification • Finding a taxonomy • Hierarchical set of categories • Nontrivial task
Critical issues • Artists, Albums, or Titles • One song to one genre(?) • Albums - heterogeneous material • Artists - several albums • Same Titles? • No agreement on taxonomies • Allmusic, Amazon, Mp3 [2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000
Critical issues (cont.) • Ill-defined genre labels • Varied criteria (geographical, temporal, etc.) • Dependent on culture • Scalability of genre taxonomies • New genres appear frequently • Merging or splitting • Automatic system
Feature extraction techniques • High-level model • Event-like format (MIDI) • Symbolic format (MusicXML) • Rarely available • Low-level • Audio samples • Low level and low density of info • Do feature extraction • Timbre, Melody, Harmony, Rhythm
Timbre • Sounds with the same pitch and loudness can still sound different • Features to characterize timbre • Temporal features • Energy features • Spectral shape features • Perceptual features • Some have been standardized in MPEG-7
Timbre (cont.) • Transformations • Derive new features or increase dimensionality • Transforming into a logarithmic decibel scale is suggested • Texture window • Larger window • Reduces computation • Increases classification accuracy • 1 s • Variable sizes and positions
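The logarithmic decibel transformation mentioned on this slide can be sketched as follows. This is a minimal illustration, not the survey's implementation: the function name `to_decibels` and the `floor` parameter are assumptions introduced here to avoid taking the logarithm of zero.

```python
import math

def to_decibels(power_values, floor=1e-10):
    """Map non-negative power values to a logarithmic decibel scale.

    `floor` is an assumed lower bound that keeps log10 defined for silence.
    """
    return [10.0 * math.log10(max(p, floor)) for p in power_values]

# Example: three power values, each ten times larger than the previous one.
print(to_decibels([1.0, 10.0, 100.0]))
```

A tenfold increase in power becomes a constant step of 10 dB, which compresses the dynamic range of energy-based features before modeling.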
Timbre (cont.) • Texture model • model of features over texture window: • 1) simple modeling with low-order statistics • 2) modeling with autoregressive model • 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
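Option 1 above (simple modeling with low-order statistics) could look roughly like the sketch below. It assumes frame-level features arrive as equal-length lists of floats and uses non-overlapping windows; the name `texture_stats` and the choice of mean plus population variance are illustrative assumptions.

```python
from statistics import fmean, pvariance

def texture_stats(frames, window):
    """Summarize frame-level feature vectors over non-overlapping texture windows.

    Returns one vector per window: per-feature means followed by
    per-feature (population) variances. Trailing frames that do not
    fill a complete window are dropped.
    """
    summaries = []
    for start in range(0, len(frames) - window + 1, window):
        block = frames[start:start + window]
        columns = list(zip(*block))  # one tuple of values per feature
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        summaries.append(means + variances)
    return summaries

# Four frames with two features each, summarized two frames at a time.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(texture_stats(frames, window=2))
```

Each texture window is reduced to a single vector, so a classifier sees far fewer, more stable inputs than the raw frame sequence.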
Melody & Harmony • Melody • succession of pitched events • Horizontal element • Harmony • pitch simultaneity, chords • Vertical element
Melody & Harmony (cont.) • Pitch function • Characterizing pitch distribution • Amplitude, position of main peak, … • Unfolded • Contains pitch content and info of its range • Folded • Mapped to a single octave • Harmonic content
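The difference between the unfolded and folded pitch functions can be illustrated with a small sketch. It assumes pitches have already been estimated and expressed as MIDI note numbers, which is a simplification; the slide's pitch function is computed from audio.

```python
from collections import Counter

def unfolded_histogram(midi_notes):
    """Count each pitch as-is, keeping octave (range) information."""
    return Counter(midi_notes)

def folded_histogram(midi_notes):
    """Map every pitch onto a single octave (12 pitch classes)."""
    return Counter(note % 12 for note in midi_notes)

# Assumed example: C4, C5, E4, G4, E5 as MIDI note numbers.
notes = [60, 72, 64, 67, 76]
print(unfolded_histogram(notes))
print(folded_histogram(notes))
```

In the folded version C4 and C5 fall into the same bin, so the histogram describes harmonic content but no longer the pitch range.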
Rhythm • No precise definition • Generically, all of the temporal aspects • Periodicity function • Low-level approach, as with the pitch function • 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm) • 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures) • Gouyon et al. get MFCC-like descriptors
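One common way to build a periodicity function is autocorrelation of an onset or energy envelope. The sketch below takes that approach as an assumption (the slide does not fix a method) and also shows the period-to-tempo conversion behind the 0.3–1.5 s and 200–40 bpm figures. All function names and the toy envelope are illustrative.

```python
def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of `signal` at a given lag (in frames)."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

def dominant_period(signal, frame_rate, min_period, max_period):
    """Return the period (seconds) with the strongest autocorrelation
    among lags that fall inside [min_period, max_period]."""
    min_lag = max(1, round(min_period * frame_rate))
    max_lag = min(len(signal) - 1, round(max_period * frame_rate))
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(signal, lag))
    return best_lag / frame_rate

def period_to_bpm(period_seconds):
    """Convert a beat period in seconds to beats per minute."""
    return 60.0 / period_seconds

# Toy envelope: one pulse every 4 frames, sampled at an assumed 8 frames/s.
envelope = [1.0, 0.0, 0.0, 0.0] * 8
period = dominant_period(envelope, frame_rate=8, min_period=0.3, max_period=1.5)
print(period, period_to_bpm(period))
```

Restricting the lag search to the tempo range keeps longer pattern-level periodicities (2 to 6 s) from being mistaken for the beat.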
Extracting from segments • A small segment may contain sufficient information • Reduces required computation • Typically a 30 s segment • taken 30 s after the beginning • Artist classification • Voice is easier to identify than music only
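Selecting such a segment is a simple slice over the sample array. The sketch below assumes the audio is already decoded into a flat list of samples; the function name and default arguments are illustrative.

```python
def extract_segment(samples, sample_rate, offset_s=30.0, duration_s=30.0):
    """Return `duration_s` seconds of audio starting `offset_s` seconds in.

    If the track is too short, the slice simply returns what is available.
    """
    start = int(offset_s * sample_rate)
    end = start + int(duration_s * sample_rate)
    return samples[start:end]

# Toy "track" of 100 samples at an assumed rate of 1 sample per second.
track = list(range(100))
segment = extract_segment(track, sample_rate=1)
print(len(segment), segment[0], segment[-1])
```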
Local conclusion • Extracting high-level descriptors from polyphonic audio signals is not yet state of the art • Focus on timbre modeling • Timbre may contain sufficient info • 250 ms: 53%, 3 s: 72% • Among 10 genres
Local conclusion (cont.) • Another point of view (pessimistic) • Timbre similarity measure & 20,000 titles distributed over 18 genres • Little correlation • May not be scalable • Take cultural features into account
Genre classification • Expert systems • Unsupervised approach • clustering • Supervised approach • Machine learning algorithms
Expert systems • A knowledge based system made up of a set of rules • No model based on it so far • Expensive to implement and maintain • May yield unexpected interactions
Expert systems (cont.) • Pachet and Cazaly’s work • State differences between genres using language-based descriptors, e.g. instrumentation
Unsupervised approach • Clustering with similarity measures • Similarity measures • If time invariant • Euclidean distance or cosine distance • Otherwise • Build statistical model (Gaussian or GMMs) • Kullback-Leibler divergence, relative entropy • Sampling, Earth mover’s distance, asymptotic likelihood approximation • Shao et al. use HMMs
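The measures named on this slide can be sketched for small cases. The Euclidean and cosine distances act on plain feature vectors. The Kullback-Leibler example is restricted to two univariate Gaussians, where a closed form exists; that restriction and the symmetrized variant are assumptions made here for brevity.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for univariate Gaussians p and q (closed form)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def symmetric_kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL is not symmetric; summing both directions gives a symmetric measure."""
    return (kl_gaussian(mu_p, sigma_p, mu_q, sigma_q)
            + kl_gaussian(mu_q, sigma_q, mu_p, sigma_p))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
```

For GMMs no closed form exists, which is why the slide lists sampling and other approximations as alternatives.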
Unsupervised approach (cont.) • Clustering algorithms • K-means • Shao et al.’s work • Agglomerative hierarchical clustering • SOM (self-organizing map) • Artificial neural network • High dim onto lower dim • GHSOM (growing hierarchical SOM) • Rauber et al.
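As an illustration of the first algorithm in this list, here is a minimal K-means sketch on toy 2-D feature vectors. The initial centroids are passed in explicitly to keep the example deterministic; that choice, the fixed iteration count, and the function names are assumptions of this sketch.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# Two well-separated toy groups of feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

No genre labels are involved: the clusters emerge from the similarity measure alone, which is what distinguishes this from the supervised approach.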
Supervised approach • A taxonomy of genres is given • VS. Expert System • No rules (or descriptions of genres) • Supervised machine learning algorithms • KNN (K-Nearest Neighbor) • GMMs (Gaussian Mixture Models) • HMMs (Hidden Markov Models) • LDA (Linear Discriminant Analysis) • SVMs (Support Vector Machines) • ANNs (Artificial Neural Networks)
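The simplest of these algorithms, KNN, can be sketched in a few lines. The training vectors and the genre labels "rock" and "jazz" below are made-up illustrations, not data from the survey.

```python
from collections import Counter

def knn_classify(training_set, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `training_set` is a list of (feature_vector, genre_label) pairs.
    """
    neighbours = sorted(
        training_set,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled feature vectors.
training_set = [
    ([0.0, 0.0], "rock"), ([0.0, 1.0], "rock"),
    ([5.0, 5.0], "jazz"), ([6.0, 5.0], "jazz"), ([5.0, 6.0], "jazz"),
]
print(knn_classify(training_set, [0.2, 0.3]))
```

Unlike an expert system, nothing here describes what makes a genre; the classifier relies only on labelled examples of the given taxonomy.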
Classification results • MIREX genre classification contest • 1,005 / 510 songs over ten genres • 940 / 447 songs over six genres
Future directions • Classification into perceptual categories • Moods, emotions • Novelty detection • New or unknown data (not belonging to any class) • Classification with multiple labels • Probably closer to human experience • From taxonomies to folksonomies • Does the taxonomy fit the users?
Conclusion • Definitions of music genres are convoluted • Features → classification → result • Research is evolving from purely objective machine calculations to techniques • Machine learning plays a fundamental role in classification domains