
A MISSING FEATURE APPROACH TO INSTRUMENT IDENTIFICATION IN POLYPHONIC MUSIC

Explore the automatic music transcription technique for distinguishing instruments in polyphonic music using key features for instrumental timbre recognition. Compare human vs. computer identification accuracy with different approaches.


Presentation Transcript


  1. A MISSING FEATURE APPROACH TO INSTRUMENT IDENTIFICATION IN POLYPHONIC MUSIC
     Jana Eggink and Guy J. Brown, University of Sheffield

  2. Automatic Music Transcription
     • input: audio recording
     • output: score or other symbolic representation
     • needed (for every note):
       • pitch
       • start and duration
       • instrument
     • extras: key (C major), meter (4/4), bars, loudness, expression...
     • useful for:
       • musicologists
       • musicians
       • music information retrieval
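As an illustration of the per-note output listed above, a transcription can be stored as a list of note records, from which each instrument's part is recovered by filtering and sorting. This is a minimal Python sketch; the `NoteEvent` type, its field names and the MIDI pitch encoding are hypothetical choices, not a format used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """One transcribed note (hypothetical record layout, not the authors' format)."""
    midi_pitch: int   # pitch as a MIDI note number (69 = A4 = 440 Hz)
    start: float      # onset time in seconds
    duration: float   # duration in seconds
    instrument: str   # instrument label, e.g. "flute"

    @property
    def end(self) -> float:
        """Offset time in seconds."""
        return self.start + self.duration


def part(notes: list[NoteEvent], instrument: str) -> list[NoteEvent]:
    """Collect one instrument's notes in onset order, i.e. one part of the score."""
    return sorted((n for n in notes if n.instrument == instrument), key=lambda n: n.start)


transcription = [
    NoteEvent(75, 0.5, 0.5, "flute"),
    NoteEvent(56, 0.0, 1.0, "clarinet"),
    NoteEvent(68, 0.0, 0.5, "flute"),
]
print([n.midi_pitch for n in part(transcription, "flute")])  # [68, 75]
```

Key, meter, bars and the other "extras" would need additional, score-level fields.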

  3. Instrument Identification
     Possible clues:
     • the method of excitation (hitting, blowing, plucking or bowing strings) causes:
       • noise during the onset
       • a delayed start of individual partials during the onset
       • spectral fluctuations during the steady state
     • the resonance properties of the instrument body mostly affect the steady state:
       • energy distribution among high and low partials
       • formant regions
       • spectral bandwidth
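The steady-state clue "energy distribution among high and low partials" can be turned into numbers in many ways. The sketch below assumes the amplitudes of the first few partials are already known and computes two illustrative descriptors; the function name, the `split` parameter and the choice of descriptors are assumptions, not the features used in this work (which are described on slide 12).

```python
import numpy as np


def partial_energy_features(amps, split=3):
    """Two simple steady-state descriptors from partial amplitudes (illustrative only).

    amps[k] is the amplitude of partial k + 1, so amps[0] is the fundamental.
    Returns (share of energy in partials above `split`,
             energy-weighted centroid measured in harmonic numbers).
    """
    energy = np.asarray(amps, dtype=float) ** 2
    total = energy.sum()
    high_share = energy[split:].sum() / total
    harmonic_numbers = np.arange(1, len(energy) + 1)
    centroid = (harmonic_numbers * energy).sum() / total
    return high_share, centroid


k = np.arange(1, 11)
bright = partial_energy_features(1.0 / k)       # slowly decaying partials
dull = partial_energy_features(1.0 / k ** 2)    # quickly decaying partials
print(bright[0] > dull[0], bright[1] > dull[1])  # True True
```

A "brighter" tone puts more energy into the upper partials, so both descriptors increase.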

  4. Example Spectrograms
     [Figure: example spectrograms, oboe and cello]

  5. Human Instrument Identification
     • different clues from the onset and the steady state are used; individual clues, e.g. the static spectrum, can be enough to identify some, but not all, instruments
     • the onset seems most relevant for discriminating between instrument families
     • performance is better on musical phrases than on single tones
     • experts are better than non-experts

  6. Computer Instrument Identification
     JC Brown et al. (2001):
     • GMM classifier
     • frame-based cepstral coefficients
     • 4 woodwinds (flute, clarinet, oboe, saxophone)
     • realistic, monophonic phrases
     • computer: 60% correct on average, 80% with the best parameter choice
     • humans: 85%
     KD Martin (1999):
     • hierarchical classification scheme
     • different features, both temporal and spectral
     • 27 different instruments
     • realistic, monophonic phrases and single notes
     • computer: 48% instrument correct, 75% instrument family correct
     • humans: 57% instrument correct, 95% instrument family correct

  7. Polyphonic
     Kashino & Murase (1999):
     • time-domain approach
     • example waveforms stored for each note of each instrument
     • best match found using adaptive filtering techniques
     • iterative subtraction scheme
     • 3 instruments: flute, violin, piano
     • specially made recording
     • F0s and onset times supplied
     • 68% correct (max. polyphony 3)
     Kinoshita et al. (1999):
     • frequency-domain approach
     • features measuring temporal variation at the onset and spectral energy distribution
     • colliding partials are identified and the corresponding feature values are (mostly) ignored
     • 3 instruments: clarinet, violin, piano
     • random chords combining 2 isolated tones
     • 70% correct (78% if the correct F0s were supplied)

  8. Our System
     • the missing feature approach works for speech recognition in the presence of noise
     • GMMs trained with spectral features perform well for realistic monophonic music
     • GMMs have also been used in combination with a missing feature approach for speaker identification in noise
     → use a GMM classifier in combination with a missing feature approach for instrument recognition in realistic, polyphonic music

  9. System Overview

  10. F0 Analysis
     • iterative approach based on harmonic sieves (Scheffers, 1983)
     • the best-fitting sieve determines the F0
     [Figure: a badly fitting sieve vs. the best-fitting sieve]
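A harmonic sieve passes a spectral peak if the peak lies close to a multiple of the candidate F0; the sieve that passes the most peaks wins. The following is a simplified sketch of the iterative idea, assuming a list of already-detected peak frequencies: pick the best-fitting sieve, remove the peaks it explains, and repeat. The mesh tolerance, the number of meshes, the candidate grid and the stopping rule are illustrative assumptions, not Scheffers' (1983) exact formulation.

```python
import numpy as np


def mesh_hits(peaks, f0, n_harmonics=10, tol=0.03):
    """Boolean mask: which peaks fall into a mesh of the harmonic sieve at f0.

    Mesh k is centred on k * f0 (k = 1..n_harmonics) with relative width `tol`.
    """
    harmonics = f0 * np.arange(1, n_harmonics + 1)
    rel = np.abs(peaks[:, None] - harmonics[None, :]) / harmonics[None, :]
    return rel.min(axis=1) <= tol


def iterative_sieve(peaks, candidates, max_notes=3, min_hits=3):
    """Pick the best-fitting sieve, remove the peaks it explains, repeat."""
    remaining = np.asarray(peaks, dtype=float)
    f0s = []
    for _ in range(max_notes):
        if remaining.size == 0:
            break
        hits = [mesh_hits(remaining, f).sum() for f in candidates]
        best = int(np.argmax(hits))
        if hits[best] < min_hits:  # no sieve explains enough peaks: stop
            break
        f0s.append(float(candidates[best]))
        remaining = remaining[~mesh_hits(remaining, candidates[best])]
    return f0s


# two ideal harmonic tones at 220 Hz and 330 Hz, 10 partials each
peaks = np.concatenate([220.0 * np.arange(1, 11), 330.0 * np.arange(1, 11)])
print(sorted(iterative_sieve(peaks, np.arange(100, 1000, 10))))  # [220.0, 330.0]
```

With a fixed number of meshes, a sieve at half the true F0 spans only half the frequency range, which keeps it from winning in this toy example; real recordings with missing or weak partials need more care.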

  11. Missing Feature Estimation
     • finding reliable and unreliable features is one of the main problems
     • instrument tones have an approximately harmonic overtone series
     • based on the extracted F0s, all frequency regions where a partial from a non-target tone is found are marked as unreliable and excluded from the recognition process
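Under the harmonicity assumption, building the mask only requires the non-target F0s. A minimal sketch, assuming linearly spaced bands starting at 0 Hz with the 60 Hz width given on slide 12, ideal harmonic partials, and no allowance for inharmonicity or for energy leaking into neighboring bands; `reliability_mask` is a hypothetical helper.

```python
import numpy as np


def reliability_mask(nontarget_f0s, n_bands, band_width=60.0):
    """True = reliable band, False = a partial of a non-target tone falls inside it.

    Bands are linearly spaced: band b covers [b * band_width, (b + 1) * band_width) Hz.
    """
    mask = np.ones(n_bands, dtype=bool)
    top = n_bands * band_width
    for f0 in nontarget_f0s:
        partials = f0 * np.arange(1, int(top // f0) + 1)  # all partials up to `top`
        bands = (partials // band_width).astype(int)
        mask[bands[bands < n_bands]] = False
    return mask


# one non-target tone at 208 Hz, 10 bands covering 0-600 Hz
mask = reliability_mask([208.0], n_bands=10)
print(np.flatnonzero(~mask))  # [3 6]
```

The partials at 208 Hz and 416 Hz fall into bands 3 (180-240 Hz) and 6 (360-420 Hz), so only those two bands are excluded.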

  12. Features
     • local spectral features are required for the missing feature approach
     • frame-based (exact onset detection is hard in polyphonic music)
     • energy in narrow frequency bands (60 Hz wide)
     • linear spacing of the bands, corresponding to the linear spacing of the partials
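One way to compute such features for a single frame is to sum the power spectrum within 60 Hz bands. The sketch assumes a Hann-windowed FFT frame and returns raw (uncompressed) band energies; the frame length, window and any log compression used in the original system are not stated on the slide, and `band_energies` is a hypothetical helper.

```python
import numpy as np


def band_energies(frame, sr, band_width=60.0, n_bands=None):
    """Energy in linearly spaced frequency bands of one frame (a sketch)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    if n_bands is None:
        n_bands = int((sr / 2) // band_width)  # bands up to the Nyquist frequency
    idx = (freqs // band_width).astype(int)    # band index of every FFT bin
    keep = idx < n_bands
    return np.bincount(idx[keep], weights=spectrum[keep], minlength=n_bands)


sr = 8000
t = np.arange(1024) / sr
e = band_energies(np.sin(2 * np.pi * 440.0 * t), sr)
print(e.shape, int(np.argmax(e)))  # (66,) 7
```

The 440 Hz sine lands in band 7 (420-480 Hz). Because the bands are linear, consecutive partials of a tone above 60 Hz fall into different bands, which is what makes the per-band mask on slide 11 possible.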

  13. Example Features with Mask
     [Figure: features of the target tone (violin D), the non-target tone (oboe G sharp) and their mixture, each shown without and with the mask]

  14. GMMs
     • approximate a distribution by a combination of individual Gaussians
       [Figure: example of a 2-dimensional distribution modeled by a GMM consisting of 3 individual Gaussians]
     • means and covariances are trained with the EM algorithm
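A compact version of EM for a GMM with diagonal covariances (matching the feature-independence assumption on slide 15) could look like the following. It is a minimal NumPy sketch; the farthest-point initialization, the fixed iteration count and the variance floor are assumptions made to keep it short and deterministic. In practice a library implementation such as scikit-learn's `GaussianMixture(covariance_type="diag")` would normally be used.

```python
import numpy as np


def fit_diag_gmm(X, n_components, n_iter=100, seed=0, var_floor=1e-6):
    """Fit a diagonal-covariance GMM to the rows of X with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # farthest-point initialisation of the means (an assumption, not from the slides)
    means = [X[rng.integers(n)]]
    for _ in range(1, n_components):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log(pi_i * N(x | mu_i, diag(sigma2_i))) for every row and component
        log_p = (np.log(weights)
                 - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                 - 0.5 * np.sum((X[:, None, :] - means[None]) ** 2 / variances[None], axis=2))
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
weights, means, variances = fit_diag_gmm(X, n_components=2)
order = np.argsort(means[:, 0])
```

On these two well-separated toy clusters, the fitted means end up near (0, 0) and (10, 10), with weights near 0.5.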

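  15. GMMs with Missing Features
     • the probability density function (pdf) of an observed D-dimensional spectral feature vector $x$ is modeled as:
       $p(x) = \sum_{i=1}^{N} \pi_i \, \Phi_i(x; \mu_i, \Sigma_i)$
     • assuming feature independence (diagonal $\Sigma_i$), this can be rewritten as:
       $p(x) = \sum_{i=1}^{N} \pi_i \prod_{j=1}^{D} \phi_i(x_j; \mu_{ij}, \sigma_{ij}^2)$
     • approximating the pdf from reliable data only leads to:
       $p(x) \approx \sum_{i=1}^{N} \pi_i \prod_{j \in M'} \phi_i(x_j; \mu_{ij}, \sigma_{ij}^2)$
     where N = number of Gaussians in the mixture model, $\pi_i$ = mixture weight, $\Phi_i$ = multivariate Gaussian with mean vector $\mu_i$ and covariance matrix $\Sigma_i$, $\phi_i$ = univariate Gaussians with mean $\mu_{ij}$ and variance $\sigma_{ij}^2$, $M'$ = subset of reliable features in mask $M$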
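The last equation translates almost directly into code: drop the unreliable dimensions from each diagonal Gaussian and sum over the mixture components in the log domain. The sketch below also shows a toy classification. Summing log-likelihoods over frames, the `classify` helper and the two single-Gaussian "instrument" models are assumptions for illustration, not details given in the slides.

```python
import numpy as np


def masked_log_likelihood(x, mask, weights, means, variances):
    """log p(x) of a diagonal-covariance GMM using only the reliable features.

    Dropping the unreliable dimensions (mask == False) from the product over j
    marginalises them out, which is exact when the covariances are diagonal.
    """
    x_r, mu, var = x[mask], means[:, mask], variances[:, mask]
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum((x_r - mu) ** 2 / var, axis=1))
    return np.logaddexp.reduce(log_comp)


def classify(frames, masks, models):
    """Return the instrument whose GMM gives the highest summed frame log-likelihood."""
    scores = {name: sum(masked_log_likelihood(x, m, *gmm) for x, m in zip(frames, masks))
              for name, gmm in models.items()}
    return max(scores, key=scores.get)


# two toy single-Gaussian "instrument" models over 3 spectral features
models = {
    "A": (np.array([1.0]), np.zeros((1, 3)), np.ones((1, 3))),
    "B": (np.array([1.0]), np.full((1, 3), 5.0), np.ones((1, 3))),
}
x = np.array([0.1, -0.2, 100.0])  # feature 3 is corrupted by a non-target partial
print(classify([x], [np.array([True, True, False])], models))  # A
print(classify([x], [np.ones(3, dtype=bool)], models))         # B
```

With the corrupted feature included, the large outlier dominates and the wrong model wins; with it masked out, the decision rests on the two clean features.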

  16. Results: Monophonic
     • GMMs trained for 5 instruments: flute, clarinet, oboe, violin, cello
     • realistic monophonic phrases (3-4 per instrument): 83% correct
     • single notes: 66% instrument correct, 85% instrument family correct

  17. Random 2-tone Chords
     • correct F0s were provided
     • 49% instrument correct, 72% instrument family correct

  18. Realistic Duet Recording
     • duet for flute and clarinet by H. Villa-Lobos
     • F0s extracted by the system
     [Figure: system output vs. original score (flute, clarinet in A); fundamental frequency (Hz) over time (frames)]
     F0s according to the score (Hz):
       flute: 415 - 415 - 415 - 622 - 622
       clarinet: 208 - 185 - 175 - 277 - 294 - 247 - 220 - 208
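The score F0s can be checked against note names by rounding to the nearest equal-tempered pitch (A4 = 440 Hz). A short sketch; the grouping of the values into a higher and a lower part (presumably flute and clarinet) is inferred from the numbers, and `hz_to_note` is a hypothetical helper.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(freq, a4=440.0):
    """Nearest equal-tempered note name, e.g. 440.0 -> 'A4' (MIDI 69 = A4)."""
    midi = 69 + round(12 * math.log2(freq / a4))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


upper = [415, 415, 415, 622, 622]                 # presumably the flute part
lower = [208, 185, 175, 277, 294, 247, 220, 208]  # presumably the clarinet part
print([hz_to_note(f) for f in upper])  # ['G#4', 'G#4', 'G#4', 'D#5', 'D#5']
print([hz_to_note(f) for f in lower])  # ['G#3', 'F#3', 'F3', 'C#4', 'D4', 'B3', 'A3', 'G#3']
```

These are sounding pitches; the written notes for the clarinet in A would be transposed a minor third higher.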

  19. Conclusions
     • looks promising for small ensembles
     • works with realistic stimuli
     Future Work
     • include temporal information
     • idea: one HMM for every instrument tone, combined with either
       • a missing feature approach comparable to the one used here, or
       • spectral subtraction based on templates
