
Instrument Identification in Polyphonic Audio



Presentation Transcript


  1. Instrument Identification in Polyphonic Audio Sarah Smith: Presentation for ECE 492

  2. Defining the task • GOAL: given a (single-channel) recording of polyphonic music, identify the instruments that are playing • Is full source separation really necessary? • How do we define ‘instrument’? • Grand piano vs. upright piano? • Electric vs. acoustic bass?

  3. Why is this useful? • Annotation of musical databases • Enables searching by instrument • Makes it possible to group similar types of ensembles • Can be combined with source separation to extract individual instruments • For editing or remixing • Removing a solo from its accompaniment

  4. What are the challenges? • More parts => more possible combinations • 10 possible instruments => 210 possible quartets of distinct instruments (C(10, 4) = 210) • Harder to differentiate similar (or identical) instruments, e.g. in a string quartet • Faces many of the same challenges as source separation
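The 210 figure on the slide counts quartets of four distinct instruments drawn from 10 classes, C(10, 4). The short Python sketch below reproduces that count and, as an illustrative assumption not on the slide, also counts ensembles in which one instrument may fill several parts (as with the two violins of a string quartet).

```python
from math import comb

def ensemble_count(n_instruments, n_parts, allow_repeats=False):
    """Number of distinct ensembles of n_parts drawn from n_instruments classes.

    Without repeats each instrument appears at most once: C(n, k).
    With repeats (e.g. two violins) it is a multiset count: C(n + k - 1, k).
    """
    if allow_repeats:
        return comb(n_instruments + n_parts - 1, n_parts)
    return comb(n_instruments, n_parts)

for k in range(1, 5):
    print(k, ensemble_count(10, k), ensemble_count(10, k, allow_repeats=True))
```

For 10 classes the loop prints 10, 45, 120 and 210 ensembles of one to four distinct instruments, and 10, 55, 220 and 715 when an instrument may be repeated, which shows how quickly the label space grows with the number of parts.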

  5. Possible approaches • Perform source separation first, then identify an instrument to match each source • Relies heavily on accurate streaming in the source separation stage • Can identify instruments separately • Reinterpret the task as “ensemble identification” and identify the whole group at once • This greatly expands the number of possible classes • Does not rely on source separation techniques

  6. Instrument Recognition in Polyphonic Music Based on Automatic Taxonomies Slim Essid, Gaël Richard, and Bertrand David

  7. Proposed System • Develop a taxonomy of musical instrument classes • Using the extracted features, cluster instrument classes based on their separation in the feature space • Develop a system of binary classifiers to identify the instrumentation of a test piece • Use feature selection at each node to determine the optimal features for classification

  8. Audio Features • To find the optimal feature set, calculate everything and then choose the best ones • A combination of ~100 spectral, temporal, and statistical features • PCA or a similar method could then be used to reduce the dimensionality of the feature space • This yields basis vectors that are combinations of many calculated features
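One way to carry out the PCA step mentioned on the slide is a singular value decomposition of the centered feature matrix. This is a minimal NumPy sketch, not the authors' code; standardizing each feature first is an added assumption, since the ~100 features have very different numeric ranges, and the random matrix in the usage lines is only a stand-in for real feature data.

```python
import numpy as np

def pca_project(features, n_components):
    """Project feature vectors (one per row) onto their top principal components.

    Returns (reduced, components). Each row of `components` mixes all of the
    original features, so a reduced dimension no longer maps to one named feature.
    """
    centered = features - features.mean(axis=0)
    # Assumption: scale each feature to unit variance so large-range features don't dominate.
    std = centered.std(axis=0)
    scaled = centered / np.where(std > 0, std, 1.0)
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    components = vt[:n_components]
    return scaled @ components.T, components

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 100))   # stand-in for ~100 features per analysis frame
reduced, components = pca_project(features, 10)
print(reduced.shape, components.shape)   # (200, 10) (10, 100)
```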

  9. Creating a taxonomy • Perform principal component analysis (PCA) on the extracted features to identify key components • Calculate a distance between each pair of instrument classes • Cluster similar classes together • Iterate through multiple levels of the taxonomy
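The clustering step can be sketched as bottom-up agglomerative clustering over a table of pairwise class distances. The slide does not say which linkage rule is used; average linkage below is an assumption, and the class codes and distance values in the usage lines are made-up illustrations, not results from the paper.

```python
def agglomerate(labels, dist):
    """Bottom-up agglomerative clustering of classes with average linkage.

    labels: list of class names; dist: dict mapping frozenset({a, b}) -> distance.
    Returns merges as (cluster_a, cluster_b, linkage_distance), closest pair first;
    read in order, they trace a taxonomy tree from the leaves upward.
    """
    clusters = [(name,) for name in labels]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs of classes.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

toy = {frozenset(p): d for p, d in [
    (("Pn", "Gt"), 1.0), (("Tr", "Ts"), 1.5),
    (("Pn", "Tr"), 5.0), (("Pn", "Ts"), 5.0),
    (("Gt", "Tr"), 5.0), (("Gt", "Ts"), 5.0)]}
for a, b, d in agglomerate(["Pn", "Gt", "Tr", "Ts"], toy):
    print(a, "+", b, "at", d)
```

With these toy distances the merges are piano with guitar (1.0), trumpet with tenor sax (1.5), and finally the two groups (5.0), giving a two-level taxonomy.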

  10. Defining a distance metric • Using the principal-component features, a distance can be calculated between each pair of instrument classes • Divergence distance • Bhattacharyya distance
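Both distances have closed forms when each class is summarized by a single multivariate Gaussian (mean vector and covariance matrix) in the principal-component space. That Gaussian assumption is added here for illustration, since the slide only names the two distances, and "divergence" is taken to mean the symmetric Kullback-Leibler divergence KL(1||2) + KL(2||1).

```python
import numpy as np

def _as_gaussian(mean, cov):
    # Accept scalars for the 1-D case as well as vectors/matrices.
    return np.atleast_1d(np.asarray(mean, dtype=float)), np.atleast_2d(np.asarray(cov, dtype=float))

def bhattacharyya_gauss(m1, s1, m2, s2):
    """Bhattacharyya distance between N(m1, s1) and N(m2, s2) (assumed Gaussian classes)."""
    m1, s1 = _as_gaussian(m1, s1)
    m2, s2 = _as_gaussian(m2, s2)
    s = (s1 + s2) / 2.0
    dm = m1 - m2
    mean_term = dm @ np.linalg.solve(s, dm) / 8.0
    # 0.5 * ln(|S| / sqrt(|S1| |S2|)), computed with slogdet for numerical stability
    cov_term = 0.5 * (np.linalg.slogdet(s)[1]
                      - 0.5 * (np.linalg.slogdet(s1)[1] + np.linalg.slogdet(s2)[1]))
    return float(mean_term + cov_term)

def divergence_gauss(m1, s1, m2, s2):
    """Symmetric KL divergence KL(1||2) + KL(2||1) between two Gaussians."""
    m1, s1 = _as_gaussian(m1, s1)
    m2, s2 = _as_gaussian(m2, s2)
    inv1, inv2 = np.linalg.inv(s1), np.linalg.inv(s2)
    dm = m1 - m2
    trace_term = 0.5 * np.trace(inv1 @ s2 + inv2 @ s1 - 2.0 * np.eye(len(m1)))
    mean_term = 0.5 * dm @ (inv1 + inv2) @ dm
    return float(trace_term + mean_term)

print(bhattacharyya_gauss(0.0, 1.0, 2.0, 1.0))  # 0.5
print(divergence_gauss(0.0, 1.0, 2.0, 1.0))     # 4.0
```

For two 1-D unit-variance Gaussians with means 0 and 2 only the mean terms contribute, giving 4/8 = 0.5 and 0.5 · 4 · 2 = 4.0.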

  11. Resulting taxonomy (figure not reproduced) • Legend: Bs = double bass; Dr = drums; Eg = electro-acoustic guitar; Gt = Spanish guitar; Pn = piano; Pr = percussion; Tr = trumpet; Ts = tenor sax; Vf, Vm = female, male voice; V = voice; W = wind instrument; M = melody (W, Vm or Eg)

  12. Learned Classifiers • Given the taxonomy, an unknown ensemble can be identified by a series of classifiers, one at each node • Optimizing each classifier individually is a lot of work • We want an optimization method that generalizes easily

  13. Feature Selection • Starting with the full set of D candidate features, we want to choose a subset of d features • Choose an optimization criterion and search the feature set for the best features • Desirable characteristics of a chosen feature: • Varies between instrument classes • Does not vary within the same instrument class

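  14. Feature Optimization • Variables: • M = number of classes (= 2 for pairwise feature selection) • N_m = number of training feature vectors in class m • N = total number of training feature vectors • μ(i) = mean of feature i over the whole training set • μ_m(i) = mean of feature i over class m • x_{n_m}(i) = value of feature i in the n_m-th feature vector of class m • Criterion for feature i: variation between classes divided by variation within classes, $r(i) = \frac{\sum_{m=1}^{M} \frac{N_m}{N}\left(\mu_m(i)-\mu(i)\right)^2}{\frac{1}{N}\sum_{m=1}^{M}\sum_{n_m=1}^{N_m}\left(x_{n_m}(i)-\mu_m(i)\right)^2}$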
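A sketch of this kind of criterion as a per-feature score follows. For each feature i it divides the between-class variation, the sum over classes of (N_m/N)(μ_m(i) - μ(i))², by the within-class variation, (1/N) times the summed squared deviations of each x_{n_m}(i) from its class mean μ_m(i), and keeps the d highest-scoring features. The exact normalization on the original slide did not survive extraction, so this form is an assumption; ranking features one at a time also ignores redundancy between them. The synthetic data in the usage lines is hypothetical.

```python
import numpy as np

def separability(x, y):
    """Per-feature ratio of between-class to within-class variation.

    x: (N, D) training feature vectors; y: (N,) class labels.
    """
    n = len(x)
    mu = x.mean(axis=0)                      # mu(i): mean over all training data
    between = np.zeros(x.shape[1])
    within = np.zeros(x.shape[1])
    for label in np.unique(y):
        xm = x[y == label]
        mu_m = xm.mean(axis=0)               # mu_m(i): mean of feature i in class m
        between += len(xm) / n * (mu_m - mu) ** 2
        within += ((xm - mu_m) ** 2).sum(axis=0) / n
    return between / np.maximum(within, 1e-12)

def select_features(x, y, d):
    """Indices of the d features with the largest separability ratio."""
    return np.argsort(separability(x, y))[::-1][:d]

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y = np.repeat([0, 1], 100)                   # M = 2, as in pairwise selection
x[y == 1, 0] += 5.0                          # only feature 0 separates the classes
print(select_features(x, y, 2))              # feature 0 should rank first
```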

  15. Testing the classifier • The data used included jazz ensembles ranging from solo to quartet • At each level of the taxonomy, a “one vs. one” binary classifier chooses which cluster the sound belongs to • For nodes with more than two possible classes, a majority vote decides
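The one-vs.-one decision at a node with more than two classes can be sketched as below. The pairwise classifiers are passed in as plain functions standing in for whatever trained binary classifiers are used; the class names, thresholds, and the rule of breaking ties in favor of the first-listed class are hypothetical.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """Choose among several classes by majority vote of pairwise classifiers.

    pairwise: dict mapping (a, b) -> function(x) returning a or b,
    with keys in the order produced by combinations(classes, 2).
    Ties go to the class listed first in `classes` (an assumption).
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)

# Hypothetical threshold classifiers on a single scalar feature.
rules = {
    ("solo", "duo"): lambda v: "duo" if v > 1.0 else "solo",
    ("solo", "trio"): lambda v: "trio" if v > 2.0 else "solo",
    ("duo", "trio"): lambda v: "trio" if v > 2.5 else "duo",
}
print(one_vs_one_predict(2.2, ["solo", "duo", "trio"], rules))  # duo
```

At x = 2.2 "duo" wins two of the three pairwise contests, so it takes the vote.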

  16. Correcting common confusions • Use knowledge of which classes are commonly confused to go back and re-evaluate the classification • Example: solo drums are often classified as larger ensembles that include drums • If a sample is classified as an ensemble with drums, check whether the second-best fit is solo drums
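The drum check can be written as a small post-processing rule on the classifier scores. The slide does not say when the second-best answer should win, so the score margin below is a hypothetical parameter, as is the label format ("+"-joined instrument codes from the taxonomy legend).

```python
def correct_drum_confusion(scores, solo_drums="Dr", margin=0.1):
    """Re-check a decision that falls into a known confusion pattern.

    scores: dict mapping ensemble label -> classifier score (higher is better).
    Labels are hypothetical "+"-joined instrument codes, e.g. "Bs+Dr+Pn".
    margin: hypothetical threshold; not specified on the slide.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    best_has_drums = solo_drums in best.split("+") and best != solo_drums
    if best_has_drums and second == solo_drums and scores[best] - scores[second] < margin:
        return solo_drums  # close call: prefer the commonly missed solo-drums class
    return best

scores = {"Bs+Dr+Pn": 0.50, "Dr": 0.45, "Pn": 0.05}
print(correct_drum_confusion(scores))               # Dr
print(correct_drum_confusion(scores, margin=0.01))  # Bs+Dr+Pn
```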

  17. Overall Performance • The proposed system achieves 53% accuracy, compared with 47% for the baseline model • The baseline (no hierarchy) model performs better for a certain subset of classes • Classification accuracy at the top level of the taxonomy was 65%

  18. Summary • Treat each possible ensemble as an instrument for the purposes of classification • The taxonomy structure gives a classification into broad groupings of ensembles • Each binary classifier chooses between a smaller number of options at its node • However, a missed top-level classification cannot be corrected later

  19. Musical Instrument Recognition in Polyphonic Audio using source-filter model for sound separation Toni Heittola, Anssi Klapuri, Tuomas Virtanen

  20. Proposed Method • First perform source separation on the input audio (NMF + Source Filter Model) • Run instrument detection on each of the separated streams (GMM)

  21. Source-Filter Model • In the spectral domain, the sound of an instrument can be viewed as the product of a harmonic excitation (source) and a filter determined by the instrument's acoustic properties • Example: speech • Source: glottal pulse • Filter: vocal tract

  22. Source-Filter Model Applied • The excitation spectrum contains the fundamental frequency plus its integer-multiple overtones, with equal weights • The filter is modeled with filter bank coefficients on the mel scale
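A minimal NumPy sketch of the two model components follows: an equal-weight harmonic comb for the excitation, and a bank of mel-spaced filters whose weighted sum gives an instrument's filter. Triangular filter shapes, the 2595·log10(1 + f/700) mel formula, placing each harmonic in its nearest FFT bin, the 16 kHz / 1024-point analysis settings, and the random filter coefficients are all illustrative assumptions; the count of 30 filters follows the ~30 coefficients per instrument mentioned in the talk.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def harmonic_excitation(f0, freqs):
    """Equal-weight excitation: 1 at the FFT bin nearest each harmonic h * f0."""
    e = np.zeros(len(freqs))
    for h in range(1, int(freqs[-1] // f0) + 1):
        e[np.argmin(np.abs(freqs - h * f0))] = 1.0
    return e

def mel_filterbank(n_filters, freqs):
    """Triangular responses a_j(k) with centers spaced evenly on the mel scale."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(freqs[-1]), n_filters + 2))
    bank = np.zeros((n_filters, len(freqs)))
    for j in range(n_filters):
        lo, center, hi = edges[j], edges[j + 1], edges[j + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        bank[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

sr, n_fft = 16000, 1024
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)       # 513 bins from 0 to 8000 Hz
a = mel_filterbank(30, freqs)                    # (30, 513) filter responses
c = np.random.default_rng(0).uniform(size=30)    # hypothetical instrument coefficients
h = c @ a                                        # instrument filter H(k)
e = harmonic_excitation(220.0, freqs)            # note with f0 = 220 Hz
model_spectrum = e * h                           # source times filter
```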

  23. Creating an NMF Model • A naïve NMF approach to both separating the audio streams and labeling instruments would require a large number of parameters: # of basis vectors = (# of instruments) × (# of pitches), and each basis vector has the same dimension as the FFT • If we limit ourselves to cases described by the source-filter model, we only need # of basis vectors = (# of instruments) + (# of pitches): each instrument vector is characterized by ~30 filter coefficients, and each excitation is determined by a single pitch value
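The savings can be made concrete with a little arithmetic. The sizes in the usage line (10 instruments, 60 pitches, a 1024-point FFT with 513 bins) are hypothetical examples; the ~30 filter coefficients per instrument and the single pitch value per excitation come from the slide.

```python
def nmf_parameter_counts(n_instruments, n_pitches, n_bins, n_filter_coeffs=30):
    """Basis vectors and stored values for the naive vs. source-filter NMF models."""
    naive_bases = n_instruments * n_pitches                  # one spectrum per (instrument, pitch)
    naive_values = naive_bases * n_bins                      # each spectrum is FFT-sized
    sf_bases = n_instruments + n_pitches                     # one filter per instrument, one excitation per pitch
    sf_values = n_instruments * n_filter_coeffs + n_pitches  # filter coefficients + one pitch value each
    return naive_bases, naive_values, sf_bases, sf_values

print(nmf_parameter_counts(10, 60, 513))  # (600, 307800, 70, 360)
```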

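  24. Signal Model • $\hat{x}_t(k) = \sum_{n}\sum_{i} g_{n,i,t}\, e_{n,t}(k)\, H_i(k)$, with $H_i(k) = \sum_{j} c_{i,j}\, a_j(k)$ • $a_j(k)$: individual filter frequency responses • $c_{i,j}$: filter bank coefficients • $H_i(k)$: filter model (depends on instrument) • $g_{n,i,t}$: mixture weights • $e_{n,t}(k)$: excitation spectrum (depends on note)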

  25. Estimating the Coefficients • The filter bank responses a_j(k) and the excitation spectra e_{n,t}(k) are known • The excitation spectra come from multipitch estimation • Multiplicative updates can then be used to find the values of c_{i,j} and g_{n,i,t} within the mixture
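The update step can be sketched with NumPy for the model x̂_t(k) = Σ_n Σ_i g_{n,i,t} e_{n,t}(k) Σ_j c_{i,j} a_j(k), holding the excitations e and filter responses a fixed and estimating only c and g. The Euclidean reconstruction cost and the Lee-Seung style ratio updates are assumptions for illustration, since the slide does not state which cost is minimized. Each update multiplies a parameter by the ratio of data-weighted to model-weighted gradient terms, so nonnegative parameters stay nonnegative; the array sizes and random data are hypothetical.

```python
import numpy as np

def sf_model(g, e, c, a):
    """x_hat[t, k] = sum_{n,i} g[n,i,t] * e[n,t,k] * H[i,k], with H[i,k] = sum_j c[i,j] * a[j,k].

    Shapes: g (N, I, T) gains, e (N, T, K) excitations, c (I, J), a (J, K).
    """
    h = c @ a                                       # (I, K) instrument filters
    v = np.einsum("nit,ntk->itk", g, e)             # excitation routed to each instrument
    return np.einsum("itk,ik->tk", v, h)

def update(x, g, e, c, a, eps=1e-12):
    """One round of multiplicative updates for g, then c (Euclidean cost assumed)."""
    h = c @ a
    x_hat = sf_model(g, e, c, a)
    # d x_hat[t,k] / d g[n,i,t] = e[n,t,k] * h[i,k]
    g = g * np.einsum("tk,ntk,ik->nit", x, e, h) / (
        np.einsum("tk,ntk,ik->nit", x_hat, e, h) + eps)
    v = np.einsum("nit,ntk->itk", g, e)
    x_hat = np.einsum("itk,ik->tk", v, h)           # refresh the model with the new g
    # d x_hat[t,k] / d c[i,j] = v[i,t,k] * a[j,k]
    c = c * np.einsum("tk,itk,jk->ij", x, v, a) / (
        np.einsum("tk,itk,jk->ij", x_hat, v, a) + eps)
    return g, c

# Synthetic check: N notes, I instruments, T frames, K bins, J filters (hypothetical sizes).
rng = np.random.default_rng(0)
N, I, T, K, J = 3, 2, 8, 40, 5
e, a = rng.random((N, T, K)), rng.random((J, K))
x = sf_model(rng.random((N, I, T)), e, rng.random((I, J)), a)
g, c = rng.random((N, I, T)), rng.random((I, J))
cost_before = np.sum((x - sf_model(g, e, c, a)) ** 2)
for _ in range(200):
    g, c = update(x, g, e, c, a)
cost_after = np.sum((x - sf_model(g, e, c, a)) ** 2)
print(cost_before, cost_after)
```

Because the model is linear in g when c is fixed (and vice versa) with nonnegative coefficients, each half-step is a standard nonnegative least-squares multiplicative update, which is why the reconstruction error falls over the iterations.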

  26. Optional Streaming Model • The basic model is very general • Any instrument can play any or all of the notes in a single frame • Each note can be played by more than one instrument • If we know that each instrument plays only one note at a time, we can form streams and identify the instrument for each stream

  27. Identifying the individual instruments • Using the source separation data, reconstruct the sound of each identified instrument source • After extracting MFCCs from the separated tracks, a Gaussian Mixture Model (GMM) can be used to identify the instrument associated with each track
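The identification step can be sketched as maximum-likelihood scoring: each candidate instrument has its own GMM, and a separated track is assigned to the instrument whose model gives its MFCC frames the highest total log-likelihood. Diagonal covariances are an assumption here, MFCC extraction and GMM training (typically by expectation-maximization) are omitted, and the two single-component toy models in the usage lines are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of MFCC frames under a diagonal-covariance GMM.

    frames: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, C, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)      # (C,)
    log_comp = log_norm - 0.5 * (diff ** 2 / variances).sum(axis=2)  # (T, C)
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over components, stabilized by each frame's maximum.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())

def identify_instrument(frames, models):
    """Instrument whose GMM gives the separated track the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))

rng = np.random.default_rng(1)
d = 13  # hypothetical number of MFCCs per frame
models = {
    "flute": (np.array([1.0]), np.zeros((1, d)), np.ones((1, d))),
    "cello": (np.array([1.0]), np.full((1, d), 3.0), np.ones((1, d))),
}
track = rng.normal(loc=3.0, size=(50, d))   # frames resembling the "cello" model
print(identify_instrument(track, models))   # cello
```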

  28. Evaluation • Test signals were synthesized as random combinations of instruments, each playing a random series of notes • Polyphony ranges from one to six parts • The number of instruments is known

  29. Results • Accuracy evaluated using the F-measure • Separation quality measured using SNR • (Figure not reproduced: classification accuracy and SNR (dB) plots)
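Both measures are simple to compute. The sketch below scores a single excerpt: F-measure on the sets of predicted and reference instruments, and SNR as reference energy over residual energy in dB. Whether the original evaluation pooled counts over the whole test set before computing F is not stated on the slide, so per-excerpt scoring here is an assumption; the instrument codes and signals in the usage lines are hypothetical.

```python
import numpy as np

def f_measure(predicted, reference):
    """F-measure between predicted and reference instrument sets for one excerpt."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)

def snr_db(reference, estimate):
    """Separation SNR in dB: energy of the reference over energy of the error."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(estimate, dtype=float)
    return float(10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2)))

print(f_measure({"Pn", "Tr"}, {"Pn", "Bs"}))    # 0.5
print(snr_db([1, 1, 1, 1], [0.9] * 4))          # approximately 20 dB
```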

  30. Observations • The source-filter model improves instrument separation • It provides additional information that can be used in streaming • Randomly selected notes and instruments will tend to be easier to separate than real music • Only 62% accuracy in the monophonic case is rather low

  31. Comparing the approaches • Decompose the task into source separation + instrument ID • Breaks the problem down into previously addressed problems • Identifying the whole ensemble at once • Doesn’t require information about the score or number of parts • Difficult to extend to purely arbitrary ensembles

  32. Conclusions • Much work remains to be done in this area • Existing approaches achieve only about 50–70% accuracy on this task, even with a large training data set • Similarly trained humans would likely perform much better • Both approaches would struggle with groups of similar instruments (e.g., a string quartet)

  33. Questions?
