Instrument Identification in Polyphonic Audio Sarah Smith: Presentation for ECE 492
Defining the task • GOAL: given a (single-channel) recording of polyphonic music, identify the instruments that are playing • Is full source separation necessary? • How do we define 'instrument'? • Grand piano vs. upright piano? • Electric vs. acoustic bass?
Why is this useful? • Annotation in musical databases • Enables searching by instrument • Makes it possible to group similar types of ensembles • Can be combined with source separation to extract individual instruments • For editing or remixing • Removing a solo over its accompaniment
What are the challenges? • More parts => more possible combinations • 10 possible instruments => 210 possible quartets (10 choose 4; see the check below) • Harder to differentiate similar (or identical) instruments • String quartet • Often faces many of the same challenges as source separation
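A quick sanity check of the combinatorics claim (choosing 4 distinct instruments out of 10, order irrelevant):

```python
import math
print(math.comb(10, 4))  # 210
```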
Possible approaches • Start by performing source separation, then identify an instrument to match each source • Relies heavily on accurate streaming in the source separation stage • Each separated source can then be identified independently • Reinterpret the task as "ensemble identification" and identify the group as a whole • This greatly expands the number of possibilities • Does not rely on source separation techniques
Instrument Recognition in Polyphonic Music Based on Automatic Taxonomies Slim Essid, Gaël Richard, and Bertrand David
Proposed System • Develop a taxonomy of musical instrument classes • Using the identified features, cluster instrument classes based on their separation in the feature space • Develop a system of binary classifiers to identify the instrumentation of a test piece • Feature selection used at each node to determine the optimal features for classification
Audio Features • In order to find the optimal feature set, calculate everything and then choose the best ones • Combination of ~100 spectral, temporal, and statistical features • Could then use PCA or similar to reduce the dimensionality of the feature space • This results in basis vectors that are combinations of many calculated features.
Creating a taxonomy • Perform principal component analysis (PCA) on the extracted features to identify the key components • Calculate a distance between each pair of instrument classes • Similar classes are clustered together (see the clustering sketch below) • Iterate through multiple levels of the taxonomy
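A minimal sketch of the clustering step, not taken from the paper: given a precomputed symmetric matrix of pairwise class distances (e.g. the divergence or Bhattacharyya distances from the next slide), agglomerative clustering yields the taxonomy levels. The class labels and distance values here are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

classes = ['Bs', 'Dr', 'Gt', 'Pn', 'Tr', 'Ts']
# Illustrative pairwise class distances (symmetric, zero diagonal):
D = np.array([
    [0.0, 2.1, 1.0, 1.4, 2.5, 2.3],
    [2.1, 0.0, 2.0, 1.9, 2.6, 2.4],
    [1.0, 2.0, 0.0, 1.2, 2.2, 2.1],
    [1.4, 1.9, 1.2, 0.0, 2.0, 1.9],
    [2.5, 2.6, 2.2, 2.0, 0.0, 0.8],
    [2.3, 2.4, 2.1, 1.9, 0.8, 0.0],
])
Z = linkage(squareform(D), method='average')  # linkage expects the condensed form
# Cutting the tree at successive heights gives the levels of the taxonomy:
print(dict(zip(classes, fcluster(Z, t=2, criterion='maxclust'))))
```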
Defining a distance metric • Using the principal-component features, a distance can be calculated between each pair of instrument classes • Divergence distance • Bhattacharyya distance
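For reference, the Bhattacharyya distance between two Gaussian class models $\mathcal{N}(\mu_1,\Sigma_1)$ and $\mathcal{N}(\mu_2,\Sigma_2)$ has the standard closed form below; that the paper uses exactly this Gaussian variant is an assumption.

```latex
D_B = \frac{1}{8}\,(\mu_1-\mu_2)^{\top}\,\Sigma^{-1}\,(\mu_1-\mu_2)
    + \frac{1}{2}\,\ln\frac{|\Sigma|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}},
\qquad \Sigma = \frac{\Sigma_1+\Sigma_2}{2}
```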
Resulting taxonomy • Bs = Double bass • Dr = Drums • Eg = Electro-acoustic guitar • Gt = Spanish guitar • Pn = Piano • Pr = Percussion • Tr = Trumpet • Ts = Tenor sax • Vf, Vm = Female, male voice • V = Voice • W = Wind instrument • M = Melody (W, Vm, or Eg)
Learned Classifiers • Given the taxonomy, an unknown ensemble can be identified by applying a series of classifiers, one at each node • Optimizing each classifier individually is a lot of work • Want an optimization method that can be easily generalized
Feature Selection • Starting with the full set of features (D), we want to choose a subset (d) • Choose an optimization criterion and search the feature set to find the best features • Desirable characteristics of a chosen feature: • Varies between instrument classes • Varies little within an instrument class
Feature Optimization • Variables: • M = number of classes (= 2 for pairwise feature selection) • N_m = number of training instances of class m • N = total number of training feature vectors • m(i) = mean of feature i over the whole training set • m_m(i) = mean of feature i for class m • x_{n_m}(i) = value of feature i in the n_m-th feature vector of class m • Each feature is scored by the ratio of its variation between classes to its variation within classes (up to normalization constants):

$$F(i) = \frac{\sum_{m=1}^{M} N_m \,\big(m_m(i) - m(i)\big)^2}{\sum_{m=1}^{M} \sum_{n_m=1}^{N_m} \big(x_{n_m}(i) - m_m(i)\big)^2}$$
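A minimal numpy sketch of this score, not the paper's code: it ranks candidate features by the between-class/within-class variance ratio defined above. Shapes and data are illustrative.

```python
import numpy as np

def fisher_ratio(X, y):
    """X: (N, D) feature matrix, y: (N,) class labels -> (D,) per-feature scores."""
    mu = X.mean(axis=0)                           # overall mean m(i)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for m in np.unique(y):
        Xm = X[y == m]
        mu_m = Xm.mean(axis=0)                    # class mean m_m(i)
        between += len(Xm) * (mu_m - mu) ** 2     # variation between classes
        within += ((Xm - mu_m) ** 2).sum(axis=0)  # variation within the class
    return between / within                       # high = discriminative feature

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))   # ~100 candidate features
y = rng.integers(0, 2, 200)       # M = 2 for pairwise selection
best_d = np.argsort(fisher_ratio(X, y))[::-1][:10]  # keep the d best features
print(best_d)
```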
Testing the classifier • Data used included jazz ensembles ranging from solo to quartet • At each level of the taxonomy, a "one vs. one" binary classifier chooses which cluster the sound belongs to • For nodes with more than two possible classes, a majority vote is used to decide (see the sketch below)
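A sketch of the node classifier in scikit-learn terms (an assumption, not the authors' implementation): OneVsOneClassifier trains one binary classifier per class pair and decides by majority vote, matching the scheme described above. The training data here is synthetic.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((300, 20))    # feature vectors at this taxonomy node
y_train = rng.integers(0, 3, 300)  # three candidate clusters at the node

# One binary SVM per class pair; prediction is the majority-vote winner:
node_clf = OneVsOneClassifier(SVC(kernel='rbf')).fit(X_train, y_train)
print(node_clf.predict(rng.random((1, 20))))
```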
Correct common confusions • Use knowledge about which classes are commonly confused to go back and reevaluate classification • Example: solo drums are often classified as larger ensembles that include drums • If a sample is classified as an ensemble with drums, check to see if the second best fit is solo drums
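A minimal sketch of that correction rule, assuming per-class scores are available for a sample; the class names and the threshold-free logic are illustrative, not the paper's exact procedure.

```python
def correct_drum_confusion(scores):
    """scores: dict of class name -> classifier score (higher is better)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    # Ensembles that include drums are often confused with solo drums,
    # so fall back to solo drums when it is the runner-up:
    if 'drums' in best and best != 'solo_drums' and second == 'solo_drums':
        return 'solo_drums'
    return best

print(correct_drum_confusion(
    {'piano_drums': 0.41, 'solo_drums': 0.39, 'sax_trio': 0.20}))  # -> solo_drums
```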
Overall Performance • The proposed system achieves 53% accuracy, versus 47% for the baseline model • The baseline (no hierarchy) model performs better on a certain subset of classes • Classification accuracy at the top level of the taxonomy was 65%
Summary • Treat each possible ensemble as an instrument for the purposes of classification • The taxonomy structure gives classification into broad groupings of ensembles • The binary classifier chooses between a smaller number of options at each node • However, a missed top-level classification cannot be corrected later
Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation Toni Heittola, Anssi Klapuri, Tuomas Virtanen
Proposed Method • First perform source separation on the input audio (NMF + Source Filter Model) • Run instrument detection on each of the separated streams (GMM)
Source Filter Model • Sound coming from an instrument can be viewed as a harmonic excitation (source) filtered according to the acoustical properties of the instrument (filter) • Example: speech • Source: glottal pulse • Filter: vocal tract
Source Filter Model Applied • Excitation spectrum contains the fundamental frequency plus its integer-multiple overtones, all with equal weights • Filter modeled with filter-bank coefficients on the mel scale
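A tiny sketch of such an excitation spectrum, assuming equal-weight harmonics placed on an FFT bin grid (an illustration, not the paper's exact construction):

```python
import numpy as np

def excitation_spectrum(f0, sr=44100, n_fft=4096):
    """Equal-weight comb at f0 and its integer multiples, on the FFT bin grid."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    e = np.zeros_like(freqs)
    for h in np.arange(f0, sr / 2, f0):        # fundamental + overtones
        e[np.argmin(np.abs(freqs - h))] = 1.0  # equal weights
    return e

print(excitation_spectrum(440.0).sum())  # number of harmonics below Nyquist
```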
Creating an NMF Model • A naïve NMF approach to both separating the audio streams and labeling instruments would require a large number of parameters • # of basis vectors = (# of instruments) × (# of pitches) • Each basis vector has the same dimension as the FFT • If we limit ourselves to cases described by the source-filter model, we only need • # of basis vectors = (# of instruments) + (# of pitches) • Each instrument vector is characterized by ~30 filter coefficients • Each excitation is determined by a single pitch value (see the parameter-count sketch below)
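A back-of-the-envelope comparison of the parameter counts implied by the two formulations; the sizes below (FFT length, 30 filter coefficients, etc.) are illustrative assumptions.

```python
n_instruments, n_pitches = 10, 60
fft_bins, n_filter_coeffs = 2049, 30

# Naive NMF: one full-spectrum basis vector per (instrument, pitch) pair:
naive = n_instruments * n_pitches * fft_bins
# Source-filter model: ~30 coefficients per instrument filter,
# plus a single pitch value per excitation:
source_filter = n_instruments * n_filter_coeffs + n_pitches

print(naive, source_filter)  # 1229400 vs. 360
```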
Signal Model • The observed magnitude spectrum in frame t is modeled as a sum over notes n and instruments i, with the excitation spectrum (depends on the note) shaped by the filter model (depends on the instrument):

$$\hat{x}_t(k) = \sum_{n}\sum_{i} g_{n,i,t}\, e_{n,t}(k)\, H_i(k), \qquad H_i(k) = \sum_{j} c_{i,j}\, a_j(k)$$

where $a_j(k)$ are the individual filter frequency responses, $c_{i,j}$ the filter-bank coefficients, $H_i(k)$ the instrument filter, and $g_{n,i,t}$ the mixture weights.
Estimating the Coefficients • The filter-bank responses a_j(k) and the excitation spectra e_{n,t}(k) are known • Multipitch estimation is used to find the excitation spectra • A multiplicative update can then be used to find values for c_{i,j} and g_{n,i,t} within the mixture (a sketch follows)
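A minimal numpy sketch of such multiplicative updates, assuming a least-squares cost; this illustrates the technique and is not the authors' algorithm or code. Here the excitations E stand in for the output of multipitch estimation, and all array names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 257, 50        # frequency bins, time frames
N, I, J = 12, 3, 20   # pitches, instruments, filter-bank bands
eps = 1e-12

X = rng.random((K, T))     # observed magnitude spectrogram
A = rng.random((J, K))     # fixed filter-bank responses a_j(k)
E = rng.random((N, T, K))  # fixed excitations e_{n,t}(k) (from multipitch estimation)
C = rng.random((I, J))     # filter coefficients c_{i,j} (learned)
G = rng.random((N, I, T))  # mixture weights g_{n,i,t} (learned)

def model(C, G):
    H = C @ A  # instrument filters H_i(k), shape (I, K)
    # x_hat[k,t] = sum over n,i of g[n,i,t] * e[n,t,k] * H[i,k]
    return np.einsum('nit,ntk,ik->kt', G, E, H), H

for _ in range(50):
    Xhat, H = model(C, G)
    # Gain update: g <- g * (sum_k X*e*H) / (sum_k x_hat*e*H)
    G *= np.einsum('kt,ntk,ik->nit', X, E, H) / \
         (np.einsum('kt,ntk,ik->nit', Xhat, E, H) + eps)
    Xhat, _ = model(C, G)
    # Filter update: c <- c * (sum_{k,t,n} X*g*e*a) / (sum_{k,t,n} x_hat*g*e*a)
    C *= np.einsum('kt,nit,ntk,jk->ij', X, G, E, A) / \
         (np.einsum('kt,nit,ntk,jk->ij', Xhat, G, E, A) + eps)
```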
Optional Streaming Model • The basic model is very general • Any instrument can play any or all of the notes in a single frame. • Each note can be played by more than one instrument • If we know that each instrument only plays one note, then we can form streams and identify the instrument for each stream
Identifying the individual instruments • Using the source separation output, reconstruct the sound for each identified instrument source • After extracting MFCCs from the separated tracks, a Gaussian mixture model can be used to identify the instrument associated with each track
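A minimal sketch of the recognition stage with librosa and scikit-learn: MFCCs from each separated track are scored against one GMM per instrument. The file names, GMM size, and training setup are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

# Train one GMM per instrument on isolated recordings (hypothetical files):
train_files = {'piano': ['piano_train.wav'], 'trumpet': ['trumpet_train.wav']}
gmms = {}
for name, files in train_files.items():
    feats = np.vstack([mfcc_frames(f) for f in files])
    gmms[name] = GaussianMixture(n_components=8, covariance_type='diag').fit(feats)

# Label a separated track by the highest average log-likelihood:
track = mfcc_frames('separated_stream0.wav')
print(max(gmms, key=lambda name: gmms[name].score(track)))
```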
Evaluation • Test signals were synthesized as random combinations of instruments, each playing a random series of notes • Polyphony ranges from one to six parts • The number of instruments is known
Results • Accuracy evaluated using the F-measure (defined below) • Separation quality measured using SNR (dB) • [Figure: classification accuracy and separation SNR by degree of polyphony]
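For reference, the balanced F-measure combines precision $P$ and recall $R$ (that the paper uses the balanced variant is an assumption):

```latex
F = \frac{2PR}{P + R}
```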
Observations • The source-filter model improves instrument separation • It provides additional information that can be used in streaming • Randomly selected notes and instruments tend to be easier to separate than real music • Only 62% accuracy for the monophonic case is rather low
Comparing the approaches • Decomposing the task into source separation + instrument ID breaks the problem down into previously addressed subproblems • Identifying the whole ensemble at once doesn't require information about the score or the number of parts • But it is difficult to extend to arbitrary ensembles
Conclusions • Much work remains to be done in this area • Existing approaches achieve only about 50–70% accuracy on this task, even with a large training set • Similarly trained humans would likely perform much better • Both approaches would struggle with groups of similar instruments (e.g., a string quartet)