Introduction to Music Informatics: I548/N560, Spring 2011. Instructor: Eric Nichols epnichols@gmail.com http://tinyurl.com/Info548
Overview: Tues, Feb 15 • HW – questions? • HW: contest and output format • Dynamic Time Warping for Audio-to-MIDI alignment • Symbolic Representations • Reading: Dannenberg
Polyphonic Audio Matching and Alignment • Ning Hu, Roger B. Dannenberg, and George Tzanetakis • Goal: align polyphonic audio to a symbolic score • Does not perform transcription • Used to search MIDI databases for a match to a given audio recording
Motivation • Query by Humming is an important problem, and it uses a symbolic database. • Why is symbolic better than audio matching for this problem? • Possible solution: do polyphonic transcription on the query. Then find best match. However, transcription is hard.
Idea • Instead of transcribing the query, convert the symbolic database into audio! • Instead of using the entire spectrum, reduce each frame to a chroma vector. • Do dynamic time warping (DTW) on the chroma sequences to look for matches.
Chroma Vector • For each bin in the FFT: assign the bin to the nearest half-step, then remove octave information. • For each pitch class (1-12), average the values of its associated bins. • For this paper: 0.25 seconds of audio per chroma vector, with non-overlapping windows. • Converting between MIDI note number and frequency in Hz (A4 = MIDI 69 = 440 Hz): freq = 440 * 2^((MIDI-69) / 12.0) and MIDI = 69 + 12*log(freq/440.0) / log(2)
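The chroma steps and the two conversion formulas above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names, the 0-11 pitch-class numbering (0 = C, rather than the 1-12 on the slide), and the choice to skip the DC bin are all assumptions made here. The input is assumed to be a list of FFT bin magnitudes for one 0.25-second frame.

```python
import math


def midi_to_freq(midi):
    """Frequency in Hz for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def freq_to_midi(freq):
    """Fractional MIDI note number for a frequency in Hz."""
    return 69 + 12 * math.log(freq / 440.0) / math.log(2)


def chroma_from_magnitudes(magnitudes, sample_rate, n_fft):
    """Collapse FFT bin magnitudes into a 12-element chroma vector.

    magnitudes[k] is the magnitude of bin k, whose center frequency is
    k * sample_rate / n_fft. Pitch classes are numbered 0-11 with 0 = C
    (an assumption of this sketch).
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for k in range(1, len(magnitudes)):  # skip bin 0 (DC has no pitch)
        freq = k * sample_rate / n_fft
        nearest_half_step = int(round(freq_to_midi(freq)))
        pitch_class = nearest_half_step % 12  # drop octave information
        sums[pitch_class] += magnitudes[k]
        counts[pitch_class] += 1
    # Average the bins assigned to each pitch class.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Note that at low frequencies several half-steps share a single FFT bin, so the low end of the spectrum is resolved only coarsely in a sketch like this one.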
Why chroma? • Not very sensitive to spectral distribution: it ignores many details of timbre by collapsing everything into one octave • Is mostly sensitive to fundamental pitches and chords
Converting MIDI to chroma • Two possibilities: (1) render the MIDI with a synthesizer, compute the FFT, and then compute the chroma vector; or (2) go directly from MIDI to chroma with a theoretical model (in this paper, each MIDI pitch is assumed to contribute no overtones to the chroma). • One difficulty: dealing with percussive sounds
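The second possibility (going directly from MIDI to chroma, with no overtones) might look roughly like the sketch below. The note format, a list of (onset_sec, duration_sec, midi_pitch) tuples, is hypothetical, as is the function name and the choice to weight each note by the fraction of the frame it covers. The slides do not say how the paper weights notes, and percussive notes are assumed to have been filtered out by the caller.

```python
import math


def midi_notes_to_chroma(notes, frame_sec=0.25):
    """Build one 12-element chroma vector per frame from a list of notes.

    notes: iterable of (onset_sec, duration_sec, midi_pitch) tuples
    (a hypothetical format chosen for this sketch). Each note adds energy
    only to its own pitch class, i.e. no overtones are modeled.
    """
    notes = list(notes)
    if not notes:
        return []
    end_time = max(onset + dur for onset, dur, _ in notes)
    n_frames = int(math.ceil(end_time / frame_sec))
    frames = [[0.0] * 12 for _ in range(n_frames)]
    for onset, dur, pitch in notes:
        pitch_class = pitch % 12  # 0 = C, octave information removed
        first = int(onset // frame_sec)
        last = min(n_frames - 1, int((onset + dur) // frame_sec))
        for i in range(first, last + 1):
            frame_start = i * frame_sec
            frame_end = frame_start + frame_sec
            overlap = min(onset + dur, frame_end) - max(onset, frame_start)
            if overlap > 0:
                # Assumed weighting: fraction of the frame the note sounds.
                frames[i][pitch_class] += overlap / frame_sec
    return frames
```

For example, a C held for 0.5 seconds with an E entering at 0.25 seconds produces two frames: the first contains only C, the second contains C and E.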
Chroma Similarity • Now we have lists of chroma vectors for an audio query and for a database of MIDI files • Normalize each vector to have mean 0 and variance 1 • This helps reduce differences between vectors due to absolute loudness • Compute the Euclidean distance between vectors (0 distance = perfect match) • Compute the entire similarity matrix: one distance for every pair of query vector and MIDI vector.
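The normalization and distance steps above can be sketched as follows. The function names are made up for this example, and returning an all-zero vector for a constant (zero-variance) frame is an assumption of the sketch, not something the slides specify. The matrix holds distances, so small values mean high similarity.

```python
import math


def normalize(vec):
    """Scale a vector to mean 0 and variance 1."""
    n = len(vec)
    mean = sum(vec) / n
    var = sum((x - mean) ** 2 for x in vec) / n
    if var == 0:
        # Assumption: a flat (e.g. silent) frame becomes all zeros.
        return [0.0] * n
    std = math.sqrt(var)
    return [(x - mean) / std for x in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(query, reference):
    """matrix[i][j] = distance between query frame i and reference frame j."""
    q = [normalize(v) for v in query]
    r = [normalize(v) for v in reference]
    return [[euclidean(a, b) for b in r] for a in q]
```

Because of the normalization, a frame and a louder copy of the same frame end up at distance (approximately) zero, which is the loudness invariance the slide describes.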
Similarity Matrix • [Figure: similarity matrix] • Dark = highly similar • Black diagonal = matching path • Note start, end, and length disparity
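Finding the matching path through such a matrix is what DTW does. Below is a minimal textbook DTW sketch over a precomputed distance matrix. It is not necessarily the variant used in the paper: it allows only the three basic steps (diagonal, down, right) and forces the path to run from the first cell to the last, so it does not handle the start and end disparity noted above. The function name is hypothetical.

```python
def dtw(dist):
    """Return (total_cost, path) for the cheapest monotonic path through dist.

    dist[i][j] is the distance between query frame i and reference frame j.
    The path is a list of (i, j) pairs from (0, 0) to the bottom-right cell.
    """
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                cost[i - 1][j] if i > 0 else inf,                # advance query only
                cost[i][j - 1] if j > 0 else inf,                # advance reference only
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,  # advance both
            )
            cost[i][j] = dist[i][j] + best_prev
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

A low total cost relative to the path length suggests a good match. Length disparity shows up as horizontal or vertical runs in the path, where one sequence waits while the other advances.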
Conclusion • More sophisticated DTW could be used to speed up the search • Gives an example of linking symbolic and audio domains
Discussion • What elements/features of music should we represent? • Can we create a “dream” representation?