Locating Cover Songs and Alternate Performances in Databases of Raw Audio
Robert Turetsky
rjt72@columbia.edu
Advent Workshop, May 17, 2002
Technology enables “liquid music”: Production, Distribution, Consumption
Content-Based Analysis: Motivation
• Search on file-sharing systems (e.g. KaZaA) involves meta-data
• Meta-data prone to errors, omission, distortion
• Only works if user already knows what to look for
• Musical Content Analysis means:
  • Query by humming
  • Query by segment/prototype
  • Recommendation engines and artist discovery
  • Machine feedback/collaboration in composition
• Locating cover songs is a first step
Locating Cover Songs: Prior Work
• Query By Humming
  • Mature field (kiosks, applets) but limited to monophonic music or manually transcribed polyphonic music
• Jonathan Foote (FX Palo Alto)
  • ARTHUR (2000): aligns RMS energy. Works only on orchestral music; pop music has less dynamic range.
  • Content-Based Retrieval of Music and Audio (1997): measures acoustic similarity, not equivalence.
• Cheng Yang (Stanford)
  • Music Database Retrieval Based on Spectral Similarity (2001): aligns MFCCs at points of high energy using DTW.
  • MACS (2001): aligns estimates of pitch likelihood. Indexing. “Bad” alignments discarded after linearity filter.
Why is locating cover songs so difficult?
• Alternate performances can vary:
  • Studio vs. Live
  • Tempo (non-linear time shifting)
  • Pitch transposition
  • Production technique, acoustic character
  • Additions (e.g. audience interaction)
  • Alternate lyrics (e.g. Don’t Cry versions I and II)
  • Cover versions, artist re-interpretations
  • Vocalist, instrumentation, ornamentation
  • Entire character changes (e.g. Layla, dance remixes)
• Yet we still know these songs are the same!
System Overview
• Preprocessing: Locate Section Breaks, Identify Summary Sections, Pitch Extraction
• Query: Tonic Estimation, Alignment
Phase 1: Locate Section Breaks
• Employ Foote’s Similarity Matrix
• Theory: Windows of the same section will have similar features. Windows of different sections will have dissimilar features.
• Similarity Matrix: Cosine distance between every pair of fixed-width windows of the song
• Novelty Score (a measure of ‘newness’): correlation of the Similarity Matrix with a checkerboard matrix along its diagonal
• Section breaks are peaks in the Novelty Score.
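
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the system's actual code: the function names, the `half_width` parameter and the plain (un-tapered) checkerboard kernel are assumptions made for the example, and the input is assumed to be one feature vector per fixed-width window.

```python
import numpy as np

def similarity_matrix(features):
    """Cosine similarity between every pair of feature windows (one row per window)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero windows
    return unit @ unit.T

def checkerboard_kernel(half_width):
    """+1 in the two within-section quadrants, -1 in the two cross-section quadrants."""
    signs = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(signs, signs)

def novelty_score(sim, half_width):
    """Correlate the checkerboard kernel along the main diagonal of `sim`."""
    n = sim.shape[0]
    kernel = checkerboard_kernel(half_width)
    padded = np.pad(sim, half_width, mode="constant")  # zero-pad so the edges are defined
    score = np.zeros(n)
    for i in range(n):
        # Block covers original windows i - half_width .. i + half_width - 1,
        # so a high score means a section boundary just before window i.
        block = padded[i:i + 2 * half_width, i:i + 2 * half_width]
        score[i] = np.sum(block * kernel)
    return score
```

Section breaks would then be taken at local maxima of `novelty_score(...)`; how peaks are picked (threshold, minimum spacing) is left open here.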
Phase 2: Summary Segments
• Motivation: Only transcribe and align salient segments
• Measure of salience: Repetition
• Method: Search for largest off-diagonal line in Similarity Matrix for each segment to measure extent of repetition (“score”)
• Summary segment is most repeated section. Prune rows/columns of similar sections in score matrix. Repeat until 45-75 sec of audio is kept
[Figure: matrix with rows and columns labeled Sec 1 to Sec 4; arrows mark Section 1 and Section 4]
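
One way to read the "largest off-diagonal line" measure is as the longest run of high similarity along any diagonal stripe away from the main diagonal. The sketch below follows that reading. The threshold value, the function names and the rule of skipping stripes closer than one section length are assumptions for illustration, and the pruning loop that accumulates 45-75 sec of audio is not shown.

```python
import numpy as np

def repetition_score(sim, start, end, threshold=0.8):
    """Longest run of similarity >= threshold along any off-diagonal stripe,
    restricted to the rows [start, end) of one section."""
    n = sim.shape[0]
    min_lag = end - start  # skip stripes that overlap the section itself
    lags = list(range(-(n - 1), -min_lag + 1)) + list(range(min_lag, n))
    best = 0
    for lag in lags:
        run = 0
        for row in range(start, end):
            col = row + lag
            if 0 <= col < n and sim[row, col] >= threshold:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

def most_repeated_section(sim, sections, threshold=0.8):
    """Index of the section with the highest repetition score (first one on ties)."""
    scores = [repetition_score(sim, s, e, threshold) for s, e in sections]
    return int(np.argmax(scores))
```

Here `sections` is a list of `(start, end)` window indices, for example the section breaks found in Phase 1.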
Phase 3: Pitch Extraction
• Multi-pitch extraction algorithm based on Klapuri et al, 2001.
• Works well, except in presence of drums.
[Diagram boxes: Noise Suppression; Predominant Pitch Estimation; Estimate Pitched Sound Characteristics; Estimate # Voices and Iterate; Remove Found Sound from Mixture. Plot axes: Time, Pitch]
Phase 3: MPE Details
• Noise Reduction: RASTA style filter
• Predominant pitch estimation: “Fuzzy search” for harmonic peaks
• Spectral Smoothing to estimate sound parameters
• Resynthesis
• Repeat on mixture after removal
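
The estimate-remove-repeat loop can be illustrated with a much simpler stand-in. The sketch below is not the Klapuri et al. algorithm: it replaces the fuzzy harmonic search with a plain harmonic sum over an FFT magnitude spectrum, replaces spectral smoothing and resynthesis with zeroing of harmonic bins, skips noise reduction, and takes the number of voices as an input instead of estimating it. All names and parameters are hypothetical.

```python
import numpy as np

def predominant_pitch(mag, candidates_hz, bin_hz, n_harmonics=5):
    """Return the candidate f0 whose harmonic bins carry the most magnitude."""
    best_f0, best_sum = None, -1.0
    for f0 in candidates_hz:
        bins = [int(round(h * f0 / bin_hz)) for h in range(1, n_harmonics + 1)]
        total = sum(mag[b] for b in bins if b < len(mag))
        if total > best_sum:
            best_f0, best_sum = f0, total
    return best_f0

def remove_harmonics(mag, f0, bin_hz, n_harmonics=5, width=1):
    """Zero the bins around each harmonic of f0 (a crude removal step)."""
    residual = mag.copy()
    for h in range(1, n_harmonics + 1):
        b = int(round(h * f0 / bin_hz))
        residual[max(b - width, 0):b + width + 1] = 0.0
    return residual

def iterative_pitches(signal, sample_rate, n_voices, candidates_hz):
    """Estimate the predominant pitch, remove it from the mixture, and repeat."""
    mag = np.abs(np.fft.rfft(signal))
    bin_hz = sample_rate / len(signal)
    found = []
    for _ in range(n_voices):
        f0 = predominant_pitch(mag, candidates_hz, bin_hz)
        found.append(f0)
        mag = remove_harmonics(mag, f0, bin_hz)
    return found
```

A plain harmonic sum is prone to octave and sub-harmonic errors, so the candidate range has to be chosen narrowly for this toy version to behave.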
Phase 4-5: Query-time alignment
• Exhaustively align summary segments
• Two alignments needed: Pitch and Time
• Pitch Alignment: Tonic Estimation
  • Align two piano rolls at point of maximum cross-correlation between note histograms
• Temporal Alignment: Dynamic Programming (Dynamic Time Warp)
  • Currently investigating different weights for rewarding note matches, penalizing mismatches
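
Both alignments can be sketched on toy data. The example below assumes each summary segment has been reduced to a monophonic sequence of MIDI note numbers (one per frame), which is a simplification of a polyphonic piano roll. The function names and the 0/1 match/mismatch cost are placeholders, since the slide notes that the actual weights are still under investigation.

```python
import numpy as np

def estimate_transposition(query_notes, reference_notes, n_pitches=128):
    """Semitone shift of the query relative to the reference, taken at the
    peak of the cross-correlation between the two note histograms."""
    q = np.bincount(query_notes, minlength=n_pitches).astype(float)
    r = np.bincount(reference_notes, minlength=n_pitches).astype(float)
    xcorr = np.correlate(q, r, mode="full")
    # In "full" mode, output index 0 corresponds to lag -(len(r) - 1).
    return int(np.argmax(xcorr)) - (len(r) - 1)

def dtw_cost(query, reference, mismatch_penalty=1.0):
    """Dynamic time warp: minimum accumulated mismatch cost over all
    monotonic alignments of the two note sequences."""
    n, m = len(query), len(reference)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if query[i - 1] == reference[j - 1] else mismatch_penalty
            cost[i, j] = local + min(cost[i - 1, j],      # query frame repeats
                                     cost[i, j - 1],      # reference frame repeats
                                     cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])
```

In use, the query would first be shifted by the estimated transposition and then warped against the reference; a low `dtw_cost` suggests the two segments carry the same note sequence at different tempi.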
Locating Cover Songs: Future Work
• Indexing scheme, other alignment techniques to improve speed of query
• Thematic extraction to find only melody or harmony lines
• Include Beat Tracking as part of score
• Investigate harmonic analysis (identifying chord structure) for better features
• Speech recognition on lyrics???