Locating Cover Songs and Alternate Performances in Databases of Raw Audio

Robert Turetsky, rjt72@columbia.edu
Advent Workshop, May 17, 2002

Technology enables “liquid music”
• Production
• Distribution
• Consumption

Content-Based Analysis: Motivation
• Search on file-sharing systems (e.g. KaZaA) relies on meta-data
• Meta-data is prone to errors, omission, and distortion
• Meta-data search only works if the user already knows what to look for
• Musical Content Analysis means:
  • Query by humming
  • Query by segment/prototype
  • Recommendation engines and artist discovery
  • Machine feedback/collaboration in composition
• Locating cover songs is a first step

Locating Cover Songs: Prior Work
• Query By Humming
  • Mature field (kiosks, applets) but limited to monophonic music or manually transcribed polyphonic music
• Jonathan Foote (FX Palo Alto)
  • ARTHUR (2000): aligns RMS energy. Works only on orchestral music; pop music has less dynamic range.
  • Content-Based Retrieval of Music and Audio (1997): measures acoustic similarity, not equivalence.
• Cheng Yang (Stanford)
  • Music Database Retrieval Based on Spectral Similarity (2001): aligns MFCC at points of high energy using DTW.
  • MACS (2001): aligns estimates of pitch likelihood. Indexing. “Bad” alignments discarded after linearity filter.

Why is locating cover songs so difficult?
• Alternate performances can vary:
  • Studio vs. live
  • Tempo (non-linear time shifting)
  • Pitch transposition
  • Production technique, acoustic character
  • Additions (e.g. audience interaction)
  • Alternate lyrics (e.g. Don’t Cry versions I and II)
• Cover versions, artist re-interpretations
  • Vocalist, instrumentation, ornamentation
  • Entire character changes (e.g. Layla, dance remixes)
• Yet we still know these songs are the same!

System Overview
• Preprocessing: Locate Section Breaks, Identify Summary Sections, Pitch Extraction
• Query: Tonic Estimation, Alignment

Phase 1: Locate Section Breaks
• Employ Foote’s Similarity Matrix
• Theory: Windows from the same section will have similar features. Windows from different sections will have dissimilar features.
• Similarity Matrix: Cosine distance between every pair of fixed-width windows of the song
• Novelty Score, a measure of ‘newness’: correlation of the Similarity Matrix with a checkerboard kernel along its diagonal.
• Section breaks are peaks in the Novelty Score.
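The Phase 1 steps can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: it assumes each window is already a small feature vector, it scores windows by cosine similarity (the complement of the cosine distance named on the slide), and the names `similarity_matrix`, `novelty_score`, `section_breaks`, the `half_width` kernel size, and the peak `threshold` are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """Compare every window of the song with every other window."""
    return [[cosine_similarity(u, v) for v in frames] for u in frames]

def novelty_score(sim, half_width):
    """Correlate a +/-1 checkerboard kernel with sim along its main diagonal."""
    n = len(sim)
    scores = [0.0] * n
    for t in range(half_width, n - half_width):
        total = 0.0
        for i in range(-half_width, half_width):
            for j in range(-half_width, half_width):
                # +1 when both windows lie on the same side of t, -1 otherwise.
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * sim[t + i][t + j]
        scores[t] = total
    return scores

def section_breaks(scores, threshold):
    """Local maxima of the novelty score that exceed the threshold."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] >= scores[t - 1]
            and scores[t] > scores[t + 1]]

# Toy song: eight windows of one "section" followed by eight of another.
frames = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8
sim = similarity_matrix(frames)
breaks = section_breaks(novelty_score(sim, half_width=2), threshold=4.0)
print(breaks)
```

On this toy input the only reported break is the window where the feature vectors change. Real audio features would need a wider kernel and a tuned threshold.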

Phase 2: Summary Segments
• Motivation: Only transcribe and align salient segments
• Measure of salience: Repetition
• Method: For each segment, search for the longest off-diagonal line in the Similarity Matrix to measure the extent of repetition (“score”)
• The summary segment is the most repeated section. Prune rows/columns of similar sections in the score matrix. Repeat until 45-75 sec of audio is kept.
[Figure: score matrix with rows and columns Sec 1 through Sec 4; Section 1 and Section 4 are marked]
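One way to read the repetition measure is as the longest run of high similarity along any diagonal other than the main one, restricted to the rows of a given segment. The sketch below follows that reading. The function name `repetition_score`, the fixed similarity `threshold`, and the `min_lag` parameter are assumptions made for illustration, and the pruning loop from the slide is not shown.

```python
def repetition_score(sim, seg_start, seg_end, threshold, min_lag=1):
    """Longest run of above-threshold similarity between the segment
    [seg_start, seg_end) and any time-shifted copy of itself in the song."""
    n = len(sim)
    best = 0
    for lag in range(-(n - 1), n):
        if abs(lag) < min_lag:
            continue  # skip the main diagonal (and its near neighbours)
        run = 0
        for t in range(seg_start, seg_end):
            j = t + lag
            if 0 <= j < n and sim[t][j] >= threshold:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

# Toy song: one letter per window; "ABC" recurs four windows later.
labels = "ABCDABCX"
toy_sim = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
first_half = repetition_score(toy_sim, 0, 4, threshold=0.5)
second_half = repetition_score(toy_sim, 4, 8, threshold=0.5)
lone_window = repetition_score(toy_sim, 3, 4, threshold=0.5)
print(first_half, second_half, lone_window)
```

The segment with the highest score would be kept as the summary segment. With real features the threshold would have to tolerate small variations between repeats.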

Phase 3: Pitch Extraction
• Multi-pitch extraction algorithm based on Klapuri et al, 2001.
• Works well, except in presence of drums.
[Figure: block diagram with stages Noise Suppression, Predominant Pitch Estimation, Estimate Pitched Sound Characteristics, Remove Found Sound from Mixture, Estimate # Voices and Iterate; plot axes: Time, Pitch]

Phase 3: MPE Details
• Noise Reduction: RASTA style filter
• Predominant pitch estimation: “Fuzzy search” for harmonic peaks
• Spectral Smoothing to estimate sound parameters
• Resynthesis
• Repeat on mixture after removal

Phase 4-5: Query-time alignment
• Exhaustively align summary segments
• Two alignments needed: Pitch and Time
• Pitch Alignment: Tonic Estimation
  • Align two piano rolls at point of maximum cross-correlation between note histograms
• Temporal Alignment: Dynamic Programming (Dynamic Time Warp)
  • Currently investigating different weights for rewarding note matches, penalizing mismatches
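The two alignments can be illustrated on toy data. The sketch below makes several simplifying assumptions that are not in the slides: each segment is a monophonic list of MIDI note numbers rather than a polyphonic piano roll, the note histogram is folded to 12 pitch classes and compared by circular cross-correlation, and the time warp uses a 0/`mismatch` local cost. The names `best_transposition` and `dtw_cost` are hypothetical.

```python
def pitch_class_histogram(notes):
    """Count how often each of the 12 pitch classes occurs."""
    hist = [0] * 12
    for note in notes:
        hist[note % 12] += 1
    return hist

def best_transposition(query, reference):
    """Semitone shift (0-11) of the query that maximizes the circular
    cross-correlation between the two pitch-class histograms."""
    hq = pitch_class_histogram(query)
    hr = pitch_class_histogram(reference)

    def corr(shift):
        return sum(hq[k] * hr[(k + shift) % 12] for k in range(12))

    return max(range(12), key=corr)

def dtw_cost(a, b, mismatch=1.0):
    """Dynamic time warp cost: matching notes are free, mismatches are
    penalized, and either sequence may stretch to absorb tempo changes."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else mismatch
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]

# The query is the reference melody three semitones lower, with different timing.
reference = [60, 60, 62, 64, 64, 65, 67]
query = [57, 59, 59, 61, 62, 62, 64]
shift = best_transposition(query, reference)
aligned_cost = dtw_cost([n + shift for n in query], reference)
raw_cost = dtw_cost(query, reference)
print(shift, aligned_cost, raw_cost)
```

The `mismatch` weight is the kind of parameter the last bullet refers to: changing how matches are rewarded and mismatches penalized changes which warping path wins.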

Locating Cover Songs: Future Work
• Indexing scheme and other alignment techniques to improve query speed
• Thematic extraction to find only melody or harmony lines
• Include Beat Tracking as part of score
• Investigate harmonic analysis (identifying chord structure) for better features
• Speech recognition on lyrics?
