Similarity Matrix Processing for Music Structure Analysis

Yu Shiu, Hong Jeng C.-C. Jay Kuo • ACM Multimedia 2006

System Framework

Pitch Class Profile (PCP) • The PCP vector is a 12-dimensional vector that gives the relative intensities of the 12 pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B} • Each PCP vector is normalized to a unit vector
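One way to picture the PCP computation is the rough sketch below. It folds the power spectrum of a single frame into 12 pitch-class bins and normalizes the result. The function name `pcp_from_frame`, the Hann window, the 55 to 2000 Hz band, and the nearest-semitone mapping are all assumptions made for illustration; the slides do not say how the authors compute the PCP.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pcp_from_frame(frame, sample_rate, fmin=55.0, fmax=2000.0):
    """Fold the spectrum of one audio frame into a 12-bin unit vector (illustrative)."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    in_band = (freqs >= fmin) & (freqs <= fmax)  # assumed analysis band
    # Nearest MIDI note of each bin: 69 is A4 = 440 Hz, and note % 12 == 0 is C.
    midi = np.round(69 + 12 * np.log2(freqs[in_band] / 440.0)).astype(int)
    pcp = np.zeros(12)
    np.add.at(pcp, midi % 12, magnitude[in_band] ** 2)  # accumulate power per class
    norm = np.linalg.norm(pcp)
    return pcp / norm if norm > 0 else pcp  # unit vector, as stated on the slide
```

For a pure 440 Hz tone, the largest entry lands at index 9, which is `PITCH_CLASSES[9] == "A"`.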

Pitch Class Profile (PCP)

Measure-based Similarity Matrix • Previous similarity matrices • Built from frames of a pre-defined window size • The resulting matrix is large, which makes further processing more expensive • In this paper • The measure is used as the element of the similarity matrix

Measure-based Similarity Matrix • PCP vector generation • Choose a window size equal to the duration of one half beat • Detect the onset signal • Compute the change in spectral content between two adjacent shifting windows, each 20 ms long, with 50% overlap
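The onset-detection bullet can be illustrated with a short sketch. It uses spectral flux (the sum of positive magnitude increases between adjacent frames) as the measure of spectral change. That choice, the Hann window, and the name `onset_signal` are assumptions, since the slide only says that the change between adjacent 20 ms windows with 50% overlap is computed.

```python
import numpy as np

def onset_signal(audio, sample_rate, win_ms=20.0, overlap=0.5):
    """Return (onset strength per frame step, frame period in seconds)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = max(1, int(round(win * (1.0 - overlap))))
    window = np.hanning(win)
    n_frames = 1 + (len(audio) - win) // hop
    # Magnitude spectrum of every 20 ms frame.
    spectra = np.array([
        np.abs(np.fft.rfft(audio[i * hop : i * hop + win] * window))
        for i in range(n_frames)
    ])
    # Assumed change measure: spectral flux, counting only energy increases.
    diff = np.diff(spectra, axis=0)
    flux = np.maximum(diff, 0.0).sum(axis=1)
    return flux, hop / sample_rate
```

On a signal that is silent for half a second and then switches to a steady tone, the returned signal peaks close to 0.5 s and stays near zero elsewhere.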

Measure-based Similarity Matrix • The autocorrelation function (ACF) of the onset signal is calculated to determine the beat period • Example: • 100 BPM → one beat lasts 600 ms, so a half beat lasts 300 ms • This is longer than the window sizes commonly used in previous work
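A minimal sketch of the ACF step follows, together with the half-beat arithmetic from the example. The tempo search range of 60 to 180 BPM and the function names are assumptions added for illustration; the slide does not specify how the ACF peak is selected.

```python
import numpy as np

def beat_period_from_onsets(onsets, frame_period_s, min_bpm=60.0, max_bpm=180.0):
    """Estimate the beat period (seconds) as the strongest ACF lag in a tempo range."""
    x = np.asarray(onsets, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep lags 0, 1, 2, ...
    min_lag = int(np.ceil(60.0 / max_bpm / frame_period_s))
    max_lag = int(np.floor(60.0 / min_bpm / frame_period_s))
    lag = min_lag + int(np.argmax(acf[min_lag : max_lag + 1]))
    return lag * frame_period_s

def half_beat_ms(bpm):
    """Duration of one half beat in milliseconds."""
    return 60000.0 / bpm / 2.0
```

With an onset signal that spikes every 0.6 s, the estimate is a 0.6 s beat period, and `half_beat_ms(100)` gives the 300 ms window length quoted above.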

Measure-based Similarity Matrix • Group N successive PCP vectors into one measure • Since PCP vectors are non-negative unit vectors, 0 ≤ sij ≤ 1 • Dynamic time warping (DTW) can be used to enhance the sij value
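The grouping step might look like the sketch below. It assumes that sij is the average inner product between the k-th PCP vectors of measures i and j, and that N = 8 (half-beat windows, four beats per measure). Neither the exact formula nor the value of N is given on the slide, so both are labeled assumptions here.

```python
import numpy as np

def measure_similarity_matrix(pcp, n_per_measure=8):
    """Build a measure-by-measure similarity matrix from unit-norm PCP vectors."""
    pcp = np.asarray(pcp, dtype=float)
    n_measures = len(pcp) // n_per_measure
    # Shape (measures, N, 12): N successive PCP vectors form one measure.
    measures = pcp[: n_measures * n_per_measure].reshape(n_measures, n_per_measure, 12)
    # Assumed definition: s_ij = mean over k of <measures[i, k], measures[j, k]>.
    return np.einsum("ikd,jkd->ij", measures, measures) / n_per_measure
```

Because every inner product of two non-negative unit vectors lies in [0, 1], so does their average, which matches the bound stated above.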

Dynamic Time Warping
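As a rough illustration of how DTW could raise a measure-to-measure similarity, the sketch below aligns the PCP vectors of two measures with the textbook DTW recursion, using 1 minus the inner product as the local cost. The cost function, the normalization by the longer measure, and the name `dtw_similarity` are assumptions; the slides do not describe the authors' DTW variant.

```python
import numpy as np

def dtw_similarity(a, b):
    """DTW-aligned similarity in [0, 1] between two sequences of unit PCP vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = 1.0 - a @ b.T  # assumed local cost: 1 - inner product
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard DTW step: diagonal match, or stretch either sequence.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    return 1.0 - acc[n, m] / max(n, m)
```

When both measures have the same length, the straight diagonal alignment is one of the candidate paths, so the DTW value is never lower than the plain frame-by-frame average. This is one reading of "enhance" in the bullet above.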

Measure-based Similarity Matrix • After this simplification, a 3-minute song with a tempo of 100 BPM (300 beats, or 75 measures assuming four beats per measure) forms a 75 × 75 measure-based similarity matrix (MSM) • The MSM reveals chord similarity more than melody similarity

Two MSM Examples • Johnny Cash’s Hurt repeatedly uses the chord succession {Am, Am, C, D} in the 1st and 3rd sections and {G, A, F, C} in the 2nd and 4th sections • The Beatles’ Yesterday has no short-period chord succession. Its musical form is P = {I V V C V C V O}

Detection of Local Similarity • Using a 2D moving window

Detection of Local Similarity • Move the 2D window along the main diagonal of the MSM
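The slides do not say what is computed inside the moving window, so the sketch below is only one plausible reading. At each position along the diagonal it takes a square block of the MSM and, for each candidate period p, averages the p-th off-diagonal of that block (how similar each measure is to the one p measures later). The window size, the period range, and the name `local_periods` are assumptions.

```python
import numpy as np

def local_periods(S, window=8, max_period=4):
    """Slide a window x window block along the diagonal and pick the best local period."""
    n = S.shape[0]
    n_positions = n - window + 1
    periods = np.zeros(n_positions, dtype=int)
    scores = np.zeros(n_positions)
    for t in range(n_positions):
        block = S[t : t + window, t : t + window]
        # Mean of the p-th off-diagonal of the block, for p = 1 .. max_period.
        stripe_means = [
            np.diagonal(block, offset=p).mean() for p in range(1, max_period + 1)
        ]
        best = int(np.argmax(stripe_means))
        periods[t] = best + 1
        scores[t] = stripe_means[best]
    return periods, scores
```

For a synthetic MSM built from a four-measure chord succession that repeats, every window position reports a period of 4.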

Detection of Long Range Similarity • The Viterbi algorithm is used to find segments with consecutive large similarity values along the 45-degree direction • The output of the second module, which provides the chord succession similarity, can be exploited to enhance long-range similarity detection

Detection of Long Range Similarity • Interpret the x-axis as “time” and the y-axis as the “state”

Detection of Long Range Similarity • Use “scores” instead of “probabilities” • The score of a path is defined as the product of the similarity values of all states on the path and the scores of all state transitions

Detection of Long Range Similarity • With PT0 the transition score along the 45-degree direction and PT1 the score of other transitions, PT0 > PT1 guarantees a preference for the 45-degree direction • The larger the ratio PT0/PT1, the more strongly the path favors the 45-degree direction • In our experiment, the ratio PT0/PT1 is set to 1.5
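The score-based Viterbi search can be sketched as follows, with columns of the MSM as "time" and rows as "state". Only the ratio PT0/PT1 = 1.5 comes from the slides. The absolute values `pt0=1.5` and `pt1=1.0`, the rule that any non-diagonal jump is allowed at cost PT1, and the function name are assumptions for illustration.

```python
import numpy as np

def viterbi_path(S, pt0=1.5, pt1=1.0):
    """Best-scoring path through S[state, time]; returns (path of states, Q table)."""
    n_states, n_times = S.shape
    Q = np.zeros((n_states, n_times))
    back = np.zeros((n_states, n_times), dtype=int)
    Q[:, 0] = S[:, 0]
    for t in range(1, n_times):
        prev = Q[:, t - 1]
        k_best = int(np.argmax(prev))  # best predecessor for a non-diagonal jump
        for j in range(n_states):
            jump = pt1 * prev[k_best]
            # 45-degree move: state j - 1 at time t - 1 to state j at time t.
            diag = pt0 * prev[j - 1] if j > 0 else -np.inf
            if diag >= jump:
                Q[j, t], back[j, t] = S[j, t] * diag, j - 1
            else:
                Q[j, t], back[j, t] = S[j, t] * jump, k_best
    # Back-track from the state with the highest final score.
    path = [int(np.argmax(Q[:, -1]))]
    for t in range(n_times - 1, 0, -1):
        path.append(int(back[path[-1], t]))
    return path[::-1], Q
```

Because PT0 > PT1, a stripe of high similarity values parallel to the diagonal accumulates score faster than any path that keeps jumping, so the recovered path follows the stripe.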

Detection of Long Range Similarity • Pruning with chord succession information • Sections with repetitive chord successions of a certain period should be similar to sections of the same period • A period value p is tagged to each measure

Detection of Long Range Similarity

Post-processing • We begin with the state j that gives the highest Q(L, j) at time L and perform back-tracking • Segments shorter than φ measures are removed • In our implementation, φ = 8 • Segments whose mean similarity value is less than a threshold τ are removed • τ = mean + standard deviation (over all sij)
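The two removal rules can be written down directly. In this sketch a segment is represented as a list of (row, column) cells of the similarity matrix, which is an assumed representation. The thresholds φ = 8 and τ = mean + standard deviation are taken from the bullets above.

```python
import numpy as np

def filter_segments(S, segments, phi=8):
    """Keep segments with at least phi cells whose mean similarity reaches tau."""
    tau = S.mean() + S.std()  # tau = mean + standard deviation over all s_ij
    kept = []
    for seg in segments:
        if len(seg) < phi:
            continue  # too short: fewer than phi measures
        rows, cols = zip(*seg)
        mean_sim = S[list(rows), list(cols)].mean()
        if mean_sim >= tau:
            kept.append(seg)
    return kept, tau
```

A long segment on a high-similarity stripe survives, while a short fragment of the same stripe and a long segment through low-similarity cells are both dropped.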

Post-processing • A segment should be divided • if its two corresponding sections in the song overlap each other • if the similarity values differ significantly before and after a certain point in the segment • If sections conflict, the one with the higher similarity value has priority in keeping its boundaries • For songs in verse-chorus form, similarity values are clustered into two classes • Segments with high similarity values are labeled as the chorus
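The clustering method for the verse-chorus split is not named on the slide. As one simple possibility, the sketch below runs a one-dimensional two-means clustering over per-segment similarity values and marks the high cluster. The algorithm choice and the name `split_high_low` are assumptions.

```python
import numpy as np

def split_high_low(values, n_iter=50):
    """Two-class 1-D clustering; returns True for values in the high class."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()  # initial cluster centers
    for _ in range(n_iter):
        is_high = np.abs(values - hi) < np.abs(values - lo)
        if is_high.all() or not is_high.any():
            break  # degenerate case: only one class is populated
        new_lo, new_hi = values[~is_high].mean(), values[is_high].mean()
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return is_high
```

Under the rule stated above, segments flagged `True` would be the chorus candidates and the remaining ones the verse candidates.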

Experiment • A collection of 120 pop, country, and rock songs from the 1960s onward • 100 of them are in verse-chorus form and 20 are in AAA or other forms • Mono audio sampled at 22,050 Hz with 16 bits per sample

Experimental Results • The pattern extraction for a song is counted as correct if all patterns in the song are extracted, without distinguishing between verse and chorus • The detection accuracy is 112/120 = 93.33%

Experimental Results
