Similarity Matrix Processing for Music Structure Analysis. Yu Shiu, Hong Jeng, C.-C. Jay Kuo. ACM Multimedia 2006. System Framework. Pitch Class Profile (PCP).
Similarity Matrix Processing for Music Structure Analysis • Yu Shiu, Hong Jeng, C.-C. Jay Kuo • ACM Multimedia 2006
Pitch Class Profile (PCP) • The PCP vector is a 12-dimensional vector that gives the relative intensities of the 12 pitch classes {C, C#, D, D#, E, F, F#, G, G#, A, A#, B} • It is normalized to a unit vector
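The slide does not say how the intensities are computed, so the following is only a minimal sketch of one common way to build a PCP vector: fold spectral energy into 12 pitch classes and normalize to unit length. The function name `pcp_from_spectrum`, the reference pitch `ref_c_hz`, and the use of squared magnitudes are assumptions made for illustration, not details from the paper.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pcp_from_spectrum(freqs_hz, magnitudes, ref_c_hz=261.63):
    """Fold a magnitude spectrum into a unit-length 12-dimensional PCP vector.

    Assumptions (not from the slides): energy = magnitude squared, and each
    frequency is assigned to its nearest equal-tempered pitch relative to C4.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    valid = freqs_hz > 0  # skip the DC bin, whose log-frequency is undefined

    # Semitone distance from the reference C, folded into one octave (0..11).
    semitones = np.round(12 * np.log2(freqs_hz[valid] / ref_c_hz)).astype(int)
    classes = semitones % 12

    pcp = np.zeros(12)
    np.add.at(pcp, classes, magnitudes[valid] ** 2)  # accumulate per class

    norm = np.linalg.norm(pcp)
    return pcp / norm if norm > 0 else pcp

# Toy spectrum: a strong C4 plus weaker A4 and A5 partials.
example_pcp = pcp_from_spectrum([261.63, 440.0, 880.0], [2.0, 1.0, 1.0])
strongest = PITCH_CLASSES[int(np.argmax(example_pcp))]
```

Because every entry is non-negative and the vector has unit length, the inner product of two such vectors lies between 0 and 1.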
Measure-based Similarity Matrix • Previous similarity matrices • Use a pre-defined window size • This results in a large similarity matrix, which makes further processing more expensive • In this paper • Use the measure as the element of the similarity matrix
Measure-based Similarity Matrix • PCP vector generation • Choose a window size equal to the duration of one half beat • Detect the onset signal • Compute the change in spectral content between two adjacent shifting windows, each 20 ms long with 50% overlap
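The onset step above can be sketched as follows. The slide only says that the "change of the spectral content" between adjacent windows is computed; the half-wave rectified spectral flux, the Hann window, and the name `onset_signal` used here are assumptions chosen for illustration.

```python
import numpy as np

def onset_signal(audio, sample_rate, window_s=0.020, overlap=0.5):
    """Return (flux, frame_rate_hz) for a mono signal.

    Spectral flux is an assumed stand-in for the paper's spectral-change
    measure; the 20 ms window and 50% overlap come from the slide.
    """
    audio = np.asarray(audio, dtype=float)
    win = int(round(window_s * sample_rate))
    hop = max(1, int(round(win * (1.0 - overlap))))
    n_frames = 1 + (len(audio) - win) // hop

    taper = np.hanning(win)
    frames = np.stack([audio[k * hop:k * hop + win] * taper for k in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))

    # Sum only the increases in magnitude between adjacent windows.
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    return flux, sample_rate / hop

# Toy input: half a second of silence followed by a 440 Hz tone.
sr = 22050
t = np.arange(sr) / sr
tone = np.where(t >= 0.5, np.sin(2 * np.pi * 440.0 * t), 0.0)
flux, frame_rate = onset_signal(tone, sr)
onset_time_s = np.argmax(flux) / frame_rate
```

With these settings the onset signal has roughly 100 values per second, which is the resolution available to the later beat-period estimate.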
Measure-based Similarity Matrix • The autocorrelation function (ACF) of the onset signal is calculated to determine the beat period • Example: • 100 BPM → beat period of 600 ms, so a half beat lasts 300 ms • This is longer than the window size commonly used in previous work
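A rough sketch of the ACF step, assuming the beat period is taken as the lag of the largest autocorrelation peak within a plausible tempo range. The function name and the `min_bpm`/`max_bpm` search bounds are hypothetical; the slide does not describe how the peak is picked.

```python
import numpy as np

def beat_period_from_onsets(onset, frame_rate_hz, min_bpm=60.0, max_bpm=200.0):
    """Estimate the beat period in seconds from an onset signal via its ACF."""
    onset = np.asarray(onset, dtype=float)
    onset = onset - onset.mean()  # remove the DC offset before correlating

    # Keep the non-negative lags of the full autocorrelation.
    acf = np.correlate(onset, onset, mode="full")[len(onset) - 1:]

    # Only search lags that correspond to the assumed tempo range.
    min_lag = int(round(frame_rate_hz * 60.0 / max_bpm))
    max_lag = min(int(round(frame_rate_hz * 60.0 / min_bpm)), len(acf) - 1)

    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return lag / frame_rate_hz

# Synthetic onset signal: one impulse every 0.6 s (100 BPM) at 100 frames/s.
frame_rate = 100.0
onset = np.zeros(3000)
onset[::60] = 1.0
period_s = beat_period_from_onsets(onset, frame_rate)
half_beat_s = period_s / 2  # PCP window length used in the slides
```

For the 100 BPM example this recovers a 0.6 s beat period and hence a 0.3 s half-beat window.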
Measure-based Similarity Matrix • Group N successive PCP vectors into one measure • Since PCP vectors are unit vectors, 0 <= sij <= 1 • Dynamic time warping (DTW) can be used to enhance the sij value
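The formula for sij did not survive in this transcript. As an illustration only, the sketch below assumes sij is the average inner product between the k-th PCP vector of measure i and the k-th PCP vector of measure j, which is consistent with the stated bound 0 <= sij <= 1 for non-negative unit vectors. The function name and the choice N = 8 (half-beat windows in a four-beat measure) are assumptions, and the optional DTW refinement is not shown.

```python
import numpy as np

def measure_similarity_matrix(pcp_vectors, n_per_measure):
    """Build a measure-based similarity matrix from unit-length PCP vectors.

    Assumed definition: s_ij = (1/N) * sum_k <v_{i,k}, v_{j,k}>.
    """
    pcp = np.asarray(pcp_vectors, dtype=float)
    n_measures = len(pcp) // n_per_measure  # drop an incomplete last measure

    # Shape (measures, N, 12): one block of N PCP vectors per measure.
    grouped = pcp[:n_measures * n_per_measure].reshape(n_measures, n_per_measure, -1)

    # Sum position-wise inner products, then average over the N positions.
    return np.einsum("ikd,jkd->ij", grouped, grouped) / n_per_measure

# Toy data: 32 random non-negative unit vectors, 8 per measure -> 4 measures.
rng = np.random.default_rng(0)
raw = rng.random((32, 12))
pcp = raw / np.linalg.norm(raw, axis=1, keepdims=True)
msm = measure_similarity_matrix(pcp, n_per_measure=8)
```

Under this definition the matrix is symmetric and its main diagonal is 1, since every measure is identical to itself.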
Measure-based Similarity Matrix • After the simplification, a 3-minute song with a tempo of 100 BPM (300 beats, or 75 measures assuming four beats per measure) forms a 75 × 75 similarity matrix • The MSM reveals chord similarity more than melody similarity
Two MSM Examples • Johnny Cash’s Hurt repeatedly uses the chord succession {Am, Am, C, D} in the 1st and 3rd sections and {G, A, F, C} in the 2nd and 4th sections. • The Beatles’ Yesterday has no chord succession with a short period. Its musical form is P = {I V V C V C V O}
Detection of Local Similarity • Using a 2D moving window
Detection of Local Similarity • Move the 2D window along the diagonal of the MSM
Detection of Long Range Similarity • The Viterbi algorithm is used to find segments with consecutive large similarity values along the 45-degree direction • The output of the second module, which provides the chord succession similarity, can be exploited to enhance long-range similarity detection
Detection of Long Range Similarity • Interpret the x-axis as the “time” and the y-axis as the “state”
Detection of Long Range Similarity • Use “scores” instead of “probabilities” • The score of a path is defined as the product of the similarity values of all states and the scores of all state transitions
Detection of Long Range Similarity • PT0 > PT1 guarantees a preference for the 45-degree direction • The larger the ratio PT0/PT1, the more strongly the path favors the 45-degree direction • In our experiment, the ratio PT0/PT1 is set to 1.5
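The score-based Viterbi search described on these slides can be sketched as below. This is an interpretation under stated assumptions rather than the authors' implementation: columns are treated as time and rows as states, a transition from state j-1 to state j (one step along the 45-degree direction) receives score PT0, every other transition receives PT1, and the absolute values PT0 = 1.5 and PT1 = 1.0 are chosen only to match the reported ratio.

```python
import numpy as np

def viterbi_diagonal_path(msm, pt0=1.5, pt1=1.0):
    """Find the highest-scoring path through a similarity matrix.

    Path score = product of visited similarity values and transition scores.
    Returns (path, q), where path[t] is the state (row) chosen at time t.
    """
    s = np.asarray(msm, dtype=float)
    n_states, n_times = s.shape
    q = np.zeros((n_times, n_states))            # q[t, j]: best score ending in j
    back = np.zeros((n_times, n_states), dtype=int)
    q[0] = s[:, 0]

    for t in range(1, n_times):
        best_prev = int(np.argmax(q[t - 1]))
        for j in range(n_states):
            # Default: jump from the best previous state (non-diagonal move).
            prev, score = best_prev, q[t - 1, best_prev] * pt1
            # Diagonal move from state j-1 is rewarded with the larger PT0.
            if j > 0 and q[t - 1, j - 1] * pt0 >= score:
                prev, score = j - 1, q[t - 1, j - 1] * pt0
            q[t, j] = s[j, t] * score
            back[t, j] = prev

    # Back-track from the best final state.
    j = int(np.argmax(q[-1]))
    path = [j]
    for t in range(n_times - 1, 0, -1):
        j = int(back[t, j])
        path.append(j)
    return path[::-1], q

# Toy matrix: low background with one strong stripe parallel to the diagonal.
toy = np.full((6, 6), 0.1)
for col in range(4):
    toy[col + 2, col] = 0.9
path, q = viterbi_diagonal_path(toy)
```

On a real MSM the main diagonal is trivially 1, so it would presumably have to be masked or the search restricted to one triangle; the transcript does not say how this is handled. For long songs, working with log-scores instead of raw products would avoid numerical overflow or underflow.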
Detection of Long Range Similarity • Pruning with chord succession information • Sections with repetitive chord successions of a certain period should be similar to sections with the same period • A period value p is tagged to each measure
Post-processing • We begin with the state j that gives the highest Q(L, j) at time L and perform back-tracking • Segments shorter than φ measures are removed • In our implementation, φ = 8 • Segments whose mean similarity value is less than a threshold τ are removed • τ = mean + standard deviation (over all sij)
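The two removal rules above can be sketched as a simple filter. The representation of a segment as a list of (row, column) cells and the name `filter_segments` are assumptions for illustration; the values φ = 8 and τ = mean + standard deviation come from the slide.

```python
import numpy as np

def filter_segments(msm, segments, min_length=8):
    """Drop segments that are too short or too weak.

    segments: list of paths, each a list of (row, col) cells in the MSM
              (an assumed representation, not specified in the slides).
    Returns (kept_segments, tau).
    """
    s = np.asarray(msm, dtype=float)
    tau = s.mean() + s.std()  # threshold computed over all s_ij

    kept = []
    for seg in segments:
        if len(seg) < min_length:          # shorter than phi measures
            continue
        mean_sim = np.mean([s[i, j] for i, j in seg])
        if mean_sim < tau:                 # mean similarity below tau
            continue
        kept.append(seg)
    return kept, tau

# Toy matrix with one strong off-diagonal stripe of 10 measures.
toy = np.full((20, 20), 0.1)
strong = [(i + 10, i) for i in range(10)]
for i, j in strong:
    toy[i, j] = 0.9
short = strong[:4]                         # strong but only 4 measures long
weak = [(i + 1, i) for i in range(10)]     # long but low similarity
kept, tau = filter_segments(toy, [strong, short, weak])
```

In this toy case only the long, strong stripe survives both rules.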
Post-processing • A segment should be divided • if its two corresponding sections in the song overlap each other • if there is a significant difference between the similarity values before and after a certain point in the segment • If sections conflict, the one with the higher similarity value has priority in keeping its boundaries • For songs in verse-chorus form, similarity values are clustered into two classes • Sections with high similarity values are taken to be the chorus
Experiment • A collection of 120 pop, country and rock songs from the 1960s onward • 100 of them are in verse-chorus form and 20 are in AAA or other forms • Mono audio sampled at 22,050 Hz with 16 bits per sample
Experimental Results • The pattern extraction for a song is counted as correct if all patterns in the song are extracted, without distinguishing between verse and chorus • The correct detection rate is 112/120 = 93.33%.