1 / 67

Music Retrieval and Analysis

Music Retrieval and Analysis . Part I: Music Retrieval Arbee L.P. Chen National Tsing Hua University ISMIR’03 Tutorial III. Outline. Technologies Architecture for Music Retrieval Music Representations Music query processing Music indexing Similarity measures Systems and evaluation

ohanzee
Download Presentation

Music Retrieval and Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Music Retrieval and Analysis Part I: Music Retrieval Arbee L.P. Chen National Tsing Hua University ISMIR’03 Tutorial III

  2. Outline • Technologies • Architecture for Music Retrieval • Music Representations • Music query processing • Music indexing • Similarity measures • Systems and evaluation • Existing systems • Meldex • Themefinder • SEMEX • PROMS • OMRAS • System evaluation • Future research directions

  3. Architecture for Music Retrieval Users Music Player Music Query Interface Query result Music query Result music objects Music Feature Extractor Music Query Processor Music Storage manager Music features Music objects Result music objects Music Database Music Index

  4. Music Representations

  5. Styles of Music Composition • Monophony • Monophonic music has at most one note playing at any given time; before a new note starts the previous note must have ended • Homophony • Homophonic music has at most one set of notes playing at the same time. For any set of notes that start at the same time, no new note or notes may begin until every note in that set has ended • Polyphony • Polyphonic music has no such restrictions. Any note or set of notes may begin before any previous note or set of notes has ended

  6. Monophony Representations • Absolute measure • Absolute pitch • C5 C5 D5 A5 G5 G5 G5 F5 G5 • Absolute duration • 1 1 1 1 1 0.5 0.5 1 1 • Absolute pitch and duration • (C5,1)(C5,1)(D5,1)(A5,1)(G5,1)(G5,0.5)(G5,0.5)(F5,1)(G5,1) • Relative measure • Contour (in semitones) • 0 +2 +7 -2 0 0 -2 +2 • IOI (Inter onset interval) ratio • 1 1 1 1 0.5 1 2 1 • Contour and IOI ratio • (0,1)(+2,1)(+7,1)(-2,1)(0,0.5)(0,1)(-2,2)(+2,1)

  7. Polyphony Representations • All information preservation • Keep all information of absolute pitch and duration (start_time, pitch, duration) • (1,C5,1)(2,C5,1)(3,D5,1)(3,A5,1)(4,F5,4)(5,C6,1)(6,G5,0.5)(6.5,G5,0.5)… • Relative note representation • Record difference of start times and contour (ignore duration) • (1,0)(1,+2)(0,+7)(1,-4)… • Monophonic reduction • Select one note at every time step (main melody selection) • (C5,1)(C5,1)(A5,1)(F5,1)(C6,1)... • Homophonic reduction (chord reduction) • Select every note at every time step • (C5)(C5)(D5,A5)(F5)(C6)(G5)(G5)…

  8. Music Representation - Theme • Theme • A short tune that is repeated or developed in a piece of music • A small part of a musical work • Efficient retrieval • A highly semantic representation • Effective retrieval • Automatic theme extraction • Exact repeating patterns • Approximate repeating patterns

  9. Music Representation – Markov Models • Capture global information for a music piece • Repeating patterns • Sequential patterns • A lossy representation • Good for music classification

  10. Markov Model Representation • [Pickens and Crawford, CIKM‘02] • Homophonic reduction • For each chord, compute its distance with the 24 lexical chords • Capture statistical properties by Markov models • The representation of each song is reduced into a matrix

  11. Markov Model Representation (Cont.) Chord Markov model representation Lexical chords

  12. Music Query Processing • On-line methods (string matching algorithms) • Exact string matching • Brute-force method • KMP algorithm • Boyer-Moore algorithm • Shift-Or algorithm • Partial string matching • Shift-Or algorithm • Approximate string matching • Dynamic programming

  13. Brute-Force Method • T: A5 B5 A5 C5 A5 B5 A5 B5 • P: A5 B5 A5 B5 • Time complexity O(mn)

  14. KMP Algorithm • Left-to-right scan • Failure function shift rule • O(m+n) Failure function f(i) 0 0 1 2 Skip this step

  15. Boyer-Moore Algorithm • If the pattern P is relatively long and the alphabet is reasonably large, this algorithm is likely to be the most efficient string matching algorithm • Right-to-left scan • Bad character shift rule • Good suffix shift rule • O(m+n)

  16. Bad Character Shift Rule bad character Right to left scan skip thesesteps

  17. Good Suffix Shift Rule Good suffix Skip this step

  18. Shift-Or Algorithm An example of the shift-or algorithm for p=aab and s=abcaaab T a b c a 0 1 1 0 1 1 1 0 1 a b E S(E) T[a] E S(E) T[b] E S(E) T[c] E S(E) T[a] E E S(E) T[a] E S(E) T[a] E S(E) T[b] a a b 1 1 0 1 1 1 0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1

  19. Shift-Or Algorithm for Partial Matching [Lemstrom and Perttu, ISMIR’00] An example of the shift-or algorithm for p=aab and s=(ab)(ca)(aab) T a b c a 0 1 1 0 1 1 1 0 1 a b E S(E) T[a]^T[b] E S(E) T[c]^T[a] E E S(E) T[a]^T[a]^T[b] a a b 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 0

  20. Approximate Matching • In practical pattern matching applications, exact matching is not always suitable • In the field of MIR, approximation is measured mainly by the edit distance: the minimal number of local edit operations needed to transform one music object into another • Dynamic programming method serves this purpose

  21. Edit Distance • Unit cost edit distance • W(ab)=1, ab (Replacement) • W(a  )=W(b)=1 (Deletion and Insertion) • Non-unit cost edit distance (Content-sensitive) • The costs of replacement, deletion and insertion can be any values which depend on the cost function • E.g., W(mn)=0.2 and W(a )=0.8

  22. Dynamic Programming Method • Given any two strings S1=abac, S2=aaccb • The edit distance evaluated by DP • The edit distance is 3 a b a c c b (1 deletion, 2 insertions) a a c c b

  23. Music Indexing • Tree-based index (Suffix tree) • List-based index (1D-list) • N-gram index • Indexing Markov models?

  24. Tree-Based Index • [Chen, et al., ICME‘00] • Music objects are coded as strings of music segments • Four segment types to model music contour • Pitch and duration are considered • Index structures • Augmented suffix tree • Both incipit/partial and exact/approximate matching can be handled

  25. Tree-Based Index (Cont.) Four segment types type A type B type C type D

  26. Tree-Based Index (Cont.) A A B $ A B 3 $ B 2 $ • An example suffix tree • A 1-D augmented suffix tree 1 The suffix tree of the string S=“ABCAB”

  27. List-Based Index • [Liu, Hsu and Chen, ICMCS‘99] • Music objects are coded as melody strings • “so-mi-mi-fa-re-re-do-re-mi-fa-so-so-so” • Melody strings are organized as linked lists • Both incipit/partial and exact/approximate matching can be handled • Exact link, insertion link, dropout link, transposition link

  28. List-Based Index (Cont.)

  29. N-Gram Index • A widely used technique in music databases • Target strings are cut into index terms by a sliding window with length N • Index can be implemented by various methods, e.g., inverted file • Queries are also cut into index terms with length N • Searching and joining are then performed

  30. N-Gram Index [Doraisamy and Ruger, ISMIR’02] Query=bbca Cut into 2-grams S=aabbcaab bb, ca Position: 3 Position: 5 Join Inverted file The substring is found from position 3 to position 6

  31. Similarity Measures • The effectiveness of MIR depends on the similarity measure • Edit distance (Suitable for short queries) • Difference between two probability matrices • Note shift distance [Typke, et al., ISMIR‘03]

  32. Probability Matrix Distance S1: CCCAABCB S2: CCAAABCB D(S1||S2)=0.1092 Kullback-Liebler (KL) divergence: The value is 0 when two matrices are the same q: Query probability matrix d: Data probability matrix i: row x: column

  33. Probability Matrix Distance (Cont.) • Ineffective for MIR with short queries • 0-entries in the query model mean unknown values? • 0-entries in the corpus model means facts? • Performance comparison with string matching needed

  34. 0 0 0 0 0 Note Shift Distance • Sum of the two dimensional distance between the notes of the query and the notes of the answer

  35. Music Retrieval Systems • Music Representations • Music Query processing • Special features

  36. Meldex • [McNab, et al., D-Lib Magazine‘97] • "melodic contour" or "pitch profile“ • 113531  RUUDD (R:Repeat, U:Up, D:Down) • Approximate string matching • Dynamic programming • Query by humming

  37. Themefinder • [Kornstadt, Computing in Musicology‘98] • Select themes manually • Allow different query types • Pitch • Interval • Contour • Provide exact matchingonly

  38. SEMEX • [Lemstrom and Perttu, ISMIR’00] • The pattern is monophonic; the musical source is polyphonic • Finding all positions of S (source) that have an occurrence of p • p=bca • S=<a, b, c><a, b><b, c><a, b> • Shift-or algorithm • No similarity function

  39. PROMS • [Clausen, et al., ISMIR‘00] • Representation by pitch and onset time (ignore duration) • Index by inverted file • Fault-tolerant music search • Allow missing notes • Allow fuzzy notes • Query=(b, (d or c), a, b)

  40. OMRAS • [Dovey, ISMIR‘01] • Searching in a “piano roll” model • Gaps based dynamic programming • Example (gap = 2): • Data • T0=<64,72,76>, T1=<60>, T2=<59,67,79>, T3=<55,63>, T4=<55,67,79> • Query • S0=<59,67>, S1=<55,67> Piano roll

  41. System Evaluation • Traditional measures of effectiveness are precision and recall

  42. The Recall-Precision Curve

  43. A Platform for Evaluating MIR Systems • Evaluation of various music retrieval approaches • Efficiency • response time • Effectiveness • recall-precision curve • The Ultima project builds such a platform [Hsu, Chen and Chen, CIKM’01] • Same data set and query set for various approaches • Compare recall-precision curves

  44. The Ultima Project • Data store • Query generation module • Query processing module • Result summarization module • Report module • Mediator

  45. Future Research Directions • Music Retrieval based on music structure • Music retrieval based on user’s perceptiveness • Similarity measure for polyphonic music • A novel index structure for polyphony • Fair evaluation method

  46. Music Retrieval and Analysis Part II: Music Analysis

  47. Outline • Music Segmentation and Structure Analysis • Local Boundary Detection • Repeating Pattern Discovery • Phrase Extraction • Music Classification • Music Recommendation Systems • Future Research Directions

  48. Local Boundary Detection • [Cambouropoulos, ICMC’01] • Segment music by local discontinuities between notes • Calculate boundary strength values for each interval of a melodic surface, i.e., pitch, IOI, and rest, according to the strength of local discontinuities

  49. Local Boundary Detection (Cont.) • A music object m has a parametric profile Pk, which is represented as a sequence of n intervals • Pk = [x1, x2, …, xn] where k{pitch, IOI, rest} • Pitch interval measured in semitones • IOI and rest intervals measured in milliseconds or numerical duration values • IOI (Inter onset interval) • The amount of time between the onset of one note and the onset of the next note • Rest • The amount of time between the offset of one note and the onset of the next note IOI pitch rest

  50. Local Boundary Detection (Cont.) • The degree of change r between two successive interval values xi and xi+1 is: • The strength of the boundary si for interval xi is: • Overall local boundary strength based on the three intervals • wp*si(p)+wd*si(d)+wr*si(r)

More Related