
Singing-based Music Retrieval System

An overview of a content-based music retrieval (CBMR) system that finds songs from sung or hummed queries, covering related work, the proposed methods, experimental results, and the on-line and off-line processing steps used for music note extraction and query-result ranking.



Presentation Transcript


  1. Query by Singing (CBMR: Content-based Music Retrieval) MATLAB Conf. 1999 J.-S. Roger Jang (張智星) CS Dept, Tsing-Hua Univ, Taiwan http://www.cs.nthu.edu.tw/~jang Brought to you by Roger Jang

  2. Outline • Part 1 : Introduction • Part 2 : Related Work • Part 3 : Proposed Methods • Part 4 : Experimental Results and Demos • Part 5 : Conclusions and Future Work

  3. About Me • Experiences • 1993-1995: The MathWorks, Inc., U.S.A. • 1995-now: Associate Prof. at CS Dept., Tsing Hua Univ., Taiwan • Special achievements: Have survived • 1989 S.F. earthquake (7.1) • 1999 Taiwan earthquake (7.6) • 2009 ?

  4. Part 1: Introduction to CBMR • CBMR: Content-based music retrieval • Goal: Music retrieval by singing/humming • Traditional database query • Text-based search, SQL-based queries • Features used by CBMR systems • melody • rhythm • chord

  5. Part 2: Related Work • Query by humming, by Ghias, Logan, and Chamberlin in 1995 - Modified autocorrelation - 183 songs in the database • MELDEX system, by the New Zealand Digital Library Project in 1996 - Gold/Rabiner algorithm (800 songs) - Singers use ‘la’ or ‘ta’ to aid note transcription

  6. Pitch Determination Methods • Time-domain analysis • Autocorrelation (1976) • AMDF (average magnitude difference function) • Gold-Rabiner algorithm (1969) • Frequency-domain analysis • Cepstrum (Noll 1964) • Harmonic product spectrum (Schroeder 1968) • Chen’s heuristic method (Chen 1998) • Others • Maximum likelihood • Simple inverse filter tracking (SIFT) • Neural network approaches
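
As a small illustration of the AMDF entry above, a minimal MATLAB sketch that computes the average magnitude difference function of one analysis frame; the function name and the normalization by the overlap length are assumptions, not taken from the slides.

    % Average magnitude difference function (AMDF) of one frame (sketch).
    % The AMDF dips near lags matching the pitch period, whereas the
    % autocorrelation (slides 10-11) peaks there.
    function d = amdf(s)
        N = length(s);
        maxLag = floor(N/2);
        d = zeros(1, maxLag);
        for h = 1:maxLag
            d(h) = sum(abs(s(1:N-h) - s(1+h:N))) / (N - h);   % normalized by overlap length
        end
    end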

  7. Part 3: Proposed Method • On-line processing flow chart: microphone signal input -> sampling (11 kHz) -> short-term autocorrelation -> center clipping -> note segmentation -> mid-level representation -> similarity comparison -> query results (ranking list) • Off-line processing: MIDI song database -> music note extraction -> mid-level representation

  8. Microphone Signal Input • Sampling & low-pass filtering • Wave file: “Happy Birthday” (waveform with note start and end points marked)

  9. Autocorrelation in the Speech Signal • Speech waveform, zoomed in to show the overlapping analysis frames

  10. Short-term Autocorrelation • Autocorrelation of N points ending at sample m: x(h) = sum of s(n)*s(n+h) over n = m-N+1 to m-h • Frame size 256 points, shift 128 points • Rectangular window

  11. Short-term Autocorrelation • Example: for a 128-point frame s(1:128) and lag h = 30, x(30) is the dot product of the overlapped parts: x(30) = sum(s(1:98) .* s(31:128)) • The pitch period shows up as the lag of the strongest autocorrelation peak (about 30 samples here)
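
A minimal MATLAB sketch of the frame-based autocorrelation on the two slides above. The frame size (256 samples), shift (128 samples), and rectangular window follow slide 10; the variable names, the lag search range, and the frame loop are illustrative assumptions.

    % Short-term autocorrelation of a sampled signal s (sketch).
    frameSize = 256;                     % slide 10: 256-point frames
    shift     = 128;                     % slide 10: 128-point shift
    maxLag    = 128;                     % assumed lag search range
    nFrames   = floor((length(s) - frameSize) / shift) + 1;
    acf = zeros(nFrames, maxLag);
    for f = 1:nFrames
        frame = s((f-1)*shift + (1:frameSize));          % rectangular window
        for h = 1:maxLag
            % dot product of the overlapped parts; e.g. for h = 30:
            % acf(f,30) = sum(frame(1:226) .* frame(31:256))
            acf(f,h) = sum(frame(1:frameSize-h) .* frame(1+h:frameSize));
        end
    end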

  12. Center Clipping • Clipping limits are set to r% of the absolute maximum of the autocorrelation data • Figure: three input/output clipping characteristics, panels (a) to (c)
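
A sketch of a standard center clipper with the threshold set to r percent of the absolute maximum, as stated above; the function name and the choice of the "subtract the clipping level" variant (panel (b) style) are assumptions.

    % Center clipping (sketch): samples whose magnitude is below the clipping
    % level are zeroed; larger samples are shifted toward zero by that level.
    function y = center_clip(x, r)
        cl = (r/100) * max(abs(x));      % clipping level: r% of the absolute maximum
        y  = zeros(size(x));
        y(x >  cl) = x(x >  cl) - cl;
        y(x < -cl) = x(x < -cl) + cl;
        % a 3-level clipper (panel (c)) would instead output sign(x) outside +/- cl
    end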

  13. Autocorrelation without Clipping

  14. Autocorrelation with Clipping

  15. The Range of Fundamental Frequency

  16. Computing Fundamental Frequency • Fundamental frequency: f0 = fs / p, where fs is the sampling rate and p is the pitch period in samples (the lag of the autocorrelation peak) • Removal of unreasonable pitch values: 1. outside the fundamental frequency range 2. a sharp transition relative to both neighboring frames
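
A sketch of converting one autocorrelation frame into a fundamental-frequency estimate, assuming the roughly 11 kHz sampling rate from slide 7; the pitch range limits and the variable acfFrame are illustrative placeholders, since the slide does not give exact values.

    % Fundamental frequency from one autocorrelation frame (sketch).
    fs   = 11025;                        % sampling rate (about 11 kHz, slide 7)
    fmin = 80;   fmax = 1000;            % assumed allowable singing pitch range, Hz
    hmin = round(fs / fmax);             % smallest admissible lag (samples)
    hmax = round(fs / fmin);             % largest admissible lag (samples)
    [peakVal, rel] = max(acfFrame(hmin:hmax));   % strongest peak among admissible lags
    period = hmin + rel - 1;             % pitch period in samples
    f0 = fs / period;                    % fundamental frequency in Hz
    % Unreasonable pitch values are then removed: estimates outside [fmin, fmax]
    % and estimates that jump sharply relative to both neighboring frames.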

  17. Pitch Tracking Pitch tracking via autocorrelation for the song 茉莉花 (Jasmine Flower)

  18. Pitch Contour Yellow line: the correct pitch contour

  19. Note Segmentation • Segmentation based on energy - Necessary to have intensity contrast - Hard to define each note boundary • Segmentation based on pitch - No constraints when singing - Reasonable in CBMR system

  20. Note Segmentation by Pitch Proposed approach: sliding window method
      for I = 1:seg_num
          if seg_length(I) <= note_min
              find_note(seg(I));    % short segment: take its mean pitch as one note
          else
              cut_note(seg(I));     % long segment: apply the sliding-window method
          end
      end

  21. Sliding Window Method • Window size: 10 • Max standard deviation: 8 (a fuller sketch follows below)
      for each window
          if std(window) < 8
              find_note(window)     % stable pitch: emit one note
          else
              go to the next window
          end
      end
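
A self-contained MATLAB sketch of the sliding-window step above, using the stated window size (10) and maximum standard deviation (8); how a note is emitted (the mean pitch of an accepted window) and the function name are assumptions.

    % Pitch-based note segmentation with a sliding window (sketch).
    % pitch: vector of per-frame pitch values; notes: one mean pitch per note.
    function notes = segment_notes(pitch)
        winSize = 10;                    % window size (slide 21)
        maxStd  = 8;                     % maximum standard deviation (slide 21)
        notes = [];
        i = 1;
        while i + winSize - 1 <= length(pitch)
            w = pitch(i : i + winSize - 1);
            if std(w) < maxStd
                notes(end+1) = mean(w);  % stable window: emit one note
                i = i + winSize;         % continue after the accepted window
            else
                i = i + 1;               % unstable: slide the window forward
            end
        end
    end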

  22. Transform Pitches into Notes • After segmentation by pitch, the identified note frequencies are 329 392 440 523 440 392 440 392 (Hz), shown on the pitch contour

  23. Mid-level Representation (I): Numeric Contour • Find each note’s distance from A440 (la)
      Ex: So Mi Mi Fa Re Re Do Re Mi Fa So So So
      => So Mi Fa Re Do Re Mi Fa So    (removal of repeated notes)
      => 67 64 65 62 60 62 64 65 67    (MIDI representation)
      => -2 -5 -4 -7 -9 -7 -5 -4 -2    (distance from 69, i.e. “la”)
      => -3 1 -3 -2 2 2 1 2            (difference between neighbors)

  24. Mid-level Representation (II): Ternary Contour • Find each note’s distance from A440 (la)
      Ex: So Mi Mi Fa Re Re Do Re Mi Fa So So So
      => So Mi Fa Re Do Re Mi Fa So    (removal of repeated notes)
      => 67 64 65 62 60 62 64 65 67    (MIDI representation)
      => -2 -5 -4 -7 -9 -7 -5 -4 -2    (distance from 69, i.e. “la”)
      => -1 1 -1 -1 1 1 1 1            (use +1, 0, -1 as the contour)
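
Both contour representations reduce to a few vector operations; the MATLAB sketch below reproduces the example from the two slides above, assuming the notes are already available as MIDI numbers.

    % Numeric and ternary contours for the example So Mi Mi Fa Re Re Do Re Mi Fa So So So.
    midi  = [67 64 64 65 62 62 60 62 64 65 67 67 67];
    notes = midi([true, diff(midi) ~= 0]);     % remove repeated notes
                                               %   67 64 65 62 60 62 64 65 67
    offsets = notes - 69;                      % distance from A440 (MIDI 69, "la")
                                               %   -2 -5 -4 -7 -9 -7 -5 -4 -2
    numericContour = diff(notes);              % difference between neighbors
                                               %   -3 1 -3 -2 2 2 1 2
    ternaryContour = sign(numericContour);     % keep only +1, 0, -1
                                               %   -1 1 -1 -1 1 1 1 1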

  25. Mid-level Representation (III): Chord Contour • C: do, mi, sol • Dm: re, fa, la • Em: mi, sol, si • F: fa, la, do • G: sol, si, re • Am: la, do, mi • E1: mi • E2: mi, sol

  26. Wave Transformation
      Frequency (Hz): 293 293 329 440 392 261
      => -7 -7 -5 0 -2 -9    (semitone offset)
      => -7 -5 0 -2 -9       (removal of repeated notes)
      => 2 5 -2 -7           (difference between neighbors)
      Frequency to semitone offset: offset = 12 * log2(freq / 440), rounded to the nearest integer,
      where freq is the note frequency (Hz) and offset is the number of semitones from A440 (la)
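
A sketch of the frequency-to-semitone conversion above, checked against the slide's numbers; rounding to the nearest semitone is an assumption consistent with that example.

    % Frequency (Hz) to semitone offset from A440, then to a contour (sketch).
    freq    = [293 293 329 440 392 261];
    offset  = round(12 * log2(freq / 440));        % -7 -7 -5  0 -2 -9
    offset  = offset([true, diff(offset) ~= 0]);   % -7 -5  0 -2 -9   (repeats removed)
    contour = diff(offset);                        %  2  5 -2 -7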

  27. 1-D String Matching
      α: 1 1 2 0 -1 0 1 2 0
      β: -3 1 1 2 4 -1 1 2 5
      • LCS: longest common subsequence; lcs(α, β) = 6
      • LCCS: longest common consecutive subsequence; lccs(α, β) = 3

  28. Similarity Evaluation • N: length of the two contour sequences α and β; σ: their standard deviations • Measures compared: LCS distance, Euclidean distance, cosine distance
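
The slide's distance formulas did not survive conversion; the sketch below shows the two standard geometric measures for equal-length contour vectors a and b, while the exact normalization used in the talk (by N or by the standard deviations σ) is not recoverable and is left out.

    % Euclidean and cosine distances between two contour vectors (sketch).
    % a, b: equal-length numeric contours, N = length(a) = length(b).
    dEuclid = sqrt(sum((a - b).^2));                     % Euclidean distance
    dCosine = 1 - sum(a .* b) / (norm(a) * norm(b));     % one minus cosine similarity
    % The LCS-based distance instead uses the longest common (consecutive)
    % subsequence length; see the sketch after slide 29.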

  29. Modified LCS/LCCS Algorithm • Two 1-D strings γ, λ • Recurrence and initial values of the dynamic-programming table
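
The recurrence on this slide was lost in conversion. Shown below is the standard dynamic-programming LCS, not the talk's modified version, included only so the lcs(α, β) = 6 value on slide 27 can be reproduced.

    % Longest common subsequence length by dynamic programming (sketch;
    % the standard recurrence, not the modified LCS/LCCS algorithm).
    function L = lcs_length(a, b)
        n = length(a);   m = length(b);
        T = zeros(n+1, m+1);             % initial values: empty prefixes score 0
        for i = 1:n
            for j = 1:m
                if a(i) == b(j)
                    T(i+1, j+1) = T(i, j) + 1;
                else
                    T(i+1, j+1) = max(T(i+1, j), T(i, j+1));
                end
            end
        end
        L = T(n+1, m+1);
    end
    % Slide 27 example: lcs_length([1 1 2 0 -1 0 1 2 0], [-3 1 1 2 4 -1 1 2 5]) returns 6.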

  30. Our Simulink Model • Inputs: microphone input or pre-recorded wave files • Processing blocks: mid-level representation and similarity measures

  31. Demo: Query Results • A ranked list of song names with their matching scores

  32. Part 4: Experimental Results (I) • Song database: 212 Chinese, English, and Taiwanese songs • Experiment 1:

  33. Experimental Results (I) • Experiment 2:

  34. Experimental Results (II) • Ternary contour with cosine distance: table of rankings and song titles

  35. Experimental Results (III) • Method comparison (a sketch of this protocol follows below): 1. Use 60 wave files as acoustic inputs. 2. For each file, find the rank of the target song in the output list. 3. Sum these ranks over the 60 files to obtain the total rank.
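
A tiny sketch of the comparison protocol above; retrieve_ranked_list, queryFiles, and correctSong are hypothetical names, not functions or data from the talk.

    % Total rank of the correct song over 60 query files (sketch).
    totalRank = 0;
    for k = 1:60
        ranking   = retrieve_ranked_list(queryFiles{k});          % hypothetical retrieval call
        totalRank = totalRank + find(ranking == correctSong(k));  % rank of the target song
    end
    % A smaller total rank indicates a better method.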

  36. Computation Time • Total time = Sound recording time + Retrieval time • Sound recording time: 5 sec (default setting) • Retrieval time = Note segmentation + similarity comparison

  37. Demos • Pitch determination demo • Standard octave • On-line display of autocorrelation • CBMR System Demo • DTW (dynamic time warping) demo • Original • Input • Warping path

  38. Part 5: Conclusions • MATLAB, Simulink, and the DSP Blockset are ideal tools for real-time audio signal processing. • Different similarity measures lead to different results. • The most time-consuming part is keying in single-channel MIDI files.

  39. Current Limitations • Matching only starts at the beginning of a song. • Retrieval time is proportional to the number of songs in the database. • Lyrics are neither identified nor used for similarity comparison.

  40. Future Work • Use DTW to allow matching starting anywhere within a song • Use tree-search techniques to shorten matching time • Recognize the major channels in MIDI files • Construct a MIDI search engine on the Web • Try other types of content-based audio retrieval • Automatic music score generation from wave input (wave-to-MIDI converter) • Chip implementation

  41. The End
