MIR: Status and Trends (The Present and Future of Music Information Retrieval)

MIR: Status and Trends (The Present and Future of Music Information Retrieval) J.-S. Roger Jang (張智星), Multimedia Information Retrieval Lab, CS Dept., Tsing Hua Univ., Taiwan, http://www.cs.nthu.edu.tw/~jang

Outline • Intro. to music information retrieval (MIR) • Our work on MIR (with demos) • Query by singing/humming (QBSH) • Singing voice separation • Conclusions

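Types of MIR Systems • Text-based MIR • Text input • Song title, singer, lyrics, lyricist, composer • Metadata: genre, mood, overplayed pop hits (口水歌) • Content-based MIR • Symbolic input • Music score info: notes, beats, chords, etc. • Acoustic input • By example: the original recording as input • By humans: singing/humming, whistling, tapping, drumming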

Span of MIR Research • Content analysis • Audio music • Low-level feature extraction • High-level feature representation • Symbolic music • High-level feature representation • Retrieval methods • Text-based information retrieval • Data clustering • Pattern recognition • Distance measures

MIR Methods for Audio Music • Audio features • Low-level features • MFCC, spectral flux, rolloff freq, … • High-level features • Pitch, onset, beat, tempo, chord, key, … • Vocal extraction • Others • Collaborative filtering • Retrieval methods • Clustering • K-means, VQ, hierarchical clustering • Classification • SVM, GMM, LSA, HMM, ANN… • Distance measure • DTW, KL, cosine similarity, edit distance • Others: Learning to rank

MIR Major Events • ISMIR/MIREX • Int. Symposium on Music Information Retrieval, since 2000 • Music Information Retrieval Evaluation eXchange, since 2005 • ICMC • Int. Computer Music Conference, since 1974 • ICASSP • Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976

ISMIR Growth: 2000-2009

ISMIR Locations • 2000, Plymouth • 2001, Bloomington • 2002, Paris • 2003, Baltimore • 2004, Barcelona • 2005, London • 2006, Victoria • 2007, Vienna • 2008, Philadelphia • 2009, Kobe

State-of-the-Art MIR: Tasks at MIREX • Audio music • High-level feature identification • Audio onset detection • Audio beat tracking • Audio tempo extraction • Audio key detection • Audio chord estimation • Multiple fundamental frequency estimation & tracking • Audio structural segmentation • Classification • Artist • Genre • Mood • Retrieval • Audio cover song identification • Audio tag classification • Audio music similarity and retrieval • Alignment • Real-time audio-to-score alignment (a.k.a. score following) • Symbolic music • Symbolic melodic similarity • Symbolic music similarity and retrieval • Hybrid • Query by singing/humming • Query by tapping

MIREX: 2005 - 2008

Our Work on MIR • QBSH: Query by Singing/Humming • Singing voice separation • Audio melody extraction

Introduction to QBSH • QBSH: Query by Singing/Humming • Input: Singing or humming from a microphone • Output: A ranked list retrieved from the song database • Overview • First paper: Around 1994 • Extensive studies since 2001 • State of the art: QBSH tasks at ISMIR/MIREX

Challenges in QBSH Systems • Reliable pitch tracking for acoustic input • Input from mobile devices or noisy karaoke bar • Song database preparation • MIDIs, singing clips, or audio music • Efficient/effective retrieval • Karaoke machine: ~10,000 songs • Internet music search engine: ~500,000,000 songs

QBSH: Goal and Approach • Goal: To retrieve songs effectively within a given response time, say 5 seconds or so • Our strategy • Multi-stage progressive filtering • Indexing for different comparison methods • Repeating pattern identification
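The multi-stage progressive filtering idea above can be sketched as follows. This is only an illustration under stated assumptions, not the system described in the talk: the two distance functions (`range_gap`, `centered_l1`), the toy database, and the per-stage keep counts are all hypothetical. The point it shows is that a cheap comparison prunes the candidate list before a costlier comparison runs on the survivors.

```python
def progressive_filter(query, candidates, stages):
    """Apply (distance_fn, keep) stages in order, from cheap to expensive.

    After each stage only the `keep` closest candidates survive.
    """
    survivors = list(candidates)
    for distance_fn, keep in stages:
        survivors.sort(key=lambda cand: distance_fn(query, cand))
        survivors = survivors[:keep]
    return survivors


def range_gap(query, candidate):
    """Hypothetical cheap stage: difference in pitch range (max - min)."""
    pitches = candidate[1]
    return abs((max(query) - min(query)) - (max(pitches) - min(pitches)))


def centered_l1(query, candidate):
    """Hypothetical costlier stage: L1 distance after removing each mean."""
    pitches = candidate[1]
    q_mean = sum(query) / len(query)
    c_mean = sum(pitches) / len(pitches)
    return sum(abs((q - q_mean) - (c - c_mean)) for q, c in zip(query, pitches))


# Toy data: (song name, pitch vector in semitones); all values are made up.
query = [60, 62, 64, 62]
database = [
    ("A", [60, 62, 64, 62]),
    ("B", [55, 55, 55, 55]),
    ("C", [70, 72, 74, 73]),
    ("D", [60, 72, 60, 72]),
]
top = progressive_filter(query, database, [(range_gap, 2), (centered_l1, 1)])
```

In a real system the keep counts would be tuned so that the total time stays within the response-time budget.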

Flowchart of QBSH • Two steps • Pitch tracking • Comparison methods

Frame Blocking for Pitch Tracking • 256 points/frame • 84 points overlap between adjacent frames • Pitch rate: 11025/(256-84) ≈ 64 pitch values/sec
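A minimal sketch of frame blocking with the numbers on this slide (11025 Hz sampling rate, 256-point frames, 84-point overlap). The function name `frame_blocking` and the dummy signal are assumptions made for illustration; trailing samples that do not fill a whole frame are simply dropped here.

```python
import numpy as np


def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    hop = frame_size - overlap  # 172 samples between frame starts
    if len(signal) < frame_size:
        return np.empty((0, frame_size))
    n_frames = 1 + (len(signal) - frame_size) // hop
    starts = hop * np.arange(n_frames)
    return np.stack([signal[s:s + frame_size] for s in starts])


fs = 11025
signal = np.arange(fs, dtype=float)  # one second of dummy samples
frames = frame_blocking(signal)
pitch_rate = fs / (256 - 84)  # about 64 frames (pitch values) per second
```

Each row of `frames` would then be passed to a pitch estimator to yield one pitch value.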

ACF: Auto-correlation Function • Frame: s(n) • Shifted frame: s(n-h), shown for h=30 • acf(30) = inner product of the overlapping part = dot(s(30:256), s(1:227)) • Pitch period: the lag h at which acf(h) peaks
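The ACF idea can be sketched as below. This is an unnormalized autocorrelation with a plain peak search, offered as an illustration rather than the method used in the SAP toolbox. The lag search range (10 to 134 samples, roughly 1047 Hz down to 82 Hz at 11025 Hz) and the synthetic test tone are assumptions. Lags here are zero-based (`acf(frame)[0]` is the zero-shift value), which may differ by one from the index convention on the slide.

```python
import numpy as np


def acf(frame):
    """Return acf[h] = sum_n frame[n] * frame[n + h] for h = 0 .. len(frame) - 1."""
    n = len(frame)
    return np.array([np.dot(frame[:n - h], frame[h:]) for h in range(n)])


def pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) with the largest ACF value inside [min_lag, max_lag]."""
    values = acf(frame)
    return min_lag + int(np.argmax(values[min_lag:max_lag + 1]))


fs = 11025
t = np.arange(256) / fs
frame = np.sin(2 * np.pi * 367.5 * t)  # synthetic tone, period = 11025 / 367.5 = 30 samples
lag = pitch_period(frame, min_lag=10, max_lag=134)
pitch_hz = fs / lag
```

Restricting the search to a plausible lag range avoids picking the trivial peak at lag 0.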

Frequency to Semitone Conversion • Semitone: A musical scale based on A440 • Reasonable pitch range: • E2 - C6 • 82 Hz - 1047 Hz
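A small sketch of the conversion, assuming the common MIDI-style convention in which A440 is semitone 69 and each octave spans 12 semitones. The slide only states that the scale is based on A440, so the offset of 69 is an assumption.

```python
import math


def hz_to_semitone(freq_hz, ref_hz=440.0, ref_semitone=69.0):
    """Map a frequency in Hz to a (fractional) semitone number."""
    return ref_semitone + 12.0 * math.log2(freq_hz / ref_hz)


def semitone_to_hz(semitone, ref_hz=440.0, ref_semitone=69.0):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return ref_hz * 2.0 ** ((semitone - ref_semitone) / 12.0)


low = hz_to_semitone(82.41)    # E2, lower end of the pitch range
high = hz_to_semitone(1046.5)  # C6, upper end of the pitch range
```

Under this convention the E2 - C6 range corresponds to semitones 40 to 84.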

Example of Pitch Tracking

Typical Result of Pitch Tracking • Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower)

Comparison of Pitch Vectors • Yellow line: Target pitch vector

Linear Scaling (LS) • Scale the query linearly to match the candidate • A typical example of linear scaling

Linear Scaling (LS) • Characteristics • One-shot for dealing with key transposition • Efficient and effective • Some indexing methods • Cannot deal with large tempo variations • #1 method for task 2 in QBSH/MIREX 2006 • Typical mapping path
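A rough sketch of linear scaling, under several assumptions not stated on the slides: the scaling factors range from 0.5 to 2.0, the query is compared against the beginning of the target, key transposition is handled in one shot by subtracting each vector's mean, and the distance is the mean absolute difference in semitones. The function names are illustrative.

```python
import numpy as np


def resample(vec, new_len):
    """Stretch or compress vec to new_len points by linear interpolation."""
    old_pos = np.linspace(0.0, 1.0, len(vec))
    new_pos = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_pos, old_pos, vec)


def linear_scaling_distance(query, target, factors=None):
    """Return (best distance, best factor) over a set of scaling factors."""
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    if factors is None:
        factors = np.linspace(0.5, 2.0, 16)  # assumed tempo range
    best_dist, best_factor = np.inf, None
    for factor in factors:
        n = int(round(factor * len(query)))
        if n < 2 or n > len(target):
            continue  # scaled query would not fit in the target
        scaled = resample(query, n)
        segment = target[:n]
        # Subtracting the means removes any constant key shift.
        diff = (scaled - scaled.mean()) - (segment - segment.mean())
        dist = float(np.mean(np.abs(diff)))
        if dist < best_dist:
            best_dist, best_factor = dist, float(factor)
    return best_dist, best_factor


# Made-up example: the query is the target sung twice as fast, 5 semitones higher.
target = np.repeat([60, 62, 64, 65, 67, 65, 64, 62], 8).astype(float)
query = target[::2] + 5.0
other = np.repeat([60, 60, 67, 67, 60, 60, 67, 67], 8).astype(float)
dist_same, factor_same = linear_scaling_distance(query, target)
dist_other, _ = linear_scaling_distance(query, other)
```

Because the whole query is stretched by a single factor, this approach cannot follow tempo changes within the query, which is where DTW becomes useful.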

DTW Path of “Match Beginning”

DTW Path of “Match Anywhere”
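The two DTW variants can be sketched as below, assuming "match beginning" means the path must start at the first target frame and "match anywhere" means it may start at any target frame. In both modes the path is allowed to end anywhere in the target. The local path steps used here are the textbook (1,0), (0,1), (1,1) moves, which may differ from the constraints used in the actual system, and key transposition is not handled in this sketch.

```python
import numpy as np


def dtw_distance(query, target, anchor_start=True):
    """DTW cost of aligning the whole query to a stretch of the target.

    anchor_start=True  -> "match beginning": path starts at target[0].
    anchor_start=False -> "match anywhere": path may start at any target frame.
    """
    q = np.asarray(query, dtype=float)
    t = np.asarray(target, dtype=float)
    n, m = len(q), len(t)
    cost = np.abs(q[:, None] - t[None, :])  # local distance in semitones
    acc = np.full((n, m), np.inf)
    if anchor_start:
        acc[0, 0] = cost[0, 0]
        for j in range(1, m):
            acc[0, j] = acc[0, j - 1] + cost[0, j]
    else:
        acc[0, :] = cost[0, :]  # free starting point
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[-1].min())  # free end point


# Made-up pitch vectors in semitones.
target = [60, 60, 62, 64, 65, 67, 67, 65]
from_start = [60, 62, 62, 64]   # sung from the beginning of the song
from_middle = [64, 65, 65, 67]  # sung from the middle of the song
d_begin = dtw_distance(from_start, target, anchor_start=True)
d_middle_anchored = dtw_distance(from_middle, target, anchor_start=True)
d_middle_free = dtw_distance(from_middle, target, anchor_start=False)
```

The free starting point lets a query sung from the middle of a song match, at the price of a larger search space.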

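QBSH at MIREX 2006 • Format: The organizers ran each participating team's code to test its recognition performance. Teams came from around the world, including Australia, Germany, France, Finland, Taiwan, Uruguay, the Netherlands, and China. • Corpus: • The sung/hummed test data consists of 2797 wav files (8 seconds each, 8 kHz/8-bit) recorded by 118 people, covering 48 children's songs; freely downloadable. • The song database contains 2048 monophonic MIDI files; apart from the 48 children's songs above, the remaining songs were provided by the organizers and are not public. • Evaluation tasks: • Use the 2797 wav files as queries to retrieve from the 2048 MIDI files: the metric is mean reciprocal rank, and we achieved 0.883 (3rd place among 13 teams worldwide) • Use the 2797 wav files as queries to retrieve the other 2797 wav files: the metric is mean precision, and we achieved 0.926 (1st place among 10 teams worldwide)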
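For reference, mean reciprocal rank (MRR) averages 1/rank of the correct song over all queries. The sketch below assumes a query whose correct song is absent from the returned list contributes 0; MIREX's exact handling of such cases is not stated here, and the song names are made up.

```python
def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the correct item over all queries (0 if absent)."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, ground_truth):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(ranked_lists)


# Three hypothetical queries: correct song ranked 1st, ranked 2nd, and missing.
ranked_lists = [
    ["song_a", "song_b", "song_c"],
    ["song_c", "song_b", "song_a"],
    ["song_a", "song_b", "song_d"],
]
ground_truth = ["song_a", "song_b", "song_c"]
mrr = mean_reciprocal_rank(ranked_lists, ground_truth)  # (1 + 1/2 + 0) / 3
```

An MRR of 0.883 therefore means the correct song was ranked first for the large majority of queries.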

Demos of QBSH • Real-time pitch tracking demo • SAP toolbox (http://mirlab.org/jang/matlab/toolbox/sap) • goPtbyAcf.mdl • Demo of QBSH • http://mirlab.org/new/mir_products.asp#miracle • Most successful QBSH application • http://www.midomi.com

Singing Voice Separation • Characteristics • Easier on karaoke stereo songs • Harder for monaural polyphonic songs • Important step for a number of MIR applications • Demo clips • http://sites.google.com/site/unvoicedsoundseparation/

On-going Research at AIST, Japan • Systems for listening to singing voices • LyricSynchronizer: Automatic sync. of lyrics with polyphonic music recordings • Singer ID: Singer identification • MiruSinger: Singing skill visualization/training • Hyperlinking Lyrics: Creating hyperlinks between phrases in song lyrics • Breath Detection: Automatic detection of breath sounds in unaccompanied singing voice

On-going Research at AIST, Japan (II) • Systems for music information retrieval based on singing voices • VocalFinder: Music information retrieval based on singing voice timbre • Voice Drummer: Music notation of drums using vocal percussion input • Systems for singing synthesis • SingBySpeaking: Speech-to-singing synthesis • VocaListener: Singing-to-singing synthesis

The Grand Challenges of MIR • Polyphonic audio music transcription • Analogous to the problem of image understanding over semitranslucent overlaid images • As hard as telling whether a turtle or a frog swam by just from watching the ripples on the water

Conclusions • MIR research is on the rise! • MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008) • High-level feature identification • Applications to genre/mood/tag classification/retrieval • Preexisting approaches shed light on MIR. • Speech recognition/synthesis • Text information retrieval • Music theory

References • J. S. Downie, D. Byrd, and T. Crawford, "Ten Years of ISMIR: Reflections on Challenges and Opportunities", Keynote talk, ISMIR 2009, Kobe. • M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, "Content-Based Music Information Retrieval: Current Directions and Future Challenges", Proceedings of the IEEE, Vol. 96, No. 4, April 2008. • J.-S. R. Jang and H.-R. Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008. • Z.-S. Chen and J.-S. R. Jang, "On the Use of Anti-word Models for Audio Music Annotation and Retrieval", IEEE Transactions on Audio, Speech, and Language Processing, 2009. • C.-L. Hsu and J.-S. R. Jang, "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset", IEEE Transactions on Audio, Speech, and Language Processing, 2009. • M. Goto, T. Saitou, T. Nakano, and H. Fujihara, "Singing Information Processing Based on Singing Voice Modeling", ICASSP 2010, pp. 5506-5509.
