
Structure Discovery of Pop Music Using HHMM

E6820 Project, Jessie Hsu, 03/09/05. Problem: given the WAV signal of a pop song, discover the structure of the song (intro, verse, chorus, bridge, outro). Approach: an HMM framework that models the music signal as a series of state transitions.


Presentation Transcript


  1. Structure Discovery of Pop Music Using HHMM
     E6820 Project, Jessie Hsu, 03/09/05

  2. Problem Description
     • Given
       • WAV signal of a pop song
     • Discover the structure of the song
       • Intro
       • Verse
       • Chorus
       • Bridge
       • Outro

  3. HMM Framework
     • Model the music signal as a series of state transitions
     [Figure: a sequence of hidden states above the corresponding sequence of observations]
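To make the "series of state transitions" concrete, the following is a minimal sketch of the generative view of an HMM: a hidden state is drawn, it emits one observation, and then it transitions. The function name `sample_hmm`, the two-state toy parameters, and the one-dimensional Gaussian emissions with fixed spread are illustrative assumptions, not part of the slides.

```python
import random

def sample_hmm(prior, trans, emit_means, n_frames, seed=0):
    """Draw a hidden state path and 1-D Gaussian observations from a toy HMM."""
    rng = random.Random(seed)
    n_states = len(prior)
    states, obs = [], []
    # Pick the first hidden state from the prior distribution.
    state = rng.choices(range(n_states), weights=prior)[0]
    for _ in range(n_frames):
        states.append(state)
        # Each hidden state emits an observation around its own mean
        # (fixed standard deviation of 0.1, an arbitrary toy value).
        obs.append(rng.gauss(emit_means[state], 0.1))
        # Move to the next hidden state using the current state's row.
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return states, obs

# Hypothetical two-state model: state 0 always starts, both states are "sticky".
prior = [1.0, 0.0]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
states, obs = sample_hmm(prior, trans, emit_means=[0.0, 1.0], n_frames=20)
```

Structure discovery runs this process in reverse: only `obs` is available, and the hidden path has to be inferred.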

  4. HMM Framework: Hierarchical HMM
     • Each observation is an audio frame of one beat length
     [Figure: hidden states at the structure level (Intro, Verse, Outro), hidden states at the frame level beneath them, and the observations]
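One common way to reason about a two-level hierarchy is to flatten it into an ordinary HMM whose states are (structure state, frame state) pairs. The sketch below assumes a particular parameterization that the slides do not spell out: each frame-level state has an exit probability, and on exit the structure level picks the next section, which is then entered through its own frame-level prior. All names and numbers are hypothetical.

```python
def flatten_hhmm(top_trans, sub_trans, exit_prob, sub_prior):
    """Build a flat transition matrix over (section, frame_state) pairs.

    top_trans[s][s2]   : structure-level transition, used when section s exits
    sub_trans[s][i][j] : frame-level transition inside section s
    exit_prob[s][i]    : probability of leaving section s from frame state i
    sub_prior[s][j]    : frame-level entry distribution of section s
    """
    index = [(s, i) for s in range(len(top_trans))
             for i in range(len(sub_prior[s]))]
    flat = [[0.0] * len(index) for _ in index]
    for a, (s, i) in enumerate(index):
        for b, (s2, j) in enumerate(index):
            # Leave section s, choose section s2, enter it at frame state j.
            p = exit_prob[s][i] * top_trans[s][s2] * sub_prior[s2][j]
            if s == s2:
                # Or stay inside the same section and move at the frame level.
                p += sub_trans[s][i][j]
            flat[a][b] = p
    return index, flat

# Toy model: sections 0=Intro, 1=Verse, 2=Outro, two frame states each.
top_trans = [[0.0, 1.0, 0.0],
             [0.0, 0.5, 0.5],
             [0.0, 0.0, 1.0]]
one_section = [[0.6, 0.3],
               [0.0, 0.8]]          # each row sums to 1 - exit probability
sub_trans = [one_section, one_section, one_section]
exit_prob = [[0.1, 0.2]] * 3
sub_prior = [[1.0, 0.0]] * 3
index, flat = flatten_hhmm(top_trans, sub_trans, exit_prob, sub_prior)
```

Every row of `flat` sums to one, so standard HMM inference can run on the flattened model, at the cost of a state space that grows with the product of the two levels.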

  5. Representing an HHMM
     • HHMM parameters
       • Prior π of each state at the structure level and the frame level
       • State transition probabilities α at the structure level and the frame level
       • Emission parameters for each state at both levels
         • Each state is modeled as a mixture of Gaussians
         • Mean μ and covariance matrix Σ of each Gaussian
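The emission model can be illustrated with a short sketch that evaluates the log density of one feature vector under a Gaussian mixture. To keep it dependency-free it assumes diagonal covariance matrices (one variance per feature dimension), which is a simplification of the full Σ named on the slide; the function name and argument layout are hypothetical.

```python
import math

def log_gmm_density(x, weights, means, variances):
    """Log density of vector x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # log of (mixture weight * Gaussian density), summed over dimensions.
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

# Toy two-component mixture in two dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
near = log_gmm_density([0.1, -0.1], weights, means, variances)
far = log_gmm_density([10.0, 10.0], weights, means, variances)
```

A vector close to one of the component means (`near`) scores higher than one far from both (`far`).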

  6. Training an HHMM
     • EM for HHMM
       • Look for the maximum-likelihood state sequence and model parameters
       • E-step: Best state sequence
         • Forward-backward algorithm
         • Viterbi algorithm
       • M-step: Parameter estimation
         • Priors π at both levels
         • State transition probabilities α
         • Emission parameters: Gaussian mixture means μ and covariance matrices Σ
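The two ingredients of the training loop, state inference and parameter re-estimation, can be sketched on a flat HMM as follows. This is a simplified illustration under stated assumptions, not the project's implementation: emission likelihoods `lik[t][k]` are taken as given, the model is a single-level HMM rather than an HHMM, and only the means of one-dimensional emissions are re-estimated. All function names are hypothetical.

```python
def forward_backward(prior, trans, lik):
    """Posterior state probabilities gamma[t][k] via scaled forward-backward."""
    T, K = len(lik), len(prior)
    alpha, scale = [], []
    for t in range(T):
        row = []
        for k in range(K):
            pred = prior[k] if t == 0 else sum(
                alpha[t - 1][j] * trans[j][k] for j in range(K))
            row.append(pred * lik[t][k])
        scale.append(sum(row))
        alpha.append([a / scale[t] for a in row])  # normalise each frame
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(trans[j][k] * lik[t + 1][k] * beta[t + 1][k]
                             for k in range(K)) / scale[t + 1]
    return [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]

def viterbi(prior, trans, lik):
    """Most likely state path; scores are rescaled per frame against underflow."""
    K = len(prior)
    delta = [prior[k] * lik[0][k] for k in range(K)]
    back = []
    for t in range(1, len(lik)):
        ptr, new = [], []
        for k in range(K):
            best = max(range(K), key=lambda j: delta[j] * trans[j][k])
            ptr.append(best)
            new.append(delta[best] * trans[best][k] * lik[t][k])
        total = sum(new)
        delta = [v / total for v in new]
        back.append(ptr)
    path = [max(range(K), key=lambda k: delta[k])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def reestimate_means(gamma, obs):
    """Re-estimate each state's emission mean as a posterior-weighted average."""
    K = len(gamma[0])
    return [sum(g[k] * x for g, x in zip(gamma, obs)) / sum(g[k] for g in gamma)
            for k in range(K)]

# Toy run: three frames that favour state 0, then three that favour state 1.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
lik = [[0.8, 0.2]] * 3 + [[0.2, 0.8]] * 3
gamma = forward_backward(prior, trans, lik)
path = viterbi(prior, trans, lik)
new_means = reestimate_means(gamma, [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
```

In a full EM loop these steps alternate until the likelihood stops improving, with the priors, transition probabilities, and covariances updated alongside the means.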

  7. Preprocessing
     • Beat detection
       • Segment the music into beat-length frames
     • Feature extraction
       • Repetition-related feature (chorus/non-chorus): chroma vector
       • Intensity-related feature (vocal/non-vocal): subband-based Log Frequency Power Coefficients
       • Pitch-related features: narrowband spectrogram features (Hann-windowed FFT coefficients)
       • Possibly more, under investigation
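As an illustration of the chroma feature, the sketch below applies a Hann window to one frame, computes spectral power, and folds each frequency bin onto one of 12 pitch classes (C = 0, A = 9). It uses a naive DFT to stay within the standard library; a real pipeline would use an FFT routine and beat-length frames. The function name, the frequency range, and the 440 Hz tuning reference are assumptions.

```python
import cmath
import math

def chroma_vector(frame, sample_rate, f_min=55.0, f_max=2000.0):
    """12-bin chroma of one frame: Hann window, DFT power, fold by pitch class."""
    n = len(frame)
    # Hann window reduces spectral leakage from the frame edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    chroma = [0.0] * 12
    k_lo = max(1, math.ceil(f_min * n / sample_rate))
    k_hi = min(n // 2, math.floor(f_max * n / sample_rate))
    for k in range(k_lo, k_hi + 1):
        # Naive DFT coefficient for bin k (slow, but dependency-free).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        freq = k * sample_rate / n
        # MIDI note number of the bin centre; note 69 is A4 = 440 Hz.
        midi = 69 + 12 * math.log2(freq / 440.0)
        chroma[round(midi) % 12] += abs(coeff) ** 2
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma

# Toy input: a pure 440 Hz tone, which should land in pitch class A (index 9).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(1024)]
chroma = chroma_vector(frame, sample_rate)
strongest = max(range(12), key=lambda c: chroma[c])
```

Because octaves are folded together, repeated sections with the same harmony give similar chroma vectors, which is what makes the feature useful for detecting repetition.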

  8. Tasks
     • HHMM on a test song
       • Songs with I-V1-C1-V2-C2-(V3-C3)-B-O structure
       • Manually label structures as ground truth
       • Predefine the number of states at both the structure and frame levels
     • Preprocessing
     • Model fitting
     • Evaluation
       • Accuracy of structure identification
       • Accuracy of structure timing
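The slides name the two evaluation criteria but not the metrics. One plausible reading, sketched below as an assumption, scores identification as the fraction of beat frames whose predicted section label matches the manual label, and scores timing as the mean distance in beats from each true section boundary to the nearest predicted boundary. Function names and the toy label sequences are hypothetical.

```python
def frame_accuracy(truth, predicted):
    """Fraction of beat frames whose predicted section label is correct."""
    matches = sum(t == p for t, p in zip(truth, predicted))
    return matches / len(truth)

def boundaries(labels):
    """Frame indices at which the section label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def mean_boundary_error(truth, predicted):
    """Mean distance (in beats) from each true boundary to the nearest predicted one."""
    true_b, pred_b = boundaries(truth), boundaries(predicted)
    if not true_b or not pred_b:
        return None  # undefined when either labeling has no boundaries
    return sum(min(abs(b - p) for p in pred_b) for b in true_b) / len(true_b)

# Toy labels per beat: I = intro, V = verse, C = chorus, O = outro.
truth = list("IIVVVVCCCCOO")
predicted = list("IIIVVVCCCCCO")
accuracy = frame_accuracy(truth, predicted)
timing_error = mean_boundary_error(truth, predicted)
```

Multiplying the boundary error by the beat duration converts it to seconds when a time-based figure is needed.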

