Structure Discovery of Pop Music Using HHMM
E6820 Project, Jessie Hsu, 03/09/05
Problem Description
• Given
  • WAV signal of a pop song
• Discover the structure of the song
  • Intro
  • Verse
  • Chorus
  • Bridge
  • Outro
HMM Framework
• Model the music signal as a series of hidden state transitions
[Diagram: a chain of hidden states, each emitting one observation]
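As a point of reference (not part of the original slides), a flat single-level HMM over the beat-frame features can be fit and decoded in a few lines. The sketch below assumes hmmlearn is available and that the per-beat features are already stacked into an array X; the state count of 5 is only an illustrative guess.

```python
# Minimal flat-HMM baseline sketch (not the HHMM itself), assuming per-beat
# feature vectors are already stacked into a (n_frames, n_features) array X.
import numpy as np
from hmmlearn import hmm

def fit_flat_hmm(X: np.ndarray, n_states: int = 5, seed: int = 0):
    """Fit a single-level Gaussian HMM and decode one hidden state per frame."""
    model = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag",
                            n_iter=100,
                            random_state=seed)
    model.fit(X)               # EM (Baum-Welch) on the frame features
    states = model.predict(X)  # Viterbi decoding of the state sequence
    return model, states
```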
HMM Framework: Hierarchical HMM
• Each observation is an audio frame of one beat length
[Diagram: structure-level hidden states (Intro, Verse, …, Outro) above frame-level hidden states, which emit the beat-level observations]
Representing an HHMM
• HHMM parameters
  • Prior of each state at the structure level and the frame level: π
  • State transition probabilities at the structure level and the frame level: α
  • Emission parameters for each state at both levels
    • Each state is modeled as a mixture of Gaussians
    • Mean μ and covariance matrix Σ of each Gaussian
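One possible in-memory layout for these parameters is sketched below; all field names and array shapes are illustrative assumptions, not taken from the project code.

```python
# Sketch of a container for the HHMM parameters listed above.
# S = number of structure-level states, Q = frame-level states per structure
# state, M = Gaussians per mixture, D = feature dimension.
from dataclasses import dataclass
import numpy as np

@dataclass
class HHMMParams:
    # Structure level (e.g. intro / verse / chorus / bridge / outro)
    pi_structure: np.ndarray   # (S,)    prior over structure states
    A_structure: np.ndarray    # (S, S)  structure-level transition matrix
    # Frame level: each structure state owns its own sub-HMM
    pi_frame: np.ndarray       # (S, Q)     per-structure prior over frame states
    A_frame: np.ndarray        # (S, Q, Q)  per-structure frame transitions
    # Emissions: mixture of Gaussians per frame-level state
    mix_weights: np.ndarray    # (S, Q, M)     mixture weights
    means: np.ndarray          # (S, Q, M, D)  Gaussian means
    covs: np.ndarray           # (S, Q, M, D)  diagonal covariances
```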
Training an HHMM
• EM for the HHMM
  • Search for the maximum-likelihood state sequence and model parameters
  • E-step: state inference / best state sequence
    • Forward-backward algorithm
    • Viterbi algorithm
  • M-step: parameter re-estimation
    • Priors at both levels: π
    • State transition probabilities: α
    • Emission parameters: Gaussian mixture means μ and covariance matrices Σ
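For the decoding step, a generic log-domain Viterbi pass over the frame-level states can be written as below. This is a flat-HMM sketch; the hierarchical forward-backward recursions needed for the full HHMM E-step are omitted. log_emit[t, q] is assumed to already hold log p(x_t | frame state q).

```python
import numpy as np

def viterbi(log_pi: np.ndarray, log_A: np.ndarray, log_emit: np.ndarray) -> np.ndarray:
    """Most likely state path given log-priors, log-transitions, log-emissions."""
    T, Q = log_emit.shape
    delta = np.full((T, Q), -np.inf)   # best log-prob of a path ending in state q at time t
    psi = np.zeros((T, Q), dtype=int)  # backpointers
    delta[0] = log_pi + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (prev state, next state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    # Backtrack the most likely state sequence
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```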
Preprocessing
• Beat detection
  • Segment the music into beat-length frames
• Feature extraction
  • Repetition-related feature (chorus vs. non-chorus): chroma vector
  • Intensity-related feature (vocal vs. non-vocal): subband-based log-frequency power coefficients
  • Pitch-related features: narrowband spectrogram features (Hann-windowed FFT coefficients)
  • Possibly more, still under investigation
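A preprocessing sketch of the first two steps is shown below, using librosa as an assumed toolkit (the slides do not name one): beat tracking followed by beat-synchronous chroma. The other features would be aggregated over the same beat frames.

```python
# Beat detection + beat-synchronous chroma vectors (one 12-d vector per beat).
import numpy as np
import librosa

def beat_synchronous_chroma(wav_path: str):
    y, sr = librosa.load(wav_path, sr=22050)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)   # (12, n_stft_frames)
    # Aggregate STFT-rate chroma into one vector per beat-length frame
    beat_chroma = librosa.util.sync(chroma, beat_frames, aggregate=np.median)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return beat_chroma.T, beat_times
```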
Tasks
• HHMM on a test song
  • Songs with an I-V1-C1-V2-C2-(V3-C3)-B-O structure
  • Manually label structures as ground truth
  • Predefine the number of states at both the structure and frame levels
  • Preprocessing
  • Model fitting
• Evaluation
  • Accuracy of structure identification
  • Accuracy of structure timing
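The two evaluation measures could be computed as in the sketch below, under the assumption (not stated on the slide) that both the prediction and the manual ground truth are sequences of one section label per beat frame.

```python
import numpy as np

def label_accuracy(pred: np.ndarray, truth: np.ndarray) -> float:
    """Fraction of beat frames whose predicted section label matches ground truth."""
    return float(np.mean(pred == truth))

def boundary_times(labels: np.ndarray, frame_times: np.ndarray) -> np.ndarray:
    """Times (in seconds) at which the section label changes."""
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    return frame_times[change]

def mean_boundary_error(pred, truth, frame_times) -> float:
    """Mean absolute gap between each true boundary and its nearest predicted one."""
    pb, tb = boundary_times(pred, frame_times), boundary_times(truth, frame_times)
    if len(pb) == 0 or len(tb) == 0:
        return float("nan")
    return float(np.mean([np.min(np.abs(pb - t)) for t in tb]))
```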
References
• Y. Wang, M.-Y. Kan, T. L. Nwe, A. Shenoy, J. Yin, "LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics", ACM Multimedia 2004
• C. Raphael, "A Hybrid Graphical Model for Aligning Polyphonic Audio with Musical Scores", ISMIR 2004
• C. Raphael, "Automatic Segmentation of Acoustic Musical Signals Using Hidden Markov Models", IEEE Trans. on PAMI, 1999
• P. J. Walmsley, S. J. Godsill, P. J. W. Rayner, "Polyphonic Pitch Tracking Using Joint Bayesian Estimation of Multiple Frame Parameters", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1999
• L. Xie, S.-F. Chang, A. Divakaran, H. Sun, "Learning Hierarchical Hidden Markov Models for Video Structure Discovery", Tech. Report, Columbia Univ., 2002
• L. Xie, S.-F. Chang, A. Divakaran, H. Sun, "Unsupervised Mining of Statistical Temporal Structures in Video", Video Mining, Ch. 10, Kluwer Academic Publishers, 2003
• R. J. Turetsky, D. P. W. Ellis, "Ground-Truth Transcriptions of Real Music from Force-Aligned MIDI Synthesis", ISMIR 2003