Recognition and Analysis of Melodies from Polyphonic Musical Audio Data. Karin Dressler, Fraunhofer IDMT, Konstanz University, TU Ilmenau. Advisor: Prof. Dr. Karlheinz Brandenburg. Graduate School ISMIR 2004.
Level of Abstraction • Sampled waveform (WAV file) • Time-stamped events (MIDI file) • Music notation (Humdrum)
What is melody? • „A rhythmical succession of single tones, ranging for the most part within a given key, and so related together as to form a musical whole, having the unity of what is technically called a musical thought, at once pleasing to the ear and characteristic in expression.“ • „The air or tune of a musical piece.“
Is this melody? [ Goto, 2000 ] • „The melody and the bass line have a harmonic structure.“ • „The melody line has the most predominant harmonic structure in middle and high frequency regions...“ • „The melody ... tends to have temporally continuous trajectories.“ • Just a good first guess
Algorithm Overview • Windowing and FFT >> logarithmic magnitude spectrum • Extract sinusoidal components • Link sinusoids between frames and find onsets of trajectories • Discriminate fundamentals and harmonics and choose candidates for melody • Choose most probable succession of notes according to music theory and psychoacoustics • Processing chain (diagram): Audio file >> Spectral Analysis >> Extraction of Sinusoids >> Trajectories >> Onset detection >> Tone candidates >> Classification >> MIDI file
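The step "discriminate fundamentals and harmonics" can be illustrated with a deliberately naive sketch. It is not the method from the slides: it simply treats the lowest unexplained peak as a fundamental and labels any peak close to an integer multiple (2x, 3x, ...) of an earlier fundamental as its harmonic. The function names and the 3 % tolerance are assumptions made for illustration.

```python
def harmonic_number(freq_hz, f0_hz, tolerance=0.03):
    """Return n if freq_hz lies within `tolerance` (relative) of n * f0_hz, else None."""
    n = round(freq_hz / f0_hz)
    if n < 1:
        return None
    if abs(freq_hz - n * f0_hz) <= tolerance * n * f0_hz:
        return n
    return None


def split_fundamentals(peak_freqs, tolerance=0.03):
    """Greedy split of spectral peaks into fundamentals and (harmonic, owner) pairs.

    Assumption: peaks are processed from low to high frequency, and a peak
    counts as a harmonic only if it matches multiple n >= 2 of a fundamental.
    """
    fundamentals, harmonics = [], []
    for f in sorted(peak_freqs):
        owner = None
        for f0 in fundamentals:
            n = harmonic_number(f, f0, tolerance)
            if n is not None and n >= 2:
                owner = f0
                break
        if owner is None:
            fundamentals.append(f)
        else:
            harmonics.append((f, owner))
    return fundamentals, harmonics


fundamentals, harmonics = split_fundamentals([220.0, 440.0, 660.0, 330.0, 990.0])
print(fundamentals)
print(harmonics)
```

In real polyphonic audio, harmonics of different tones overlap and a genuine tone may sit exactly an octave above another, so a rule this simple will mislabel such cases; it only shows the kind of decision this step has to make.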
Spectral Analysis I • Many ways: DFT, Ear models, WT, Constant Q Transform, Warped FFT, filterbanks ... • My Choice: Discrete Fourier Transform >> FFT • Contra: • Linear frequency resolution • Pro: • „Simple“ output (linear phase, linear frequency resolution) • Analysis functions are sinusoids • Fast • Simple monophonic example:
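A minimal sketch of "windowing and DFT >> logarithmic magnitude spectrum" for a single frame. To stay dependency-free it uses a direct O(N²) DFT rather than an FFT; the Hann window, the dB scaling, and the `floor_db` clamp are assumptions, since the slides do not name the window or the scaling.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_magnitude_spectrum(frame, floor_db=-120.0):
    """Window one frame and return its magnitude spectrum in dB (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    spectrum_db = []
    for k in range(n // 2 + 1):
        # Direct DFT sum for bin k (an FFT computes the same values faster).
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = max(abs(acc), 1e-12)  # avoid log10(0)
        spectrum_db.append(max(20 * math.log10(magnitude), floor_db))
    return spectrum_db


# Monophonic test tone: 1000 Hz at 8000 Hz sampling, which falls exactly on bin 8 of 64.
sample_rate = 8000
n = 64
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
spectrum = log_magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print(peak_bin, peak_bin * sample_rate / n)
```

The bin spacing is `sample_rate / n` Hz everywhere, which is the linear frequency resolution listed under both "Contra" and "Pro".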
Deterministic Analysis I • Segmentation (binary decision) • Discriminate sinusoids from „noise“ • Melody: deterministic part • Sinusoidal models • Deterministic plus stochastic decomposition [Serra, 1989]
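One common way to make the binary sinusoid/noise decision in sinusoidal modelling is peak picking on the log magnitude spectrum. The sketch below assumes that approach (the slides do not say which criterion is used): a bin is kept if it is a local maximum above a threshold, and its position is refined by parabolic interpolation. The threshold value is an illustrative assumption.

```python
def pick_peaks(spectrum_db, threshold_db=-60.0):
    """Return (fractional_bin, height_db) for local maxima above threshold_db."""
    peaks = []
    for k in range(1, len(spectrum_db) - 1):
        left, mid, right = spectrum_db[k - 1], spectrum_db[k], spectrum_db[k + 1]
        if mid > threshold_db and mid > left and mid >= right:
            # Fit a parabola through the three points and take its vertex.
            denom = left - 2 * mid + right
            offset = 0.5 * (left - right) / denom if denom != 0 else 0.0
            height = mid - 0.25 * (left - right) * offset
            peaks.append((k + offset, height))
    return peaks


example_db = [-80, -80, -30, -20, -30, -80, -80, -50, -40, -45, -80]
for position, height in pick_peaks(example_db):
    print(round(position, 3), round(height, 2))
```

Everything that is not picked as a peak is treated as the "noise" (stochastic) part in this simplified view.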
Trajectory objects • Partial trajectories • Frame linking • With respect to the human auditory system (magnitude and frequency changes) • Pixelwise representation >> object representation (labeling) • Deletion of small objects
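Frame linking and the deletion of small objects can be sketched as follows. This is a simplified greedy linker under stated assumptions: it compares frequencies on a logarithmic scale (a rough nod to pitch perception), ignores magnitude changes, and drops trajectories shorter than `min_length` frames. The parameter names and values are hypothetical.

```python
import math


def link_frames(frames, max_ratio=1.03, min_length=3):
    """Link peak frequencies (Hz) across frames into trajectories.

    frames: one list of peak frequencies per analysis frame.
    Returns dicts with the onset frame index and the linked frequencies.
    """
    active, finished = [], []
    for t, peaks in enumerate(frames):
        unused = list(peaks)
        still_active = []
        for traj in active:
            last = traj["freqs"][-1]
            # Nearest unused peak on a log-frequency scale.
            best = min(unused, key=lambda f: abs(math.log(f / last)), default=None)
            if best is not None and abs(math.log(best / last)) <= math.log(max_ratio):
                traj["freqs"].append(best)
                unused.remove(best)
                still_active.append(traj)
            else:
                finished.append(traj)  # trajectory ends here
        # Every peak left over starts a new trajectory; t is its onset.
        for f in unused:
            still_active.append({"onset": t, "freqs": [f]})
        active = still_active
    finished.extend(active)
    # "Deletion of small objects": discard trajectories that are too short.
    return [traj for traj in finished if len(traj["freqs"]) >= min_length]


frames = [[440.0, 1000.0], [442.0], [441.0, 2000.0], [443.0]]
for traj in link_frames(frames):
    print(traj["onset"], traj["freqs"])
```

The onset of each surviving trajectory falls out of the linking for free, which matches the "find onsets of trajectories" step in the overview.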
Classification I • Assign a meaning to single objects • Classes: • First step: fundamentals, harmonics, noise • Second step: melody • Melody candidates • Classification based on music theory and psychoacoustics • MIDI note number, duration • Perceived loudness, toneness, masking effects... • Musicology: Which schema/model should one use? • Voice leading rules [Huron, 2001] • The Implication-Realization Model [Narmour, 1991] • Statistical Analysis of annotated melodies (MIDI) • Which classifier can model those schemata?
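Two of the features listed above, MIDI note number and duration, follow directly from a trajectory. The sketch below uses the standard equal-temperament mapping (A4 = 440 Hz = MIDI note 69); averaging over the trajectory and the `hop_seconds` parameter (time between analysis frames) are assumptions for illustration, not details from the slides.

```python
import math


def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency in Hz."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)


def trajectory_to_note(freqs_hz, onset_frame, hop_seconds):
    """Summarize a trajectory as a note: rounded mean pitch, onset, and duration."""
    mean_midi = sum(frequency_to_midi(f) for f in freqs_hz) / len(freqs_hz)
    return {
        "note": round(mean_midi),
        "onset": onset_frame * hop_seconds,
        "duration": len(freqs_hz) * hop_seconds,
    }


print(trajectory_to_note([440.0, 442.0, 441.0, 443.0], onset_frame=0, hop_seconds=0.01))
```

Perceptual features such as loudness, toneness, and masking need a psychoacoustic model and are not covered by this sketch.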