Using Blackboard Systems for Polyphonic Transcription A Literature Review by Cory McKay
Outline • Intro to polyphonic transcription • Intro to blackboard systems • Keith Martin’s work • Kunio Kashino’s work • Recent contributions • Conclusion
Polyphonic Transcription • Represent an audio signal as a score • Must segregate notes belonging to different voices • Problems: variations of timbre within a voice, voice crossing, identification of correct octave • No successful general purpose system to date
Polyphonic Transcription • Can use simplified models: • Music for a single instrument (e.g. piano) • Extract only a given instrument from mix • Use music which obeys restrictive rules • Simplified systems have had success rates of between 80% and 90% • These rates may be exaggerated, since only very limited test suites are generally used
Polyphonic Transcription • Systems to date generally identify only rhythm, pitch and voice • Would like systems that also identify other notated aspects such as dynamics and vibrato • Ideal is to have system that can identify and understand parameters of music that humans hear but do not notate
Blackboard Systems • Used in AI for decades but only applied to music transcription in the early 1990s • Term “blackboard” comes from notion of a group of experts standing around a blackboard working together to solve a problem • Each expert writes contributions on blackboard • Experts watch problem evolve on blackboard, making changes until a solution is reached
Blackboard Systems • “Blackboard” is a central dataspace • Usually arranged in hierarchy so that input is at lowest level and output is at highest • “Experts” are called “knowledge sources” • KSs generally consist of a set of heuristics and a precondition whose satisfaction results in a hypothesis that is written on blackboard • Each KS forms hypotheses based on information from front end of system and hypotheses presented by other KSs
Blackboard Systems • Problem is solved when all KSs are satisfied with all hypotheses on blackboard to within a given margin of error • Eliminates need for global control module • Each KS can be easily updated and new KSs can be added with little difficulty • Combines top-down and bottom-up processing
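The architecture described on the preceding slides can be illustrated with a minimal sketch. Everything named here (`Hypothesis`, `Blackboard`, `KnowledgeSource`, `run`, the level names "partial" and "note", and the toy harmonic rule) is a hypothetical illustration, not taken from any of the reviewed systems. A plain loop stands in for the control strategy: it fires every KS whose precondition holds and stops when no KS adds a new hypothesis.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Hypothesis:
    level: str    # hypothetical level name, e.g. "partial" or "note"
    value: float  # payload; a frequency in Hz in this toy example


@dataclass
class Blackboard:
    """Central dataspace shared by all knowledge sources."""
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def at(self, level):
        return [h for h in self.hypotheses if h.level == level]

    def post(self, hyp):
        """Write a hypothesis; return True only if it is new."""
        if hyp in self.hypotheses:
            return False
        self.hypotheses.append(hyp)
        return True


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], List[Hypothesis]]


def run(blackboard, sources, max_rounds=10):
    """Fire every KS whose precondition holds until nothing new is posted."""
    for _ in range(max_rounds):
        changed = False
        for ks in sources:
            if ks.precondition(blackboard):
                for hyp in ks.action(blackboard):
                    changed = blackboard.post(hyp) or changed
        if not changed:
            break
    return blackboard


def harmonic_support(bb):
    """Toy heuristic: fundamentals f whose 2f and 3f partials are also present."""
    freqs = {h.value for h in bb.at("partial")}
    return sorted(f for f in freqs if 2 * f in freqs and 3 * f in freqs)


note_ks = KnowledgeSource(
    name="note-from-partials",
    precondition=lambda bb: bool(harmonic_support(bb)),
    action=lambda bb: [Hypothesis("note", f) for f in harmonic_support(bb)],
)

bb = Blackboard([Hypothesis("partial", f) for f in (220.0, 440.0, 660.0)])
run(bb, [note_ks])
notes = [h.value for h in bb.at("note")]
```

Because each KS is just a precondition and an action, adding or replacing one does not require touching the others, which is the modularity the slides point to.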
Blackboard Systems • Music has a naturally hierarchical structure that lends itself well to blackboard systems • Allow integration of different types of expertise: • signal processing KSs at low level • human perception KSs at middle level • musical knowledge KSs at upper level
Blackboard Systems • Limitation: giving upper level KSs too much specialized knowledge and influence limits generality of transcription systems • Ideal system would not use knowledge above the level of human perception and the most rudimentary understanding of music • Current trend is to increase significance of upper-level musical KSs in order to increase success rate
Keith Martin (1996 a) • “A Blackboard System for Automatic Transcription of Simple Polyphonic Music” • Used a blackboard system to transcribe a four-voice Bach chorale with appropriate segregation of voices • Limited input signal to synthesized piano performances • Gave system only rudimentary musical knowledge, although choice of Bach chorale allowed the use of generally unacceptable assumptions by lower level KSs
Keith Martin (1996 a) • Front-end system used short-time Fourier transform on input signal • Equivalent to a filter bank that is a gross approximation of the way the human cochlea processes auditory signals • Blackboard system is fed sets of associated onset times, frequencies and amplitudes
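As a rough illustration of a short-time Fourier transform front end, the sketch below computes one magnitude spectrum per windowed frame using only the standard library. The frame size, hop, Hann window, sample rate and test tone are all assumptions chosen for the example; the slides do not give Martin's actual analysis parameters.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size, hop):
    """Return one magnitude spectrum (bins 0..frame_size//2) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of one bin; slow but dependency-free.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


SAMPLE_RATE = 8000  # Hz, assumed for the example
FRAME = 64          # samples per frame, assumed

# A 1000 Hz test tone, which falls exactly on bin 8 at these settings.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
frames = stft_magnitudes(tone, FRAME, hop=32)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME
```

Each bin behaves like one channel of a uniformly spaced filter bank, which is why the slide describes the transform as only a gross approximation of cochlear processing.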
Keith Martin (1996 a) • Knowledge sources made five classes of hierarchically organized hypotheses: • “Tracks” • Partials • Notes • Intervals • Chords
Keith Martin (1996 a) • Three types of knowledge sources: • Garbage collection • Physics • Musical practice • Thirteen knowledge sources in all • Each KS only authorized to make certain classes of hypotheses
Keith Martin (1996 a) • KSs with access to upper-level hypotheses can put “pressure” on KSs with lower-level access to make certain hypotheses and vice versa • Example: if the hypotheses have been made that the notes C and G are present in a beat, a KS with information about chords might put forward the hypothesis that there is a C chord, thus putting pressure on other KSs to find an E or Eb. • Used a sequential scheduler to coordinate KSs
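The C and G example can be sketched as a small chord-level rule. The pitch-class spelling, the two-entry triad vocabulary and the function name `chord_expectations` are assumptions made for illustration; Martin's actual knowledge sources are not specified at this level of detail in the slides.

```python
PITCH_CLASSES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

# Triads as semitone offsets from the root (an assumed, minimal chord vocabulary).
TRIADS = {"major": (0, 4, 7), "minor": (0, 3, 7)}


def chord_expectations(observed, root):
    """For each triad on `root` whose root and fifth are observed,
    list the chord tones that lower-level KSs should now look for."""
    root_index = PITCH_CLASSES.index(root)
    expectations = {}
    for quality, offsets in TRIADS.items():
        members = [PITCH_CLASSES[(root_index + o) % 12] for o in offsets]
        # Precondition: the root and the fifth are already hypothesized.
        if members[0] in observed and members[2] in observed:
            expectations[f"{root} {quality}"] = [m for m in members
                                                 if m not in observed]
    return expectations


# Notes C and G are hypothesized in the current beat.
pressure = chord_expectations({"C", "G"}, "C")
# pressure == {"C major": ["E"], "C minor": ["Eb"]}
```

The returned missing notes play the role of top-down "pressure": they tell note-level KSs which weak evidence is worth re-examining.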
Keith Martin (1996 b) • “Automatic Transcription of Simple Polyphonic Music: Robust Front End Processing” • Previous system often misidentified octaves • Attempted to improve performance by shifting octave identification task from a top-down process to a bottom-up process
Keith Martin (1996 b) • Proposes the use of log-lag correlograms in front end • Models the inner hair cells in the cochlea with a bank of filters • Determines pitch by measuring the periodic energy in each filter channel as a function of lag • Correlograms now basic unit fed to blackboard system • No definitive results as to which approach is better
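The idea of measuring periodic energy as a function of lag can be sketched with a plain autocorrelation on a single channel. This is a deliberate simplification: it omits the filter bank, the hair-cell model and the logarithmic lag axis of the correlogram described above, and the sample rate, lag range and test signal are assumptions.

```python
import math


def autocorrelation(channel, max_lag):
    """Unnormalized periodic energy of one channel for lags 0..max_lag."""
    n = len(channel) - max_lag
    return [sum(channel[i] * channel[i + lag] for i in range(n))
            for lag in range(max_lag + 1)]


def strongest_period(channel, min_lag, max_lag):
    """Lag (in samples) with the most periodic energy inside the search range."""
    acf = autocorrelation(channel, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])


SAMPLE_RATE = 8000  # Hz, assumed for the example

# A 200 Hz fundamental plus a weaker partial one octave up (400 Hz).
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
          for t in range(460)]

lag = strongest_period(signal, min_lag=20, max_lag=60)
pitch_hz = SAMPLE_RATE / lag
```

The octave partial repeats every 20 samples, but only the 40-sample lag lines up both components, so the fundamental wins. This hints at why a lag-domain representation can help with octave identification in a bottom-up way.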
Kashino, Nakadai, Kinoshita and Tanaka (1995) • “Application of Bayesian Probability Networks to Music Scene Analysis” • Work slightly preceded that of Martin • Used test patterns involving more than one instrument • Uses principles of stream segregation from auditory scene analysis • Implements more high-level musical knowledge • Uses Bayesian network instead of Martin’s simple scheduler to coordinate KSs
Kashino, Nakadai, Kinoshita and Tanaka (1995) • Knowledge sources used: • Chord transition dictionary • Chord-note relation • Chord naming rules • Tone memory • Timbre models • Human perception rules • Used very specific instrument timbres and musical rules, so has limited general applicability
Kashino, Nakadai, Kinoshita and Tanaka (1995) • Tone memory: frequency components of different instruments played with different parameters • Found that the integration of tone memory with the other KSs greatly improved success rates
Kashino, Nakadai, Kinoshita and Tanaka (1995) • Bayesian networks well known for finding good solutions despite noisy input or missing data • Often used in implementing learning methods that trade off prior belief in a hypothesis against its agreement with current data • Therefore seem to be a good choice for coordinating KSs
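The trade-off between prior belief and agreement with current data is Bayes' rule, sketched below for two competing octave hypotheses. This shows only the single-node update, not a full Bayesian network, and the probabilities are invented for illustration rather than taken from the paper.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over competing hypotheses: P(h | data) is proportional
    to P(data | h) * P(h), normalized so the beliefs sum to 1."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical numbers: musical context favours C4,
# but the observed spectrum fits C5 better.
priors = {"C4": 0.7, "C5": 0.3}
likelihoods = {"C4": 0.2, "C5": 0.6}

beliefs = posterior(priors, likelihoods)
# beliefs["C4"] is about 0.4375 and beliefs["C5"] about 0.5625
```

Here the data outweigh the prior, so the belief shifts to C5 without discarding C4 entirely; a network chains many such updates across linked hypotheses.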
Kashino, Nakadai, Kinoshita and Tanaka (1995) • No experimental comparisons of this approach and Martin’s simple scheduler • Only used simple test patterns rather than real music
Kashino and Hagita (1996) • “A Music Scene Analysis System with the MRF-Based Information Integration Scheme” • Suggests replacing Bayesian networks with Markov Random Field hypothesis network • Successful in correcting two most common problems in previous system: • Misidentification of instruments • Incorrect octave labelling
Kashino and Hagita (1996) • MRF-based networks use simulated annealing to converge to a low-energy state • MRF approach enables information to be integrated on a multiply connected hypothesis network • Bayesian networks only allow singly connected networks • Could now deal with two kinds of transition information within a single hypothesis network: • chord transitions • note transitions
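A minimal sketch of simulated annealing on a small, multiply connected network (a triangle of three nodes) follows. The energy function, the costs, the linear cooling schedule and all names are assumptions for illustration; they are not the formulation used by Kashino and Hagita.

```python
import math
import random


def energy(labels, unary, edges, penalty):
    """Per-node label costs plus a penalty for each connected pair that disagrees."""
    cost = sum(unary[node][label] for node, label in enumerate(labels))
    cost += sum(penalty for a, b in edges if labels[a] != labels[b])
    return cost


def anneal(unary, edges, penalty, steps=2000, start_temp=2.0, seed=0):
    """Return the lowest-energy labelling seen while cooling the network."""
    rng = random.Random(seed)
    n_labels = len(unary[0])
    current = [0] * len(unary)
    current_e = energy(current, unary, edges, penalty)
    best, best_e = list(current), current_e
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-3  # linear cooling
        candidate = list(current)
        candidate[rng.randrange(len(unary))] = rng.randrange(n_labels)
        cand_e = energy(candidate, unary, edges, penalty)
        # Always accept improvements; accept worse states with a
        # probability that shrinks as the temperature drops.
        if cand_e <= current_e or rng.random() < math.exp((current_e - cand_e) / temp):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


# Three hypothesis nodes, two candidate labels each (hypothetical costs).
unary = [[2.0, 0.0], [0.6, 0.4], [0.0, 0.5]]
edges = [(0, 1), (1, 2), (0, 2)]  # a cycle, i.e. multiply connected
best, best_e = anneal(unary, edges, penalty=0.5)
```

Accepting occasional uphill moves early on lets the search escape labellings that are locally consistent but globally poor, which is the usual motivation for annealing over greedy descent.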
Kashino and Hagita (1996) • Instrument and octave identification errors corrected, but some new errors introduced • Overall, performed roughly 10% better than Bayesian-based system at transcribing 3-part arrangement of Auld Lang Syne • Still only had a recognition rate of 71.7%
Kashino and Murase (1998) • Shifts some work away from blackboard system by feeding it higher-level information • Simplifies and mathematically formalizes notion of knowledge sources • Switches back to Bayesian network • Perhaps not truly a blackboard system anymore • Has very good recognition rate • Scalability of system is seriously compromised by new approach
Kashino and Murase (1998) • Uses adaptive template matching • Implemented using a bank of filters arranged in parallel and a number of templates corresponding to particular notes played by particular instruments • The correlation between the outputs of the filters is calculated and a match is then made to one of the templates
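The matching step can be sketched as a normalized correlation between an observed vector of filter outputs and a set of stored templates. The adaptive part of the method is not shown, and the template values, the four-channel layout and the function names are invented for illustration.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two filter-output vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template(observed, templates):
    """Key of the template that correlates most strongly with the observation."""
    return max(templates,
               key=lambda name: normalized_correlation(observed, templates[name]))


# Hypothetical templates: relative energy in four filter channels
# for one note played by each instrument.
templates = {
    ("piano", "A4"): [1.0, 0.5, 0.25, 0.1],
    ("flute", "A4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "A4"): [0.6, 1.0, 0.7, 0.5],
}

observed = [0.9, 0.15, 0.05, 0.02]
match = best_template(observed, templates)
```

Every added instrument multiplies the number of templates to compare against, and templates with similar spectra yield nearly equal scores, which illustrates the scalability concern raised for this approach.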
Kashino and Murase (1998) • Achieved recognition rate of 88.5% on real recordings of piano, violin and flute • Including templates for many more instruments could make adaptive template matching intractable • Particularly a problem for instruments with • Similar frequency spectra • A great deal of spectral variation from note to note
Hainsworth and Macleod (2001) • “Automatic Bass Line Transcription from Polyphonic Music” • Wanted to be able to extract a single given instrument from an arbitrary musical signal • Contrast to previous approaches of using recordings of only one instrument or a set of pre-defined instruments
Hainsworth and Macleod (2001) • Chose to work with bass • Can filter out high frequencies • Notes usually fairly steady • Used simple mathematical relations to trim hypotheses rather than a true blackboard system • Had a 78.7% success rate on a Miles Davis recording
Bello and Sandler (2000) • “Blackboard Systems and Top-Down Processing for the Transcription of Simple Polyphonic Music” • Return to a true blackboard system • Based on Martin’s implementation, using a conventional scheduler • Refines knowledge sources and adds high-level musical knowledge • Implements one of knowledge sources as a neural network
Bello and Sandler (2000) • The chord recognizer KS is a feedforward network • Trained using the spectrograph of different chords of a piano • Trained network is fed a spectrograph and outputs possible chords • Can therefore output more than one hypothesis at each iteration • Gives other KSs more information and allows parallel exploration of solution space
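How a feedforward recognizer can return several chord hypotheses at once is sketched below with a single sigmoid layer and a score threshold. The weights are set by hand purely for illustration (nothing here is trained), and the four-feature input, the labels and the threshold are assumptions rather than details of Bello and Sandler's network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """One fully connected layer with sigmoid activations."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def chord_hypotheses(features, weights, biases, labels, threshold=0.5):
    """Every chord whose output score reaches the threshold, with its score."""
    scores = forward(features, weights, biases)
    return [(label, score) for label, score in zip(labels, scores)
            if score >= threshold]


# Hypothetical input: spectral energy near the pitches C, Eb, E and G.
features = [1.0, 0.5, 0.5, 1.0]

labels = ["C major", "C minor", "C open fifth"]
weights = [
    [2.0, -2.0, 2.0, 2.0],   # rewards E, penalizes Eb
    [2.0, 2.0, -2.0, 2.0],   # rewards Eb, penalizes E
    [2.0, -2.0, -2.0, 2.0],  # penalizes any third
]
biases = [-3.5, -3.5, -3.5]

hypotheses = chord_hypotheses(features, weights, biases, labels)
names = [label for label, _ in hypotheses]
```

With ambiguous evidence for the third, both C major and C minor pass the threshold, so both can be posted to the blackboard for other KSs to confirm or reject.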
Bello and Sandler (2000) • Could automatically retrain network to recognize spectrograph of other instruments with no manual modifications needed • Preliminary testing showed tendency to misidentify octaves and make incorrect identification of note onsets • These problems could potentially be corrected by signal processing system that feeds blackboard system
Conclusions • Bass transcription system and more recent work of Kashino useful for specific applications, but limited potential for general transcription purposes • True blackboard approach scales well and appears to hold the most potential for general-purpose polyphonic transcription
Conclusions • Use of adaptive learning in knowledge sources seems promising • Interchangeable modules could be automatically trained to specialize in different areas • Could have semi-automatic transcription, where user chooses correct modules and system performs transcription using them