340 likes | 351 Views
This review covers recent research on polyphonic music queries, focusing on melodic fragments as queries, search requirements, monophonic and polyphonic databases, current polyphonic systems, and the use of N-grams for transposition-invariant searches. The text discusses various algorithms and their implementations to enhance search efficiency in music databases.
E N D
Polyphonic Queries A Review of Recent Research by Cory Mckay
Overview • Introduction to polyphonic music queries • Review of five recent systems • Conclusions
Melodic Fragments as Queries • Melodic fragments are a useful way of searching electronic music databases • Convenient for humans to remember musical phrases • Easy to enter musical fragments • Monophonic: query by humming • Polyphonic: notation software
Search Requirements • Want searches to return all occurrences of a given set of notes in a database • May also want records that contain similar sets of notes • Should be able to deal with incomplete or partially erroneous queries • Ideally, want to search music stored in • Symbolic formats (MIDI, scores, sketches) • Raw audio data
Monophonic Databases • Has been some success with special case of monophonic databases • Andreas Kornstadt’s Themefinder • Roger J. MacNab’s Meledex • Searching polyphonic databases is much more difficult
Polyphonic Databases: Problems • Notes may begin simultaneously. Makes it impossible to outline an unambiguous sequence of events • Multiple voices, with varying roles and relevance to particular queries • Hard to deal with both symbolic and raw audio representations: • Monophonic: can just transcribe audio • Polyphonic: no good transcription system
Current Polyphonic Systems • Wide-spread systems rely on meta-data • This no good for searches of melodic fragments • Currently no widely accepted content-based system • A number of papers on topic have recently been published
Wiggins et al. (2002) • Designed a new algorithm called SIA(M)ESE • For making transposition-invariant queries • Matches a query even if there are events in a score being searched that separate musical events in the query • Assumes that both queries and database files are in symbolic form and are accurate • This limits generality of algorithm • No implementation of the algorithm given
Doraisamy & Rüger (2001) • Polyphonic queries • Partial queries permitted • Symbolic queries • Symbolic records • Pitch and rhythm features
Doraisamy & Rüger (2001) • N-grams are produced by converting notes into interval-based representation and grouping intervals into subdivisions of length n using a gliding window • Leads to transposition-invariant data • N-grams work well with monophonic queries
Doraisamy & Rüger (2001) • To deal with potential for simultaneous note onsets in polyphonic music, constructs exhaustive “melodic strings”: • Divide piece into overlapping windows of n adjacent onset times • Find all possible combinations of onsets within each window
Doraisamy & Rüger (2001) • Incorporates rhythmic information as well as intervals into each n-gram window • Done by calculating ratio of time differences between adjacent note onsets: • Ratioi = (Onseti+2 – Onseti+1) / (Onseti+1 – Onseti) • Avoids need to quantize events based on a predetermined beat duration • By using onsets only, avoids needing to determine the duration of notes, which can be difficult to detect in raw audio recordings
Doraisamy & Rüger (2001) • N-grams converted into text-based representations to allow use of text-based search engines • Interval and rhythmic ratio histograms constructed in order to search for patterns in each piece
Doraisamy & Rüger (2001) • Tested using a database of 3096 MIDI representations of classical music • Studied effects of varying window sizes, bin ranges and query lengths • 95% success rate with window sizes of 4 onset times, variable bin ranges and queries involving 50 notes • 74% with query lengths of ten notes • 65% with queries containing errors
Doraisamy & Rüger (2001) • Evaluation: • Performs well under ideal conditions • Databases or queries containing raw audio not considered • Only transposition invariant searches possible • Undesirably long query lengths necessary • Only classical music records tested • Successful in showing the potential utility of n-grams • Most recent work uses a variant of this n-gram approach
Doraisamy & Rüger (2002) • Monophonic queries • Partial queries permitted • Symbolic queries • Symbolic records • Pitch and rhythm features
Doraisamy & Rüger (2002) • Focused on monophonic queries because of query by humming • N-grams particularly appropriate for error-prone queries resulting from query by humming • One or two mistakes only lead to a few incorrect n-grams among a larger number of correct ones
Doraisamy & Rüger (2002) • Used same overall design as previous system • Includes more sophisticated error models to test effects of query inaccuracies • Database expanded to include popular music as well as classical music • 80% of the relevant compositions were returned in first 15 hits, on average
Doraisamy & Rüger (2002) • Evaluation • N-grams are an effective and error-tolerant tool for searching polyphonic music with monophonic queries • Improvements still need to be made • Queries still symbolic, so true applicability of system to query by humming not tested
Pickens et al. (2002) • Polyphonic queries • Full-length queries only • Audio queries • Symbolic records • Harmonic features
Pickens et al. (2002) • Relies on transcription algorithms to transform audio queries into symbolic form • Relies on error tolerance of searches to compensate for transcription errors • Two types of polyphonic transcription systems used, including blackboard
Pickens et al. (2002) • Transcribed queries and database records analyzed and stored using a harmonic modelling module • Characterizes pieces by mapping chords to a probability distribution • Breaks into sequences of independent note sets • Applies smoothing procedure to sets • Markov models created from smoothed sets
Pickens et al. (2002) • Performing searches: • Scoring function used to compare query models with each model stored in database • Dissimilarity scores produced • Allows search hits to be ranked
Pickens et al. (2002) • Tested with database containing 3150 classical piano pieces • Queries consisted of full-length audio recordings • On average, searches assigned a rank of between 2 and 6 to the correct database record • Moderately successful at matching variations of a piece: on average, 3 of top 5 hits relevant
Pickens et al. (2002) • Evaluation • Can use polyphonic audio queries • Limited by effectiveness of its transcription systems and error tolerance of the query system • System only tested on piano music • Full-length queries limit usefulness • Only one feature used
Song, Bae & Yoon (2002) • Monophonic queries • Partial queries permitted • Audio queries • Audio records • Melodic features
Song, Bae & Yoon (2002) • Intended to be used with query by humming • Avoids disadvantages of automated transcription by mapping audio data directly to “mid-level” melody-based feature set description • Contrasts with approach of first transcribing audio data and then extracting features from this high-level symbolic representation
Song, Bae & Yoon (2002) • “Mid-level” representation produced by processing audio frames using a five-step process
Song, Bae & Yoon (2002) • Instead of making definite decision on which notes were present, as a blackboard system would have done, vectors of all possible notes were kept for each audio segment • A DP-matching method was used during searches so that potentially error-prone patterns of different lengths could be compared
Song, Bae & Yoon (2002) • Tested by attempting to match 176 hummed samples to 92 short extracts (15-20 seconds long) of popular Korean and Western popular songs • Resulted in exact matches roughly 43% of the time and a match in the top ten from 69% to 76% of the time (depending on window size) • Search time varied from 3 to 14 seconds
Song, Bae & Yoon (2002) • Evaluation • Performance relatively poor • Only monophonic queries possible • Only tested using a database of short recordings • This was only system using audio data for both queries and database records
Conclusions • No truly viable systems produced yet • Promising approaches have been proposed • Recurring problems: • None of systems can deal with both symbolic and audio data • None have been tested with both polyphonic and monophonic queries • Tend to require long queries to achieve good results • Searches do not allow much flexibility (e.g. must be transposition invariant)
Conclusions • Difficult to compare systems because they each use different performance evaluation metrics • Limited data sets used during testing
Conclusions • Possible improvement: use a greater number of feature classes • Aside from Doraisamy & Rüger, all systems discussed limited themselves to one feature class • Harmonic, melodic, timbral and rhythm-based features could all prove useful • Would allow much more flexible searches