1 / 34

Polyphonic Queries

This review covers recent research on polyphonic music queries, focusing on melodic fragments as queries, search requirements, monophonic and polyphonic databases, current polyphonic systems, and the use of N-grams for transposition-invariant searches. The text discusses various algorithms and their implementations to enhance search efficiency in music databases.

dswann
Download Presentation

Polyphonic Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Polyphonic Queries A Review of Recent Research by Cory Mckay

  2. Overview • Introduction to polyphonic music queries • Review of five recent systems • Conclusions

  3. Melodic Fragments as Queries • Melodic fragments are a useful way of searching electronic music databases • Convenient for humans to remember musical phrases • Easy to enter musical fragments • Monophonic: query by humming • Polyphonic: notation software

  4. Search Requirements • Want searches to return all occurrences of a given set of notes in a database • May also want records that contain similar sets of notes • Should be able to deal with incomplete or partially erroneous queries • Ideally, want to search music stored in • Symbolic formats (MIDI, scores, sketches) • Raw audio data

  5. Monophonic Databases • Has been some success with special case of monophonic databases • Andreas Kornstadt’s Themefinder • Roger J. MacNab’s Meledex • Searching polyphonic databases is much more difficult

  6. Polyphonic Databases: Problems • Notes may begin simultaneously. Makes it impossible to outline an unambiguous sequence of events • Multiple voices, with varying roles and relevance to particular queries • Hard to deal with both symbolic and raw audio representations: • Monophonic: can just transcribe audio • Polyphonic: no good transcription system

  7. Current Polyphonic Systems • Wide-spread systems rely on meta-data • This no good for searches of melodic fragments • Currently no widely accepted content-based system • A number of papers on topic have recently been published

  8. Wiggins et al. (2002) • Designed a new algorithm called SIA(M)ESE • For making transposition-invariant queries • Matches a query even if there are events in a score being searched that separate musical events in the query • Assumes that both queries and database files are in symbolic form and are accurate • This limits generality of algorithm • No implementation of the algorithm given

  9. Doraisamy & Rüger (2001) • Polyphonic queries • Partial queries permitted • Symbolic queries • Symbolic records • Pitch and rhythm features

  10. Doraisamy & Rüger (2001) • N-grams are produced by converting notes into interval-based representation and grouping intervals into subdivisions of length n using a gliding window • Leads to transposition-invariant data • N-grams work well with monophonic queries

  11. Doraisamy & Rüger (2001) • To deal with potential for simultaneous note onsets in polyphonic music, constructs exhaustive “melodic strings”: • Divide piece into overlapping windows of n adjacent onset times • Find all possible combinations of onsets within each window

  12. Doraisamy & Rüger (2001) • Incorporates rhythmic information as well as intervals into each n-gram window • Done by calculating ratio of time differences between adjacent note onsets: • Ratioi = (Onseti+2 – Onseti+1) / (Onseti+1 – Onseti) • Avoids need to quantize events based on a predetermined beat duration • By using onsets only, avoids needing to determine the duration of notes, which can be difficult to detect in raw audio recordings

  13. Doraisamy & Rüger (2001) • N-grams converted into text-based representations to allow use of text-based search engines • Interval and rhythmic ratio histograms constructed in order to search for patterns in each piece

  14. Doraisamy & Rüger (2001) • Tested using a database of 3096 MIDI representations of classical music • Studied effects of varying window sizes, bin ranges and query lengths • 95% success rate with window sizes of 4 onset times, variable bin ranges and queries involving 50 notes • 74% with query lengths of ten notes • 65% with queries containing errors

  15. Doraisamy & Rüger (2001) • Evaluation: • Performs well under ideal conditions • Databases or queries containing raw audio not considered • Only transposition invariant searches possible • Undesirably long query lengths necessary • Only classical music records tested • Successful in showing the potential utility of n-grams • Most recent work uses a variant of this n-gram approach

  16. Doraisamy & Rüger (2002) • Monophonic queries • Partial queries permitted • Symbolic queries • Symbolic records • Pitch and rhythm features

  17. Doraisamy & Rüger (2002) • Focused on monophonic queries because of query by humming • N-grams particularly appropriate for error-prone queries resulting from query by humming • One or two mistakes only lead to a few incorrect n-grams among a larger number of correct ones

  18. Doraisamy & Rüger (2002) • Used same overall design as previous system • Includes more sophisticated error models to test effects of query inaccuracies • Database expanded to include popular music as well as classical music • 80% of the relevant compositions were returned in first 15 hits, on average

  19. Doraisamy & Rüger (2002) • Evaluation • N-grams are an effective and error-tolerant tool for searching polyphonic music with monophonic queries • Improvements still need to be made • Queries still symbolic, so true applicability of system to query by humming not tested

  20. Pickens et al. (2002) • Polyphonic queries • Full-length queries only • Audio queries • Symbolic records • Harmonic features

  21. Pickens et al. (2002) • Relies on transcription algorithms to transform audio queries into symbolic form • Relies on error tolerance of searches to compensate for transcription errors • Two types of polyphonic transcription systems used, including blackboard

  22. Pickens et al. (2002) • Transcribed queries and database records analyzed and stored using a harmonic modelling module • Characterizes pieces by mapping chords to a probability distribution • Breaks into sequences of independent note sets • Applies smoothing procedure to sets • Markov models created from smoothed sets

  23. Pickens et al. (2002) • Performing searches: • Scoring function used to compare query models with each model stored in database • Dissimilarity scores produced • Allows search hits to be ranked

  24. Pickens et al. (2002) • Tested with database containing 3150 classical piano pieces • Queries consisted of full-length audio recordings • On average, searches assigned a rank of between 2 and 6 to the correct database record • Moderately successful at matching variations of a piece: on average, 3 of top 5 hits relevant

  25. Pickens et al. (2002) • Evaluation • Can use polyphonic audio queries • Limited by effectiveness of its transcription systems and error tolerance of the query system • System only tested on piano music • Full-length queries limit usefulness • Only one feature used

  26. Song, Bae & Yoon (2002) • Monophonic queries • Partial queries permitted • Audio queries • Audio records • Melodic features

  27. Song, Bae & Yoon (2002) • Intended to be used with query by humming • Avoids disadvantages of automated transcription by mapping audio data directly to “mid-level” melody-based feature set description • Contrasts with approach of first transcribing audio data and then extracting features from this high-level symbolic representation

  28. Song, Bae & Yoon (2002) • “Mid-level” representation produced by processing audio frames using a five-step process

  29. Song, Bae & Yoon (2002) • Instead of making definite decision on which notes were present, as a blackboard system would have done, vectors of all possible notes were kept for each audio segment • A DP-matching method was used during searches so that potentially error-prone patterns of different lengths could be compared

  30. Song, Bae & Yoon (2002) • Tested by attempting to match 176 hummed samples to 92 short extracts (15-20 seconds long) of popular Korean and Western popular songs • Resulted in exact matches roughly 43% of the time and a match in the top ten from 69% to 76% of the time (depending on window size) • Search time varied from 3 to 14 seconds

  31. Song, Bae & Yoon (2002) • Evaluation • Performance relatively poor • Only monophonic queries possible • Only tested using a database of short recordings • This was only system using audio data for both queries and database records

  32. Conclusions • No truly viable systems produced yet • Promising approaches have been proposed • Recurring problems: • None of systems can deal with both symbolic and audio data • None have been tested with both polyphonic and monophonic queries • Tend to require long queries to achieve good results • Searches do not allow much flexibility (e.g. must be transposition invariant)

  33. Conclusions • Difficult to compare systems because they each use different performance evaluation metrics • Limited data sets used during testing

  34. Conclusions • Possible improvement: use a greater number of feature classes • Aside from Doraisamy & Rüger, all systems discussed limited themselves to one feature class • Harmonic, melodic, timbral and rhythm-based features could all prove useful • Would allow much more flexible searches

More Related