Transcription by Beat-Boxing • Elliot Sinyor - MUMT 611 • Feb 17, 2005
Presentation • Introduction • Background • Making “beat-box” sounds • Some Common Methods • Related Work • “Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis • “A Drum pattern Retrieval Method by Voice Percussion” • Nakano, Ogata, Goto, Hiraga • “Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Project for MUMT 605
Introduction • Ways to input percussion: • Electronic Drums (Yamaha DD-5, Roland V-Drums) • Velocity-sensitive MIDI keyboard • Velocity-insensitive computer keyboard • Vocalized percussion: • Common practice - “beat-boxing”, tabla vocal notation • Few applications that explicitly use vocalized percussion as input.
Introduction • Uses • Method of percussion input, for composition or performance • Method of transcription, along with expressive information • Method of retrieving stored percussion samples
Background Plosives (aka Stops) • /t/ sound being made • Step 1: tongue at alveolar ridge behind teeth • Air builds up behind tongue • Step 2: tongue released, along with air.
Background Fricatives • /z/ sound being made (voiced) • /s/ sound being made (unvoiced) • In both cases, the flow of air is constricted by the tongue and the alveolar ridge (right behind the teeth). • Turbulence results in a white-noise sound.
Background • Why does this matter? • Plosives and fricatives yield short signals (approximately 30 ms) • Noisy, non-deterministic signals • Vary greatly from person to person
Common Methods • Segment monophonic input stream (onset detection) • Distinguish between silence and “beats” • Analyse features • Temporal/spectral features • Classify each sound based on features and training data • (e.g., ANN, minimum-distance criteria)
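The three steps above can be sketched end to end. This is a minimal illustration, not any of the cited systems: hypothetical frame size and threshold, an energy-based segmenter, two toy features, and a nearest-centroid (minimum-distance) classifier.

```python
# Sketch of the common pipeline: segment -> features -> classify.
# Frame size, threshold, and feature choice are illustrative assumptions.
import numpy as np

def segment(signal, frame=256, threshold=0.02):
    """Return sample indices where frame RMS first rises above threshold."""
    onsets, active = [], False
    for i in range(0, len(signal) - frame, frame):
        rms = np.sqrt(np.mean(signal[i:i + frame] ** 2))
        if rms > threshold and not active:
            onsets.append(i)          # a new "beat" starts here
            active = True
        elif rms <= threshold:
            active = False            # back to silence
    return onsets

def features(seg):
    """Two toy features per segment: RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(seg ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(seg)))) / 2
    return np.array([rms, zcr])

def classify(feat, centroids):
    """Minimum-distance classification against per-class feature centroids."""
    return min(centroids, key=lambda label: np.linalg.norm(feat - centroids[label]))
```

In a real system the centroids (or an ANN) would be fit from labelled training sounds; here they are just a dictionary of class name to feature vector.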
High-level diagram [figure missing from transcript]
Analysis Features • Some time-domain features: • Root Mean Square (RMS) analysis - measure of energy level over a frame • Relative Difference Function (RDF) • used to determine perceptual onset • Zero-Crossing Rate (ZCR) analysis - used to estimate frequency components
Analysis Features • Some Frequency-Domain features: • Spectral Flux • Measure of change from 1 frame to another • Spectral Centroid • “center of gravity” • Mel-frequency Cepstral Coefficients • Compact and perceptually relevant way to model the spectrum
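Of the frequency-domain features listed, spectral flux is the simplest to sketch: the positive change in magnitude spectrum from one frame to the next (frame and hop sizes below are illustrative assumptions, not from the slides).

```python
# Sketch of spectral flux: sum of positive magnitude-spectrum changes
# between successive frames. Frame/hop sizes are assumptions.
import numpy as np

def spectral_flux(signal, frame=512, hop=256):
    """Frame-to-frame spectral flux of a mono signal."""
    mags = [np.abs(np.fft.rfft(signal[i:i + frame]))
            for i in range(0, len(signal) - frame, hop)]
    # Only increases in energy count, so flux peaks at attacks.
    flux = [np.sum(np.maximum(m2 - m1, 0.0))
            for m1, m2 in zip(mags, mags[1:])]
    return np.array(flux)
```

Because only increases count, flux spikes at note or beat attacks, which is why it doubles as an onset-detection feature.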
Onset Detection Relative Difference Function • Klapuri (1999)
Relative Difference Function • “This is psychoacoustically relevant, since perceived increase in signal amplitude is in relation to its level, the same amount of increase being more prominent in a quiet signal.” • Can be used to find the perceptual onset, whereas physical onset may occur earlier.
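The formula itself did not survive the transcript. As a reconstruction from Klapuri (1999), the relative difference function is the time derivative of the logarithm of the amplitude envelope $A(t)$:

```latex
W(t) = \frac{d}{dt} \log A(t) = \frac{A'(t)}{A(t)}
```

Dividing the amplitude change by the current level $A(t)$ is what makes the same absolute increase more prominent in a quiet signal, matching the psychoacoustic argument quoted above.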
Relative Difference Function • [figure: RDF curves for /p/ and /t/]
Relative Difference Function • [figure: RDF curves for /k/ and /s/]
Time Domain - RMS • Can be used as a measure of a signal’s energy for a given frame of N samples. • Usable for perceptual onset detection? • Following figures were computed with N = 100 samples.
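Frame-wise RMS is a one-liner per frame; a minimal sketch, using the slide's N = 100 as the default frame size:

```python
# Sketch of frame-wise RMS energy over non-overlapping frames of n samples.
import numpy as np

def frame_rms(signal, n=100):
    """RMS of each complete non-overlapping frame of n samples."""
    frames = len(signal) // n
    return np.array([np.sqrt(np.mean(signal[i * n:(i + 1) * n] ** 2))
                     for i in range(frames)])
```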
Zero-Crossing Rate • Quite simply: the number of times the signal crosses zero in a given frame of samples • Somewhat analogous to “frequency” • Should be used with a noise gate for silent portions. • Gouyon et al. (2000)
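A sketch of frame-wise ZCR with the noise gate the slide calls for (the gate threshold and frame size of 500, matching the figures, are assumptions):

```python
# Sketch of frame-wise zero-crossing rate with a simple RMS noise gate,
# in the spirit of Gouyon et al. (2000). Gate threshold is an assumption.
import numpy as np

def frame_zcr(signal, n=500, gate=1e-3):
    """Zero crossings per frame; frames below the gate RMS return 0."""
    out = []
    for i in range(0, len(signal) - n + 1, n):
        frame = signal[i:i + n]
        if np.sqrt(np.mean(frame ** 2)) < gate:
            out.append(0)          # silent frame: gated out
        else:
            # Each sign change contributes 2 to the abs-diff sum.
            out.append(int(np.sum(np.abs(np.diff(np.sign(frame)))) / 2))
    return out
```

For a pure tone the count is roughly 2 × f × n / sr crossings per frame, which is the sense in which ZCR is "analogous to frequency".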
Zero-Crossing Rate • [figure: ZCR of loop1, N = 500 samples]
Zero-Crossing Rate • [figure: ZCR of loop2, N = 500 samples]
Frequency-Domain Features • [figure: spectrograms of /s/, /k/, /t/, and /p/]
Frequency-Domain Features • Spectral Centroid (i.e. center of gravity): • For each frame: the sum of frequencies weighted by their amplitudes, divided by the sum of the amplitudes • The midpoint of the spectral energy distribution • Can be used as a rough estimate of “brightness”
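That amplitude-weighted mean is a few lines over one frame's magnitude spectrum (the sample rate is an assumption):

```python
# Sketch of the spectral centroid of a single frame: the magnitude-weighted
# mean of the FFT bin frequencies. Sample rate is an assumption.
import numpy as np

def spectral_centroid(frame, sr=44100):
    """Center of gravity (in Hz) of the frame's magnitude spectrum."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mags) / np.sum(mags)
```

A pure tone yields a centroid at its own frequency; noisy fricatives like /s/ push the centroid much higher, hence its use as a "brightness" estimate.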
“Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis (ISMIR 2004) • Identify drum sound being made • Induce tempo of beat • Match the beat-boxed input to a drum loop stored in a sample bank
“Query-by-beat-boxing: Music Retrieval for the DJ” • Pre-processed targets (drum loops created in Reason) • Used ZCR, spectral centroid, spectral rolloff, and LPC as features in an ANN • Experimented with features to determine the most reliable feature set
“Query-by-beat-boxing: Music Retrieval for the DJ” • Bionic BeatBoxing Voice Processor • User provides 4 examples for each class of drum • User beat-boxes according to a click-track • Input beat is segmented, each sound is classified by ANN using ZCR. • Can play back, or use as input in MuseScape
“Query-by-beat-boxing: Music Retrieval for the DJ” • MuseScape • User enters tempo/style (eg Dub, Rnb, House) • Can use analyzed BeatBoxed loop
“A drum pattern retrieval method by voice percussion” • Nakano, Ogata, Goto, Hiraga (ISMIR 2004) • Use “onomatopoeia” to make monophonic bass-snare patterns • IOI (inter-onset interval) compared to stored drum sequences (all 4/4, 1 measure) • Allows for use of different consonants and vowels to make drum sounds
“A drum pattern retrieval method by voice percussion” • Typical onomatopoeic expressions of drum sounds stored in a pronunciation dictionary (e.g. Don, Ton, Zu) • Onomatopoeic expression mapped to drum sound • Use MFCCs as the analysis feature
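The IOI-matching step can be sketched as follows. This is a simplification of Nakano et al.'s retrieval, not their method: normalizing the intervals within the measure makes the comparison tempo-invariant, and a plain Euclidean distance picks the stored pattern.

```python
# Sketch of tempo-invariant inter-onset-interval (IOI) pattern retrieval.
# Pattern names and the distance measure are illustrative assumptions.
import numpy as np

def ioi_pattern(onset_times):
    """Normalized IOIs (sum to 1), so tempo cancels out."""
    iois = np.diff(np.asarray(onset_times, dtype=float))
    return iois / iois.sum()

def retrieve(query_onsets, stored):
    """Return the name of the stored pattern closest to the query."""
    q = ioi_pattern(query_onsets)
    best, best_d = None, np.inf
    for name, onsets in stored.items():
        p = ioi_pattern(onsets)
        if len(p) != len(q):
            continue                 # only compare equal-length patterns
        d = np.linalg.norm(q - p)
        if d < best_d:
            best, best_d = name, d
    return best
```

A query voiced at half tempo still retrieves the right pattern, since only the interval ratios are compared.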
“Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Goal: to create a symbolic representation of voice percussion that includes expressive features • Used 28 features (10 temporal, 18 spectral) • Tree induction and lazy learning (k-NN) tested for accuracy.
“Classification of Unvoiced Plosives and Fricatives for Control of Percussion” • Sought to distinguish between /p/, /t/, /k/, /s/ sounds • Used 5 features and minimum-distance criteria to classify • Implemented in Matlab
References • A. Kapur, M. Benning, and G. Tzanetakis, “Query-by-beat-boxing: music retrieval for the DJ”, Proc. Int. Conf. Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • T. Nakano, J. Ogata, M. Goto, and Y. Hiraga, “A drum pattern retrieval method by voice percussion”, Proc. Int. Conf. Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • A. Hazan, “Towards automatic transcription of expressive oral percussive performances”, Proc. 10th Int. Conf. on Intelligent User Interfaces (IUI), San Diego, 2005. • F. Gouyon, F. Pachet, and O. Delerue, “On the use of zero-crossing rate for an application of classification of percussive sounds”, Proc. COST G-6 Conf. on Digital Audio Effects (DAFX-00), Verona, Italy, 2000. • A. Klapuri, “Sound onset detection by applying psychoacoustic knowledge”, Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc. (ICASSP), 1999.