Transcription by Beat-Boxing • Elliot Sinyor - MUMT 611 • Feb 17, 2005
Presentation • Introduction • Background • Making “beat-box” sounds • Some Common Methods • Related Work • “Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis • “A Drum pattern Retrieval Method by Voice Percussion” • Nakano, Ogata, Goto, Hiraga • “Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Project for MUMT 605
Introduction • Ways to input percussion: • Electronic Drums (Yamaha DD-5, Roland V-Drums) • Velocity-sensitive MIDI keyboard • Velocity-insensitive computer keyboard • Vocalized percussion: • Common practice - “beat-boxing”, tabla vocal notation • Few applications that explicitly use vocalized percussion as input.
Introduction • Uses • Method of percussion input, for composition or performance • Method of transcription, along with expressive information • Method of retrieving stored percussion samples
Background Plosives (aka Stops) • /t/ sound being made • Step 1: tongue at alveolar ridge behind teeth • Air builds up behind tongue • Step 2: tongue released, along with air.
Background Fricatives • /z/ sound being made (voiced) • /s/ sound being made (unvoiced) • In both cases, the flow of air is constricted by the tongue and the alveolar ridge (right behind the teeth). • Turbulence results in a white-noise sound.
Background • Why does this matter? • Plosives and fricatives yield short signals (approximately 30 ms) • Noisy, non-deterministic signals • Vary greatly from person to person
Common Methods • Segment monophonic input stream (onset detection) • Distinguish between silence and “beats” • Analyse features • Temporal/spectral features • Classify each sound based on features and training data • (e.g., ANN, minimum-distance criteria)
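The three steps above can be sketched end to end. This is a minimal illustration, not any of the cited systems: hypothetical frame size and threshold, an energy-based segmenter, two toy features, and a nearest-centroid (minimum-distance) classifier.

```python
# Sketch of the common pipeline: segment -> features -> classify.
# Frame size, threshold, and feature choice are illustrative assumptions.
import numpy as np

def segment(signal, frame=256, threshold=0.02):
    """Return sample indices where frame RMS first rises above threshold."""
    onsets, active = [], False
    for i in range(0, len(signal) - frame, frame):
        rms = np.sqrt(np.mean(signal[i:i + frame] ** 2))
        if rms > threshold and not active:
            onsets.append(i)          # a new "beat" starts here
            active = True
        elif rms <= threshold:
            active = False            # back to silence
    return onsets

def features(seg):
    """Two toy features per segment: RMS energy and zero-crossing rate."""
    rms = np.sqrt(np.mean(seg ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(seg)))) / 2
    return np.array([rms, zcr])

def classify(feat, centroids):
    """Minimum-distance classification against per-class feature centroids."""
    return min(centroids, key=lambda label: np.linalg.norm(feat - centroids[label]))
```

In a real system the centroids (or an ANN) would be fit from labelled training sounds; here they are just a dictionary of class name to feature vector.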
High-level diagram [figure missing from transcript]
Analysis Features • Some time-domain features: • Root Mean Square (RMS) analysis - measure of energy level over a frame • Relative Difference Function (RDF) • used to determine perceptual onset • Zero-Crossing Rate (ZCR) analysis - used to estimate frequency components
Analysis Features • Some Frequency-Domain features: • Spectral Flux • Measure of change from 1 frame to another • Spectral Centroid • “center of gravity” • Mel-frequency Cepstral Coefficients • Compact and perceptually relevant way to model the spectrum
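Of the frequency-domain features listed, spectral flux is the simplest to sketch: the positive change in magnitude spectrum from one frame to the next (frame and hop sizes below are illustrative assumptions, not from the slides).

```python
# Sketch of spectral flux: sum of positive magnitude-spectrum changes
# between successive frames. Frame/hop sizes are assumptions.
import numpy as np

def spectral_flux(signal, frame=512, hop=256):
    """Frame-to-frame spectral flux of a mono signal."""
    mags = [np.abs(np.fft.rfft(signal[i:i + frame]))
            for i in range(0, len(signal) - frame, hop)]
    # Only increases in energy count, so flux peaks at attacks.
    flux = [np.sum(np.maximum(m2 - m1, 0.0))
            for m1, m2 in zip(mags, mags[1:])]
    return np.array(flux)
```

Because only increases count, flux spikes at note or beat attacks, which is why it doubles as an onset-detection feature.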
Onset Detection Relative Difference Function • Klapuri (1999)
Relative Difference Function • “This is psychoacoustically relevant, since perceived increase in signal amplitude is in relation to its level, the same amount of increase being more prominent in a quiet signal.” • Can be used to find the perceptual onset, whereas physical onset may occur earlier.
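The formula itself did not survive the transcript. As a reconstruction from Klapuri (1999), the relative difference function is the time derivative of the logarithm of the amplitude envelope $A(t)$:

```latex
W(t) = \frac{d}{dt} \log A(t) = \frac{A'(t)}{A(t)}
```

Dividing the amplitude change by the current level $A(t)$ is what makes the same absolute increase more prominent in a quiet signal, matching the psychoacoustic argument quoted above.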
Relative Difference Function • [figure: RDF curves for /p/ and /t/]
Relative Difference Function • [figure: RDF curves for /k/ and /s/]
Time Domain - RMS • Can be used as a measure of a signal’s energy for a given frame of N samples. • Usable for perceptual onset detection? • Following figures were computed with N = 100 samples.
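Frame-wise RMS is a one-liner per frame; a minimal sketch, using the slide's N = 100 as the default frame size:

```python
# Sketch of frame-wise RMS energy over non-overlapping frames of n samples.
import numpy as np

def frame_rms(signal, n=100):
    """RMS of each complete non-overlapping frame of n samples."""
    frames = len(signal) // n
    return np.array([np.sqrt(np.mean(signal[i * n:(i + 1) * n] ** 2))
                     for i in range(frames)])
```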
Zero-Crossing Rate • Quite simply: the number of times the signal crosses zero in a given frame of samples • Somewhat analogous to “frequency” • Should be used with a noise gate for silent portions. • Gouyon et al. (2000)
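A sketch of frame-wise ZCR with the noise gate the slide calls for (the gate threshold and frame size of 500, matching the figures, are assumptions):

```python
# Sketch of frame-wise zero-crossing rate with a simple RMS noise gate,
# in the spirit of Gouyon et al. (2000). Gate threshold is an assumption.
import numpy as np

def frame_zcr(signal, n=500, gate=1e-3):
    """Zero crossings per frame; frames below the gate RMS return 0."""
    out = []
    for i in range(0, len(signal) - n + 1, n):
        frame = signal[i:i + n]
        if np.sqrt(np.mean(frame ** 2)) < gate:
            out.append(0)          # silent frame: gated out
        else:
            # Each sign change contributes 2 to the abs-diff sum.
            out.append(int(np.sum(np.abs(np.diff(np.sign(frame)))) / 2))
    return out
```

For a pure tone the count is roughly 2 × f × n / sr crossings per frame, which is the sense in which ZCR is "analogous to frequency".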
Zero-Crossing Rate • [figure: ZCR of loop1, N = 500 samples]
Zero-Crossing Rate • [figure: ZCR of loop2, N = 500 samples]
Frequency-Domain Features • [figure: spectrograms of /s/, /k/, /t/, and /p/]
Frequency-Domain Features • Spectral Centroid (i.e. center of gravity): • For each frame: the sum of frequencies weighted by their amplitudes, divided by the sum of the amplitudes • The midpoint of the spectral energy distribution • Can be used as a rough estimate of “brightness”
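That amplitude-weighted mean is a few lines over one frame's magnitude spectrum (the sample rate is an assumption):

```python
# Sketch of the spectral centroid of a single frame: the magnitude-weighted
# mean of the FFT bin frequencies. Sample rate is an assumption.
import numpy as np

def spectral_centroid(frame, sr=44100):
    """Center of gravity (in Hz) of the frame's magnitude spectrum."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mags) / np.sum(mags)
```

A pure tone yields a centroid at its own frequency; noisy fricatives like /s/ push the centroid much higher, hence its use as a "brightness" estimate.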
“Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis (ISMIR 2004) • Identify drum sound being made • Induce tempo of beat • Match the beat-boxed input to a drum loop stored in a sample bank
“Query-by-beat-boxing: Music Retrieval for the DJ” • Pre-processed targets (drum loops created in Reason) • Used ZCR, spectral centroid, spectral rolloff, and LPC as features in an ANN • Experimented with features to determine the most reliable feature set
“Query-by-beat-boxing: Music Retrieval for the DJ” • Bionic BeatBoxing Voice Processor • User provides 4 examples for each class of drum • User beat-boxes according to a click-track • Input beat is segmented, each sound is classified by ANN using ZCR. • Can play back, or use as input in MuseScape
“Query-by-beat-boxing: Music Retrieval for the DJ” • MuseScape • User enters tempo/style (eg Dub, Rnb, House) • Can use analyzed BeatBoxed loop
“A drum pattern retrieval method by voice percussion” • Nakano, Ogata, Goto, Hiraga (ISMIR 2004) • Use “onomatopoeia” to make monophonic bass-snare patterns • IOI (inter-onset interval) compared to stored drum sequences (all 4/4, 1 measure) • Allows for use of different consonants and vowels to make drum sounds
“A drum pattern retrieval method by voice percussion” • Typical onomatopoeic expressions of drum sounds stored in a pronunciation dictionary (e.g. Don, Ton, Zu) • Onomatopoeic expression mapped to drum sound • Use MFCCs as the analysis feature
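The IOI-matching step can be sketched as follows. This is a simplification of Nakano et al.'s retrieval, not their method: normalizing the intervals within the measure makes the comparison tempo-invariant, and a plain Euclidean distance picks the stored pattern.

```python
# Sketch of tempo-invariant inter-onset-interval (IOI) pattern retrieval.
# Pattern names and the distance measure are illustrative assumptions.
import numpy as np

def ioi_pattern(onset_times):
    """Normalized IOIs (sum to 1), so tempo cancels out."""
    iois = np.diff(np.asarray(onset_times, dtype=float))
    return iois / iois.sum()

def retrieve(query_onsets, stored):
    """Return the name of the stored pattern closest to the query."""
    q = ioi_pattern(query_onsets)
    best, best_d = None, np.inf
    for name, onsets in stored.items():
        p = ioi_pattern(onsets)
        if len(p) != len(q):
            continue                 # only compare equal-length patterns
        d = np.linalg.norm(q - p)
        if d < best_d:
            best, best_d = name, d
    return best
```

A query voiced at half tempo still retrieves the right pattern, since only the interval ratios are compared.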
“Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Goal: to create a symbolic representation of voice percussion that includes expressive features • Used 28 features (10 temporal, 18 spectral) • Tree induction and lazy learning (k-NN) tested for accuracy.
“Classification of Unvoiced Plosives and Fricatives for Control of Percussion” • Sought to distinguish between /p/, /t/, /k/, /s/ sounds • Used 5 features and minimum-distance criteria to classify • Implemented in Matlab
References • A. Kapur, M. Benning, and G. Tzanetakis, “Query-by-beat-boxing: music retrieval for the DJ”, Proc. Int. Conf. Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • T. Nakano, J. Ogata, M. Goto, and Y. Hiraga, “A drum pattern retrieval method by voice percussion”, Proc. Int. Conf. Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • A. Hazan, “Towards automatic transcription of expressive oral percussive performances”, Proc. 10th Int. Conf. on Intelligent User Interfaces (IUI), San Diego, 2005. • F. Gouyon, F. Pachet, and O. Delerue, “On the use of zero-crossing rate for an application of classification of percussive sounds”, Proc. COST G-6 Conf. on Digital Audio Effects (DAFX-00), Verona, Italy, 2000. • A. Klapuri, “Sound onset detection by applying psychoacoustic knowledge”, Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc. (ICASSP), 1999.