Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星), http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan
Audio Features
• Four commonly used audio features
  • Volume, pitch, zero crossing rate, timbre
• Our goal
  • These features can be perceived (more or less) subjectively.
  • Our goal is to compute them quantitatively (and objectively) for further processing and recognition.
General Steps for Audio Analysis
• Frame blocking
  • Frame duration of about 20 to 40 ms
• Frame-based feature extraction
  • Volume, zero-crossing rate, pitch, MFCC, etc.
• Frame-based analysis
  • Pitch vector for QBSH (query by singing/humming) comparison
  • MFCC for HMM evaluation
  • …
Frame Blocking
• Sample rate = 16 kHz
• Frame size = 512 samples
• Frame duration = 512/16000 = 0.032 s = 32 ms
• Overlap = 192 samples
• Hop size = frame size - overlap = 512 - 192 = 320 samples
• Frame rate = 16000/320 = 50 frames/sec
[Figure: waveform zoomed in to show consecutive frames and the overlap between them]
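The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `frame_parameters` and the dictionary keys are illustrative choices, not part of the slides or of any toolbox.

```python
def frame_parameters(sample_rate, frame_size, overlap):
    """Derive frame duration, hop size, and frame rate from the blocking settings."""
    if not 0 <= overlap < frame_size:
        raise ValueError("overlap must be non-negative and smaller than frame_size")
    hop_size = frame_size - overlap  # samples between the starts of adjacent frames
    return {
        "frame_duration_ms": 1000.0 * frame_size / sample_rate,
        "hop_size": hop_size,
        "frame_rate": sample_rate / hop_size,  # frames per second
    }


# The settings used on the slide.
params = frame_parameters(sample_rate=16000, frame_size=512, overlap=192)
print(params)
```

A larger overlap gives a smaller hop size and therefore a higher frame rate, at the cost of more computation per second of audio.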
Audio Features in Time Domain
• Time-domain audio features presented in a frame (analysis window)
[Figure: waveform of one frame, labeled with the fundamental period (FP), the intensity, and the timbre (the waveform within an FP)]
Audio Features in Frequency Domain
• Frequency-domain audio features in a frame
  • Energy: Sum of power spectrum
  • Pitch: Distance between harmonics
  • Timbre: Smoothed spectrum
[Figure: power spectrum of one frame, labeled with the energy, the pitch frequency, the first formant F1, and the second formant F2]
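As a rough illustration of the energy and pitch readings above, the following Python sketch builds a synthetic frame with three harmonics, computes its power spectrum with a direct (slow) DFT, and reads off the spacing between harmonic peaks. The test signal, the 1% peak threshold, and all names are assumptions made for this example; real speech needs windowing and a more careful peak picker.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |DFT|^2 for bins 0..N/2, computed with a direct O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


fs, n = 16000, 512
f0_bin = 8  # fundamental placed exactly on bin 8, i.e. 8 * fs / n = 250 Hz
frame = [sum(amp * math.sin(2 * math.pi * f0_bin * h * t / n)
             for h, amp in ((1, 1.0), (2, 0.5), (3, 0.25)))
         for t in range(n)]

spectrum = power_spectrum(frame)
energy = sum(spectrum)  # energy as the sum of the power spectrum

# Keep bins whose power exceeds 1% of the maximum (an arbitrary threshold).
threshold = 0.01 * max(spectrum)
peaks = [k for k, p in enumerate(spectrum) if p > threshold]

# Pitch as the distance between adjacent harmonics, converted from bins to Hz.
spacing_bins = peaks[1] - peaks[0]
pitch_hz = spacing_bins * fs / n
print(peaks, pitch_hz)
```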
Frame-based Manipulation
• For simplicity, we usually pack frames into a frame matrix for easy manipulation in MATLAB:
  • [y, fs, nbits] = wavread('file.wav');
  • frameMat = enframe(y, frameSize, overlap);
  • (wavread was removed in later MATLAB releases; audioread replaces it there.)
• frameMat = [Frame 1, Frame 2, …, Frame n]
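For readers without the MATLAB toolbox, the idea behind the frame matrix can be sketched in plain Python. This is not the toolbox's `enframe`; it is a minimal stand-in that assumes incomplete trailing frames are dropped, and it returns one frame per list entry rather than a matrix.

```python
def enframe(signal, frame_size, overlap):
    """Split a 1-D sequence into overlapping frames, dropping the incomplete tail."""
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    # Number of complete frames that fit in the signal (zero if it is too short).
    frame_count = max(0, (len(signal) - frame_size) // hop + 1)
    return [list(signal[i * hop:i * hop + frame_size]) for i in range(frame_count)]


frames = enframe(list(range(10)), frame_size=4, overlap=2)
print(frames)
```

Once the signal is in this form, every frame-based feature becomes a function applied to each frame in turn.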
Volume (I)
• Loudness of audio signals
  • Visual cue: Amplitude of vibration
  • Also known as energy or intensity
• Two major ways of computing volume:
  • Volume:
  • Log energy (in decibels):
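The two formulas named on this slide are not spelled out in the text. A common reading, assumed in this sketch, is that "volume" is the sum of absolute sample values in a frame, and "log energy" is 10·log10 of the sum of squared samples. The function names and the small `floor` that guards against log(0) are illustrative additions.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values (assumed definition)."""
    return sum(abs(s) for s in frame)


def volume_log_energy_db(frame, floor=1e-12):
    """Volume as log energy in decibels: 10 * log10(sum of squares).

    `floor` avoids taking the log of zero for an all-silent frame.
    """
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5]
print(volume_abs_sum(frame), volume_log_energy_db(frame))
```

The absolute-sum form is cheap to compute with integer arithmetic, while the decibel form is closer to how loudness differences are perceived.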
Volume (II)
• Perceived volume is influenced by
  • Frequency (example shown later)
  • Timbre (example shown later)
• Computed volume is influenced by
  • Microphone types
  • Microphone setups
Volume (III)
• To avoid DC bias (or DC drifting)
  • DC bias: The vibration is not around zero
• Computation:
  • Volume:
  • Log energy (in decibels):
• Theoretical background (How to prove them?)
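The bias-corrected formulas are likewise not reproduced in the text. One plausible reading, which is an assumption here, subtracts the frame median before summing absolute values and the frame mean before summing squares. That pairing would match the "theoretical background" question, since the median minimizes the sum of absolute deviations and the mean minimizes the sum of squared deviations. Names in the sketch are illustrative.

```python
import math
import statistics


def volume_abs_sum_unbiased(frame):
    """Sum of absolute deviations from the median (assumed bias correction)."""
    center = statistics.median(frame)
    return sum(abs(s - center) for s in frame)


def log_energy_db_unbiased(frame, floor=1e-12):
    """10 * log10 of the sum of squared deviations from the mean (assumed)."""
    center = statistics.mean(frame)
    energy = sum((s - center) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))


# The same alternating frame as before, but riding on a DC bias of 1.0.
biased = [1.5, 0.5, 1.5, 0.5]
print(volume_abs_sum_unbiased(biased), log_energy_db_unbiased(biased))
```

With the bias removed, the biased frame yields the same volume as the unbiased frame [0.5, -0.5, 0.5, -0.5].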
Volume (IV)
• Functions for computing volume
  • Example: volume01
  • Example: volume02
  • Example: volume03
• Volume depends on…
  • Frequency
    • Equal loudness test
  • Timbre
    • Example: volume04
Zero Crossing Rate
• Zero crossing rate (ZCR)
  • The number of zero crossings in a frame.
• Characteristics:
  • Zero-justification (removing the DC bias) is required before counting.
  • Noise and unvoiced sounds have high ZCR.
  • ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds.
  • To distinguish noise/silence from unvoiced sounds, we usually add a shift to the signal before computing ZCR.
ZCR Computations
• Two types of ZCR definitions
  • If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
  • The distinction diminishes at higher bit resolutions, because samples that are exactly zero become rarer.
• Other considerations
  • ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
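Both definitions, and the effect of a shift, can be sketched in Python as follows. The implementation counts adjacent sample pairs whose product is negative (or non-positive when zeros count); the function name, the `count_zero` flag, and the fixed shift of 0.05 are assumptions for illustration, and the question of how to choose the shift is left open as on the slide.

```python
def zero_crossing_rate(frame, count_zero=False, shift=0.0):
    """Count zero crossings in a frame after subtracting a constant shift.

    count_zero=False: only strict sign changes between neighbors are counted.
    count_zero=True:  a pair that includes a zero-valued sample also counts.
    """
    shifted = [s - shift for s in frame]
    products = [a * b for a, b in zip(shifted, shifted[1:])]
    if count_zero:
        return sum(1 for p in products if p <= 0)
    return sum(1 for p in products if p < 0)


frame = [1, -1, 0, 1, 2]
strict = zero_crossing_rate(frame)                       # zeros not counted
inclusive = zero_crossing_rate(frame, count_zero=True)   # zeros counted

# Low-level background noise versus a louder unvoiced-like signal.
noise = [0.01, -0.01, 0.01, -0.01]
unvoiced = [0.5, -0.5, 0.5, -0.5]
noise_zcr = zero_crossing_rate(noise, shift=0.05)
unvoiced_zcr = zero_crossing_rate(unvoiced, shift=0.05)
print(strict, inclusive, noise_zcr, unvoiced_zcr)
```

Without the shift both toy signals have the same ZCR; with it, the low-level noise no longer crosses the reference level while the louder signal still does.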
ZCR
• ZCR computing
  • Example: zcr01
  • Example: zcr02
• To use ZCR to distinguish between unvoiced sounds and environmental noise
  • Example: zcrWithShift
Pitch
• Definition
  • Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods per second. The unit used here is Hertz (Hz).
• Unit
  • More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:
• Piano roll via HTML5
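The conversion formula itself is not preserved in the text. The sketch below assumes the MIDI convention, in which A4 = 440 Hz maps to semitone 69 and each octave spans 12 semitones; the function name is illustrative.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (MIDI convention assumed)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


print(hz_to_semitone(440.0))   # A4
print(hz_to_semitone(880.0))   # one octave above A4
print(hz_to_semitone(261.63))  # approximately middle C
```

A semitone scale is logarithmic in frequency, which matches musical intervals: doubling the frequency always adds 12 semitones.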
Pitch Computation (I)
• Pitch of tuning forks (code)
Pitch Computation (II)
• Pitch of speech (code)
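The code behind these two links is not included here. As a stand-in, the following Python sketch estimates pitch with a plain autocorrelation search, which is one common approach and not necessarily the method used in the linked examples. The search range of 50 to 1000 Hz, the function name, and the synthetic 200 Hz test tone are all assumptions.

```python
import math


def estimate_pitch_acf(frame, sample_rate, min_pitch=50.0, max_pitch=1000.0):
    """Estimate pitch as sample_rate / lag, where lag maximizes the autocorrelation."""
    min_lag = max(1, int(sample_rate / max_pitch))
    max_lag = min(int(sample_rate / min_pitch), len(frame) - 1)
    if min_lag > max_lag:
        raise ValueError("frame is too short for the requested pitch range")
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


fs = 16000
frame = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(512)]
pitch = estimate_pitch_acf(frame, fs)
print(pitch)
```

The best lag corresponds to one fundamental period in samples, so dividing the sample rate by it gives the number of fundamental periods per second.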
Pitch Perception
• Age-related hearing loss
  • As people age, the audible frequency range narrows
• Mosquito ringtone
  • Low to high, high to low
  • Applications
• Frequencies vs. ages
  • 21 kHz, 17.4 kHz, 15 kHz, 12 kHz, 8 kHz
Tones in Mandarin Chinese
• 5401 characters, each associated with at least one base syllable and one tone
• 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
• Tones are characterized by their pitch curves:
  • Tone 1: high-high
  • Tone 2: low-high
  • Tone 3: high-low-high
  • Tone 4: high-low
• Some examples of tones:
  • 1234: 三民主義 ("Three Principles of the People"), 優柔寡斷 ("indecisive")
  • 3333: 勇猛果敢 ("brave and resolute"; tone sandhi applies)
  • ?????: 美麗大教堂 ("beautiful cathedral"), 滷蛋有夠鹹 (Taiwanese, "the braised egg is really salty")
Timbre
• Timbre is represented by
  • Waveform within a fundamental period
  • Frame-based energy distribution over frequencies
    • Power spectrum (over a single frame)
    • Spectrogram (over many frames)
  • Frame-based MFCC (mel-frequency cepstral coefficients)
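The relation between a power spectrum (one frame) and a spectrogram (many frames) can be sketched in Python as below. The direct DFT is slow and used only to keep the example dependency-free; the function names, the frame size of 64, and the two-tone test signal are assumptions for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |DFT|^2 for bins 0..N/2, computed with a direct O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_size, hop_size):
    """Stack the power spectra of successive frames (one row per frame)."""
    starts = range(0, len(signal) - frame_size + 1, hop_size)
    return [power_spectrum(signal[i:i + frame_size]) for i in starts]


frame_size = 64
# A tone on bin 4 for the first half, then a tone on bin 8 for the second half.
low = [math.sin(2 * math.pi * 4 * t / frame_size) for t in range(128)]
high = [math.sin(2 * math.pi * 8 * t / frame_size) for t in range(128)]
spec = spectrogram(low + high, frame_size, hop_size=frame_size)

# The strongest bin in each frame traces how the spectrum changes over time.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(peak_bins)
```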
Timbre/Pitch Demo: Real-time Spectrogram
• Simulink model for real-time display of spectrogram
  • dspstfft_audio (before MATLAB R2011a)
  • dspstfft_audioInput (R2012a or later)
[Figure: screenshots of the spectrum display and the spectrogram display]