Basic Features of Audio Signals
Jyh-Shing Roger Jang (張智星)
http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan
Audio Features
• Four commonly used audio features
  • Volume
  • Pitch
  • Zero-crossing rate
  • Timbre
• Our goal
  • These features can be perceived subjectively.
  • But we need to compute them quantitatively for further processing and recognition.
Audio Features in Time Domain
• Audio features presented in the time domain
  • Fundamental period (FP)
  • Intensity
  • Timbre: waveform within an FP
Audio Features in Frequency Domain
• Volume: magnitude of the spectrum
• Pitch: distance between harmonics
• Timbre: smoothed spectrum
[Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]
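As a rough illustration of reading features off the spectrum, the sketch below computes the magnitude spectrum of a synthetic tone and locates its strongest component. The sample rate, the 440 Hz test tone, and all variable names are assumptions made for the example; real speech or music would show several harmonics rather than a single peak.

```matlab
% Sketch: locate the spectral peak of a synthetic tone (all values assumed).
fs = 16000;                      % sample rate in Hz (assumed)
f0 = 440;                        % tone frequency in Hz (assumed)
n = 0:fs-1;                      % sample indices for one second of audio
y = 0.8*sin(2*pi*f0*n/fs);       % test signal
N = numel(y);                    % number of samples
mag = abs(fft(y));               % magnitude spectrum
mag = mag(1:N/2+1);              % keep the non-negative frequencies only
freq = (0:N/2)*fs/N;             % frequency of each retained bin in Hz
[peakMag, idx] = max(mag);       % strongest spectral component
peakFreq = freq(idx);            % its frequency in Hz
fprintf('Peak at %g Hz\n', peakFreq);
```

Because the test tone lasts exactly one second, the bins are 1 Hz apart and the tone falls on a single bin; for shorter frames the peak is spread over neighboring bins.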
Demo: Real-time Spectrogram
• Try “dspstfft_audio” under MATLAB
[Figures: spectrum and spectrogram displays]
Steps for Audio Feature Extraction
• Frame blocking
  • Frame duration of 20 ms or so
• Feature extraction
  • Volume, zero-crossing rate, pitch, MFCC, etc.
• Endpoint detection
  • Usually based on volume and zero-crossing rate
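Frame Blocking
• Sample rate = 11025 Hz
• Frame size = 256 samples
• Overlap = 84 samples (hop size = 256 - 84 = 172 samples)
• Frame rate = 11025/(256 - 84) ≈ 64 frames/sec
[Figure: waveform with a zoomed-in view marking a frame and the overlap between adjacent frames]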
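The numbers on this slide can be turned into a minimal frame-blocking sketch. The sample rate, frame size, and overlap come from the slide; the random placeholder signal and the variable names are assumptions, and only base MATLAB functions are used.

```matlab
% Sketch: split a signal into overlapping frames, one frame per column.
fs = 11025;                          % sample rate in Hz (from the slide)
frameSize = 256;                     % samples per frame (from the slide)
overlap = 84;                        % samples shared by adjacent frames
hop = frameSize - overlap;           % samples between successive frame starts
y = randn(1, fs);                    % one second of placeholder audio (assumed)
frameCount = floor((numel(y) - overlap)/hop);   % number of complete frames
frames = zeros(frameSize, frameCount);
for i = 1:frameCount
    startIdx = (i-1)*hop + 1;        % first sample of frame i
    frames(:, i) = y(startIdx:startIdx+frameSize-1);
end
frameRate = fs/hop;                  % frames per second
fprintf('%d frames, %.2f frames/sec\n', frameCount, frameRate);
```

Trailing samples that do not fill a complete frame are simply dropped here; padding the last frame with zeros is another common choice.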
Intensity (I)
• Intensity
  • Visual cue: amplitude of vibration
  • Computation:
    • Volume
    • Log energy (in decibels)
• Characteristics
  • Influenced by
    • Microphone types
    • Microphone setups
  • Perceived volume is influenced by frequency and timbre
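The slide names the two measures without spelling them out in this transcript. A common choice, assumed in the sketch below, is the sum of absolute sample values for volume and ten times the base-10 logarithm of the sum of squared samples for log energy. The test frame and variable names are hypothetical.

```matlab
% Sketch: volume and log energy of one frame, under assumed definitions.
fs = 16000;                               % sample rate in Hz (assumed)
frame = 0.5*sin(2*pi*440*(0:319)/fs);     % one 20 ms frame of a test tone
volume = sum(abs(frame));                 % assumed definition: sum of |s_i|
logEnergy = 10*log10(sum(frame.^2));      % assumed definition, in decibels

% Doubling the amplitude doubles the volume and raises the log energy
% by 20*log10(2), roughly 6 dB.
louder = 2*frame;
volume2 = sum(abs(louder));
logEnergy2 = 10*log10(sum(louder.^2));
fprintf('Volume ratio: %g, dB difference: %.2f\n', ...
    volume2/volume, logEnergy2 - logEnergy);
```

An all-zero frame would make the log energy negative infinity, so practical code often adds a small constant before taking the logarithm.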
Intensity (II)
• To avoid DC drifting
  • DC drifting: the vibration is not centered around zero
  • Computation:
    • Volume
    • Log energy (in decibels)
  • Theoretical background (How to prove?)
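One way to remove DC drift, assumed here because the slide's formulas are not in this transcript, is to subtract the frame's median before summing absolute values and the frame's mean before summing squares. The background fact is that the median minimizes the sum of absolute deviations and the mean minimizes the sum of squared deviations. The sketch below checks this numerically on a hypothetical frame with an artificial offset.

```matlab
% Sketch: DC-compensated volume and log energy (assumed definitions).
fs = 16000;                                          % sample rate in Hz (assumed)
frame = 0.5*sin(2*pi*440*(0:319)/fs) + 0.3;          % test tone with a DC offset of 0.3
rawVolume = sum(abs(frame));                         % volume without compensation
volume = sum(abs(frame - median(frame)));            % subtract the median first
sqSum = sum((frame - mean(frame)).^2);               % subtract the mean first
logEnergy = 10*log10(sqSum);                         % log energy in decibels

% Numerical check over a grid of candidate offsets.
c = linspace(-1, 1, 2001);                           % candidate offsets
absCost = arrayfun(@(v) sum(abs(frame - v)), c);     % cost for each offset
sqCost  = arrayfun(@(v) sum((frame - v).^2), c);
[~, iAbs] = min(absCost);     % c(iAbs) is expected to lie near median(frame)
[~, iSq]  = min(sqCost);      % c(iSq) is expected to lie near mean(frame)
fprintf('median %.3f vs %.3f, mean %.3f vs %.3f\n', ...
    median(frame), c(iAbs), mean(frame), c(iSq));
```

The grid search only illustrates the claim; it is not a proof, which the slide leaves as an exercise.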
Intensity (III) • Examples • Please refer to the online tutorial
Pitch
• Definition
  • Pitch is also known as the fundamental frequency, which equals the number of fundamental periods per second. The unit used here is Hertz (Hz).
  • More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz.
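The conversion formula itself is not in this transcript. A widely used convention, assumed here, is the MIDI-style mapping in which 440 Hz corresponds to semitone 69 and each octave spans 12 semitones. The variable names are hypothetical.

```matlab
% Sketch: Hertz <-> semitone conversion under the assumed MIDI-style convention.
freq = [220 440 880];                        % pitch in Hz
semitone = 69 + 12*log2(freq/440);           % 440 Hz maps to semitone 69
backToHz = 440*2.^((semitone - 69)/12);      % inverse conversion
disp([freq; semitone; backToHz]);
```

Under this convention, doubling the frequency raises the pitch by exactly 12 semitones, so the three test frequencies are one octave apart.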
Pitch Computation (I) • Pitch of tuning forks
Pitch Computation (II) • Pitch of speech
Statistics of Mandarin Chinese
• 5401 characters, each associated with at least one base syllable and a tone
• 411 base syllables, and most syllables have 4 tones, giving 1501 tonal syllables
• Tone is characterized by its pitch curve:
  • Tone 1: high-high
  • Tone 2: low-high
  • Tone 3: high-low-high
  • Tone 4: high-low
• Some examples of tones:
  • 1242: 清華大學
  • 1234: 三民主義、優柔寡斷、搭達打大、依宜以易、夫福府負
  • ?????: 美麗大教堂、滷蛋有夠鹹 (Taiwanese)
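Sinusoidal Signals
• How to generate a stream of sinusoidal signals
  fs=16000;                             % sample rate (Hz)
  duration=3;                           % length (sec)
  f=440;                                % frequency (Hz)
  t=(1:fs*duration)/fs;                 % time vector (sec)
  y=0.8*sin(2*pi*f*t);                  % sinusoid with amplitude 0.8
  plot(t,y); axis([0.6, 0.65, -1 1]);   % zoom in on 0.6 to 0.65 sec
  sound(y, fs);                         % play the signal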
Zero Crossing Rate
• Zero crossing rate (ZCR)
  • The number of zero crossings in a frame.
• Characteristics:
  • Noise and unvoiced sounds have high ZCR.
  • ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
  • To distinguish noise/silence from unvoiced sound, we usually add a bias before computing ZCR.
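To make the first characteristic concrete, the sketch below counts sign changes in one frame of a low-frequency tone (standing in for a voiced sound) and in one frame of white noise. The helper countZC, the 200 Hz tone, and the frame length are assumptions chosen for illustration.

```matlab
% Sketch: ZCR of a low-frequency tone versus white noise.
countZC = @(x) sum(x(1:end-1).*x(2:end) < 0);   % hypothetical helper: sign changes in a frame
fs = 16000;                                     % sample rate in Hz (assumed)
n = 0:319;                                      % one 20 ms frame
tone = sin(2*pi*200*n/fs + 0.1);                % 200 Hz tone, stands in for a voiced sound
noise = randn(1, 320);                          % white noise frame
zcrTone = countZC(tone);                        % a few crossings per period of the tone
zcrNoise = countZC(noise);                      % many crossings, since signs change often
fprintf('ZCR tone: %d, ZCR noise: %d\n', zcrTone, zcrNoise);
```

The noise frame is random, so its count varies from run to run, but it stays far above the count for the tone.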
ZCR Computations
• Two types of ZCR definition
  • If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
  • The choice affects the ZCR, especially when the sample rate is low.
• Other considerations
  • Zero-justification is required.
  • ZCR with a shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
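The two definitions and the effect of a shift can be compared on small hand-made frames. Everything in this sketch is hypothetical: the sample values, the helper countZC, and the shift amount of 0.02, which is picked only to exceed the amplitude of the simulated silence.

```matlab
% Sketch: two ZCR definitions and ZCR with a shift (all values hypothetical).
frame = [3 0 -2 0 0 4 -1 0];                             % samples containing exact zeros
zcrStrict = sum(frame(1:end-1).*frame(2:end) <  0);      % zero samples not counted
zcrLoose  = sum(frame(1:end-1).*frame(2:end) <= 0);      % zero samples counted

countZC = @(x) sum(x(1:end-1).*x(2:end) < 0);            % strict definition as a helper
silence  = 0.01*[1 -1 1 -1 1 -1 1 -1];                   % tiny fluctuations around zero
unvoiced = 0.5*[1 -1 1 -1 1 -1 1 -1];                    % larger fluctuations
shift = 0.02;                                            % assumed shift amount
zcrSilenceRaw = countZC(silence);                        % high, same as the unvoiced frame
zcrSilence  = countZC(silence - shift);                  % shifted silence stays below zero
zcrUnvoiced = countZC(unvoiced - shift);                 % shifted unvoiced still crosses zero
fprintf('strict %d, loose %d, silence %d -> %d, unvoiced %d\n', ...
    zcrStrict, zcrLoose, zcrSilenceRaw, zcrSilence, zcrUnvoiced);
```

Choosing the shift for real recordings depends on the noise floor, which is the open question the slide poses.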
ZCR • Examples • Please refer to the online tutorial.