Basic Features of Audio Signals (音訊的基本特徵)

Jyh-Shing Roger Jang (張智星), http://www.cs.nthu.edu.tw/~jang, MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan


Presentation Transcript


  1. Basic Features of Audio Signals (音訊的基本特徵)
     Jyh-Shing Roger Jang (張智星)
     http://www.cs.nthu.edu.tw/~jang
     MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan

  2. Audio Features
     • Four commonly used audio features
       • Volume
       • Pitch
       • Zero crossing rate
       • Timbre
     • Our goal
       • These features can be perceived subjectively.
       • But we need to compute them quantitatively for further processing and recognition.

  3. Audio Features in Time Domain
     • Audio features presented in the time domain
       • Fundamental period (FP)
       • Intensity
       • Timbre: Waveform within an FP

  4. Audio Features in Frequency Domain
     • Volume: Magnitude of spectrum
     • Pitch: Distance between harmonics
     • Timbre: Smoothed spectrum
     (Figure labels: first formant F1, second formant F2, pitch frequency, intensity)

  5. Demo: Real-time Spectrogram
     • Try “dspstfft_audio” under MATLAB
     (Figures: spectrum display and spectrogram display)

  6. Steps for Audio Feature Extraction
     • Frame blocking
       • Frame duration of 20 ms or so
     • Feature extraction
       • Volume, zero-crossing rate, pitch, MFCC, etc.
     • Endpoint detection
       • Usually based on volume and zero-crossing rate

  7. Frame Blocking
     • Sample rate = 11025 Hz
     • Frame size = 256 samples
     • Overlap = 84 samples (hop size = 256 - 84 = 172 samples)
     • Frame rate = 11025/(256 - 84) ≈ 64 frames/sec
     (Figure labels: overlap, frame, zoom in)
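
The frame-blocking settings above can be sketched in plain Python as follows. This is a minimal illustration, not the author's code: the function name `frame_blocking` and the choice to drop trailing samples that do not fill a whole frame are assumptions.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption; zero-padding the last frame is another common choice).
    """
    hop = frame_size - overlap  # 172 samples with the slide's settings
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


fs = 11025                      # sample rate in Hz, as on the slide
signal = list(range(fs))        # one second of placeholder samples
frames = frame_blocking(signal)
frame_rate = fs / (256 - 84)    # about 64.1 frames per second
print(len(frames), round(frame_rate, 1))
```

Consecutive frames share their last and first 84 samples, so each new frame advances by the hop size of 172 samples.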

  8. Intensity (I)
     • Intensity
       • Visual cue: Amplitude of vibration
       • Computation:
         • Volume:
         • Log energy (in decibels):
     • Characteristics
       • Influenced by
         • Microphone types
         • Microphone setups
       • Perceived volume is influenced by frequency and timbre
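
The two quantities listed under Computation can be sketched in Python as follows. The definitions used here are assumptions, since the slide text does not spell them out: volume is taken as the sum of absolute sample values in a frame, and log energy as 10·log10 of the sum of squared samples. The function names and the `eps` guard against log(0) are hypothetical.

```python
import math


def volume(frame):
    """Sum of absolute sample values in the frame (assumed definition)."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame, eps=1e-12):
    """10*log10 of the sum of squared samples; eps avoids log(0)."""
    energy = sum(s * s for s in frame)
    return 10.0 * math.log10(energy + eps)


# A 256-sample frame of a 440 Hz sinusoid sampled at 16 kHz.
frame = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(256)]
louder = [2 * s for s in frame]

print(volume(frame), log_energy_db(frame))
print(log_energy_db(louder) - log_energy_db(frame))  # about 6.02 dB
```

Doubling the amplitude doubles the volume but adds only about 6 dB of log energy, which is one reason the decibel scale is often preferred when comparing loudness.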

  9. Intensity (II)
     • To avoid DC drifting
       • DC drifting: The vibration is not centered around zero
       • Computation:
         • Volume:
         • Log energy (in decibels):
     • Theoretical background (How to prove?)
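
One way to remove DC drift before measuring intensity is to subtract a per-frame offset. The sketch below assumes the median is subtracted for volume and the mean for log energy. A standard justification, which may be what the theoretical-background bullet alludes to, is that the median minimizes the sum of absolute deviations while the mean minimizes the sum of squared deviations. The function names are hypothetical.

```python
import math
from statistics import mean, median


def volume_zero_justified(frame):
    """Sum of absolute deviations from the frame median."""
    m = median(frame)
    return sum(abs(s - m) for s in frame)


def log_energy_zero_justified(frame, eps=1e-12):
    """10*log10 of the sum of squared deviations from the frame mean."""
    mu = mean(frame)
    return 10.0 * math.log10(sum((s - mu) ** 2 for s in frame) + eps)


# A 100 Hz sinusoid sampled at 8 kHz: 160 samples cover exactly two periods.
clean = [0.3 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
drifted = [s + 0.2 for s in clean]   # same vibration with a DC offset of 0.2

print(volume_zero_justified(clean), volume_zero_justified(drifted))
print(log_energy_zero_justified(clean), log_energy_zero_justified(drifted))
```

Both measures give essentially the same value for the clean and the drifted frame, because the offset is removed before the magnitudes are accumulated.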

  10. Intensity (III)
      • Examples
        • Please refer to the online tutorial

  11. Pitch
      • Definition
        • Pitch is also known as fundamental frequency, which equals the number of fundamental periods per second. The unit is hertz (Hz).
        • More commonly, pitch is expressed in semitones, which can be computed from the pitch in hertz:
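
The conversion from hertz to semitones can be sketched as follows. This assumes the MIDI convention, in which A4 = 440 Hz corresponds to semitone 69 and each octave spans 12 semitones. The slide's own formula is not shown in the text, so treat this as an illustration; the function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert frequency in Hz to a MIDI-style semitone number (440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion from semitone number back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


print(hz_to_semitone(440.0))   # 69.0
print(hz_to_semitone(880.0))   # 81.0
```

The logarithmic scale matches pitch perception: doubling the frequency always adds 12 semitones, regardless of the starting pitch.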

  12. Pitch Computation (I)
      • Pitch of tuning forks
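
For a clean, nearly sinusoidal signal such as a tuning fork, pitch can be estimated by measuring the fundamental period directly. The sketch below is one simple approach and not necessarily the method used on the slide: it averages the spacing between upward zero crossings of a synthetic 440 Hz tone. The function name and the linear interpolation step are assumptions.

```python
import math


def pitch_from_zero_crossings(signal, fs):
    """Estimate pitch (Hz) from the average spacing of upward zero crossings.

    Suitable only for clean, nearly sinusoidal signals such as a tuning fork.
    """
    crossings = []
    for n in range(1, len(signal)):
        if signal[n - 1] < 0 <= signal[n]:
            # Linear interpolation for a sub-sample crossing position.
            frac = -signal[n - 1] / (signal[n] - signal[n - 1])
            crossings.append(n - 1 + frac)
    if len(crossings) < 2:
        return None
    # Average fundamental period in samples.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return fs / period


fs = 16000
f = 440
# A quarter second of a synthetic "tuning fork" tone.
y = [0.8 * math.sin(2 * math.pi * f * n / fs + 1.0) for n in range(fs // 4)]
print(pitch_from_zero_crossings(y, fs))  # close to 440
```

Speech has a richer waveform within each fundamental period, so this crossing-based estimate is not reliable there; methods such as autocorrelation are typically used instead.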

  13. Pitch Computation (II)
      • Pitch of speech

  14. Statistics of Mandarin Chinese
      • 5401 characters, each associated with at least one base syllable and a tone
      • 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
      • Tone is characterized by the pitch curves:
        • Tone 1: high-high
        • Tone 2: low-high
        • Tone 3: high-low-high
        • Tone 4: high-low
      • Some examples of tones:
        • 1242: 清華大學 (Tsing Hua University)
        • 1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大 (dā dá dǎ dà), 依宜以易 (yī yí yǐ yì), 夫福府負 (fū fú fǔ fù)
        • ?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (the braised egg is really salty; Taiwanese)

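  15. Sinusoidal Signals
      • How to generate a stream of sinusoidal signals
          fs = 16000;                  % sample rate (Hz)
          duration = 3;                % length of the signal (seconds)
          f = 440;                     % frequency of the sinusoid (Hz)
          t = (1:fs*duration)/fs;      % time axis (seconds)
          y = 0.8*sin(2*pi*f*t);       % sinusoid with amplitude 0.8
          plot(t, y);
          axis([0.6, 0.65, -1, 1]);    % zoom in on a 50 ms window
          sound(y, fs);                % play the signal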

  16. Zero Crossing Rate
      • Zero crossing rate (ZCR)
        • The number of zero crossings in a frame.
      • Characteristics:
        • Noise and unvoiced sounds have high ZCR.
        • ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
        • To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.
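
A minimal Python sketch of ZCR follows. It counts strict sign changes between consecutive samples in a 20 ms frame and compares a low-frequency sinusoid with uniform random noise. The test signals are synthetic stand-ins for voiced sound and noise, and the function name is hypothetical.

```python
import math
import random


def zero_crossing_rate(frame):
    """Count sign changes between consecutive samples in a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


fs = 16000
n = 320  # a 20 ms frame at 16 kHz

# 200 Hz sinusoid: a rough stand-in for a voiced sound.
voiced_like = [math.sin(2 * math.pi * 200 * k / fs + 0.5) for k in range(n)]

# Uniform random samples: a rough stand-in for noise or unvoiced sound.
rng = random.Random(0)
noise_like = [rng.uniform(-1, 1) for _ in range(n)]

print(zero_crossing_rate(voiced_like), zero_crossing_rate(noise_like))
```

The sinusoid crosses zero only twice per period, while the noise changes sign at roughly every other sample, which is why a high ZCR is a useful cue for noise and unvoiced sounds.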

  17. ZCR Computations
      • Two types of ZCR definition
        • If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise it is lower.
        • The choice affects the ZCR, especially when the sample rate is low.
      • Other considerations
        • Zero-justification (removing the DC offset) is required.
        • ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
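
The two ZCR definitions and the shifted variant can be compared with a small sketch. The function names, the sample values, and the shift amount of 3 are all hypothetical and chosen only to make the differences visible.

```python
def zcr_strict(frame):
    """Definition 1: count only strict sign changes (product < 0)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def zcr_with_zeros(frame):
    """Definition 2: also count pairs that touch zero (product <= 0)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)


def zcr_shifted(frame, shift):
    """ZCR after subtracting a bias, so low-level noise stays on one side."""
    return zcr_strict([s - shift for s in frame])


frame = [3, 0, -2, 0, 0, 1, -1, 4]
print(zcr_strict(frame), zcr_with_zeros(frame))             # 2 7

silence = [1, -1, 2, -2, 1, -1, 0, 1]    # low-level background noise
print(zcr_strict(silence), zcr_shifted(silence, shift=3))   # 5 0
```

With integer samples, zeros are frequent, so the two definitions diverge sharply. The shift pushes low-amplitude background noise entirely below zero, so it no longer registers as crossings, while a louder unvoiced sound would still cross the shifted level.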

  18. ZCR
      • Examples
        • Please refer to the online tutorial.
