Speech Coding Basics. A Tutorial. Mahdi Amiri Supervisor Dr. H. R. Rabiee April 2009 Sharif University of Technology. Speech Coding. A road map. PCM DPCM ADPCM LPC CELP. Pulse-code Modulation (PCM). Basics. Digital Representation of an Analog Signal Sampling and Quantization
Speech Coding Basics: A Tutorial
Mahdi Amiri
Supervisor: Dr. H. R. Rabiee
April 2009, Sharif University of Technology
Speech Coding: A Road Map
• PCM
• DPCM
• ADPCM
• LPC
• CELP
Speech Coding Basics
Pulse-code Modulation (PCM): Basics
• Digital representation of an analog signal
• Sampling and quantization
• Parameters:
  • Sampling rate (samples per second)
  • Quantization levels (bits per sample)

Pulse-code Modulation (PCM): Why Call it PCM?
[Figure: 4-bit PCM]
Pulse-code Modulation (PCM): Bits per Second (bit/s)
• How to choose a proper…
  • Sampling rate
    • 8 kHz?
  • Quantization level
    • 8 bit/sample?
• Bits per second for 8000 Hz, 8-bit PCM
  • 8000 samples/s × 8 bit/sample = 64 kbit/s
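The 64 kbit/s figure is simply the sampling rate multiplied by the bits per sample. A minimal sketch of that arithmetic follows; the function name `pcm_bit_rate` and the `channels` parameter are illustrative assumptions, and the CD example (44.1 kHz, 16 bit, stereo) is added only for comparison and is not from the slide.

```python
def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the raw (uncompressed) PCM bit rate in bit/s."""
    return sampling_rate_hz * bits_per_sample * channels


# Telephone-quality PCM from the slide: 8000 Hz, 8 bit/sample, mono.
telephone = pcm_bit_rate(8000, 8)

# Comparison case (assumption, not on the slide): stereo 16-bit audio at 44.1 kHz.
cd_audio = pcm_bit_rate(44100, 16, channels=2)

print(telephone / 1000, "kbit/s")  # 64.0 kbit/s
print(cd_audio / 1000, "kbit/s")   # 1411.2 kbit/s
```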
Pulse-code Modulation (PCM): Sampling Rate
• Human hearing frequency range
  • 20 Hz to 20 kHz
• Play with "HearTest" to test your hearing
• Most people find that their hearing is most sensitive around 1-4 kHz and less sensitive at high and low frequencies.

Pulse-code Modulation (PCM): Hearing Range

Pulse-code Modulation (PCM): Sampling Rate
• Human vocal range
  • Normal: 80 Hz to 1100 Hz
  • Charles Kellogg (14 kHz) (not verified)
• Guinness Book of Records
  • Female: Georgia Brown (eight octaves, 25,087 Hz)
  • Male: Tim Storms (six octaves)

Pulse-code Modulation (PCM): Common Sampling Rates
• 8,000 Hz - telephone, adequate for human speech
• 11,025 Hz
• 22,050 Hz - radio
• 32,000 Hz - miniDV digital video camcorder, DAT (LP mode)
• 44,100 Hz - audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3)
• 48,000 Hz - digital sound used for miniDV, digital TV, DVD, DAT, films and professional audio
• 96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD tracks, BD-ROM (Blu-ray Disc) audio tracks, and HD-DVD (High-Definition DVD) audio tracks
• 2.8224 MHz - SACD, 1-bit sigma-delta modulation process known as Direct Stream Digital, co-developed by Sony and Philips
Pulse-code Modulation (PCM): Quantization Levels
• Goal: prevent listener fatigue by minimizing quantization noise
• Signal-to-noise ratio ≈ 6.02B dB for B bits per sample (plus a small constant that depends on the signal, e.g. 1.76 dB for a full-scale sine wave)
• SNR grows by approximately 6 dB per bit
• 16-bit => 96 dB
• Above 36 dB is required
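The 6 dB-per-bit rule can be checked numerically. The sketch below quantizes a full-scale sine wave with a uniform quantizer and measures the resulting SNR; the mid-rise quantizer, the test frequency, and the function names are assumptions chosen for illustration, not details from the slides. For a full-scale sine the textbook value is 6.02B + 1.76 dB.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Uniform mid-rise quantizer for inputs in [-1, 1]."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(x / step)
    # Clamp so that x == 1.0 falls into the top cell instead of overflowing.
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step


def measured_snr_db(bits: int, n: int = 20000) -> float:
    """Quantize a full-scale sine and return the measured SNR in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        # Arbitrary test frequency (cycles per sample), assumed for the demo.
        x = math.sin(2.0 * math.pi * 0.01234567 * i)
        error = x - quantize(x, bits)
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


for b in (4, 8, 12):
    print(b, "bits:", round(measured_snr_db(b), 2), "dB measured,",
          round(6.02 * b + 1.76, 2), "dB predicted")
```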
Pulse-code Modulation (PCM): Good to Know
• The average person cannot tell the difference between audio encoded at a bit rate above 192 kbit/s and the original CD/WAV.
• Even if your headphones seal really well around your ears, they will probably give you only about 20 to 25 dB of insulation from external sound.

Pulse-code Modulation (PCM): Images

Pulse-code Modulation (PCM): μ-law, A-law
• Nonuniform quantizers: difficult to make, expensive.
• Solution: companding (compress, uniform quantization, expand)

Pulse-code Modulation (PCM): μ-law, A-law
Pulse-code Modulation (PCM): μ-law, A-law
• A-law: Europe
• μ-law: North America and Japan
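The companding chain (compress, quantize uniformly, expand) can be sketched with the continuous μ-law curve, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255. This is only an illustration: deployed telephone codecs use a piecewise-linear approximation of this curve, and the quantizer and function names below are assumptions.

```python
import math

MU = 255.0  # value used with 8-bit mu-law


def mu_compress(x: float) -> float:
    """Continuous mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def uniform_quantize(v: float, bits: int) -> float:
    """Uniform mid-rise quantizer for v in [-1, 1]."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = max(-levels // 2, min(levels // 2 - 1, math.floor(v / step)))
    return (index + 0.5) * step


def companded_quantize(x: float, bits: int = 8) -> float:
    """Compress, quantize uniformly, then expand."""
    return mu_expand(uniform_quantize(mu_compress(x), bits))


quiet = 0.01  # a low-amplitude sample, where companding helps most
print("uniform error:  ", abs(uniform_quantize(quiet, 8) - quiet))
print("companded error:", abs(companded_quantize(quiet, 8) - quiet))
```

The companded path spends more of its 256 levels on small amplitudes, so the quiet sample is reproduced with a smaller error than with plain uniform quantization at the same bit depth.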
Differential PCM (DPCM): Idea

Differential PCM (DPCM): Basic Scheme
• General predictive coding
• Problem?
Differential PCM (DPCM): Better Structure
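The figure for this slide did not survive extraction. The sketch below assumes the usual closed-loop DPCM arrangement, in which the encoder predicts from the reconstructed signal (the same values the decoder sees) so that quantization errors do not accumulate. The first-order predictor (previous reconstructed sample), the step size, and the function names are illustrative assumptions.

```python
import math


def dpcm_encode(samples, step):
    """Closed-loop DPCM: quantize the difference to the reconstructed signal."""
    codes = []
    prediction = 0.0  # previous reconstructed sample
    for s in samples:
        code = round((s - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step  # track exactly what the decoder rebuilds
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


step = 0.05
samples = [math.sin(0.1 * i) for i in range(200)]
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
max_error = max(abs(s - d) for s, d in zip(samples, decoded))
print("largest code:", max(abs(c) for c in codes))
print("max reconstruction error:", max_error)
```

Because the encoder mirrors the decoder, the reconstruction error of every sample stays within half a quantization step, and the transmitted codes are small integers that need fewer bits than the raw samples.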
Adaptive DPCM (ADPCM): Idea
• Problem?
Adaptive DPCM (ADPCM): Size of Quantization Step
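One common way to adapt the quantization step is to enlarge it when the quantizer output lands in its outer levels and to shrink it when the output stays in the inner levels. The sketch below follows that idea on top of a simple closed-loop DPCM coder. The multipliers 1.5 and 0.9, the step limits, and all names are assumptions for illustration; this is not the adaptation rule of any standardized ADPCM codec.

```python
import math

MIN_STEP, MAX_STEP = 1e-4, 1.0  # assumed safety limits for the step size


def next_step(step, code, max_mag):
    """Grow the step after large codes, shrink it after small ones."""
    factor = 1.5 if abs(code) > max_mag // 2 else 0.9  # illustrative multipliers
    return max(MIN_STEP, min(MAX_STEP, step * factor))


def adpcm_encode(samples, bits=4, step=0.02):
    max_mag = 2 ** (bits - 1) - 1  # symmetric code range, e.g. -7..7 for 4 bits
    codes, prediction = [], 0.0
    for s in samples:
        code = round((s - prediction) / step)
        code = max(-max_mag, min(max_mag, code))  # clip to the available levels
        codes.append(code)
        prediction += code * step
        step = next_step(step, code, max_mag)
    return codes


def adpcm_decode(codes, bits=4, step=0.02):
    max_mag = 2 ** (bits - 1) - 1
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
        step = next_step(step, code, max_mag)  # same rule as the encoder
    return out


samples = [math.sin(0.05 * i) for i in range(400)]
codes = adpcm_encode(samples)
decoded = adpcm_decode(codes)
max_error = max(abs(s - d) for s, d in zip(samples, decoded))
print("max reconstruction error:", max_error)
```

The step size is never transmitted: the decoder derives it from the received codes with the same rule as the encoder, so both sides stay in sync as long as they start from the same initial step.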
Speech Compression Concepts: Spectrogram, STFT
[Figure: 3D surface spectrogram of part of a music piece]
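A spectrogram is the magnitude of the short-time Fourier transform (STFT): the signal is cut into overlapping frames, each frame is windowed, and a DFT is taken per frame. The sketch below uses a deliberately naive DFT so that it needs only the standard library; the frame length, hop size, Hann window, and function name are assumptions for the demo, and a real implementation would use an FFT.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len/2) per frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Test tone that falls exactly on DFT bin 8 of a 64-sample frame.
tone = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram = stft_magnitudes(tone)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
print("frames:", len(spectrogram), "peak bins:", peak_bins)
```

Each inner list is one column of the spectrogram; plotting the columns side by side against time gives the familiar time-frequency image.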
Speech Compression Concepts: Spectrogram
[Figure: spectrogram of a male voice saying 'nineteenth century']

Speech Compression Concepts: Spectrogram, Demonstration
• Bat echolocation call
• Flute by Jean Pierre Rampal
• Face!
• Singing voice

Speech Compression Concepts: Formant

Linear Predictive Coding (LPC): Modeling
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
[Figure labels: Buzzer, Filter]
• Chunks: 30 to 50 frames per second
• Speech = formants + residue
• Predictor for each frame:
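The predictor equation from this slide was not recovered. A standard formulation, assumed here, predicts each sample from the previous p samples, s[n] ≈ a1·s[n-1] + ... + ap·s[n-p], and finds the coefficients per frame from the frame's autocorrelation using the Levinson-Durbin recursion. The sketch below applies that to a synthetic frame; the function names and the test signal are illustrative assumptions.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] and the residual energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error


# Synthetic frame: impulse response of s[n] = 1.3 s[n-1] - 0.6 s[n-2].
frame = [1.0, 1.3]
for _ in range(398):
    frame.append(1.3 * frame[-1] - 0.6 * frame[-2])

coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
print("estimated predictor coefficients:", coeffs)
```

In an LPC coder only the coefficients (which capture the formants) and a compact description of the excitation are sent for each frame, rather than the samples themselves.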
Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)
Code Excited Linear Prediction (CELP)
• Problem of LPC
  • Segments that contain both hiss and buzz
• Solution
  • Encode the residue
• Method
  • Vector quantization (codebook)
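Vector quantization of the residue amounts to a codebook search: for each residue vector, pick the codebook entry (and a gain) that matches it best, and transmit only the index and the gain. The sketch below shows that search in its simplest form. It is a simplification under stated assumptions: a real CELP coder compares candidates after passing them through the synthesis filter with perceptual weighting, and the tiny codebook and function name here are made up for the demo.

```python
def best_codeword(target, codebook):
    """Return (index, gain, squared_error) of the best-matching codeword."""
    best = (None, 0.0, float("inf"))
    for index, code in enumerate(codebook):
        energy = sum(c * c for c in code)
        if energy == 0.0:
            continue  # an all-zero codeword cannot be scaled to fit
        # Optimal gain for this codeword in the least-squares sense.
        gain = sum(t * c for t, c in zip(target, code)) / energy
        error = sum((t - gain * c) ** 2 for t, c in zip(target, code))
        if error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical 4-entry codebook of length-4 excitation vectors.
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
]
residue = [0.5, -0.5, 0.5, -0.5]
index, gain, error = best_codeword(residue, codebook)
print("index:", index, "gain:", gain, "error:", error)
```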
Comparison: Sample Speech
• A lathe is a big tool.
• Grab every dish of sugar.

Comparison: Demonstration
• Original
• ADPCM
• CELP
• LPC

Speech Coding Basics: A Tutorial
Thank You
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.aictct.ir/dml/