1. SPEECH CODING & APPLICATION
2. Introduction to Speech & Waveform Coders
By: Ahmed Mohamed Elshaer
3. What is Speech? Speech is the primary method of human communication.
Speech coding aims to transmit or store a speech waveform using as few bits as possible while retaining high quality.
4. Why Speech Coding? The goal of a speech coding system is to transmit speech with the highest possible quality using the least possible channel capacity.
It saves bandwidth in telecommunications applications and reduces memory storage requirements.
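As a rough illustration of the saving, the bit rate of a waveform coder is its sampling rate times its bits per sample. The sketch below assumes the usual 8 kHz telephone sampling rate, with 8 and 4 bits per sample chosen because they give the 64 kbps PCM and 32 kbps ADPCM rates listed later in these slides; the helper names are hypothetical.

```python
def bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of a waveform coder = samples per second x bits per sample."""
    return sample_rate_hz * bits_per_sample


def storage_bytes(duration_s: float, rate_bps: int) -> float:
    """Memory needed to store duration_s seconds of speech at rate_bps."""
    return duration_s * rate_bps / 8


FS = 8000  # assumed telephone-band sampling rate (Hz)

pcm = bit_rate_bps(FS, 8)    # 8-bit PCM
adpcm = bit_rate_bps(FS, 4)  # 4-bit ADPCM

print(pcm, adpcm)                # 64000 32000
print(pcm / adpcm)               # 2.0 (compression ratio)
print(storage_bytes(60, pcm))    # 480000.0 bytes for one minute of PCM
print(storage_bytes(60, adpcm))  # 240000.0 bytes for one minute of ADPCM
```

Halving the bits per sample halves both the channel capacity and the storage needed for the same duration of speech.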
5. Speech Process: 1- Production:
6. Speech Process: 2- Propagation:
The sound waves propagate through the air at a speed of approximately 340 m/s, reaching the listener’s ears.
7. Speech Process: 3- Perception:
The listener deciphers the incoming sounds into a received message, completing the chain of events that transfers information from the speaker to the listener.
8. The Vocal Tract:
9. The Vocal Tract:
10. Sources of Sound Energy: 1- Turbulence: air moving quickly through a narrow constriction (e.g. /s/ in “size”).
2- Explosion: pressure built up behind a blockage is suddenly released (e.g. /p/ in “pop”).
3- Vocal Fold Vibration: like the neck of a balloon (e.g. /a/ in “hard”).
11. Speech Sound Categories: 1-Voiced: speech sounds in which the vocal folds vibrate.
2-Vowels: no blockage of the vocal tract and no turbulence
12. Speech Sound Categories: 3-Consonants: non-vowels.
4-Plosives: consonants involving an explosion
13. The Vocal Tract Filter
14. Speech Spectrogram: Ex: my speech
15. Speech Coding Hierarchy:
16. Characteristics of Speech Signals: 1-Probability Density Function (PDF):
The PDF of a speech signal is, in general, characterized by a very high probability of near-zero amplitudes, a significant probability of very high amplitudes, and a monotonically decreasing probability for amplitudes between these extremes.
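A density often used to model this shape (an assumption here; the slide does not name a model) is the zero-mean Laplacian with standard deviation σx. It peaks at x = 0 and decays exponentially with |x|, matching the description above; the gamma density is a common, more sharply peaked alternative.

```latex
p(x) = \frac{1}{\sqrt{2}\,\sigma_x} \exp\!\left(-\frac{\sqrt{2}\,|x|}{\sigma_x}\right)
```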
17. Characteristics of Speech Signals: 2-Autocorrelation Function (ACF):
The ACF gives a quantitative measure of the closeness or similarity between samples of a speech signal as a function of their time separation.
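A minimal sketch of the normalized ACF, r(k) = Σ x[n]x[n+k] / Σ x[n]², in plain Python. The test signal is a hypothetical 200 Hz tone sampled at 8 kHz, standing in for a strongly periodic voiced segment: neighboring samples are highly correlated, and the ACF peaks again at a lag of one period (40 samples). This is the kind of redundancy that predictive coders exploit.

```python
import math


def autocorrelation(x, max_lag):
    """Normalized ACF: r(k) = sum_n x[n] x[n+k] / sum_n x[n]^2, for k = 0..max_lag."""
    energy = sum(v * v for v in x)
    if energy == 0:
        raise ValueError("signal has zero energy")
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / energy
            for k in range(max_lag + 1)]


# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz (period = 40 samples).
fs, f0, n = 8000, 200, 400
x = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]

r = autocorrelation(x, 40)
print(round(r[0], 3))   # 1.0: a signal is fully correlated with itself
print(round(r[1], 3))   # close to 1: adjacent samples are very similar
print(round(r[20], 3))  # strongly negative: samples half a period apart
print(round(r[40], 3))  # strongly positive again: one full period apart
```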
18. Characteristics of Speech Signals: 3-Power Spectral Density (PSD):
The nonflat power spectral density of speech makes it possible to obtain significant compression by coding speech in the frequency domain.
The spectral flatness measure (SFM) is defined as the ratio of the arithmetic mean to the geometric mean of the samples of the PSD taken at uniform intervals in frequency.
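The SFM definition translates directly into code. Under this arithmetic-to-geometric convention the SFM equals 1 for a perfectly flat (white) spectrum and grows as the spectrum becomes less flat; some texts use the reciprocal, which lies between 0 and 1. The PSD samples below are hypothetical.

```python
import math


def spectral_flatness(psd):
    """SFM = arithmetic mean / geometric mean of uniformly spaced PSD samples."""
    if any(p <= 0 for p in psd):
        raise ValueError("PSD samples must be positive")
    arithmetic = sum(psd) / len(psd)
    geometric = math.exp(sum(math.log(p) for p in psd) / len(psd))
    return arithmetic / geometric


flat = [1.0] * 8
# Hypothetical non-flat PSD with energy concentrated at low frequencies.
peaky = [16.0, 8.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.125]

print(spectral_flatness(flat))   # 1.0 for a white spectrum
print(spectral_flatness(peaky))  # well above 1 for a nonflat spectrum
```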
19. Quantization Techniques
20. 1-Uniform Quantization: Quantization is the process of mapping a continuous range of amplitudes of a signal into a finite set of discrete amplitudes.
Quantizers can be thought of as devices that remove irrelevancies in the signal. A quantizer that uses n bits can have M = 2^n levels.
The SQNR of a PCM encoder
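As an empirical illustration of PCM SQNR, the sketch below quantizes a full-scale sine with an n-bit mid-rise uniform quantizer and compares the measured SQNR with the well-known rule of thumb 6.02n + 1.76 dB for a full-load sinusoid, i.e. roughly 6 dB per added bit. The quantizer design (mid-rise, clipping at ±x_max) and the test signal are assumptions for illustration.

```python
import math


def uniform_quantize(x, n_bits, x_max=1.0):
    """Mid-rise uniform quantizer with M = 2**n_bits levels on [-x_max, x_max]."""
    m = 2 ** n_bits
    step = 2 * x_max / m
    index = math.floor(x / step)
    index = max(-m // 2, min(m // 2 - 1, index))  # clip to the available levels
    return (index + 0.5) * step


def sqnr_db(signal, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical full-scale sine wave test signal.
x = [math.sin(2 * math.pi * 0.01234 * i) for i in range(20000)]
for n in (4, 8, 12):
    q = [uniform_quantize(v, n) for v in x]
    # bits, measured SQNR, rule-of-thumb SQNR
    print(n, round(sqnr_db(x, q), 1), round(6.02 * n + 1.76, 1))
```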
21. 2-Non Uniform Quantization:
Nonuniform quantizers have step sizes that increase as the distance from the origin of the input-output amplitude characteristic increases.
22. Non Uniform Quantization: Compression law:
In the US: μ-law
In Europe: A-law
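For reference, these are the standard compression characteristics for an input normalized to |x| ≤ 1. The parameter values μ = 255 and A = 87.6 are the ones used in ITU-T G.711, which implements segmented piecewise-linear approximations of these curves.

```latex
F_\mu(x) = \operatorname{sgn}(x)\,\frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)}, \qquad |x| \le 1

F_A(x) = \operatorname{sgn}(x)
\begin{cases}
\dfrac{A|x|}{1 + \ln A}, & 0 \le |x| < \dfrac{1}{A} \\[1ex]
\dfrac{1 + \ln(A|x|)}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1
\end{cases}
```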
23. Non Uniform Quantization: Compander
Compressor, Expander
Compressor + Expander = Compander
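The compander chain can be sketched as compressor, uniform quantizer, expander. The example assumes the μ-law characteristic with μ = 255 and an 8-bit mid-rise quantizer, and compares it with plain uniform quantization on a hypothetical quiet sine at 1% of full scale. For such low-level input the companded chain should give a much higher SQNR, because the compressor spends more quantization levels on small amplitudes.

```python
import math

MU = 255.0  # mu-law parameter (assumed, as in North American telephony)


def compress(x):
    """mu-law compressor: maps |x| <= 1 to |y| <= 1, stretching small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """mu-law expander: exact inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(x, n_bits):
    """Mid-rise uniform quantizer with 2**n_bits levels on [-1, 1]."""
    m = 2 ** n_bits
    step = 2.0 / m
    index = max(-m // 2, min(m // 2 - 1, math.floor(x / step)))
    return (index + 0.5) * step


def sqnr_db(x, xq):
    noise = sum((a - b) ** 2 for a, b in zip(x, xq))
    return 10 * math.log10(sum(a * a for a in x) / noise)


# Hypothetical quiet signal: a sine at 1% of full scale.
x = [0.01 * math.sin(2 * math.pi * 0.01234 * i) for i in range(20000)]

plain = [quantize(v, 8) for v in x]
companded = [expand(quantize(compress(v), 8)) for v in x]

print(round(sqnr_db(x, plain), 1))      # uniform quantizer alone
print(round(sqnr_db(x, companded), 1))  # compressor + quantizer + expander
```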
24. 3-Adaptive Quantization: Adaptive quantization with forward estimation (AQF).
Adaptive quantization with backward estimation (AQB)
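The difference between the two schemes can be sketched briefly. In AQF the step size is estimated from a buffered block of unquantized input and sent to the decoder as side information; in AQB it is estimated from past quantized output, which the decoder also has, so no side information is needed. Below is a minimal AQB example in the style of Jayant's one-word-memory quantizer: a 2-bit quantizer whose step is multiplied after every sample by a factor chosen by the last code. The multiplier values, step limits and test signal are assumptions for illustration.

```python
import math

# Illustrative multipliers for a 2-bit quantizer (one sign bit, one magnitude bit):
# shrink the step after an inner level, grow it after an outer level.
# The values 0.8 and 1.6 are assumptions chosen for this sketch.
MULTIPLIER = (0.8, 1.6)          # indexed by magnitude code 0 (inner) or 1 (outer)
STEP_MIN, STEP_MAX = 1e-4, 1.0   # keep the step inside a sensible range


def adapt(step, mag):
    """Backward step-size update: depends only on the transmitted code."""
    return min(STEP_MAX, max(STEP_MIN, step * MULTIPLIER[mag]))


def aqb_encode(x, step0=0.01):
    """Return one (sign, magnitude) code per sample; no step size is transmitted."""
    step, codes = step0, []
    for v in x:
        mag = min(int(abs(v) / step), 1)
        codes.append((1 if v >= 0 else -1, mag))
        step = adapt(step, mag)
    return codes


def aqb_decode(codes, step0=0.01):
    """Rebuild the same step-size sequence from the codes alone."""
    step, out = step0, []
    for sign, mag in codes:
        out.append(sign * (mag + 0.5) * step)
        step = adapt(step, mag)
    return out


def sqnr_db(x, y):
    noise = sum((a - b) ** 2 for a, b in zip(x, y))
    return 10 * math.log10(sum(a * a for a in x) / noise)


# Hypothetical input whose level jumps from 1% to 50% of full scale.
x = [(0.01 if i < 2000 else 0.5) * math.sin(2 * math.pi * 0.01234 * i)
     for i in range(4000)]

adaptive = aqb_decode(aqb_encode(x))
# Same 2-bit quantizer with its step frozen at 0.01, for comparison.
fixed = [(1 if v >= 0 else -1) * (min(int(abs(v) / 0.01), 1) + 0.5) * 0.01 for v in x]

print(round(sqnr_db(x, fixed), 1))     # fixed step overloads on the loud part
print(round(sqnr_db(x, adaptive), 1))  # adaptive step follows the signal level
```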
25. 4-Vector Quantization: A vector quantizer uses blocks of consecutive samples of the source output to form vectors.
Each vector is encoded by comparing it with a codebook consisting of a set of stored reference vectors, known as code vectors or patterns, and transmitting the index of the best match.
The coded transmission rate is r = log2(N)/L bits per sample, for a codebook of N code vectors of dimension L.
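A minimal full-search vector quantizer, assuming squared-error distortion. The tiny 2-dimensional codebook and the sample values are hypothetical (real speech codebooks are trained, e.g. with the LBG algorithm). The encoder sends only code-vector indices, and the decoder looks them up in the same codebook; with 4 code vectors of dimension 2 the rate is 1 bit per sample.

```python
import math


def nearest_code_vector(vector, codebook):
    """Full search: index of the code vector with minimum squared error."""
    def sq_err(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def vq_encode(samples, codebook):
    """Split samples into blocks of the codebook dimension; send one index per block."""
    dim = len(codebook[0])
    usable = len(samples) - len(samples) % dim  # drop an incomplete last block
    return [nearest_code_vector(samples[i:i + dim], codebook)
            for i in range(0, usable, dim)]


def vq_decode(indices, codebook):
    """Decoder: look up each index in the same codebook."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out


def rate_bits_per_sample(codebook_size, dim):
    """Each block of dim samples costs log2(codebook_size) bits."""
    return math.log2(codebook_size) / dim


# Hypothetical 2-dimensional codebook with 4 code vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
samples = [0.9, 1.2, -0.8, -1.1, 0.1, -0.2, 1.1, -0.9]

idx = vq_encode(samples, codebook)
print(idx)                         # [1, 2, 0, 3]
print(vq_decode(idx, codebook))    # [1.0, 1.0, -1.0, -1.0, 0.0, 0.0, 1.0, -1.0]
print(rate_bits_per_sample(4, 2))  # 1.0
```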
26. ADPCM: PCM encodes speech at a bit rate of 64 kbps.
ADPCM encodes speech at a bit rate of 32 kbps.
ADPCM is used in G.721, CT2 and DECT.
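To show the idea behind ADPCM, the sketch below combines a predictor with a backward-adaptive quantizer: it quantizes the difference between each sample and a prediction made from the previously reconstructed sample, using 4 bits per sample, so at an assumed 8 kHz sampling rate it runs at 32 kbps. It is a deliberately simplified illustration, not G.721: the standard uses an adaptive pole-zero predictor and its own quantizer adaptation, whereas the fixed predictor coefficient, step multipliers and test signal here are all assumptions.

```python
import math

A = 0.9                           # fixed first-order predictor coefficient (assumed)
N_BITS = 4                        # 1 sign bit + 3 magnitude bits per sample
MAX_MAG = 2 ** (N_BITS - 1) - 1   # magnitude codes 0..7
# Illustrative step multipliers, one per magnitude code (assumed values):
# shrink the step after small codes, grow it after large ones.
MULT = (0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4)
STEP_MIN, STEP_MAX = 1e-4, 0.5


def adapt(step, mag):
    """Backward step update driven only by the transmitted code."""
    return min(STEP_MAX, max(STEP_MIN, step * MULT[mag]))


def adpcm_encode(x, step0=0.01):
    """Quantize the prediction residual; return (sign, magnitude) codes."""
    codes, recon, step = [], 0.0, step0
    for v in x:
        pred = A * recon                  # predict from the last *reconstructed* sample
        resid = v - pred                  # prediction residual
        sign = 1 if resid >= 0 else -1
        mag = min(int(abs(resid) / step), MAX_MAG)
        codes.append((sign, mag))
        recon = pred + sign * (mag + 0.5) * step  # same value the decoder will compute
        step = adapt(step, mag)
    return codes


def adpcm_decode(codes, step0=0.01):
    """Rebuild the signal from the codes alone."""
    out, recon, step = [], 0.0, step0
    for sign, mag in codes:
        recon = A * recon + sign * (mag + 0.5) * step
        out.append(recon)
        step = adapt(step, mag)
    return out


# Hypothetical test signal: one second of a 100 Hz tone at half scale, 8 kHz sampling.
FS = 8000
x = [0.5 * math.sin(2 * math.pi * 100 * i / FS) for i in range(FS)]
codes = adpcm_encode(x)
y = adpcm_decode(codes)

noise = sum((a - b) ** 2 for a, b in zip(x, y))
print(round(10 * math.log10(sum(a * a for a in x) / noise), 1))  # SQNR in dB
print(len(codes) * N_BITS)  # 32000 bits for one second of signal
```

Because the encoder predicts from reconstructed samples rather than original ones, encoder and decoder stay in step and the reconstruction error equals the residual's quantization error.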
27. ADPCM
28. Frequency Domain Coding: 1- Sub-Band Coding:
Divide the entire speech band into unequal sub-bands that contribute equally to the articulation index.
Sub-band number   Frequency range
1 200-700 Hz
2 700-1310 Hz
3 1310-2020 Hz
4 2020-3200 Hz
Sub-band coding can be used to code speech at bit rates in the range 9.6 kbps to 32 kbps.
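The bit rate of a sub-band coder is the sum, over bands, of each band's sampling rate times the bits allocated to it. The sketch below uses the band edges from the table and assumes each band is ideally decimated to twice its bandwidth, with a hypothetical allocation that gives the lower bands more bits; the result falls inside the 9.6 to 32 kbps range quoted for sub-band coding.

```python
# Sub-band edges (Hz) taken from the table above.
BANDS = [(200, 700), (700, 1310), (1310, 2020), (2020, 3200)]

# Hypothetical bit allocation: more bits per sample in the lower bands.
BITS = [4, 4, 3, 2]


def sub_band_rate_bps(bands, bits):
    """Total rate if each band is ideally decimated to twice its bandwidth."""
    return sum(2 * (hi - lo) * b for (lo, hi), b in zip(bands, bits))


rate = sub_band_rate_bps(BANDS, BITS)
print(rate)  # 17860 bits per second, i.e. about 17.9 kbps
```

Shifting bits between bands changes both the rate and where the quantization noise lands in frequency, which is why allocation favors the perceptually important lower bands.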
29. Adaptive Transform Coding: encodes speech at bit rates in the range 9.6 kbps to 20 kbps.
It involves block transformations of windowed segments of the input speech waveform.
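A minimal transform-coding sketch, assuming the DCT as the block transform (a common choice for ATC, though the slide does not name one). A real ATC coder allocates bits to the coefficients adaptively from an estimate of the spectral envelope; here that step is replaced by a hypothetical simplification that keeps only the largest coefficients, which is enough to show the energy compaction that makes transform coding efficient. The segment length, window and test tones are assumptions.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of one block."""
    n = len(x)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def keep_largest(c, k):
    """Zero all but the k largest-magnitude coefficients (stand-in for bit allocation)."""
    keep = set(sorted(range(len(c)), key=lambda j: abs(c[j]), reverse=True)[:k])
    return [v if j in keep else 0.0 for j, v in enumerate(c)]


# Hypothetical 64-sample segment (8 ms at 8 kHz): two tones, Hann-windowed.
N, FS = 64, 8000
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
segment = [window[i] * (math.sin(2 * math.pi * 300 * i / FS)
                        + 0.5 * math.sin(2 * math.pi * 700 * i / FS))
           for i in range(N)]

coeffs = dct2(segment)
kept = keep_largest(coeffs, 16)  # keep 16 of 64 coefficients
energy_fraction = sum(v * v for v in kept) / sum(v * v for v in coeffs)
print(round(energy_fraction, 4))  # share of block energy in the kept coefficients
approx = idct2(kept)              # decoder output for this block
```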