Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec Presented by Peter
AMR Narrow Band • Adaptive Multi-Rate codec for narrowband speech (AMR-NB) • Specified by 3GPP for GSM/3G systems • Input: 8 kHz sampling rate, 13-bit PCM • 20 ms frames, no overlap • 8 modes + comfort noise • Output bit rate from 4.75 to 12.2 kbit/s • Algebraic Code Excited Linear Prediction (ACELP) is used as the speech coding algorithm
Speech Encoder • Pre-processing • Linear prediction analysis and quantization • Open-loop pitch analysis • Impulse response computation • Target signal computation • Adaptive codebook • Algebraic codebook • Quantization of the adaptive and fixed codebook gains • Memory update
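Principles of the adaptive multi-rate speech encoder • Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s • A 10th-order linear prediction (LP), or short‑term, synthesis filter is used, given by $H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$, where $\hat{a}_i$ are the quantized LP parameters • The long‑term, or pitch, synthesis filter is given by $\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$, where $T$ is the pitch delay and $g_p$ the pitch gain • The pitch synthesis filter is implemented using the adaptive codebook approach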
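The two synthesis filters can be illustrated with a short floating-point sketch. It assumes the convention $A(z) = 1 + \sum_{i=1}^{10} a_i z^{-i}$ with the coefficient list `[1, a_1, ..., a_10]`; the function names are hypothetical, and the standard itself uses bit-exact fixed-point code and fractional pitch delays rather than the integer delay used here.

```python
# Sketch only: floating-point versions of 1/A(z) and 1/(1 - g_p z^-T).
# Hypothetical helpers; the coefficient list is [1, a_1, ..., a_p].

def lp_synthesis(excitation, a, memory=None):
    """Filter `excitation` through 1/A(z); `memory` holds past outputs, newest last."""
    order = len(a) - 1
    past = list(memory) if memory is not None else [0.0] * order
    out = []
    for x in excitation:
        # s(n) = x(n) - sum_{i=1}^{p} a_i * s(n - i)
        y = x - sum(a[i] * past[-i] for i in range(1, order + 1))
        out.append(y)
        past.append(y)
    return out


def pitch_synthesis(excitation, gain, delay, memory=None):
    """Filter through 1/(1 - g_p z^-T): y(n) = x(n) + g_p * y(n - T), integer T only."""
    past = list(memory) if memory is not None else [0.0] * delay
    out = []
    for x in excitation:
        y = x + gain * past[-delay]
        out.append(y)
        past.append(y)
    return out
```

For example, an impulse through $1/(1 - 0.5 z^{-1})$ decays geometrically, and an impulse through the pitch filter with $g_p = 0.5$, $T = 2$ repeats every two samples at half amplitude.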
Pre-Processing • Two pre‑processing functions • High‑pass filtering • Signal down‑scaling (division by 2) to prevent overflow in the fixed-point implementation • A high-pass filter with a cut-off frequency of 80 Hz is used
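Pre-processing can be sketched as a high-pass filter combined with the division by 2. This is a floating-point stand-in, not the filter from the standard: it designs a generic second-order Butterworth high-pass at 80 Hz using the bilinear-transform formulas of the widely used "Audio EQ Cookbook". The coefficients specified in 3GPP TS 26.090 are not reproduced here, and `preprocess` is a hypothetical name.

```python
import math

# Sketch only: generic 2nd-order Butterworth high-pass (not the TS 26.090 coefficients)
# followed by the down-scaling by 2 described above.

def preprocess(samples, fs=8000.0, fc=80.0):
    q = 1.0 / math.sqrt(2.0)            # Butterworth quality factor
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b0 = (1.0 + cosw) / 2.0 / a0
    b1 = -(1.0 + cosw) / a0
    b2 = b0
    a1 = -2.0 * cosw / a0
    a2 = (1.0 - alpha) / a0

    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        x = 0.5 * x                      # down-scaling by 2 to leave headroom
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A constant (DC) input decays to zero, while a 1 kHz tone passes almost unchanged apart from the factor of 1/2.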
Linear Prediction Analysis • Each frame is split into four 5 ms (40-sample) subframes • 12.2 kbit/s mode • Performed twice per frame • 30 ms asymmetric windows • No lookahead • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • Performed once per frame • 30 ms asymmetric window • 5 ms lookahead
Windowing and Auto-correlation Computation • 12.2 kbit/s mode • Two different asymmetric windows • 1st window concentrates its weight on the 2nd subframe • 2nd window concentrates its weight on the 4th subframe
Windowing and Auto-correlation Computation • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • One asymmetric window • Weight concentrated on the 4th subframe • 5 ms (40 samples) lookahead
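The asymmetric windows can be built from two parts each. This sketch assumes the shapes and lengths given in 3GPP TS 26.090: a half-Hamming rising part of $L_1$ samples followed either by a shorter Hamming half or by a quarter cosine cycle of $L_2$ samples, with $L_1/L_2$ = 160/80 and 232/8 for the two 12.2 kbit/s windows and 200/40 for the other modes. These constants should be checked against the specification before use; the helper names are hypothetical.

```python
import math

# Sketch only: 240-sample (30 ms) asymmetric LP analysis windows.
# Window lengths and shapes are assumptions to be verified against TS 26.090.

def hamming_cos_window(l1, l2):
    """Half Hamming window (l1 samples) followed by a quarter cosine cycle (l2 samples)."""
    rise = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * l1 - 1)) for n in range(l1)]
    fall = [math.cos(2.0 * math.pi * n / (4 * l2 - 1)) for n in range(l2)]
    return rise + fall


def two_half_hamming_window(l1, l2):
    """Two Hamming halves of different lengths."""
    rise = [0.54 - 0.46 * math.cos(math.pi * n / (l1 - 1)) for n in range(l1)]
    fall = [0.54 + 0.46 * math.cos(math.pi * n / (l2 - 1)) for n in range(l2)]
    return rise + fall


w_12k2_first = two_half_hamming_window(160, 80)  # weight on the 2nd subframe
w_12k2_second = hamming_cos_window(232, 8)       # weight on the 4th subframe
w_other_modes = hamming_cos_window(200, 40)      # weight on the 4th subframe, 40-sample lookahead
```

The position of each window's peak relative to the frame is what places the weight on a particular subframe.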
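Auto-correlation Computation • The autocorrelations of the windowed speech $s'(n)$, $n = 0, \dots, 239$, are computed for lags 0 to 10: $r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k)$, $k = 0, \dots, 10$ • A 60 Hz bandwidth expansion is applied by lag windowing: $r_{ac}(k) \leftarrow r_{ac}(k)\, w_{lag}(k)$ with $w_{lag}(k) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 k}{f_s}\right)^2\right]$, $f_0 = 60$ Hz, $f_s = 8000$ Hz, $k = 1, \dots, 10$ • $r_{ac}(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at ‑40 dB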
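The three steps on this slide (autocorrelation, lag windowing and white-noise correction) fit in a few lines. This is a floating-point sketch with a hypothetical function name; it follows the formulas above directly.

```python
import math

# Sketch only: autocorrelation of windowed speech with 60 Hz lag windowing
# and white-noise correction of r(0).

def autocorrelation(windowed, order=10, f0=60.0, fs=8000.0, wnc=1.0001):
    n_samples = len(windowed)
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, n_samples))
         for k in range(order + 1)]
    r[0] *= wnc  # factor 1.0001 = noise floor about 40 dB below r(0)
    for k in range(1, order + 1):
        r[k] *= math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
    return r
```

The lag window is a Gaussian in the lag domain, which corresponds to smoothing the power spectrum with a Gaussian of roughly 60 Hz width and so widens sharp formant peaks.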
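Levinson‑Durbin algorithm • The LP coefficients $a_k$ are obtained by solving the set of equations $\sum_{k=1}^{10} a_k\, r_{ac}(|i-k|) = -r_{ac}(i)$, $i = 1, \dots, 10$ • The algorithm uses the following recursion: $E(0) = r_{ac}(0)$; for $i = 1, \dots, 10$: $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r_{ac}(i-j)\right] / E(i-1)$ with $a_0^{(i-1)} = 1$, $a_i^{(i)} = k_i$, $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$ for $j = 1, \dots, i-1$, and $E(i) = (1 - k_i^2)\, E(i-1)$ • The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \dots, 10$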
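The recursion translates directly into code. This floating-point sketch uses the same sign convention, $A(z) = 1 + \sum a_j z^{-j}$; the function name and return layout are hypothetical.

```python
# Sketch only: Levinson-Durbin recursion for A(z) = 1 + sum_{j=1}^{p} a_j z^-j.

def levinson_durbin(r, order=10):
    """Return ([1, a_1, ..., a_p], reflection coefficients k_i, final error E(p))."""
    a = [1.0] + [0.0] * order
    reflection = []
    err = r[0]                                          # E(0) = r(0)
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))    # j = 0 term uses a_0 = 1
        k = -acc / err
        new_a = a[:]
        new_a[i] = k                                    # a_i^(i) = k_i
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]              # a_j^(i) = a_j^(i-1) + k_i a_{i-j}^(i-1)
        a = new_a
        reflection.append(k)
        err *= 1.0 - k * k                              # E(i) = (1 - k_i^2) E(i-1)
    return a, reflection, err
```

For a first-order autoregressive process with $r(k) = 0.5^k$, the recursion returns $a_1 = -0.5$, all higher coefficients zero and a final error of 0.75.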
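LP to LSP conversion • The LP filter coefficients $a_k$, $k = 1, \dots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes • The LSPs are defined as the roots of the sum and difference polynomials $F_1'(z) = A(z) + z^{-11} A(z^{-1})$ and $F_2'(z) = A(z) - z^{-11} A(z^{-1})$ • All roots of these polynomials lie on the unit circle and they interlace • $F_1'(z)$ has a root at $z = -1$ and $F_2'(z)$ a root at $z = 1$; these are eliminated by defining $F_1(z) = F_1'(z)/(1 + z^{-1})$ and $F_2(z) = F_2'(z)/(1 - z^{-1})$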
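The conversion can be sketched by building $F_1(z)$ and $F_2(z)$ and locating their roots on the unit circle. Because both polynomials are symmetric, $F(e^{j\omega})e^{j5\omega}$ is a real cosine series, so a sign-change search over a grid followed by bisection finds the line spectral frequencies $\omega_i$; the LSPs are then $q_i = \cos\omega_i$. The grid size, bisection count and function name are assumptions, and the standard's own root search (on Chebyshev polynomials in fixed point) is not reproduced here.

```python
import math

# Sketch only: LP coefficients -> line spectral frequencies (radians) for even order.

def lp_to_lsf(a, grid_points=1024, iterations=50):
    m = len(a) - 1          # LP order (10)
    half = m // 2
    f1 = [1.0] + [0.0] * half
    f2 = [1.0] + [0.0] * half
    for i in range(1, half + 1):
        # Division of F1' by (1 + z^-1) and of F2' by (1 - z^-1)
        f1[i] = a[i] + a[m + 1 - i] - f1[i - 1]
        f2[i] = a[i] - a[m + 1 - i] + f2[i - 1]

    def c(f, w):
        # Real-valued F(e^{jw}) e^{j*half*w}, valid because F is symmetric
        return 2.0 * sum(f[i] * math.cos((half - i) * w) for i in range(half)) + f[half]

    roots = []
    for f in (f1, f2):
        prev_w, prev_v = 0.0, c(f, 0.0)
        for g in range(1, grid_points + 1):
            w = math.pi * g / grid_points
            v = c(f, w)
            if prev_v * v < 0.0:            # sign change: refine by bisection
                lo, hi = prev_w, w
                for _ in range(iterations):
                    mid = 0.5 * (lo + hi)
                    if c(f, lo) * c(f, mid) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
            prev_w, prev_v = w, v
    return sorted(roots)
```

As a check, the flat filter $A(z) = 1$ gives evenly spaced frequencies $\omega_i = i\pi/11$, alternating between the roots of $F_1$ and $F_2$.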
Quantization of the LSP coefficients • 12.2 kbit/s mode • Two sets of LSPs are quantized using the line spectral frequency (LSF) representation • 1st-order MA prediction is applied • The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ) • A weighted LSP distortion measure is used in the quantization process • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • 1st-order MA prediction is applied • The residual LSF vector is quantized using split vector quantization (SVQ) • A weighted LSP distortion measure is used
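The single-vector path (MA prediction, then split vector quantization with a weighted error) can be sketched as follows. Everything numeric here is hypothetical: the mean vector, the MA coefficient, the split sizes, the codebooks and the weights are toy values, not the tables or the weighting formula of the standard.

```python
# Sketch only: 1st-order MA-predicted LSF residual quantized by split VQ
# with a weighted squared-error measure. All tables are hypothetical.

def split_vq(residual, codebooks, weights):
    """One codebook per split; the code-vector dimensions define the split."""
    indices, quantized = [], []
    start = 0
    for book in codebooks:
        dim = len(book[0])
        target = residual[start:start + dim]
        w = weights[start:start + dim]
        best = min(range(len(book)),
                   key=lambda idx: sum(wi * (t - cv) ** 2
                                       for wi, t, cv in zip(w, target, book[idx])))
        indices.append(best)
        quantized.extend(book[best])
        start += dim
    return indices, quantized


def quantize_lsf(lsf, mean, prev_residual_q, ma_coef, codebooks, weights):
    # 1st-order MA prediction from the previous frame's quantized residual
    prediction = [m + ma_coef * r for m, r in zip(mean, prev_residual_q)]
    residual = [x - p for x, p in zip(lsf, prediction)]
    indices, residual_q = split_vq(residual, codebooks, weights)
    lsf_q = [p + r for p, r in zip(prediction, residual_q)]
    return indices, lsf_q, residual_q
```

The quantized residual is returned so that the next frame can form its prediction from it, which keeps the predictor a moving average (of past quantized residuals) rather than a recursive one.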
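Interpolation of the LSPs • $q_4(n)$ denotes the LSP vector of the 4th subframe of frame $n$ • 12.2 kbit/s mode • The interpolated LSP vectors at the 1st and 3rd subframes are given by $q_1(n) = 0.5\, q_4(n-1) + 0.5\, q_2(n)$ and $q_3(n) = 0.5\, q_2(n) + 0.5\, q_4(n)$ • 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes • The interpolated LSP vectors at the 1st, 2nd and 3rd subframes are given by $q_1(n) = 0.75\, q_4(n-1) + 0.25\, q_4(n)$, $q_2(n) = 0.5\, q_4(n-1) + 0.5\, q_4(n)$ and $q_3(n) = 0.25\, q_4(n-1) + 0.75\, q_4(n)$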
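Both interpolation schemes can be written as one small function. The sketch assumes that in the 12.2 kbit/s mode $q_2(n)$ and $q_4(n)$ come from the two LP analyses per frame, and that in the other modes only $q_4(n)$ is available; the function name is hypothetical.

```python
# Sketch only: per-subframe LSP vectors from the interpolation weights above.

def interpolate_lsp(q4_prev, q4, q2=None):
    """Return [q1, q2, q3, q4]; pass q2 for the 12.2 kbit/s mode."""
    def mix(u, v, wu):
        return [wu * x + (1.0 - wu) * y for x, y in zip(u, v)]

    if q2 is not None:  # 12.2 kbit/s: two LP analyses per frame
        return [mix(q4_prev, q2, 0.5), list(q2), mix(q2, q4, 0.5), list(q4)]
    return [mix(q4_prev, q4, 0.75), mix(q4_prev, q4, 0.5),
            mix(q4_prev, q4, 0.25), list(q4)]
```

With scalar "vectors" moving from 0 to 1 across a frame, the other modes yield 0.25, 0.5, 0.75 and 1.0 for the four subframes.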
Open‑loop pitch analysis • Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes • Performed once per frame for the 5.15 and 4.75 kbit/s modes • Operates on the pre-processed signal filtered by a perceptual weighting filter $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$
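The core of an open-loop pitch estimate is a search for the delay that maximizes the normalized correlation of the weighted speech $s_w(n)$ with its delayed copy. This is a simplified sketch: the delay range, the analysis length and the function name are assumptions, and it omits the refinements the standard applies (for example to avoid choosing multiples of the true pitch).

```python
import math

# Sketch only: open-loop pitch delay by normalized autocorrelation of weighted speech.

def open_loop_pitch(sw, start, length, t_min, t_max):
    """Delay T in [t_min, t_max] maximizing sum sw(n)sw(n-T) / sqrt(sum sw(n-T)^2)."""
    if start < t_max:
        raise ValueError("need at least t_max past samples before `start`")
    best_t, best_score = t_min, -math.inf
    for t in range(t_min, t_max + 1):
        num = sum(sw[n] * sw[n - t] for n in range(start, start + length))
        energy = sum(sw[n - t] ** 2 for n in range(start, start + length))
        score = num / math.sqrt(energy) if energy > 0.0 else 0.0
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

For a signal with a 40-sample period, searching delays 20 to 60 over one 80-sample (10 ms) block returns 40.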
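Impulse response computation • The impulse response $h(n)$ of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1)/[\hat{A}(z)\, A(z/\gamma_2)]$ is computed for each subframe • It is used in the search of the adaptive and fixed codebooks • It is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$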
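The procedure on this slide can be sketched directly: scale the coefficients to obtain $A(z/\gamma)$, pad with zeros, and run two all-pole filters. This is a floating-point sketch; `a` stands for the unquantized and `a_hat` for the quantized LP coefficients, and the helper names and the 40-sample length are assumptions.

```python
# Sketch only: impulse response of A(z/g1) / (A_hat(z) A(z/g2)) over one subframe.

def weight_coeffs(a, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def all_pole(x, a):
    """Filter x through 1/A(z) with zero initial state (a[0] == 1)."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0))
    return y


def impulse_response(a, a_hat, gamma1, gamma2, length=40):
    num = weight_coeffs(a, gamma1)
    h = num + [0.0] * (length - len(num))          # A(z/gamma1) coefficients, zero-extended
    h = all_pole(h, a_hat)                         # through 1/A_hat(z)
    return all_pole(h, weight_coeffs(a, gamma2))   # through 1/A(z/gamma2)
```

A quick sanity check: with $\gamma_1 = \gamma_2$ the weighting cancels, so the result is just the impulse response of $1/\hat{A}(z)$.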
Adaptive codebook • The adaptive codebook search is performed on a subframe basis • The parameters are the delay and gain of the pitch filter • The codebook contains entries taken from the previously synthesized excitation signal
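A closed-loop adaptive codebook search can be sketched for integer delays: each candidate vector is read from the past excitation, filtered by $h(n)$, and scored against the target $x(n)$; the pitch gain is then $g_p = \langle x, y\rangle / \langle y, y\rangle$. This sketch only covers integer delays over a caller-chosen range, whereas the standard also searches fractional delays; all names are hypothetical.

```python
import math

# Sketch only: integer-delay adaptive codebook search with optimal gain.

def adaptive_vector(past_exc, delay, length):
    """v(n) = u(n - T); for T < length the new samples are repeated periodically."""
    buf = list(past_exc)
    base = len(past_exc)
    for n in range(length):
        buf.append(buf[base + n - delay])
    return buf[base:]


def filter_fir(v, h):
    """Zero-state convolution of v with h, truncated to len(v)."""
    return [sum(h[i] * v[n - i] for i in range(min(n + 1, len(h)))) for n in range(len(v))]


def adaptive_codebook_search(x, h, past_exc, t_min, t_max):
    best_t, best_score, best_gain = t_min, -math.inf, 0.0
    for t in range(t_min, t_max + 1):
        y = filter_fir(adaptive_vector(past_exc, t, len(x)), h)
        num = sum(p * q for p, q in zip(x, y))
        energy = sum(q * q for q in y)
        if energy <= 0.0:
            continue
        score = num / math.sqrt(energy)            # normalized correlation
        if score > best_score:
            best_t, best_score, best_gain = t, score, num / energy
    return best_t, best_gain
```

If the target is the periodic continuation of the past excitation scaled by 0.5, the search recovers that period and a gain of 0.5.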
Algebraic codebook • Encodes the random (noise-like) portion of the excitation signal • The periodic portion of the weighted residual is removed first, so only the random portion remains to be coded by the fixed codebook • The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech • Based on the interleaved single-pulse permutation (ISPP) design • A few sparse impulse sequences that are phase-shifted versions of each other • All pulses have the same magnitude • Amplitudes are +1 or -1
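The ISPP structure can be illustrated with interleaved pulse tracks and a deliberately naive pulse choice. The layout (five tracks over a 40-sample subframe) is illustrative, `d` stands for a hypothetical correlation signal between the target and the impulse response, and the greedy per-track selection is far simpler than the search used by the standard.

```python
# Sketch only: interleaved pulse tracks and a greedy +/-1 pulse placement.

def ispp_tracks(subframe_len=40, n_tracks=5):
    """Track t holds positions t, t + n_tracks, t + 2*n_tracks, ... (phase-shifted copies)."""
    return [list(range(t, subframe_len, n_tracks)) for t in range(n_tracks)]


def simple_pulse_selection(d, tracks, pulses_per_track=1):
    """Per track, place unit pulses where |d(n)| is largest, with sign(d(n))."""
    code = [0.0] * len(d)
    for track in tracks:
        chosen = sorted(track, key=lambda n: abs(d[n]), reverse=True)[:pulses_per_track]
        for n in chosen:
            code[n] += 1.0 if d[n] >= 0 else -1.0
    return code
```

Every pulse has magnitude 1 and its sign taken from `d`, so a codevector is fully described by pulse positions and signs, which is what keeps the codebook cheap to index and search.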
Speech decoder • Codebook parameters are decoded by table lookup • LSP coefficients are interpolated and converted to LP coefficients • Excitation = sum of the adaptive and fixed codebook vectors, each multiplied by its gain, in every subframe • Speech = excitation passed through the LP synthesis (vocal tract) filter • Perceived quality is enhanced by adaptive post-filtering
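One decoder subframe, from decoded vectors and gains to synthesized speech, can be sketched as follows. It assumes the vectors $v(n)$, $c(n)$ and the gains have already been decoded, and uses the convention $\hat{A}(z) = 1 + \sum \hat{a}_i z^{-i}$; the function name and the state handling are hypothetical.

```python
# Sketch only: u(n) = g_p v(n) + g_c c(n), then s(n) through 1/A_hat(z).

def decode_subframe(v, c, g_p, g_c, a_hat, memory):
    """Return (excitation, speech, new synthesis memory); memory holds past outputs, newest last."""
    u = [g_p * vn + g_c * cn for vn, cn in zip(v, c)]
    order = len(a_hat) - 1
    past = list(memory)
    s = []
    for un in u:
        sn = un - sum(a_hat[i] * past[-i] for i in range(1, order + 1))
        s.append(sn)
        past.append(sn)
    return u, s, past[len(past) - order:]
```

The excitation `u` is also what the decoder appends to its past-excitation buffer, so that the next subframe's adaptive codebook is built from it.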
Synthesis model • To reconstruct speech, the decoder combines: • A noise-like excitation (the fixed codebook contribution) • A pitch filter modelling the glottal vibrations • A linear prediction filter modelling the vocal tract
Post‑processing • Adaptive post-filtering • Cascade of two filters: a formant postfilter and a tilt compensation filter • Updated every 5 ms subframe • High-pass filter • Removes undesired low-frequency components • A cut-off frequency of 60 Hz is used • Up-scaling by a factor of 2 to compensate for the down-scaling by 2 applied to the input signal
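The two post-filter stages and the final up-scaling can be sketched as below, assuming the common forms $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$ for the formant postfilter and $1 - \mu z^{-1}$ for tilt compensation. The values of $\gamma_n$, $\gamma_d$ and $\mu$ are left as plain parameters here (the standard derives them per mode and subframe), the 60 Hz high-pass stage is not included, and the function names are hypothetical.

```python
# Sketch only: formant postfilter, tilt compensation and up-scaling by 2.

def weight(a, gamma):
    """Coefficients of A(z/gamma)."""
    return [c * gamma ** i for i, c in enumerate(a)]


def postfilter(s, a_hat, gamma_n, gamma_d, mu):
    num = weight(a_hat, gamma_n)
    den = weight(a_hat, gamma_d)
    # Formant postfilter A_hat(z/gamma_n) / A_hat(z/gamma_d), zero initial state
    y = []
    for n in range(len(s)):
        acc = sum(num[i] * s[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    # Tilt compensation 1 - mu z^-1
    return [y[n] - (mu * y[n - 1] if n > 0 else 0.0) for n in range(len(y))]


def upscale(x):
    """Undo the input down-scaling by 2."""
    return [2.0 * v for v in x]
```

With $\gamma_n = \gamma_d$ and $\mu = 0$ the post-filter is transparent, which is a convenient check before tuning the parameters.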