Advisor: 陳福坤 Student: 葛書銓

Specifications for the Analog to Digital Conversion of Voice by 2,400 Bit/Second Mixed Excitation Linear Prediction

Outline • Introduction • Encoder • Decoder

Introduction
MELP (Mixed Excitation Linear Prediction)
• Replaces LPC-10 (FS-1015)
• Uses LPC as its underlying model and adds five new features: mixed excitation, aperiodic pulses, adaptive spectral enhancement, pulse dispersion, and Fourier magnitude modeling
• Each MELP frame lasts 22.5 ms and contains 180 samples (8000 samples/s)

Encoder
Low frequency removal
• The first encoding step applies a Chebyshev high-pass filter with a 60 Hz cutoff frequency and 30 dB of stopband rejection.

Integer Pitch Calculation
[Figure: block diagram. Current frame → 1 kHz LPF → normalized autocorrelation function → point of maximum → integer pitch]

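Integer Pitch Calculation
• For lags $\tau$ = 40 to 160, compute the normalized autocorrelation, defined as
  $r(\tau) = \dfrac{c_\tau(0,\tau)}{\sqrt{c_\tau(0,0)\,c_\tau(\tau,\tau)}}$, where $c_\tau(m,n) = \sum_{k=-\lfloor\tau/2\rfloor-80}^{-\lfloor\tau/2\rfloor+79} s_{k+m}\,s_{k+n}$
• The lag $\tau$ that maximizes the normalized autocorrelation is taken as the first pitch estimate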
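The lag search on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the 160-sample correlation window centred on `center`, the function names, and the use of a plain list for the low-pass-filtered signal are all assumptions made for the example.

```python
import math

def corr(s, center, tau, m, n):
    """c_tau(m, n): correlation over a 160-sample window (assumed window length)."""
    start = center - tau // 2 - 80
    return sum(s[k + m] * s[k + n] for k in range(start, start + 160))

def normalized_autocorr(s, center, tau):
    """r(tau) = c(0, tau) / sqrt(c(0, 0) * c(tau, tau))."""
    num = corr(s, center, tau, 0, tau)
    den = math.sqrt(corr(s, center, tau, 0, 0) * corr(s, center, tau, tau, tau))
    return num / den if den > 0.0 else 0.0

def integer_pitch(s, center, lo=40, hi=160):
    """Return the integer lag in [lo, hi] with the largest normalized autocorrelation."""
    return max(range(lo, hi + 1), key=lambda tau: normalized_autocorr(s, center, tau))
```

The caller has to supply at least 160 samples before `center` and 160 after it, since the longest lag reads that far on both sides.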

Bandpass Voicing Analysis
[Figure: block diagram. The current and last frames feed five band-pass filters. 0-500 Hz band: pitch refinement and normalized autocorrelation function → point of maximum → fractional pitch. 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz bands: each followed by a full-wave rectifier and smoothing filter]

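Fractional Pitch Calculation
• Operates on the output signal of the 0-500 Hz filter
• The integer pitch values of the previous and current frames serve as the two candidates
• The fractional offset is the difference between the real pitch and the integer pitch
• For each candidate, an integer pitch search with the normalized autocorrelation is repeated over lags from 5 samples below to 5 samples above the candidate; once the optimum integer pitch lag is found, fractional pitch refinement is applied
[Figure: block diagram. Current frame and last frame → 0-500 Hz BPF → fractional pitch refinement → normalized autocorrelation function r( ) → point of maximum → fractional pitch, with r( ) passed to voicing analysis]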

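Fractional Pitch Refinement
• Let the integer pitch be T samples, i.e. the maximum was found at $\tau = T$. The true maximum may lie between T and T+1 or between T-1 and T.
• The fractional offset, with $c_T(m,n) = \sum_k s_{k+m}\,s_{k+n}$ taken over the analysis window:
  $\Delta = \dfrac{c_T(0,T+1)\,c_T(T,T) - c_T(0,T)\,c_T(T,T+1)}{c_T(0,T+1)\,[c_T(T,T) - c_T(T,T+1)] + c_T(0,T)\,[c_T(T+1,T+1) - c_T(T,T+1)]}$
• Fractional pitch value $= T + \Delta$
• Normalized autocorrelation:
  $r(T+\Delta) = \dfrac{(1-\Delta)\,c_T(0,T) + \Delta\,c_T(0,T+1)}{\sqrt{c_T(0,0)\,[(1-\Delta)^2 c_T(T,T) + 2\Delta(1-\Delta)\,c_T(T,T+1) + \Delta^2 c_T(T+1,T+1)]}}$
• The fractional pitch and normalized autocorrelation value are computed separately for the two candidates; the candidate with the larger autocorrelation gives the current frame's fractional pitch $p_2$, and $Vbp_1 = r(p_2)$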
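The refinement step can be sketched as follows. The sketch assumes the maximum lies between T and T+1 (for the other case, call it with T-1), models the signal between two lags by linear interpolation, and clamps the offset to [0, 1]; the clamp, the 160-sample window, and all names are assumptions of this example rather than details taken from the slide.

```python
import math

def corr(s, center, T, m, n):
    """c_T(m, n): correlation over a 160-sample window (assumed window length)."""
    start = center - T // 2 - 80
    return sum(s[k + m] * s[k + n] for k in range(start, start + 160))

def fractional_refine(s, center, T):
    """Return (T + delta, r(T + delta)) for a maximum assumed to lie in [T, T+1]."""
    c00 = corr(s, center, T, 0, 0)
    c0T = corr(s, center, T, 0, T)
    c0T1 = corr(s, center, T, 0, T + 1)
    cTT = corr(s, center, T, T, T)
    cTT1 = corr(s, center, T, T, T + 1)
    cT1T1 = corr(s, center, T, T + 1, T + 1)

    # Offset that maximizes the interpolated normalized autocorrelation.
    den = c0T1 * (cTT - cTT1) + c0T * (cT1T1 - cTT1)
    delta = (c0T1 * cTT - c0T * cTT1) / den if den != 0.0 else 0.0
    delta = min(max(delta, 0.0), 1.0)  # assumption: keep the result inside [T, T+1]

    # Normalized autocorrelation at the fractional lag.
    num = (1.0 - delta) * c0T + delta * c0T1
    energy = ((1.0 - delta) ** 2 * cTT
              + 2.0 * delta * (1.0 - delta) * cTT1
              + delta ** 2 * cT1T1)
    scale = c00 * energy
    r = num / math.sqrt(scale) if scale > 0.0 else 0.0
    return T + delta, r
```

Running this on both candidates and keeping the result with the larger `r` mirrors the selection rule on the slide.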

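Aperiodic Flag
• Determined by the bandpass voicing analysis: if the low-band voicing strength $Vbp_1$ < 0.5, the aperiodic flag is set to 1; otherwise it is set to 0
• The flag tells the decoder whether to use aperiodic pulses as the excitation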

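Linear Prediction Analysis
• The input speech is windowed with a 200-sample (25 ms) Hamming window and analyzed with 10th-order linear prediction
• The Levinson-Durbin recursion solves for the linear prediction coefficients $a_i$ (i = 1, 2, …, 10), which then receive a bandwidth expansion of 0.994 (15 Hz), that is: $a_i \leftarrow 0.994^{i}\,a_i$
Linear Prediction Residual Calculation
• The input speech is filtered by the prediction filter formed from the 10 LPC coefficients; the output is the linear prediction residual signal
[Figure: block diagram. LPC analysis → LPC residual signal → peakiness and Vbp (voicing strength) adjustment → final pitch]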
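The analysis chain on this slide (window, autocorrelation, Levinson-Durbin, bandwidth expansion, residual) can be sketched in plain Python. The sign convention (prediction $\hat{s}_n = \sum_i a_i s_{n-i}$), the zero initial filter state, and the function names are assumptions of this sketch.

```python
import math

def hamming(n):
    """n-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, order):
    """Autocorrelation lags 0..order of the windowed frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns ([a_1..a_order], prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        nxt = a[:]
        nxt[i] = k
        for j in range(1, i):
            nxt[j] = a[j] - k * a[i - j]
        a = nxt
        err *= 1.0 - k * k
    return a[1:], err

def bandwidth_expand(a, gamma=0.994):
    """a_i <- gamma**i * a_i, with i starting at 1."""
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]

def lpc_analysis(frame, order=10, gamma=0.994):
    """Window a frame (200 samples on the slide) and return expanded coefficients."""
    windowed = [w * x for w, x in zip(hamming(len(frame)), frame)]
    a, _ = levinson_durbin(autocorr(windowed, order), order)
    return bandwidth_expand(a, gamma)

def lp_residual(s, a):
    """e_n = s_n - sum_i a_i * s_{n-i}; samples before the start count as zero."""
    return [s[n] - sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(s))]
```

In a real coder the residual filter would carry its state across frames instead of restarting from zeros.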

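Peakiness Calculation
• The peakiness of the linear prediction residual signal $e_n$ is defined as:
  $\text{peakiness} = \dfrac{\sqrt{\frac{1}{160}\sum_{n=1}^{160} e_n^2}}{\frac{1}{160}\sum_{n=1}^{160} |e_n|}$
• If the peakiness exceeds 1.34, $Vbp_1$ is set to 1
• If the peakiness exceeds 1.6, $Vbp_i$ (i = 1, 2, 3) are all set to 1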
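A small sketch of the peakiness measure and the two threshold rules: peakiness is the ratio of the RMS value to the mean absolute value of the residual, so a flat signal gives 1 and an isolated pulse gives a large value. The function names and the list-of-five representation of the voicing strengths are assumptions for illustration.

```python
import math

def peakiness(residual):
    """Ratio of RMS to mean absolute value of the residual window."""
    n = len(residual)
    rms = math.sqrt(sum(x * x for x in residual) / n)
    mean_abs = sum(abs(x) for x in residual) / n
    return rms / mean_abs if mean_abs > 0.0 else 0.0

def adjust_voicing(vbp, peak):
    """Force low-band voicing strengths to 1 when the residual is peaky."""
    out = list(vbp)
    if peak > 1.34:
        out[0] = 1.0
    if peak > 1.6:
        out[0] = out[1] = out[2] = 1.0
    return out
```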

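Final Pitch Calculation
• The residual signal is passed through a low-pass filter with a 1 kHz cutoff
• Using the fractional pitch as the reference, an integer pitch search with the normalized autocorrelation is run over lags from 5 samples below to 5 samples above it
• Fractional pitch refinement is applied to the optimum integer pitch lag; the result is provisionally taken as the final pitch, together with its normalized autocorrelation value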

The value becomes the accurate final pitch only after it has passed the pitch doubling check.

[Figure: flowchart of the final pitch decision, with yes/no branches through two doubling checks leading to END]

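Gain Calculation
• Each frame of the input signal is split into two subframes, and the gains $G_1$ and $G_2$ are computed for them separately
• The length of the gain window varies with the pitch
• The gain of each subframe is the RMS value expressed in decibels, computed over a window of L samples as: $G_i = 10\log_{10}\left(0.01 + \frac{1}{L}\sum_{n=1}^{L} s_n^2\right)$
• The constant 0.01 keeps the argument of the logarithm away from zero; if the result is negative, it is set to 0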
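The gain formula can be illustrated with a short Python sketch. The choice of window (its position and pitch-dependent length) is left to the caller here, and the function name is hypothetical.

```python
import math

def gain_db(window):
    """Gain in dB: 10*log10(0.01 + mean square), floored at 0 dB."""
    mean_square = sum(x * x for x in window) / len(window)
    gain = 10.0 * math.log10(0.01 + mean_square)
    return max(gain, 0.0)
```

Silence maps to 0 dB because the raw value, 10·log10(0.01) = -20 dB, is negative and gets floored.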

QUANTIZATION • Quantization of Prediction Coefficients • Pitch Quantization • Gain Quantization • Bandpass Voicing Quantization • Fourier Magnitude Calculation and Quantization

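Quantization of Prediction Coefficients
• The 10 linear prediction coefficients are converted to line spectrum frequencies (LSFs)
• The 10 LSFs are sorted in ascending order with a minimum separation of 50 Hz
• The LSF vector is quantized with a multi-stage vector quantizer (MSVQ)
[Figure: four-stage MSVQ. The input LSF vector passes through stage 1 (128 levels) and stages 2 to 4 (64 levels each); the output is the quantized LSF vector]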

The algorithm finds the quantized vector $\hat{f}$, which, as the figure above shows, is the sum of the vectors selected in each stage. The purpose of the MSVQ is to find the quantized vector that best represents the original LSF vector. To do so, the MSVQ selects the codebook vectors that minimize the square of the weighted Euclidean distance, $d^2$, between the original LSF vector $f$ and the quantized LSF vector $\hat{f}$: $d^2(f,\hat{f}) = \sum_{i=1}^{10} w_i\,(f_i - \hat{f}_i)^2$
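A toy sketch of a multi-stage search under the weighted distance follows. It uses a greedy stage-by-stage search, which is a simplification chosen for brevity (a real encoder may keep several candidates per stage); the codebooks, weights, and function names are all hypothetical inputs.

```python
def weighted_distance(f, fq, w):
    """Squared weighted Euclidean distance between two vectors."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, f, fq))

def msvq_greedy(target, codebooks, w):
    """Pick one vector per stage; the quantized vector is the sum of the picks."""
    approx = [0.0] * len(target)
    indices = []
    for codebook in codebooks:
        def cost(i):
            candidate = [a + c for a, c in zip(approx, codebook[i])]
            return weighted_distance(target, candidate, w)
        best = min(range(len(codebook)), key=cost)
        indices.append(best)
        approx = [a + c for a, c in zip(approx, codebook[best])]
    return indices, approx
```

With four codebooks of 128, 64, 64, and 64 entries, `indices` would hold the four stage indices to transmit.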

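Pitch Quantization
• The final pitch value is quantized on a logarithmic scale with a 99-level uniform quantizer
• Each quantization level maps to a 7-bit codeword in the quantization table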
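A minimal sketch of a 99-level quantizer follows. It assumes, as the published MELP specification describes, that the quantizer is uniform in the logarithm of the pitch over the range 20 to 160 samples. The mapping from level index to 7-bit codeword is table-driven and is not reproduced here; the names are hypothetical.

```python
import math

LEVELS = 99
LOG_MIN = math.log10(20.0)    # assumed lower pitch bound, in samples
LOG_MAX = math.log10(160.0)   # assumed upper pitch bound, in samples
STEP = (LOG_MAX - LOG_MIN) / (LEVELS - 1)

def quantize_pitch(pitch):
    """Return the level index (0..98) nearest to log10(pitch)."""
    clamped = min(max(pitch, 20.0), 160.0)
    return round((math.log10(clamped) - LOG_MIN) / STEP)

def dequantize_pitch(index):
    """Return the pitch value represented by a level index."""
    return 10.0 ** (LOG_MIN + index * STEP)
```

Uniform steps in the log domain give a roughly constant relative error, about 1% at worst with these bounds.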

Gain Quantization
• Each frame has two gains, $G_1$ and $G_2$, quantized with 3 bits and 5 bits respectively; the uniform quantization range is 10 dB to 77 dB

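Bandpass Voicing Quantization
• If $Vbp_1 \le 0.6$, the frame is unvoiced and $Vbp_i$ (i = 2, 3, 4, 5) are quantized to 0
• If $Vbp_1 > 0.6$, each $Vbp_i$ (i = 2, 3, 4, 5) is quantized to 1 if it exceeds 0.6, and to 0 otherwise
• One special case: if the quantized $Vbp_i$ (i = 2, 3, 4, 5) are 0001, then $Vbp_5$ is quantized to 0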
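The three rules can be written as a short function. The sketch assumes the low-band threshold is also 0.6 and represents the result as five bits with the low band first; the first bit simply records the voiced/unvoiced decision and the function name is hypothetical.

```python
def quantize_bandpass_voicing(vbp):
    """vbp: five voicing strengths, lowest band first. Returns five bits."""
    if vbp[0] <= 0.6:                      # unvoiced frame: all bands off
        return [0, 0, 0, 0, 0]
    bits = [1] + [1 if v > 0.6 else 0 for v in vbp[1:]]
    if bits[1:] == [0, 0, 0, 1]:           # special case: lone top band is cleared
        bits[4] = 0
    return bits
```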

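Fourier Magnitude Calculation and Quantization
• Compute the quantized linear prediction coefficients from the quantized LSF vector
• Use the quantized linear prediction coefficients to compute the residual signal
• Apply a 200-sample Hamming window, zero-pad, and take a 512-point Fast Fourier Transform (FFT)
• Convert the complex FFT output to magnitudes
• Use a spectral peak-picking algorithm to find the first 10 pitch harmonics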

Decoder
Pitch Decoding
• The 7-bit pitch code is decoded to determine whether a frame is voiced or unvoiced, or whether a frame erasure is indicated
• If the pitch code is all-zero or has only one bit set, the unvoiced mode is used. If two bits are set, a frame erasure is indicated. Otherwise, the pitch value is decoded and the voiced mode is used.
• If an erasure is detected in the current frame, all of the parameters for the current frame are replaced with the parameters from the previous frame. In addition, the first gain term is set equal to the second gain term.
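The classification rule depends only on how many bits are set in the 7-bit code, which a short sketch makes concrete. The dictionary layout and the key names `gain1` and `gain2` are hypothetical, and the table that maps voiced codes to pitch values is not shown.

```python
def classify_pitch_code(code):
    """Classify a 7-bit pitch code by its number of set bits."""
    ones = bin(code & 0x7F).count("1")
    if ones <= 1:
        return "unvoiced"
    if ones == 2:
        return "erasure"
    return "voiced"

def conceal_erasure(previous):
    """Repeat the previous frame's parameters, with gain 1 copied from gain 2."""
    current = dict(previous)
    current["gain1"] = current["gain2"]
    return current
```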

Parameter Interpolation
• Only one set of parameters is transmitted per frame, yet a frame may contain more than one pitch period, so the MELP parameters are interpolated pitch-synchronously during synthesis.
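One way to picture pitch-synchronous interpolation: at the start of every pitch period, each parameter is blended between the previous and current frame values according to how far into the 180-sample frame the period begins. The linear blend, the factor `n0 / 180`, and the parameter names below are assumptions of this sketch.

```python
FRAME_LENGTH = 180

def interpolate(previous, current, n0):
    """Linearly blend parameter sets for a pitch period starting at sample n0."""
    factor = n0 / FRAME_LENGTH
    return {key: (1.0 - factor) * previous[key] + factor * current[key]
            for key in current}

def pitch_period_starts(previous, current):
    """Walk through one frame, yielding (start sample, parameters) per pitch period."""
    periods = []
    n0 = 0
    while n0 < FRAME_LENGTH:
        params = interpolate(previous, current, n0)
        periods.append((n0, params))
        n0 += max(1, round(params["pitch"]))  # next period begins one pitch later
    return periods
```

With a constant pitch of 60 samples, the frame splits into three periods starting at samples 0, 60, and 120, each with its own interpolated gain.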

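Aperiodic Pulses
• The MELP speech standard distinguishes three speech states: voiced, unvoiced, and jittery voiced
• The aperiodic pulse excitation is used mainly at transitions between voiced and unvoiced speech to synthesize jittery voiced speech, which lets the decoder produce erratic glottal pulses.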

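Mixed Pulse and Noise Excitation
• A multi-band mixing model is used, built on an FIR band-pass filter bank split into five frequency bands
• The filters that process the voiced component are collectively called the Pulse Shaping Filter Bank; those that process the unvoiced component are called the Noise Shaping Filter Bank
• Both filter banks adapt to the voiced or unvoiced tendency of each band of the excitation: the band-pass filters pass the pulse excitation in voiced bands and the noise excitation in unvoiced bands
• The sum of the five band signals is the mixed excitation, which greatly reduces the strong buzz of the traditional LPC model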

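Adaptive Spectral Enhancement
• Synthesized speech decays faster than natural speech, which causes distortion; the distortion stems from the LPC pole bandwidths
• To address this, the mixed excitation signal is passed through an adaptive spectral enhancement filter after it is generated
• The filter is a 10th-order pole/zero enhancement filter followed by a first-order FIR filter for compensation
• Its purpose is to reduce the mismatch between synthesized and real speech in the formant bands by slowing the decay of the formant resonances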
