„Bandwidth Extension of Speech Signals“
2nd Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 22nd and 23rd June 2005, Mainz, Germany
Bernd Iser, biser@harmanbecker.com
Contents
• Motivation
• Model for Speech Production Process
• Bandwidth Extension
  • Generation of the excitation signal
    • Non-linear characteristics
    • Results using non-linear characteristics
  • Generation of the spectral envelope
    • Codebook approach
    • Neural network approach
    • Linear mapping approach
  • Power adjustment
• Current Results
  • Audio samples
• Outlook
2nd Workshop on Wideband Speech Quality - June 2005
Motivation
Problem: Speech quality degrades when frequency bands are suppressed or cancelled (e.g., by transmission over the telephone network).
But: In most cases the environment provides more bandwidth (e.g., MOST bus: 11025 Hz sampling rate, or GSM: 8000 Hz sampling rate).
Idea: Extrapolate the missing frequency components from the band-limited signal.
Advantage: The network as well as the transmission system can remain unchanged.
[Audio samples: original audio signal; band-limited audio signal]
Generation of the Excitation Signal
Block diagram of BWE:
[Figure: block diagram. Block labels: Input signal; Narrowband parameters; Removing spectral envelope; Excitation signal (source); Excitation signal extension; Envelope estimation; Spectral envelope (filter); Model gain; Power adjustment; Band stop; Phase manipulation; Output signal]
Generation of the Excitation Signal
Generation of a „broadband“ excitation signal:
• Extension of the pitch structure in the case of voiced sounds.
• Generation of a noise-like excitation signal in the case of unvoiced sounds.
Generation of the Excitation Signal
Approaches for the generation of a „broadband“ excitation signal:
• „Harmonic modeling“
  • Placing spectral components (pitch, voicing)
  • Function generators: sine (pitch, voicing), noise, ...
• Shifting / modulation approaches (frequency / time domain)
  • Fixed
  • Pitch adaptive (requires pitch analysis!)
• Application of non-linear characteristics
  • Piecewise defined characteristics: half-wave rectification, full-wave rectification, saturation, ...
  • Quadratic, cubic, tanh, ... characteristics (functions)
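The non-linear characteristics named in the list above can be written as simple sample-wise functions. The following is a minimal sketch with NumPy; the function names and the parameters `limit` and `drive` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Piecewise defined characteristics
def half_wave(x):
    """Half-wave rectification: keep positive samples, zero the rest."""
    return np.maximum(x, 0.0)

def full_wave(x):
    """Full-wave rectification: absolute value of each sample."""
    return np.abs(x)

def saturation(x, limit=0.5):
    """Hard clipping to [-limit, limit]; `limit` is an assumed parameter."""
    return np.clip(x, -limit, limit)

# Characteristics given by functions
def quadratic(x):
    return x ** 2

def cubic(x):
    return x ** 3

def tanh_characteristic(x, drive=2.0):
    """Soft saturation; `drive` is an assumed input scaling."""
    return np.tanh(drive * x)
```

All of these produce new harmonics when applied to a harmonic input, which is what makes them candidates for excitation extension.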
Generation of the Excitation Signal
Application of a non-linear characteristic: When the characteristic is applied to a harmonic signal that has been filtered by a bandpass, the resulting signal shows the missing harmonics. Note the aliasing at the upper frequencies.
Generation of the Excitation Signal
Application of a non-linear characteristic: If the input signal is upsampled (e.g., by a factor of 4) before the half-wave rectification is performed, almost no aliasing can be observed after lowpass filtering and downsampling.
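The oversampling idea can be sketched as follows. This is a minimal illustration that assumes a windowed-sinc FIR lowpass with 101 taps for both interpolation and anti-aliasing. The filter design and the names `lowpass_fir` and `extend_excitation` are assumptions, not the filters used in the talk.

```python
import numpy as np

def lowpass_fir(cutoff, num_taps=101):
    """Windowed-sinc lowpass; cutoff in cycles per sample (0 .. 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # unity gain at DC

def extend_excitation(x, factor=4):
    """Half-wave rectify x at `factor` times its rate, then return to the original rate."""
    h = lowpass_fir(0.5 / factor)  # cutoff at the original Nyquist frequency
    # Upsampling: zero stuffing followed by an interpolation lowpass.
    up = np.zeros(len(x) * factor)
    up[::factor] = x
    up = factor * np.convolve(up, h, mode="same")
    # Non-linear characteristic applied at the high rate.
    rectified = np.maximum(up, 0.0)
    # Anti-aliasing lowpass, then downsampling to the original rate.
    rectified = np.convolve(rectified, h, mode="same")
    return rectified[::factor]
```

Harmonics created by the rectifier that lie above the original Nyquist frequency are removed by the lowpass before downsampling, instead of folding back into the band.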
Generation of the Excitation Signal
Application of a cubic characteristic in the time domain:
• Predictor error filtering for extracting the excitation signal
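A possible reading of this step: estimate linear prediction coefficients for a frame, run the frame through the predictor error filter to obtain the excitation, then apply the cubic characteristic. The sketch below assumes the autocorrelation method and a prediction order of 10. Both choices, and all function names, are illustrative assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    frame = np.asarray(frame, dtype=float)
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]  # lags 0 .. order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # light regularisation for stability
    return np.linalg.solve(R, r[1 : order + 1])

def prediction_error(frame, a):
    """Predictor error filter: e[n] = x[n] - sum_k a[k] * x[n - 1 - k]."""
    fir = np.concatenate(([1.0], -np.asarray(a)))
    return np.convolve(frame, fir)[: len(frame)]

def cubic_characteristic(e):
    """Cubic non-linearity applied sample by sample in the time domain."""
    return e ** 3

def extend_excitation_cubic(frame, order=10):
    """Extract the excitation and apply the cubic characteristic."""
    a = lpc_coefficients(frame, order)
    return cubic_characteristic(prediction_error(frame, a))
```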
Generation of the Spectral Envelope
[Figure: block diagram of BWE. Block labels: Input signal; Narrowband parameters; Removing spectral envelope; Excitation signal (source); Excitation signal extension; Envelope estimation; Spectral envelope (filter); Model gain; Power adjustment; Band stop; Phase manipulation; Output signal]
Generation of the Spectral Envelope
• Extension of the spectral envelope.
• Placing the formants of the estimated envelope where the broadband formants are.
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
• Codebook
  • „Narrowband“ and „broadband“ codebooks trained jointly using envelopes of wideband data and their band-limited counterparts
  • Weight the codebook entries with the inverse distance to the input envelope and sum them up (LSF)
  • Possibility of including features other than the spectral envelope in the „narrowband“ codebook by using a special distance measure
  • Codebook approach as a classification stage with post processing by, e.g., a neural network or linear mapping
  • Can be implemented taking predecessor and successor into account
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
• Neural network
  • Exploit the quasi-stationarity of speech by using a memory
  • Feed the NN with features other than just the spectral envelope
  • Various architectures and training algorithms
  • Can be used as post processing after codebook classification
Generation of the Spectral Envelope
Approaches for the generation of a „broadband“ spectral envelope out of the „narrowband“ information:
• Linear mapping
  • Can be implemented taking predecessor and successor into account
  • Can be used as post processing after codebook classification
Generation of the Spectral Envelope
Codebook:
[Figure: codebook approach. Block labels: Envelope of the input signal; Comparison (distance measure); „Narrowband“ codebook; Weighting the codebook entries with the „inverse“ distance; „Broadband“ codebook; Output of the „broadband“ counterpart]
Generation of the Spectral Envelope
Computation of the output LSFs:
[Equation: output LSFs as a sum of the codebook entries weighted with the inverse distance]
with N being the LSF order and M the codebook size.
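Since the formula itself did not survive, the following sketch shows one plausible form of the inverse-distance weighting. It assumes a Euclidean distance between the input envelope and the M „narrowband“ entries, and weights normalised to sum to one. The distance measure, the normalisation, and the names used here are assumptions rather than the definition from the talk.

```python
import numpy as np

def estimate_broadband_lsf(nb_input, nb_codebook, bb_codebook, eps=1e-12):
    """Weighted sum of broadband entries.

    nb_input:    narrowband feature vector of the current frame
    nb_codebook: array of shape (M, narrowband feature order)
    bb_codebook: array of shape (M, N), N being the broadband LSF order
    """
    distances = np.linalg.norm(nb_codebook - nb_input, axis=1)  # one per entry
    weights = 1.0 / (distances + eps)  # "inverse" distance; eps avoids division by zero
    weights /= weights.sum()           # assumed normalisation
    return weights @ bb_codebook       # N output LSFs
```

An input that coincides with a „narrowband“ entry returns (almost exactly) its „broadband“ counterpart, while an input between entries returns a blend.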
Generation of the Spectral Envelope
Training of codebook (LBG algorithm):
1. Initialising: Compute the centroid of the whole training data.
2. Splitting: Split each centroid into two nearby vectors by applying a perturbation.
3. Quantization: Assign the whole training data to the centroids using a chosen distance measure, then recompute the centroids. Repeat step 3 until the result no longer changes significantly.
4. If the desired codebook size is reached, stop. Otherwise continue with step 2.
Distance measures: likelihood ratio distance measure, spectral distortion, city block distance, Euclidean distance, Minkowski distance.
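The training steps above can be sketched as follows. This is a minimal illustration that assumes the squared Euclidean distance, an additive perturbation, and a codebook size that is a power of two. The parameter names and default values are assumptions.

```python
import numpy as np

def lbg(data, size, perturbation=0.01, tol=1e-6, max_iter=100):
    """Train a codebook with `size` entries (a power of two) from data of shape (frames, dim)."""
    data = np.asarray(data, dtype=float)
    # Step 1: centroid of the whole training data.
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Step 2: split each centroid into two nearby vectors.
        codebook = np.vstack([codebook + perturbation, codebook - perturbation])
        previous = np.inf
        for _ in range(max_iter):
            # Step 3: assign every vector to its nearest centroid ...
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            distortion = d.min(axis=1).mean()
            # ... and recompute the centroids (empty cells keep their old value).
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members) > 0:
                    codebook[k] = members.mean(axis=0)
            # Repeat until the distortion no longer changes significantly.
            if previous - distortion <= tol * distortion:
                break
            previous = distortion
        # Step 4: the while condition stops once the desired size is reached.
    return codebook
```

For joint „narrowband“ and „broadband“ training, one option is to cluster one feature set and average the other per cell; that pairing is not shown here.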
Generation of the Spectral Envelope
Linear Mapping:
• Narrowband input features (LPC, CC, LSF): [equation]
• Broadband input features (LPC, CC, LSF): [equation]
• Aim to find mapping matrix: [equation]
• Optimization criterion: [equation]
• Leads to optimal mapping matrix: [equation]
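The equations on this slide are not recoverable, so the sketch below assumes the common least-squares criterion: find the matrix W that minimises the squared error between mapped narrowband features and broadband features over all training frames. Whether the talk used exactly this criterion is an assumption; the names `train_linear_mapping`, `X_nb`, and `Y_bb` are illustrative.

```python
import numpy as np

def train_linear_mapping(X_nb, Y_bb):
    """Least-squares mapping matrix.

    X_nb: narrowband features, shape (frames, narrowband order)
    Y_bb: broadband features,  shape (frames, broadband order)
    Returns W minimising ||X_nb @ W - Y_bb||^2 (Frobenius norm).
    """
    W, *_ = np.linalg.lstsq(X_nb, Y_bb, rcond=None)
    return W

def apply_linear_mapping(x_nb, W):
    """Estimate the broadband feature vector of one frame."""
    return x_nb @ W
```

In closed form this corresponds to the pseudo-inverse solution of the normal equations; `np.linalg.lstsq` computes it without forming the inverse explicitly.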
Generation of the Spectral Envelope
Linear mapping as a post-processing algorithm after codebook classification: Note that this principle can be applied to other approaches. For example, the multiplication with the linear mapping matrix could be replaced by a neural network that has been trained for the respective codebook entry selected by the classification.
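One way to read this combination: the codebook selects a class, and each class has its own mapping matrix. The sketch below assumes nearest-neighbour classification with a Euclidean distance and a plain matrix multiplication per class. Both are assumptions, and all names are illustrative.

```python
import numpy as np

def classify(nb_features, nb_codebook):
    """Index of the nearest narrowband codebook entry (Euclidean distance assumed)."""
    return int(np.argmin(np.linalg.norm(nb_codebook - nb_features, axis=1)))

def map_with_class_matrix(nb_features, nb_codebook, matrices):
    """Apply the mapping matrix trained for the selected codebook entry.

    matrices: one (narrowband order x broadband order) matrix per codebook entry
    """
    k = classify(nb_features, nb_codebook)
    return nb_features @ matrices[k]
```

Replacing `nb_features @ matrices[k]` with a call to a per-class neural network gives the variant mentioned on the slide.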
Power Adjustment
[Figure: block diagram of BWE. Block labels: Input signal; Narrowband parameters; Removing spectral envelope; Excitation signal (source); Excitation signal extension; Envelope estimation; Spectral envelope (filter); Model gain; Power adjustment; Band stop; Phase manipulation; Output signal]
Power Adjustment
Power comparison: The gain is computed from the ratio of the power of the extended signal to the power of the input signal, both measured within the telephone band.
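A minimal sketch of such a power comparison, computed per frame in the frequency domain. The band limits of 300 Hz and 3400 Hz are an assumed definition of the telephone band. The gain is written here as the factor to apply to the extended signal so that its in-band power matches the input, which is an assumption about the direction of the ratio.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi):
    """Sum of squared spectral magnitudes between f_lo and f_hi (Hz)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.sum(np.abs(spectrum[mask]) ** 2)

def power_adjustment_gain(input_frame, extended_frame, fs, f_lo=300.0, f_hi=3400.0):
    """Amplitude gain that matches the extended signal to the input within the band."""
    p_in = band_power(input_frame, fs, f_lo, f_hi)
    p_ext = band_power(extended_frame, fs, f_lo, f_hi)
    return np.sqrt(p_in / (p_ext + 1e-12))  # small constant avoids division by zero
```

Components of the extended signal outside the band do not influence the gain, so newly generated high-frequency content is scaled together with the in-band part.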
Current Results
Setup used to produce results:
• Database
  • TIMIT processed with WM NetSim tool (training, English)
  • Phone filter / GSM / phone filter
• Algorithm
  • Excitation signal
    • Lower part extended using half-wave rectification
    • Higher part extended using half-wave rectification
  • Spectral envelope
    • Codebook classification using 64 entries
    • Post processing with linear mapping
Current Results
[Audio samples]
Outlook
Outlook on future work:
• Integration of additional features into codebook training
  • Pitch information
  • Information on „voicedness“
• Add „comfort noise“
• Training of neural network
  • Using additional features
  • In combination with codebook
Thank you for your attention!