
ADVANCES IN SOURCE-CONTROLLED VARIABLE BIT RATE WIDEBAND SPEECH CODING






Presentation Transcript


  1. ADVANCES IN SOURCE-CONTROLLED VARIABLE BIT RATE WIDEBAND SPEECH CODING

Milan Jelinek†, Redwan Salami‡, Sassan Ahmadi*, Bruno Bessette†, Philippe Gournay†‡, Claude Laflamme†, and Roch Lefebvre†
†University of Sherbrooke, Sherbrooke, Canada - ‡VoiceAge Corp., Montréal, Canada - *Nokia Inc., San Diego, USA

VMR-WB
• Variable-Rate Multi-Mode Wideband Speech Codec
• New 3GPP2 WB Speech Coding Standard for 3G applications
• Main Features:
  • Near Face-to-Face Communication Speech Quality
  • Source and Channel Controlled Operation (4 Modes)
  • 3GPP/ITU AMR-WB Directly Interoperable in Mode 3
  • Average Bit Rates (ABR): [figure; annotations: "lower for noisy speech", "higher for clean speech"]
  • Compliant with CDMA2000 Rate Set 2
    - 13.3 (FR), 6.2 (HR), 2.7 (QR) or 1.0 (ER) kbit/s frames
  • WB (50-7000 Hz) and NB (200-3400 Hz) Input/Output
  • 20 ms Frames
  • Noise Reduction with Adjustable Maximum Reduction

Encoder Flow Chart
[Figure. Blocks: Input, Spectral Analysis, Voice Activity?, Noise Estimation, Parameters, Noise Reduction, De-noised Input, LP Analysis, Pitch Tracking.]
Coding type selection, in order:
1. Voice Activity? No: Begin CNG Encoding or DTX. Yes: go to 2.
2. Unvoiced Frame? Yes: Unvoiced Speech Optimized Encoding. No: go to 3.
3. Voiced Frame? Yes: Voiced Speech Optimized Encoding. No: go to 4.
4. Low Energy? Yes: Generic HR Encoding. No: Generic FR Encoding.

VMR-WB Coding Techniques

• Source-Controlled Operation
  • Hierarchical Signal Classification
  • Operating on Frame-level

1. Voice Activity Detection (VAD)
  • Voice Activity Decision: [equation]
  • Noise Estimation Update Decision:
    • Based on parameters with low sensitivity to noise level:
      • Pitch period varying
      • AND normalized correlation at pitch period low
      • AND low estimated order of AR model
      • AND signal energy stationary
    • INDEPENDENT of VAD decision!
      - Robust to noise level variations
      - Conservative approach: the noise estimate is updated only when it is quite certain that the frame is inactive

2. Unvoiced Frame Decision
  Based on the following parameters:
  • Normalized Correlation [equation]
    T – open-loop pitch period estimate
    xi – perceptually weighted input signal
  • Spectral Tilt [equation]
    Eh – average energy of last 2 critical bands
    El – average energy of pitch-synchronous bins in the first 10 critical bands
  • Frame Energy Variation [equation]
    E32(j) – energy maximum in a block of 32 samples
  • Relative Frame Energy - Erel [equation]
    Et – sum of critical band energies for current frame, in dB
    Ef – long-term mean of Et for active speech
  Decision: [equation]

3. Voiced Frame Decision / Signal Modification
  • Voiced Decision is an Inherent Part of Original Signal Modification Algorithm
  • Frame is coded as voiced if all constraints of the modification are satisfied
  • Signal modification is done pitch-synchronously
  • Pitch period evolution is piecewise linear (constant at frame end) to avoid pitch period oscillations
  • Modified input is synchronous with original input at frame end
  • Modification is transparent at least up to 30 % of active speech frames (in the example below, no coding is used and 30 % of active clean speech frames are modified) [figure]

4. Low Energy Decision
  Purpose: To avoid encoding unclassified frames with low perceptual importance at Full Rate
  Condition: [equation]
  Example: Typical example of a low-energy frame encoded with Generic HR in mode 2 [figure]

• Channel-Controlled Operation
  • 4 Operational Modes Controlled by Channel Conditions
  • Transparent Memory-less Mode Switching
  • Per-Frame Bit Rate Control Capability
  • Coding Types Relative Usage in Active Speech:

    Coding Type        Mode 0    Mode 1    Mode 2    Mode 3
    Generic FR         93.4 %    60.4 %    34.1 %    -
    Interoperable FR   -         -         -         100.0 %
    Generic HR         -         7.1 %     13.1 %    -
    Voiced HR          -         13.0 %    33.2 %    -
    Unvoiced HR        6.6 %     19.5 %    5.6 %     -
    Unvoiced QR        -         -         14.0 %    -

  • Mode Switching Performance:
    • Comparing MOS scores of modes 0, 1, 2 with random mode switching at 0.5, 1 and 5 second intervals (from characterization test) [figure]

• Enhancements at Decoder
  • Low Frequency Post-processing:
    • Enhancement of the periodicity in low frequency region [figure]
  • Frame Error Concealment:
    • Lost Frame Concealment:
      • Excitation energy and spectral envelope converge to estimated noise.
      • Excitation periodicity converges to 0.
      • Convergence rate depends on the signal class of last good frame.
    • Recovery after erasure:
      • Careful energy control of synthesized speech.
      • Artificial onset reconstruction in case of lost voiced onset.

Performance (MOS scores from selection test)
CDMA Specific Modes (Modes 0, 1, 2), WB Input
  Ref 0 – AMR-WB @ 14.25
  Ref 1 – AMR-WB @ 12.65
  Ref 2 – AMR-WB @ 8.85
  Test 0 – VMR-WB Mode 0
  Test 1 – VMR-WB Mode 1
  Test 2 – VMR-WB Mode 2
[Figures: Clean Speech Conditions; Channel Error Conditions; Background Noise Conditions]

Performance (MOS scores from characterization test)
• NB Input Test
  • Modes 0, 1, 2, 3
  • Clean speech, nominal level
• Test on Interworking with AMR-WB @ 12.65 kbit/s
  - WB input, clean speech conditions
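
The four-step coding type selection from the encoder flow chart and the coding-type usage figures can be tied together in a short sketch. The Python below is illustrative only and is not the VMR-WB reference code. The boolean arguments stand in for the classifier outputs, since the decision formulas are not part of this transcript. The names `select_coding_type`, `average_bit_rate`, `FRAME_RATE_KBPS` and `USAGE_PERCENT` are hypothetical. It also assumes that every coding type labelled FR, HR or QR occupies a Rate Set 2 frame of the corresponding size, including Interoperable FR.

```python
# Illustrative sketch only; not the VMR-WB reference implementation.

# Rate Set 2 frame bit rates quoted on the poster, in kbit/s.
FRAME_RATE_KBPS = {"FR": 13.3, "HR": 6.2, "QR": 2.7, "ER": 1.0}

# Relative usage of coding types in active speech (percent), from the poster.
USAGE_PERCENT = {
    0: {"Generic FR": 93.4, "Unvoiced HR": 6.6},
    1: {"Generic FR": 60.4, "Generic HR": 7.1,
        "Voiced HR": 13.0, "Unvoiced HR": 19.5},
    2: {"Generic FR": 34.1, "Generic HR": 13.1, "Voiced HR": 33.2,
        "Unvoiced HR": 5.6, "Unvoiced QR": 14.0},
    3: {"Interoperable FR": 100.0},
}


def select_coding_type(active, unvoiced, voiced, low_energy):
    """Walk the four flow-chart decisions in order.

    The boolean arguments are placeholders for the classifier outputs.
    """
    if not active:       # 1. Voice Activity? No
        return "CNG Encoding or DTX"
    if unvoiced:         # 2. Unvoiced Frame? Yes
        return "Unvoiced Speech Optimized Encoding"
    if voiced:           # 3. Voiced Frame? Yes
        return "Voiced Speech Optimized Encoding"
    if low_energy:       # 4. Low Energy? Yes
        return "Generic HR Encoding"
    return "Generic FR Encoding"  # remaining frames are coded at full rate


def average_bit_rate(mode):
    """Usage-weighted bit rate in active speech for one mode, in kbit/s.

    Assumption: the last word of each coding type name ("FR", "HR", "QR")
    identifies the frame size it occupies.
    """
    total = 0.0
    for coding_type, percent in USAGE_PERCENT[mode].items():
        rate_name = coding_type.split()[-1]  # "Voiced HR" -> "HR"
        total += percent / 100.0 * FRAME_RATE_KBPS[rate_name]
    return total


if __name__ == "__main__":
    print(select_coding_type(True, False, False, True))
    for mode in sorted(USAGE_PERCENT):
        print(mode, round(average_bit_rate(mode), 2))
```

Under these assumptions the weighted rates in active speech come out at roughly 12.8, 10.5 and 8.1 kbit/s for modes 0, 1 and 2. These figures cover active frames only. Frames sent to CNG encoding or DTX would pull the overall average lower.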
