Multi-channel Echo Cancellers for Packet Telephony using a low cost DSP
Krishna V V, Jitendra Rayala, Joseph Yau, Brendon Slade
DSP Products Division, LSI Logic

Plan
• Line Echo Cancellation Overview
• Echo Sources and Cures
• EC for Packet Voice
• Echo Canceller Internals
• Multi-channel EC on LSI403LP
• Summary

Echo Sources in Telephony
[Figure: Central Office A and Central Office B, each with a 2-wire local loop, joined by a 4-wire segment. Hybrids AH1, AH2, BH1 and BH2 are marked, with signal paths Tx-A, Rx-A, Tx-B and Rx-B.]
Echo arises due to impedance mismatches at hybrids.
• Near echo for A (side tone): leakage at AH1 + reflection at AH2.
• Far echo for A: leakage at BH2 (major component) + reflection at BH1.

When is EC critical?
The need for EC is determined by both:
• Echo level, or “Echo Return Loss” (ERL)
• Round-trip path delay
Typical ERL values range from 6 dB to 12 dB.
Typical round-trip network delays:
  POTS (local calls):         less than 10 ms
  POTS (LD, terrestrial):     30-70 ms
  POTS (LD, satellite):       300-500 ms
  Wireless (GSM, CDMA, …):    100-180 ms
  Packet voice:               120-200 ms

Delay in Packet Networks
Overall delay break-up:
  Speech codec:     0.2 - 40 ms (low delay for PCM, ADPCM, G.728, BV16, …)
  Packetization:    10 - 30 ms
  I/O buffers:      20 - 60 ms
  Transmission:     20 - 150 ms
  Jitter buffer:    50 - 150 ms
  Total:            100 - 400 ms (typical: 120 - 200 ms)
Revised from “Internet Telephony: Going like crazy”, by G. Thomsen, Y. Jani, IEEE Spectrum, May 2000.

Tackling Echo: Telephony Standards
[Figure: timeline from 1976 to 2004 (CCITT, later ITU) with markers indicating specification releases and revisions. Echo suppressors: G.161, G.164. Line echo cancellers: G.165, G.168.]

EC: The Past Decade
Early 90’s
  EC channels per board:  ~ 12 - 24
  Power per channel*:     ~ 120 mW
  Cost per channel*:      ~ $20 - 25
  Major markets:          Long distance POTS networks (incl. satellite downlinks)
  Other markets:          Cellular networks
Late 90’s
  EC channels per board:  ~ 24 - 256
  Power per channel*:     ~ 10 - 25 mW
  Cost per channel*:      ~ $4 - 5
  Major markets:          Long distance POTS networks; Cellular networks
  Other markets:          Packet voice networks
Early 00’s
  EC channels per board:  ~ 64 - 672+
  Power per channel*:     ~ 5 - 15 mW
  Cost per channel*:      ~ $2
  Major markets:          Cellular networks; Packet voice networks
  Other markets:          Long distance POTS networks
* Excludes overall system power consumption / costs.

EC for Packet Voice
[Figure: network diagram with Central Office, PSTN cloud, Gateway, IP cloud and CPE, joined by 2-wire and “4-wire” links, with EC blocks marked at several points.]
Question: Why EC in the Gateway?

EC in Packet Networks
• EC at CPE:
  • Short tails sufficient (~ 16 ms) on FXS ports
  • Longer tails (32 - 64 ms) used on FXO ports
  • As few as 2 - 24 channels, as many as a few hundred, depending on the CPE
• EC at PVG:
  • Longer tail support (32 - 128 ms)
  • As many as 8K to 30K channels

Packet Voice: CPE Detail
[Figure: block diagram of the CPE voice path. Blocks shown:]
• PCM Interface: A-Law, U-Law, Linear (FXO, FXS, PRI, …)
• G.168 Line Echo Canceller
• Tone Detector: DTMF, V.21
• VAD
• Voice Encoder: G.711, G.726, G.729A, G.723.1, GSM AMR, iLBC, BV16, …
• Frame / Packet Interface: to RTP packetization, from Jitter Buffer
• Voice Decoder / PLC: G.711, G.726, G.729A, G.723.1, GSM AMR, iLBC, BV16, …
• CNG
• Tone Generator: DTMF, CPT
• Caller ID Tx: Type I, II

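EC - A Black-box View
[Figure: the G.168 EC as a black box with ports RIN (from far-end), ROUT (to near-end), SIN (echo plus near-end signal), SOUT, and Control / Status lines. Levels marked: L_RIN at RIN, L_ECHO at SIN, L_RES or L_RET at SOUT.]
ERL = L_RIN - L_ECHO
ACOM = L_RIN - L_SOUT (near-end signal absent)
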
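The two level differences on this slide can be illustrated numerically. The sketch below is a rough illustration only: it assumes levels are taken as mean power in dB over a block of samples, which is simpler than the measurement method G.168 specifies, and the function names and toy signals are hypothetical.

```python
import math

def level_db(samples):
    """Mean-power level of a block of samples, in dB (assumed measure)."""
    power = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(power)

def erl_db(rin, echo):
    """ERL = L_RIN - L_ECHO: loss from the receive input to the echo at SIN."""
    return level_db(rin) - level_db(echo)

def acom_db(rin, sout):
    """ACOM = L_RIN - L_SOUT, meaningful only with the near-end signal absent."""
    return level_db(rin) - level_db(sout)

# Toy signals (hypothetical): the hybrid returns the far-end signal at half
# amplitude, and the canceller leaves 1% of the far-end amplitude at SOUT.
rin = [math.sin(2 * math.pi * k / 16) for k in range(160)]
echo = [0.5 * x for x in rin]
sout = [0.01 * x for x in rin]

erl = erl_db(rin, echo)     # about 6 dB, the low end of the typical ERL range
acom = acom_db(rin, sout)   # about 40 dB combined loss
```

In this toy case the hybrid supplies about 6 dB of the loss, so the canceller itself would have to supply the remaining 34 dB or so.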
G.168 EC Internals
[Figure: block diagram between the Rin / Rout and Sin / Sout ports. Blocks shown:]
• V.25 Tone Detector; Holding-band Logic (Enable / Disable)
• EC Control Logic (Adaptation, NLP)
• Nonlinear Processor (NLP); Comfort Noise (CNI)

Some EC Design Options
• “Full tail”, or “Tail independent” / “Floating window”
• Single filter with robust control, or double filter with simpler controls
• Time domain, subband structure, or transform domain

Full Tail / Floating Window
[Figure: an actual echo path spanning 128 ms, covered either by a full tail solution or by a 2-window solution with two 12 ms windows.]

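The appeal of the windowed approach can be seen from a coefficient count. The sketch below assumes an 8 kHz sampling rate (the usual narrowband telephony rate, not stated on the slide) and reads the figure as one 128 ms full-tail filter versus two 12 ms windows; `taps_for` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate

def taps_for(span_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of filter coefficients needed to cover span_ms of echo path."""
    return span_ms * sample_rate_hz // 1000

full_tail_taps = taps_for(128)      # one filter spanning the whole 128 ms path
windowed_taps = 2 * taps_for(12)    # two 12 ms windows placed on the echoes

print(full_tail_taps, windowed_taps)
```

The count ignores the extra logic needed to locate and track the windows, which is part of the cost of the windowed design.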
Key Performance Issues
• Fast initial convergence
• Low steady-state residual
• Fast tracking (for occasional path changes)
Determined by: adaptation method
Big questions: How fast? How low?

Key Performance Issues (Cont’d)
• Robust to near-end talk
• Robust to double-talk
• Near-end voice quality (measured by PESQ, MOS, ...)
• Near-end background noise contrast
Determined by: Adaptation Control, NLP Module, CNI Module

Adaptation Options
• NLMS (sample rate or block adaptive)
• Enhanced NLMS variants (decorrelation, variable step size, PNLMS, PNLMS++)
• Fast affine projection (FAP)
• Fast RLS (FTRLS, QR-RLS, …)
• Other methods also exist …

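As an illustration of the first option, here is a minimal sample-rate NLMS sketch. It is not the implementation described in this deck: it uses floating point, has no double-talk control, NLP or comfort noise, and the step size `mu`, the regulariser `eps` and the toy echo path are all assumed values chosen for the example.

```python
import random

def nlms_echo_canceller(rin, sin, num_taps, mu=0.5, eps=1e-6):
    """Sample-by-sample NLMS; returns (sout, h), where h estimates the echo path."""
    h = [0.0] * num_taps      # adaptive FIR coefficients
    x = [0.0] * num_taps      # far-end history, newest sample first
    sout = []
    for r, s in zip(rin, sin):
        x = [r] + x[:-1]
        echo_est = sum(hk * xk for hk, xk in zip(h, x))    # FIR filtering
        e = s - echo_est                                    # residual sent on SOUT
        step = mu * e / (sum(xk * xk for xk in x) + eps)    # normalised step
        h = [hk + step * xk for hk, xk in zip(h, x)]        # NLMS filter update
        sout.append(e)
    return sout, h

# Toy echo path (hypothetical values) driven by a white-noise far-end signal,
# with no near-end talker and no noise.
random.seed(1)
true_path = [0.0, 0.5, -0.3, 0.1]
rin = [random.uniform(-1.0, 1.0) for _ in range(2000)]
sin = [sum(true_path[k] * rin[n - k] for k in range(len(true_path)) if n - k >= 0)
       for n in range(len(rin))]

sout, h = nlms_echo_canceller(rin, sin, num_taps=8)
max_coeff_error = max(abs(hk - tk) for hk, tk in zip(h, true_path + [0.0] * 4))
```

Each sample costs one pass over the coefficients for filtering and one for the update, which is where NLMS's roughly 2N MACs per sample come from.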
Costs of Adaptation
  Method           MACs / Sample*         Data Memory / Channel
  NLMS             O(2N)                  ~ O(2N)
  APA (order P)    O(2PN) + O(7P^2)       ~ O(2PN)
  FAP (order P)    O(2N) + O(20P)         ~ O(2N)
  PNLMS            O(4N)*                 ~ O(2N)
  FTRLS            O(8N)*                 ~ O(7N)
* MACs/sample not a good cost measure for PNLMS and FTRLS.

LSI403LP/LC DSP
Price-Performance Balance:
• 120 MHz - 200 MHz clock
• ZSP400 core, up to 4 instructions per cycle
• Dual MACs can perform two 16x16 or one 32x32 operation(s) per cycle
• 48K words of on-chip SRAM (configurable as 16K:32K or 24K:24K or 32K:16K of PM and DM)
• Two serial ports with TDM support
• As low as $4.00 in volume.

EC Complexity Break-up
NLMS based EC can be split into 3 functional parts:
1. FIR filtering
   Typically 15-25% of the processor load; varies with tail length.
2. NLMS filter update
   Typically 25-35% of the processor load; varies with tail length.
3. Overall control logic
   This has a few loops for IIR filtering and division, as well as many if-then-else type decisions. Also includes the V.25 tone detector, comfort noise generator, etc.
   Typically 40-60% of the processor load; varies with tail length.

EC Complexity
NLMS based example:
                    Ops / Sample     Data Memory
  FIR Filtering:    O(N)             O(N)
  Filter Updates:   O(N)-O(2N)       O(N)
  Other Logic*:     ~ c              ~ M
For lattice structures, filtering and update stage break-up is not possible.
Other logic costs are almost constant (they depend very weakly on filter length, N).
* Other logic includes several IIR filters, conditional branching, data buffer management, etc. for update control, NLP, CNI and V.25 tone disabler.

Multi-MAC Processors
  MACs / cycle:         1              2               4
  FIR Filtering:        O(N)           O(N/2)          O(N/4)
  Filter Updates:       O(N)-O(2N)     O(N/2)-O(N)     O(N/4)-O(N/2)
  Other Logic:          ~ c            ~ c             ~ c
  Load for 64 ms EC:    ~ 12-13 MHz    ~ 8-9 MHz       ~ 5-6 MHz
Percentage load for “other logic” is significant.

FIR Filtering Loop
ZSP400 code snippet:
    L_ECFilter_Loop:
        lddu    r2, r14, 2          ! r2 = Y[k], r3 = Y[k+1]
        lddu    r4, r13, 2          ! r4 = A[k], r5 = A[k+1]
        mac2.a  r2, r4              ! r1r0 = r1r0 + r2*r4 + r3*r5
        agn0    L_ECFilter_Loop
Approximately N/2 cycles per sample, as it can be implemented using the lddu / lddu / mac2.a instruction sequence.

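For readers unfamiliar with ZSP assembly, the loop can be modelled in Python as a dot product that consumes two taps per pass. This is only a behavioural sketch: it ignores fixed-point word lengths and saturation, and it assumes one loop pass per cycle (which is what the N/2 figure implies), an 8 kHz sampling rate, and 512 taps for a 64 ms tail. The function name is hypothetical; the variable names mirror the registers in the snippet.

```python
def fir_dual_mac(y_hist, coeffs):
    """FIR output computed two taps per loop pass, like lddu / lddu / mac2.a."""
    assert len(y_hist) == len(coeffs) and len(coeffs) % 2 == 0
    acc = 0       # plays the role of the r1r0 accumulator
    passes = 0
    for k in range(0, len(coeffs), 2):
        r2, r3 = y_hist[k], y_hist[k + 1]    # lddu r2, r14, 2
        r4, r5 = coeffs[k], coeffs[k + 1]    # lddu r4, r13, 2
        acc += r2 * r4 + r3 * r5             # mac2.a r2, r4
        passes += 1
    return acc, passes

# Small integer example so the result can be checked by hand.
y_hist = [1, 2, 3, 4, 5, 6, 7, 8]
coeffs = [8, 7, 6, 5, 4, 3, 2, 1]
acc, passes = fir_dual_mac(y_hist, coeffs)   # acc = 120 after 4 passes

# Rough FIR-only load for a 64 ms tail under the assumptions above.
SAMPLE_RATE_HZ = 8000
N_TAPS = 512
fir_cycles_per_sample = N_TAPS // 2
fir_load_mhz = fir_cycles_per_sample * SAMPLE_RATE_HZ / 1e6
```

Under these assumptions the FIR stage alone comes to about 2 MHz per channel.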
LSI403LP/LC EC Implementations
Two versions:
• Full-tail, 64 ms echo canceller
• Windowed version (up to 3 discrete echoes)
                   Full-Tail    Windowed
  Code Size:       1.1 K        3.8 K
  Data Memory:     0.4 K        1.2 K
  Channel Data:    1.3 K        1.6 K
  I/O Buffers:     0.12 K       0.12 K
  Load (MHz):      8.6          6.1
Notes: All memory in 16-bit words. I/O buffers are for 2.5 ms frame size. Numbers subject to change (on-going revisions!).

Multi-channel EC Costs
• Processor load (MHz or gates):
  • Increases almost linearly with channel count
  • Savings are possible for large channel counts
• Data memory (channel object):
  • Increases linearly with channel count
  • On-chip memory is expensive, but it reduces power consumption and offers easier scalability with multiple cores

24 Channels on LSI403LP/LC
Resources for 24 channels:
                   Full-Tail    Windowed
  Code Size:       1.1 K        3.8 K
  Data Memory:     35 K         42.5 K
  Load (MHz):      208          147
Notes:
• Processor load is worst case (all channels performing adaptation).
• Extra data memory requirements can be met by the free program memory on LSI403LP/LC. The required swap operations (for some channels only) are estimated to add an extra load of 6.5 MHz in case of the windowed version.
• Multi-chip packaging using LSI403WLP provides higher channel density. For example, a dual-processor package can support 32 channels, at a lower clock, without requiring any memory swaps.

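The 24-channel figures are close to a straight scaling of the single-channel numbers given on the earlier "LSI403LP/LC EC Implementations" slide. The sketch below assumes that the "Data Memory" entry there is shared by all channels while "Channel Data" and "I/O Buffers" are per channel; that split is an interpretation, not something the slides state, and the dictionary keys are hypothetical names.

```python
CHANNELS = 24

# Single-channel figures from the implementations slide (K = 1024 16-bit words).
single_channel = {
    "full_tail": {"shared_kw": 0.4, "channel_kw": 1.3, "io_kw": 0.12, "load_mhz": 8.6},
    "windowed":  {"shared_kw": 1.2, "channel_kw": 1.6, "io_kw": 0.12, "load_mhz": 6.1},
}

def scale(version, channels=CHANNELS):
    """Return (data memory in K words, load in MHz) for the given channel count."""
    v = single_channel[version]
    data_kw = v["shared_kw"] + channels * (v["channel_kw"] + v["io_kw"])
    load_mhz = channels * v["load_mhz"]
    return data_kw, load_mhz

full_tail_24 = scale("full_tail")   # roughly 34.5 K words and 206 MHz
windowed_24 = scale("windowed")     # roughly 42.5 K words and 146 MHz
```

Both results land within a few percent of the slide's 35 K / 208 MHz and 42.5 K / 147 MHz, consistent with load and channel memory growing almost linearly with channel count.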
Summary
• LSI403LP or LSI403LC can be used to support as many as 24 channels of LEC with 64 ms tail, without any external SRAM.
• Very low cost per channel (under $0.50 per channel).
• Multi-chip packaging for higher channel density.
• Custom ASICs can be built for further cost reduction. Higher performance options using ZSP G2 cores also possible.
Thanks! Questions?

Delay: G.114 Guidelines
ITU-T classification by one-way delay (with echo “adequately controlled”):
  < 150 ms:       Mostly acceptable.
  150 - 400 ms:   Acceptable (maybe).
  > 400 ms:       Unacceptable (in general).
Typical delays:
  Terrestrial, national long distance PSTN:   < 50 ms
  Terrestrial, international PSTN:            ~ 100 ms
  Cellular, mobile to PSTN:                   ~ 150 ms
  Cellular, mobile to mobile:                 ~ 300 - 400 ms

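The three delay bands can be expressed as a small lookup. This is a sketch of the classification as summarised on this slide only; the function name is hypothetical, and placing exactly 150 ms and 400 ms in the middle band follows the slide's "150-400ms" label.

```python
def g114_band(one_way_delay_ms):
    """Map a one-way delay to the slide's three bands (echo adequately controlled)."""
    if one_way_delay_ms < 150:
        return "mostly acceptable"
    if one_way_delay_ms <= 400:
        return "acceptable (maybe)"
    return "unacceptable (in general)"

# Typical one-way delays quoted on the slide, in ms.
for label, delay_ms in [("national PSTN", 50), ("mobile to PSTN", 150),
                        ("mobile to mobile", 350)]:
    print(label, "->", g114_band(delay_ms))
```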
Echo Level and Delay
[Figure: required ERL (dB, axis 10 to 30) plotted against delay (ms, axis 10 to 90).]
ERL data from Table 1.1, “Acoustic Signal Processing for Telecommunication”, S. L. Gay and J. Benesty (Eds.), Kluwer Academic Publishers (2000)

Dealing With Delay (Echo)
• One-way delays in packet voice networks > 100 ms
• As recommended in ITU-T G.131, a network echo canceller (EC) is required.
• EC required only for:
  • PSTN interfaces on packet voice gateways (PVGs)
  • Analog phone (SLIC) interfaces on CPEs
• EC not required for digital IP phones
  • AEC may still be needed (for hands-free operation)
• EC tail length: a much misused parameter
• ITU-T G.168 EC was initially developed for PSTN. Can it be applied as-is for packet voice networks?
CPE: Customer Premises Equipment, PVG: Packet Voice Gateway

Quality of Service (QoS)
[Figure: factors contributing to voice quality (MOS, PESQ, R-value, etc.). Elements shown:]
• End-end delay (delay induced quality loss): network latency, packetization, voice codec (codec delay), Tx / Rx buffers, jitter buffer (JB size, JB delay)
• Echo canceller: line echo, EC quality
• Packet loss concealment: “lost” packets, PLC quality
• VAD / CNG: speech clipping, comfort noise quality