Balazs KÖVESI, Dominique MASSALOUX

Method of Packet Errors Cancellation Suitable for any Speech and Sound Compression Scheme
STQ Workshop, Sophia-Antipolis, February 11th 2003

Introduction
• Context
  • Applications such as VoIP or audio streaming
  • Possibly high packet loss rate (up to 10 %)
• Proposal: a frame error concealment (FEC) method
  • Copes with high packet loss rates
  • Relies on a CELP synthesis scheme
  • Independent of the codec type
  • Speech oriented but also suitable for music
  • Includes adaptive gain control
  • Avoids a "robot" voice
  • Ensures the decoder memory update
  • Smooths the signal after an erased period

Plan
• Basic principle of the new FEC method
• Implementation in an MDCT codec
• Generalization to other codec types
• Conclusion

Basic principle of the new FEC
[Block diagram: decoder (valid data in, decoded signal out), storage of decoded samples, synthesis of missing samples (triggered by the indication of erased data, producing the reconstructed signal), decoder update, smoothing with decoded signal.]

Implementation in an MDCT codec
• The MDCT transform
  • Analysis with 50 % overlap: windowing, then T→F transform
  • Synthesis with overlap-add: F→T transform, windowing, then overlap-add with the memory kept for the next frame
[Figure: analysis windows n-1 and n and synthesis windows n-1 and n on a time axis with 20 ms frames; the overlap-add yields decoded frame n.]
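The analysis/synthesis chain above can be illustrated with a minimal pure-Python sketch. It assumes a sine window and a direct O(N²) transform; neither is stated on the slide, and the function names are hypothetical. With a window satisfying w[n]² + w[n+N]² = 1, the time-domain aliasing of consecutive frames cancels in the overlap-add.

```python
import math

def sine_window(N):
    # Assumed window (not given in the slides); satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # 2N windowed samples -> N coefficients (T->F transform). N is assumed even.
    n0 = 0.5 + N / 2
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coefs, N):
    # N coefficients -> 2N aliased samples (F->T transform).
    # The 2/N scale gives perfect reconstruction when the window is applied twice.
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coefs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(signal, N):
    # len(signal) is assumed to be a multiple of the frame length N.
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):    # 50 % overlap
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]                  # windowing + overlap-add
    return out[N:N + len(signal)]
```

Each output frame needs two consecutive coefficient sets, which is why a single lost bitstream frame disturbs more than one output frame.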

Implementation in an MDCT codec
• Effect of frame erasure
  • The loss of x bitstream frames affects x+1 output frames
  • These frames have to be synthesized in the decoder
[Figure: time axis with 20 ms frames. Erased: frame n. Disturbed: frames n-1 and n.]
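The x → x+1 relation can be checked with a small sketch. The function name is hypothetical, and the indexing follows this slide only (erased bitstream frame n disturbs output frames n-1 and n); other slides may number frames differently.

```python
def disturbed_output_frames(erased_bitstream_frames):
    # Each bitstream frame contributes to two consecutive output frames
    # because of the 50 % overlap (indexing assumed from this slide).
    disturbed = set()
    for n in erased_bitstream_frames:
        disturbed.add(n - 1)
        disturbed.add(n)
    return sorted(disturbed)
```

For a run of x consecutive erased frames the returned list has x+1 entries, since neighbouring erased frames share one disturbed output frame.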

Implementation in an MDCT codec
• Memorizing part
  • After decoding a valid frame:
    • The 40 ms output memory is updated
    • The energy of the frame is calculated
    • The energy memory buffer (5 s) is updated
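A minimal sketch of the memorizing part, assuming a 16 kHz sampling rate, 20 ms frames, and mean-square energy as the frame energy measure. The class and method names are hypothetical; only the 40 ms and 5 s buffer durations come from the slide.

```python
from collections import deque

class ConcealmentMemory:
    def __init__(self, sample_rate=16000, frame_ms=20, output_ms=40, energy_s=5):
        self.frame_len = sample_rate * frame_ms // 1000
        # 40 ms of past decoded samples (640 samples at the assumed 16 kHz).
        self.samples = deque(maxlen=sample_rate * output_ms // 1000)
        # One energy value per frame over 5 s (250 values for 20 ms frames).
        self.energies = deque(maxlen=energy_s * 1000 // frame_ms)

    def store_valid_frame(self, frame):
        # Called after each valid frame has been decoded.
        self.samples.extend(frame)
        energy = sum(s * s for s in frame) / len(frame)  # assumed energy measure
        self.energies.append(energy)
        return energy
```

Bounded deques drop the oldest entries automatically, so the buffers always hold the most recent 40 ms of signal and 5 s of frame energies.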

Implementation in an MDCT codec
[Detailed block diagram of the synthesis of missing samples, fed by the memory of the past decoded signal: LPC analysis (LPC parameters A(z)), LTP analysis & V/UV detection (LTP parameters B(z)), calculation of the past excitation signal, LTP filtering 1/B(z), LPC synthesis 1/A(z), adaptive gain control. Surrounding blocks: decoder, storage of decoded samples, decoder update, smoothing with decoded signal.]

Implementation in an MDCT codec
• The LPC filter models the spectral envelope
• Coefficients are not transmitted
• The LPC analysis order can be higher than in a usual CELP codec (32 at 16 kHz) → better performance on music
[Figure: past decoded signal (20 ms) → classical method → LPC coefficients of the filter A(z).]
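The slide only names a "classical method" for the LPC analysis. The sketch below assumes this means the autocorrelation method solved with the Levinson-Durbin recursion; analysis windowing and lag windowing are omitted, and the function names are hypothetical. The convention is A(z) = 1 + a[1] z⁻¹ + … + a[p] z⁻ᵖ.

```python
def autocorrelation(x, order):
    # r[k] = sum over n of x[n] * x[n-k], for k = 0..order
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    # Returns (a, err): coefficients of A(z) with a[0] = 1, and the final
    # prediction error energy.
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input: keep remaining coefficients at 0
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

Because the analysis runs on the decoder's own past output, the order is not constrained by any bit budget, which is what allows a value such as 32 at 16 kHz.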

Implementation in an MDCT codec
• Precise pitch estimation is crucial for good performance
• Only integer pitch (P) values are examined, in the range [50 Hz, 600 Hz]
• Normalized correlations are computed on the last 2P samples
• Pitch criteria: maximum correlation + verification of multiples and fractions
• V/UV criteria: selected correlation value + energy value → 5 s energy memory, energy evolution of the last two pitch periods
[Figure: correlation calculations on the past decoded signal for candidate periods p1, p2, p3, p4.]
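A simplified sketch of the integer pitch search, assuming a 16 kHz sampling rate so that [50 Hz, 600 Hz] maps to lags 27 to 320. For each candidate P, the last P samples are correlated with the P samples before them. Only the maximum-correlation criterion is shown; the multiple and fractional verifications and the V/UV decision are not reproduced, and the function name is hypothetical.

```python
import math

def estimate_pitch(past, sample_rate=16000, f_min=50.0, f_max=600.0):
    # past: memory of decoded samples, most recent sample last.
    min_lag = int(math.ceil(sample_rate / f_max))
    max_lag = min(int(sample_rate // f_min), len(past) // 2)
    best_lag, best_corr = None, -1.0
    for lag in range(min_lag, max_lag + 1):
        recent = past[len(past) - lag:]
        previous = past[len(past) - 2 * lag:len(past) - lag]
        num = sum(a * b for a, b in zip(recent, previous))
        den = math.sqrt(sum(a * a for a in recent) * sum(b * b for b in previous))
        corr = num / den if den > 0.0 else 0.0
        # The small margin keeps the shortest lag when multiples tie.
        if corr > best_corr + 1e-9:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

The largest lag (320 samples) needs 640 past samples, which matches the 40 ms output memory at the assumed sampling rate.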

Implementation in an MDCT codec
• LPC analysis filtering
[Figure: past decoded signal → A(z) → past excitation signal.]
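The past excitation is obtained by passing the past decoded signal through A(z). A minimal sketch, assuming the convention A(z) = 1 + a[1] z⁻¹ + … + a[p] z⁻ᵖ and a zero initial filter state (the function name is hypothetical):

```python
def lpc_residual(signal, a):
    # e[n] = s[n] + sum_{k=1..p} a[k] * s[n-k]   (FIR filtering by A(z))
    order = len(a) - 1
    residual = []
    for n in range(len(signal)):
        acc = signal[n]
        for k in range(1, order + 1):
            if n - k >= 0:          # samples before the buffer are taken as zero
                acc += a[k] * signal[n - k]
        residual.append(acc)
    return residual
```

With the zero initial state, the first p residual samples are not reliable; a real implementation would filter a longer history or discard them.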

Implementation in an MDCT codec
• Excitation signal generation for the LPC synthesis filtering
  • Voiced excitation: 2 components
    • Harmonic, lower frequency bands: LTP filter combined with a low-pass filter
    • Less harmonic, higher frequency bands: LTP filter combined with a high-pass filter + randomly evolving pitch

Implementation in an MDCT codec
• Excitation signal generation for the LPC synthesis filtering (continued)
  • Unvoiced excitation
    • Non-harmonic, lower frequency bands: "randomized" LTP filtering + low-pass filtering + suppression of sudden energy variations
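The core of the excitation generation is the LTP filter, which repeats the past excitation with a lag equal to the pitch period. The sketch below is deliberately simplified: it is single-band (the low-pass/high-pass split is omitted), uses unit LTP gain, and models the "randomized" LTP by redrawing the lag once per nominal period within a hypothetical ±25 % range. None of these choices come from the slides, and the function name is hypothetical.

```python
import random

def extend_excitation(past_exc, pitch, n_out, voiced=True, rng=None, jitter=0.25):
    # past_exc must hold at least `pitch` samples, most recent last.
    rng = rng or random.Random(0)
    buf = list(past_exc)
    out = []
    lag = pitch
    for i in range(n_out):
        if not voiced and i % pitch == 0:
            # Assumed randomization: new lag for each nominal pitch period.
            lo = max(1, int(pitch * (1 - jitter)))
            hi = min(len(past_exc), int(pitch * (1 + jitter)))
            lag = rng.randint(lo, hi)
        sample = buf[-lag]          # e[n] = e[n - lag], i.e. 1/B(z) with unit gain
        buf.append(sample)
        out.append(sample)
    return out
```

A strictly periodic repetition is what tends to produce a synthetic sound during long erasures, which is presumably why the slides add a randomly evolving pitch in the higher band.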

Implementation in an MDCT codec
• LPC synthesis filtering
[Figure: excitation signal → 1/A(z) → synthesized signal.]
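The synthesis filter 1/A(z) is the all-pole inverse of the analysis filter. A minimal sketch, again assuming A(z) = 1 + a[1] z⁻¹ + … + a[p] z⁻ᵖ; the optional `history` argument is a hypothetical way to pass the last p decoded samples so that the synthesized signal continues from the past output.

```python
def lpc_synthesis(excitation, a, history=None):
    # s[n] = e[n] - sum_{k=1..p} a[k] * s[n-k]   (IIR filtering by 1/A(z))
    order = len(a) - 1
    # history: previous output samples, most recent last (at least p of them).
    mem = list(history) if history is not None else [0.0] * order
    out = []
    for e in excitation:
        acc = e
        for k in range(1, order + 1):
            acc -= a[k] * mem[-k]
        mem.append(acc)
        out.append(acc)
    return out
```

Seeding the filter with the end of the last valid frame avoids a discontinuity at the start of the synthesized segment.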

Implementation in an MDCT codec
• Adaptive gain control
  • Important in case of long erased periods (> 20 ms)
  • 2 adaptation laws:
    • stationary
    • non-stationary
  • The adaptations also depend on the pitch value
  • Decision available from the LTP analysis
[Figure: gain evolution over time (40 ms marked), with the background noise level indicated.]

Implementation in an MDCT codec
• Recoverable information
[Figure: time axis with 20 ms frames. Erased: frames n-1 and n. Synthesized: frames n-1, n, n+1. Decoded: frame n+2.]

Implementation in an MDCT codec
• Recoverable information
  • For the first erased frame
[Figure: MDCT transform on the first 2 synthesized frames, then IMDCT transform, giving the partly recovered frame n-1.]

Implementation in an MDCT codec
• Decoder memory update
[Figure: MDCT transform on the last 2 synthesized frames, then IMDCT transform (F→T + windowing), giving the updated IMDCT memory.]
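The memory update can be sketched as follows, assuming a sine window, a direct O(N²) MDCT, and hypothetical function names. The last two synthesized frames are windowed and transformed, then inverse transformed and windowed again; the second half of the result replaces the overlap-add memory that the decoder would have held without the erasure.

```python
import math

def sine_window(N):
    # Assumed window; satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    n0 = 0.5 + N / 2
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coefs, N):
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coefs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N)) for n in range(2 * N)]

def updated_overlap_memory(last_two_synth_frames, N):
    # MDCT on the last 2 synthesized frames, then IMDCT + windowing.
    w = sine_window(N)
    block = [last_two_synth_frames[n] * w[n] for n in range(2 * N)]
    y = imdct(mdct(block, N), N)
    return [y[N + n] * w[N + n] for n in range(N)]   # second half = new memory

def decode_after_erasure(coefs, memory, N):
    # Normal decoding of the first valid frame, using the rebuilt memory.
    w = sine_window(N)
    y = imdct(coefs, N)
    frame = [memory[n] + y[n] * w[n] for n in range(N)]
    new_memory = [y[N + n] * w[N + n] for n in range(N)]
    return frame, new_memory
```

If the synthesized frame equalled the true signal, the aliasing terms would cancel exactly. In practice they differ, and the overlap-add then acts as a windowed blend between the synthesized and the decoded signal.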

Implementation in an MDCT codec
• Recoverable information
  • For the last erased frame
[Figure: IMDCT transform giving the partly recovered frame n+1.]

Implementation in an MDCT codec
• Smoothing part
  • Without smoothing: discontinuity
[Figure: synthesized domain (synthesized frames n-1, n, n+1) followed by the error-free domain (decoded frame n+2), with a discontinuity between the two.]

Implementation in an MDCT codec
• Smoothing part
  • A codec-independent solution: crossfading
[Figure: extra synthesized samples extend the synthesized domain (frames n-1, n, n+1) into the error-free domain and are crossfaded, with a weight going from 1 to 0, with decoded frame n+2.]
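A minimal sketch of the codec-independent crossfade, assuming a linear weight that falls from 1 to 0 over the extra synthesized samples. The slide only indicates the 1 → 0 direction; the linear shape and the function name are assumptions.

```python
def crossfade(extra_synth, decoded):
    # extra_synth: synthesis continued past the erased period.
    # decoded: first samples of the next valid decoded frame (same length).
    length = len(extra_synth)
    if len(decoded) != length:
        raise ValueError("both segments must have the same length")
    out = []
    for i in range(length):
        w = 1.0 - (i + 0.5) / length      # synthesized weight, 1 -> 0
        out.append(w * extra_synth[i] + (1.0 - w) * decoded[i])
    return out
```

The two weights always sum to 1, so a signal that is identical in both segments passes through unchanged.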

Implementation in an MDCT codec
• Smoothing part
  • With MDCT smoothing: overlap-add-like crossfading gives a smooth transition at frame n+1
[Figure: synthesized domain followed by the error-free domain on a time axis.]

Generalization to other codec types
• Can be adapted to any coding scheme
• Was successfully implemented in
  • temporal codecs (G.711, G.721, G.722)
  • a CELP codec (G.723.1)
  • a hierarchical codec composed of a CELP and a transform codec
• Memorizing and synthesis parts are codec independent
• Decoder memory update
  • very important for recursive codecs (CELP)
  • general solution: coding and decoding of the synthesized frames
  • too complex for CELP
  • less complex solution: backtracking
• Smoothing
  • a general solution: crossfading
  • more efficient smoothing can be found for some coding schemes (e.g. MDCT)
  • the decoder memory update ensures the smoothing in CELP codecs

Conclusion
• A general FEC method for any coding scheme
  • optimal for speech, good performance on music
  • avoids an overly synthetic sound for voiced frames
  • keeps the nature of the unvoiced frames
  • enhanced energy management
  • careful update of the decoder memory
  • smoothing after an erased period
• Informal subjective tests have shown its good behavior
• Successfully implemented in group communication applications
• Perspectives:
  • speech / music decision + enhanced music mode
  • …
