Implementation of G.729 algorithm on TI’s TMS320C62xx

Implementation of G.729 algorithm on TI’s TMS320C62xx Zoha Pajouhi May 2005

Outline • What is G.729? • How does it work? • Overview of C62xx processors • Steps to be taken for implementation • Optimizations required for implementation • Implementation of the algorithm • Summary & conclusions

What is G.729? • G.729 is a speech coding standard • Its most important use is in VoIP • It compresses a 64 kbit/s speech signal to 8 kbit/s • It uses the CS-ACELP algorithm • CS-ACELP stands for Conjugate-Structure Algebraic-Code-Excited Linear Prediction • It has several annexes: G.729 Annex A, Annex B, Annex D and Annex E • The annexes differ in algorithmic complexity (Annex A is a reduced-complexity version), in the use of voice activity detection (Annex B), and in bit rate: a lower rate at lower quality (Annex D, 6.4 kbit/s) or a higher rate at higher quality (Annex E, 11.8 kbit/s) • This presentation covers the implementation of G.729 Annex A

How Does G.729 Work? (1) • It processes input frames of 10 ms, i.e., 80 samples at the 8 kHz sampling rate • Each frame is divided into two 5 ms subframes • The core idea is to predict upcoming samples from past ones by means of linear prediction • It also compares the signal against the stored excitation vectors in its codebooks to find the closest match • The exact signal-processing details are beyond the scope of this discussion
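The frame arithmetic and the prediction idea on this slide can be sketched in C. This is a minimal illustration, not code from the G.729 reference: the macro names and the helper `lp_predict` are assumptions, and the prediction uses the convention s_hat(n) = sum of a[i] * s(n - i) (sign conventions for the LP polynomial vary between texts).

```c
#include <assert.h>

#define SAMPLE_RATE_HZ 8000
#define FRAME_MS       10
#define L_FRAME  (SAMPLE_RATE_HZ * FRAME_MS / 1000)  /* 80 samples per 10 ms frame   */
#define L_SUBFR  (L_FRAME / 2)                       /* 40 samples per 5 ms subframe */

/* Bits per frame at the coded rate (8 kbit/s) and at the 64 kbit/s input rate. */
#define BITS_PER_FRAME     (8000 * FRAME_MS / 1000)    /* 80 bits  */
#define PCM_BITS_PER_FRAME (64000 * FRAME_MS / 1000)   /* 640 bits */

/* Hypothetical helper: predict s[n] from its `order` predecessors,
 * s_hat(n) = sum_{i=1}^{order} a[i] * s[n - i]   (a[0] is unused). */
double lp_predict(const double *s, int n, const double *a, int order)
{
    double pred = 0.0;

    assert(n >= order);   /* need `order` past samples */
    for (int i = 1; i <= order; i++)
        pred += a[i] * s[n - i];
    return pred;
}
```

For example, with a = {0, 2, -1} the predictor extrapolates a straight line, so `lp_predict` applied to s = {1, 2, 3} at n = 2 returns 3. The two bit counts also show the 8:1 compression ratio (640 / 80).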

How Does G.729 Work? (2) • The following operations are performed once per frame • Pre-processing: scales the input signal down by a factor of 2 and passes it through a high-pass filter • LP (linear prediction) analysis: models the signal with linear prediction; the LP coefficients are converted to line spectral pairs (LSPs), which are less sensitive to quantization noise
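LP analysis in G.729 computes an autocorrelation of the speech and solves for 10 LP coefficients with the Levinson-Durbin recursion (Autocorr() and Levinson() in the reference code). The sketch below shows only those two steps, in floating point; it omits the analysis window, the lag window and all fixed-point scaling, and the function names `autocorr` and `levinson_durbin` are illustrative.

```c
#include <assert.h>

#define LP_MAX_ORDER 10   /* G.729 uses a 10th-order LP filter */

/* r[k] = sum_{i=k}^{n-1} x[i] * x[i-k], for k = 0..order */
void autocorr(const double *x, int n, double *r, int order)
{
    for (int k = 0; k <= order; k++) {
        double sum = 0.0;
        for (int i = k; i < n; i++)
            sum += x[i] * x[i - k];
        r[k] = sum;
    }
}

/* Levinson-Durbin recursion: solves for a[1..order] such that
 * s_hat(n) = sum_{i=1}^{order} a[i] * s(n - i); a[0] is set to 1.
 * Returns the final prediction-error energy, or -1.0 if r[0] <= 0. */
double levinson_durbin(const double *r, double *a, int order)
{
    double tmp[LP_MAX_ORDER + 1];
    double err = r[0];

    assert(order >= 1 && order <= LP_MAX_ORDER);
    a[0] = 1.0;
    for (int i = 1; i <= order; i++)
        a[i] = 0.0;
    if (err <= 0.0)
        return -1.0;

    for (int m = 1; m <= order; m++) {
        double acc = r[m];
        for (int i = 1; i < m; i++)
            acc -= a[i] * r[m - i];
        double k = acc / err;              /* reflection coefficient */
        for (int i = 1; i < m; i++)
            tmp[i] = a[i] - k * a[m - i];  /* update lower-order coefficients */
        for (int i = 1; i < m; i++)
            a[i] = tmp[i];
        a[m] = k;
        err *= 1.0 - k * k;                /* error energy shrinks each order */
        if (err <= 0.0)                    /* perfectly predictable: stop early */
            break;
    }
    return err;
}
```

As a check, for the autocorrelation r = {1, 0.5, 0.25} of a first-order process, the recursion returns a[1] = 0.5, a[2] = 0 and an error energy of 0.75.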

How Does G.729 Work? (3) • Quantization: the LSPs are quantized and used throughout the rest of the algorithm • Open-loop pitch analysis: a full pitch search is too costly, so this step gives a rough estimate of the pitch delay that the closed-loop search later refines
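A rough open-loop pitch estimate can be found by maximizing the normalized correlation between the current frame and its delayed past over the G.729 delay range of 20 to 143 samples. This is a simplified floating-point sketch with a hypothetical function name: the standard's search additionally splits the delay range into sections and favors shorter delays to avoid picking pitch multiples, which this code only approximates by keeping the smallest delay on ties.

```c
#include <assert.h>

#define PIT_MIN 20    /* shortest pitch delay, in samples */
#define PIT_MAX 143   /* longest pitch delay, in samples  */
#define L_FRAME 80    /* samples per frame                */

/* s points at the first sample of the current frame;
 * s[-PIT_MAX .. -1] must hold the preceding samples. */
int open_loop_pitch(const double *s)
{
    int best_t = PIT_MIN;
    double best_num = 0.0, best_den = 1.0;   /* best criterion = best_num / best_den */

    assert(s);
    for (int t = PIT_MIN; t <= PIT_MAX; t++) {
        double corr = 0.0, energy = 0.0;
        for (int n = 0; n < L_FRAME; n++) {
            corr   += s[n] * s[n - t];
            energy += s[n - t] * s[n - t];
        }
        if (corr <= 0.0 || energy <= 0.0)
            continue;
        /* corr^2 / energy > best_num / best_den, compared without dividing;
         * strict '>' keeps the smaller delay when two delays tie */
        if (corr * corr * best_den > best_num * energy) {
            best_num = corr * corr;
            best_den = energy;
            best_t = t;
        }
    }
    return best_t;
}
```

For an impulse train with a period of 50 samples, delays 50 and 100 score equally, and the function returns 50.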

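How Does G.729 Work? (4) • The following operations are performed twice per frame, i.e., once per subframe • Closed-loop pitch analysis: determines the exact pitch delay, with fractional resolution, by a closed-loop search around the open-loop estimate • Fixed codebook search: finds the codebook entry that best matches the remaining, noise-like part of the excitation, which dominates unvoiced sounds such as many consonants • Adaptive codebook search: works like the fixed codebook search, but its entries are built from the past excitation delayed by the pitch lag, so it models the periodic part of voiced sounds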
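Both codebook searches are analysis-by-synthesis: each candidate excitation is filtered, and the candidate that best matches the target signal x is kept. For a filtered candidate y_k with optimal gain g = (x . y_k) / (y_k . y_k), minimizing the error energy |x - g y_k|^2 is equivalent to maximizing (x . y_k)^2 / (y_k . y_k). The exhaustive sketch below shows only that criterion; the real ACELP search exploits the algebraic pulse structure to avoid testing every entry, and `codebook_search` is a hypothetical name.

```c
#include <assert.h>

/* Exhaustive search: returns the index k that maximizes
 * (x . y_k)^2 / (y_k . y_k) and stores the optimal gain (x . y_k) / (y_k . y_k).
 * Returns -1 (gain 0) if every candidate is all-zero. */
int codebook_search(const double *x, const double *const y[],
                    int num_cand, int len, double *gain)
{
    int best = -1;
    double best_corr = 0.0, best_energy = 1.0;

    assert(num_cand > 0 && len > 0);
    for (int k = 0; k < num_cand; k++) {
        double corr = 0.0, energy = 0.0;
        for (int n = 0; n < len; n++) {
            corr   += x[n] * y[k][n];
            energy += y[k][n] * y[k][n];
        }
        if (energy <= 0.0)
            continue;                      /* skip all-zero candidates */
        /* corr^2/energy > best_corr^2/best_energy, without dividing */
        if (best < 0 || corr * corr * best_energy > best_corr * best_corr * energy) {
            best = k;
            best_corr = corr;
            best_energy = energy;
        }
    }
    *gain = (best >= 0) ? best_corr / best_energy : 0.0;
    return best;
}
```

With target x = {1, 0, 0, 0} and candidates {0, 1, 0, 0}, {2, 0, 0, 0} and {1, 1, 0, 0}, the second candidate wins with gain 0.5, since it reproduces x exactly once scaled.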

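How Does G.729 Work? (5) • The bit allocation per 10 ms frame is as follows: • Line spectrum pairs (L0, L1, L2, L3): 18 bits • Adaptive-codebook delay (P1, P2): 8 + 5 = 13 bits • Pitch-delay parity (P0): 1 bit • Fixed-codebook index (C1, C2): 13 + 13 = 26 bits • Fixed-codebook sign (S1, S2): 4 + 4 = 8 bits • Codebook gains, stage 1 (GA1, GA2): 3 + 3 = 6 bits • Codebook gains, stage 2 (GB1, GB2): 4 + 4 = 8 bits • Total: 80 bits per frame, i.e., 8 kbit/s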

Why use DSP processors? • Speech coding algorithms involve complex, lengthy calculations • Together with short design cycles, this favors programmable DSP processors over custom hardware design • Several DSP processors are available: floating-point and fixed-point processors usually trade off price against precision

DSP processor

Overview of C62xx DSP processors • 150-250 MHz clock • Up to 8 instructions per clock cycle • 1200-2000 million instructions per second (MIPS) • 8-, 16- and 32-bit data access for efficient memory use • Low power • VLIW architecture with 6 ALUs and 2 multipliers • 40-bit arithmetic • Saturation and normalization hardware • On-chip RAM, etc.
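Normalization hardware matters because the G.729 fixed-point reference code relies on basic operators such as norm_l, which counts the left shifts needed to normalize a 32-bit value. The portable sketch below follows that operator's convention (0 maps to 0, -1 maps to 31); the name `norm32` is hypothetical. On the C62x, the NORM instruction (exposed by TI's compiler as the `_norm` intrinsic) performs this count in a single instruction, though its result for some edge cases such as 0 may follow a different convention.

```c
#include <assert.h>
#include <stdint.h>

/* Number of left shifts needed to normalize a 32-bit value, i.e. the number of
 * redundant sign bits (basic-operator convention: 0 -> 0, -1 -> 31). */
int norm32(int32_t x)
{
    int shifts = 0;

    if (x == 0)
        return 0;
    if (x == -1)
        return 31;
    if (x < 0)
        x = ~x;                 /* ~x = -x - 1 has the same count of redundant sign bits */
    while (x < 0x40000000) {    /* shift until bit 30 is set */
        x <<= 1;
        shifts++;
    }
    assert(x >= 0x40000000);    /* x is now normalized */
    return shifts;
}
```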

Steps to be taken for implementation • System simulation & analysis: MATLAB, C or C++ is usually used • Simulation of the implementation on the processor: different optimization choices can be made depending on the target processor • Optimization: discussed in the following slides • Conversion to processor assembly

Optimization • Several issues should be considered: • Processor-independent issues • Placing independent instructions between two dependent ones to exploit parallelism • Efficient use of registers • Loop folding/unfolding: • Folding to reduce code memory • Unfolding (unrolling) to reduce loop overhead • Using pointers instead of array indexing
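Loop unrolling and pointer-based access, two of the processor-independent techniques above, can be shown on a 16-bit dot product, a kernel that appears throughout correlation and filtering code. Splitting the sum across four independent accumulators also gives the compiler independent instructions to schedule in parallel. The function names are illustrative, and overflow is ignored here, whereas the codec itself uses saturating arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward version: array indexing, one product per iteration. */
int32_t dot_rolled(const int16_t *x, const int16_t *y, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}

/* Unrolled by 4 with pointer increments and independent partial sums. */
int32_t dot_unrolled(const int16_t *x, const int16_t *y, int n)
{
    assert(n >= 0);
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;  /* no dependency between them */
    const int16_t *end4 = x + (n - n % 4);   /* end of the unrolled part */
    const int16_t *end = x + n;

    while (x < end4) {                       /* 4 products per iteration */
        s0 += (int32_t)x[0] * y[0];
        s1 += (int32_t)x[1] * y[1];
        s2 += (int32_t)x[2] * y[2];
        s3 += (int32_t)x[3] * y[3];
        x += 4;
        y += 4;
    }
    while (x < end)                          /* leftover 0..3 elements */
        s0 += (int32_t)(*x++) * (*y++);
    return s0 + s1 + s2 + s3;
}
```

The unrolled loop executes a quarter of the loop-control overhead, at the cost of more code memory, which is exactly the folding/unfolding trade-off on this slide.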

Optimization • Processor-dependent issues • Using special instructions instead of generic code: e.g., saturation has a dedicated instruction and does not need to be emulated in software • Reading 32-bit words instead of 16-bit values in 16-bit operations to reduce memory access time • Pipelining the algorithm: sometimes the development tools do it, sometimes not; e.g., inner loops are software-pipelined but outer ones are not • Using a hardware overflow bit: not supported in the C6000 series
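Saturating addition is a typical case where a dedicated instruction replaces generic code. The portable sketch below clamps overflowing results instead of letting them wrap, in the style of the basic operator L_add (without its overflow flag); the names `sat_add32` and `sat_sum` are hypothetical. With TI's C6000 compiler, the SADD instruction is available as the `_sadd` intrinsic, so a call such as `sat_add32(a, b)` should reduce to `_sadd(a, b)` on the target.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit addition with saturation: results that overflow are clamped to
 * INT32_MAX / INT32_MIN instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;   /* exact in 64 bits */
    if (s > INT32_MAX)
        return INT32_MAX;
    if (s < INT32_MIN)
        return INT32_MIN;
    return (int32_t)s;
}

/* Saturating accumulation of n values, as used for energies and correlations. */
int32_t sat_sum(const int32_t *v, int n)
{
    int32_t acc = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        acc = sat_add32(acc, v[i]);
    return acc;
}
```

Summing {INT32_MAX, 5, -10} gives INT32_MAX - 10: the +5 saturates instead of wrapping to a large negative value.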

Implementation of the algorithm • The multichannel system can run more than one algorithm at the same time • Any algorithm compliant with TI's eXpressDSP Algorithm Interoperability Standard (xDAIS) is capable of multichannel processing • An xDAIS-compliant algorithm requires three functional modules: initialization, freeing and kernel • The kernel module performs the algorithm processing, while the initialization and freeing modules initialize and free the algorithm's context data

Framework initialization • Initialize each algorithm's context data and store it in the desired memory location • The system then repeatedly calls the algorithm(s) until all frames have been processed
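The split into initialization, freeing and kernel modules, together with the framework loop that calls every channel until all frames are processed, can be sketched as follows. All names here (`EncContext`, `enc_init`, `enc_free`, `enc_kernel`, `process_all`) are hypothetical: TI's actual xDAIS interface is defined through the IALG function table, and a real G.729 context holds far more state than this placeholder.

```c
#include <assert.h>
#include <stdlib.h>

#define L_FRAME 80   /* samples per 10 ms frame */

/* Context data: everything that must be kept from one frame to the next. */
typedef struct {
    long  frames_done;   /* placeholder state */
    short last_sample;   /* placeholder state */
} EncContext;

/* Initialization module: allocate and clear one channel's context. */
EncContext *enc_init(void)
{
    return calloc(1, sizeof(EncContext));   /* NULL if allocation fails */
}

/* Freeing module: release the context. */
void enc_free(EncContext *ctx)
{
    free(ctx);
}

/* Kernel module: process one frame (placeholder for the real encoder). */
void enc_kernel(EncContext *ctx, const short *frame)
{
    assert(ctx != NULL && frame != NULL);
    ctx->last_sample = frame[L_FRAME - 1];
    ctx->frames_done++;
}

/* Framework: call each channel's kernel once per frame until all frames are done. */
void process_all(EncContext *const ctx[], int num_channels,
                 const short *const input[], int num_frames)
{
    for (int f = 0; f < num_frames; f++)
        for (int ch = 0; ch < num_channels; ch++)
            enc_kernel(ctx[ch], input[ch] + (size_t)f * L_FRAME);
}
```

Because each channel owns its own context, the same kernel code can serve any number of channels; only the context pointer changes between calls.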

Algorithm implementation • The G.729 speech encoder is divided into five submodules: • Submodule 1 contains pre-processing, LP analysis and LPC-to-LSP conversion, including Pre_Process(), Autocorr(), Lag_window(), Levinson() and Az_lsp() • Submodule 2 calls Qua_lsp() to perform LSP quantization • Submodule 3 generates the interpolated LPC parameters, computes the weighted speech and finds the open-loop pitch, including Int_qlpc(), Int_lpc(), Weight_Az(), Residu(), Syn_filt() and Pitch_ol() • Submodules 1 through 3 perform frame processing and run once per frame

Algorithm Implementation • Submodule 4 performs the closed-loop fractional pitch search and the adaptive codebook search, calling Pitch_fr3(), Enc_lag3(), Pred_lt_3(), Convolve() and G_pitch() • Submodule 5 performs the innovative (fixed) codebook search and the filter memory update, calling ACELP_Codebook(), Corr_xy2(), Qua_gain() and Syn_filt() • Submodules 4 and 5 perform subframe processing and run twice per frame, once per subframe
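The submodule schedule amounts to a frame driver that runs submodules 1 to 3 once and submodules 4 and 5 once per 40-sample subframe. In this sketch, `submodule1()` through `submodule5()` are hypothetical stand-ins that only count how often they are called; the comments list the reference-code functions each one groups.

```c
#include <assert.h>

#define L_FRAME 80   /* samples per frame    */
#define L_SUBFR 40   /* samples per subframe */

/* Hypothetical call counters standing in for the real submodule work. */
typedef struct { int sub1, sub2, sub3, sub4, sub5; } CallCounts;

/* Pre_Process, Autocorr, Lag_window, Levinson, Az_lsp */
static void submodule1(CallCounts *c, const short *frame) { (void)frame; c->sub1++; }
/* Qua_lsp */
static void submodule2(CallCounts *c, const short *frame) { (void)frame; c->sub2++; }
/* Int_qlpc, Int_lpc, Weight_Az, Residu, Syn_filt, Pitch_ol */
static void submodule3(CallCounts *c, const short *frame) { (void)frame; c->sub3++; }
/* Pitch_fr3, Enc_lag3, Pred_lt_3, Convolve, G_pitch */
static void submodule4(CallCounts *c, const short *subfr) { (void)subfr; c->sub4++; }
/* ACELP_Codebook, Corr_xy2, Qua_gain, Syn_filt */
static void submodule5(CallCounts *c, const short *subfr) { (void)subfr; c->sub5++; }

/* Encode one 10 ms frame: frame-level work once, subframe-level work twice. */
void encode_frame(CallCounts *c, const short *frame)
{
    assert(c != 0 && frame != 0);
    submodule1(c, frame);
    submodule2(c, frame);
    submodule3(c, frame);
    for (int i_subfr = 0; i_subfr < L_FRAME; i_subfr += L_SUBFR) {
        submodule4(c, frame + i_subfr);
        submodule5(c, frame + i_subfr);
    }
}
```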

Data Memory requirements • The data memory is divided into three groups: • Context data: the static variables and arrays whose values must be kept from one frame to the next • Tables: the constant tables, grouped into G729_TABLES, which hold the various constants the algorithm needs • Local variables and arrays: stored on the stack, so they exist only while the function that uses them runs
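The three memory groups can be illustrated in C: a `static const` table shared by every channel, a context structure that persists between calls, and a scratch array on the stack. The table values, the `Context` type and `weighted_diff_sum` are hypothetical. With TI's compiler, a table can additionally be placed in a named section with `#pragma DATA_SECTION`, which is presumably how a section such as G729_TABLES would be populated.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEN 80

/* Table: constant, read-only data shared by all channels (illustrative values). */
static const int16_t weights[4] = {1, 2, 4, 8};

/* Context data: must survive from one frame to the next; one instance per channel. */
typedef struct {
    int16_t last_sample;   /* last input sample of the previous frame */
} Context;

/* Weighted sum of the first differences of the input. The difference at i = 0
 * needs the previous frame's last sample, which is why it lives in the context. */
int32_t weighted_diff_sum(Context *ctx, const int16_t *in, int n)
{
    int16_t diff[MAX_LEN];   /* local array: on the stack, valid only during this call */
    int32_t sum = 0;

    assert(n > 0 && n <= MAX_LEN);
    int16_t prev = ctx->last_sample;
    for (int i = 0; i < n; i++) {
        diff[i] = (int16_t)(in[i] - prev);
        prev = in[i];
    }
    for (int i = 0; i < n; i++)
        sum += (int32_t)weights[i % 4] * diff[i];
    ctx->last_sample = in[n - 1];   /* carried over to the next frame */
    return sum;
}
```

Calling it on {1, 1, 1, 1} and then on {3, 3, 3, 3} with the same context returns 1 and then 2: the second result depends on the sample remembered from the first call.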

Summary & Conclusion • The G.729 standard is a popular choice for applications such as VoIP that require efficient use of bandwidth and good speech quality • It strikes a good balance between bit rate and frame size while producing acceptable speech quality • TI's DSP processors are capable of running the algorithm • Although the implementation seems simple, there are various issues to consider

