Implementation of G.729 algorithm on TI’s TMS320C62xx Zoha Pajouhi May 2005
Outline • What is G.729? • How does it work? • Acquaintance with C62xx processors • Steps to be taken for implementation • Optimizations required for implementation • Implementation of the algorithm • Summary & Conclusions
What is G.729? • G.729 is an ITU-T speech coding standard • Its most important use is in VoIP • It compresses speech from the 64 kbit/s of standard telephone PCM to 8 kbit/s • It uses the CS-ACELP algorithm • CS-ACELP stands for conjugate-structure algebraic-code-excited linear prediction • It has several annexes: G.729 Annex A, Annex B, Annex D, and Annex E • The annexes differ in algorithmic complexity, in the use of voice activity detection, and in bit rate (lower rates at the cost of lower quality, and vice versa) • This presentation discusses the implementation of G.729 Annex A
How Does G.729 Work? (1) • The coder works on 10 ms input frames, i.e. 80 samples at the 8 kHz sampling rate • Each frame is split into two 5 ms (40-sample) subframes • The core idea is to predict upcoming samples from past ones by linear prediction • It also compares the signal with the entries of its codebooks to find the closest match • A detailed description of the signal flow is beyond the scope of this discussion
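The prediction idea can be made concrete with the LP residual: G.729 writes the inverse filter as A(z) = 1 + a1 z^-1 + ... + am z^-m, and the residual is what remains after subtracting the prediction from the signal. The sketch below is similar in spirit to the Residu() function named later in this deck, but it is a hypothetical floating-point helper (`lp_residual`), not the reference code, which is fixed-point.

```c
#include <assert.h>

/* Hypothetical helper: LP residual e[n] = x[n] + sum_{i=1..m} a[i] * x[n-i],
   i.e. x filtered through A(z) = 1 + a[1] z^-1 + ... + a[m] z^-m (a[0] must be 1).
   x must be preceded by m samples of history (x[-1] ... x[-m]). */
void lp_residual(const double a[], int m, const double *x, double *e, int n)
{
    assert(a[0] == 1.0);
    for (int k = 0; k < n; k++) {
        double s = x[k];                 /* a[0] * x[k] */
        for (int i = 1; i <= m; i++)
            s += a[i] * x[k - i];        /* subtract the prediction (a[i] carries the sign) */
        e[k] = s;
    }
}
```

For a signal that follows the predictor exactly, e.g. x[n] = 0.9 x[n-1] with a = {1, -0.9}, the residual is zero; for real speech it is small but nonzero, and that remainder is what the codebooks have to model.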
How Does G.729 Work? (2) • The following operations are performed once per frame • Pre-processing: scales the signal down by a factor of 2 and passes it through a high-pass filter • LP (linear prediction) analysis: models the signal with a 10th-order linear predictor; the LP coefficients are converted to line spectral pairs (LSPs), which are less sensitive to quantization noise
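The pre-processing step fits in one second-order IIR section whose numerator already includes the division by 2. This floating-point sketch assumes the coefficients of the G.729 pre-processing high-pass filter (140 Hz cut-off) as we recall them from the Recommendation; verify them against the standard before relying on them. The names `PreProcState` and `pre_process` are hypothetical, and the reference implementation is fixed-point rather than `double`.

```c
#include <stddef.h>

/* Filter memory carried from frame to frame (part of the channel context). */
typedef struct {
    double x1, x2;   /* x[n-1], x[n-2] */
    double y1, y2;   /* y[n-1], y[n-2] */
} PreProcState;

/* y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] + a1 y[n-1] + a2 y[n-2]
   Assumed G.729 coefficients: the numerator includes the factor 1/2. */
void pre_process(PreProcState *s, const double *in, double *out, size_t n)
{
    const double b0 = 0.46363718, b1 = -0.92724705, b2 = 0.46363718;
    const double a1 = 1.9059465,  a2 = -0.9114024;
    for (size_t i = 0; i < n; i++) {
        double y = b0 * in[i] + b1 * s->x1 + b2 * s->x2
                 + a1 * s->y1 + a2 * s->y2;
        s->x2 = s->x1;  s->x1 = in[i];   /* shift input history  */
        s->y2 = s->y1;  s->y1 = y;       /* shift output history */
        out[i] = y;
    }
}
```

With these coefficients a constant (DC) input is almost completely removed, while a signal at the Nyquist frequency comes out at roughly half its amplitude, which is the "scale by 2 and high-pass" behavior described on the slide.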
How Does G.729 Work? (3) • Quantization: the LSPs are quantized, and the quantized values are used throughout the rest of the algorithm • Open-loop pitch analysis: because a full pitch search is computationally expensive, this step gives a rough estimate of the pitch delay, which the closed-loop analysis later refines
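A rough pitch estimate can be obtained by picking the lag that maximizes the normalized autocorrelation of the (weighted) speech. This is a simplified sketch under that assumption, not the G.729A procedure, which is more elaborate (it searches several delay ranges and favors shorter delays to avoid picking pitch multiples). The function name `open_loop_pitch` is hypothetical; the lag range 20 to 143 used in the example matches the G.729 pitch range.

```c
#include <assert.h>

/* Hypothetical open-loop pitch estimate: return the lag k in [min_lag, max_lag]
   that maximizes R(k)^2 / E(k), where R(k) = sum x[i] x[i-k] and
   E(k) = sum x[i-k]^2 over the n samples of the current frame.
   x must be preceded by at least max_lag samples of history. */
int open_loop_pitch(const int *x, int n, int min_lag, int max_lag)
{
    assert(n > 0 && min_lag > 0 && min_lag <= max_lag);
    int best_lag = min_lag;
    double best_score = -1.0;
    for (int k = min_lag; k <= max_lag; k++) {
        long long r = 0, e = 0;
        for (int i = 0; i < n; i++) {
            r += (long long)x[i] * x[i - k];
            e += (long long)x[i - k] * x[i - k];
        }
        if (r <= 0 || e == 0)
            continue;                            /* only positive correlation counts */
        double score = (double)r * (double)r / (double)e;
        if (score > best_score) {                /* strict '>' keeps the shortest lag on ties */
            best_score = score;
            best_lag = k;
        }
    }
    return best_lag;
}
```

For an exactly periodic test signal with period 37, lags 37, 74, and 111 score equally, and the strict comparison returns 37.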
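How Does G.729 Work? (4) • The following operations are performed twice per frame, i.e. once per subframe • Closed-loop pitch analysis: determines the exact (fractional) pitch delay through a closed-loop search around the open-loop estimate • Adaptive codebook search: selects the past-excitation segment at that delay; it models the periodic part of the signal, which dominates voiced sounds such as vowels • Fixed codebook search: finds the algebraic codevector (four signed pulses per 40-sample subframe) that best matches the remaining target; it models the non-periodic part, which dominates unvoiced sounds such as many consonants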
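The "algebraic" fixed codebook needs no stored table: each 40-sample codevector has four pulses of amplitude +1 or -1 on interleaved tracks, three tracks with 8 positions (3 bits each) and one with 16 positions (4 bits), giving 13 position bits plus 4 sign bits per subframe. The sketch below only builds a codevector from given indices; the search that chooses them is far more involved. The function name `build_codevector` and the bit layout of the 4-bit index (low bit selects the shifted position) are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define L_SUBFR 40   /* 5 ms subframe at 8 kHz */

/* Hypothetical decoder-side helper: build c[0..39] from four pulse indices and signs.
   Track k (k = 0, 1, 2): positions k, k+5, ..., k+35, index 0..7 (3 bits).
   Track 3: positions 3, 8, ..., 38 and 4, 9, ..., 39, index 0..15 (4 bits);
   assumed layout: low bit = +1 shift, upper 3 bits = position group. */
void build_codevector(const int idx[4], const int sign[4], int c[L_SUBFR])
{
    memset(c, 0, L_SUBFR * sizeof c[0]);
    for (int k = 0; k < 4; k++)
        assert(sign[k] == 1 || sign[k] == -1);
    for (int k = 0; k < 3; k++) {
        assert(idx[k] >= 0 && idx[k] < 8);
        c[5 * idx[k] + k] = sign[k];
    }
    assert(idx[3] >= 0 && idx[3] < 16);
    c[5 * (idx[3] >> 1) + 3 + (idx[3] & 1)] = sign[3];
}
```

Because the four tracks occupy different residues modulo 5, the pulses can never collide, which is one reason the structure is cheap to search.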
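How Does G.729 Work? (5) • The bit allocation per 10 ms frame is as follows: • Line spectrum pairs (L0 to L3): 1 + 7 + 5 + 5 = 18 bits • Adaptive-codebook delay (P1, P2): 8 + 5 = 13 bits • Pitch-delay parity (P0): 1 bit • Fixed-codebook index (C1, C2): 13 + 13 = 26 bits • Fixed-codebook sign (S1, S2): 4 + 4 = 8 bits • Codebook gains, stage 1 (GA1, GA2): 3 + 3 = 6 bits • Codebook gains, stage 2 (GB1, GB2): 4 + 4 = 8 bits • Total: 80 bits per 10 ms frame, i.e. 8 kbit/s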
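G.729 transmits 80 bits per 10 ms frame (8 kbit/s), split into fixed-width fields of 1 to 13 bits. Packing those fields into 10 bytes can be sketched as below. The field widths follow the G.729 bit allocation, but the field order here (LSP indices, then subframe 1, then subframe 2) and the MSB-first byte layout are assumptions; the Recommendation defines the actual bitstream format. `pack_frame` and `unpack_frame` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BITS 80
#define NUM_FIELDS 15

/* Assumed order: L0 L1 L2 L3 | P1 P0 C1 S1 GA1 GB1 | P2 C2 S2 GA2 GB2 */
static const int field_bits[NUM_FIELDS] = {
    1, 7, 5, 5,             /* LSP indices: 18 bits                */
    8, 1, 13, 4, 3, 4,      /* subframe 1, incl. parity: 33 bits   */
    5, 13, 4, 3, 4          /* subframe 2: 29 bits                 */
};

/* Pack the 15 parameter values MSB-first into 10 bytes. */
void pack_frame(const uint16_t prm[NUM_FIELDS], uint8_t out[FRAME_BITS / 8])
{
    int pos = 0;                                   /* next bit to write */
    memset(out, 0, FRAME_BITS / 8);
    for (int f = 0; f < NUM_FIELDS; f++) {
        assert(prm[f] < (1u << field_bits[f]));    /* value must fit its field */
        for (int b = field_bits[f] - 1; b >= 0; b--, pos++)
            if ((prm[f] >> b) & 1u)
                out[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
    }
    assert(pos == FRAME_BITS);                     /* 80 bits every 10 ms = 8 kbit/s */
}

/* Inverse of pack_frame: read the fields back in the same order. */
void unpack_frame(const uint8_t in[FRAME_BITS / 8], uint16_t prm[NUM_FIELDS])
{
    int pos = 0;
    for (int f = 0; f < NUM_FIELDS; f++) {
        uint16_t v = 0;
        for (int b = 0; b < field_bits[f]; b++, pos++)
            v = (uint16_t)((v << 1) | ((in[pos / 8] >> (7 - pos % 8)) & 1u));
        prm[f] = v;
    }
}
```

A round trip through `pack_frame` and `unpack_frame` returns the original parameters, and the assertion on `pos` confirms that the widths add up to exactly 80 bits.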
Why use DSP processors? • Speech coding algorithms involve long, complex calculations • A short design period favors programmable DSP processors over custom hardware design • Both floating-point and fixed-point DSP processors are available; they usually trade price against precision (fixed-point parts are cheaper, floating-point parts more precise)
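On a fixed-point processor such as the C62xx, fractional values are stored as scaled integers; Q15 (1 sign bit, 15 fraction bits) is the usual format in fixed-point speech coders. The sketch below shows the conversion and a saturating Q15 multiply. The names `to_q15` and `q15_mul` are hypothetical, and the multiply assumes an arithmetic right shift for negative products, as on common compilers.

```c
#include <stdint.h>

/* Convert a real value in [-1, 1) to Q15 with rounding; out-of-range values clamp. */
int16_t to_q15(double v)
{
    double s = v * 32768.0;
    s += (s >= 0.0) ? 0.5 : -0.5;          /* round half away from zero      */
    if (s > 32767.0)  return INT16_MAX;    /* +1.0 is not representable      */
    if (s < -32768.0) return INT16_MIN;
    return (int16_t)s;                     /* truncation after +-0.5 rounds  */
}

/* Q15 x Q15 -> Q15: the 32-bit product is Q30, so shift right by 15.
   Only (-1) * (-1) overflows, so it is saturated to the largest Q15 value. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;
    if (p > INT16_MAX)
        p = INT16_MAX;
    return (int16_t)p;
}
```

For example, 0.5 becomes 16384 and 0.5 × 0.5 gives 8192 (0.25). The loss of precision from the 15-bit fraction is the price paid for the cheaper fixed-point hardware.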
Acquaintance with DSP C62xx processors • 150-250 MHz clock • Up to 8 instructions per clock cycle • 1200-2000 million instructions per second (MIPS) • 8-, 16-, and 32-bit data access for efficient memory use • Low power • VLIW architecture with 8 functional units: 6 ALUs and 2 multipliers • 40-bit arithmetic • Saturation and normalization support • On-chip RAM, etc.
Steps to be taken for implementation • System simulation & analysis: MATLAB, C, or C++ is usually used • Simulation of the implementation on the processor: optimization choices can be made according to the processor being used • Optimization: discussed next • Conversion to processor assembly
Optimization • Several issues should be considered: • Processor-independent issues: • Placing independent instructions between two dependent ones to exploit parallelism • Optimized use of registers • Loop folding/unfolding: • Folding to reduce code memory • Unfolding to reduce loop overhead • Using pointers instead of arrays
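The processor-independent techniques can be illustrated on a dot product, a core operation in correlation and filtering. The unrolled version processes two elements per iteration (less loop overhead), uses two independent accumulators so the two multiply-accumulates do not depend on each other and can be scheduled in parallel, and walks the arrays with pointers. This is a sketch; `dot_simple` and `dot_unrolled` are hypothetical names, and modern compilers often perform these transformations themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: one multiply-accumulate per iteration,
   each one depending on the previous value of acc. */
int32_t dot_simple(const int16_t x[], const int16_t y[], size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];
    return acc;
}

/* Unrolled by 2, pointer-based, with two independent accumulation chains. */
int32_t dot_unrolled(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc0 = 0, acc1 = 0;
    const int16_t *end = x + (n & ~(size_t)1);    /* last even boundary */
    while (x < end) {
        acc0 += (int32_t)x[0] * y[0];             /* independent of acc1 */
        acc1 += (int32_t)x[1] * y[1];
        x += 2;
        y += 2;
    }
    if (n & 1)                                    /* leftover element for odd n */
        acc0 += (int32_t)x[0] * y[0];
    return acc0 + acc1;
}
```

Both functions return the same result; the unrolled one trades a little code size for fewer loop iterations, which is exactly the folding/unfolding trade-off listed above.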
Optimization • Processor-dependent issues: • Using the processor's special instructions instead of writing the equivalent code by hand, e.g. saturation has a dedicated instruction and does not need to be re-implemented • Reading 32-bit data (two 16-bit values at once) in 16-bit operations to reduce memory access time • Pipelining the algorithm: sometimes done by the development software, sometimes not, e.g. inner loops are pipelined but outer ones are not • Using a hardware overflow bit: not supported in the C6000 series
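Saturation and normalization are good examples of operations that cost several instructions in portable C but a single instruction on the C62x (the TI compiler exposes them through intrinsics such as `_sadd()` and `_norm()`). The portable versions below show what those instructions compute; `sat_add32` and `norm32` are hypothetical names chosen for this sketch.

```c
#include <stdint.h>

/* 32-bit saturating add: clamp instead of wrapping around on overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;        /* widen so the true sum cannot overflow */
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Number of redundant sign bits, i.e. how far x can be shifted left
   without changing its sign (0 and -1 give 31). */
int norm32(int32_t x)
{
    uint32_t u = (uint32_t)(x < 0 ? ~x : x);   /* negatives: invert so the sign bit is 0 */
    int n = 0;
    while (n < 31 && !(u & (UINT32_C(0x40000000) >> n)))
        n++;                                    /* count leading copies of the sign bit */
    return n;
}
```

In fixed-point speech code, `norm32` is typically used to scale a value up to full precision before a division or square root, and `sat_add32` keeps accumulations from wrapping around.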
Implementation of the algorithm • A multichannel system can run more than one algorithm (or channel) at the same time • Any algorithm compliant with the eXpressDSP Algorithm Standard (xDAIS) can be used for multichannel processing, since each channel gets its own instance • An xDAIS-compliant algorithm requires three functional modules: initialization, freeing, and kernel • The kernel module performs the algorithm processing, while the initialization and freeing modules initialize and free the algorithm's context data
Framework initialization • Initialize each channel's context data and store it at the desired memory location • The system then repeatedly calls the algorithm(s) until all frames have been processed
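The init/kernel/free split and the per-channel context can be sketched as follows. All names here (`G729Context`, `g729_init`, `g729_kernel`, `g729_free`, `run_channels`) are hypothetical, the kernel body is a placeholder, and xDAIS itself standardizes this through its own IALG interface, which is not reproduced here. The point is that the kernel touches only the context it is given, so any number of channels can share the same code.

```c
#include <assert.h>
#include <stdlib.h>

#define L_FRAME      80   /* 10 ms at 8 kHz */
#define MAX_CHANNELS 8

/* Hypothetical per-channel context; a real one holds filter memories,
   past excitation, previous LSPs, and so on. */
typedef struct {
    int  channel_id;
    long frames_done;
} G729Context;

/* Initialization module: allocate and initialize one channel's context. */
G729Context *g729_init(int channel_id)
{
    G729Context *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        ctx->channel_id = channel_id;
        ctx->frames_done = 0;
    }
    return ctx;
}

/* Kernel module: process one frame using only this channel's context. */
void g729_kernel(G729Context *ctx, const short frame[L_FRAME])
{
    (void)frame;               /* encoding would happen here */
    ctx->frames_done++;
}

/* Freeing module: release the context (free(NULL) is a no-op). */
void g729_free(G729Context *ctx) { free(ctx); }

/* Framework: initialize all channels, call the kernel for every channel
   and frame, then free everything. Returns total frames processed, or -1. */
long run_channels(int n_channels, int n_frames)
{
    G729Context *ctx[MAX_CHANNELS] = {0};
    short frame[L_FRAME] = {0};            /* silence stands in for real input */
    long total = 0;
    int ok = 1;
    assert(n_channels > 0 && n_channels <= MAX_CHANNELS);
    for (int c = 0; c < n_channels && ok; c++)
        ok = (ctx[c] = g729_init(c)) != NULL;
    if (ok) {
        for (int f = 0; f < n_frames; f++)
            for (int c = 0; c < n_channels; c++)   /* round-robin over channels */
                g729_kernel(ctx[c], frame);
        for (int c = 0; c < n_channels; c++)
            total += ctx[c]->frames_done;
    }
    for (int c = 0; c < n_channels; c++)
        g729_free(ctx[c]);
    return ok ? total : -1;
}
```

For example, `run_channels(4, 100)` processes 100 frames on each of 4 channels and returns 400.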
Algorithm Implementation • The G.729 speech encoder is divided into five submodules: • Submodule 1 contains pre-processing, LP analysis, and LPC-to-LSP conversion, including Pre_Process(), Autocorr(), Lag_window(), Levinson(), and Az_lsp(). • Submodule 2 calls Qua_lsp() to perform LSP quantization. • Submodule 3 generates the interpolated LPC parameters, computes the weighted speech, and finds the open-loop pitch, including Int_qlpc(), Int_lpc(), Weight_Az(), Residu(), Syn_filt(), and Pitch_ol(). • Submodules 1 through 3 perform frame processing and run once per frame.
Algorithm Implementation • Submodule 4 performs the closed-loop fractional pitch search and the adaptive codebook search, calling Pitch_fr3(), Enc_lag3(), Pred_lt_3(), Convolve(), and G_pitch(). • Submodule 5 performs the innovative codebook search and the filter memory update, calling ACELP_Codebook(), Corr_xy2(), Qua_gain(), and Syn_filt(). • Submodules 4 and 5 perform subframe processing and run twice per frame, once per subframe.
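The control flow of one encoder frame follows directly from the submodule split: submodules 1 to 3 run once, then submodules 4 and 5 run for each 40-sample subframe. The sketch below records the call order instead of doing real work; `Trace`, `record_submodule`, and `encode_frame` are hypothetical names, and a real implementation would pass speech and context data to each stage.

```c
#include <assert.h>

#define L_FRAME 80   /* 10 ms at 8 kHz */
#define L_SUBFR 40   /* 5 ms subframe  */

/* Hypothetical trace of which submodule ran, for illustration only. */
typedef struct {
    int order[16];
    int count;
} Trace;

static void record_submodule(Trace *t, int submodule)
{
    assert(t->count < 16);
    t->order[t->count++] = submodule;
}

/* One encoder frame: frame-level submodules once, subframe-level ones twice. */
void encode_frame(Trace *t, const short speech[L_FRAME])
{
    (void)speech;
    record_submodule(t, 1);   /* pre-processing, LP analysis, LPC -> LSP      */
    record_submodule(t, 2);   /* LSP quantization                             */
    record_submodule(t, 3);   /* interpolation, weighted speech, open-loop pitch */
    for (int sf = 0; sf < L_FRAME; sf += L_SUBFR) {
        record_submodule(t, 4);   /* closed-loop pitch, adaptive codebook     */
        record_submodule(t, 5);   /* fixed codebook, gains, memory update     */
    }
}
```

After one call the trace reads 1, 2, 3, 4, 5, 4, 5: seven submodule invocations per 10 ms frame.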
Data Memory Requirements • The data memory is divided into three groups: • Context data: the static variables and arrays whose values must be kept from one frame to the next • Tables: the constant tables, grouped into G729_TABLES, contain the constants needed by the algorithm • Local variables and arrays: stored on the stack and needed only while the algorithm is running
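The three memory groups map naturally onto C storage classes: a per-channel struct for context data, a `static const` array for tables, and automatic arrays for locals. In this sketch the context keeps a past-excitation history (PIT_MAX = 143 is the maximum G.729 pitch delay), while `ChannelContext`, `gain_table` (its values are hypothetical Q13 gains), and `process_frame` are names invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define L_FRAME 80
#define PIT_MAX 143   /* maximum pitch delay in G.729 */

/* 1. Context data: survives from frame to frame, one instance per channel. */
typedef struct {
    short old_exc[PIT_MAX + L_FRAME];   /* history followed by the current frame */
} ChannelContext;

/* 2. Constant table: read-only, shared by all channels (hypothetical Q13 values). */
static const short gain_table[4] = { 1024, 2048, 4096, 8192 };

/* 3. Local array: scratch memory on the stack, gone when the call returns. */
void process_frame(ChannelContext *ctx, const short in[L_FRAME], int gain_index)
{
    short scratch[L_FRAME];
    assert(gain_index >= 0 && gain_index < 4);
    for (int i = 0; i < L_FRAME; i++)           /* Q0 * Q13 >> 13 -> Q0 */
        scratch[i] = (short)(((int)in[i] * gain_table[gain_index]) >> 13);
    memcpy(&ctx->old_exc[PIT_MAX], scratch, sizeof scratch);
    /* Keep the newest PIT_MAX samples as history for the next frame. */
    memmove(ctx->old_exc, &ctx->old_exc[L_FRAME], PIT_MAX * sizeof(short));
}
```

Only `ChannelContext` has to be allocated per channel; the table exists once in memory, and the scratch array costs stack space only while `process_frame` runs.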
Summary & Conclusions • The G.729 standard is a popular choice for applications such as VoIP that require efficient use of bandwidth and good speech quality • It strikes a good balance between bit rate and frame size while producing acceptable speech quality • TI's C62xx DSP processors are capable of running the algorithm • Although the implementation seems simple, there are various issues to consider