Intro. To Audio Post-Processing and Optimization Strategy Ivan Lee, Jan. 2005
Outline • Post-Processing Overview • Dolby ProLogic II Principle & Features • Float to Fixed-point Translation • Code Optimization on Lx5280 • Q&A
Post-Processing Overview (1) • BBE (HD Sound, Mach3Bass, ViVA(+)): Natural Musical Realism, Full Frequency Operation, Speech Intelligibility • SRS (WOW, TruSurround(XT), Circle Surround II): Clarity Improvement, 3D-Audio (SRS), Bass Enhancement, Down-mix, Matrix-decode
Post-Processing Overview (2) • Dolby (ProLogic II, Headphone, Virtual Speaker): Matrix-decode, Down-mix to Headphone, Down-mix to 2-Speaker system • Realtek Sound Effect: EAX (Reverb), Chorus, Equalizer, Pitch Shift, Voice Canceller
ProLogic II Principle & Features (1) • The ProLogic II matrix decoder receives 2 channels and delivers feeds for three, four or more loudspeakers. • The principle is to express Lt and Rt in terms of the intended direction of a source, independent of its magnitude.
ProLogic II Principle & Features (2) See ProLogic II Block Diagrams
ProLogic II Principle & Features (3) See ProLogic II Block Diagrams
ProLogic II Principle & Features (4) See ProLogic II Block Diagrams
Float to Fixed-point Translation- Translation Process • Acceptable accuracy is the only criterion for deciding when the translation process is complete.
Float to Fixed-point Translation- float vs. fixed • IEEE 754 single-precision floating-point range: -3.4 x 10^38 ~ 3.4 x 10^38 • 16-bit 2's-complement integer range: -32,768 ~ 32,767 • 32-bit 2's-complement integer range: -2,147,483,648 ~ 2,147,483,647
Float to Fixed-point Translation- Numeral Representation
• Interpretation of bits in a 4-bit integer (2's complement)
    bit:      S     b2    b1    b0
    weight:   -8    4     2     1        (sign, 2^2, 2^1, 2^0)
    0110      0*(-8) + 1*4 + 1*2 + 0*1 = 6
    1111      1*(-8) + 1*4 + 1*2 + 1*1 = -1
• Interpretation of fractional numbers (binary point between S and b2)
    bit:      S  .  b2    b1    b0
    weight:   -1    0.5   0.25  0.125    (sign, 2^-1, 2^-2, 2^-3)
    0101      0*(-1) + 1*0.5 + 0*0.25 + 1*0.125 = 0.625
    1010      1*(-1) + 0*0.5 + 1*0.25 + 0*0.125 = -0.75
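The fractional interpretation above extends directly to wider words. As a minimal sketch, assuming a 16-bit word with 1 sign bit and 15 fractional bits (commonly called Q15), conversion to and from a real value could look like the following. The helper names `float_to_q15` and `q15_to_float` are hypothetical and are not part of the ETSI or Lx5280 libraries.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Convert a real value to Q15 (1 sign bit, 15 fractional bits).
 * Values outside [-1.0, 1.0) are saturated to the nearest limit. */
int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;               /* 2^15 steps per unit */
    if (scaled >= 32767.0f)  return INT16_MAX; /* clamp at +0.99997   */
    if (scaled <= -32768.0f) return INT16_MIN; /* clamp at -1.0       */
    /* round to nearest; the cast itself truncates toward zero */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a Q15 value back to a real value in [-1.0, 1.0). */
float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}
```

For example, `float_to_q15(0.5f)` yields 16384 (bit pattern 0x4000), the 16-bit counterpart of the 4-bit pattern 0100.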
Float to Fixed-point Translation- Arithmetic Functions (1) • Replace arithmetic operators with arithmetic functions (ETSI std. and Lx5280 extension)
  Ex.  Y = a + b;    →  Y = L_add(a,b);
       M = x * y;    →  M = L_mult(x,y);
       S += x * y;   →  S = L_mac(S,x,y);
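To make the operator-to-function substitution concrete, here is a hedged sketch of what saturating versions of these three operations could look like in portable C. It follows the usual ETSI basic-operator convention (a 16 x 16 fractional multiply is doubled to give a Q31 result, and results clamp instead of wrapping), but the `sketch_` names and the 64-bit intermediate are assumptions made for illustration, not the library source.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate into the 32-bit range. */
int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Y = a + b, with saturation instead of wrap-around. */
int32_t sketch_L_add(int32_t a, int32_t b)
{
    return sat32((int64_t)a + b);
}

/* M = x * y for Q15 inputs: the product is doubled so the result is Q31.
 * Only (-32768) * (-32768) overflows, and it saturates to INT32_MAX. */
int32_t sketch_L_mult(int16_t x, int16_t y)
{
    return sat32(2 * (int64_t)x * y);
}

/* S += x * y: saturating multiply followed by saturating accumulate. */
int32_t sketch_L_mac(int32_t acc, int16_t x, int16_t y)
{
    return sketch_L_add(acc, sketch_L_mult(x, y));
}
```

With Q15 inputs, `sketch_L_mult(16384, 16384)` (0.5 x 0.5) returns 0x20000000, which is 0.25 in Q31.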
Float to Fixed-point Translation- Arithmetic Functions (2) • Addition: add( ), sub( ), L_add( ), L_sub( ) • Multiplication: mult( ), mult_r( ), L_mult( ), EL_mult( ), EL_mult_r( ) • Division: divide_s( ) • Arithmetic shifts: shr( ), shl( ), L_shr( ), L_shl( ), shift_r( ), L_shift_r( ) • Absolute value: abs_s( ), L_abs( )
Float to Fixed-point Translation- Arithmetic Functions (3) • Multiply accumulate msu_r( ), mac_r( ), L_mac( ), L_msu( ), EL_mac( ), EL_mac_72( ) • Negation negate( ), L_negate( ) • Accumulator manipulation L_deposit_l( ), L_deposit_h( ), extract_l( ), extract_h( ), extract_h32( ) • Round round( ), round32( ) • Normalization norm_l( ), norm_s( )
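Two of the less obvious entries in the lists above are rounding and normalization. The sketch below shows the behaviour usually associated with the ETSI `round( )` and `norm_l( )` operators: rounding adds half an LSB of the high word before extracting it, and normalization counts the left shifts needed to bring a value to full scale. The `sketch_` names are hypothetical, and the code assumes `>>` on a negative value is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate into the 32-bit range. */
int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Round a Q31 value to Q15: add 0x8000 with saturation, keep the high word. */
int16_t sketch_round16(int32_t v)
{
    int32_t biased = sat32((int64_t)v + 0x8000);
    return (int16_t)(biased >> 16);   /* assumes arithmetic right shift */
}

/* Number of left shifts needed to normalize v, i.e. to move a positive
 * value into [0x40000000, 0x7fffffff] or a negative one into
 * [0x80000000, 0xc0000000]. Returns 0 for v == 0. */
int16_t sketch_norm_l(int32_t v)
{
    int16_t shifts = 0;
    if (v == 0)  return 0;
    if (v == -1) return 31;
    if (v < 0)   v = ~v;              /* fold negatives onto the positive case */
    while (v < 0x40000000) {
        v <<= 1;
        shifts++;
    }
    return shifts;
}
```

A typical use is to call the normalization count before a division or a block-scaling step so that the operand keeps as many significant bits as possible.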
Float to Fixed-point Translation- Example Code • See fixed math library source code, ex. L_add( ), L_shl( ), EL_mult( ) • See float and fixed source code, ex. polezero( ) vs. fix_polezero( )
Float to Fixed-point Translation- Example Code : L_add( )
INT32 L_add(INT32 L_var1, INT32 L_var2)
{
    INT32 L_Sum, L_SumLow, L_SumHigh;

    L_Sum = L_var1 + L_var2;
    if ((L_var1 > 0 && L_var2 > 0) || (L_var1 < 0 && L_var2 < 0)) {
        /* an overflow is possible */
        L_SumLow = (L_var1 & 0xffff) + (L_var2 & 0xffff);
        L_SumHigh = ((L_var1 >> 16) & 0xffff) + ((L_var2 >> 16) & 0xffff);
        if (L_SumLow & 0x10000)
            L_SumHigh += 1;             /* carry into high word is set */
        /* update sum only if there is an overflow or underflow */
        if ((0x10000 & L_SumHigh) && !(0x8000 & L_SumHigh))
            L_Sum = LW_MIN;             /* underflow */
        else if (!(0x10000 & L_SumHigh) && (0x8000 & L_SumHigh))
            L_Sum = LW_MAX;             /* overflow */
    }
    return (L_Sum);
}
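The listing above detects overflow by re-adding the two 16-bit halves. As an alternative sketch, not the library's method, the same saturating behaviour can be obtained from the sign bits alone: overflow can only occur when both operands share a sign and the wrapped sum has the opposite sign. The name `sat_add32` is hypothetical. The addition is done in unsigned arithmetic so the wrap-around is well defined in C; the final cast back to a signed type assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Saturating 32-bit add using sign-bit overflow detection. */
int32_t sat_add32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t sum = ua + ub;                  /* wraps modulo 2^32 */

    /* Sign bit of (ua ^ sum) & (ub ^ sum) is set only when both inputs
     * have the same sign and the sum's sign differs from it. */
    if (((ua ^ sum) & (ub ^ sum)) & 0x80000000u)
        return (a < 0) ? INT32_MIN : INT32_MAX;

    return (int32_t)sum;                     /* two's-complement conversion assumed */
}
```

This form needs no 64-bit type and no comparisons on the operands, which can matter on a 32-bit core without hardware saturation.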
Float to Fixed-point Translation- Example Code : polezero( )
void polezero(DSPfract *inptr, DSPshort inoff,
              DSPfract *outptr, DSPshort outoff,
              POLEZERO_CFS *filtcfs, POLEZERO_VARS *filtvars,
              DSPshort sampcount)
{
    DSPfract accum;
    int samp;

    for (samp = 0; samp < sampcount; samp++) {
        accum = -filtvars->y1 * filtcfs->a1;
        accum += *inptr * filtcfs->b0;
        accum += filtvars->x1 * filtcfs->b1;
        filtvars->x1 = *inptr;
        *outptr = DSPrnd(PCMBITS, PCMRND, accum);
        filtvars->y1 = *outptr;
        inptr += inoff;
        outptr += outoff;
    }
}
Float to Fixed-point Translation- Example Code : fix_polezero( )
void fix_polezero(INT32 *inptr, INT16 inoff,
                  INT32 *outptr, INT16 outoff,
                  FIX_POLEZERO_CFS *filtcfs, FIX_POLEZERO_VARS *filtvars,
                  INT16 sampcount)
{
    INT32 accum;
    INT16 samp;

    for (samp = 0; samp < sampcount; samp++) {
        accum = EL_mult(filtvars->y1, filtcfs->a1);
        accum = L_sub(EL_mult(*inptr, filtcfs->b0), accum);
        accum = L_add(accum, EL_mult(filtvars->x1, filtcfs->b1));
        filtvars->x1 = *inptr;
        *outptr = accum;
        filtvars->y1 = *outptr;
        inptr += inoff;
        outptr += outoff;
    }
}
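One way to judge whether a translated routine keeps acceptable accuracy is to run the float and fixed versions side by side and record the worst-case difference. The sketch below does this for a first-order pole-zero filter of the same shape as polezero( ) and fix_polezero( ). It is an illustration under stated assumptions: `q31_mul` is a hypothetical stand-in for EL_mult( ) that assumes a truncating Q31 x Q31 fractional multiply (the real Lx5280 routine may scale or round differently), and `to_q31`, `sat32` and `polezero_max_error` are likewise made-up names. Inputs and coefficients must lie strictly inside (-1, 1).

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate into the 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* ASSUMPTION: Q31 x Q31 -> Q31, truncating; arithmetic >> on negatives. */
static int32_t q31_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}

/* Convert a value in (-1, 1) to Q31 with round-to-nearest. */
static int32_t to_q31(double x)
{
    return (int32_t)(x * 2147483648.0 + (x >= 0.0 ? 0.5 : -0.5));
}

/* Run y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] in double and in Q31,
 * and return the largest absolute difference between the outputs. */
double polezero_max_error(const double *in, int n,
                          double b0, double b1, double a1)
{
    double fx1 = 0.0, fy1 = 0.0, worst = 0.0;
    int32_t qb0 = to_q31(b0), qb1 = to_q31(b1), qa1 = to_q31(a1);
    int32_t qx1 = 0, qy1 = 0;
    int i;

    for (i = 0; i < n; i++) {
        /* floating-point reference */
        double fy = b0 * in[i] + b1 * fx1 - a1 * fy1;
        /* fixed-point version of the same update */
        int32_t qx = to_q31(in[i]);
        int64_t acc = (int64_t)q31_mul(qx, qb0) + q31_mul(qx1, qb1)
                    - q31_mul(qy1, qa1);
        int32_t qy = sat32(acc);
        double err = fy - (double)qy / 2147483648.0;

        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
        fx1 = in[i]; fy1 = fy;
        qx1 = qx;    qy1 = qy;
    }
    return worst;
}
```

The returned value can then be compared against whatever error budget the product requires, for example one LSB of the output PCM word length.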
Float to Fixed-point Translation- Substitute by Module [Diagram: program flow at the floating-point level, with branches down to and back up from a parallel fixed-point level at successive stages of the flow]
Code Optimization on Lx5280- Iterative Optimization Process • Different performance metrics: execution speed, memory use, power consumption, quality
Code Optimization on Lx5280- Optimization Strategy • Processor-independent: small interfaces, inline functions, recycling memory buffers, flattening the function call hierarchy • Processor-specific: algorithmic modifications and transformations, assembly language programming • Memory access issues
Code Optimization on Lx5280- Radiax DSP instructions • MAC: 40/72-bit Accumulator Reg., Saturation Detect, SIMD • Data Addressing: Twinword data movement, Post-modified Pointer, Circular Buffer • ALU: Saturation Detect, SIMD, Absolute, Normalization
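To show what the circular-buffer and post-modified-pointer addressing modes save, here is a hedged C sketch of the software equivalent: a delay line whose index is advanced and wrapped by hand on every access. The type and function names (`delay_line`, `delay_push`) and the length of 8 are assumptions for illustration; on hardware with circular addressing the increment and wrap would be folded into the load or store itself.

```c
#include <stdint.h>

#define DELAY_LEN 8u   /* must be a power of two for the mask below */

typedef struct {
    int32_t  buf[DELAY_LEN];
    uint32_t pos;          /* index of the oldest sample */
} delay_line;

/* Store the newest sample and return the one written DELAY_LEN calls ago. */
int32_t delay_push(delay_line *d, int32_t x)
{
    int32_t oldest = d->buf[d->pos];
    d->buf[d->pos] = x;
    /* software version of "post-modify with circular wrap" */
    d->pos = (d->pos + 1u) & (DELAY_LEN - 1u);
    return oldest;
}
```

A zero-initialized `delay_line` returns 0 for the first DELAY_LEN pushes and then starts returning the pushed samples in order.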
Code Optimization on Lx5280- Example Code : fix_polezero( )
fix_polezero:
    /* setup ZOH loop */
    la      t3, fix_polezero_start      # get fix_polezero_start
    la      t4, fix_polezero_end-4      # get fix_polezero_end
    ori     t5, zero, 8-1               # loop_count = sampcount - 1 (sampcount is hard-coded as 8 here)
    mtru    t3, lps0                    # set ZOH loop start address
    mtru    t4, lpe0                    # set ZOH loop end address
    mtru    t5, lpc0                    # set ZOH count
    /************************************************************
     * state variables & coefficients mapping to register files
     * (a1,b0,b1,y1,x1) = (t3,t4,t5,t6,t7)
     ************************************************************/
    lw      t3, 0x00(t0)                # t3 = filtcfs->a1
    lw      t4, 0x04(t0)                # t4 = filtcfs->b0
    lw      t5, 0x08(t0)                # t5 = filtcfs->b1
    lw      t6, 0x00(t1)                # t6 = filtvars->y1
    lw      t7, 0x04(t1)                # t7 = filtvars->x1
    sll     a1, a1, 0x02                # inptr offset length in bytes
    sll     a3, a3, 0x02                # outptr offset length in bytes
Code Optimization on Lx5280- Example Code : fix_polezero( ) [cont'd]
fix_polezero_start:
    lw      t8, 0x00(a0)                # t8 = *inptr
    multa   m0, t6, t3                  # MAC0 = y1 * a1 (subtracted below)
    multa   m1, t8, t4                  # MAC1 = (*inptr) * b0
    multa   m2, t7, t5                  # MAC2 = x1 * b1
    addu    a0, a0, a1                  # inptr += inoff
    mfa     v0, m0h                     # v0 = high word of MAC0
    mfa     v1, m1h                     # v1 = high word of MAC1
    mfa     t9, m2h                     # t9 = high word of MAC2
    subr.s  v1, v1, v0                  # v1 = (*inptr) * b0 - y1 * a1
    addr.s  v1, v1, t9                  # v1 = accum
    or      t7, zero, t8                # x1 = *inptr
    sw      v1, 0x00(a2)                # *outptr = accum
    or      t6, zero, v1                # y1 = *outptr
    addu    a2, a2, a3                  # outptr += outoff
fix_polezero_end:
    nop
    sw      t6, 0x00(t1)                # filtvars->y1 = t6
    sw      t7, 0x04(t1)                # filtvars->x1 = t7
fix_polezero_exit:
    jr      ra                          # return to caller
    nop
References • "The Scientist and Engineer's Guide to Digital Signal Processing", Steven W. Smith, ISBN 0-9660176-6-8 • "Using The Low Cost, High Performance ADSP-21065L Digital Signal Processor For Digital Audio Applications", Dan Ledger and John Tomarakos, Analog Devices Inc. • "Converting floating-point applications to fixed-point", Randy Allen, Embedded Systems Programming, Sep. 24, 2004 • "Developing software for audio/visual devices", Bjorn Hori and Jeff Bier, Embedded Systems Programming, Nov. 24, 2004 • ETSI ANSI-C code for the GSM half rate speech codec (GSM 06.06)