
ADPCM ON TENSILICA


Presentation Transcript


  1. ADPCM ON TENSILICA Xiaoling Xu and Fan Mo EECS, UC Berkeley

  2. DESIGN GOAL • Basic Algorithm • Two Streams Approach • Make use of Tensilica’s Special Features • Results • Conclusion

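  3. ADPCM ENCODER [Block diagram: the estimate of the last input sample, X(n-1), is subtracted from the input sample X(n) to give the difference d(n). The encoder quantizes d(n) with step size ss(n) into the 4-bit ADPCM output sample L(n). The step size calculation produces the adjusted step size ss(n+1), which a Z⁻¹ delay turns into ss(n). A decoder inside the loop rebuilds the estimate of X(n), which a second Z⁻¹ delay turns into X(n-1).]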

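  4. ADPCM DECODER [Block diagram: the decoder turns the 4-bit ADPCM input sample L(n) into the difference d(n) using step size ss(n). d(n) is added to X(n-1) to give the output sample X(n), which a Z⁻¹ delay feeds back as X(n-1). The step size calculation produces the adjusted step size ss(n+1), which a second Z⁻¹ delay turns into ss(n).]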

  5. ENCODING ALGORITHM

     StepsizeTable[89] = {
       7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
       34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
       157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
       724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024,
       3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
       15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767 };

     IndexTable[16] = { -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8 };

     Encoding(*input) {
       loop(number of samples) {
         X = *input++;
         D = X - X₋₁;                    // X₋₁ is the estimate of the last input sample
         S = StepsizeTable[Index];
         Xa = |D|;                       // magnitude of the difference
         Dq = 0;                         // quantized difference, rebuilt as in the decoder
         Code = 0;
         if (Xa > S) { Code[2] = 1; Xa -= S; Dq += S; }
         S /= 2;
         if (Xa > S) { Code[1] = 1; Xa -= S; Dq += S; }
         S /= 2;
         if (Xa > S) { Code[0] = 1; Xa -= S; Dq += S; }
         Code[3] = (D > 0) ? 0 : 1;      // sign bit
         X₋₁ = (D > 0) ? X₋₁ + Dq : X₋₁ - Dq;
         if (X₋₁ > 32767) X₋₁ = 32767;
         if (X₋₁ < -32768) X₋₁ = -32768;
         Index += IndexTable[Code];
         if (Index > 88) Index = 88;
         if (Index < 0) Index = 0;
         *output++ = Code;
       }
     }

  6. DECODING ALGORITHM

     StepsizeTable[89] and IndexTable[16] are the same tables as in the encoding algorithm (slide 5).

     Decoding(*Code) {
       loop(number of samples) {
         C = *Code++;                    // 4-bit ADPCM input sample
         S = StepsizeTable[Index];
         D = 0;
         if (C[2] == 1) D += S;
         S /= 2;
         if (C[1] == 1) D += S;
         S /= 2;
         if (C[0] == 1) D += S;
         if (C[3] == 1) X = X₋₁ - D; else X = X₋₁ + D;
         if (X > 32767) X = 32767;
         if (X < -32768) X = -32768;
         Index += IndexTable[C];
         if (Index > 88) Index = 88;
         if (Index < 0) Index = 0;
         *output++ = X;
         X₋₁ = X;
       }
     }
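
     The encoding and decoding loops on slides 5 and 6 can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names AdpcmState, adpcm_update, adpcm_encode_sample, adpcm_decode_sample, and demo_adpcm are hypothetical, the encoder is assumed to take the magnitude and sign from the difference D, and both sides share one update routine so that the estimate and the index stay in step.

```c
#include <assert.h>
#include <stdint.h>

/* Step sizes and index adjustments, copied from the slides. */
static const int16_t kStepsizeTable[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878,
    2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894,
    6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818,
    18500, 20350, 22385, 24623, 27086, 29794, 32767};
static const int8_t kIndexTable[16] = {-1, -1, -1, -1, 2, 4, 6, 8,
                                       -1, -1, -1, -1, 2, 4, 6, 8};

/* Hypothetical per-stream state: last estimate X(n-1) and step-size index. */
typedef struct {
    int32_t xprev;
    int32_t index;
} AdpcmState;

static int32_t clamp32(int32_t v, int32_t lo, int32_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Shared by both sides: rebuild the quantized difference from the 4-bit
 * code, update the estimate, then adapt the step-size index. */
static void adpcm_update(AdpcmState *st, uint8_t code) {
    int32_t s = kStepsizeTable[st->index];
    int32_t d = 0;
    if (code & 4) d += s;
    s /= 2;
    if (code & 2) d += s;
    s /= 2;
    if (code & 1) d += s;
    /* The sum can leave the 16-bit range, so it is clamped in 32 bits. */
    st->xprev = clamp32((code & 8) ? st->xprev - d : st->xprev + d,
                        -32768, 32767);
    st->index = clamp32(st->index + kIndexTable[code], 0, 88);
}

uint8_t adpcm_encode_sample(AdpcmState *st, int16_t x) {
    int32_t d = (int32_t)x - st->xprev;
    int32_t s = kStepsizeTable[st->index];
    uint8_t code = (d > 0) ? 0 : 8;      /* sign bit, as Code[3] on slide 5 */
    if (d < 0) d = -d;
    if (d > s) { code |= 4; d -= s; }
    s /= 2;
    if (d > s) { code |= 2; d -= s; }
    s /= 2;
    if (d > s) { code |= 1; }
    adpcm_update(st, code);
    return code;
}

int16_t adpcm_decode_sample(AdpcmState *st, uint8_t code) {
    assert(code < 16);                   /* only 4-bit codes are valid */
    adpcm_update(st, code);
    return (int16_t)st->xprev;
}

/* Usage: one sample through a fresh encoder and a fresh decoder. */
void demo_adpcm(void) {
    AdpcmState enc = {0, 0}, dec = {0, 0};
    uint8_t code = adpcm_encode_sample(&enc, 100);
    assert(adpcm_decode_sample(&dec, code) == enc.xprev);
}
```

     Because the encoder updates its estimate with the same routine the decoder uses, the decoder's output always equals the encoder's estimate, which is the feedback structure drawn on slides 3 and 4.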

  7. ALTERNATIVE APPROACHES

     Both alternatives replace the three "if (Xa>S) { ... } S/=2;" steps of the encoding loop (slide 5).

     USING MULTIPLICATION • Multiplier is there. Why not use it? • Code[2:0]=Xa/S*4=Xa*(1/S)*4; the accumulated update is then Code[2:0]*S/4. • (1/S) is stored in a table.

     USING MORE TABLES • Build tables for all possible paths. • Xa-=S; Code[2]=~MSB(Xa); Xa-=S’[Code]; Code[1]=~MSB(Xa); Xa-=S’’[Code]; Code[0]=~MSB(Xa); • E.g. S’[0XX]=S/2; S’[1XX]=-S+S/2; [Tree diagram: S branches to the S’ entries 0XX and 1XX, which branch to the S’’ entries 00X, 01X, 10X, and 11X.]

  8. BUT... • Earlier experiments showed that neither approach gives a big improvement. WHY? • Multiplication takes many cycles. • Too many tables cause many cache misses.

  9. UNIQUE OPERATIONS • The encoding and decoding routines (slides 5 and 6) are built almost entirely from two kinds of operation. • IF (…) … ELSE ...: e.g. if (C[3]==1) X=X₋₁-D; else X=X₋₁+D; and the three "if (Xa>S) { ... }" steps of the encoder. • CLAMP: e.g. if (X>32767) X=32767; if (X<-32768) X=-32768; and if (Index>88) Index=88; if (Index<0) Index=0;

  10. UNIQUE DATA STRUCTURE • Most data are 16 bits or shorter. • Since a register is 32 bits wide, why not put two data items in one register? [Register layout: StreamA data in bits 31..16, StreamB data in bits 15..0.] • But in some places a 17th bit is required to store intermediate results: if (C[3]==1) X=X₋₁-D; else X=X₋₁+D; if (X>32767) X=32767; if (X<-32768) X=-32768; • Here X has to be 17 bits wide.

  11. WHY NOT TWO STREAMS? • Dual stream encoder • Dual stream decoder • Difficult?

  12. FIRST APPROACH: • A control-oriented application is hard to parallelize. • Modify the algorithm into a more computation-oriented one by using multiplication. • Speedup: 10% for a single stream; 0% for two streams due to high cache misses. • Why? • A 16-bit multiplication produces a 32-bit result.

  13. ANOTHER APPROACH • Keep the control-oriented approach: • 1. How to block the carry/borrow between bit 16 and bit 15? • 2. How to carry out two “if (..) ..” in one instruction? • 3. How to encapsulate two 17-bit data in a 32-bit register? [Register layout: XA₋₁ and SA in bits 31..16, XB₋₁ and SB in bits 15..0; the two registers are added.]
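
     Question 1 can be illustrated in plain C. The sketch below shows one well-known software technique for adding two packed 16-bit lanes without letting a carry cross from bit 15 into bit 16; it is an illustration only, not necessarily what the authors' TIE instruction does, and the names pack2, hi_lane, lo_lane, add2_no_carry, and demo_add2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two 16-bit lanes in one 32-bit word: hi in bits 31..16, lo in bits 15..0. */
static uint32_t pack2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* The casts back to int16_t assume the usual two's-complement conversion. */
static int16_t hi_lane(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo_lane(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Lane-wise add with the carry out of bit 15 blocked: add only the low
 * 15 bits of each lane, then repair each lane's top bit with XOR. */
static uint32_t add2_no_carry(uint32_t x, uint32_t y) {
    uint32_t low = (x & 0x7FFF7FFFu) + (y & 0x7FFF7FFFu);
    return low ^ ((x ^ y) & 0x80008000u);
}

/* Usage: -1 + 1 in the lower lane must not disturb the upper lane. */
void demo_add2(void) {
    uint32_t x = pack2(1, -1);
    uint32_t y = pack2(0, 1);

    /* An ordinary 32-bit add leaks the carry into the upper lane. */
    assert(hi_lane(x + y) == 2);

    /* The lane-wise add keeps the upper lane at 1 and gives 0 below. */
    uint32_t r = add2_no_carry(x, y);
    assert(hi_lane(r) == 1);
    assert(lo_lane(r) == 0);
}
```

     Each lane still wraps at 16 bits here (32767 + 1 becomes -32768), which is why question 3 asks for 17-bit intermediates before the clamp.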

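  14. TIE Instruction • 1. How to carry out two “if (..) ..” in one instruction? • Scalar form: if (data1>bound) data1=bound; if (data2>bound) data2=bound; • Packed form: if (data2|data1 > bound|bound) data2|data1 = bound|bound; [Datapath diagram: bound|bound is subtracted from data2|data1, with data2 in the upper half of the register and data1 in bits 15..0. The result of each half drives a 2:1 mux that selects either the original data or bound.]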
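
     The behavior of such a packed clamp can be modeled in ordinary C. This is a behavioral sketch only, written as C rather than TIE code and not taken from the authors' implementation; the names pack2, hi_lane, lo_lane, clamp2_max, and demo_clamp2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* data2 lives in bits 31..16 and data1 in bits 15..0. */
static uint32_t pack2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* The casts back to int16_t assume the usual two's-complement conversion. */
static int16_t hi_lane(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo_lane(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Model of one packed instruction that performs both upper-bound checks.
 * Per lane: subtract bound, and let the sign of the difference steer a
 * 2:1 mux between the original value and bound. */
static uint32_t clamp2_max(uint32_t data, int16_t bound) {
    int32_t d2 = hi_lane(data);
    int32_t d1 = lo_lane(data);
    int16_t r2 = (d2 - bound > 0) ? bound : (int16_t)d2;
    int16_t r1 = (d1 - bound > 0) ? bound : (int16_t)d1;
    return pack2(r2, r1);
}

/* Usage: clamp two step-size indices to the table limit 88 at once. */
void demo_clamp2(void) {
    uint32_t r = clamp2_max(pack2(100, 50), 88);
    assert(hi_lane(r) == 88);   /* 100 exceeded the bound */
    assert(lo_lane(r) == 50);   /* 50 passes through unchanged */
}
```

     In hardware the two lanes are evaluated side by side in a single cycle; the C model only reproduces the result, not the timing.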

  15. TIE Instructions • 2. How to encapsulate two 17-bit data in a 32-bit register? • Scalar form: data1 += diff1; data2 += diff2; if (data1 > 32767) data1 = 32767; if (data2 > 32767) data2 = 32767; • Packed form: data2|data1 += diff2|diff1; [Datapath diagram: diff2|diff1 is added to data2|data1 half by half, with data2 in bits 31..16 and data1 in bits 15..0. Each half's result (result2, result1) feeds a 2:1 mux that selects either the result or 32767.]
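
     A C model of this add-and-clamp step might look as follows. It is a sketch under assumptions, not the authors' TIE source: the names pack2, hi_lane, lo_lane, sat16, add2_sat, and demo_add2_sat are hypothetical, and the lower bound -32768 is taken from the clamp in the decoding algorithm (slide 6) rather than from this slide, which shows only the upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* data2 lives in bits 31..16 and data1 in bits 15..0. */
static uint32_t pack2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* The casts back to int16_t assume the usual two's-complement conversion. */
static int16_t hi_lane(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }
static int16_t lo_lane(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Clamp a wider intermediate back into the 16-bit sample range. */
static int16_t sat16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Model of data2|data1 += diff2|diff1 with a clamp per lane. Each sum
 * needs 17 bits, so it is formed in a wider type before clamping; the
 * 17-bit value never has to be stored back in the packed register. */
static uint32_t add2_sat(uint32_t data, uint32_t diff) {
    int32_t r2 = (int32_t)hi_lane(data) + hi_lane(diff);
    int32_t r1 = (int32_t)lo_lane(data) + lo_lane(diff);
    return pack2(sat16(r2), sat16(r1));
}

/* Usage: both lanes overflow, in opposite directions. */
void demo_add2_sat(void) {
    uint32_t r = add2_sat(pack2(32000, -32000), pack2(1000, -1000));
    assert(hi_lane(r) == 32767);
    assert(lo_lane(r) == -32768);
}
```

     Fusing the add and the clamp is what makes the packing workable: only the clamped 16-bit results are written back, so two streams fit in one 32-bit register.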

  16. CONSTANT TABLES • The original algorithm contains a lot of table lookup instructions. • Accessing a constant table through the cache is slow: it increases the cache miss rate and the number of memory access instructions. • Use constant tables! • Tensilica has tables that come with the processor. • Accessing these tables costs almost nothing extra.

  17. CONSTANT TABLES

  18. TWO STREAM RESULTS

  19. TWO STREAM RESULTS

  20. COMPARISON

  21. CONCLUSION • TIE extensions and improved code efficiency resulted in an order-of-magnitude improvement over our original implementation. • Constant tables help to reduce cache accesses and cache misses. • Tensilica is also able to handle control-oriented applications.
