
Computer Organization and Assembly Language

Computer Organization and Assembly Language. Teacher: cyy. Presenter: B98902071 康秩群. Outline: the 0-bits counting problem (數圈圈問題), naïve algorithm, table lookup approach, counting 1's and subtracting from 16, eliminating algorithm, parallel counting algorithm, other improvement techniques.


Presentation Transcript


  1. Computer Organization and Assembly Language • Teacher: cyy • Presenter: B98902071 康秩群

  2. Outline of these slides • The 0-bits counting problem • Naïve algorithm • Table lookup approach • Counting 1's and subtracting from 16 • Eliminating algorithm • Parallel counting algorithm • Other improvement techniques

  3. The circle-counting problem (數圈圈問題) • Input: an array of 16-bit integers (the array has at most 32 elements). • Output: the total number of 0-bits in the array.

  4. Any naïve algorithm?

  5. Naïve algorithm b = 0; do { d = a[--c]; r2 = 1; do { if ((d & r2) == 0) b++; /* parentheses required: == binds tighter than & */ r2 <<= 1; } while (r2 & 0xFFFF); /* stop after the 16 bit positions */ } while (c > 0); return b;
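For readers who want to try the naïve approach outside the slide, here is a minimal sketch in plain C. The function name `count_zero_bits_naive` and the `(array, length)` signature are assumptions for illustration; the slide's version works on the variables a, b, c, d and r2 instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 0-bits in an array of 16-bit integers
 * by testing each of the 16 bit positions of every element. */
unsigned count_zero_bits_naive(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        /* mask walks over bit 0 .. bit 15 */
        for (uint32_t mask = 1; mask <= 0x8000u; mask <<= 1) {
            if ((a[i] & mask) == 0)   /* note the parentheses around the & */
                zeros++;
        }
    }
    return zeros;
}
```

For example, the array {0x0000, 0xFFFF, 0x6D9E} contains 16 + 0 + 6 = 22 zero bits.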

  6. Time complexity • The above algorithm takes about [number of 0-bits] × 5 + [number of 1-bits] × 4 + some unspecified loop overhead clocks • With n the total number of bits, that is at most about 5n, i.e. O(n)

  7. Performance • 4032 clocks • Rank #39

  8. Complexity of the circle-counting problem

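  9. Table lookup approach • Split each 16-bit integer into four 4-bit groups: 0110 1101 1001 1110 (group 1, group 2, group 3, group 4) • Look up the number of 0's in each group: 0000 : 4 0's • 0001 : 3 0's • 0010 : 3 0's • 0011 : 2 0's • ... • 1111 : 0 0's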

  10. Constructing table int C0[16]={4,3,3,2, 3,2,2,1, 3,2,2,1, 2,1,1,0};

  11. Querying table do { d = a[--c]; b += C0[d & 0xF]; d >>= 4; d &= 0xFFF; b += C0[d & 0xF]; d >>= 4; b += C0[d & 0xF]; d >>= 4; b += C0[d & 0xF]; } while (c > 0);
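The table construction and lookup from the two slides above can be combined into one self-contained sketch. The table values are the ones given on the slide; the names `ZEROS_IN_NIBBLE` and `count_zero_bits_table` are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 0-bits in each 4-bit value 0..15 (same values as C0 on the slide). */
static const uint8_t ZEROS_IN_NIBBLE[16] = {
    4, 3, 3, 2,  3, 2, 2, 1,  3, 2, 2, 1,  2, 1, 1, 0
};

/* Hypothetical helper: sum the table entries of the four nibbles of each element. */
unsigned count_zero_bits_table(const uint16_t *a, size_t n)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i];                        /* 0 .. 0xFFFF */
        zeros += ZEROS_IN_NIBBLE[d & 0xF];        /* bits 0-3   */
        zeros += ZEROS_IN_NIBBLE[(d >> 4) & 0xF]; /* bits 4-7   */
        zeros += ZEROS_IN_NIBBLE[(d >> 8) & 0xF]; /* bits 8-11  */
        zeros += ZEROS_IN_NIBBLE[d >> 12];        /* bits 12-15 */
    }
    return zeros;
}
```

Because `d` is taken from an unsigned 16-bit element here, no extra masking is needed after the shifts.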

  12. Time complexity • The above algorithm takes about [number of integers] × 24 + 42 (constructing the table) clocks • With n the total number of bits, that is about (24/16)n = 1.5n, i.e. O(n)

  13. Performance • 1578 clocks • Rank #18

  14. Counting 1's and subtracting from 16 • You can construct a larger table such as C0[64] and split each integer into 6-6-4 bit groups. • The run time is still no less than ¾ of the table lookup approach above (>= 1200 clocks). • Another viewpoint: count the 1's in each integer and subtract that count from 16. • There are many interesting algorithms!
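One way to combine the two ideas on this slide is sketched below. It is an illustration only, not the presenter's code: a 64-entry table of 1-bit counts is built at run time, each integer is split 6-6-4, and the total is obtained by subtracting from 16 per integer. The names `ones6`, `init_ones6` and `count_zero_bits_664` are hypothetical. Counting 1's has the side benefit that the short 4-bit group can use the same 64-entry table without correction, since its two missing high bits are 0.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t ones6[64];   /* hypothetical table: number of 1-bits in 0..63 */

/* Fill the table; must be called once before counting. */
void init_ones6(void)
{
    ones6[0] = 0;
    for (unsigned v = 1; v < 64; v++)
        ones6[v] = (uint8_t)(ones6[v >> 1] + (v & 1u));
}

/* Count 0-bits as 16 * n minus the number of 1-bits, using a 6-6-4 split. */
unsigned count_zero_bits_664(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i];
        zeros -= ones6[d & 0x3F];         /* bits 0-5   */
        zeros -= ones6[(d >> 6) & 0x3F];  /* bits 6-11  */
        zeros -= ones6[d >> 12];          /* bits 12-15 */
    }
    return zeros;
}
```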

  15. Eliminating algorithm • while (n) { count++; n &= n - 1; } • Two cases: (1) n ends in a 1: *************1 (2) n ends in a 1 followed by 0's: *****10...0000

  16. Case 1 • n = *************1 • n - 1 = *************0 • n & (n - 1) = *************0 • The lowest 1 was eliminated.

  17. Case 2 • n = *****10...0000 • n - 1 = *****01...1111 • n & (n - 1) = *****00...0000 • The lowest 1 was eliminated.

  18. Eliminate a 1 each round • When n has been reduced to zero, we are done; the number of rounds equals the number of 1-bits.
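The two cases above can be checked with a small sketch. The helper names `clear_lowest_one` and `rounds_until_zero` are assumptions made for this example.

```c
/* Clear the lowest set bit of n; returns 0 when n is already 0. */
unsigned clear_lowest_one(unsigned n)
{
    return n & (n - 1);
}

/* Number of elimination rounds needed to reach zero,
 * which equals the number of 1-bits in n. */
unsigned rounds_until_zero(unsigned n)
{
    unsigned rounds = 0;
    while (n) {
        n = clear_lowest_one(n);
        rounds++;
    }
    return rounds;
}
```

For instance, 0x6D9F (case 1) becomes 0x6D9E after one round, and 0x6D90 (case 2) becomes 0x6D80.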

  19. Implementation b = c << 4; /* c * 16 */ do { d = a[--c]; while (d) { b--; d &= d - 1; } } while (c > 0); return b;
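As a self-contained sketch of the implementation above: start from 16 zero bits per integer and subtract one for every 1-bit that gets eliminated. The function name and the `(array, length)` signature are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zeros = 16 * n minus the number of 1-bits,
 * where the 1-bits are removed one at a time with d &= d - 1. */
unsigned count_zero_bits_eliminate(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i];
        while (d) {
            zeros--;        /* one more 1-bit found */
            d &= d - 1;     /* clear the lowest 1-bit */
        }
    }
    return zeros;
}
```

The inner loop runs once per 1-bit, so the cost depends on the data, as the performance slide below notes.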

  20. Time complexity • The above algorithm takes about [number of 1-bits] × 5 + [size of array] × 4 + 4 clocks • In the worst case (every bit is 1), with n the total number of bits, that is about [5 + (4/16)]n = 5.25n, i.e. O(n)

  21. Performance • 2100 clocks • Rank #27 • Slower? It depends on the number of 1's. • It was faster than the table lookup approach before the rejudge. • Evidently the number of 1-bits in the test data increased. • But the code is short, which leaves room to do other things.

  22. Parallel counting algorithm • Similar to the others • Number of 1's in each 2-bit pair = [the original two bits] – [the left bit]: • 00 (0), 0 ones → 00 – 0 = 00 (0) • 01 (1), 1 one → 01 – 0 = 01 (1) • 10 (2), 1 one → 10 – 1 = 01 (1) • 11 (3), 2 ones → 11 – 1 = 10 (2) • Then add the pair counts together iteratively
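The 2-bit rule above can be verified for all four pair values with a tiny sketch. The helper name `ones_in_pair` is an assumption, and the function is only meaningful for arguments 0 to 3.

```c
/* Number of 1-bits in a 2-bit value (0..3):
 * the original two bits minus the left (high) bit. */
unsigned ones_in_pair(unsigned pair)
{
    return pair - (pair >> 1);
}
```

Applied to a whole 16-bit word, the same subtraction is done on all eight pairs at once by masking the shifted word with 0x5555.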

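  23. Parallel counting algorithm b = c << 4; /* c * 16 */ do { x = a[--c]; x = x - ((x >> 1) & 0x5555); /* 1's per 2-bit pair */ x = (x & 0x3333) + ((x >> 2) & 0x3333); /* 1's per 4-bit group */ x = x + (x >> 4); /* 1's per byte, in the low nibble of each byte */ b -= x & 0xF; b -= (x >> 8) & 0xF; } while (c > 0); return b;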
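A self-contained sketch of the parallel counting steps follows, written as a plain C function. The name `count_zero_bits_parallel` and its signature are assumptions; the masks and shifts are the ones shown on the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zeros = 16 * n minus the number of 1-bits,
 * with the 1-bits of each word counted in parallel. */
unsigned count_zero_bits_parallel(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    for (size_t i = 0; i < n; i++) {
        unsigned x = a[i];
        x = x - ((x >> 1) & 0x5555u);             /* count per 2-bit pair (0..2) */
        x = (x & 0x3333u) + ((x >> 2) & 0x3333u); /* count per nibble (0..4)     */
        x = x + (x >> 4);           /* low nibble of each byte: count per byte (0..8) */
        zeros -= x & 0xFu;          /* 1's in bits 0-7  */
        zeros -= (x >> 8) & 0xFu;   /* 1's in bits 8-15 */
    }
    return zeros;
}
```

As a check by hand, 0x6D9E becomes 0x5959 after the first step, 0x2323 after the second and 0x2555 after the third, giving 5 + 5 = 10 ones and therefore 6 zeros.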

  24. Time complexity • The above algorithm takes about [number of integers] × 18 + 9 clocks • With n the total number of bits, that is about (18/16)n = 1.125n, i.e. O(n)

  25. Performance • 1224 clocks • Rank #10

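  26. Processing 3 integers • Process the integers in groups of three (same as 阿蹦's approach) • 4 bits can represent 0–15, but a 4-bit group contains at most four 1's • So once the number of 1's in each 4-bit group has been computed, the counts of three integers can be added into the same 4-bit fields (4 × 3 = 12 < 15) • The remaining steps can then be done once for all three, saving the follow-up computation for two of them • The code is a bit long so it is not included here; contact me if you are interested • Example: 1111 → 0100, 1111 → 0100, 1111 → 0100; sum: 1100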
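Since the slide does not include the code, the following is only a guess at how the idea might look in C, not the presenter's implementation. The helper names `nibble_counts`, `sum_nibbles` and `count_zero_bits_by_threes`, and the handling of a leftover tail one integer at a time, are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-nibble 1-bit counts of a 16-bit value: each 4-bit field holds 0..4. */
static unsigned nibble_counts(unsigned x)
{
    x = x - ((x >> 1) & 0x5555u);
    return (x & 0x3333u) + ((x >> 2) & 0x3333u);
}

/* Add up the four 4-bit fields of s (each field may hold up to 12). */
static unsigned sum_nibbles(unsigned s)
{
    s = (s & 0x0F0Fu) + ((s >> 4) & 0x0F0Fu);  /* two byte-wide sums, each <= 24 */
    return (s & 0xFFu) + (s >> 8);
}

unsigned count_zero_bits_by_threes(const uint16_t *a, size_t n)
{
    unsigned zeros = 16u * (unsigned)n;
    size_t i = 0;
    /* Three integers share one horizontal sum: 3 * 4 = 12 fits in 4 bits. */
    for (; i + 3 <= n; i += 3) {
        unsigned s = nibble_counts(a[i]) + nibble_counts(a[i + 1])
                   + nibble_counts(a[i + 2]);
        zeros -= sum_nibbles(s);
    }
    /* Leftover integers (fewer than three) are handled one at a time. */
    for (; i < n; i++)
        zeros -= sum_nibbles(nibble_counts(a[i]));
    return zeros;
}
```

With three inputs of 0xFFFF, each `nibble_counts` result is 0x4444 and their sum is 0xCCCC, matching the 0100 + 0100 + 0100 = 1100 example on the slide.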

  27. Time complexity • The above algorithm takes about ceil([number of integers] / 3) × 45 + 10 clocks, i.e. about 15 clocks per integer • With n the total number of bits, that is about (15/16)n = 0.9375n, i.e. O(n)

  28. Performance • 1090 clocks • Rank #7

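  29. Other improvement techniques • Unroll the loop • Given the code length, the loop can be unrolled to four groups (12 integers) • When fewer than three groups remain at the tail, the code must jump out; remove as many conditional checks as possible where doing so does not affect the result • As a side effect, this reveals that the two test cases contain 16 and 32 integers respectively • Read the input and compute directly in main (for part #3) • Unrolling three groups (9 integers) is also possible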

  30. Final performance • Part #2: 1002 clocks • Rank #1 (it could run even faster by combining the others' techniques) • Part #3: 674 clocks • Rank #1

  31. Acknowledgements • Thank you for your attention. • Thanks to Professor hil for the slide template.
