Computer Organization and Assembly Language Teacher: cyy Presenter: B98902071 康秩群
Outline of these slides • The 0-bit counting problem • Naïve algorithm • Querying table approach • Counting 1’s and subtracting from 16 • Eliminating algorithm • Parallel counting algorithm • Other improvement techniques
The "counting circles" problem • Input: An array of 16-bit integers (the array holds no more than 32 of them). • Output: The number of 0-bits in the array.
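Naïve algorithm
b = 0;                          // b: number of 0-bits found so far
do {
    d = a[--c];                 // a: input array, c: number of integers left
    r2 = 1;                     // r2: single-bit mask, assumed to be 16 bits wide
    do {
        if ((d & r2) == 0) b++; // parentheses needed: == binds tighter than &
        r2 <<= 1;
    } while (r2 != 0);          // the mask becomes 0 once its bit is shifted out
} while (c > 0);
return b;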
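The naïve loop above can be written as a self-contained C function. This is only a sketch: the name `count_zero_bits_naive` and the use of `uint16_t` for the mask are assumptions made here so that the mask really does become 0 after 16 shifts; the slides do not say how wide `r2` is.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 0-bits in an array of 16-bit integers, one bit at a time.
 * Hypothetical helper name; not taken from the slides. */
unsigned count_zero_bits_naive(const uint16_t *a, size_t len)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < len; i++) {
        /* mask is 16 bits wide, so it wraps to 0 after 16 shifts */
        for (uint16_t mask = 1; mask != 0; mask <<= 1) {
            if ((a[i] & mask) == 0)
                zeros++;
        }
    }
    return zeros;
}
```

For example, the array {0x0000, 0xFFFF, 0x6D9E} contains 16 + 0 + 6 zero bits.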
Time complexity The above algorithm runs in [number of 0-bits] × 5 + [number of 1-bits] × 4 + (loop overhead) clocks = O(5n) = O(n), where n is the total number of bits.
Performance • 4032 clocks • Rank #39
Querying table approach
0110 1101 1001 1110
group1 group2 group3 group4
0000 : 4 0’s
0001 : 3 0’s
0010 : 3 0’s
0011 : 2 0’s
...
1111 : 0 0’s
Constructing table
int C0[16] = {4,3,3,2, 3,2,2,1, 3,2,2,1, 2,1,1,0};   // C0[v] = number of 0-bits in the 4-bit value v
Querying table
do {
    d = a[--c];
    b += C0[d & 0xF];   // lowest 4 bits
    d >>= 4;
    d &= 0xFFF;         // keep only the remaining 12 bits
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];
    d >>= 4;
    b += C0[d & 0xF];   // highest 4 bits
} while (c > 0);
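A minimal C sketch of the same table lookup follows. The names `zeros_in_nibble` and `count_zero_bits_table` are hypothetical; the table values are the ones from the C0 array, and an unsigned type is assumed so the shifts need no extra masking.

```c
#include <stddef.h>
#include <stdint.h>

/* zeros_in_nibble[v] = number of 0-bits in the 4-bit value v (same values as C0) */
static const uint8_t zeros_in_nibble[16] = {4,3,3,2, 3,2,2,1, 3,2,2,1, 2,1,1,0};

/* Count the 0-bits of each 16-bit integer with four 4-bit table lookups. */
unsigned count_zero_bits_table(const uint16_t *a, size_t len)
{
    unsigned zeros = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t d = a[i];
        zeros += zeros_in_nibble[d & 0xF];          /* bits 0..3   */
        zeros += zeros_in_nibble[(d >> 4) & 0xF];   /* bits 4..7   */
        zeros += zeros_in_nibble[(d >> 8) & 0xF];   /* bits 8..11  */
        zeros += zeros_in_nibble[d >> 12];          /* bits 12..15 */
    }
    return zeros;
}
```

The example value 0110 1101 1001 1110 (0x6D9E) gives 2 + 1 + 2 + 1 = 6 zero bits.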
Time complexity The above algorithm runs in [number of integers] × 24 + 42 (constructing the table) clocks = O((24/16)n) = O(1.5n) = O(n)
Performance • 1578 clocks • Rank #18
Counting 1’s and subtracting from 16 • You can construct a larger table such as C0[64] and split each integer into 6-6-4 bit groups. • That needs 3 lookups instead of 4, so the run time is still no less than ¾ of the table approach above (>= 1200 clocks). • Another viewpoint: count the 1’s and then subtract that count from 16. • There are many interesting algorithms for this!
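One possible way to realize the 6-6-4 split is sketched below. Everything here is an assumption for illustration: the slides give no code for this variant, and the names `build_zero_table6` and `count_zero_bits_664` are hypothetical. The table stores the number of 0-bits of each 6-bit value, so the top 4-bit chunk is over-counted by 2 and has to be corrected.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill t[v] with the number of 0-bits in the 6-bit value v. */
void build_zero_table6(uint8_t t[64])
{
    t[0] = 6;
    for (unsigned v = 1; v < 64; v++)
        t[v] = (uint8_t)(t[v >> 1] - (v & 1u));  /* one zero fewer if the low bit is 1 */
}

/* Count 0-bits using three lookups per integer: 6 + 6 + 4 bits. */
unsigned count_zero_bits_664(const uint16_t *a, size_t len, const uint8_t t[64])
{
    unsigned zeros = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t d = a[i];
        zeros += t[d & 0x3F];           /* bits 0..5  */
        zeros += t[(d >> 6) & 0x3F];    /* bits 6..11 */
        /* bits 12..15: only 4 real bits, so the table counted 2 extra zeros */
        zeros += t[d >> 12] - 2u;
    }
    return zeros;
}
```

The caller builds the table once with `build_zero_table6` and then passes it to `count_zero_bits_664`.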
Eliminating algorithm
while (n) {
    count++;
    n &= n - 1;
}
• Two cases: (1) *************1 (2) *****10...0000
Case 1
n         = *************1
n - 1     = *************0
-----------------------
n & (n-1) = *************0
• A one was eliminated.
Case 2
n         = *****10...0000
n - 1     = *****01...1111
-----------------------
n & (n-1) = *****00...0000
• A one was eliminated.
Eliminate a 1 each round • When n has been reduced to zero the loop ends, so it runs exactly once per 1-bit.
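The single elimination step and the loop built on it can be tried out with the short C sketch below. The helper names `clear_lowest_one` and `count_ones` are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Clear the lowest 1-bit of n; returns 0 when n is already 0. */
uint16_t clear_lowest_one(uint16_t n)
{
    return (uint16_t)(n & (n - 1));
}

/* Count the 1-bits of n by eliminating one of them per round. */
unsigned count_ones(uint16_t n)
{
    unsigned count = 0;
    while (n) {
        n = clear_lowest_one(n);
        count++;
    }
    return count;
}
```

For instance, 0101 1000 (0x58) minus 1 is 0101 0111 (0x57), and ANDing the two leaves 0101 0000 (0x50): the lowest 1 is gone and the higher bits are untouched.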
Implementation
b = c << 4;             // c * 16: assume every bit is 0, then subtract the 1s
do {
    d = a[--c];
    while (d) {
        b--;            // one 1-bit found
        d &= d - 1;     // clear the lowest 1-bit
    }
} while (c > 0);
return b;
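As a sketch, the implementation above translates to the following C function. The name `count_zero_bits_eliminating` and the explicit length parameter are assumptions; the slide version counts `c` down instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Start from 16 zeros per integer and subtract one for every 1-bit found. */
unsigned count_zero_bits_eliminating(const uint16_t *a, size_t len)
{
    unsigned zeros = (unsigned)len << 4;    /* len * 16 */
    for (size_t i = 0; i < len; i++) {
        uint16_t d = a[i];
        while (d) {
            zeros--;                        /* one 1-bit found */
            d &= (uint16_t)(d - 1);         /* clear the lowest 1-bit */
        }
    }
    return zeros;
}
```

The inner loop runs once per 1-bit, which is why the running time depends on how many 1s the test data contains.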
Time complexity The above algorithm runs in [number of 1-bits] × 5 + [size of array] × 4 + 4 clocks = O([5 + (4/16)]n) = O(5.25n) = O(n)
Performance • 2100 clocks • Rank #27 • Slower? It depends on the number of 1’s. • It was faster than the table approach before the rejudge. • Evidently the number of 1-bits in the test data was increased. • But the code is short, which leaves room to do other things.
Parallel counting algorithm
• Similar to the others
00 (0): 0 ones → 00 - 0 = 00 (0)
01 (1): 1 one  → 01 - 0 = 01 (1)
10 (2): 1 one  → 10 - 1 = 01 (1)
11 (3): 2 ones → 11 - 1 = 10 (2)
• [the original two bits] - [the left bit] gives the number of 1s in the pair
• then add the pair counts together iteratively
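Parallel counting algorithm
b = c << 4;                                  // c * 16, as in the eliminating version
do {
    x = a[--c];                              // load the next integer
    x = x - ((x >> 1) & 0x5555);             // number of 1s in each 2-bit pair
    x = (x & 0x3333) + ((x >> 2) & 0x3333);  // number of 1s in each 4-bit group
    x = (x + (x >> 4));                      // low nibble of each byte = 1s in that byte
    b -= x & 0xF;                            // subtract the 1s of the low byte
    b -= (x >> 8) & 0xF;                     // subtract the 1s of the high byte
} while (c > 0);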
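The parallel steps above can be checked with a small C sketch. The function names `popcount16` and `count_zero_bits_parallel` are hypothetical, and the final fold uses a 0x0F0F mask instead of two separate `& 0xF` extractions, which is an equivalent variation chosen here for readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 1-bits of a 16-bit value with parallel (SWAR) additions. */
unsigned popcount16(uint16_t v)
{
    unsigned x = v;
    x = x - ((x >> 1) & 0x5555u);               /* 1s per 2-bit pair  (0..2) */
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);   /* 1s per 4-bit group (0..4) */
    x = (x + (x >> 4)) & 0x0F0Fu;               /* 1s per byte        (0..8) */
    return (x & 0xFFu) + (x >> 8);              /* 1s in the whole value     */
}

/* 0-bits in the array = 16 per integer minus all the 1-bits. */
unsigned count_zero_bits_parallel(const uint16_t *a, size_t len)
{
    unsigned zeros = (unsigned)len << 4;
    for (size_t i = 0; i < len; i++)
        zeros -= popcount16(a[i]);
    return zeros;
}
```

Unlike the eliminating loop, the work per integer is constant regardless of how many 1s it contains.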
Time complexity The above algorithm runs in [number of integers] × 18 + 9 clocks = O((18/16)n) = O(1.125n) = O(n)
Performance • 1224 clocks • Rank #10
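Processing 3 integers
• Count three numbers together as one group (same as 阿蹦)
• 4 bits can represent 0~15, but a 4-bit group contains at most four 1s
• So once the number of 1s in each 4-bit group is known, the counts of 3 numbers fit into the same 4 bits (4 * 3 = 12 < 15)
• The remaining steps can then be done once for all three, saving the follow-up computation for two of them
• The code is a bit long so it is not attached; contact me if you are interested
1111 → 0100
1111 → 0100
1111 → 0100
-----------------
       1100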
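Since the presenter's code is not attached, the following C sketch is only a guess at how the idea could look, not the original implementation. The names `nibble_counts` and `count_ones_in_three` are hypothetical. The per-nibble counts of three integers are added first (each nibble holds at most 12, so nothing overflows into the next nibble), and the folding steps then run once instead of three times.

```c
#include <stdint.h>

/* Number of 1s in each 4-bit group of v, stored in place (0..4 per nibble). */
static unsigned nibble_counts(uint16_t v)
{
    unsigned x = v;
    x = x - ((x >> 1) & 0x5555u);
    return (x & 0x3333u) + ((x >> 2) & 0x3333u);
}

/* Total number of 1-bits in three 16-bit integers, folding only once. */
unsigned count_ones_in_three(uint16_t p, uint16_t q, uint16_t r)
{
    /* each nibble is at most 4 + 4 + 4 = 12, which still fits in 4 bits */
    unsigned s = nibble_counts(p) + nibble_counts(q) + nibble_counts(r);
    s = (s & 0x0F0Fu) + ((s >> 4) & 0x0F0Fu);   /* per-byte sums, at most 24 */
    return (s & 0xFFu) + (s >> 8);              /* total, at most 48 */
}
```

With three inputs of 0xFFFF every nibble becomes 0100 + 0100 + 0100 = 1100, matching the diagram above, and the total is 48.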
Time complexity The above algorithm runs in ceil([number of integers] / 3) × 45 + 10 clocks = O((15/16)n) = O(0.9375n) = O(n)
Performance • 1090 clocks • Rank #7
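Other improvement techniques • Unroll the loop • Given the length of this code, four groups (12 numbers) can be unrolled • When the tail is shorter than three groups the code has to jump out; remove as many checks that do not affect the result as possible • As a side effect, this reveals that the two test cases contain 16 and 32 numbers respectively • Read the input and compute directly in main (for part #3) • Unrolling three groups (9 numbers) also works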
Final performance • Part #2: 1002 clocks • Rank #1 (could run even faster by combining the others’ techniques) • Part #3: 674 clocks • Rank #1
Appreciation • Thanks for your attention. • Thanks to Professor hil for the slide template.