Bowling Green State University
Implementation of Morton Layout for Large Arrays
Presented by: Sharad Ratna Bajracharya
Advisor: Prof. Larry Dunning
23rd April 2004
Outline
• Introduction
• Objectives
• Implementation
• Samples
• Improvement
• Recommendation
• Conclusion
Introduction
• Morton layout is a storage layout for two-dimensional arrays.
• Its performance is comparatively better than that of row-major or column-major array representation.
Introduction continues...
• Reports analyzing the performance and efficiency of Morton layout:
• An exhaustive evaluation of row-major, column-major and Morton Layouts for large two-dimensional arrays; Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.
• Is Morton Layout competitive for large two-dimensional arrays?; Jeyarajan Thiyagalingam and Paul H. J. Kelly.
• Improving the Performance of Morton Layout by Array Alignment and Loop Unrolling; Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.
Introduction continues...
• General row-major array representation
• Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations.

     0  1  2  3
     4  5  6  7
     8  9 10 11
    12 13 14 15
Introduction continues...
• Column-major array representation.

     0  4  8 12
     1  5  9 13
     2  6 10 14
     3  7 11 15
Introduction continues...
• Morton layout is a compromise storage layout between the layouts mandated by programming languages, such as row-major and column-major.

    (Row Major)        (Morton Storage Layout)
     0  1  2  3         0  1  4  5
     4  5  6  7         2  3  6  7
     8  9 10 11         8  9 12 13
    12 13 14 15        10 11 14 15
Introduction continues...
• Morton storage layout works with almost equal overhead whether traversed row-wise or column-wise.
• Morton layout works well with square two-dimensional arrays whose size is a power of 2, such as 2x2, 4x4, 8x8, etc.
Introduction continues...
• For a non-square matrix, it wastes a lot of memory space.

    (Row Major)        (Morton Storage Layout)
     0  1  2  3         0  1  4  5
     4  5  6  7         2  3  6  7
     8  9 10 11         8  9  X  X
                       10 11
Introduction continues...
• How does Morton layout work?
• For any subscript of a two-dimensional array, such as array[ 2 , 3 ]:
    Binary value of row 2 -> 1 0
    Binary value of col 3 -> 1 1
  Interleaving the bits, row bit first, gives 1 1 0 1, so Morton layout stores the element at memory location 13.
• Also known as Zip Fastening Array Layout.
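The zip-fastening rule above can be sketched in C++ as follows. This is a minimal illustration, not the presenter's code: the function name `mortonIndex` and the 16-bit limit on each subscript are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical helper: interleave the bits of row and col.
// Bit k of col lands at position 2k, bit k of row at position 2k + 1.
// Assumes both subscripts fit in 16 bits.
std::uint32_t mortonIndex(std::uint32_t row, std::uint32_t col) {
    std::uint32_t index = 0;
    for (unsigned k = 0; k < 16; ++k) {
        index |= ((col >> k) & 1u) << (2 * k);      // column bit -> even position
        index |= ((row >> k) & 1u) << (2 * k + 1);  // row bit -> odd position
    }
    return index;
}
```

For array[ 2 , 3 ] this gives binary 1101, the location worked out on the slide.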
Introduction continues...
• Consider a large row-major array:

    1    2    3    4    5    6    7    ... 1000
    1001 1002 1003 1004 1005 1006 1007 ... 2000
    2001 2002 ...
    ...
    9001 9002 9003 9004 9005 9006 9007 ... 10000

• Elements that are adjacent in a column lie 1000 locations apart in memory, so traversing column-wise results in cache misses, page faults and poor performance.
Objectives
• Improve the cache miss and page fault characteristics of large arrays using Morton array layouts.
• Reduce wasted memory in Morton layout.
• Improve the extendibility of arrays.
Implementation
• Interleaved bit patterns:

     4 -> 0 1 0 0 -> 0 0 1 0 0 0 0
     9 -> 1 0 0 1 -> 1 0 0 0 0 0 1
    15 -> 1 1 1 1 -> 1 0 1 0 1 0 1
                     (Interleaved Bits)
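One common way to produce such interleaved bit patterns without a loop is a sequence of shifts and masks. The sketch below is an assumption about how this could be coded, not the presenter's implementation; the name `dilate` is hypothetical and the input is limited to 16 bits.

```cpp
#include <cstdint>

// Hypothetical helper: spread the low 16 bits of x so that bit k moves to
// position 2k, leaving a zero between every pair of original bits.
std::uint32_t dilate(std::uint32_t x) {
    x &= 0x0000FFFFu;                  // keep only the low 16 bits
    x = (x | (x << 8)) & 0x00FF00FFu;  // separate the two bytes
    x = (x | (x << 4)) & 0x0F0F0F0Fu;  // separate the nibbles
    x = (x | (x << 2)) & 0x33333333u;  // separate the bit pairs
    x = (x | (x << 1)) & 0x55555555u;  // separate the single bits
    return x;
}
```

With this helper, 4, 9 and 15 map to binary 0010000, 1000001 and 1010101, matching the three patterns listed above.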
Implementation continues
• Bit interleaved increment and decrement:
• Bit interleaved increment:

    101 + 1 -> 1 0 0 0 1 + 1
    110     -> 1 0 1 0 0
    (Changes are in interleaved bits)

• For any value “a”, bit interleaved increment is given by:
    a+1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555
• 0xAAAAAAAA = 1010……..10101010 (32 bits)
• 0x55555555 = 0101……..01010101 (32 bits)
Implementation continues
• Bit interleaved increment…
    a+1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555

        0 0 0 1 -> Bit interleaved 1 (0 1)
     OR 1 0 1 0
        1 0 1 1
            + 1
        1 1 0 0
    AND 0 1 0 1
        0 1 0 0 -> Bit interleaved 2 (1 0)
Implementation continues
• More examples of bit interleaved increment:

    0 0 0 0 0 + 1 = 0 0 0 0 1
    0 0 0 0 1 + 1 = 0 0 1 0 0
    0 0 1 0 0 + 1 = 0 0 1 0 1
    0 0 1 0 1 + 1 = 1 0 0 0 0
    1 0 0 0 0 + 1 = 1 0 0 0 1
    1 0 0 0 1 + 1 …
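The increment formula from the slides translates directly into C++. In this sketch the function name `dilatedIncrement` is an assumption; the operand is taken to be a 32-bit value whose payload sits in the even bit positions.

```cpp
#include <cstdint>

// Bit interleaved increment, as given on the slides:
// OR-ing with 0xAAAAAAAA fills the unused (odd) positions with ones so that
// a carry ripples straight across them; the final AND clears them again.
std::uint32_t dilatedIncrement(std::uint32_t a) {
    return ((a | 0xAAAAAAAAu) + 1u) & 0x55555555u;
}
```

Starting from 0 and applying the function repeatedly yields binary 00001, 00100, 00101, 10000, 10001, that is, the interleaved forms of 1, 2, 3, 4 and 5.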
Implementation continues
• Bit interleaved decrement. For example:

    100 - 1 -> 1 0 0 0 0 - 1
    11      -> 0 0 1 0 1
    (Changes are in interleaved bits)

• For any value “a”, bit interleaved decrement is given by:
    a-1 = (a - 1) & 0x55555555
  where
• 0x55555555 = 0101……01010101 (32 bits)
Implementation continues
• Bit interleaved decrement…
    a-1 = (a - 1) & 0x55555555

        0 1 0 0 0 0 -> Bit interleaved 4 (100)
                - 1
        0 0 1 1 1 1
    AND 0 1 0 1 0 1
        0 0 0 1 0 1 -> Bit interleaved 3 (11)
Implementation continues
• More examples of bit interleaved decrement:

    …
    1 0 0 0 0 - 1 = 0 0 1 0 1
    0 0 1 0 1 - 1 = 0 0 1 0 0
    0 0 1 0 0 - 1 = 0 0 0 0 1
    0 0 0 0 1 - 1 = 0 0 0 0 0
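The decrement is the mirror image: the borrow turns the unused positions into ones, and the mask removes them. A minimal sketch, with `dilatedDecrement` as an assumed name:

```cpp
#include <cstdint>

// Bit interleaved decrement, as given on the slides:
// subtracting 1 borrows through the zero gaps (turning them into ones),
// and the AND with 0x55555555 clears those gaps again.
std::uint32_t dilatedDecrement(std::uint32_t a) {
    return (a - 1u) & 0x55555555u;
}
```

Applied to binary 10000 (interleaved 4) it steps back through 00101, 00100, 00001 and 00000. Because the arithmetic is unsigned, decrementing 0 wraps around to the largest interleaved value rather than failing.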
Implementation continues
• Morton layout array representation can be implemented in two ways:
• The first method maintains a lookup table of bit interleaved array subscripts for address calculation. For example:

    0 -> 0 0 0 0
    1 -> 0 0 0 1
    2 -> 0 1 0 0
    3 -> 0 1 0 1
Implementation continues
• For example, take any array subscript, viz. [ 2 , 3 ]:

    Value of 2 (1 0) from lookup table -> 0100
    Value of 3 (1 1) from lookup table -> 0101

  To get the Morton layout address:

    (ROW bitwise shifted left by 1) + COL
    (0100 << 1) + 0101
    1000 + 0101
    1 1 0 1 (zipped address)
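A possible shape for the lookup-table method is sketched below. The names `buildDilationTable` and `mortonAddress` are hypothetical, and filling the table with the bit interleaved increment is a choice made for this example rather than something the slides specify.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: entry k of the table holds k with its bits spread to
// the even positions (0 -> 0000, 1 -> 0001, 2 -> 0100, 3 -> 0101, ...).
std::vector<std::uint32_t> buildDilationTable(std::uint32_t size) {
    std::vector<std::uint32_t> table(size);
    std::uint32_t dilated = 0;
    for (std::uint32_t k = 0; k < size; ++k) {
        table[k] = dilated;
        dilated = ((dilated | 0xAAAAAAAAu) + 1u) & 0x55555555u;  // next entry
    }
    return table;
}

// Address = (row entry shifted left by 1) + column entry.
// Assumes row and col are both smaller than table.size().
std::uint32_t mortonAddress(const std::vector<std::uint32_t>& table,
                            std::uint32_t row, std::uint32_t col) {
    return (table[row] << 1) + table[col];
}
```

With a four-entry table, subscript [ 2 , 3 ] resolves to (0100 << 1) + 0101 = 1101.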
Implementation continues
• The second method implements the Morton array layout representation using only bit interleaved increment and decrement, without a lookup table.
Implementation continues
• Implemented in C++ as a two-dimensional array matrix class with Standard Template Library (STL) compatibility, so that it is generic, that is, not tied to any particular data structure or object type.
• Internally, the data are stored sequentially in an STL vector.
Implementation continues
• Direct access to an element of the array matrix by array subscript is implemented using the lookup table.
• Random access iterators are defined, which make use of bit interleaved increment and decrement without using the lookup table.
• Iterators are a generalization of pointers. They are objects that point to other objects.
Implementation continues
• Different types of random access iterators are implemented to provide flexibility in using the matrix class, such as:
• Row Major iterator
• Column Major iterator
• Diagonal iterator
• Row iterator / Super row iterator
• Column iterator / Super column iterator
• Reverse Row Major iterator
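The matrix class itself is not shown in the slides, so the following is only a sketch of the index stepping that a row iterator could perform, under the assumption that storage is a flat vector in Morton order. The function `copyRow` and its parameters are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: read one row of a Morton-ordered array without a
// lookup table. dilatedRow is the row subscript with its bits already spread
// to even positions; the column part is advanced with the bit interleaved
// increment. Assumes every computed location exists in storage.
std::vector<int> copyRow(const std::vector<int>& storage,
                         std::uint32_t dilatedRow, std::uint32_t cols) {
    std::vector<int> out;
    std::uint32_t dilatedCol = 0;
    for (std::uint32_t c = 0; c < cols; ++c) {
        out.push_back(storage[(dilatedRow << 1) | dilatedCol]);
        dilatedCol = ((dilatedCol | 0xAAAAAAAAu) + 1u) & 0x55555555u;
    }
    return out;
}
```

For a 4x4 array whose element at location z holds the value z, row 2 (dilated form 0100) visits locations 8, 9, 12 and 13, which is the third row of the Morton storage layout.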
Samples
• Using Row Major iterator:

    Original Data:        Sorted Data:
     6 -9 -8 -1           -9 -9 -8 -8
    -8 -6 -9 -2           -8 -8 -7 -6
    -2 -5 -6 -4           -6 -5 -4 -4
     2  3 -4 -8           -2 -2 -2 -1
    -2  1 -7  5            1  1  2  3
     5 -8  1  7            5  5  6  7

    // Row-major sorting using STL sort()
    mat1 = matori;                    // work on a copy of the original matrix
    cout << mat1 << endl;             // print the original data
    sort(mat1.begin(), mat1.end());   // begin()/end() give the Row Major iterator
    cout << "Sorted Data:" << endl;
    cout << mat1 << endl;
Samples continues...
• Using Column Major iterator:

    Original Data:        Sorted Data:
     6 -9 -8 -1           -9 -7 -2  2
    -8 -6 -9 -2           -9 -6 -2  3
    -2 -5 -6 -4           -8 -6 -2  5
     2  3 -4 -8           -8 -5 -1  5
    -2  1 -7  5           -8 -4  1  6
     5 -8  1  7           -8 -4  1  7

    // Column-major sorting using STL sort()
    mat1 = matori;                      // work on a copy of the original matrix
    cout << mat1 << endl;               // print the original data
    sort(mat1.cbegin(), mat1.cend());   // cbegin()/cend() give the Column Major iterator
    cout << "Sorted Data:" << endl;
    cout << mat1 << endl;
Samples continues...
• Using super row iterator:

    Original Data:        Sorted Data:
     6 -9 -8 -1           -9 -8 -1  6
    -8 -6 -9 -2           -9 -8 -6 -2
    -2 -5 -6 -4           -6 -5 -4 -2
     2  3 -4 -8           -8 -4  2  3
    -2  1 -7  5           -7 -2  1  5
     5 -8  1  7           -8  1  5  7

    // Row-by-row sorting using STL sort()
    mat1 = matori;                      // work on a copy of the original matrix
    cout << mat1 << endl;               // print the original data
    // The super row iterator visits one row at a time
    for (riter = mat1.r2rbegin(); riter != mat1.r2rend(); riter++)
    {
        sort((*riter).begin(), (*riter).end());   // sort within the current row
    }
    cout << mat1 << endl;
Samples continues...
• Using super column iterator:

    Original Data:        Sorted Data:
     6 -9 -8 -1           -8 -9 -9 -8
    -8 -6 -9 -2           -2 -8 -8 -4
    -2 -5 -6 -4           -2 -6 -7 -2
     2  3 -4 -8            2 -5 -6 -1
    -2  1 -7  5            5  1 -4  5
     5 -8  1  7            6  3  1  7

    // Column-by-column sorting using STL sort()
    mat1 = matori;                      // work on a copy of the original matrix
    cout << mat1 << endl;               // print the original data
    // The super column iterator visits one column at a time
    for (citer = mat1.c2cbegin(); citer != mat1.c2cend(); citer++)
    {
        sort((*citer).begin(), (*citer).end());   // sort within the current column
    }
    cout << mat1 << endl;
Samples continues...
• Using Resize function:

    Original Data:        Resized Data:
     6 -9 -8 -1            6 -9 -8 -1  0  0
    -8 -6 -9 -2           -8 -6 -9 -2  0  0
    -2 -5 -6 -4           -2 -5 -6 -4  0  0
     2  3 -4 -8            2  3 -4 -8  0  0
    -2  1 -7  5           -2  1 -7  5  0  0
     5 -8  1  7            5 -8  1  7  0  0
                           0  0  0  0  0  0
                           0  0  0  0  0  0

    // Resizing the matrix
    mat1 = matori;            // work on a copy of the original matrix
    cout << mat1 << endl;     // print the original data
    mat1.resize(8, 8, 0);     // new cells are filled with 0
    cout << mat1 << endl;
Improvement
• Morton array representation can be improved if we can utilize the wasted space for non-square matrices.
• This can be achieved to some extent by using partially interleaved bit patterns.
• A portion of the bits is interleaved and the remaining bits are left as they are. This helps in utilizing the wasted space.
Improvement continues
• For example, let us consider a matrix of size 20 x 4 (actual reqd. space 80).
• Using Morton layout, it will require
    1000001010 + 0000000101 = 1000001111 = 527 + 1 = 528 spaces
• With the modified version, it will require
    1001010 + 0000101 = 1001111 = 79 + 1 = 80 spaces -> Improved !!!
Improvement continues
• More details…

      1000001010 -> 19 (row)
    + 0000000101 -> 3 (col)
      1000001111 -> 527 (Morton location)

      1001010 -> 19 (row)
    + 0000101 -> 3 (col)
      1001111 -> 79 (Improved Morton)

  Extra interleaving bits removed
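The space needed by plain Morton layout is the location of the last element plus one, which can be checked with a short sketch. The helper names `spreadBits` and `mortonSpace` are assumptions for this example, and dimensions are assumed to be at least 1 and to fit in 16 bits.

```cpp
#include <cstdint>

// Hypothetical helper: move bit k of x to position 2k (low 16 bits only).
std::uint32_t spreadBits(std::uint32_t x) {
    std::uint32_t out = 0;
    for (unsigned k = 0; k < 16; ++k) {
        out |= ((x >> k) & 1u) << (2 * k);
    }
    return out;
}

// Storage needed by plain Morton layout for a rows x cols matrix:
// the Morton location of element [rows-1, cols-1], plus one.
std::uint32_t mortonSpace(std::uint32_t rows, std::uint32_t cols) {
    return (spreadBits(rows - 1) << 1) + spreadBits(cols - 1) + 1u;
}
```

For the 20 x 4 matrix this gives 528 spaces against the 80 actually required; a 4 x 4 matrix needs exactly 16.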
Improvement continues
• In the improved version, only N bits are interleaved, where N is the number of bits needed to represent the smaller of “rows-1” and “columns-1” in a rows x columns matrix.
• For example, in a 20x4 matrix, the smaller dimension is 4 and 4-1=3, which is “11” in binary; that is, N=2, as 3 is represented by the 2 bits “11”.
Improvement continues
• Interleave N bits and leave the remaining bits as they are. For example:

    For rows:    20-1 = 19 = 10011 -> 100 1010   (N=2 row bits are interleaved)
    For columns:  4-1 =  3 =    11 -> 000 0101   (N=2 column bits are interleaved)
Improvement continues
• Bit interleaved increment/decrement still works.
• For bit interleaved increment:

        001 1010 -> Bit interleaved 7 (111)
     OR 000 0101 -> Bit mask
        001 1111
             + 1
        010 0000
    AND 111 1010 -> ~Bit mask (complement)
        010 0000 -> Bit interleaved 8 (1000)
Improvement continues
• For bit interleaved decrement:

        010 0000 -> Bit interleaved 8 (1000)
             - 1
        001 1111
    AND 111 1010 -> ~Bit mask
        001 1010 -> Bit interleaved 7 (111)
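The same increment and decrement can be written for an arbitrary bit mask, which is what the partially interleaved patterns need. In this sketch the names are hypothetical, and `mask` marks the positions that belong to the subscript being stepped, so it corresponds to "~Bit mask" (111 1010) in the row example above.

```cpp
#include <cstdint>

// Increment a value whose bits live only in the positions set in mask:
// fill every other position with ones so the carry passes over it, add 1,
// then clear those positions again.
std::uint32_t maskedIncrement(std::uint32_t a, std::uint32_t mask) {
    return ((a | ~mask) + 1u) & mask;
}

// Decrement: the borrow sets the foreign positions, and the mask clears them.
std::uint32_t maskedDecrement(std::uint32_t a, std::uint32_t mask) {
    return (a - 1u) & mask;
}
```

With mask 111 1010 (0x7A), incrementing 001 1010 gives 010 0000 and decrementing 010 0000 gives back 001 1010. With mask 0x55555555 the two functions reduce to the fully interleaved formulas.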
Improvement continues
• The improved array location is calculated by adding the partially bit interleaved row and column.

      100 10 10 -> 19
    + 000 01 01 -> 3
      100 11 11 = 79

• This method utilizes some of the wasted space, but it works no better than the original Morton layout for square matrices whose size is not a power of 2.
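A sketch of the partially interleaved address calculation follows. It covers only the case shown in the slides, where the rows outnumber the columns and every column subscript fits in n bits; the function names are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper: move bit k of x to position 2k, for the low n bits.
std::uint32_t spreadLowBits(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned k = 0; k < n; ++k) {
        out |= ((x >> k) & 1u) << (2 * k);
    }
    return out;
}

// Partially interleaved location: the low n bits of row and col are zipped
// together, and the remaining row bits are placed above them unchanged.
// Assumes col < 2^n (the column count is the smaller dimension).
std::uint32_t partialMortonAddress(std::uint32_t row, std::uint32_t col, unsigned n) {
    const std::uint32_t lowMask = (1u << n) - 1u;
    const std::uint32_t high = (row >> n) << (2 * n);  // non-interleaved row bits
    return high | (spreadLowBits(row & lowMask, n) << 1) | spreadLowBits(col & lowMask, n);
}
```

For the 20 x 4 matrix with n = 2, element [ 19 , 3 ] lands at 100 11 11 = 79, so the 80 elements fill locations 0 to 79 with no gaps.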
Improvement continues
• Improvement for square matrices:
• Let us consider an NxN matrix in which we want n bits to be interleaved. The remaining bits of the column bit patterns are unchanged, but for the row bit patterns the remaining bits take special bit patterns that are multiples of N/2^n. So separate lookup tables are required for row and column bit patterns.
• Row and column bit patterns are added to get the modified storage location.
Improvement continues
• For example, a 17x17 matrix with n=2 interleaved bits (actual 289 spaces reqd.):
• Space required by normal Morton layout will be
    1000000000 + 0100000000 = 1100000000 = 768 + 1 = 769
• With the improved version, we have 17/2^2 = 5 (rounded up):

    Index   Row lookup table   Col lookup table
    0       0000 0000          0000 0000
    1       0000 0010          0000 0001
    2       0000 1000          0000 0100
    3       0000 1010          0000 0101
    4       0101 0000          0001 0000
    5       0101 0010          0001 0001
    ...     ...                ...

  (Remaining row bits change by 5 = 101)
Improvement continues
• For the 17x17 matrix:
• 16 from the row lookup table will be 10100 0000
• 16 from the col lookup table will be 00100 0000
• Total space required will be:

      10100 0000
    + 00100 0000
      11000 0000 -> 384 + 1 = 385 spaces reqd.   Improved!!!
Improvement continues
• This technique for square matrices still leaves some extra space, as shown in the example of the 17x17 matrix, although in some cases it works perfectly. It is nevertheless an improvement over Morton layout for square matrices whose size is not a power of 2.
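The square-matrix scheme can be sketched without lookup tables by computing the same bit patterns on the fly. The function names are hypothetical, and rounding N/2^n up is an assumption inferred from the 17x17 example, where 17/4 is taken as 5.

```cpp
#include <cstdint>

// Hypothetical helper: move bit k of x to position 2k, for the low n bits.
std::uint32_t spreadLow(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned k = 0; k < n; ++k) {
        out |= ((x >> k) & 1u) << (2 * k);
    }
    return out;
}

// Location in a size x size matrix with n interleaved bits.
// The remaining row bits are multiplied by ceil(size / 2^n); the remaining
// column bits are used unchanged; both are placed above the interleaved part.
std::uint32_t squareAddress(std::uint32_t row, std::uint32_t col,
                            std::uint32_t size, unsigned n) {
    const std::uint32_t block = 1u << n;
    const std::uint32_t multiple = (size + block - 1u) / block;  // N / 2^n, rounded up
    const std::uint32_t rowHigh = (row >> n) * multiple;
    const std::uint32_t colHigh = col >> n;
    return ((rowHigh + colHigh) << (2 * n))
         | (spreadLow(row & (block - 1u), n) << 1)
         | spreadLow(col & (block - 1u), n);
}
```

For the 17x17 matrix with n = 2, row 4 contributes 0101 0000, column 4 contributes 0001 0000, and element [ 16 , 16 ] lands at 11000 0000 = 384.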
Improvement continues
• Generalized improvement for both square and non-square matrices:
• Each row and column has its respective partially interleaved bit pattern.
• Whichever of row or column is greater will have some non-interleaved bits and some special bit patterns.
• Different lookup tables for rows and columns are required to implement this.
Improvement continues
• Let’s consider an RxC matrix with n interleaved bits; then r = R/2^n and c = C/2^n.
• If r>c, the row will have i regular non-interleaved bits and some special bit patterns that are multiples of j, or vice versa.
• If r>c:

    For Row:     [ Multiple of j bit pattern ] [ i regular remaining bits ] [ n interleaved bits ]
    For Column:  [ Remaining bits            ] [ <not used>               ] [ n interleaved bits ]
Improvement continues
• For r>c:
    i is the value for which abs(r - c x 2^i) is the least, where i = 1, 2, 3, ...
    j = MAX(r/2^i, c)
• For c>r:
    i is the value for which abs(c - r x 2^i) is the least, where i = 1, 2, 3, ...
    j = MAX(r, c/2^i)
Improvement continues
• For example, consider a 70x13 matrix with n=2 interleaved bits (actually 910 spaces required).
• Space required by normal Morton layout will be
    10000000100010 + 00000001010000 = 10000001110010 = 8306 + 1 = 8307
• Here, R=70, C=13, r = 70/2^2 and c = 13/2^2 (rounded up: r = 18, c = 4). We have r>c:

    When i=1, abs(r - c x 2^1) = 10
    When i=2, abs(r - c x 2^2) = 2
    When i=3, abs(r - c x 2^3) = 14

  So i=2 (only used by the row in this case) and j = MAX(r/2^2, c) = 5
Improvement continues

    Index   Row lookup table   Col lookup table
    0       00000 00 0000      00000 00 0000
    1       00000 00 0010      00000 00 0001
    2       00000 00 1000      00000 00 0100
    3       00000 00 1010      00000 00 0101
    4       00000 01 0000      00001 00 0000
    5       00000 01 0010      00001 00 0001
    ...     ...                ...
    15      00000 11 1010
    16      00101 00 0000
    ...     ...

  (Leading row field changes by 5 = 101. The middle field is only used by the row, because row > col.)
Improvement continues
• For the 70x13 matrix:
• 69 from the row lookup table will be 10100 01 0010
• 12 from the col lookup table will be 00011 00 0000
• Total space required will be:

      10100 01 0010
    + 00011 00 0000
      10111 01 0010 -> 1490 + 1 = 1491 spaces   Improved!!!
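The generalized scheme for the r>c case can be sketched as below. All names (`GeneralLayout`, `chooseLayout`, `generalAddress`) are hypothetical, rounding the divisions up is an assumption inferred from the 70x13 example, and the c>r case is omitted.

```cpp
#include <algorithm>
#include <cstdint>

struct GeneralLayout {
    unsigned n;       // interleaved bits
    unsigned i;       // regular (non-interleaved) row bits
    std::uint32_t j;  // multiplier for the remaining row bits
};

std::uint32_t ceilDiv(std::uint32_t a, std::uint32_t b) { return (a + b - 1u) / b; }

// Pick i minimizing abs(r - c * 2^i), then j = MAX(r / 2^i, c).
// Assumes rows/2^n > cols/2^n and small dimensions (no overflow in c << i).
GeneralLayout chooseLayout(std::uint32_t rows, std::uint32_t cols, unsigned n) {
    const std::uint32_t r = ceilDiv(rows, 1u << n);
    const std::uint32_t c = ceilDiv(cols, 1u << n);
    unsigned bestI = 1;
    std::uint32_t bestDiff = (r > (c << 1)) ? r - (c << 1) : (c << 1) - r;
    for (unsigned i = 2; i < 16; ++i) {
        const std::uint32_t scaled = c << i;
        const std::uint32_t diff = (r > scaled) ? r - scaled : scaled - r;
        if (diff < bestDiff) { bestDiff = diff; bestI = i; }
        if (scaled > r) break;  // the difference only grows from here on
    }
    return GeneralLayout{n, bestI, std::max(ceilDiv(r, 1u << bestI), c)};
}

// Move bit k of x to position 2k, for the low n bits.
std::uint32_t zipBits(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned k = 0; k < n; ++k) out |= ((x >> k) & 1u) << (2 * k);
    return out;
}

// Row pattern: [ (row >> (n+i)) * j ][ i regular bits ][ n interleaved bits ]
// Col pattern: [ col >> n           ][ not used       ][ n interleaved bits ]
std::uint32_t generalAddress(std::uint32_t row, std::uint32_t col, const GeneralLayout& g) {
    const std::uint32_t lowMask = (1u << g.n) - 1u;
    const std::uint32_t regular = (row >> g.n) & ((1u << g.i) - 1u);
    const std::uint32_t high = (row >> (g.n + g.i)) * g.j + (col >> g.n);
    return (high << (2 * g.n + g.i)) | (regular << (2 * g.n))
         | (zipBits(row & lowMask, g.n) << 1) | zipBits(col & lowMask, g.n);
}
```

For the 70x13 matrix with n = 2 this selects i = 2 and j = 5, and element [ 69 , 12 ] lands at 10111 01 0010 = 1490.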
Recommendations
• Devise more efficient algorithms to utilize the space wasted by the Morton array layout.
• If an optimal compromise algorithm is devised that works with both non-square and square matrices, it could become a new research paper or graduate research project.
Conclusion
• The Morton array layout, and a variant that reduces the space it wastes, were implemented in C++.
• Improvements to Morton layout for non-square and square matrices were introduced.
• An optimal algorithm is still to be researched.