Transforming Linear Algebra Libraries: From Abstraction to Parallelism Ernie Chan HIPS 2010
Motivation Statically
Outline • Inversion of a Triangular Matrix • Requisite Semantic Information • Static Generation of a Directed Acyclic Graph • Performance • Conclusion
Inversion of a Triangular Matrix • Formal Linear Algebra Methods Environment (FLAME) • High-level abstractions for expressing linear algebra algorithms • Triangular Inversion (Trinv) • $R := U^{-1}$
Inversion of a Triangular Matrix • LAPACK-style Implementation

      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
*        A12 := -inv( A11 ) * A12
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
*        A02 := A02 + A01 * A12
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
*        A01 := A01 * inv( A11 )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
*        A11 := inv( A11 ), unblocked
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO
Inversion of a Triangular Matrix • FLASH • Matrix of matrices
Inversion of a Triangular Matrix

    FLA_Part_2x2( A,    &ATL, &ATR,
                        &ABL, &ABR,     0, 0, FLA_TL );

    while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
    {
      FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                          /* ************* */   /* ******************** */
                                                  &A10, /**/ &A11, &A12,
                             ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                             1, 1, FLA_BR );
      /*-------------------------------------------------------*/
      FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_MINUS_ONE, A11, A12 );

      FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
                  FLA_ONE, A01, A12, FLA_ONE, A02 );

      FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_ONE, A11, A01 );

      FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
      /*-------------------------------------------------------*/
      FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                       A10, A11, /**/ A12,
                              /* ************** */  /* ****************** */
                                &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                                FLA_TL );
    }
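To make the order of the four updates concrete, the following is a minimal Python sketch of the same loop with a block size of one, so that A11 is a scalar and A01, A12 are vectors. The function name `trinv_upper_unblocked` and the list-of-lists storage are assumptions made for illustration; this is not code from FLAME or LAPACK.

```python
def trinv_upper_unblocked(U):
    """Return the inverse of an upper triangular matrix given as a list of lists.

    Hypothetical helper: it follows the Trsm / Gemm / Trsm / Trinv update
    order of the loop above with block size 1, so A11 is the scalar A[j][j].
    """
    n = len(U)
    A = [row[:] for row in U]              # work on a copy of the input
    for j in range(n):
        a11 = A[j][j]
        if a11 == 0:
            raise ZeroDivisionError("singular triangular matrix")
        # Trsm (left):  A12 := -inv(A11) * A12
        for c in range(j + 1, n):
            A[j][c] = -A[j][c] / a11
        # Gemm:         A02 := A02 + A01 * A12
        for r in range(j):
            for c in range(j + 1, n):
                A[r][c] += A[r][j] * A[j][c]
        # Trsm (right): A01 := A01 * inv(A11)
        for r in range(j):
            A[r][j] /= a11
        # Trinv:        A11 := inv(A11)
        A[j][j] = 1.0 / a11
    return A
```

Calling `trinv_upper_unblocked([[2.0, 1.0], [0.0, 4.0]])` walks through two iterations; multiplying the result by the input should give the identity up to rounding.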
Inversion of a Triangular Matrix • Extensible Markup Language (XML)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard>
        <Update>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_LEFT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_MINUS_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="12">A</Parameter>
          </Statement>
          <Statement name="FLA_Gemm">
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="01">A</Parameter>
            <Parameter partition="12">A</Parameter>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="02">A</Parameter>
          </Statement>
          <Statement name="FLA_Trsm">
            <Option type="side">FLA_RIGHT</Option>
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="trans">FLA_NO_TRANSPOSE</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter>FLA_ONE</Parameter>
            <Parameter partition="11">A</Parameter>
            <Parameter partition="01">A</Parameter>
          </Statement>
          <Statement name="FLA_Trinv">
            <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
            <Option type="diag">FLA_NONUNIT_DIAG</Option>
            <Parameter partition="11">A</Parameter>
          </Statement>
        </Update>
      </Loop>
    </Function>
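A description in this form can be read back with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the listing (two of the four statements, most options omitted) and recovers the statement names with their partitioned operands. The helper name `list_statements` and the "A12"-style rendering of a partitioned operand are assumptions for illustration, not part of the original tool chain.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the description above (two of the four statements).
TRINV_XML = """
<Function name="FLA_Trinv" type="blk" variant="3">
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
"""


def list_statements(xml_text):
    """Return [(statement name, [argument, ...]), ...] in program order.

    An operand A with partition="12" is rendered as "A12"; parameters
    without a partition attribute (scalar constants) pass through unchanged.
    """
    root = ET.fromstring(xml_text)
    result = []
    for stmt in root.iter("Statement"):
        args = [p.text + p.get("partition", "") for p in stmt.findall("Parameter")]
        result.append((stmt.get("name"), args))
    return result
```

The output is the same information the comments on the later slides abbreviate, for example `FLA_Trsm` applied to `FLA_MINUS_ONE`, `A11`, `A12`.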
Requisite Semantic Information • Partitioning Scheme • Problem Size* • Updates

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trinv" type="blk" variant="3">
      <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
      <Declaration>
        <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
      </Declaration>
      <Loop>
        <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
        <Update>
          <Statement name="FLA_Trsm">
            <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
          </Statement>
          <Statement name="FLA_Gemm">
            <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
          </Statement>
          <Statement name="FLA_Trsm">
            <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
          </Statement>
          <Statement name="FLA_Trinv">
            <!-- 'Upper', 'Non-unit', A11 -->
          </Statement>
        </Update>
      </Loop>
    </Function>
Requisite Semantic Information • Input and Output Parameters

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Function name="FLA_Trsm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="both">B</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Gemm">
      <Declaration>
        <Operand type="scalar" inout="in">alpha</Operand>
        <Operand type="matrix" inout="in">A</Operand>
        <Operand type="matrix" inout="in">B</Operand>
        <Operand type="scalar" inout="in">beta</Operand>
        <Operand type="matrix" inout="both">C</Operand>
      </Declaration>
    </Function>
    <Function name="FLA_Trinv">
      <Declaration>
        <Operand type="matrix" inout="both">A</Operand>
      </Declaration>
    </Function>
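The `inout` attributes are what turn a call into read and write sets. As a rough sketch, the declarations above can be transcribed into a table and applied to the actual arguments of a statement. The table layout, the name `read_write_sets`, and the choice to ignore scalar operands are assumptions for illustration.

```python
# Hypothetical table transcribed from the declarations above:
# (formal name, operand type, direction) in argument order.
SIGNATURES = {
    "FLA_Trsm": [("alpha", "scalar", "in"), ("A", "matrix", "in"),
                 ("B", "matrix", "both")],
    "FLA_Gemm": [("alpha", "scalar", "in"), ("A", "matrix", "in"),
                 ("B", "matrix", "in"), ("beta", "scalar", "in"),
                 ("C", "matrix", "both")],
    "FLA_Trinv": [("A", "matrix", "both")],
}


def read_write_sets(name, args):
    """Return (reads, writes) for one call, given its actual arguments.

    Scalars are skipped because only matrix blocks carry dependencies here.
    """
    reads, writes = set(), set()
    for (_, kind, inout), actual in zip(SIGNATURES[name], args):
        if kind != "matrix":
            continue
        if inout in ("in", "both"):
            reads.add(actual)
        if inout in ("out", "both"):
            writes.add(actual)
    return reads, writes
```

For the Gemm update, `read_write_sets("FLA_Gemm", ["FLA_ONE", "A01", "A12", "FLA_ONE", "A02"])` reads A01, A12 and A02 and writes only A02.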
Static Generation of a DAG • Code Generation • Convert the XML representation to the FLASH code generation intermediary • Annotated with input and output information • Create a directed acyclic graph (DAG) by statically unrolling the loop • Operations on submatrix blocks (tasks) are vertices • Data dependencies between tasks are edges
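The unrolling step can be sketched in a few lines of Python. The code below walks the Trinv loop over an n x n matrix of blocks, emits one task per block operation with its read and write sets, and then adds an edge between any earlier and later task that touch a common block with at least one write. The names `unroll_trinv` and `build_dag`, the (row, column) block indices, and the all-pairs edge test (which also yields transitively implied edges) are assumptions for illustration rather than the generator described in the talk.

```python
def unroll_trinv(n):
    """Unroll the Trinv loop over an n x n matrix of blocks.

    Returns a list of (name, reads, writes); blocks are (row, col) pairs
    and tasks are numbered in the order they are emitted.
    """
    tasks = []

    def emit(kind, reads, writes):
        tasks.append((f"{kind}{len(tasks)}", set(reads), set(writes)))

    for k in range(n):
        a11 = (k, k)
        for c in range(k + 1, n):            # Trsm: A12 := -inv(A11) * A12
            emit("Trsm", [a11, (k, c)], [(k, c)])
        for r in range(k):                   # Gemm: A02 := A02 + A01 * A12
            for c in range(k + 1, n):
                emit("Gemm", [(r, k), (k, c), (r, c)], [(r, c)])
        for r in range(k):                   # Trsm: A01 := A01 * inv(A11)
            emit("Trsm", [a11, (r, k)], [(r, k)])
        emit("Trinv", [a11], [a11])          # Trinv: A11 := inv(A11)
    return tasks


def build_dag(tasks):
    """Return edges (earlier, later) for every flow, anti or output conflict."""
    edges = set()
    for j, (name_j, reads_j, writes_j) in enumerate(tasks):
        for name_i, reads_i, writes_i in tasks[:j]:
            if (writes_i & reads_j) or (reads_i & writes_j) or (writes_i & writes_j):
                edges.add((name_i, name_j))
    return edges
```

With `n = 3` this numbering yields ten tasks, Trsm0 through Trinv9, the same labels used for the 3x3 example in the deck.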
Static Generation of a DAG • Data Dependencies
• Flow (read-after-write)
    S1: A = B + C;
    S2: D = A + E;
• Anti (write-after-read)
    S3: F = A + G;
    S4: A = H + I;
• Output (write-after-write)
    S5: A = J + K;
    S6: A = L + M;
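The three kinds of dependency reduce to set intersections on the read and write sets of two statements taken in program order. A minimal Python sketch, with the function name `dependencies` and the (reads, writes) tuple encoding chosen here for illustration:

```python
def dependencies(first, second):
    """Classify dependencies from `first` to `second` (program order).

    Each statement is a (reads, writes) pair of sets of variable names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    kinds = set()
    if writes1 & reads2:
        kinds.add("flow")      # read-after-write
    if reads1 & writes2:
        kinds.add("anti")      # write-after-read
    if writes1 & writes2:
        kinds.add("output")    # write-after-write
    return kinds


# The six statements from the slide, encoded as (reads, writes).
S1 = ({"B", "C"}, {"A"})   # A = B + C
S2 = ({"A", "E"}, {"D"})   # D = A + E
S3 = ({"A", "G"}, {"F"})   # F = A + G
S4 = ({"H", "I"}, {"A"})   # A = H + I
S5 = ({"J", "K"}, {"A"})   # A = J + K
S6 = ({"L", "M"}, {"A"})   # A = L + M
```

Applied to the pairs above, `dependencies(S1, S2)`, `dependencies(S3, S4)` and `dependencies(S5, S6)` each report exactly the kind named in the corresponding bullet.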
Static Generation of a DAG • Problem Size • Problem size cannot be determined a priori • Fix the block size or loop unrolling factor • Balance between instruction footprint and data granularity of tasks • Example • Trinv on a 3x3 matrix of blocks
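One way to see the instruction-footprint side of this trade-off is to count how many tasks the unrolled loop produces for an n x n matrix of blocks. The sketch below assumes one task per block touched by each update (the decomposition that gives ten tasks for the 3x3 example); the function names and the closed form $n^2 + n(n-1)(n-2)/6$ are derived here under that assumption, not taken from the talk.

```python
def count_tasks(n):
    """Count tasks produced by unrolling Trinv over an n x n matrix of blocks."""
    total = 0
    for k in range(n):
        total += n - 1 - k          # Trsm on each block of A12
        total += k * (n - 1 - k)    # Gemm on each block of A02
        total += k                  # Trsm on each block of A01
        total += 1                  # Trinv on A11
    return total


def count_tasks_closed_form(n):
    """Same count in closed form: n^2 + n(n-1)(n-2)/6."""
    return n * n + n * (n - 1) * (n - 2) // 6
```

Under this assumption the count grows cubically with n, so halving the block size for a fixed matrix roughly multiplies the number of generated tasks by eight for large n.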
Static Generation of a DAG • Trinv
• Iteration 1: Trsm0, Trsm1, Trinv2
• Iteration 2: Trsm3, Gemm4, Trsm5, Trinv6
• Iteration 3: Trsm7, Trsm8, Trinv9
• Complete DAG: Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9
Performance • LabVIEW • Graphical, data flow programming language (G) • Anti-dependencies cannot exist in G • Copies are made when a wire is split
Performance • Target Architecture • 16-core AMD processor • 4 socket quad-core Opteron • 1.9 GHz • 4 GB of RAM per socket • LabVIEW 8.6 • Windows XP • Basic Linear Algebra Subprograms (BLAS) • MKL 7.2
Performance • Results • Parallelism • Exploit parallelism inherent within DAG • Hierarchical matrix storage • Spatial locality • Overhead • Copy matrix from flat row-major storage to hierarchical matrix and back
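The copy overhead mentioned in the last bullet comes from converting between the two layouts. A minimal Python sketch of that round trip, assuming a square n x n matrix stored as a flat row-major list, a block size nb that divides n, and blocks that are themselves row-major; the function names are hypothetical:

```python
def flat_to_blocks(flat, n, nb):
    """Copy an n x n row-major list into a grid of nb x nb row-major blocks."""
    if n % nb != 0:
        raise ValueError("block size must divide the matrix dimension")
    p = n // nb
    return [[[flat[(bi * nb + i) * n + bj * nb + j]
              for i in range(nb) for j in range(nb)]
             for bj in range(p)]
            for bi in range(p)]


def blocks_to_flat(blocks, n, nb):
    """Copy a grid of nb x nb row-major blocks back into a flat row-major list."""
    flat = [0] * (n * n)
    for bi, block_row in enumerate(blocks):
        for bj, block in enumerate(block_row):
            for i in range(nb):
                for j in range(nb):
                    flat[(bi * nb + i) * n + bj * nb + j] = block[i * nb + j]
    return flat
```

Each block ends up contiguous in memory, which is where the spatial locality comes from, at the cost of touching every element once in each direction.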
Conclusion • Instantiate a linear algebra algorithm using a code generation intermediary • Statically produce a directed acyclic graph by fixing the block size or loop unrolling factor • XML → FLASH → DAG
Acknowledgments • Jim Nagle, Robert van de Geijn • We thank the other members of the FLAME team for their support • Funding • National Instruments • NSF Grants • CCF-0540926 • CCF-0702714
Conclusion • More Information http://www.cs.utexas.edu/~flame • Questions? echan@cs.utexas.edu