Transforming Linear Algebra Libraries: From Abstraction to Parallelism

Ernie Chan


Presentation Transcript


  1. Transforming Linear Algebra Libraries: From Abstraction to Parallelism
     Ernie Chan
     HIPS 2010

  2. Motivation • Statically

  3. Outline
     • Inversion of a Triangular Matrix
     • Requisite Semantic Information
     • Static Generation of a Directed Acyclic Graph
     • Performance
     • Conclusion

  4. Inversion of a Triangular Matrix
     • Formal Linear Algebra Methods Environment (FLAME)
       • High-level abstractions for expressing linear algebra algorithms
     • Triangular inversion (Trinv): $R := U^{-1}$

  5. Inversion of a Triangular Matrix

  6. Inversion of a Triangular Matrix • LAPACK-style Implementation
      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
*        A12 := -inv( A11 ) * A12
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
*        A02 := A02 + A01 * A12
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
*        A01 := A01 * inv( A11 )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
*        A11 := inv( A11 )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO

  7. Inversion of a Triangular Matrix
     • FLASH
       • Matrix of matrices

  8. Inversion of a Triangular Matrix
FLA_Part_2x2( A,    &ATL, &ATR,
                    &ABL, &ABR,     0, 0, FLA_TL );

while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
{
  FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                      /* ************* */   /* ******************** */
                                              &A10, /**/ &A11, &A12,
                         ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                         1, 1, FLA_BR );
  /*------------------------------------------------------------*/
  /* A12 := -inv( A11 ) * A12 */
  FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR, FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_MINUS_ONE, A11, A12 );
  /* A02 := A02 + A01 * A12 */
  FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
              FLA_ONE, A01, A12, FLA_ONE, A02 );
  /* A01 := A01 * inv( A11 ) */
  FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR, FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_ONE, A11, A01 );
  /* A11 := inv( A11 ) */
  FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
  /*------------------------------------------------------------*/
  FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                   A10, A11, /**/ A12,
                          /* ************** */  /* ****************** */
                            &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                            FLA_TL );
}

  9. Inversion of a Triangular Matrix • Extensible Markup Language (XML)
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Gemm">
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Parameter>FLA_ONE</Parameter>

  10. Inversion of a Triangular Matrix • Extensible Markup Language (XML), continued
        <Parameter partition="01">A</Parameter>
        <Parameter partition="12">A</Parameter>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="02">A</Parameter>
      </Statement>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_RIGHT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="01">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>

  11. Outline
     • Inversion of a Triangular Matrix
     • Requisite Semantic Information
     • Static Generation of a Directed Acyclic Graph
     • Performance
     • Conclusion

  12. Requisite Semantic Information • Partitioning Scheme
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
    <Update>
      <Statement name="FLA_Trsm">
        <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
      </Statement>
      <Statement name="FLA_Gemm">
        <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
      </Statement>
      <Statement name="FLA_Trsm">
        <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
      </Statement>
      <Statement name="FLA_Trinv">
        <!-- 'Upper', 'Non-unit', A11 -->
      </Statement>
    </Update>
  </Loop>
</Function>

  13. Requisite Semantic Information • Problem Size*
<Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->

  14. Requisite Semantic Information • Updates
<Update>
  <Statement name="FLA_Trsm">
    <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
  </Statement>
  <Statement name="FLA_Gemm">
    <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
  </Statement>
  <Statement name="FLA_Trsm">
    <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
  </Statement>
  <Statement name="FLA_Trinv">
    <!-- 'Upper', 'Non-unit', A11 -->
  </Statement>
</Update>

  15. Requisite Semantic Information • Input and Output Parameters
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trsm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="both">B</Operand>
  </Declaration>
</Function>
<Function name="FLA_Gemm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="in">B</Operand>
    <Operand type="scalar" inout="in">beta</Operand>
    <Operand type="matrix" inout="both">C</Operand>
  </Declaration>
</Function>
<Function name="FLA_Trinv">
  <Declaration>
    <Operand type="matrix" inout="both">A</Operand>
  </Declaration>
</Function>

  16. Outline
     • Inversion of a Triangular Matrix
     • Requisite Semantic Information
     • Static Generation of a Directed Acyclic Graph
     • Performance
     • Conclusion

  17. Static Generation of a DAG
     • Code generation
       • Convert the XML representation to the FLASH code generation intermediary
       • Annotated with input and output information
     • Create a directed acyclic graph (DAG) by statically unrolling the loop
       • Operations on submatrix blocks (tasks) are vertices
       • Data dependencies between tasks are edges

  18. Static Generation of a DAG
     • Data dependencies
       • Flow (read-after-write):    S1: A = B + C;  S2: D = A + E;
       • Anti (write-after-read):    S3: F = A + G;  S4: A = H + I;
       • Output (write-after-write): S5: A = J + K;  S6: A = L + M;

  19. Static Generation of a DAG

  20. Static Generation of a DAG
     • Problem size
       • The problem size cannot be determined a priori
       • Fix the block size or the loop unrolling factor
       • Balance between instruction footprint and data granularity of tasks
     • Example
       • Trinv on a 3x3 matrix of blocks

  21. Static Generation of a DAG
     • Trinv
       • Iteration 1: Trsm0, Trsm1, Trinv2

  22. Static Generation of a DAG
     • Trinv
       • Iteration 2: Trsm3, Gemm4, Trsm5, Trinv6

  23. Static Generation of a DAG
     • Trinv
       • Iteration 3: Trsm7, Trsm8, Trinv9

  24. Static Generation of a DAG
     • Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9

  25. Outline
     • Inversion of a Triangular Matrix
     • Requisite Semantic Information
     • Static Generation of a Directed Acyclic Graph
     • Performance
     • Conclusion

  26. Performance
     • LabVIEW
       • Graphical, data-flow programming language (G)
       • Anti-dependencies cannot exist in G
       • Copies are made when a wire is split

  27. Performance

  28. Performance
     • Target architecture
       • 16-core AMD system: four-socket, quad-core Opteron
       • 1.9 GHz
       • 4 GB of RAM per socket
     • LabVIEW 8.6
       • Windows XP
     • Basic Linear Algebra Subprograms (BLAS)
       • MKL 7.2

  29. Performance

  30. Performance
     • Results
       • Parallelism
         • Exploit the parallelism inherent in the DAG
       • Hierarchical matrix storage
         • Spatial locality
       • Overhead
         • Copy the matrix from flat row-major storage to the hierarchical matrix and back

  31. Performance

  32. Outline
     • Inversion of a Triangular Matrix
     • Requisite Semantic Information
     • Static Generation of a Directed Acyclic Graph
     • Performance
     • Conclusion

  33. Conclusion
     • Instantiate a linear algebra algorithm using a code generation intermediary
     • Statically produce a directed acyclic graph by fixing the block size or the loop unrolling factor
     XML → FLASH → DAG

  34. Acknowledgments
     • Jim Nagle, Robert van de Geijn
     • We thank the other members of the FLAME team for their support
     • Funding
       • National Instruments
       • NSF grants CCF-0540926 and CCF-0702714

  35. Conclusion
     • More information: http://www.cs.utexas.edu/~flame
     • Questions? echan@cs.utexas.edu
