Scalable Subgraph Mapping for Acyclic Computation Accelerators
Nate Clark, Amir Hormati, Scott Mahlke, Sami Yehia
University of Michigan, ARM, Ltd.
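ASIP (Application-Specific Instruction-set Processor) Architecture
• Tightly integrated, atomic execution
• Examples: MAC, dot-product, Galois-field arithmetic
[Figure: processor pipeline (Fetch, Issue, WB) with the accelerator placed alongside the ALUs]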
What Are We Solving?
[Figure: a small dataflow graph of a multiply (*) and two add/subtract (+/-) operations]
How Do People Solve This Problem?
• Hand coding
• Greedy algorithms
Greedy Algorithms
• Make locally optimal decisions
• Example: the partition problem, i.e., divide numbers into two sets with nearly equal sums
Greedy trace: the input 4, 3, 3, 5, 3 is taken largest first (5, 4, 3, 3, 3), and each number joins the set with the smaller running sum.
Set 1: 5 → 5, 3 (sum 8)
Set 2: 4 → 4, 3 → 4, 3, 3 (sum 10)
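A minimal Python sketch of the greedy rule in the slide's example, assuming numbers are taken largest first and each one goes to the set whose current sum is smaller (ties go to set 1). The function name `greedy_partition` is illustrative only.

```python
def greedy_partition(numbers):
    """Greedy two-way partition (illustrative sketch).

    Numbers are taken largest first; each one joins whichever set
    currently has the smaller sum, with ties going to set 1.
    """
    set1, set2 = [], []
    sum1 = sum2 = 0
    for n in sorted(numbers, reverse=True):
        if sum1 <= sum2:
            set1.append(n)
            sum1 += n
        else:
            set2.append(n)
            sum2 += n
    return set1, set2


s1, s2 = greedy_partition([4, 3, 3, 5, 3])
print(s1, sum(s1), s2, sum(s2))  # prints: [5, 3] 8 [4, 3, 3] 10
```

On the slide's input this reproduces the greedy result: sums of 8 and 10 after five locally optimal choices.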
Greedy Pros / Cons
• Cons: room for improvement
• Pros: fast, easy to implement
Greedy: {5, 3} (sum 8) vs. {4, 3, 3} (sum 10)
Optimal: {5, 4} (sum 9) vs. {3, 3, 3} (sum 9)
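For comparison, the optimal split of an input this small can be found by brute force. The sketch below (hypothetical name `optimal_partition`) tries every subset as set 1 and keeps the split whose sums differ least; its runtime is exponential in the input size, which is the usual reason greedy heuristics are preferred on large instances.

```python
from itertools import combinations


def optimal_partition(numbers):
    """Exact two-way partition by exhaustive search (illustrative sketch).

    Tries every subset of positions as set 1 and keeps the split whose
    sums differ the least. Runtime is exponential in len(numbers).
    """
    total = sum(numbers)
    positions = range(len(numbers))
    best = None  # (difference, set1, set2)
    for k in range(len(numbers) + 1):
        for chosen in combinations(positions, k):
            set1 = [numbers[i] for i in chosen]
            diff = abs(total - 2 * sum(set1))
            if best is None or diff < best[0]:
                picked = set(chosen)
                set2 = [numbers[i] for i in positions if i not in picked]
                best = (diff, set1, set2)
    return best[1], best[2]


s1, s2 = optimal_partition([5, 4, 3, 3, 3])
print(s1, sum(s1), s2, sum(s2))  # prints: [5, 4] 9 [3, 3, 3] 9
```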
Compilation Problems
• Frequently NP-complete: scheduling, superblock selection, allocation
• Greedy algorithms are prevalent
Compilation for Acyclic Accelerators
• Define the target
• Describe the greedy algorithm
• Develop the FEU (Full Enumeration, Unate covering) algorithm
• Compare runtime and quality
Target Accelerator
• Array of functional units (FUs)
• Arithmetic/logic operations
• Sparse interconnect
• Supports 82% of important subgraphs
[Figure: accelerator with four inputs (Input1 to Input4) and two outputs (Output1, Output2)]
Subgraph Mapping
• Select parts of applications to accelerate
[Figure: example dataflow graph with three live-in values, 17 numbered operations (1 SHR, 2 SHL, 3 AND, 4 SHR, 5 AND, 6 SUB, 7 MPY, 8 SHL, 9 ADD, 10 SHR, 11 ADD, 12 SHL, 13 SHL, 14 SHR, 15 SHR, 16 CMP, 17 BEQ), and two live-out values]
Subgraph Mapping: 3 Steps
• Enumerate: find candidates
• Prune: remove invalid candidates
• Selection: pick candidates for the accelerator
[Figure: the 17-operation example dataflow graph]
Greedy Subgraph Mapping
[Figure: greedy mapping of the 17-operation example dataflow graph]
Speedup = 17/7 = 2.4
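The slide does not state its speedup model. One reading consistent with the ratio, assumed here rather than taken from the authors, counts each accelerator invocation and each operation left on the base processor as one issued instruction:

```latex
\text{Speedup} = \frac{N_{\text{ops}}}{N_{\text{sub}} + N_{\text{rest}}},
\qquad \text{greedy example: } \frac{17}{7} \approx 2.4
```

Here $N_{\text{ops}} = 17$ is the number of operations in the graph, $N_{\text{sub}}$ the number of accelerated subgraphs, and $N_{\text{rest}}$ the number of operations left uncovered; how the 7 splits between the last two terms is not recoverable from the slide.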
Greedy Summary
• Enumeration: restricted
• Prune: unnecessary
• Selection: implicit
[Figure: the 17-operation example dataflow graph]
Full Enumeration, Unate Covering (FEU)
• Enumerate: all subgraphs
• Prune: subgraph isomorphism
• Selection: unate covering
• Shrink the search space to control runtime
Full Enumeration
[Figure: the 17-operation example dataflow graph alongside sample enumerated subgraphs built from its operations]
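A minimal sketch of full enumeration, assuming candidates are connected node sets (edges treated as undirected) up to a size cap. The four-node graph and its names are hypothetical, not taken from the slide's figure, and the `max_size` cap is only a simple stand-in for the search-space limits the FEU slide mentions.

```python
def enumerate_connected_subgraphs(nodes, edges, max_size):
    """Return every connected node set with at most max_size nodes.

    Edges are treated as undirected for connectivity. Exhaustive growth
    from every seed node; exponential in the worst case, so max_size acts
    as a simple cap on the search space.
    """
    neighbors = {n: set() for n in nodes}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    found = set()
    frontier = [frozenset([n]) for n in nodes]
    while frontier:
        sub = frontier.pop()
        if sub in found:
            continue
        found.add(sub)
        if len(sub) >= max_size:
            continue
        # Grow by one adjacent node in every possible way.
        adjacent = set().union(*(neighbors[n] for n in sub)) - sub
        for n in adjacent:
            frontier.append(sub | {n})
    return found


# Hypothetical four-op fragment: two shifts feed an AND, which feeds an ADD.
nodes = ["shr1", "shl2", "and3", "add4"]
edges = [("shr1", "and3"), ("shl2", "and3"), ("and3", "add4")]
subs = enumerate_connected_subgraphs(nodes, edges, max_size=4)
print(len(subs))  # prints: 11
```

Even this tiny graph yields 11 candidates; the count grows quickly with graph size, which is why the next two steps prune and select.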
Subgraph Isomorphism Pruning
• Ensure subgraphs can run on the accelerator
[Figure: candidate subgraphs built from operations 3, 6, 8, 10, and 11 matched against the accelerator's functional units (<<, >>, *, +/-, logic), with positions labeled A to H]
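Full subgraph isomorphism against the accelerator's FU layout is beyond a short example. The sketch below is a simplified stand-in filter, not isomorphism: it only checks that every opcode is supported, that the candidate needs at most four inputs and two outputs (the counts labeled in the target accelerator figure), and that it is convex so it can execute atomically. The opcode set, function names, and small graph are hypothetical.

```python
# Hypothetical limits: four inputs and two outputs, as labeled in the
# target accelerator figure; the supported opcode set is an assumption.
MAX_INPUTS = 4
MAX_OUTPUTS = 2
SUPPORTED_OPS = {"ADD", "SUB", "AND", "SHL", "SHR", "MPY"}


def is_convex(sub, edges):
    """True if no path leaves `sub` and later re-enters it."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    # Start from every node just outside the subgraph that it feeds.
    stack = [dst for src, dst in edges if src in sub and dst not in sub]
    seen = set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for m in succ.get(n, []):
            if m in sub:
                return False  # an outside path came back into the subgraph
            stack.append(m)
    return True


def is_legal(sub, opcode, edges, live_outs=frozenset()):
    """Simplified legality filter (NOT full subgraph isomorphism)."""
    if any(opcode[n] not in SUPPORTED_OPS for n in sub):
        return False
    # Inputs: distinct producers outside the subgraph that feed it.
    inputs = {src for src, dst in edges if dst in sub and src not in sub}
    # Outputs: members used outside the subgraph or live out of the block.
    outputs = {src for src, dst in edges if src in sub and dst not in sub}
    outputs |= set(sub) & set(live_outs)
    if len(inputs) > MAX_INPUTS or len(outputs) > MAX_OUTPUTS:
        return False
    return is_convex(sub, edges)


# Hypothetical block: a, b, c, d, e are ops; in1 and in2 are live-in values.
opcode = {"in1": "LIVE_IN", "in2": "LIVE_IN", "a": "SHL", "b": "SHR",
          "c": "AND", "d": "ADD", "e": "CMP"}
edges = [("in1", "a"), ("in2", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("a", "d"), ("d", "e")]
print(is_legal({"a", "b", "c"}, opcode, edges))  # True
print(is_legal({"a", "d"}, opcode, edges))       # False: not convex
print(is_legal({"c", "d", "e"}, opcode, edges))  # False: CMP unsupported
```

The `{"a", "d"}` case shows why convexity matters: the path a → c → d leaves and re-enters the candidate, so it could not execute as one atomic accelerator operation.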
Unate Covering Selection
• Place as many ops in as few subgraphs as possible
[Figure: the 17-operation example dataflow graph before and after selection, with a covering matrix of subgraphs versus ops]
Speedup = 17/5 = 3.4
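A brute-force sketch of minimum-cardinality unate covering: every op must be covered at least once, and the goal is the fewest chosen sets. Singleton sets are added so that ops left on the base processor still cost one instruction each. Because unate covering permits overlapping picks, a practical mapper would also need to rule out overlaps and cyclic dependences between chosen subgraphs; the function name, op numbers, and candidates below are hypothetical.

```python
from itertools import combinations


def min_unate_cover(ops, candidates):
    """Exact minimum-cardinality unate cover by brute force.

    Picks the fewest sets whose union contains every op. Singleton sets
    are added so ops left on the base processor still count as one
    instruction each. Exponential; illustrative only.
    """
    universe = frozenset(ops)
    pool = {frozenset(c) for c in candidates} | {frozenset([op]) for op in ops}
    # Try larger sets first so good covers are found early within each size.
    pool = sorted(pool, key=lambda s: (-len(s), sorted(s)))
    for k in range(1, len(pool) + 1):
        for choice in combinations(pool, k):
            if frozenset().union(*choice) == universe:
                return list(choice)
    return []


# Hypothetical six-op block with four candidate subgraphs.
ops = range(1, 7)
candidates = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {2, 3}]
cover = min_unate_cover(ops, candidates)
print(len(cover), len(ops) / len(cover))  # prints: 2 3.0
```

Here the two three-op candidates cover all six ops, giving 6 / 2 = 3.0 under the one-instruction-per-subgraph assumption.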
FEU Runtime
[Figure: FEU runtime results, annotated with 99.5% and 98%]
Conclusions
• Greedy algorithms leave an opportunity for improvement
• FEU subgraph mapping
  • Better: 50% more speedup than greedy
  • Fast: over 98% of blocks mapped in less than 1 second
Questions?