Multi-core Architecture and System
Advisor: Bo-Cheng C. Lai
Final Project: Static Timing Analysis on GPGPU
Static Timing Analysis
• Given:
  • A netlist and the delay of each gate, where gate delay = #fanin + #fanout
• Evaluate:
  • The critical path delay (the latest arrival time at any primary output)
[Figure: example netlist annotated with gate delays and arrival times]
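A minimal sequential sketch of this evaluation in Python (3.9+ for `graphlib`), using the appendix netlist. It assumes the delay model implied by the appendix example: primary inputs have no fanins, each primary output is modeled as a pseudo-gate `PO_<name>` with one fanin and no fanout, and a gate's arrival time is its delay plus the latest arrival time among its fanins. The truncated operand `net` in the `net18` line is assumed to be `net14`. The dictionary layout and function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory form of the appendix netlist: gate -> fanin gates.
# Assumption: the truncated operand "net" of net18 is net14.
NETLIST = {
    "G1": [], "G2": [], "G3": [], "G4": [], "G5": [],
    "n60": ["G2", "G5"],
    "net14": ["G3", "G4"],
    "net17": ["G1", "G3"],
    "net25": ["net14"],
    "net18": ["net14", "G2"],
    "G17": ["n60", "net25"],
    "G16": ["net17", "net18"],
    # Assumption: primary outputs are pseudo-gates with one fanin, no fanout.
    "PO_G16": ["G16"],
    "PO_G17": ["G17"],
}


def fanout_counts(netlist):
    """Count how many gates read each gate's output."""
    counts = {g: 0 for g in netlist}
    for fanins in netlist.values():
        for src in fanins:
            counts[src] += 1
    return counts


def arrival_times(netlist):
    """Sequential STA: visit every gate after all of its fanins."""
    fanout = fanout_counts(netlist)
    arrival = {}
    # TopologicalSorter expects node -> predecessors, i.e. gate -> fanins.
    for gate in TopologicalSorter(netlist).static_order():
        delay = len(netlist[gate]) + fanout[gate]  # #fanin + #fanout
        latest_input = max((arrival[s] for s in netlist[gate]), default=0)
        arrival[gate] = latest_input + delay
    return arrival


def critical_path_delay(netlist):
    """Latest arrival time over the primary-output pseudo-gates."""
    arrival = arrival_times(netlist)
    return max(t for g, t in arrival.items() if g.startswith("PO_"))


print(critical_path_delay(NETLIST))  # 13
```

This visits gates one at a time; the point of the project is to replace that single ordered pass with something a GPU can run in parallel.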
Parallel STA
• How to parallelize?
  • Levelization (breadth-first search)
  • Gates on the same level can be evaluated at the same time, because all of their fanins lie on lower levels
[Figure: the example netlist partitioned into Level_0 through Level_4]
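The levelization step can be sketched in Python as a breadth-first traversal (Kahn's algorithm), followed by a level-by-level evaluation. This is a minimal sketch, not the course's reference solution: it reuses the appendix netlist under the same assumptions as before (the truncated `net` in the `net18` line is taken to be `net14`, and primary outputs are pseudo-gates `PO_<name>`), and all names are hypothetical. The inner loop marks the work that could be spread across GPU threads.

```python
from collections import deque

# Hypothetical in-memory form of the appendix netlist: gate -> fanin gates.
# Assumptions: the truncated operand "net" of net18 is net14, and primary
# outputs are pseudo-gates PO_<name> with one fanin and no fanout.
NETLIST = {
    "G1": [], "G2": [], "G3": [], "G4": [], "G5": [],
    "n60": ["G2", "G5"], "net14": ["G3", "G4"], "net17": ["G1", "G3"],
    "net25": ["net14"], "net18": ["net14", "G2"],
    "G17": ["n60", "net25"], "G16": ["net17", "net18"],
    "PO_G16": ["G16"], "PO_G17": ["G17"],
}


def levelize(netlist):
    """Breadth-first levelization (Kahn's algorithm).

    A gate's level is one more than the highest level among its fanins, so
    every fanin of a level-k gate sits on a level below k.
    """
    fanouts = {g: [] for g in netlist}
    pending = {}  # fanins of each gate that have not been dequeued yet
    for gate, fanins in netlist.items():
        pending[gate] = len(fanins)
        for src in fanins:
            fanouts[src].append(gate)

    level = {g: 0 for g in netlist if pending[g] == 0}  # primary inputs
    queue = deque(level)
    while queue:
        gate = queue.popleft()
        for dst in fanouts[gate]:
            level[dst] = max(level.get(dst, 0), level[gate] + 1)
            pending[dst] -= 1
            if pending[dst] == 0:  # all fanins seen, so dst's level is final
                queue.append(dst)

    levels = [[] for _ in range(max(level.values()) + 1)]
    for gate, lv in level.items():
        levels[lv].append(gate)
    return levels, fanouts


def evaluate_by_level(netlist):
    """Compute arrival times one level at a time."""
    levels, fanouts = levelize(netlist)
    arrival = {}
    for gates in levels:
        # Gates in one level are independent: each reads only arrival times
        # from earlier levels, so this loop could run one GPU thread per gate.
        for gate in gates:
            delay = len(netlist[gate]) + len(fanouts[gate])  # #fanin + #fanout
            latest = max((arrival[s] for s in netlist[gate]), default=0)
            arrival[gate] = latest + delay
    return levels, arrival


levels, arrival = evaluate_by_level(NETLIST)
for k, gates in enumerate(levels):
    print(f"Level_{k}: {gates}")
```

Under these assumptions the appendix circuit splits into five levels (Level_0 to Level_4), with the two primary-output pseudo-gates on the last one. The levels must still be processed in order, so on a GPU one natural (but not the only) mapping is one kernel launch or synchronization step per level.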
Target Platform
• General-purpose GPU (GPGPU)
  • Nvidia’s “Fermi”
  • 448 homogeneous cores
  • Cache system: L1 and L2 caches
Target Platform (cont.)
• Relative latency
  • Register: ~1 cycle
  • Shared memory / L1 cache: ~1 cycle
  • L2 cache: ~10 cycles
  • DRAM (global memory): ~100 cycles
• The L1 caches are not kept coherent; the L2 cache is shared by all cores
Grading Policy
• Correctness: 70%
• Performance: 20%
  • Runtime: 20% (kernel only)
  • Time spent on parsing and memcpy is not counted
• Report: 10% (send it to the TA by 1/10)
  • Introduction (your methodology)
  • Challenges
  • Solutions
  • Results
Checkpoints
• 2010/11/1: Self-study of the training resources
• 2010/11/1: Netlist parser
• 2010/11/15: Compacted linked list
• 2010/11/22: One additional practice exercise (as homework)
• 2010/11/29: Levelization
• 2010/12/6: Verify the netlist on the GPGPU
• 2010/12/20: Status check (brief presentation)
• 2011/1/10: Final check
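The slides do not define the "compacted linked list". One common reading, assumed here, is a CSR-style layout: the fanin lists of all gates are concatenated into one flat integer array, and a per-gate offset array marks where each gate's list starts. Unlike pointer-based linked lists, two flat arrays like these can each be copied to the GPU with a single memcpy. This Python sketch uses a few hypothetical gates from the appendix netlist; the function names are illustrative.

```python
# Hypothetical example: a few gates from the appendix netlist, in file order.
NAMES = ["G1", "G2", "G3", "G4", "G5", "n60", "net14", "net17"]
FANINS = {"n60": ["G2", "G5"], "net14": ["G3", "G4"], "net17": ["G1", "G3"]}


def compact_fanins(names, fanins):
    """Flatten per-gate fanin lists into two integer arrays (CSR layout).

    Gate i's fanin IDs are flat[offsets[i]:offsets[i + 1]]; gates missing
    from `fanins` (e.g. primary inputs) get an empty range.
    """
    index = {name: i for i, name in enumerate(names)}
    offsets = [0]
    flat = []
    for name in names:
        flat.extend(index[src] for src in fanins.get(name, []))
        offsets.append(len(flat))
    return offsets, flat


def fanins_of(offsets, flat, i):
    """Read back gate i's fanin IDs, as a GPU thread for gate i would."""
    return flat[offsets[i]:offsets[i + 1]]


offsets, flat = compact_fanins(NAMES, FANINS)
print(offsets)  # [0, 0, 0, 0, 0, 0, 2, 4, 6]
print(flat)     # [1, 4, 2, 3, 0, 2]
```

The same two-array layout can hold fanout lists, which a levelization or delay computation on the GPU would also need.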
Appendix. Input Format
INPUT(G1)
INPUT(G2)
INPUT(G3)
INPUT(G4)
INPUT(G5)
OUTPUT(G16)
OUTPUT(G17)
n60 = NOR(G2, G5)
net14 = NAND(G3, G4)
net17 = NAND(G1, G3)
net25 = NOT(net14)
net18 = NAND(net, G2)
G17 = NOR(n60, net25)
G16 = NAND(net17, net18)
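A minimal parser sketch for this format in Python, applied to the example above. It assumes one statement per line and gate statements of the form `name = TYPE(a, b, ...)`; skipping blank lines and `#` comments is an extra assumption that the format description does not state. The regular expressions and function name are hypothetical.

```python
import re

# Patterns for the statement kinds in the input-format example.
IO_RE = re.compile(r"(INPUT|OUTPUT)\s*\(\s*(\w+)\s*\)$")
GATE_RE = re.compile(r"(\w+)\s*=\s*(\w+)\s*\((.*)\)$")

SAMPLE = """\
INPUT(G1)
INPUT(G2)
INPUT(G3)
INPUT(G4)
INPUT(G5)
OUTPUT(G16)
OUTPUT(G17)
n60 = NOR(G2, G5)
net14 = NAND(G3, G4)
net17 = NAND(G1, G3)
net25 = NOT(net14)
net18 = NAND(net, G2)
G17 = NOR(n60, net25)
G16 = NAND(net17, net18)
"""


def parse_netlist(text):
    """Return (inputs, outputs, gates); gates maps name -> (type, fanins)."""
    inputs, outputs, gates = [], [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments may appear
        m = IO_RE.match(line)
        if m:
            (inputs if m.group(1) == "INPUT" else outputs).append(m.group(2))
            continue
        m = GATE_RE.match(line)
        if not m:
            raise ValueError(f"unrecognized line: {line!r}")
        name, gate_type, args = m.groups()
        fanins = [a.strip() for a in args.split(",") if a.strip()]
        gates[name] = (gate_type, fanins)
    return inputs, outputs, gates


inputs, outputs, gates = parse_netlist(SAMPLE)
print(inputs, outputs)  # ['G1', 'G2', 'G3', 'G4', 'G5'] ['G16', 'G17']
```

The gate type is kept even though the delay model only needs fanin and fanout counts, since it costs nothing and may help when verifying the parsed netlist.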
Appendix. Output Format
Critical Path Delay
Gate_0 arrival time
Gate_1 arrival time
...
Gate_N arrival time
Example (for the input above):
13
G1 1
G2 2
G3 1
G4 2
G5 1
PO_G16 13
PO_G17 12
n60 5
net14 6
net17 5
net25 8
net18 9
G17 11
G16 12
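A small Python sketch of a writer for this format. It assumes, as in the example, that primary outputs appear as `PO_<name>` entries and that the critical path delay is the latest primary-output arrival time; the function name is hypothetical, and the data is copied from the example output in the same order.

```python
def format_report(arrivals):
    """Render the output format: the critical path delay on the first line,
    then one "<gate> <arrival time>" line per gate, in the given order.

    Assumes primary outputs are named PO_<name> and that the critical path
    delay is the latest arrival time among them.
    """
    critical = max(t for name, t in arrivals if name.startswith("PO_"))
    lines = [str(critical)]
    lines += [f"{name} {t}" for name, t in arrivals]
    return "\n".join(lines) + "\n"


# Arrival times copied from the example output, in the same order.
EXAMPLE = [("G1", 1), ("G2", 2), ("G3", 1), ("G4", 2), ("G5", 1),
           ("PO_G16", 13), ("PO_G17", 12), ("n60", 5), ("net14", 6),
           ("net17", 5), ("net25", 8), ("net18", 9), ("G17", 11), ("G16", 12)]

print(format_report(EXAMPLE), end="")
```

Taking an ordered list rather than a dictionary keeps the gate order under the caller's control, since the format fixes only the first line.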
References
• Nvidia’s website
  • “CUDA C Programming Guide”
  • “CUDA Reference Manual”
  • “Visual Profiler User Guide”
  • “Fermi Tuning Guide”
  • “GPU Computing SDK code samples”
• Books
  • “CUDA by Example: An Introduction to General-Purpose GPU Programming”
  • “Programming Massively Parallel Processors: A Hands-on Approach” (advanced)