
Static Timing Analysis on GPGPU Architecture

This final project parallelizes static timing analysis (STA) on a GPGPU: given a netlist and gate delays, compute the critical path delay using the many cores of Nvidia's Fermi architecture. The slides also cover the grading policy, checkpoints, and input/output formats.

rolandf

Presentation Transcript


  1. Multi-core Architecture and System • Advisor: Bo-Cheng C. Lai • Final Project: Static Timing Analysis on GPGPU

  2. Static Timing Analysis • Given: a netlist and gate delays (gate delay = #Fanin + #Fanout) • Evaluate: the critical path delay [Figure: example circuit annotated with numeric values]
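To make the delay model on this slide concrete, here is a minimal sequential sketch in Python. It assumes that a gate's delay is its fan-in count plus its fan-out count, that a gate's arrival time is the latest arrival among its inputs plus its own delay, and that the critical path delay is the largest arrival time. The netlist and the names `NETLIST` and `arrival_times` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical netlist: gate name -> list of fan-in names.
# Primary inputs have an empty fan-in list.
NETLIST = {
    "a": [], "b": [], "c": [],
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "n3": ["n1", "n2"],
}

def arrival_times(netlist):
    """Return {gate: arrival time}, assuming delay = #fanin + #fanout."""
    # Count how many gates each signal drives.
    fanout = defaultdict(int)
    for inputs in netlist.values():
        for src in inputs:
            fanout[src] += 1

    arrival = {}

    def visit(gate):
        # Memoized depth-first evaluation; fine for small circuits only,
        # since deep circuits would hit Python's recursion limit.
        if gate not in arrival:
            inputs = netlist[gate]
            delay = len(inputs) + fanout[gate]
            latest_input = max((visit(src) for src in inputs), default=0)
            arrival[gate] = latest_input + delay
        return arrival[gate]

    for gate in netlist:
        visit(gate)
    return arrival

arrival = arrival_times(NETLIST)
critical_path_delay = max(arrival.values())
print(critical_path_delay)  # prints 7
```

In this example `b` drives two gates, so its delay is 2, while `n1` and `n2` each have delay 3 (two inputs, one output). A sequential version like this is mainly useful as a reference for checking a GPU implementation.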

  3. Parallel STA • How to parallelize? • Levelization (Breadth-First Search) • Gates on the same level can be evaluated at the same time [Figure: example circuit with gates grouped into Level_0 through Level_4]
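The levelization step can be sketched as a breadth-first traversal from the primary inputs. The Python sketch below is one possible CPU-side formulation, not the course's reference solution: it assumes a gate's level is one more than the highest level among its inputs, and the netlist and the name `levelize` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical netlist: gate name -> list of fan-in names.
NETLIST = {
    "a": [], "b": [], "c": [],
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "n3": ["n1", "n2"],
    "n4": ["n3", "c"],
}

def levelize(netlist):
    """Group gates into levels; gates in one level do not depend on each other."""
    fanouts = defaultdict(list)   # signal -> gates it drives
    pending = {}                  # gate -> number of inputs not yet levelized
    for gate, inputs in netlist.items():
        pending[gate] = len(inputs)
        for src in inputs:
            fanouts[src].append(gate)

    # Primary inputs form level 0 and seed the BFS queue.
    level = {gate: 0 for gate, count in pending.items() if count == 0}
    queue = deque(level)
    while queue:
        gate = queue.popleft()
        for dst in fanouts[gate]:
            level[dst] = max(level.get(dst, 0), level[gate] + 1)
            pending[dst] -= 1
            if pending[dst] == 0:   # all inputs known, so the level is final
                queue.append(dst)

    buckets = defaultdict(list)
    for gate, lvl in level.items():
        buckets[lvl].append(gate)
    return [buckets[lvl] for lvl in sorted(buckets)]

levels = levelize(NETLIST)
for index, gates in enumerate(levels):
    print(f"Level_{index}: {gates}")
```

Each inner list is a batch of independent gates, so a GPU version could plausibly process one level per kernel launch (or per synchronization step) with one thread per gate.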

  4. Target Platform • General Purpose GPU (GPGPU) • Nvidia’s “Fermi” • 448 homogeneous cores • Cache system: L1 and L2 cache

  5. Target Platform (cont.) • Relative latency: • Register: ~1 cycle • Shared memory / L1 cache: ~1 cycle • L2 cache: ~10 cycles • DRAM (global): ~100 cycles [Figure labels: No coherence; Shared]

  6. Grading Policy • Correctness: 70% • Performance: 20% (runtime of the kernel only; time spent on “Parse” and “Memcpy” is not counted) • Report: 10% (send it to the TA by 1/10): Introduction (your methodology), Challenges, Solutions, Results

  7. Check Points • 2010/11/1: Study the training resources on your own • 2010/11/1: Netlist parser • 2010/11/15: Compacted linked list • 2010/11/22: One additional practice exercise (as homework) • 2010/11/29: Levelization • 2010/12/6: Verify the netlist on the GPGPU • 2010/12/20: Status check (brief presentation) • 2011/1/10: Final check

  8. Appendix. Input Format
     INPUT(G1)
     INPUT(G2)
     INPUT(G3)
     INPUT(G4)
     INPUT(G5)
     OUTPUT(G16)
     OUTPUT(G17)
     n60 = NOR(G2, G5)
     net14 = NAND(G3, G4)
     net17 = NAND(G1, G3)
     net25 = NOT(net14)
     net18 = NAND(net, G2)
     G17 = NOR(n60, net25)
     G16 = NAND(net17, net18)
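A parser for this format can be quite small. The sketch below is a hedged illustration in Python rather than the parser expected for the project: it assumes one statement per line in the three forms shown (`INPUT(x)`, `OUTPUT(x)`, and `name = TYPE(arg, ...)`), and it skips blank lines and lines starting with `#`, which is an assumption the slides do not state. The sample text and the name `parse_netlist` are hypothetical.

```python
import re

# Hypothetical input in the same syntax as the appendix.
SAMPLE = """\
INPUT(G1)
INPUT(G2)
OUTPUT(G4)
n1 = NAND(G1, G2)
G4 = NOT(n1)
"""

DECL = re.compile(r"^(INPUT|OUTPUT)\((\w+)\)$")
GATE = re.compile(r"^(\w+)\s*=\s*(\w+)\(([^)]*)\)$")

def parse_netlist(text):
    """Return (inputs, outputs, gates) where gates maps name -> (type, fan-in list)."""
    inputs, outputs, gates = [], [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # assumed comment syntax
            continue
        match = DECL.match(line)
        if match:
            target = inputs if match.group(1) == "INPUT" else outputs
            target.append(match.group(2))
            continue
        match = GATE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        name, kind, args = match.groups()
        gates[name] = (kind, [arg.strip() for arg in args.split(",")])
    return inputs, outputs, gates

inputs, outputs, gates = parse_netlist(SAMPLE)
print(inputs, outputs, gates)
```

The dictionary form is convenient on the host; before copying to the GPU it would typically be flattened into index arrays, since device code cannot follow Python-style references.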

  9. Appendix. Output Format
     Format:
     Critical Path Delay
     Gate_0 arrival time
     Gate_1 arrival time
     Gate_2 arrival time
     Gate_3 arrival time
     Gate_4 arrival time
     Gate_5 arrival time
     ...
     Gate_N arrival time
     Example:
     13
     G1 1
     G2 2
     G3 1
     G4 2
     G5 1
     PO_G16 13
     PO_G17 12
     n60 5
     net14 6
     net17 5
     net25 8
     net18 9
     G17 11
     G16 12
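Writing this report is straightforward once the arrival times are known. The Python sketch below assumes the layout shown above: the critical path delay on the first line, then one `name arrival` pair per line. The function name `format_report` and the arrival values are made up for illustration; the `PO_` prefix for primary outputs follows the example.

```python
def format_report(critical_path_delay, arrival):
    """Build the report text: delay first, then one 'name time' line per entry."""
    lines = [str(critical_path_delay)]
    lines += [f"{name} {time}" for name, time in arrival.items()]
    return "\n".join(lines)

# Illustrative arrival times for a tiny hypothetical circuit.
arrival = {"G1": 1, "G2": 1, "PO_G4": 7, "n1": 4, "G4": 6}
report = format_report(max(arrival.values()), arrival)
print(report)
```

The entries are emitted in dictionary insertion order, so the caller controls whether inputs, outputs, or internal gates come first.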

  10. References • Nvidia’s website: “CUDA C Programming Guide”, “CUDA Reference Manual”, “Visual Profiler User Guide”, “Fermi Tuning Guide”, “GPU Computing SDK code samples” • Books: “CUDA by Example: An Introduction to General-Purpose GPU Programming”, “Programming Massively Parallel Processors: A Hands-on Approach” (advanced)
