Accelerating Statistical Static Timing Analysis Using Graphics Processing Units

Kanupriya Gulati and Sunil P. Khatri, Department of ECE, Texas A&M University, College Station, TX. ASPDAC 2009.

Presentation Transcript


  1. Accelerating Statistical Static Timing Analysis Using Graphics Processing Units • Kanupriya Gulati and Sunil P. Khatri • Department of ECE, Texas A&M University, College Station, TX • ASPDAC 2009

  2. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  3. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  4. Preliminaries • Static Timing Analysis • Statistical Static Timing Analysis • Monte Carlo method • Some differences between GPU and CPU

  5. Static Timing Analysis (STA) • At each gate, compute the MAX, over all input pins i, of the SUM of the arrival time at pin i and the pin-to-output rising (or falling) delay from pin i to the output. • Gate delays are either stored in a lookup table (LUT) per gate type or computed from specific equations. • The worst-case delay is used as the representative value.
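
The MAX-of-SUM step on this slide can be sketched in a few lines of Python. This is only an illustration: the `DELAY_LUT` table, the gate name `NAND2`, and all delay values are hypothetical and do not come from the presentation.

```python
# Hypothetical delay LUT: gate type -> pin-to-output delay for each input pin (ns).
DELAY_LUT = {"NAND2": (0.25, 0.5)}

def gate_arrival_time(gate_type, input_arrivals):
    """Arrival time at the gate output: MAX over pins of (arrival + pin delay)."""
    pin_delays = DELAY_LUT[gate_type]
    if len(input_arrivals) != len(pin_delays):
        raise ValueError("one arrival time is required per input pin")
    return max(a + d for a, d in zip(input_arrivals, pin_delays))

# Inputs arrive at 1.0 ns and 1.5 ns (made-up values).
nand_out = gate_arrival_time("NAND2", [1.0, 1.5])
print(nand_out)
```

Here the second pin dominates, since 1.5 + 0.5 is larger than 1.0 + 0.25.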

  6. STA example • We use a 2-input NAND gate as an example.

  7. Pros and Cons of STA • Pros • Can be computed very quickly. • The result is easy to interpret. • Cons • Not very precise. • Hard to account for process variation. • Moreover, variations are now becoming less systematic.

  8. Statistical Static Timing Analysis (SSTA) • Apply probability and statistics to signals, gates, etc. • The basic idea is the same: MAX and SUM. • Need to either generate random samples or operate on probability density functions (PDFs) directly.

  9. Why SSTA? • To deal with variations and to move beyond the limitations of the deterministic nature of traditional STA techniques. • The main idea is to include the effect of variations in order to analyze circuit delay more accurately.

  10. Pros and Cons of SSTA • Pros • Can account for variations. • High accuracy. • Cons • High runtime cost for accurate methods. • Results may differ significantly between methods.

  11. Monte Carlo method • There is no single Monte Carlo method; instead, the term describes a large and widely-used class of approaches. • However, these approaches tend to follow a particular pattern: • Define a domain of possible inputs • Generate inputs randomly from the domain using a certain specified probability distribution • Perform a deterministic computation using the inputs • Aggregate the results of the individual computations into the final result

  12. A simple example for Monte Carlo method • How can we approximate π? • Draw a square on the ground and inscribe a circle in it. • Scatter objects of uniform size uniformly over the square. • Counting the number of objects in the circle and dividing by the total number of objects in the square yields an approximation of π / 4.
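
A minimal Python sketch of this experiment, assuming the square is [-1, 1] x [-1, 1] with the unit circle inscribed in it. The function name `estimate_pi`, the seed, and the sample count are illustrative choices.

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate pi from the fraction of uniform points that land in the circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        # Scatter one point uniformly over the square.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count it if it falls inside the inscribed unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / num_samples approximates pi / 4.
    return 4.0 * inside / num_samples

pi_estimate = estimate_pi(100_000)
print(pi_estimate)
```

The four steps of the general pattern are all visible: the domain is the square, the inputs are uniform points, the deterministic computation is the inside-the-circle test, and the aggregation is the final ratio.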

  13. A simple example for Monte Carlo method (cont.)

  14. A simple example for Monte Carlo method (cont.) • Generally speaking • The more objects (samples), the higher the precision. • The smaller the objects (unit of samples), the higher the precision. • The distribution of the objects (distribution function of the samples) affects the result.

  15. About some differences between GPU and CPU

  16. Abstract comparisons of memory between GPU and CPU (cont.)

  17. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  18. Previous works • Block-based SSTA • Performs statistical MAX and SUM operations while traversing the circuit in a level-wise BFS • Fast but not very accurate • Path-based SSTA • Calculates the delay PDF of each selected path • Can be accurate, but it is hard to decide which paths should be selected
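
To make "statistical MAX and SUM" concrete, here is a hedged Python sketch that assumes independent Gaussian arrival times. SUM is exact under that assumption. For MAX the sketch uses Clark's moment-matching formulas, a common choice that is assumed here for illustration and is not confirmed as the method of the cited works. The max of two Gaussians is not itself Gaussian, which is one reason block-based results are only approximate.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stat_sum(a, b):
    """(mean, std) of A + B for independent Gaussians a = (mu, sigma), b = (mu, sigma)."""
    return a[0] + b[0], math.sqrt(a[1] ** 2 + b[1] ** 2)

def stat_max(a, b):
    """Gaussian approximation of max(A, B) by matching the first two moments (Clark)."""
    theta = math.sqrt(a[1] ** 2 + b[1] ** 2)
    if theta == 0.0:
        return max(a[0], b[0]), 0.0  # both inputs are deterministic
    alpha = (a[0] - b[0]) / theta
    mean = a[0] * normal_cdf(alpha) + b[0] * normal_cdf(-alpha) + theta * normal_pdf(alpha)
    second = ((a[0] ** 2 + a[1] ** 2) * normal_cdf(alpha)
              + (b[0] ** 2 + b[1] ** 2) * normal_cdf(-alpha)
              + (a[0] + b[0]) * theta * normal_pdf(alpha))
    return mean, math.sqrt(max(second - mean ** 2, 0.0))

# Hypothetical arrival times (mean, std) in ns.
print(stat_sum((1.0, 0.1), (0.5, 0.05)))
print(stat_max((1.0, 0.1), (1.2, 0.1)))
```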

  19. Previous works (cont.) • Block-based SSTA approaches such as [14][15][16] are fast but only approximate. • Path-based SSTA approaches such as [17], which propagate Gaussian distributions, are also approximate. • [19][20][21] propose faster algorithms that compute only bounds on the result. • [22][23][24][25] operate directly on PDFs.

  20. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  21. The proposed approach • Monte Carlo based SSTA on the GPU, using the Mersenne Twister pseudo-random number generator and Box-Muller transformations. • Computes gate delays as in the path-based SSTA approach. • Traverses the circuit as in the block-based SSTA approach.

  22. Monte Carlo based SSTA • Generate gate delay samples according to μ and σ. • Perform STA for each set of samples. • Aggregate the results to produce the full circuit delay distribution. • The spirit of the Monte Carlo method: the more samples, the higher the precision.
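
The three steps can be sketched serially in Python as follows. Everything concrete here is an assumption made for illustration: the three-gate `NETLIST`, the μ and σ values, the use of a single delay per gate rather than per pin, and Python's `random.gauss` standing in for the GPU random number pipeline.

```python
import random
import statistics

# Hypothetical netlist in topological order: gate -> (fanins, mu, sigma), delays in ns.
NETLIST = {
    "g1": (["a", "b"], 1.0, 0.1),
    "g2": (["b", "c"], 1.2, 0.1),
    "g3": (["g1", "g2"], 0.8, 0.05),
}
PRIMARY_INPUTS = ["a", "b", "c"]
OUTPUTS = ["g3"]

def sta_once(delays):
    """Deterministic STA for one set of gate delay samples."""
    arrival = {pi: 0.0 for pi in PRIMARY_INPUTS}
    for gate, (fanins, _mu, _sigma) in NETLIST.items():
        # Simplification: one delay per gate instead of one per input pin.
        arrival[gate] = max(arrival[f] for f in fanins) + delays[gate]
    return max(arrival[o] for o in OUTPUTS)

def monte_carlo_ssta(num_samples, seed=0):
    """Sample gate delays from N(mu, sigma), run STA per sample, collect circuit delays."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        delays = {g: rng.gauss(mu, sigma) for g, (_f, mu, sigma) in NETLIST.items()}
        results.append(sta_once(delays))
    return results

circuit_delays = monte_carlo_ssta(10_000)
print(statistics.mean(circuit_delays), statistics.stdev(circuit_delays))
```

The list `circuit_delays` is the empirical circuit delay distribution; a histogram or percentile of it gives the statistical timing result.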

  23. Why Monte Carlo based SSTA on GPU? • Sample parallelism • For a single gate, the generation of samples and the corresponding static timing analysis can be executed in parallel across samples, with no data dependency • Data parallelism • Gates at the same logic level can execute Monte Carlo based SSTA in parallel
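
Data parallelism relies on knowing which gates share a logic level. A minimal levelization sketch in Python is shown here; the netlist and the function name `levelize` are hypothetical, and the sketch only groups gates, it does not launch anything on a GPU.

```python
def levelize(netlist, primary_inputs):
    """Group gates by logic level; gates in one group have no mutual dependency."""
    level = {pi: 0 for pi in primary_inputs}
    remaining = dict(netlist)
    while remaining:
        # A gate is ready once every one of its fanins already has a level.
        ready = [g for g, fanins in remaining.items() if all(f in level for f in fanins)]
        if not ready:
            raise ValueError("netlist has a cycle or an undriven fanin")
        for g in ready:
            level[g] = 1 + max(level[f] for f in remaining[g])
        for g in ready:
            del remaining[g]
    by_level = {}
    for g in netlist:
        by_level.setdefault(level[g], []).append(g)
    return [by_level[lvl] for lvl in sorted(by_level)]

# Hypothetical netlist: gate -> list of fanin names.
example = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g3", "a"]}
levels = levelize(example, ["a", "b", "c"])
print(levels)
```

Each inner list could be processed concurrently, and within each gate every Monte Carlo sample is independent as well, which is the sample parallelism described on the slide.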

  24. Why Monte Carlo based SSTA on GPU? (cont.) • SIMD execution on the GPU • The Mersenne Twister pseudo-random number generator, followed by Box-Muller transformations, can be executed in parallel • Large memory bandwidth of the GPU • Extremely fast lookups • Many GPU threads • STA with many samples can be executed quickly • Memory access latency can be hidden well

  25. Mersenne Twister pseudo-random number algorithm • Developed in 1997 by Makoto Matsumoto and Takuji Nishimura; based on a matrix linear recurrence over the finite binary field F2. • For a k-bit word length, the Mersenne Twister generates numbers with an almost uniform distribution in the range [0, 2^k - 1]. • Long period, efficient use of memory, good distribution properties, and high performance
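
Python's standard `random` module uses the Mersenne Twister as its core generator, so the range property above can be checked on the CPU with a short sketch. The word length of 32 bits, the seed, and the sample count are arbitrary choices; this is not the GPU implementation from the presentation.

```python
import random

K = 32  # assumed word length in bits

def mt_words(count, seed=2009):
    """Draw `count` K-bit words from Python's Mersenne Twister generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(K) for _ in range(count)]

words = mt_words(100_000)

# Every word lies in [0, 2^K - 1].
in_range = all(0 <= w <= 2 ** K - 1 for w in words)

# Scaling by 2^K gives values in [0, 1); their mean should be close to 0.5.
scaled_mean = sum(w / 2 ** K for w in words) / len(words)
print(in_range, scaled_mean)
```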

  26. Box-Muller transformations • A method of generating pairs of independent, standard normally distributed (zero expectation, unit variance) random numbers, given a source of uniformly distributed random numbers. • Transforms uniform samples into N(0,1) samples. • Developed by George Edward Pelham Box and Mervin Edgar Muller in 1958.
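
A minimal Python sketch of the basic (trigonometric) form of the transformation. The helper names and the scaling step that turns N(0,1) samples into gate delays with mean μ and standard deviation σ are illustrative assumptions, and the μ and σ values are made up.

```python
import math
import random

def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two independent N(0,1) values."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

def normal_samples(n, seed=0):
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:n]

def gate_delay_samples(mu, sigma, n, seed=0):
    """Scale N(0,1) samples to N(mu, sigma^2) gate delays."""
    return [mu + sigma * z for z in normal_samples(n, seed)]

delays = gate_delay_samples(1.0, 0.1, 100_000)  # hypothetical mu and sigma in ns
print(sum(delays) / len(delays))
```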

  27. Monte Carlo based SSTA kernel

  28. Example • Suppose a random number sequence: • 0.1 -0.2 0.2 -0.2 0.4 0.1 -0.3 0 0.5 0.1 -0.4 0.2 0.3 -0.2 -0.5 0.3 0.1 0

  29. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  30. Experimental results • NVIDIA GeForce 8800 GTX graphics card • 768MB memory • Some specifications are listed in previous slides • Baseline used for comparison • 3.6GHz CPU with 3GB memory • Linux • Monte Carlo analysis was performed with 64K samples

  31. Experimental results - Some comparisons • Running 16M threads of the SSTA kernel • CPU took 37.158 sec • GPU took 0.115 sec • About 320x faster • Mersenne Twister generator • CPU generates about 2.24*10^7 numbers/sec • GPU generates about 2.33*10^9 numbers/sec • About 100x faster
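
The two speedup figures follow directly from the numbers on this slide. A short Python check, using only the reported values (the variable names are illustrative):

```python
# Values as reported on the slide.
cpu_kernel_seconds = 37.158
gpu_kernel_seconds = 0.115
cpu_numbers_per_second = 2.24e7
gpu_numbers_per_second = 2.33e9

# Speedup of the SSTA kernel: ratio of runtimes.
kernel_speedup = cpu_kernel_seconds / gpu_kernel_seconds

# Speedup of the Mersenne Twister generator: ratio of throughputs.
generator_speedup = gpu_numbers_per_second / cpu_numbers_per_second

print(round(kernel_speedup), round(generator_speedup))
```

The ratios come out at roughly 323 and 104, which the slide rounds to 320x and 100x.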

  32. Experimental results – 30 cases

  33. Outline • Preliminaries • Previous works • The proposed approach • Experimental results • Conclusions

  34. Conclusions • Monte Carlo based SSTA on GPU • Mersenne Twister generator and Box-Muller transformation • Combination of path-based SSTA approach and block-based SSTA approach • No loss of accuracy and ultra fast
