
FAST: Frequency-Aware Static Timing Analysis

Understand the importance of Worst-Case Execution Time (WCET) for real-time systems and the challenges in accurate static timing analysis. Discover the motivation behind FAST and the application of the parametric frequency model for precise performance evaluation.


Presentation Transcript


  1. FAST: Frequency-Aware Static Timing Analysis • By Kiran Seth, Aravindh Anantaraman, Frank Mueller and Eric Rotenberg • Center for Embedded Systems Research, Departments of CS & ECE, North Carolina State University

  2. Real-Time Systems • Tasks have a deadline: must terminate on time • Classification • Hard real-time: missed deadline → catastrophe • Soft real-time: missed deadline → low QoS • Multi-tasking real-time systems require scheduling algorithms • Scheduler ensures task arbitration online • Schedulability test ensures met deadlines (static test) • requires known Worst-Case Execution Time (WCET)

  3. Static Timing Analysis • To schedule tasks in real-time systems, need • Worst-Case Execution Time (WCET) and • Worst-Case Execution Cycles (WCEC) • Experimental WCET → unsafe bounds • Due to input & hardware complexity • Use static timing analysis toolset to obtain safe WCET bounds

  4. Static Instruction Cache Analysis • Work explained in [Mueller RTS-J’00] • Interprocedural data-flow analysis • Predicts each cache reference as one of • always-hit • always-miss • first-hit • first-miss • Each instruction categorized • for each loop level • and function (loop w/ 1 iteration)

  5. Static Data Cache Simulation • For accurate static timing analysis • need data cache analysis • Currently, data cache analysis tool not accurate enough • Too many restrictions, not general enough for real code • Improvements by [Vera RTSS’03] • Solutions • Treat all data accesses as hits… WCET highly underestimated. • Treat all data accesses as misses… WCET highly overestimated. • Assume a cache big enough to fit the entire data set • Assume first-time accesses are misses (cold misses only), otherwise hits • Accurate? Yes. But what if caches are smaller? • No significant impact on this study

  6. Static Timing Analyzer • Path & tree-based approach [Healy IEEE TC’99] • Find nodes in the CFG and derive WCEC for each node • A node is a function or loop • WCET is calculated bottom-up • Standard timing analysis assumptions apply: • No recursion • All loop bounds must be known • No function pointers

  7. Motivation of FAST • Dynamic Voltage Scaling (DVS) scheduling schemes • Change frequency/voltage for system • save power without missing deadlines • Several DVS scheduling schemes available • Good fit for real-time systems • Most real-time systems • have low utilization • are low-power embedded systems • Potential for considerable energy savings with DVS

  8. Problem • Current DVS schemes: • Ignore effects of frequency scaling on WCEC • DVS schemes assume: WCEC constant with frequency • Overestimate WCET at lower frequencies • To demonstrate the problem • WCET of C-Lab benchmark obtained via static timing analysis tool • For frequencies 100 MHz – 1 GHz • Assess observed WCEC & WCET vs. assumption made by DVS schemes

  9. Actual vs. Assumed WCEC for FFT • WCEC changes with frequency modulation • WCEC increases with higher frequency • Constant memory latency: 100 ns

  10. Actual vs. Assumed WCET for FFT • Difference in chosen frequency for DVS w/ WCET = 5 ms • assumed: ~550 MHz • actual: ~150 MHz

  11. Parametric Frequency Model • Problem: • DVS • Considers processor frequency scaling • Ignores effect of frequency scaling on memory accesses • With frequency scaling: • Cycles for processor operations remain constant • Except for memory operations → problem • DVS schemes overestimate the WCET at lower frequencies • Cannot fully utilize available slack • Power savings potential largely wasted

  12. Parametric Frequency Model Solution: • Calculate WCEC • accounting for effects of memory accesses • using the new parametric frequency model • Model: WCEC(f) = i + mN = i + mLf • i: Invariant # of worst-case cycles (for non-memory operations) • m: # of worst-case memory accesses • N: # of cycles per memory access • depends on memory latency L and frequency f: N = Lf
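
The model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FAST tool itself: the function names and the task parameters (`i = 500_000`, `m = 20_000`, a 5 ms deadline, a 1 GHz maximum frequency) are made up for the example; only the formula WCEC(f) = i + mLf and the 100 ns memory latency come from the slides.

```python
def wcec(i, m, latency_s, freq_hz):
    """Worst-case execution cycles at frequency f: i + m * L * f."""
    return i + m * latency_s * freq_hz


def wcet(i, m, latency_s, freq_hz):
    """Worst-case execution time in seconds: WCEC(f) / f."""
    return wcec(i, m, latency_s, freq_hz) / freq_hz


def min_frequency(i, m, latency_s, deadline_s):
    """Lowest f with WCET(f) <= D, from (i + m*L*f)/f <= D  <=>  f >= i/(D - m*L)."""
    slack = deadline_s - m * latency_s
    if slack <= 0:
        raise ValueError("memory time alone exceeds the deadline")
    return i / slack


def min_frequency_constant_wcec(i, m, latency_s, deadline_s, f_max_hz):
    """Frequency chosen when WCEC is (wrongly) frozen at its f_max value."""
    return wcec(i, m, latency_s, f_max_hz) / deadline_s


# Hypothetical task: all numbers below are illustrative assumptions.
I_CYCLES, M_ACCESSES = 500_000, 20_000
LATENCY_S, DEADLINE_S, F_MAX_HZ = 100e-9, 5e-3, 1e9

f_aware = min_frequency(I_CYCLES, M_ACCESSES, LATENCY_S, DEADLINE_S)
f_assumed = min_frequency_constant_wcec(
    I_CYCLES, M_ACCESSES, LATENCY_S, DEADLINE_S, F_MAX_HZ
)

if __name__ == "__main__":
    print(f"frequency-aware choice: {f_aware / 1e6:.1f} MHz")
    print(f"constant-WCEC choice:   {f_assumed / 1e6:.1f} MHz")
```

With these made-up numbers the constant-WCEC assumption asks for 500 MHz, while the frequency-aware model allows roughly 167 MHz. The gap has the same direction as the FFT comparison on slide 10, though the values here are not the FFT benchmark's.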

  13. Using the Parametric Frequency Model • A: add R2, R1, R3 • B: load R4, [M1] • C: add R2, R1, R4 • D: add R2, R1, R5 • Instruction sequence simulated through a simple pipeline • to explain the parametric frequency model • Simple pipeline: • 6 stages • Data & instruction cache • N = 10

  14. Example 0: Cache Hits • Recall: B is load instruction • WCEC = 9 + 0N • Each row represents a pipeline stage. • Time (and cycle count) increases horizontally.

  15. Example 1: Effect of I-cache Miss • WCEC = 9 + 1N • Stall due to I-cache miss is shown • Model accurately captures memory latency, however long

  16. Example 2: Effect of D-cache Miss • Recall: B is load instruction • WCEC = 9 + 1N • Stall due to D-cache miss is shown • Again, model captures memory latency, however long • Notice: during stall cycles, no useful work is done

  17. Example 3: Effect of I- & D-cache Miss • WCEC = 9 + 2N • I-cache miss first, then D-cache miss • Overlap between useful cycles & stall cycles • Also during high-latency execution operations • E.g. floating-point, multiply, … → overlap w/ D-cache miss • Leads to overestimation: rare in practice, WCET still safe
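
The four pipeline examples reduce to the same arithmetic, WCEC = 9 + (number of misses) × N. A small sketch with the slides' values (9 invariant cycles, N = 10) is given below; the dictionary keys and the helper name `example_wcec` are just labels chosen for this illustration.

```python
N = 10            # cycles per memory access in the simple pipeline (slide 13)
BASE_CYCLES = 9   # invariant cycles for the four-instruction sequence A-D


def example_wcec(misses, n=N, base=BASE_CYCLES):
    """WCEC = i + m*N with i = base cycles and m = number of cache misses."""
    return base + misses * n


# Number of worst-case memory accesses (cache misses) in each example.
misses_per_example = {
    "Example 0: cache hits": 0,
    "Example 1: I-cache miss": 1,
    "Example 2: D-cache miss": 1,
    "Example 3: I- & D-cache miss": 2,
}

totals = {name: example_wcec(m) for name, m in misses_per_example.items()}

if __name__ == "__main__":
    for name, cycles in totals.items():
        print(f"{name}: {cycles} cycles")
```

For Example 3 the model charges both misses in full (29 cycles) even where a stall overlaps useful work, which is why the bound can overestimate yet remains safe.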

  18. Experimental Validation • Combine frequency model with our static timing analyzer • FAST tool • WCEC → FAST equations • Experiment to validate results from FAST tool • Run benchmarks through FAST tool • An equation representing WCEC for benchmark obtained • Run same benchmarks through traditional timing analysis tool • Vary frequencies: 100 MHz – 1 GHz

  19. Frequency-Aware Static Timing Analysis (FAST) • FAST tool is “as accurate” as traditional static timing analysis • Slight overestimation in case of floating-point benchmarks

  20. FAST in EDF Scheduling with DVS • DVS with EDF: Σ C_k/P_k ≤ α, where α = f_c/f_m • FAST with EDF: Σ (i_k + m_k L α f_m) / (P_k f_m) ≤ α • Schedulability test: Σ (i_k/P_k) / [f_m (1 - L Σ m_k/P_k)] ≤ α • Implemented frequency model for 3 EDF-DVS algorithms • Algorithms by [Pillai & Shin] • Look-ahead improved: • @ completion, consider next deadline • up to 34% additional energy savings (5-11% on avg.) at low utilization (U) • but 0.5-8% less savings at high utilization
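
The step from the FAST condition to the schedulability test is a short rearrangement. The sketch below assumes the FAST-with-EDF condition reads as in its first line, with α = f_c/f_m the ratio of current to maximum frequency, and that 1 - L Σ m_k/P_k > 0 (memory time alone does not saturate the processor):

```latex
\begin{align*}
\sum_k \frac{i_k + m_k L \alpha f_m}{P_k f_m} &\le \alpha \\
\frac{1}{f_m}\sum_k \frac{i_k}{P_k} + \alpha L \sum_k \frac{m_k}{P_k} &\le \alpha \\
\frac{1}{f_m}\sum_k \frac{i_k}{P_k} &\le \alpha\left(1 - L\sum_k \frac{m_k}{P_k}\right) \\
\frac{\sum_k i_k/P_k}{f_m\left(1 - L\sum_k m_k/P_k\right)} &\le \alpha
\end{align*}
```

The left-hand side of the last line depends only on task parameters, so it gives the smallest admissible scaling factor α directly.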

  21. Improving DVS Schemes • Use parametric frequency model to improve DVS schemes • provide accurate WCET • Improved energy savings • Architectural simulator: SimpleScalar + Wattch [Brooks ISCA’00] • 6-stage simple in-order pipeline processor model • I-cache and D-cache (8 KB each) • Run 4-8 tasks simultaneously (scheduler runs as its own task) • More accurate than E ~ V²f model? • Results newer than paper

  22. Static RT-DVS vs. FAST Static RT-DVS • Base case: EDF • Tasks at 1 GHz • Idle: 100 MHz • no sleep mode; small task periods • tasksets • 1: integer • 2: float • 3: mix • Static scheme better than base EDF → 12-60% energy savings • FAST-Static even better → 40-78% savings • high + lower utilization

  23. Cycle-conserving RT-DVS vs. FAST Cycle-conserving RT-DVS • dynamic scheduling → early completion, reclaimed as slack • Cycle-conserving → 57-72% energy savings • FAST → 71-80% savings

  24. Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS • most aggressive DVS: early completion + max. deferral • Look-ahead: slightly higher savings than cycle-conserving @ 68-80% • FAST: slightly better in most cases @ 72-83%

  25. Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS • E ~ V²f model • Higher savings: up to 96%? • Ratio look-ahead / FAST similar • Wattch detailed power model • Probably more accurate

  26. Conclusion • Energy savings in real-time systems can be significantly improved by considering the effects of frequency scaling on WCET • FAST + Static RT-DVS • as good as Look-ahead RT-DVS • less overhead • The parametric frequency model can easily track effects of frequency scaling on WCET • FAST tool works best with: • Many cache misses • If D-cache analysis is highly inaccurate (usually true) • FAST can make up for it • High memory latency • Insufficient dynamic slack reclaiming (during DVS scheduling) • Integrated into real-time hardware support [VISA ISCA’03]

  27. BACKUP SLIDES

  28. The V²f Model

  29. Old DVS Scheduling Simulator • Event-based simulator of scheduler. • Have to assume a miss rate for the tasks in dynamic schemes. • Uses E ~ V²f energy model. • Gives a good idea about savings, but how accurate?

  30. Static RT-DVS vs. FAST Static RT-DVS

  31. Cycle-conserving RT-DVS vs. FAST Cycle-conserving RT-DVS

  32. Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS

  33. DVS schemes (Pillai & Shin) • Static RT-DVS – Uses static slack available in the schedule. • Cycle-conserving RT-DVS – Uses static slack + dynamic slack due to early completion. • Look-ahead RT-DVS – Uses static slack + dynamic slack due to early completion + latest possible scheduling (look-ahead).

  34. Complexity • Original EDF test → O(n) • Modified EDF test → still O(n)
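
To see why the modified test stays O(n), note that it needs only two running sums over the task set. The sketch below is an illustration under assumptions, not the authors' implementation: it takes the frequency-aware test in the form Σ (i_k/P_k) / [f_m (1 - L Σ m_k/P_k)] ≤ α, and the task tuples, function names, and numeric values are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical task description: (i_k cycles, m_k memory accesses, P_k seconds).
Task = Tuple[float, float, float]


def min_alpha(tasks: Iterable[Task], latency_s: float, f_max_hz: float) -> float:
    """Smallest scaling factor alpha = f_c/f_m passing the frequency-aware EDF test.

    One pass over the tasks, two accumulators: O(n) time, O(1) extra space.
    """
    invariant_rate = 0.0  # sum of i_k / P_k (cycles per second)
    memory_rate = 0.0     # sum of m_k / P_k (accesses per second)
    for i_k, m_k, p_k in tasks:
        invariant_rate += i_k / p_k
        memory_rate += m_k / p_k
    denom = f_max_hz * (1.0 - latency_s * memory_rate)
    if denom <= 0:
        return float("inf")  # memory stalls alone already exceed the periods
    return invariant_rate / denom


def constant_wcec_alpha(tasks: Iterable[Task], latency_s: float, f_max_hz: float) -> float:
    """Utilization with C_k fixed at its f_max value (frequency-oblivious test)."""
    return sum((i_k + m_k * latency_s * f_max_hz) / f_max_hz / p_k
               for i_k, m_k, p_k in tasks)


def schedulable(tasks, latency_s, f_max_hz, alpha) -> bool:
    """True if the task set passes the frequency-aware test at scaling factor alpha."""
    return min_alpha(tasks, latency_s, f_max_hz) <= alpha


# Illustrative task set (values are assumptions, not from the slides).
TASKS = [(500_000, 20_000, 0.010), (1_000_000, 10_000, 0.020)]

if __name__ == "__main__":
    print(f"frequency-aware alpha: {min_alpha(TASKS, 100e-9, 1e9):.4f}")
    print(f"constant-WCEC alpha:   {constant_wcec_alpha(TASKS, 100e-9, 1e9):.4f}")
```

For this made-up task set the frequency-aware test admits a scaling factor of about 0.13, while the constant-WCEC test requires 0.35. Both are computed in a single pass over the tasks.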
