A Unified WCET Analysis Framework for Multi-core Platforms. Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury (National University of Singapore); Timon Kelter, Peter Marwedel (TU Dortmund, Germany); Heiko Falk (Ulm University, Germany). RTAS 2012, Beijing
Timing Analysis • Hard real-time systems require absolute timing guarantees • System-level analysis • Single-task analysis • Worst-case execution time (WCET) analysis • An upper bound on execution time for all possible inputs • A sound over-approximation is obtained by static analysis
WCET Analysis [Figure: analysis flow with the components program, control flow graph, micro-architectural modeling, WCET of basic blocks, infeasible path constraints, loop bound constraints, and path analysis by IPET.] IPET = Implicit Path Enumeration Technique
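For readers who have not seen IPET written out, a common textbook formulation is sketched below. The notation is introduced here and does not come from the slides: $x_B$ is the execution count of basic block $B$, $c_B$ its WCET from micro-architectural modeling, $d_e$ the execution count of CFG edge $e$, and $N_\ell$ the bound of loop $\ell$ whose first body block is $b_\ell$. The entry block is assumed to have one virtual incoming edge and the exit block one virtual outgoing edge, each with count 1.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{B} c_B \, x_B \\
\text{subject to}\quad
& x_B \;=\; \sum_{e \in \mathrm{in}(B)} d_e \;=\; \sum_{e \in \mathrm{out}(B)} d_e
  && \text{for every basic block } B, \\
& x_{b_\ell} \;\le\; N_\ell \cdot \sum_{e \in \mathrm{entry}(\ell)} d_e
  && \text{for every loop } \ell, \\
& x_B,\; d_e \;\in\; \mathbb{Z}_{\ge 0}.
\end{align*}
```

Infeasible path constraints would enter as further linear inequalities over the same variables.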
Architecture [Figure: cores 1 to n, each with its own L1 cache, connected through a shared bus to a shared L2 cache and memory.]
Micro-architectural Modeling [Figure: interactions between single-core components (branch predictor, cache, pipeline) and multi-core components (shared cache, shared bus); prior work cited in the figure: Li et al. RTSS'09, Chattopadhyay et al. SCOPES'10, Kelter et al. ECRTS'11, Rosen et al. RTSS'07.] Goal: unified multi-core timing analysis
Timing Anomaly (Shared Cache) [Figure: alternative sequences of cache hits and misses along execution paths.] Annotation: may not be the worst-case path
Timing Anomaly (Shared Bus) [Figure: alternative sequences of minimum and maximum bus delays (delaymin, delaymax) along execution paths.] Annotation: may not be the worst-case path
Background • Each pipeline stage is represented as a timing interval, e.g. start [1,3], latency [3,7], finish [4,10] [Figure: pipeline stages IF, ID, EX, WB, CM of the instructions R1 := R2 + 5, R5 := R1 * R7 and R3 := R5 * 5, with edges marking structural dependency and contention.] • A fixed-point analysis derives the timing of each stage as an interval
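The interval representation can be illustrated with a few lines of Python. This is a minimal sketch, not the analysis used in the tool: the names `Interval`, `add` and `latest_of` are introduced here, and reading the figure's example values as start [1,3] and latency [3,7] is an assumption (it is consistent with the third interval, [4,10], being their sum).

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed interval [lo, hi] of clock cycles."""
    lo: int
    hi: int


def add(a: Interval, b: Interval) -> Interval:
    """Finish time of a stage: start interval plus latency interval."""
    return Interval(a.lo + b.lo, a.hi + b.hi)


def latest_of(*intervals: Interval) -> Interval:
    """Start time of a stage that must wait for all of its predecessors."""
    return Interval(max(i.lo for i in intervals), max(i.hi for i in intervals))


# Example values as read from the slide (assumed labelling).
start = Interval(1, 3)
latency = Interval(3, 7)
finish = add(start, latency)
print(finish)
```

Contention between stages makes the start of one stage depend on the finish of others, which is why the slide speaks of a fixed-point analysis: these two operations would be reapplied until no interval changes.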
Shared Cache + Pipeline • Abstract interpretation classifies each access as hit, miss or unclear • Timing interval T per classification: • L1 hit: T := T + [1, 1] • L1 miss, L2 (shared) hit: T := T + [miss1 + 1, miss1 + 1] • L1 miss, L2 (shared) unclear: T := T + [miss1 + 1, miss1 + miss2 + 1] • L1 unclear: T := T + [1, miss1 + miss2 + 1] • hit latency = 1 cycle, miss1 = L1 cache miss penalty, miss2 = L2 cache miss penalty
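The interval updates on this slide can be derived mechanically from the cache classifications. The sketch below is one way to do that, under assumptions: the pairing of the four intervals with L1/L2 classifications is a reading of the flattened figure, the names `Cls` and `access_latency` are hypothetical, and the penalty values in the example are made up. Combinations not shown on the slide (for example L1 miss with L2 miss) simply fall out of the same rule.

```python
from enum import Enum


class Cls(Enum):
    HIT = "hit"
    MISS = "miss"
    UNCLEAR = "unclear"


def access_latency(l1, l2, miss1, miss2, hit_latency=1):
    """Return (min, max) latency of one memory access.

    l1, l2: classification of the access in the L1 and shared L2 cache.
    miss1, miss2: L1 and L2 miss penalties in cycles.
    """
    outcomes = set()
    if l1 in (Cls.HIT, Cls.UNCLEAR):
        outcomes.add(hit_latency)                      # served by L1
    if l1 in (Cls.MISS, Cls.UNCLEAR):
        if l2 in (Cls.HIT, Cls.UNCLEAR):
            outcomes.add(miss1 + hit_latency)          # served by L2
        if l2 in (Cls.MISS, Cls.UNCLEAR):
            outcomes.add(miss1 + miss2 + hit_latency)  # served by memory
    return min(outcomes), max(outcomes)


# Hypothetical penalties, for illustration only.
MISS1, MISS2 = 6, 30
for l1, l2 in [(Cls.HIT, Cls.HIT), (Cls.MISS, Cls.HIT),
               (Cls.MISS, Cls.UNCLEAR), (Cls.UNCLEAR, Cls.UNCLEAR)]:
    print(l1.value, l2.value, access_latency(l1, l2, MISS1, MISS2))
```

The returned pair would be added to the running interval T of the pipeline stage that issues the access.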
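Shared Bus Analysis • Time Division Multiple Access (TDMA) • Offset abstraction [Figure: TDMA schedule with alternating Core 0 and Core 1 slots grouped into rounds; a bus request is characterized by its offset within the round, which determines its delay (delay = 0 in one of the two examples shown); labels T' (core 0) and T (core 1).]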
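How an offset translates into a bus delay can be sketched as follows. The schedule model is an assumption made for illustration, not taken from the slides: all slots have equal length, core i owns the i-th slot of every round, and an access must complete inside the owner's slot. The function names and the numbers in the example are hypothetical.

```python
def tdma_delay(offset, core, n_cores, slot_len, access_len):
    """Cycles a request issued at `offset` within the round must wait."""
    round_len = n_cores * slot_len
    if not 0 <= offset < round_len:
        raise ValueError("offset must lie within one TDMA round")
    if access_len > slot_len:
        raise ValueError("access does not fit into a slot")
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0                               # fits into the current slot
    if offset < slot_start:
        return slot_start - offset             # own slot is later this round
    return round_len - offset + slot_start     # wait for the next round


def delay_bounds(offsets, core, n_cores, slot_len, access_len):
    """Offset abstraction: a set of possible offsets gives a delay interval."""
    delays = [tdma_delay(o, core, n_cores, slot_len, access_len)
              for o in offsets]
    return min(delays), max(delays)


# Two cores, 50-cycle slots, 10-cycle accesses (made-up numbers).
print(tdma_delay(20, core=0, n_cores=2, slot_len=50, access_len=10))
print(tdma_delay(20, core=1, n_cores=2, slot_len=50, access_len=10))
print(delay_bounds({20, 45}, core=0, n_cores=2, slot_len=50, access_len=10))
```

Tracking a set of offsets rather than absolute time is what keeps the analysis independent of how long the program has already been running.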
Shared Bus + Pipeline [Figure: pipeline stages IF1, ID1, IF2, ID2, IF3, ID3; the offsets O1 and O2 flow into a later stage whose incoming offset is Oin; timing is approximated by static analysis.] • Three cases for the relative order of ID1 and IF2 (IF2 finishes after ID1, ID1 finishes after IF2, or order unknown): Oin = O1, Oin = O2, or Oin = O1 ∪ O2 • Property: offset content monotonically decreases over different iterations
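The case distinction on this slide can be sketched as a small merge function. This is an illustration under assumptions, not the tool's implementation: each predecessor stage is given as an (earliest, latest) finish interval plus a set of possible offsets, the successor is assumed to inherit the offsets of whichever predecessor provably finishes last, and the name `merge_offsets` is hypothetical.

```python
def merge_offsets(finish_a, offsets_a, finish_b, offsets_b):
    """Incoming offset set of a stage that waits for predecessors a and b.

    finish_a, finish_b: (earliest, latest) finish times of the predecessors.
    offsets_a, offsets_b: possible TDMA offsets when each one finishes.
    """
    if finish_a[0] > finish_b[1]:
        return set(offsets_a)                 # a always finishes after b
    if finish_b[0] > finish_a[1]:
        return set(offsets_b)                 # b always finishes after a
    return set(offsets_a) | set(offsets_b)    # order unknown: keep both


print(merge_offsets((5, 6), {10}, (1, 3), {40}))
print(merge_offsets((1, 5), {10}, (3, 6), {40}))
```

Taking the union when the intervals overlap is the sound choice: it may include offsets that never occur, but it cannot miss one that does.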
Loop Construct • Ci = bus context of the loop body at the i-th iteration [Figure: unrolled loop iterations with bus contexts C1, C2, C3, ..., C100.] • Unrolling loop iterations is EXPENSIVE
Loop Construct • Bus context flow graph [Figure: flow graph over the bus contexts C1 to C5, in which the labels C5 and C3 appear twice.] • How do we define bus context? • Property: if Ci = Cj, then Ci+k = Cj+k for any k > 0
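The repetition property is what lets a flow graph replace full unrolling: once a context recurs, every later context is already known. A minimal sketch of that idea follows. The function `build_context_graph`, the callback `next_context` and the toy transition in the example are all hypothetical; contexts are assumed to be hashable and the transition deterministic.

```python
def build_context_graph(initial, next_context, loop_bound):
    """Enumerate bus contexts until one repeats or the loop bound is hit.

    Returns (contexts, edges), where edges are index pairs into contexts.
    """
    contexts = [initial]
    index = {initial: 0}
    edges = []
    current = initial
    for _ in range(loop_bound - 1):
        nxt = next_context(current)
        if nxt in index:
            # Context seen before: add the back edge and stop unrolling.
            edges.append((index[current], index[nxt]))
            break
        index[nxt] = len(contexts)
        contexts.append(nxt)
        edges.append((index[current], index[nxt]))
        current = nxt
    return contexts, edges


# Toy transition: each iteration shifts the bus offset by 40 in a
# 100-cycle round (made-up numbers).
contexts, edges = build_context_graph(0, lambda c: (c + 40) % 100, 100)
print(contexts)
print(edges)
```

In this toy case a loop bound of 100 collapses to five contexts, which is the saving the slide is after.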
Loop Construct • Bus context flow graph [Figure: flow graph over the bus contexts C1 to C4.] • How do we define bus context? • Bus offsets of all pipeline stages of all instructions? There could be thousands of nodes
Loop Construct [Figure: pipeline stages IF, ID, EX, WB, CM of the previous and the current loop iteration, connected by cross-iteration edges.] • How do we define bus context? • Property: if the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change
Loop Construct • Bus context flow graph [Figure: flow graph over the bus contexts C1 to C4.] • Compute the WCET for each bus context • Generate ILP flow constraints: E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound; E(C1) ≥ E(C2) • E(C1) = number of times context C1 is executed
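To see how these flow constraints shape the loop's WCET, the two constraints on the slide can be solved by brute force for a small instance. This is only a sketch: a real analysis would hand the constraints to an ILP solver, the per-context WCET values and the loop bound below are made up, and `max_loop_wcet` is a hypothetical name. Only the two constraints shown on the slide are modelled.

```python
from itertools import product


def max_loop_wcet(wcet, loop_bound):
    """Maximize sum(E(Ci) * WCET(Ci)) under the slide's two constraints.

    wcet: per-context WCET values, ordered C1, C2, C3, ...
    """
    best_cost, best_counts = -1, None
    for counts in product(range(loop_bound + 1), repeat=len(wcet)):
        if sum(counts) > loop_bound:      # E(C1) + ... + E(Cn) <= loop bound
            continue
        if counts[0] < counts[1]:         # E(C1) >= E(C2)
            continue
        cost = sum(c * w for c, w in zip(counts, wcet))
        if cost > best_cost:
            best_cost, best_counts = cost, counts
    return best_cost, best_counts


# Hypothetical WCETs for C1..C4; C2 is deliberately the most expensive.
cost, counts = max_loop_wcet([100, 120, 90, 80], loop_bound=10)
print(cost, counts)
```

Without E(C1) ≥ E(C2) the maximum would charge all ten iterations to C2; the constraint forces the bound to split them between C1 and C2, which is how the flow graph tightens the estimate.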
Branch Prediction + Cache [Figure: memory blocks m and m' with a cache conflict; the outcomes shown are a cache hit on m and a cache miss on m after m is evicted from the cache, for the cases branch correctly predicted and branch incorrectly predicted.]
Branch Prediction + Cache [Figure: at a branch location, cache contents are combined by a JOIN over up to the maximum number of speculated instructions (involving block m'); the access to m becomes an unclear cache access.]
Overall Picture [Figure: micro-architectural modeling of shared cache, branch predictor, cache, pipeline and shared bus (multi-core) yields the WCET of basic blocks; path analysis by IPET combines it with infeasible path constraints, loop bound constraints and bus context constraints.]
Experimental Setup (Chronos Toolkit) [Figure: tool flow with the components C source, GCC (SimpleScalar), binary code, CFG, micro-architectural modeling (private cache, pipeline, branch prediction, shared cache, shared bus), micro-architectural constraints, flow constraints, ILP, WCET.]
Cache Sharing vs Cache Partitioning [Figure: three configurations for 2 cores: cache shared between Core 1 and Core 2, horizontally partitioned, and vertically partitioned; the diagrams carry the numeric labels 4 and 8.]
Evaluation (Cache + Pipeline) [Figure: results chart; annotation: imprecision of shared cache analysis; benchmarks named: jfdctint, statemate.]
Evaluation (Cache + Pipeline + Speculation) [Figure: results chart; annotation: imprecision of modeling speculation.]
Evaluation (Bus + Pipeline) [Figure: results chart; annotations: imprecision of shared bus analysis, imprecision of path analysis.]
Evaluation (Bus + Pipeline + Speculation) [Figure: results chart; annotations: imprecision of path analysis, imprecision of shared bus analysis.]
Conclusion • A unified WCET analysis framework • Handles the interaction of shared cache and shared bus with pipeline and branch prediction • Timing anomalies are possible; state explosion is handled by the timing interval abstraction • Detailed information on the tool and extensive results are available at: http://www.comp.nus.edu.sg/~rpembed/chronos-multi-core.html
Questions • Thank you