CSE 522 WCET Analysis. Computer Science & Engineering Department, Arizona State University, Tempe, AZ 85287. Dr. Yann-Hang Lee, yhlee@asu.edu, (480) 727-7507. Some of the slides were based on the lecture by G. Fainekos (ASU). Execution Time – WCET & BCET.
Execution Time – WCET & BCET (Figure from R. Wilhelm et al., ACM Trans. Embed. Comput. Syst., 2007.)
The WCET Problem
• Given
  • the code for a software task
  • the platform (OS + hardware) that it will run on
• Determine the WCET of the task.
• Why is this problem important?
  • The WCET is central in the design of real-time computing
• Can the WCET always be found?
  • In general, not a decidability problem, but a complexity problem
  • Compute bounds for the execution times of instructions and basic blocks and determine a longest path in the basic-block graph of the program.
Components of Execution Time Analysis
• Program path (Control flow) analysis
  • Want to find longest path through the program
  • Identify feasible paths through the program
  • Find loop bounds
  • Identify dependencies amongst different code fragments
• Processor behavior analysis
  • For small code fragments (basic blocks), generate bounds on run-times on the platform
  • Model details of architecture, including cache behavior, pipeline stalls, branch prediction, etc.
• Outputs of both analyses feed into each other
Program Path Analysis: Overall Approach (1)
• Construct Control-Flow Graph (CFG) for the task
  • Nodes represent Basic Blocks of the task
    • Basic block: a sequence of consecutive program statements with no branch into or out of the sequence except at its first and last statements
  • We have a single entry and a single exit node
  • Edges represent flow of control (jumps, branches, calls, …)
• The problem is to identify the longest path in the CFG
  • Note: the CFG can have loops, so we need to infer loop bounds and unroll the loops
  • This gives us a directed acyclic graph (DAG). How do we find the longest path in this DAG?
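One standard answer to the closing question is to relax the nodes in topological order. The following is a minimal sketch of that technique, not the method the slides go on to use (they formulate an integer linear program instead). The names `longest_path`, `struct edge`, and `MAX_NODES` are hypothetical, and node costs are assumed to be non-negative.

```c
#include <assert.h>

#define MAX_NODES 64   /* hypothetical size limit for this sketch */

/* Directed edge from -> to in the unrolled CFG. */
struct edge { int from, to; };

/*
 * Longest path cost from `entry` to `exit_node` in a DAG with n nodes
 * and m edges. cost[v] is the execution-time bound of node v (assumed
 * non-negative). Returns -1 if the graph has a cycle or if exit_node is
 * not reachable from entry.
 */
long longest_path(int n, const long cost[], const struct edge edges[],
                  int m, int entry, int exit_node)
{
    int indeg[MAX_NODES] = {0};
    int order[MAX_NODES];          /* queue for Kahn's topological sort */
    long best[MAX_NODES];          /* best[v] = longest cost entry -> v */
    int head = 0, tail = 0;

    assert(n > 0 && n <= MAX_NODES);

    for (int e = 0; e < m; e++)
        indeg[edges[e].to]++;
    for (int v = 0; v < n; v++) {
        best[v] = -1;              /* -1 marks "not reached from entry" */
        if (indeg[v] == 0)
            order[tail++] = v;
    }
    best[entry] = cost[entry];

    while (head < tail) {
        int u = order[head++];
        for (int e = 0; e < m; e++) {
            if (edges[e].from != u)
                continue;
            int v = edges[e].to;
            /* Relax edge u -> v only if u is reachable from entry. */
            if (best[u] >= 0 && best[u] + cost[v] > best[v])
                best[v] = best[u] + cost[v];
            if (--indeg[v] == 0)
                order[tail++] = v;
        }
    }
    if (tail != n)
        return -1;                 /* some node never freed: cycle */
    return best[exit_node];
}
```

The edge scan makes this O(n·m), which is fine for illustration. An adjacency list would bring it down to O(n + m).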
Program Path Analysis: Overall Approach (2)
• In a CFG
  • $B_i$ = basic block i
  • $x_i$ = number of times the block $B_i$ is executed
  • $d_j$ = number of times edge j is executed
  • $c_i$ = worst-case running time of block $B_i$
• Objective: find $\max \sum_i c_i x_i$
• How to get $x_i$?
  • Structural constraints
  • Functionality constraints
  • Loop bounds (these need to be known)
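CFG Example (example due to Y.T. Li and S. Malik)

Code:
    N = 10;
    q = 0;
    while (q < N)
        q++;
    q = r;

Basic blocks and execution counts:
• B1: N = 10; q = 0; (count $x_1$)
• B2: while (q < N) (count $x_2$)
• B3: q++; (count $x_3$)
• B4: q = r; (count $x_4$)

Edges: $d_1$ enters B1, $d_2$ goes B1 → B2, $d_3$ goes B2 → B3, $d_4$ goes B3 → B2, $d_5$ goes B2 → B4, $d_6$ leaves B4.

Want to maximize $\sum_i c_i x_i$ subject to the constraints
    $x_1 = d_1 = d_2$
    $d_1 = 1$
    $x_2 = d_2 + d_4 = d_3 + d_5$
    $x_3 = d_3 = d_4 = 10$
    $x_4 = d_5 = d_6$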
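For this small example the constraints pin every variable to a single value, so the maximum can be read off by substitution. The derivation below is a worked sketch that uses only the constraints listed above. The costs $c_i$ stay symbolic because the slides give no numeric values.

```latex
\begin{align*}
x_1 &= d_1 = 1, \qquad d_2 = x_1 = 1 \\
x_3 &= d_3 = d_4 = 10 \\
x_2 &= d_2 + d_4 = 1 + 10 = 11 \\
d_5 &= x_2 - d_3 = 11 - 10 = 1, \qquad x_4 = d_5 = d_6 = 1 \\
\sum_i c_i x_i &= c_1 + 11\,c_2 + 10\,c_3 + c_4
\end{align*}
```

The loop test B2 runs once more than the loop body B3, because the final test is the one that fails and exits the loop.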
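CFG – Another example

Code:
    /* k >= 0 */
    s = k;
    while (k < 10) {
        if (ok)
            j++;
        else {
            j = 0;
            ok = true;
        }
        k++;
    }
    r = j;

Basic blocks and execution counts:
• B1: s = k; (count $x_1$)
• B2: while (k < 10) { (count $x_2$)
• B3: if (ok) (count $x_3$)
• B4: j++; (count $x_4$)
• B5: j = 0; ok = true; (count $x_5$)
• B6: k++; (count $x_6$)
• B7: r = j; (count $x_7$)

(Figure: CFG of the seven blocks, with edge counts $d_1$ to $d_{10}$.)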
Functionality Constraints

    check_data()
    {
        int i, morecheck, wrongone;               /* x1 */
        morecheck = 1; i = 0; wrongone = -1;      /* x2 */
        while (morecheck) {                       /* x3 */
            if (data[i] < 0) {                    /* x4 */
                wrongone = i; morecheck = 0;      /* x5 */
            }
            else
                if (++i >= 10)                    /* x6 */
                    morecheck = 0;                /* x7 */
        }
        if (wrongone >= 0)                        /* x8 */
            return 0;                             /* x9 */
        else
            return 1;                             /* x10 */
    }

Constraints
• Loop bounds: $x_2 \le x_4$ and $x_4 \le 10\,x_2$
• Mutually exclusive loop exits: $(x_5 = 0 \;\&\; x_7 = 1) \;|\; (x_5 = 1 \;\&\; x_7 = 0)$
• Linked blocks: $x_5 = x_9$
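One way to sanity-check constraints like these is to instrument the routine with counters and confirm that the observed counts satisfy them. The sketch below does that, on some assumptions: `check_data_counted`, `constraints_hold`, and `NBLOCKS` are hypothetical names, `data` is passed as a parameter instead of being a global, and the loop bounds are read as $x_2 \le x_4 \le 10\,x_2$. Passing such a check on some inputs does not prove the constraints. It can only expose a wrong one.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 11   /* counters x[1]..x[10]; x[0] is unused */

/* Instrumented copy of check_data(): x[i] counts executions of block i. */
int check_data_counted(const int data[10], long x[NBLOCKS])
{
    int i, morecheck, wrongone;

    memset(x, 0, NBLOCKS * sizeof x[0]);
    x[1]++;
    morecheck = 1; i = 0; wrongone = -1;
    x[2]++;
    for (;;) {
        x[3]++;                      /* loop test, including the failing one */
        if (!morecheck)
            break;
        x[4]++;
        if (data[i] < 0) {
            x[5]++;
            wrongone = i; morecheck = 0;
        } else {
            x[6]++;
            if (++i >= 10) {
                x[7]++;
                morecheck = 0;
            }
        }
    }
    x[8]++;
    if (wrongone >= 0) {
        x[9]++;
        return 0;
    }
    x[10]++;
    return 1;
}

/* Returns 1 if the counts satisfy the functionality constraints. */
int constraints_hold(const long x[NBLOCKS])
{
    int loop_lo   = x[2] <= x[4];
    int loop_hi   = x[4] <= 10 * x[2];
    int exclusive = (x[5] == 0 && x[7] == 1) || (x[5] == 1 && x[7] == 0);
    int linked    = x[5] == x[9];

    return loop_lo && loop_hi && exclusive && linked;
}
```

With ten non-negative values the loop body runs ten times and exits through block 7. With a negative value at index 3 it runs four times and exits through block 5.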
Micro-architectural Modeling: Cache
• Modify the cost function (a cache hit and a cache miss have different costs)
• Add linear constraints to describe the relationship between cache hits and misses
Basic idea
• Basic blocks are assumed to be smaller than the entire cache
• Subdivide the instruction counts ($x_i$) into counts of cache hits ($x_i^{hit}$) and misses ($x_i^{miss}$)
• A line-block (or l-block) is a contiguous sequence of code within the same basic block that is mapped to the same cache line in the instruction cache
• The instructions in an l-block either all hit or all miss
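Written out, the modified cost function and the hit/miss split might look as follows. This is a sketch that assumes the l-block formulation of the Li, Malik, and Wolfe reference. The index $i.j$ (l-block j of basic block $B_i$) and the per-l-block costs $c_{i.j}^{hit}$ and $c_{i.j}^{miss}$ are notation chosen here for illustration.

```latex
\begin{align*}
\text{maximize}\quad
  & \sum_i \sum_j \left( c_{i.j}^{hit}\, x_{i.j}^{hit}
      + c_{i.j}^{miss}\, x_{i.j}^{miss} \right) \\
\text{subject to}\quad
  & x_i = x_{i.j}^{hit} + x_{i.j}^{miss}
    \quad \text{for every l-block } B_{i.j} \text{ of } B_i
\end{align*}
```

Every l-block of $B_i$ executes exactly as often as $B_i$ itself, which is why the same $x_i$ appears on the left for each j.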
Basic Blocks to Line Blocks (Direct-mapped cache)
(Figure: basic blocks B1, B2, and B3 are split into l-blocks B1.1, B1.2, B1.3, B2.1, B2.2, B3.1, and B3.2, each colored by the cache set, 0 to 3, that it maps to.)
Cache constraints:
• No conflicting l-blocks: only the first execution has a miss
• Two nonconflicting l-blocks are mapped to the same cache line
• Conflicting blocks: affected by the sequence
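The split of a basic block into l-blocks follows from its address range and the cache geometry. The sketch below assumes a direct-mapped instruction cache in which the set index is (address / line size) mod (number of lines). The names `struct lblock` and `split_into_lblocks` and all the numeric parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct lblock {
    unsigned long start;   /* first byte address of the l-block */
    unsigned long end;     /* one past the last byte of the l-block */
    unsigned set;          /* cache set (line index) it maps to */
};

/*
 * Split the basic block occupying addresses [start, end) into l-blocks
 * for a direct-mapped cache with `num_lines` lines of `line_size` bytes.
 * Writes at most max_out entries to out[] and returns how many it wrote.
 */
size_t split_into_lblocks(unsigned long start, unsigned long end,
                          unsigned long line_size, unsigned long num_lines,
                          struct lblock out[], size_t max_out)
{
    size_t n = 0;

    assert(line_size > 0 && num_lines > 0 && start <= end);

    while (start < end && n < max_out) {
        /* Address at which the current memory line ends. */
        unsigned long line_end = (start / line_size + 1) * line_size;
        unsigned long stop = line_end < end ? line_end : end;

        out[n].start = start;
        out[n].end = stop;
        out[n].set = (unsigned)((start / line_size) % num_lines);
        n++;
        start = stop;
    }
    return n;
}
```

With 16-byte lines and 4 lines, a block at [8, 40) yields three l-blocks in sets 0, 1, and 2, and a following block at [40, 72) yields l-blocks in sets 2, 3, and 0. The last l-block of the first block and the first l-block of the second share a memory line, so they are nonconflicting. The two l-blocks in set 0 come from different memory lines, so they conflict.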
Cache Conflict Graph
For every cache set containing two or more conflicting l-blocks
• The graph has a start node, an end node, and a node $B_{k.l}$ for every l-block in the cache set
• There is an edge from $B_{k.l}$ to $B_{m.n}$ if control can pass between them without passing through any other l-blocks of the same cache set.
• $p(i.j, u.v)$: the number of times that control passes through that edge.
(Figure: conflict graph with nodes start, $B_{k.l}$, $B_{m.n}$, and end, and edges labeled p(s,k.l), p(s,m.n), p(s,e), p(k.l,k.l), p(k.l,m.n), p(m.n,k.l), p(m.n,m.n), p(k.l,e), p(m.n,e).)
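The edge counts tie back to the execution counts through linear constraints. The block below is a sketch of those constraints, assuming the direct-mapped formulation of the Li, Malik, and Wolfe reference. Here $s$ and $e$ stand for the start and end nodes, and the sums range over all nodes that have an edge to or from $B_{k.l}$.

```latex
\begin{align*}
x_{k.l} &= \sum_{u.v} p(u.v,\, k.l) = \sum_{u.v} p(k.l,\, u.v)
  && \text{(flow in = flow out = execution count)} \\
\sum_{u.v} p(s,\, u.v) &= 1
  && \text{(control leaves the start node once)} \\
p(k.l,\, k.l) &\le x_{k.l}^{hit} \le p(s,\, k.l) + p(k.l,\, k.l)
  && \text{(bounds on hits)}
\end{align*}
```

The self-loop count $p(k.l, k.l)$ is a guaranteed hit count, since the l-block is still in the cache when it is revisited with no conflicting l-block in between. The term $p(s, k.l)$ allows for the case where the cache already holds the l-block on entry.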
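Cache Constraints Example (1)
(Figure: the CFG of the second example, with each basic block consisting of a single l-block that is mapped into the cache.)
• B1.1: s = k; (count $x_1$)
• B2.1: while (k < 10) { (count $x_2$)
• B3.1: if (ok) (count $x_3$)
• B4.1: j++; (count $x_4$)
• B5.1: j = 0; ok = true; (count $x_5$)
• B6.1: k++; (count $x_6$)
• B7.1: r = j; (count $x_7$)
Edge counts are labeled $d_1$ to $d_{10}$.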
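Cache Constraints Example (2)
(Figure: two cache conflict graphs, each with a start node S and an end node E.)
• Graph with nodes B1.1 and B6.1: edges p(s,1.1), p(1.1,6.1), p(6.1,6.1), p(1.1,e), p(6.1,e)
• Graph with nodes B4.1 and B5.1: edges p(s,4.1), p(s,5.1), p(4.1,4.1), p(4.1,5.1), p(5.1,4.1), p(5.1,5.1), p(4.1,e), p(5.1,e)
• The figure also labels an edge p(s,e) from start to end.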
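Reading the edge labels of the graph that contains B4.1 and B5.1, the flow-conservation constraints for those two l-blocks can be sketched as below. This assumes that the count of an l-block equals both the total flow into its node and the total flow out of it, as in the Li, Malik, and Wolfe reference.

```latex
\begin{align*}
x_4 &= p(s,4.1) + p(4.1,4.1) + p(5.1,4.1)
     = p(4.1,4.1) + p(4.1,5.1) + p(4.1,e) \\
x_5 &= p(s,5.1) + p(5.1,5.1) + p(4.1,5.1)
     = p(5.1,5.1) + p(5.1,4.1) + p(5.1,e)
\end{align*}
```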
Progress During the Past 10 Years
The explosion of penalties has been compensated by a reduction of uncertainties!
(Figure: cache-miss penalty and over-estimation plotted for Lim et al. (1995), Thesing et al. (2002), and Souyris et al. (2005). Cache-miss penalty values shown: 4, 25, 60, 200. Over-estimation values shown: 20-30%, 15%, 30-50%, 25%, 10%.)
Open Problems
• Architectures are getting much more complex.
  • Can we create processor behavior models without the pain?
  • Can we change the architecture to make timing analysis easier?
• Small changes to code and/or architecture require completely re-doing the WCET computation
  • Use robust techniques that learn about processor/platform behavior
• Need more reliable ways to measure execution time
References:
• Li, Malik, and Wolfe, “Cache Modeling for Real-Time Software: Beyond Direct Mapped Instruction Caches”
• Wilhelm, “Determining bounds on execution times,” Handbook on Embedded Systems, CRC Press, 2005