Timing Analysis for Modern Architectures Sang Lyul Min Dept. of Computer Engineering Seoul National University
Overview • Intra-task analysis (WCET analysis) • Cache memory • Pipelined execution • Inter-task analysis • Cache memory • Experiments • Conclusions and Future Work
Intra-task Analysis • Why is WCET analysis important? • A safe and tight WCET (worst-case execution time) estimate is a prerequisite for correct and accurate schedulability analysis
Schedulability Analysis Examples • Utilization bound-based approach • Response time-based approach
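As a concrete instance of the utilization bound-based approach, here is a sketch of the classic Liu-Layland rate-monotonic test; the task parameters are invented for illustration:

```python
# Sketch of the utilization bound-based schedulability test
# (Liu & Layland): n periodic tasks are schedulable under
# rate-monotonic scheduling if total utilization does not exceed
# n * (2^(1/n) - 1). Task parameters below are illustrative.

def rm_utilization_test(tasks):
    """tasks: list of (wcet, period) pairs; returns True if the
    utilization-bound test guarantees schedulability."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

# Three tasks with (WCET, period); utilization = 0.1+0.2+0.25 = 0.55
print(rm_utilization_test([(1, 10), (4, 20), (10, 40)]))  # True: 0.55 <= 0.779...
```

Note the test is sufficient but not necessary: a task set that fails the bound may still be schedulable, which is where the response time-based approach comes in.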
Good Old Days • No cache memory • No pipelined execution • Fixed instruction execution times (simple table look-up)
Timing Schema • S: S1; S2 → T(S) = T(S1) + T(S2) • S: if (exp) then S1 else S2 → T(S) = T(exp) + max(T(S1), T(S2)) • S: while (exp) S1 → T(S) = N × (T(exp) + T(S1)) + T(exp), for a loop bound of N iterations
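A minimal sketch of how the original schema combines bounds, with + for sequencing and max over branches; the tuple encoding of statements is our own invention for illustration:

```python
# Toy evaluator for the original timing schema: WCET bounds are
# plain numbers, combined with + (sequence), max (branches), and
# multiplication by the loop bound. Node shapes are illustrative.

def wcet(stmt):
    kind = stmt[0]
    if kind == "prim":                    # primitive block: fixed time
        return stmt[1]
    if kind == "seq":                     # S: S1; S2  ->  T(S1) + T(S2)
        return sum(wcet(s) for s in stmt[1:])
    if kind == "if":                      # S: if (exp) S1 else S2
        _, exp, s1, s2 = stmt             # -> T(exp) + max(T(S1), T(S2))
        return wcet(exp) + max(wcet(s1), wcet(s2))
    if kind == "while":                   # S: while (exp) S1, at most n times
        _, exp, body, n = stmt            # -> n*(T(exp)+T(S1)) + T(exp)
        return n * (wcet(exp) + wcet(body)) + wcet(exp)
    raise ValueError(kind)

prog = ("seq", ("prim", 5),
               ("if", ("prim", 1), ("prim", 10), ("prim", 3)),
               ("while", ("prim", 1), ("prim", 4), 2))
print(wcet(prog))  # 5 + (1+10) + (2*(1+4)+1) = 27
```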
Pipelined Execution [Figure: pipeline reservation table over cycles 1-21 for the sequence div.s $f2, $f4, $f6; lw $8, 4($sp); nop; mul.s $f8, $f10, $f12; addiu $9, $8, 4 across stages IF, RD, FRD, ALU, FALU, MD, FMUL, FDIV, MEM, FMEM, WB, FWB, FFWB]
The Problem [Figure: reservation tables (stages IF, RD, ALU, MD, DIV, MEM, WB) showing that the same instruction sequence takes a different number of cycles depending on the pipeline state left behind by the preceding path]
Our Approach • Define a PA (Path Abstraction) structure that encodes • elements whose timings are affected by neighboring paths • elements that affect other paths' timings • Define a concatenation operation (⊕) on PAs, generalizing the + operation • Define a pruning operation on PAs, generalizing the max operation
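The approach can be sketched in miniature for a single direct-mapped cache block; the PA fields, the miss penalty, and all block names below are invented for illustration:

```python
# Toy sketch of the Path Abstraction (PA) idea for one cache
# block: each PA keeps a WCET bound plus the first and last memory
# blocks referenced (the parts that affect / are affected by
# neighboring paths). Numbers and block names are made up.

MISS_PENALTY = 10

def concat(pa1, pa2):
    """Concatenate two PAs: if the second path's first reference
    finds the block the first path left behind, credit one hit."""
    b1, first1, last1 = pa1
    b2, first2, last2 = pa2
    saving = MISS_PENALTY if first2 == last1 else 0
    return (b1 + b2 - saving, first1, last2)

def prune(pas):
    """Keep only the worst PA for each (first, last) abstraction:
    a PA with the same boundary behavior and a smaller bound can
    never become the overall worst case."""
    best = {}
    for bound, first, last in pas:
        key = (first, last)
        if key not in best or best[key][0] < bound:
            best[key] = (bound, first, last)
    return sorted(best.values())

paths = [(48, "b6", "b1"), (40, "b6", "b1"), (30, "b6", "b8")]
print(prune(paths))   # the 40-cycle PA is pruned: same (b6, b1) abstraction
```

The real PA described on the following slides tracks these boundary references for every cache block and a head/tail reservation table for the pipeline, but the concatenate-then-prune structure is the same.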
Instruction Cache Modeling [Figure: evolution of the contents of cache blocks 0 and 1 along a path referencing memory blocks b2, b3, b4; each reference is classified as hit, miss, or hit/miss depending on the unknown initial cache contents]
PA Structure for Instruction Cache [Figure: PA recording, for each cache block, the first_reference (b2, b3 — whose hit/miss outcome depends on the preceding path) and the last_reference (b4, b3 — which determines the cache state seen by the following path); t_execution = 38 cycles]
Example: Concatenation and Pruning [Figure: concatenating PAs (first/last references over blocks b1, b4-b8) yields candidate paths of 48, 68, 78, 102, and 126 cycles; pruning discards candidates that can no longer become the worst case]
Pipelined Execution Modeling [Figure: pipeline reservation table over cycles 1-21 for the sequence div.s $f2, $f4, $f6; lw $8, 4($sp); nop; mul.s $f8, $f10, $f12; addiu $9, $8, 4 across stages IF, RD, FRD, ALU, FALU, MD, FMUL, FDIV, MEM, FMEM, WB, FWB, FFWB]
PA Structure for Pipelined Execution [Figure: full reservation table for the example sequence over cycles 1-21; t_max = 21 cycles]
PA Structure for Pipelined Execution [Figure: the PA keeps only the head and tail portions of the reservation table — the cycles where the path can interact with its neighbors; t_execution = 38 cycles]
Example: Concatenation and Pruning [Figures: concatenating the S1 PA (t_max = 17 cycles) with two candidate S2 PAs (t_max = 22 and 14 cycles) by overlapping tail and head yields combined reservation tables with t_max = 37 and 26 cycles; since 26 cycles < 37 cycles − (5-cycle head + 5-cycle tail), the 26-cycle PA can never become the worst case and is pruned]
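Concatenation of pipelined PAs can be pictured as sliding the second path's head under the first path's tail until no pipeline stage is claimed twice in the same cycle. A toy sketch, with made-up reservation entries:

```python
# Toy sketch of pipelined-path concatenation: each path carries the
# cycles at which it occupies each stage near its boundaries (the
# "head" and "tail" of its PA). Concatenation slides the second path
# left as far as no stage is claimed twice in one cycle. Stage names
# and cycle numbers below are illustrative, not from the slides.

def concat_pipeline(tail, head, t1, t2):
    """tail/head: {stage: set of occupied cycles} for path 1's end
    (cycles counted from path 1's start) and path 2's beginning
    (cycles counted from path 2's start). t1, t2: standalone
    durations. Returns the combined duration after maximal overlap."""
    for overlap in range(min(t1, t2), -1, -1):  # try largest overlap first
        shift = t1 - overlap                    # path 2 starts at cycle shift
        conflict = any(
            c1 == c2 + shift
            for stage in tail
            for c1 in tail[stage]
            for c2 in head.get(stage, ())
        )
        if not conflict:
            return shift + t2
    return t1 + t2                              # no overlap possible

tail = {"MEM": {7}, "WB": {8}}      # path 1's last stage uses (8 cycles total)
head = {"IF": {1}, "MEM": {4}}      # path 2's early stage uses (5 cycles total)
print(concat_pipeline(tail, head, 8, 5))  # 9: the paths overlap by 4 cycles
```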
Combined PA Structure [Figure: the combined PA carries both the instruction-cache part (first_reference / last_reference per cache block) and the pipeline part (head and tail of the reservation table); t_execution = 38 cycles]
Extended Timing Schema • S: S1; S2 → W(S) = W(S1) ⊕ W(S2) • S: if (exp) then S1 else S2 → W(S) = (W(exp) ⊕ W(S1)) ∪ (W(exp) ⊕ W(S2)) • S: while (exp) S1 → W(S) obtained by repeated concatenation of W(exp) and W(S1) up to the loop bound • where W(S) is the set of path abstractions (PAs) of S, and pruning is applied after each operation
Comparison with Original Timing Schema • timing element: WCET bound → Path Abstraction • path concatenation: + → ⊕ • path elimination: max → pruning
Inter-task Analysis [Figure: a preemptive schedule of tasks t1, t2, t3 over time 0-20, showing jobs t1,1-t1,5, t2,1-t2,2, and t3,1; the lower-priority jobs are repeatedly preempted by higher-priority ones]
Two-Step Approach • 1. Local (per-task) analysis: estimate the number of useful cache blocks at each execution point • 2. Global analysis: calculate the cache-related preemption delay using a linear programming technique
Local Analysis (1) • A cache block is useful if it contains a memory block that may be re-referenced before being replaced. • The number of useful cache blocks at an execution point gives an upper bound on the cache-related preemption cost at that point. [Figure: cache snapshot at point P, with the useful cache blocks highlighted among memory blocks m0-m7]
Local Analysis (2) • Definitions • RMB_p(c): set of memory blocks that may reside in cache block c at point p • LMB_p(c): set of memory blocks that may be the first reference to cache block c after point p • A useful cache block at point p is a cache block whose RMBs and LMBs have at least one common memory block.
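A minimal sketch of this definition, with made-up RMB/LMB sets:

```python
# Sketch of the local analysis: a cache block is useful at point p
# when some memory block both may reside in it (RMB) and may be its
# first reference after p (LMB). The sets below are illustrative.

def useful_blocks(rmb, lmb):
    """rmb, lmb: dicts mapping cache block -> set of memory blocks.
    Returns the cache blocks whose RMB and LMB sets intersect; their
    count bounds the cache-related preemption cost at this point."""
    return {c for c in rmb if rmb[c] & lmb.get(c, set())}

rmb = {0: {"m0", "m4"}, 1: {"m5"}, 2: {"m2", "m6"}}
lmb = {0: {"m4"}, 1: {"m1"}, 2: {"m6"}}
print(sorted(useful_blocks(rmb, lmb)))   # [0, 2]
```

Here block 1 is not useful: m5 may be cached at p, but the next reference to that cache block (m1) misses regardless of whether a preemption evicted m5.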
Local Analysis (3) [Figure: example execution point with the useful cache blocks marked]
Local Analysis of Each Task → Preemption Cost Table • t1 → largest preemption cost f1 • t2 → f2 • t3 → f3 • ... • tn → fn
Global Analysis (1) • Augmented response time equation: Ri = Ci + Σ j∈hp(i) ⌈Ri / Tj⌉ Cj + PCi(Ri), where PCi(Ri) is the cache-related preemption delay suffered by task i • Solved by iterative fixed-point computation
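The iteration can be sketched as follows; folding a per-preemption cost fj into the interference term is a simplification of the full preemption-delay term, and all task parameters are invented:

```python
# Sketch of solving an augmented response-time equation by fixed-
# point iteration: R_i = C_i + sum_j ceil(R_i/T_j) * (C_j + f_j),
# charging cost f_j per higher-priority preemption. This folds the
# preemption delay into the interference sum as a simplification;
# task parameters below are illustrative.
import math

def response_time(c_i, hp, deadline=10**6):
    """hp: list of (C_j, T_j, f_j) for higher-priority tasks, where
    f_j bounds the cache-related cost of one preemption by task j."""
    r = c_i
    while True:
        r_next = c_i + sum(
            math.ceil(r / t) * (c + f) for c, t, f in hp
        )
        if r_next == r:
            return r                    # fixed point reached
        if r_next > deadline:
            return None                 # no convergence within the deadline
        r = r_next

print(response_time(5, [(2, 10, 1)]))   # 8: one preemption, 2+1 extra cycles
```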
Global Analysis (2) • Linear programming formulation: maximize the total cache-related preemption delay, subject to constraints on the number of feasible preemptions
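The objective and constraints of the LP are not reproduced on this slide; purely as an illustration of the idea, here is a brute-force stand-in that maximizes total preemption cost Σ fj·gj over bounded, jointly constrained preemption counts gj (all numbers invented):

```python
# Brute-force stand-in for the global analysis LP (illustration
# only): choose preemption counts g_j, each within its own bound and
# jointly limited, to maximize the total cache-related preemption
# delay sum_j f_j * g_j. A real solver would use an LP package.
from itertools import product

def max_preemption_delay(costs, bounds, total_limit):
    """costs: per-task cost f_j of one preemption; bounds: max
    preemption count per task; total_limit: joint feasibility bound
    on the total number of preemptions."""
    best = 0
    for counts in product(*(range(b + 1) for b in bounds)):
        if sum(counts) <= total_limit:      # joint constraint
            best = max(best, sum(f * g for f, g in zip(costs, counts)))
    return best

print(max_preemption_delay([3, 5], [2, 2], total_limit=3))  # 1*3 + 2*5 = 13
```

The joint limit is what makes the problem non-trivial: without it, the maximizer would simply take every count at its individual bound.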
Limitations • Not all useful cache blocks are replaced. • Some preemptions are not feasible. [Figure: main memory to cache mapping, and a schedule of t1, t2, t3 over time 0-120 illustrating infeasible preemptions within response time R3]
Enhanced Approach • Uses two new features 1. Scenario-sensitive preemption cost 2. Additional constraints from task phasing
Experiments • Task set with 4 tasks: FFT, LUD, LMS, FIR • Three different cache mappings of the tasks (cache mapping 1, 2, 3)
Conclusions • Intra-task Analysis • Extended Timing Schema • PA (Path Abstraction) • Concatenation (⊕) and pruning operations • Inter-task Analysis • Data Flow Analysis • Response Time Equation • Linear Programming Technique
Future Work • Data Cache Analysis • WCET Analysis for Advanced Architectures (Superscalar and VLIW) • I/O (DMA) Timing Analysis
Related Papers • S.-S. Lim et al., "An Accurate Worst Case Timing Analysis for RISC Processors," IEEE Transactions on Software Engineering, 21(7):593-604, July 1995. • C.-G. Lee et al., "Analysis of Cache-related Preemption Delay in Fixed-priority Preemptive Scheduling," IEEE Transactions on Computers, 47(6):700-713, June 1998. • http://archi.snu.ac.kr/symin/