Timing-Predictable Systems - Reconciling Predictability with Performance -
Lothar Thiele and Reinhard Wilhelm
Quantified: Time
• Embedded controllers with hard real-time characteristics must be guaranteed to finish their tasks within their deadlines.
• A (static) schedulability test must be performed.
• This test needs (upper) bounds on the execution times of all tasks.
• Timing predictability provides for precise bounds.
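The slide does not name a particular schedulability test. As a minimal sketch of how upper bounds on execution times enter such a test, the following uses the classical Liu and Layland utilization bound for rate-monotonic scheduling, assuming independent periodic tasks whose deadlines equal their periods. The `Task` type and the task parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wcet_bound: float  # upper bound on the task's execution time
    period: float      # period, assumed equal to the relative deadline


def utilization(tasks):
    """Total processor utilization implied by the execution-time bounds."""
    return sum(t.wcet_bound / t.period for t in tasks)


def rm_utilization_test(tasks):
    """Sufficient (not necessary) test for rate-monotonic scheduling.

    Passing the test guarantees all deadlines are met; failing it is
    inconclusive and would call for a more precise analysis.
    """
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


# Hypothetical task set: the bounds would come from a timing analysis.
tasks = [Task(1, 4), Task(1, 5), Task(2, 10)]
print(utilization(tasks), rm_utilization_test(tasks))
```

The tighter the execution-time bounds, the lower the computed utilization, and the more task sets pass the test.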
Assumptions
• Aiming at guarantees, i.e., all executions need to be considered
• Predictability must not come at a (considerable) loss of performance; completely (locally) deterministic systems are not the alternative
• Systems are too big for exhaustive approaches
• Analytical approaches are necessary
Variability of Execution Times
• is at the heart of timing unpredictability,
• is introduced at all levels of granularity:
  • Memory reference
  • Instruction execution
  • Function
  • Task
  • Distributed system of tasks
  • Service
Access Times
x = a + b;
LOAD r2, _a
LOAD r1, _b
ADD r3,r2,r1
[Figure: access times of this code on the MPC 5xx and the PPC 755]
Timing Accidents and Penalties
Timing Accident – cause for an increase of the execution time of an instruction
Timing Penalty – the associated increase
• Types of timing accidents
Deriving Run-Time Guarantees
• Static program analysis derives invariants about all execution states at a program point.
• Safety properties are derived from these invariants: certain timing accidents will never happen. Example: At program point p, instruction fetch will never cause a cache miss.
• The more accidents excluded, the lower the upper bound
• (and the more accidents predicted, the higher the lower bound).
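The effect of excluded and predicted accidents on the bounds can be sketched for a single instruction. This is a deliberately simplified model, assuming that penalties simply add up and that there are no timing anomalies; the accident names and cycle counts are hypothetical.

```python
def execution_time_bounds(base_cycles, penalties, excluded=(), predicted=()):
    """Return (lower bound, upper bound) in cycles for one instruction.

    penalties: accident name -> timing penalty in cycles
    excluded:  accidents proven never to happen at this program point
    predicted: accidents proven always to happen at this program point
    Assumption: penalties are additive (no timing anomalies).
    """
    # Every accident that is not excluded must be assumed for the upper bound.
    upper = base_cycles + sum(
        p for name, p in penalties.items() if name not in excluded
    )
    # Only accidents known to happen may be counted for the lower bound.
    lower = base_cycles + sum(
        p for name, p in penalties.items() if name in predicted
    )
    return lower, upper


penalties = {"icache_miss": 25, "dcache_miss": 25}  # hypothetical values
print(execution_time_bounds(1, penalties))
print(execution_time_bounds(1, penalties, excluded={"icache_miss"}))
print(execution_time_bounds(1, penalties, excluded={"icache_miss"},
                            predicted={"dcache_miss"}))
```

With no knowledge, the interval is widest; each excluded accident lowers the upper bound and each predicted accident raises the lower bound.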
History-Sensitivity of Execution Times - Problem and Chance -
The contribution of the execution of an instruction to a program's execution time
• depends on the execution state, i.e., on the execution so far,
• can be bounded if strong invariants about all execution states at this instruction are available.
Bounds, Guarantees and Predictability
[Figure: time axis t marking lower bound, best case, worst case and upper bound. Labels: design, interference, influences, causes, non-determinism, limited analysis, analysis techniques.]
Basic Notions
[Figure: time axis t marking lower bound, best case, worst case and upper bound. Labels: best-case predictability, worst-case predictability, worst-case guarantee.]
Uncertainty x Penalties
• Message:
  • make systems analysable
  • control the penalties
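Some published Results
[Figure: over-estimation and cache-miss penalty of published analyses. Authors shown: Lim et al., Thesing et al., Souyris et al., Tan. Years shown: 1995, 2002, 2005, 2007. Cache-miss penalties shown: 4, 25, 60, 200. Over-estimation values shown: 30-50%, 20-30%, 15%, 15-25%, 7-8%.]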
System Characteristics and Degrees of Overestimation
• Airbus A380 code:
  • real code, synthesized from SCADE
  • complex processor, PPC 755
  • 15 – 25% overestimation
• ARTIST2 WCET Tool Challenge:
  • small benchmark programs
  • simple processors, ARM7
  • 7 – 8% overestimation
The Goal
Time predictability + performance =
• minimize (upper bound – lower bound)
• minimize WCET
Compiler vs. Processor – an old battle
Compiler responsible:
• EPIC/VLIW
• Scratchpad memory
Properties:
• Large focus
• Static information
• Complex algorithms: Heuristics required
• Heap
• Predictability
Processor responsible:
• Superscalar
• Caches
Properties:
• Small focus
• Dynamic information
• Complex hardware: High energy costs
• Adaptability
Troublesome Architectural Features
• Interference between architecture components
  • Branch prediction – instruction cache
  • Shared resources
  • Unified caches
  • Register overlays
  • Implicit actions (memory mapped registers)
• Non-predictable variability
  • Memory access
  • Operation timing
• Concurrency in combination with shared resources
  • Superscalarity
  • Out-of-order execution
  • Multi-threading (dyn. scheduled)
Penalties for Memory Accesses (in #cycles for PowerPC 755)
Remember: Penalties have to be assumed for uncertainties!
Tendency: increasing, since clock rates are growing faster than everything else.
Cache Impact of Language Constructs
• Pointer to data
• Function pointer
• Dynamic method invocation
• Service demultiplexing (CORBA)
Cache Analysis
How to statically precompute cache contents:
Must Analysis: For each program point (and calling context), find out which blocks are in the cache every time program execution reaches this program point (through this context).
Must-Cache Information
Must Analysis determines safe information about cache hits.
Each predicted cache hit reduces the upper bound.
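Cache with LRU Replacement: Transfer for must
Blocks are ordered by age, from "young" (left) to "old" (right).
concrete: [ z, y, x, t ] becomes [ s, z, y, x ] on access [ s ]; [ z, s, x, t ] becomes [ s, z, x, t ] on access [ s ]
abstract: { x } { } { s, t } { y } becomes { s } { x } { t } { y } on access [ s ]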
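The abstract transfer for one fully associative LRU cache set can be sketched as follows. This is an illustrative sketch, assuming the abstract cache is represented as a list of sets indexed by an upper bound on age (index 0 is youngest); the function name `must_update` is hypothetical.

```python
def must_update(cache, block):
    """Must-analysis transfer for an access to `block` in one LRU cache set.

    cache[i] holds the blocks whose age is guaranteed to be at most i.
    """
    ways = len(cache)
    # Age bound of the accessed block, or `ways` if it is not guaranteed cached.
    h = next((i for i, line in enumerate(cache) if block in line), ways)
    new = [set() for _ in range(ways)]
    new[0] = {block}  # the accessed block becomes the youngest
    for i in range(1, ways):
        if i < h:
            new[i] = set(cache[i - 1])  # younger blocks age by one
        elif i == h:
            # blocks sharing the old age bound keep it; younger ones catch up
            new[i] = set(cache[i - 1]) | (cache[i] - {block})
        else:
            new[i] = set(cache[i])  # older blocks are unaffected
    return new


before = [{"x"}, set(), {"s", "t"}, {"y"}]
print(must_update(before, "s"))
```

For the abstract cache { x } { } { s, t } { y } and an access to s, this yields { s } { x } { t } { y }. An access to a block that is not in the abstract cache ages every block by one and drops the oldest set.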
Cache Analysis: Join (must)
{ a } { } { c, f } { d } joined with { c } { e } { a } { d } gives { } { } { a, c } { d }
Join (must): "intersection + maximal age"
Interpretation: memory block a is definitely in the (concrete) cache => always hit
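The "intersection + maximal age" rule can be written down directly. A minimal sketch, assuming each abstract cache is a list of sets indexed by an upper bound on age (index 0 is youngest); `must_join` is a hypothetical name.

```python
def must_join(a, b):
    """Join two abstract must-caches at a control-flow merge.

    A block survives only if it is in both caches (intersection), and it
    gets the larger of its two age bounds (maximal age).
    """
    age_a = {blk: i for i, line in enumerate(a) for blk in line}
    age_b = {blk: i for i, line in enumerate(b) for blk in line}
    joined = [set() for _ in range(len(a))]
    for blk in age_a.keys() & age_b.keys():
        joined[max(age_a[blk], age_b[blk])].add(blk)
    return joined


left = [{"a"}, set(), {"c", "f"}, {"d"}]
right = [{"c"}, {"e"}, {"a"}, {"d"}]
print(must_join(left, right))
```

Blocks e and f appear on only one path and are dropped; a and c keep their older bound, so the result is { } { } { a, c } { d }.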
Cache with LRU Replacement: Transfer for must under unknown access, e.g. unresolved data pointer
Set of abstract cache: { x } { } { s, t } { y } becomes { } { x } { } { s, t } on access [ ? ]
If the address is completely undetermined, there is the same loss and no gain of information in every cache set!
Analogously for multiple unknown accesses, e.g. unknown function pointer: assume maximal cache damage.
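The loss of information under an unknown access can be sketched in the same representation (a list of sets indexed by an upper bound on age). The function names and the dictionary of cache sets are assumptions made for illustration.

```python
def must_unknown_access(cache):
    """Must-analysis transfer for an access to an unknown block.

    Every block ages by one, the oldest set is evicted, and nothing is
    known about the youngest position afterwards.
    """
    return [set()] + [set(line) for line in cache[:-1]]


def must_unknown_address(cache_sets):
    """Apply the unknown access to every cache set.

    cache_sets maps a set index to its abstract must-cache. If the address
    is completely undetermined, the access may hit any set.
    """
    return {idx: must_unknown_access(c) for idx, c in cache_sets.items()}


before = [{"x"}, set(), {"s", "t"}, {"y"}]
print(must_unknown_access(before))
```

For { x } { } { s, t } { y } this yields { } { x } { } { s, t }: block y is no longer guaranteed to be cached, and no new block is guaranteed in exchange.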
Dynamic Method Invocation
• Traversal of a data structure representing the class hierarchy
• Corresponding worst-case execution time and resulting cache damage
• Efficient implementation [WiMa] with table lookup needs 2 indirect memory references; if page faults cannot be excluded: 2 x pf = 4000 cycles!
System Layers
• Distributed Operation
• Inter-Task Level
• Intra-Task Level
• Hardware Platform
Cross-Layer Dependencies
System-Level Performance Methods
[Figure: results (e.g. delay) obtained from the real system, measurement, simulation and analysis, shown relative to the best case and the worst case.]
Difficulties
[Figure: an input stream of events (ab acc b) feeding tasks. Labels: Input Stream, Task Communication, Task Scheduling.]
Complex Input:
- Timing (jitter, bursts, ...)
- Different Event Types
Difficulties
[Figure: an input stream of events (ab acc b) entering a buffer and a task on a processor. Labels: Input Stream, Task Communication, Task Scheduling.]
Complex Input:
- Timing (jitter, bursts, ...)
- Different Event Types
Variable Resource Availability
Variable Execution Demand:
- Input (different event types)
- Internal State (Program, Cache, ...)
Why is Performance Analysis of Distributed Systems Difficult?
• non-deterministic environment
  - unpredictable input streams
  - data dependent behavior
• interference between concurrent actions
  - multiple applications
  - sharing of limited resources
  - scheduling/arbitration mechanisms
• local non-determinism
  - long-range dependencies
  - adaptive behavior (control loops)
Case Study - Opportunities
6 Real-Time Input Streams (S1 to S6)
- with jitter
- with bursts
- deadline > period
3 ECUs with own CCs
13 Tasks & 7 Messages
- with different WCED
2 Scheduling Policies
- Earliest Deadline First (ECUs)
- Fixed Priority (ECUs & CCs)
Hierarchical Scheduling
- Static & Dynamic Polling Servers
Bus with TDMA
- 4 time slots with different lengths (#1,#3 for CC1, #2 for CC3, #4 for CC3)
Total Utilization:
- ECU1 59 %
- ECU2 87 %
- ECU3 67 %
- BUS 56 %
[Figure: ECU1, ECU2 and ECU3 with CC1, CC2 and CC3, connected by the BUS and driven by streams S1 to S6.]
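The slide gives neither the slot lengths nor the analysis method. As one illustration of how an analytical approach can bound what a TDMA bus guarantees, the sketch below computes a commonly used lower bound on the service available to a sender that owns one slot per cycle, in any time interval of length `delta`. The worst case assumed is an interval that starts just after the sender's slot has ended. All parameter values are hypothetical.

```python
import math


def tdma_lower_service(delta, slot, cycle, bandwidth=1.0):
    """Guaranteed service in any interval of length `delta`.

    slot:      length of the sender's slot within one TDMA cycle
    cycle:     length of the whole TDMA cycle
    bandwidth: data transmitted per time unit while the slot is active
    """
    if delta <= 0:
        return 0.0
    # Service from complete cycles contained in the interval.
    full = math.floor(delta / cycle) * slot
    # Interval length minus the waiting time in every started cycle.
    partial = delta - math.ceil(delta / cycle) * (cycle - slot)
    return bandwidth * max(full, partial)


# Hypothetical slot of length 2 in a cycle of length 10.
for delta in (8, 9, 10, 15, 20):
    print(delta, tdma_lower_service(delta, slot=2, cycle=10))
```

For an interval of length 8 nothing is guaranteed, since the sender may have to wait for the whole remainder of the cycle; from then on the guarantee grows by the slot length per cycle.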
The Distributed System...
[Figure: block diagram of the distributed system. ECU1, ECU2 and ECU3 with CC1, CC2 and CC3 are connected by the BUS (TDMA). Streams S1 to S6 drive tasks T1.1 to T6.1 and messages C1.1 to C5.1, scheduled with FP, EDF and PS.]
Input of Stream 3
[Figure: diagram of the distributed system with ECU1, ECU2, ECU3, CC1, CC2, CC3, the TDMA bus, streams S1 to S6 and their tasks and messages.]
Output of Stream 3
[Figure: the same diagram of the distributed system.]
Output with Greedy Shapers
[Figure: the same diagram of the distributed system.]
Open Cross-Layer Issues
• Does it make sense to use preemptive scheduling (intra-task-level non-determinism increases, scheduling efficiency increases)?
• Uncoordinated scheduling (static and dynamic scheduling)
• Distributed control on several layers (control loops, adaptive behavior)
New Threats
• Trend towards adaptive systems
  • adapt to varying processing/communication loads
  • adapt speed / switch off units for energy saving
  • multiple levels of control and estimation!
• Increases long-range timing dependencies with non-deterministic behavior
System Layers
• Hardware
• Compiler
• Task level (cf. talk offered by Sebastian Altmeyer)
• Distributed operation
Layering Principle: Separation of Concerns
Separation of Concerns
• is the Design Principle
• Virtualization & Abstraction are the means:
  • One processor is virtualized as often as there are tasks
  • Limited physical memory is abstracted to almost unlimited virtual memory
  • Time is abstracted to #transitions of some very abstract model or even orders of magnitude
  • Services are abstracted from their actual location by middleware
Very successful, but a disaster for predictability!
Increasing Predictability
• Architecture: reducing penalties, identifying architectures offering a good combination of predictability with performance
• System layers: Resource-aware abstraction with resource interfaces
• Development process: reducing uncertainty, matching design with tools
Resource-aware Abstraction with Resource Interfaces
• Importing resource constraints into a layer
  • Slot assignment or available bandwidth for communication
• Bounding resource consumption by design
  • RT CORBA limits service demultiplexing
• Exporting information about resource consumption
  • Real-Time Scheduling needs upper bounds on tasks’ execution times and context-switch costs
Architecture
• Scratchpad memory
• LRU caches
• Statically scheduled multi-threading
• Parallelism instead of speculation
• Static decisions instead of dynamic decisions
• Dealing with resources
• Based on history
Predictability of Memory Systems
[Figure: spectrum of memory systems from fully static and fully predictable to fully dynamic and unpredictable, in this order: no cache, scratchpad, SW-controlled cache, partially frozen PLRU cache, cache with LRU, PLRU cache, cache with FIFO, random.]
cf. talk offered by Jan Reineke
A New Research Agenda
• Architecture design: Beyond EPIC
• Programming languages/constructs
• Schedulability analysis for distributed systems
• Predictable real-time middleware