Response Time Analysis of Tasks in Multiprocessor Systems
ArtistDesign Joint TA/MpSoC Cluster Meeting, 2009
Simon Schliecker, Mircea Negrean, Rolf Ernst
Institute of Computer and Network Engineering
Outline
• Performance Abstractions for the Analysis of Real-Time Systems
• Multicore Architecture
• Timing Implications and Countermeasures
• Analytical Solutions
• Conclusion
Software Timing Hierarchy
[Figure: timing hierarchy: system level (shared memory, HW arbiter, CPU1, CPU2, coprocessor, local memory); resource level (scheduler, tasks T1/T2 with activations); task level]
Local Scheduling Analysis
• Large body of methods available to derive worst-case response times for different scheduling policies
• SPP, TDMA, RR, EDF, …
• considering realistic scheduling effects (context-switch times, offsets, cache-related preemption delay, …)
• Valid assumption for single-processor systems: memory access times are part of the core execution time Ci
[Figure: single-core execution, classical model vs. with explicit memory accesses; execution, preemption, stalling, WCRT]
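For the SPP case, the classical response-time iteration can be sketched as follows (a minimal illustration, not the slides' own implementation; the task parameters are invented and deadlines are assumed equal to periods):

```python
import math

def wcrt_spp(i, tasks):
    """Worst-case response time of task i under static-priority preemptive
    (SPP) scheduling: R = C_i + sum over higher-priority j of ceil(R/T_j)*C_j.
    tasks: list of (C, T) pairs, sorted from highest to lowest priority."""
    C, T = tasks[i]
    r = C
    while True:
        r_next = C + sum(math.ceil(r / Tj) * Cj for (Cj, Tj) in tasks[:i])
        if r_next > T:
            return None             # deadline (= period) missed
        if r_next == r:
            return r                # fixed point reached: WCRT found
        r = r_next

print([wcrt_spp(i, [(1, 4), (2, 6), (3, 12)]) for i in range(3)])  # [1, 3, 10]
```

The iteration converges because the interference term is monotone in r and bounded by the period check.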
Model Task Activation as Event Streams
Response time analysis requires traffic models, determined by
• application model (Simulink, LabVIEW, …)
• environment model (reactive systems)
• service contracts (max. number of requests per time, …)
From an event stream, derive event bounds (arrival curves); optionally extract key parameters:
• P: period
• J: jitter
• dmin: minimum event distance
[Figure: event stream of task T1 over t [ms] and the derived arrival curves]
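The event bound implied by these parameters can be sketched as an upper arrival curve, as commonly formulated in compositional performance analysis (an illustration, not the slides' own definition):

```python
import math

def eta_plus(dt, P, J=0.0, dmin=0.0):
    """Upper bound on the number of events of a periodic-with-jitter stream
    in any time window of length dt (P: period, J: jitter, dmin: minimum
    event distance): eta+(dt) = min(ceil((dt + J)/P), ceil(dt/dmin))."""
    if dt <= 0:
        return 0
    bound = math.ceil((dt + J) / P)
    if dmin > 0:
        bound = min(bound, math.ceil(dt / dmin))
    return bound

print(eta_plus(10, P=4, J=2))           # -> 3 events in any 10 ms window
print(eta_plus(2, P=10, J=25, dmin=1))  # jitter burst capped by dmin -> 2
```

The dmin term caps the burst that a large jitter would otherwise allow in short windows.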
Component Performance Analysis
• The output event model of one processor becomes the input event model of its successor
• Compose multiple local analyses into a system-level analysis
[Figure: component Comp 1 with input event streams, local scheduling analysis, and output event streams at ports P3, P4]
Merging ECUs Using Multicore Architectures
Current distributed system (ECU1, ECU2):
• all accesses go to local resources
• bus communication is clearly specified and systematic
Multi-core system (MC-ECU, core1 and core2 with local and shared resources):
• keep task sets and functions separate
• accesses to both local and shared resources
• complicated, interleaved, and less systematic communication timing
Task Execution in Multicore
[Figure: single-core execution (classical model), single-core execution with stalling, and multicore execution; CPUa, CPUb, memory; execution, preemption, stalling, WCRT]
Software Timing Hierarchy: Bottom-Up Analysis Broken
[Figure: the hierarchy again: system (shared memory, HW arbiter, CPU1, CPU2, coprocessor, local memory); resource (scheduler, tasks T1/T2 with activations); task; the shared resources break the bottom-up composition]
Countermeasures
1. Orthogonalization of resources
• introduce schedulers that give upper bounds on interference independently of competing streams, or at least perform traffic shaping
• imposes strict hardware guidelines
• protects against a partially false system specification
• prone to overprovisioning (not so much in hard real-time setups)
2. Use formal analysis that covers the dynamism
• find realistic upper bounds on application behavior
• provide formulas and analysis methods matching the actual system
• requires comprehensive knowledge of hardware behavior to set up the analysis
• requires safe assumptions about the behavior of the software
• allows considering dynamic schedulers and load
3. A mix of the above
Multicore with Dynamically Shared Resources
To formally analyse multicore systems with dynamic shared resource arbiters, we need:
1. Traffic models for the shared resource: reuse methods from the WCET community
2. Analysis of the latency of shared resource accesses: largely covered by the system-level approach, but with a new focus on multiple events
3. Analysis of the impact of shared resource access latency on task response times: extensions of single-processor scheduling analysis
Extended Analysis Loop
[Figure: analysis loop: (1) input traffic description from the environment model; (2) shared resource access analysis; (3) extended response time analysis producing an output traffic description; repeat until convergence or non-schedulability]
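The control flow of such a loop can be sketched as below. The two analysis functions and their toy numeric behavior are invented stand-ins for the actual analyses, chosen only so the fixed-point iteration is visible:

```python
import math

def extended_analysis_loop(tasks, analyze_wcrt, analyze_access_latency,
                           max_iter=100):
    """Iterate extended response-time analysis and shared-resource access
    analysis until the derived access latencies reach a fixed point
    (convergence), or report non-schedulability."""
    latencies = {t: 0 for t in tasks}          # optimistic starting point
    for _ in range(max_iter):
        wcrts = {t: analyze_wcrt(t, latencies[t]) for t in tasks}
        if any(r is None for r in wcrts.values()):
            return None                        # non-schedulable
        new_latencies = analyze_access_latency(wcrts)
        if new_latencies == latencies:
            return wcrts                       # converged
        latencies = new_latencies
    return None

# Toy stand-ins: WCRT grows with the access latency; the latency a task
# suffers grows with the other task's busy window.
def analyze_wcrt(task, lat):
    return {"T1": 2, "T2": 3}[task] + lat

def analyze_access_latency(wcrts):
    return {"T1": math.ceil(wcrts["T2"] / 5), "T2": math.ceil(wcrts["T1"] / 5)}

print(extended_analysis_loop(["T1", "T2"], analyze_wcrt, analyze_access_latency))
# -> {'T1': 3, 'T2': 4}
```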
1. Derivation of Request Latencies
• "Transactions" need to be an explicit part of the task description
• Problem: cache misses occur only at runtime; use cache modeling from WCET tools
• Improvement: the minimum distance between requests is bounded by the "best-case" execution
• Accumulate over each task to derive the load on the shared resource
[Figure: control flow graph with transactions Tra(1), Tra(1), Tra(2); the maximum number of requests follows the longest execution path]
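Accumulating per-task request bounds into a load bound on the shared resource might look like this (illustrative only; the event bound, periods, and transaction counts are assumed values, not from the slides):

```python
import math

def eta_plus(dt, P, J=0):
    """Upper bound on task activations in any window of length dt."""
    return math.ceil((dt + J) / P) if dt > 0 else 0

def max_resource_requests(dt, tasks):
    """Bound the requests hitting the shared resource in any window of
    length dt: per task, activations in the window times the maximum
    number of transactions per activation (e.g. a cache-miss bound from
    WCET-style analysis of the longest execution path)."""
    return sum(eta_plus(dt, P, J) * n_req for (P, J, n_req) in tasks)

# two tasks: (period, jitter, max transactions per activation)
print(max_resource_requests(10, [(4, 0, 2), (5, 1, 3)]))  # -> 15
```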
2. Derivation of Shared Resource Latencies
• Request times are highly dynamic, and so is the system state at request time: individual request times are difficult to track
• Instead, calculate the total interference for all requests during execution, using the load bounds from the previous slide ("aggregate busy time")
• Given a certain amount of dynamism, the "sum-of-all-requests" bound is close to the actual worst case
[Figure: cases a) and b): CPU1 and CPU2 requests interleaving at the shared memory]
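The "sum-of-all-requests" idea amounts to charging every potentially interfering request once against an aggregate busy time instead of bounding each request individually. A deliberately simple sketch, assuming a work-conserving arbiter, uniform access times, and invented numbers:

```python
def aggregate_busy_time(n_own, interfering, t_access):
    """Aggregate busy time at the shared resource: the task's own requests
    plus every request that interfering tasks can issue during its
    execution, each served once with access time t_access."""
    return (n_own + sum(interfering)) * t_access

# 10 own requests, two competitors with up to 4 and 6 requests, 2 cycles each
print(aggregate_busy_time(10, [4, 6], 2))  # -> 40 cycles
```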
3. Considering Resource Accesses in the Response Time
• "Processor stalls during accesses": memory accesses are included at the WCRT analysis level
• Processor allows multithreading:
• self-suspension to increase utilization
• parallelism of "local execution" and "transaction processing" is difficult to prove!
• exact solutions have exponential complexity
• careful when combining multithreading with SPP in hard real time [Codes06]
[Figure: CPUa, CPUb, and memory timing]
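For the stalling case, the classical SPP iteration can be extended so that each job's demand includes the memory busy time of its own requests (a sketch; t_mem stands for the worst-case latency per access from the previous analysis step, and all numbers are invented):

```python
import math

def wcrt_with_stalling(i, tasks, n_req, t_mem):
    """SPP response time when the processor stalls on shared-memory
    accesses: each job's demand is its core execution time plus the busy
    time of its own memory requests. tasks: (C, T) pairs sorted by
    priority; n_req[j]: max requests per activation of task j; t_mem:
    worst-case latency per access. Deadlines assumed equal to periods."""
    demand = lambda j: tasks[j][0] + n_req[j] * t_mem
    T = tasks[i][1]
    r = demand(i)
    while True:
        r_next = demand(i) + sum(math.ceil(r / tasks[j][1]) * demand(j)
                                 for j in range(i))   # higher-priority tasks
        if r_next > T:
            return None                               # not schedulable
        if r_next == r:
            return r
        r = r_next

print(wcrt_with_stalling(1, [(1, 6), (2, 10)], n_req=[1, 2], t_mem=1))  # -> 6
```

Note that this models full stalling only; the multithreaded self-suspension case discussed above needs a different (and much harder) analysis.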
Multi-core Consequences / Conclusion
• Merging ECU functions on multi-core impacts function timing: cross-processor interference due to shared resources occurs
• shared memories
• shared coprocessors / hardware
• logical locks (DATE 2009)
• New analysis algorithms are available
• compatible with system-level analysis
• can work with incomplete and estimated task sets
• Use the analysis to optimize performance and cost:
• in early design stages, to guide towards optimal design choices
• refine input data during the design process to provide verification-strength guarantees for the final product