Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication

Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication
Maarten Wiggers, PhD student, University of Twente, NL
Co-author and supervisor:
Marco Bekooij, NXP Semiconductors Research
Gerard Smit, University of Twente

Outline
• Context
  • Streaming applications
  • Programming multiprocessor architectures
• Problem
  • Problem statement
  • Related work
• Variable Rate Dataflow
  • Chain topology
  • Arbitrary graph topology
• Experiment
• Conclusion
[Wiggers – DATE 2008, Wiggers – RTAS 2008]
Maarten Wiggers -- University of Twente

Multi-stream car-entertainment system

Application model
• Jobs process streams of data
• Jobs are composed of tasks
• Simultaneously running jobs together form use-cases
• Jobs often have real-time requirements
  • Firm real-time (FRT) if deadline misses are highly undesirable (steep quality degradation)
[Figure: a use-case with an FRT video job (input data stream → tasks → output stream to display) and an FRT audio job (input data stream → tasks → output stream to speakers)]

Task graphs
• Jobs are implemented as task graphs
• Tasks communicate fixed-sized containers over fixed-sized FIFO buffers
  • A container is a placeholder for data
  • A task has random access within a container
• A task only starts an execution when there are sufficient
  • Full containers in its input buffers
  • Empty containers in its output buffers (back-pressure)
• Back-pressure robustly prevents buffer overflow
• Required quanta of containers can be
  • Known at design-time
  • Dependent on the actual processed stream
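The enabling condition above can be sketched in Python. This is a minimal illustration, not the authors' implementation; the function name `can_start` and its parameters are hypothetical.

```python
def can_start(full_in, need_in, empty_out, need_out):
    """Return True if a task may start an execution.

    full_in   : full containers currently available in each input buffer
    need_in   : containers this execution consumes from each input buffer
    empty_out : empty containers currently available in each output buffer
    need_out  : containers this execution produces into each output buffer
    (all names are hypothetical, chosen for this sketch)
    """
    enough_data = all(have >= need for have, need in zip(full_in, need_in))
    # Back-pressure: the task also waits for space in its output buffers.
    enough_space = all(have >= need for have, need in zip(empty_out, need_out))
    return enough_data and enough_space
```

Because a task never starts without enough empty containers, it cannot write past the end of an output buffer, which is the sense in which back-pressure prevents overflow.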

Example job – MP3 playback
• MP3 decoding task consumes a variable number of bytes per frame
  • Every execution can consume a different number of bytes
  • BR task executes aperiodically
  • No static-order schedule for BR and MP3 → run-time arbitration
• Throughput constraint: sink needs to execute strictly periodically
  • All tasks are pushing data towards the sink
  • For sufficiently large buffers, sink can execute strictly periodically
[Figure: MP3 playback task graph with variable quantum n = [0, 960]]

Example job – H.263 video decoder
• Variable length decoder (VLD) consumes a variable number of bytes per frame
• VLD produces a variable number of blocks per frame
• DQ and IDCT process blocks
• Motion compensator assembles a frame from blocks
• Throughput constraint: sink needs to execute strictly periodically
[Figure: H.263 decoder task graph with variable quanta m = [0, 6536] and n = [0, 2376]]

Application trend
• Behaviour of applications is increasingly input-data dependent, e.g.
  • Entropy encoding
  • Adaptation to channel conditions by digital radios
• Reflected in
  • Input-data dependent execution times
  • Conditional execution of code
  • Mode changes
  • Input-data dependent execution rates
• Input-data dependent execution rates require run-time arbitration

Trend  challenge • Required properties • Functionally deterministic behaviour: output values completely determined by input values • Deadlock free • Throughput constraint satisfied • Research challenge is to define models • For which required properties are decidable • Can model applications with input-data dependent behaviour • Include effects of run-time arbitration • E.g. Variable-Rate Dataflow Maarten Wiggers -- University of Twente

Multi-processor architecture template
• Multi-processor system required for performance and power reasons
[Figure: architecture template; labels: External SDRAM, P, DSP, I/O, $, ctrl, mem, Arb, CA, NI, Network-on-Chip] [Hansson – TODAES 2008]

Compute settings
[Figure: inputs (multiprocessor instance, (cyclic) task graph, WCET, throughput and latency constraint) → dataflow synthesis → outputs (scheduler settings and buffer capacities)]

Compute settings
• Guarantees on end-to-end throughput require guarantees on deadlock-freedom
• Models that provide end-to-end throughput guarantees are not Turing complete
• This poses restrictions on
  • Applications: e.g. inter-task synchronisation behaviour
  • Architectures: e.g. applicable run-time arbitration schemes
• Goal: define a model that can guarantee throughput for H.263

Example
• Every execution, task B can choose to consume either 2 or 3 containers
• Required buffer capacity for deadlock freedom?

Example (cont.)
• Attempt: assume maximum consumption quantum in every execution
• Requires buffer capacity of 3 for deadlock freedom

Example (cont.)
• However, when consuming the minimum quantum
• Buffer capacity of 3 is insufficient!

Example (cont.)
Deadlock!
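The deadlock can be reproduced with a small simulation. The figure is not reproduced in this text, so the sketch assumes that the producing task A writes 3 containers per execution; that value, and the names `simulate`, `produce` and `consume_sequence`, are assumptions made for illustration only.

```python
def simulate(capacity, produce, consume_sequence):
    """Run A -> buffer -> B and report whether B completes all executions.

    capacity         : buffer capacity in containers
    produce          : containers written by A per execution (assumed value)
    consume_sequence : quantum chosen by B for each of its executions
    Returns True if every execution of B completes, False on deadlock.
    """
    tokens = 0                      # full containers in the buffer
    pending = list(consume_sequence)
    while pending:
        need = pending[0]
        if tokens >= need:          # B is enabled: enough full containers
            tokens -= need
            pending.pop(0)
        elif capacity - tokens >= produce:  # A is enabled: enough empty containers
            tokens += produce
        else:                       # neither task can start
            return False
    return True


always_max = simulate(capacity=3, produce=3, consume_sequence=[3, 3, 3])
always_min = simulate(capacity=3, produce=3, consume_sequence=[2, 2, 2])
```

Under the stated assumption, `always_max` is `True` while `always_min` is `False`: after B consumes 2 of the 3 containers, one full container remains, which is too few for B and leaves too little space for A.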

Problem
• Compute buffer capacities
  • Guarantee satisfaction of throughput constraint
  • Tasks can require a data-dependent quantum of data and space per execution
• Assumptions
  • Run-time arbitration on shared resources
  • Upper and lower bounds on transferred quanta
  • Upper bound on execution time
  • Throughput constraint: sink or source that executes strictly periodically

Related work
• Quasi static-order scheduling
  • Transfer quanta change only after (sub)graph iterations
  • For every iteration a static-order schedule is computed
  • Bounded memory is decidable
  • Models are amenable to code synthesis
• Examples
  • Heterochronous Dataflow [Girault – TCAD 1999]
  • Parameterised Dataflow [Bhattacharya – TSP 2001]
• Requiring changes only after graph iterations is a global requirement
  • Iteration is a graph property
  • VLD parses stream and decides next quantum locally
• Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta

Requirements on quanta change
Quasi static-order scheduling: 2 firings of A and 3 firings of B (one iteration) before quanta can change
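The firing counts of one iteration follow from balancing production and consumption on an edge. A minimal sketch, assuming A produces 3 and B consumes 2 per firing; these quanta are inferred so that they match the counts stated here, since the figure is not reproduced, and the function name is hypothetical.

```python
from math import gcd


def iteration_firings(production, consumption):
    """Smallest firing counts (producer, consumer) that return the edge
    to its initial state: producer_count * production ==
    consumer_count * consumption."""
    g = gcd(production, consumption)
    return consumption // g, production // g


firings_a, firings_b = iteration_firings(production=3, consumption=2)
```

With these assumed quanta the result is 2 firings of A and 3 firings of B, so a quasi static-order schedule would fix the quanta for all five firings.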

Requirements on quanta change
Variable-Rate Dataflow: quanta can change every firing

Related work
• Variable token sizes instead of variable number of transferred tokens
  • [Sen – ASSP 2005]
  • Experiment will show that this results in larger buffers
  • Variable consumption quantum of VLD depends on processed stream
  • BR task is unaware of the semantics of the stream → cannot know quantum

Related work
• Run-time arbitration
  • Not required to compute schedules at design-time
  • Only need to show that for all transfer quanta a schedule exists
• State-of-the-art
  • Real-time calculus (group of Thiele at ETH Zurich)
  • Symta/S (group of Ernst at TU Braunschweig)
• These approaches have
  • Difficulties with cyclic dependencies that influence the temporal behaviour
  • No means to reason about bounded memory or deadlock properties
  • E.g. no concept similar to consistency

Phase 1 and 2
• Next slides discuss buffer capacity computation in case of chain topology
• Subsequent slides discuss extension to arbitrary graph topologies

Variable Rate Dataflow (by example)
• Implementation = Task graph
• Model = Dataflow graph

Execution time  response time time-slice period Maarten Wiggers -- University of Twente

Execution time  response time time-slice period Explained in detail in [Wiggers – RTAS 2007] Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007] Maarten Wiggers -- University of Twente

Variable Rate Dataflow
Task graph (input specification): tasks and buffers
• Tasks have a bounded response time
• Tasks consume and produce data between start and finish
• Buffers have a finite and fixed capacity
Dataflow graph (analysis vehicle): actors and queues
• Actors have a fixed response time
• Actors consume tokens atomically at the start
• Actors produce tokens atomically at the finish
• Queues have infinite depth

Approach
• Model task graph on architecture by Variable-Rate Dataflow graph
• Let actor vτ model the throughput constraining task
• Compute sufficient number of tokens to enable actor vτ to execute strictly periodically
• Computed number of tokens equals required buffer capacity
• One-to-one correspondence
  • Containers in task graph – tokens in dataflow graph
  • Enabling condition of task – firing rule of actor
  • Containers consumed and produced – tokens consumed and produced
• Execution times of actors are upper bounds on execution times of tasks
• Self-timed execution of Variable-Rate Dataflow is temporally monotonic

Monotonic temporal behaviour
• VRDF actors have sequential firing rules [Lee – 1995]
  • The number of tokens required on the inputs is completely determined by the already consumed tokens
• VRDF actors are functional
  • The produced tokens are a function of the consumed tokens
• Given self-timed execution, if a token arrives earlier on an input, then
  • This can only lead to an earlier satisfaction of the firing rule, and
  • This can only lead to an earlier production of the same tokens
• E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time
• Because of scheduling anomalies this is not true for the task graph!
  • A smaller response time can lead to later container arrival times
• Token arrival times conservatively bound container arrival times

Approach – computation of sufficient tokens
• Find the valuation of token transfer parameters that leads to maximum required token transfer rates
• On each edge, take the maximum required rate as the slope of
  • A linear upper bound on token production times, and
  • A linear lower bound on token consumption times
• Derive the offset of the linear bounds such that for all sequences of transfer quanta there exists a schedule for which the bounds are conservative
  • Offset is relative to start of first firing of actor
• Use linear bounds to compute sufficient number of initial tokens
  • This number of tokens is also sufficient for smaller transfer rates

Approach – step 1
• Determine on each edge the maximum required transfer and firing rates
• Sink has to fire strictly periodically
• Maximum required transfer rate on edge occurs for the maximum consumption quantum
• Maximum required firing rate of A occurs for the minimum production quantum
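Step 1 can be illustrated with a small calculation. This is a sketch under assumptions not stated on the slide: a single edge from actor A to the sink, a known sink period, integer quanta, and a strictly positive minimum production quantum. All names are hypothetical.

```python
from fractions import Fraction


def max_required_rates(sink_period, max_consumption, min_production):
    """Return (transfer_rate, producer_firing_rate) for edge A -> sink.

    transfer_rate        : tokens per time unit the edge must sustain when
                           the sink consumes its maximum quantum every period
    producer_firing_rate : firings per time unit A needs when it produces
                           only its minimum quantum per firing
    """
    if min_production <= 0:
        raise ValueError("sketch assumes a strictly positive minimum quantum")
    transfer_rate = Fraction(max_consumption, sink_period)
    producer_firing_rate = transfer_rate / min_production
    return transfer_rate, producer_firing_rate


transfer, firing = max_required_rates(sink_period=10, max_consumption=3,
                                      min_production=2)
```

For these example values the edge must sustain 3/10 tokens per time unit, and A must be able to fire at 3/20 firings per time unit. Exact fractions are used so that the rates carry no rounding error.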

Approach – step 2
• Given linear bounds on production and consumption times
• Find difference between bounds that allows existence of schedule for all sequences of quanta
[Figure annotations: actor starts at t=0; consumes tokens at start; produces tokens at finish; finish - start = response time]
Larger quantum → larger difference between bounds

Approach – step 2
• Given linear bounds on production and consumption times
• Find difference between bounds that allows existence of schedule for all sequences of quanta
[Figure annotations: actor starts at t=0; consumes tokens at start; produces tokens at finish; finish - start = response time]
Larger quantum → larger delay of next start time
If the largest quantum is between the bounds, then every sequence is between the bounds

Approach – step 3
• Difference between linear bounds is buffer capacity
• Buffer capacity is maximum difference between tokens consumed and produced
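The notion of capacity as a maximum difference can be illustrated on concrete event traces. The sketch below works on hypothetical timestamped traces rather than on the linear bounds themselves, and all names are assumptions made for illustration.

```python
def max_occupancy(productions, consumptions):
    """Largest value of (tokens produced - tokens consumed) over time.

    productions / consumptions : lists of (time, quantum) events.
    At equal timestamps productions are counted first, which is the
    conservative (larger) choice.
    """
    events = [(t, n) for t, n in productions]
    events += [(t, -n) for t, n in consumptions]
    events.sort(key=lambda e: (e[0], -e[1]))  # positive deltas first on ties
    level = peak = 0
    for _, delta in events:
        level += delta
        peak = max(peak, level)
    return peak


peak = max_occupancy(productions=[(0, 3), (2, 3)],
                     consumptions=[(1, 2), (3, 2), (4, 2)])
```

For this trace the occupancy runs 3, 1, 4, 2, 0, so a buffer of 4 containers would be needed. The approach on the slides obtains such a number from the linear bounds, so that it holds for every admissible sequence of quanta and not only for one trace.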
