Managing dynamic concurrent tasks in real-time multi-media systems
Francky Catthoor, IMEC, Belgium
Goal of our research
A methodology to map dynamic and concurrent real-time applications on an embedded multi-processor platform
Why are applications becoming more dynamic and concurrent?
The workload decreases, but the tasks are dynamically created and their size is data dependent
[Figure: JPEG and MPEG4 task graphs with tasks T1, T1', T2, T3, T4]
Today's divide-and-conquer approach leads to local solutions
MPEG-4: the multimedia specification is too large to handle as one task => break it up into many interacting tasks
[Figure: TASK1 optimized in isolation, Cost(P) versus Exec.Time; cartoon caption: "With this code my boss has to give me a raise!!!"]
Global picture: ad hoc
This didn't look so good after all???
[Figure: TASK1, TASK2, and TASK3 on Processor 1 with execution times t1, t2, t3; constraint t1 + t2 + t3 < T]
Global trade-offs with cost-performance curves
Luckily we have the Pareto approach!
[Figure: TASK1, TASK2, and TASK3 on Processor 1, each with a selected point (x) on its own curve, execution times t1n, t2n, t3n; constraint t1n + t2n + t3n < T]
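The global trade-off on this slide can be sketched in a few lines. The sketch below assumes each task comes with a small cost-performance curve given as (execution time, cost) points and that the three tasks run back to back on one processor, so their times add up. The curve values, the name `best_combination`, and the exhaustive search are illustrative assumptions, not the method used in the talk.

```python
from itertools import product

# Hypothetical per-task curves as (execution_time, cost) points.
# The numbers are made up for illustration only.
curves = {
    "TASK1": [(2, 9), (4, 5), (6, 2)],
    "TASK2": [(1, 8), (3, 4), (5, 1)],
    "TASK3": [(2, 7), (3, 3)],
}

def best_combination(curves, deadline):
    """Pick one point per task so that the summed time is below the
    deadline and the summed cost is minimal.

    Returns (total_cost, total_time, choice) or None if no combination
    meets the deadline. Exhaustive search, fine for small examples only.
    """
    names = list(curves)
    best = None
    for combo in product(*(curves[name] for name in names)):
        total_time = sum(t for t, _ in combo)
        total_cost = sum(c for _, c in combo)
        # Constraint from the slide: t1n + t2n + t3n < T
        if total_time < deadline and (best is None or total_cost < best[0]):
            best = (total_cost, total_time, dict(zip(names, combo)))
    return best

result = best_combination(curves, deadline=12)
print(result)
```

Choosing the cheapest point of every task in isolation would need 14 time units here and miss the deadline; looking at all curves together finds a feasible mix instead.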
Outline
• Motivation: new challenges in system-level design
• Overview of TCM methodology
• Cost-efficient design-time scheduling
QoS requires new approach
• Maximize quality
• Minimize power
• Limited resources
Solution: Pareto curves
See session Multi-media 1
[Figure: 3D object and video object tasks on terminals, competing for CPU, bus access, and memory size]
Very dynamic behaviour, so worst-case realisation too costly
[Figure: frame period (ms) and projected surface (# pixels/200) versus frame number (0 to 600); curves: (a) projected surface, (b) measured period, (c) predicted period]
System design issues in IT-Application domain
Embedded system => cost
• Network layer protocols (ATM, IP)
• Dynamic multi-media algorithms (MPEG4/7/21)
• Wireless/wired terminals (Internet, WLAN)
Processes
• Dynamic and concurrent processes
• Global/local control
• Non-deterministic events
Complex data sets
• Large and irregular dynamically allocated data
• Huge memory accesses
Stringent real-time constraints
[Figure: packet-processing example with IN 622 Mb/s, FIFO, ISR, Packet Record, Routing Record, routing reply, data in/out, and OUT 622 Mb/s; annotations: 200 accesses, 53 cycles]
Real-time constraints, tasks and data in the IM1-MPEG4 protocol
[Figure: IM1 player structure with Application, Executive, Service, ALManager, Presenter, Flex Demux, Data Channels (BIFS, OD, ...), Decoders, Decoding Buffer, and Composition Memory; critical path of 30 msec covering pre/post decoding, pre/post writing, and rendering]
Why aren't multi-processor platforms used now in embedded domains?
• Efficient mapping requires a very high design effort when done manually
• => need for a cost-sensitive real-time system compiler
[Figure: cartoon labelled "Live from FZ-TV"]
Outline
• Motivation: new challenges in system-level design
• Overview of TCM methodology
• Cost-efficient design-time scheduling
Requirements of system level design approach
• Data management
• Concurrency management
• Platform constraints
TCM approach: ICPP'00, Kluwer book'99
[Figure: algorithms + data structures mapped onto an architecture; processor architecture (ARM, RAM, ROM, IP1, IP2) and platform integration (microprocessor, DSP, custom logic, MMU, RAM, ROM)]
[Figure: TCM design flow. Starting from a C/C++ specification of a dynamic concurrent system: extraction of the gray-box model, task-level DTSE, concurrency improving transformations, static task scheduling (a cost-time curve per task: task1, task2, task3), and a run-time scheduler (tasks placed over time on processor1, processor2, processor3). Side inputs: memory architecture, platform, real-time constraints]
Outline
• Motivation: new challenges in system-level design
• Overview of TCM methodology
• Cost-efficient design-time scheduling
Reduce global system energy by task scheduling + assignment (e.g. 2-processor approach)
CODES'01, J. of Sys. Arch., summer 2001
[Figure: Task1, Task2, ..., Taskn assigned to ARM Processor 1 and ARM Processor 2, which run at different supply voltages (Vdd = 1 V and Vdd = 3.3 V)]
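A minimal sketch of the 2-processor idea, under stated assumptions: tasks are independent, the energy of a task scales with its workload times Vdd squared (the usual first-order model for dynamic CMOS energy), and the high-voltage processor is three times faster. The processor names, speeds, and workloads are hypothetical and only serve to show how assignment alternatives yield different (time, energy) pairs.

```python
from itertools import product

# Hypothetical platform: (name, Vdd in volts, relative speed).
# Only the two Vdd values come from the slide; the speeds are assumed.
PROCESSORS = [("P_low", 1.0, 1.0), ("P_high", 3.3, 3.0)]

# Hypothetical task workloads in abstract work units.
TASKS = {"Task1": 6, "Task2": 3, "Task3": 3}

def evaluate(assignment):
    """Return (finish_time, energy) for a dict mapping task -> processor index.

    Tasks on the same processor run back to back; the two processors run
    in parallel, so the finish time is the larger of the two busy times.
    """
    busy = [0.0] * len(PROCESSORS)
    energy = 0.0
    for task, proc in assignment.items():
        _, vdd, speed = PROCESSORS[proc]
        work = TASKS[task]
        busy[proc] += work / speed
        energy += work * vdd ** 2  # assumed model: energy ~ work * Vdd^2
    return max(busy), energy

def all_assignments():
    """Yield every task-to-processor assignment with its (time, energy)."""
    names = list(TASKS)
    for procs in product(range(len(PROCESSORS)), repeat=len(names)):
        assignment = dict(zip(names, procs))
        yield assignment, evaluate(assignment)

for assignment, (time, energy) in all_assignments():
    print(assignment, round(time, 2), round(energy, 2))
```

Running everything on the low-voltage processor is cheapest but slowest; moving work to the high-voltage processor buys time at a steep energy price, which is the trade-off the scheduler explores.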
Trade-off between time budget (period/latency) and cost (e.g. energy) leads to Pareto curves
[Figure: cost versus time for the processor allocation/assignment and scheduling alternatives of code version 1; non-optimal points are marked with x; cost budgets CB1 to CB6 are marked on the cost axis]
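Separating the Pareto points from the non-optimal ones is a small filtering step. The sketch below assumes each scheduling alternative is summarized as a (time, cost) pair where lower is better on both axes; the function name `pareto_front` and the sample points are illustrative.

```python
def pareto_front(points):
    """Return the (time, cost) points that no other point dominates.

    A point is dominated if another point is at least as good in both
    time and cost and is not identical to it.
    """
    front = []
    best_cost = float("inf")
    # Sort by time, then cost: a point survives only if it is strictly
    # cheaper than every point that is at least as fast.
    for time, cost in sorted(set(points)):
        if cost < best_cost:
            front.append((time, cost))
            best_cost = cost
    return front

# Hypothetical scheduling alternatives.
alternatives = [(2, 10), (3, 7), (3, 9), (4, 8), (5, 4), (6, 4), (7, 2)]
print(pareto_front(alternatives))
```

In this example (3, 9), (4, 8), and (6, 4) are dropped because a faster or equally fast alternative costs no more.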
Transformations shift the Pareto curve
• Decrease cost for the same time budget
• Increase performance for the same cost budget
TCM transformations on the application model shift the Pareto curve towards the origin
[Figure: energy versus time budget; scheduling alternatives for case 1 with the bottleneck of case 1 marked, and the new curve for case 2 after the TCM transformation]
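The two bullet points can be read as two queries on a curve. The sketch below assumes a curve is a list of (time, cost) points and compares a hypothetical curve before the transformation (`case1`) with one after it (`case2`); the values are invented for illustration and are not results from the talk.

```python
def min_cost_within(curve, time_budget):
    """Lowest cost among points that fit the time budget, or None."""
    feasible = [cost for time, cost in curve if time <= time_budget]
    return min(feasible) if feasible else None

def min_time_within(curve, cost_budget):
    """Shortest time among points that fit the cost budget, or None."""
    feasible = [time for time, cost in curve if cost <= cost_budget]
    return min(feasible) if feasible else None

# Hypothetical curves as (time, cost) points.
case1 = [(4, 10), (6, 7), (8, 5)]  # before transformation
case2 = [(3, 8), (5, 5), (7, 3)]   # after transformation, closer to the origin

# Same time budget, lower cost after the transformation.
print(min_cost_within(case1, 6), min_cost_within(case2, 6))
# Same cost budget, shorter time after the transformation.
print(min_time_within(case1, 7), min_time_within(case2, 7))
```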
Comparison of original and transformed IM1 graphs on 2 processors with different Vdd
[Figure: two panels, "Original" and "Transformed"]
Comparison between different processor platforms
[Figure: curves for the processor combinations, with panels "Combination 2" and "Combination 3", each shown against the global Pareto curve; points numbered 1 to 6]
Comparison of genetic algorithm (GA) and heuristic task scheduling results
Overall solution: combination of design-time and run-time schedulers
• Design-time scheduling: at compile time, exploring all optimization possibilities
• Run-time scheduling: at run time, providing flexibility and dynamic control at low cost as part of the synthesized RTOS
CASES'00, Design & Test, Sep. '01
[Figure: design-time scheduling of task nodes (1, 2, 3 and A, B), followed by run-time scheduling that combines them]
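The split between the two schedulers can be sketched as follows: the curves are assumed to come from the design-time step, and the run-time step only walks along them. The greedy rule used here (speed up the task whose next faster point costs the least extra per unit of time saved) is an illustrative low-complexity heuristic, not necessarily the one used in the TCM run-time scheduler. Tasks are assumed to run back to back on one processor, and all names and numbers are hypothetical.

```python
def runtime_select(curves, active, deadline):
    """Pick one design-time point per active task so that the summed
    time is below the deadline.

    curves: task -> list of (time, cost) points with strictly increasing
            time and decreasing cost (a Pareto curve from design time).
    Returns task -> chosen point, or None if the deadline cannot be met.
    """
    # Start from the slowest, cheapest point of every active task.
    idx = {task: len(curves[task]) - 1 for task in active}

    def total_time():
        return sum(curves[task][idx[task]][0] for task in active)

    while total_time() >= deadline:
        best_task, best_ratio = None, None
        for task in active:
            if idx[task] == 0:
                continue  # already at the fastest point
            t_now, c_now = curves[task][idx[task]]
            t_new, c_new = curves[task][idx[task] - 1]
            ratio = (c_new - c_now) / (t_now - t_new)  # extra cost per time saved
            if best_ratio is None or ratio < best_ratio:
                best_task, best_ratio = task, ratio
        if best_task is None:
            return None  # every task is at its fastest point already
        idx[best_task] -= 1
    return {task: curves[task][idx[task]] for task in active}

# Hypothetical design-time curves.
curves = {
    "TASK1": [(2, 9), (4, 5), (6, 2)],
    "TASK2": [(1, 8), (3, 4), (5, 1)],
    "TASK3": [(2, 7), (3, 3)],
}
selection = runtime_select(curves, ["TASK1", "TASK2", "TASK3"], deadline=12)
print(selection)
```

Because the curves are precomputed, the run-time work is a handful of comparisons per step, and the set of active tasks can change from one invocation to the next.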
Main messages
• Embedded multi-media applications are becoming very dynamic and concurrent in nature => RTOS essential
• The Task Concurrency Management approach provides flexibility and optimization possibilities while limiting the run-time computation complexity
• A multiprocessor platform with different working voltages potentially provides an energy-saving solution
• An application-specific run-time scheduling technique, combined with design-time scheduling that provides cost-performance Pareto curves, is essential for an effective solution