Embedded Computer Architecture 5KK73 MPSoC: Controlling the Parallel Resources
Flexibility versus efficiency [Figure: spectrum of processor types trading flexibility for efficiency: programmable CPU, programmable DSP, application specific instruction set processor (ASIP), application-specific processor]
Contents • GPUs revisited • PicoChip • Real-Time Scheduling basics • Resource Management
GPU basics • Synthetic objects are represented with a bunch of triangles (3D), plus texture, in a language/library like OpenGL or DirectX • Triangles are represented with 3 vertices • A vertex is represented with 4 coordinates with floating-point precision • Objects are transformed between coordinate representations • Transformations are matrix-vector multiplications
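As a minimal sketch of the last two bullets, the following Python applies one transformation to the three vertices of a triangle as a matrix-vector multiplication. It assumes the 4 coordinates are homogeneous coordinates (x, y, z, w) and uses a 4x4 translation matrix; the helper names `mat_vec` and `translation` are illustrative and are not taken from OpenGL or DirectX.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """4x4 matrix that moves a homogeneous point by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

# One triangle: 3 vertices, each with 4 floating-point coordinates (w = 1).
triangle = [
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]

# Transforming the object means transforming every vertex independently,
# which is why this workload maps well onto many parallel processors.
matrix = translation(2.0, 3.0, 4.0)
moved = [mat_vec(matrix, v) for v in triangle]
print(moved)
```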
GeForce 8800 GPU • 330 Gflops, 128 processors with 4-way SIMD
GPU: Why more general-purpose programmable? • All transformations are shading • Shading is all matrix-vector multiplications • Computational load varies heavily between different sorts of shading • Programmable shaders allow dynamic resource allocation between shaders • Result: Modern GPUs are serious competitors for general-purpose processors!
Real-time systems (Reinder Bril) • Correct result at the right time: timeliness • Many products contain embedded computers, e.g. cars, planes, medical and consumer electronics equipment, industrial control. • In such systems, it’s important to deliver correct functionality on time. • Example: inflation of an air bag
Example: Multimedia Consumer Terminals (by courtesy of Maria Gabrani) [Figure: terminal with cable modem, DVB tuner, IEEE 1394 interface, RF tuner, CVBS interface, YC interface, VGA, DVD, and CDx front end]
Example: High quality video & real time • TV companies invest heavily in video enhancement, e.g. temporal up-scaling • Input stream: 24 Hz (movie) • Rendered stream: 60 Hz (TV screen) [Figure: original and up-scaled frame sequences]
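The 24 Hz to 60 Hz conversion implies some simple arithmetic: each output frame k, displayed at time k/60 s, falls between two input frames. The sketch below computes which input frame precedes it and the fractional phase toward the next one. `source_position` is a hypothetical helper; how a real enhancement algorithm uses the phase is not specified here.

```python
from fractions import Fraction

def source_position(k, in_hz=24, out_hz=60):
    """Map output frame k to (preceding input frame index, phase in [0, 1))."""
    pos = Fraction(k * in_hz, out_hz)        # position measured in input frames
    base = pos.numerator // pos.denominator  # whole input frames already passed
    return base, pos - base

# One output frame must be ready every 1/60 s; that period acts as the deadline.
frame_deadline_ms = 1000 / 60

for k in range(6):
    print(k, source_position(k))
```

Five output frames span exactly two input frames, so the pattern of phases repeats every five output frames.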
Example: High quality video & real time • TV companies invest heavily in video enhancement, e.g. temporal up-scaling • Input stream: 24 Hz (movie) • Deadline miss leads to “wrong” picture. • Deadline misses tend to come in bursts (heavy load). • Valuable work may be lost. [Figure: original, up-scaled and displayed frame sequences]
Real-time systems • Guaranteeing timeliness requirements: • real-time tasks with timing constraints • scheduling of tasks • Fixed-priority scheduling (FPS) is the de-facto standard for scheduling in real-time systems. • FPS: supported by • commercially available RTOS; • analytic and synthetic methods.
Recap of FPS • Fixed Priority Pre-emptive Scheduling (FPPS) • A basic scheduling model • Analysis • Example • Optimality of RMS and DMS
FPPS: A basic scheduling model • Single processor • Set of n independent, periodic tasks $\tau_1, \dots, \tau_n$ • Tasks are assigned fixed priorities, and can be pre-empted instantaneously. • Scheduling: At any moment in time, the processor is used to execute the highest priority task that has work pending.
FPPS: A basic scheduling model • Task characteristics: • period T, • (worst-case) computation time C, • (relative) deadline D. • Assumptions: • non-idling; • context switching and scheduling overhead is ignored; • execution of releases in order of arrival; • deadlines are hard, and $D \le T$; • $\tau_1$ has highest and $\tau_n$ has lowest priority. • No data-dependencies between tasks
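The scheduling rule and assumptions above can be sketched as a small discrete-time simulation: in every time unit the processor runs the highest-priority task that has work pending, and pre-emption costs nothing. The function name `simulate_fpps` and the task parameters are assumptions made for illustration; this particular task set was picked because it yields response times of 3, 17 and 56.

```python
def simulate_fpps(tasks, horizon):
    """Simulate fixed-priority pre-emptive scheduling in unit time steps.

    tasks: list of (period, wcet); index 0 has the highest priority.
    All tasks are released together at t = 0 and always run for their
    worst-case computation time. Returns the largest response time
    observed per task for jobs that finish before `horizon`.
    """
    n = len(tasks)
    pending = [[] for _ in range(n)]  # per task: FIFO of [release, remaining]
    worst = [0] * n
    for t in range(horizon):
        # Release new jobs at the start of the time unit.
        for i, (period, wcet) in enumerate(tasks):
            if t % period == 0:
                pending[i].append([t, wcet])
        # Run the highest-priority task with pending work for one unit.
        for i in range(n):
            if pending[i]:
                job = pending[i][0]
                job[1] -= 1
                if job[1] == 0:
                    worst[i] = max(worst[i], t + 1 - job[0])
                    pending[i].pop(0)
                break
    return worst

# Hypothetical task set (period, worst-case computation time).
tasks = [(10, 3), (19, 11), (56, 5)]
print(simulate_fpps(tasks, 60))  # [3, 17, 56]
```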
FPPS: Example • Worst-case response time WR for task $\tau_3$: first point in time that $\tau_1$, $\tau_2$, and $\tau_3$ are finished • $WR_1 = 3$, $WR_2 = 17$, $WR_3 = 56$ [Figure: time line from 0 to 60 showing the jobs of Task 1, Task 2 and Task 3]
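FPPS: Analysis • Schedulable iff: $WR_i \le D_i$ for $1 \le i \le n$ • Necessary condition: $U = \sum_{i=1}^{n} C_i / T_i \le 1$ • Sufficient condition for RMS: $U \le LL(n) = n(2^{1/n} - 1)$, where RMS means $r_i > r_j$ iff $T_i < T_j$ (with $r_i$ the priority of $\tau_i$) and $D_i = T_i$.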
FPPS: Analysis • Otherwise, • i.e. $U \le 1$ and not RMS, or • $n(2^{1/n} - 1) < U < 1$ and RMS, • an exact condition is needed: • Critical instant: simultaneous release of $\tau_i$ with all higher priority tasks • $WR_i$ is the smallest positive solution of $x = C_i + \sum_{j < i} \lceil x / T_j \rceil \, C_j$
FPPS: Example • Task set Γ consisting of 3 tasks: • Notes: • RM priority assignment and $D_i = T_i$ (RMS); • $U_1 + U_2 + U_3 = 0.97 \le 1$, hence Γ could be schedulable; • Utilization bound: $U(n) \le LL(n) = n(2^{1/n} - 1)$: • $U_1 + U_2 = 0.88 > LL(2) \approx 0.83$, • therefore $U(3) > LL(3)$, hence another test required.
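The utilization checks in these notes can be reproduced with a few lines of Python. The task parameters are an assumption: the original task table is not available here, and these values were chosen only because they give the stated utilizations of 0.88 and 0.97. The names `liu_layland_bound`, `utilization` and `classify` are illustrative.

```python
def liu_layland_bound(n):
    """LL(n) = n * (2^(1/n) - 1), the sufficient bound for RMS."""
    return n * (2 ** (1 / n) - 1)

def utilization(tasks):
    """Total utilization U = sum of C_i / T_i for (period, wcet) pairs."""
    return sum(wcet / period for period, wcet in tasks)

def classify(tasks):
    """Apply the necessary test, then the sufficient test for RMS."""
    u = utilization(tasks)
    if u > 1:
        return "not schedulable"
    if u <= liu_layland_bound(len(tasks)):
        return "schedulable under RMS"
    return "inconclusive"  # an exact response-time test is required

# Hypothetical task set (period, worst-case computation time).
tasks = [(10, 3), (19, 11), (56, 5)]
print(round(utilization(tasks[:2]), 2), round(liu_layland_bound(2), 2))  # 0.88 0.83
print(round(utilization(tasks), 2), classify(tasks))  # 0.97 inconclusive
```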
FPPS: Example • Time line • $WR_1 = 3$, $WR_2 = 17$, $WR_3 = 56$ [Figure: time line from 0 to 60 showing the jobs of Task 1, Task 2 and Task 3]
FPPS: Optimality of RMS and DMS • Priority assignment policies: • Rate Monotonic (RM): $r_i > r_j$ iff $T_i < T_j$ • Deadline Monotonic (DM): $r_i > r_j$ iff $D_i < D_j$ • Under arbitrary phasing: • RMS is optimal among FPS when $D_i = T_i$; • DMS is optimal among FPS when $D_i \le T_i$, • where optimal means: if an FPS algorithm can schedule the task set, so can RMS/DMS.
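Both policies amount to sorting the task set, as the short sketch below shows. The `Task` record and the example values are hypothetical; ties between equal periods or deadlines are left in input order, which is a choice made here and not something the policies prescribe.

```python
from collections import namedtuple

Task = namedtuple("Task", "name period deadline wcet")

def rate_monotonic(tasks):
    """Highest priority first: shorter period means higher priority."""
    return sorted(tasks, key=lambda t: t.period)

def deadline_monotonic(tasks):
    """Highest priority first: shorter relative deadline means higher priority."""
    return sorted(tasks, key=lambda t: t.deadline)

# Hypothetical task set with D <= T for every task.
tasks = [
    Task("a", period=20, deadline=8, wcet=2),
    Task("b", period=10, deadline=10, wcet=3),
    Task("c", period=50, deadline=50, wcet=5),
]

print([t.name for t in rate_monotonic(tasks)])      # ['b', 'a', 'c']
print([t.name for t in deadline_monotonic(tasks)])  # ['a', 'b', 'c']
```

When every deadline equals its period, the two orderings coincide.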
FPPS not suitable for multimedia multiprocessor!! Assumptions: • context switching and scheduling overhead is ignored: no longer true • deadlines are hard, and $D \le T$: no longer true • $\tau_1$ has highest and $\tau_n$ has lowest priority: no priorities • No data-dependencies between tasks: not true • Single processor: not true
Task Non-Preemptive Systems (Akash Kumar) • State-space needed is smaller • Lower implementation cost • Less overhead at run-time • Cache pollution, memory size
Why FPS doesn’t work for “future” high-performance platforms • Heavy-duty DSPs: preemption not supported • If it were: context-switching overhead is significant • Data-dependencies not taken into account • Multi-processor
Related Research – Feasibility Analysis • Preemptive: [Liu, Layland, 1973] • Non-preemptive: [Jeffay, 1991] • Homogeneous MPSoC: [Baruah, 2006] • Heterogeneous MPSoC: [ , 2020??] [Figure: tasks A, B, C, D mapped onto processors P1 to P6]
Unpredictability – Variation in Execution Time [Figure: applications A and B running on processors P1, P2, P3 with execution times of 49 and 50]
Problem • No good techniques exist to analyze and schedule applications on non-preemptive heterogeneous systems • Resource Manager proposed to schedule applications such that they meet their performance requirements on non-preemptive heterogeneous systems
Our Assumptions • Heterogeneous MPSoC • Applications modeled as SDF • Non-preemptive system – tasks cannot be stopped • Jobs can be suspended • A lot of dynamism in the system • Jobs arriving and leaving at run-time • Variation in execution time • Very simple arbiter at cores [Figure: a job made up of tasks A2, B2, C2, D2]
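Resource Manager • Application QoS Manager: application level, few sec • Resource Manager: reconfigure to meet above quality, milliseconds • Local Processor Arbiter: task level, micro sec [Figure: control hierarchy from the Application QoS Manager through the Resource Manager down to the local arbiter of each core, shown for applications A and B]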
Architecture Description • Computation resources available are described • Each processor can have a different arbiter • In this model a First Come First Serve mechanism is used • Resource manager can configure/control the local arbiters to regulate the progress of applications if needed [Figure: Resource Manager connected to the local arbiters of P1, P2, P3]
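A First Come First Serve arbiter on a non-preemptive processor can be sketched in a few lines: requests are served strictly in arrival order and each one runs to completion. The function name `fcfs_schedule` and the request values are assumptions for illustration, not part of the described architecture.

```python
def fcfs_schedule(requests):
    """Serve requests in arrival order without pre-emption.

    requests: list of (arrival_time, name, exec_time).
    Returns a list of (name, start, finish).
    """
    time = 0
    schedule = []
    # sorted() is stable, so requests arriving together keep their input order.
    for arrival, name, exec_time in sorted(requests, key=lambda r: r[0]):
        start = max(time, arrival)  # wait until the processor is free
        finish = start + exec_time  # runs to completion once started
        schedule.append((name, start, finish))
        time = finish
    return schedule

# Hypothetical requests from two applications sharing one processor.
requests = [(0, "A", 50), (10, "B", 49), (20, "A", 50)]
print(fcfs_schedule(requests))
# [('A', 0, 50), ('B', 50, 99), ('A', 99, 149)]
```

The waiting time of B depends entirely on what arrived before it, which is one source of the unpredictability that a resource manager has to compensate for.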
Resource Manager • Responsible for two main things • Admission control • Incoming application specifies throughput requirement • Execution-time and mapping of each actor • Repetition vector used to compute expected utilization • Resource manager (RM) checks whether enough resources are present • Allocates resources to applications if admitted
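The admission-control steps above can be sketched as follows. The data layout, the class and function names, the capacity of 1.0 per processor and all numbers are assumptions for illustration. The expected utilization of an actor is taken here as throughput × repetition count × execution time.

```python
def expected_utilization(app):
    """Per-processor load requested by one application.

    app: {"throughput": iterations per time unit,
          "actors": [(processor, exec_time, repetitions), ...]}
    """
    load = {}
    for proc, exec_time, reps in app["actors"]:
        load[proc] = load.get(proc, 0.0) + app["throughput"] * reps * exec_time
    return load

class ResourceManager:
    def __init__(self, processors):
        self.allocated = {p: 0.0 for p in processors}

    def admit(self, app):
        """Admit and allocate only if every processor stays within capacity."""
        load = expected_utilization(app)
        for proc, u in load.items():
            if proc not in self.allocated or self.allocated[proc] + u > 1.0:
                return False  # resource requirement exceeded
        for proc, u in load.items():
            self.allocated[proc] += u
        return True

# Hypothetical applications; each actor is (processor, exec_time, repetitions).
video = {"throughput": 0.01, "actors": [("P1", 30, 2)]}
mp3 = {"throughput": 0.01, "actors": [("P1", 30, 1), ("P2", 40, 1)]}
sms = {"throughput": 0.01, "actors": [("P1", 20, 1)]}

rm = ResourceManager(["P1", "P2", "P3"])
results = [rm.admit(video), rm.admit(mp3), rm.admit(sms)]
print(results)  # [True, True, False]
```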
Admission Control [Figure: Video Conf, Play MP3 and Typing SMS applications requesting processors P1, P2, P3; resource requirement exceeded!]
Resource Manager • Admission control • Budget enforcement • When running, each application signals RM when it completes an iteration • RM keeps track of each application’s progress • Operation modes • ‘Polling’ mode • ‘Interrupt’ mode • Suspends application if needed
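A minimal sketch of budget enforcement in polling mode is given below, under stated assumptions: applications report each completed iteration, and at every poll the resource manager suspends any application that is ahead of its required throughput and resumes it once it no longer is. The class name `BudgetEnforcer`, its methods and this suspension rule are hypothetical.

```python
class BudgetEnforcer:
    def __init__(self, required_throughput):
        # required_throughput: {application: iterations per time unit}
        self.required = dict(required_throughput)
        self.completed = {app: 0 for app in self.required}
        self.suspended = set()

    def iteration_done(self, app):
        """Signal from an application that it completed one iteration."""
        self.completed[app] += 1

    def poll(self, now):
        """Compare progress with the requirement at time `now`."""
        for app, rate in self.required.items():
            if self.completed[app] > rate * now:
                self.suspended.add(app)      # better than required: hold back
            else:
                self.suspended.discard(app)  # at or below requirement: let it run
        return set(self.suspended)

enforcer = BudgetEnforcer({"A": 0.1, "B": 0.1})
for _ in range(15):
    enforcer.iteration_done("A")
for _ in range(8):
    enforcer.iteration_done("B")

first = enforcer.poll(now=100)   # A has 15 > 10 iterations, B has 8
second = enforcer.poll(now=200)  # A made no progress while suspended
print(first, second)  # {'A'} set()
```

A longer polling interval lowers the overhead of the resource manager but lets applications drift further from their required rate between polls.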
Budget Enforcement (Polling) [Figure: Resource Manager time line with the annotations: new job enters, performance goes down, job suspended, better than required, job resumed]
Experiments • A high-level simulation model developed • POOSL – a parallel simulation language used • A protocol for communication defined • System verified with a number of application SDF models • Case study done with H263 and JPEG application models • Impact of varying ‘polling’ interval studied