Operating System Requirements for Embedded Systems Rabi Mahapatra
Complexity trends in OS • OS functionality drives the complexity • [Figure: devices placed along an OS-complexity axis: small controllers, sensors, home appliances, mobile phones, PDAs, game machines, router]
Requirements of EOS • Memory resident: size is an important consideration • Data structures optimized • Kernel optimized, usually written in assembly language • Support for signaling and interrupts • Real-time scheduling • Tightly coupled scheduler and interrupts • Power management capabilities • Power-aware scheduling • Control of non-processor resources
Embedded OS Design approach • Traditional OS: monolithic or distributed • Embedded: layered is the key (Constantine D. P, UIUC 2000) • Layers shown on the slide: Power Management, Basic Loader, Interrupt & Signalling, Real-Time Scheduler, Memory Management, Custom Device Support, Networking Support
Power Management by OS • Static approach: relies on preset parameters • Example: switch off power to devices that have been idle for a pre-calculated number of cycles. Used in laptops now. • Dynamic approach: based on the dynamic condition of workloads and on the specified power optimization guidelines • Example: restrict multitasking progressively, reduce context switching, even avoid cache/memory accesses, etc.
Popular E-OS • WinCE (proprietary, optimized assembly) • VxWorks • Micro Linux • MuCOS • Java Virtual Machine (Picojava) OS • Most likely first open EOS!
Interrupts • Each device has a 1-bit "arm" register that software sets if interrupts from the device are to be accepted. • The CCR is used to program the interrupts • A good design should allow the number of devices that can issue interrupts, and the number of ISRs, to be extended. • Interrupts are either polled or vectored, depending on the nature of the processor and I/O devices. • Polling: dedicated controllers, periodic data acquisition, and slow I/O devices • Interrupts: real-time environments, where events are unpredictable and asynchronous
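The arm bit and vectored dispatch described on this slide can be illustrated with a small software model. This is only a sketch: the class `InterruptController` and its method names are hypothetical, and real hardware implements the arm bit and vector table in registers rather than in Python dictionaries.

```python
class InterruptController:
    """Toy model: one arm bit per device plus a vector table of ISRs."""

    def __init__(self):
        self.armed = {}    # device id -> arm bit (True = interrupts accepted)
        self.vectors = {}  # device id -> ISR callable
        self.pending = []  # raised but not yet serviced device ids

    def register(self, device, isr):
        # Devices and ISRs can be added at any time (extensibility).
        self.vectors[device] = isr
        self.armed[device] = False

    def arm(self, device, on=True):
        # Software sets the arm bit to accept interrupts from this device.
        self.armed[device] = on

    def raise_irq(self, device):
        # A request from a disarmed (or unknown) device is ignored.
        if self.armed.get(device, False):
            self.pending.append(device)

    def dispatch(self):
        # Vectored dispatch: call the ISR registered for each pending device.
        results = []
        while self.pending:
            device = self.pending.pop(0)
            results.append(self.vectors[device]())
        return results


ic = InterruptController()
ic.register("uart", lambda: "uart serviced")
ic.register("timer", lambda: "timer serviced")
ic.arm("timer")
ic.raise_irq("uart")   # ignored: uart is not armed
ic.raise_irq("timer")
serviced = ic.dispatch()
```

A polled design would instead loop over all devices and ask each one whether it needs service, which is acceptable when devices are slow and periodic.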
Direct Memory Access • DMA is used when low latency and/or high bandwidth is required (disk I/O, video output, or low-latency data acquisition). • Software DMA: starts with a normal interrupt; the ISR sets the device registers and initiates the I/O, the processor returns to normal operation, and on completion of the I/O the device informs the processor. • Hardware DMA: the above sequence is implemented in hardware • Burst DMA: used when the I/O device (e.g., disk) has buffers • Low-latency asynchronous I/O cannot use burst DMA.
Real-Time Scheduling • Interrupts are heavily used in scheduling when real-time events must be completed by some deadline. • Events, threads, tasks, or processes need to support priority, deadline, blocking, restoring, and nesting • The general problem is NP-hard, with no efficient optimal solution known. • Greedy heuristics are proposed as working solutions under some assumptions. • Dynamic RT scheduling: use greedy heuristics together with priority-based interrupts.
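As one illustration of a greedy heuristic, the sketch below uses earliest-deadline-first ordering. The slides do not name a specific heuristic, so EDF is a choice made here for illustration. It assumes a single processor, no preemption, and all tasks ready at time 0; the function name `edf_schedule` and the task tuples are hypothetical.

```python
def edf_schedule(tasks):
    """Greedy earliest-deadline-first ordering.

    tasks: list of (name, exec_time, deadline), all ready at time 0.
    Returns (order, missed): the run order and the tasks that finish late.
    """
    order, missed, clock = [], [], 0
    # Greedy choice: always run the task with the nearest deadline next.
    for name, exec_time, deadline in sorted(tasks, key=lambda t: t[2]):
        clock += exec_time
        order.append(name)
        if clock > deadline:
            missed.append(name)
    return order, missed


order, missed = edf_schedule([("log", 4, 20), ("sensor", 2, 3), ("motor", 3, 6)])
```

Here the order is sensor, motor, log, and no deadline is missed. With release times, blocking, or precedence constraints added, this simple greedy ordering is no longer guaranteed to find a feasible schedule.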
OS directed power reduction • Dynamic power management: determine the power state of a device based on the current workload, and move through the power transitions according to a shutdown policy • Usually, instead of simply powering off/on, there are dynamic voltage settings and variable clock speeds => multiple power states • Previous work: • Shut down a device if it has been idle long enough • Hardware-centric => observe past requests at the device to predict future idleness; no OS information, no study of the characteristics of requesters • Use a stochastic model and assume requests arrive at random from a single requester, without distinguishing their sources
OS directed power reduction • Disk request sources: compiler, text editor, ftp program • Network card: Internet browser or telnet session? • It is important to have an accurate model of requesters in a concurrent environment: Task-Based Power Management (TBPM), a software-centric approach • Two methods to reduce power: adjusting the CPU clock speed and using sleeping states
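Process states • [Figure: process state diagram. States: new, ready, running, waiting, terminated. Transitions: new -> ready (admitted); ready -> running (scheduler dispatch); running -> ready (interrupt); running -> waiting (I/O or event wait); waiting -> ready (I/O or event completion); running -> terminated (exit)]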
TBPM's supplement on device drivers • Four problems: • Requests are generated by multiple tasks. TBPM uses knowledge from the OS kernel to separate tasks • Tasks are created, executed, and terminated. (The device driver (DD) has no knowledge of multiple tasks and their termination) • Tasks have different characteristics in device utilization. • A task can generate requests only while running. TBPM considers the CPU time of tasks when deciding the power states • Data structures: • Device-requester utilization matrix U(d, r): utilization of device d by requester r • Processor utilization vector P(r): percentage of processor time used by requester r
Updating U, P • U matrix example:

        gcc    emacs   netscape
HDD     12     0.4     0.7
NIC     0      0       2.3

• Each matrix element is the reciprocal of the average time between requests (TBR): $TBR_n = \alpha \cdot TBR + (1 - \alpha) \cdot TBR_{n-1}$, $U(d,r) = 1 / TBR_n$, $0 < \alpha < 1$ • In the limiting cases: if $\alpha = 0$, $TBR_n$ stays constant at the first TBR, and if $\alpha = 1$, $TBR_n$ is the last TBR.
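The discounted TBR update for a single matrix element U(d, r) can be sketched as follows. The class name `RequesterStats`, the timestamps, and the weighting value `alpha = 0.5` are assumptions for illustration; the slides do not give a value for the weighting factor.

```python
class RequesterStats:
    """Tracks the discounted average time between requests (TBR)
    for one (device, requester) pair."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha        # weight of the newest TBR sample (assumed value)
        self.tbr_avg = None       # discounted average TBR_n
        self.last_request = None  # time of the previous request

    def on_request(self, now):
        if self.last_request is not None:
            tbr = now - self.last_request
            if self.tbr_avg is None:
                self.tbr_avg = tbr  # the first sample initializes the average
            else:
                # TBR_n = alpha * TBR + (1 - alpha) * TBR_{n-1}
                self.tbr_avg = self.alpha * tbr + (1 - self.alpha) * self.tbr_avg
        self.last_request = now

    def utilization(self):
        # U(d, r) is the reciprocal of the average TBR; zero until it is known.
        if not self.tbr_avg:
            return 0.0
        return 1.0 / self.tbr_avg


stats = RequesterStats(alpha=0.5)
for t in (0.0, 2.0, 3.0):  # request times in seconds
    stats.on_request(t)
u = stats.utilization()    # average TBR is 0.5*1 + 0.5*2 = 1.5, so U = 1/1.5
```

A larger `alpha` makes the estimate follow recent bursts more closely, while a smaller one smooths them out.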
Updating U, P • P(r) is the percentage of CPU time spent executing task r: P(r) = CPU time(r) / CPU time of all requesters • It is updated with a sliding-window scheme rather than the discounted scheme used for U. • In the case of I/O-bound, bursty requests, TBR will show high utilization but cannot capture the running-time requirements • A sliding window is used to compute how CPU time is distributed among processes. The window must be long enough to sample all processes yet short enough to reflect workload variation.
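A sliding-window estimate of P(r) might look like the sketch below. The class `CpuShareWindow`, the window length, and the recorded time slices are hypothetical; a kernel implementation would hook the scheduler rather than take explicit `record` calls.

```python
from collections import deque


class CpuShareWindow:
    """Sliding-window estimate of P(r): the share of CPU time used by requester r."""

    def __init__(self, window):
        self.window = window   # window length, in the same units as timestamps
        self.slices = deque()  # (end_time, requester, cpu_time)

    def record(self, end_time, requester, cpu_time):
        self.slices.append((end_time, requester, cpu_time))
        # Drop time slices that ended before the start of the window.
        while self.slices and self.slices[0][0] <= end_time - self.window:
            self.slices.popleft()

    def share(self, requester):
        total = sum(c for _, _, c in self.slices)
        if total == 0:
            return 0.0
        mine = sum(c for _, r, c in self.slices if r == requester)
        return mine / total


w = CpuShareWindow(window=10)
w.record(2, "gcc", 2)
w.record(5, "emacs", 3)
w.record(13, "gcc", 1)  # the slice that ended at t=2 now falls outside the window
p_gcc = w.share("gcc")      # 1 / (3 + 1)
p_emacs = w.share("emacs")  # 3 / (3 + 1)
```

Unlike the discounted scheme used for U, old samples here drop out entirely once they leave the window.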
Shutdown condition • Break-even time: the minimum length of an idle period for which shutting down saves energy • Depends on device characteristics • Independent of workloads • Performance consideration • Interactive system: if many shutdowns are issued in a short time, response time increases => degraded "perceived interactivity" • The user might react to obtain a response, causing a steep increase in system load. • Do not allow two consecutive shutdowns within a set interval (say, the "time to wake up")
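The slides do not give a formula for the break-even time. One common formulation, used here as an assumption, equates the energy of staying in the working state for an idle period T with the energy of a shutdown/wake-up cycle plus sleeping for the remainder of T. All parameter names and numbers below are hypothetical.

```python
def break_even_time(p_work, p_sleep, e_transition, t_transition):
    """Shortest idle period for which shutting down saves energy.

    p_work, p_sleep: power (W) in the working and sleeping states.
    e_transition: energy (J) of one shutdown plus wake-up.
    t_transition: time (s) of one shutdown plus wake-up.
    Solves p_work * T = e_transition + p_sleep * (T - t_transition) for T.
    """
    t_energy = (e_transition - p_sleep * t_transition) / (p_work - p_sleep)
    # The idle period must at least cover the transition itself.
    return max(t_transition, t_energy)


def should_shut_down(predicted_idle, t_be, now, last_shutdown, min_gap):
    # Shut down only if the idle period is long enough AND the previous
    # shutdown is not too recent (protects perceived interactivity).
    return predicted_idle >= t_be and (now - last_shutdown) >= min_gap


t_be = break_even_time(p_work=2.0, p_sleep=0.5, e_transition=11.0, t_transition=4.0)
too_soon = should_shut_down(8.0, t_be, now=100.0, last_shutdown=99.0, min_gap=4.0)
allowed = should_shut_down(8.0, t_be, now=100.0, last_shutdown=90.0, min_gap=4.0)
```

The inputs to `break_even_time` are all device parameters, which is why the break-even time is independent of the workload.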
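TBPM Procedure • Integration of power management with process management • [Figure: process state diagram (new, ready, running, waiting, terminated) annotated with TBPM actions: "Allocate column" at new, "Delete column" at terminated, "Update P(r)" (twice) and "Update U(d,r)" on the transitions among ready, running, and waiting]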
TBPM procedure • A requester column is allocated when a new task is created, and the column is deleted when the task terminates • Utilization is initially set to zero and is updated whenever a request is issued. The PM evaluates the utilization in the process scheduler. • In a lightly loaded system: • Sparse requests will not cause the PM to keep a device in the working state for long, since P(r) is small for this requester • With a heavy workload: • A requester that does not use a device frequently does not keep it on: the PM shuts the device down after its use (U(d) is small)
Experiments • Platform: • Personal computer, TBPM in the Linux kernel, Red Hat 6.0 • Goal: control the power states of the HD and the (wireless) network transmitter • The kernel and device drivers were modified on a PC with X Window and networking (NW), configured as a client; server daemons (HTTP server, Internet news server) are turned off, and cron tasks are scheduled at low frequencies. Power state changes in the HD and NIC are emulated with two states: working and sleeping • To compare with other PM policies, power state changes were emulated without actually setting the hardware power state. A set of variables recorded the device statistics: the number of shutdowns and wake-ups under the various policies. • See Table 1 for hardware parameters
Experimental Results • Other PM policies compared: • Exponential regression relationship between two adjacent idle periods • Event-driven semi-Markov model • Policy that sets the timeout value to Tbe (the break-even time) • Timeouts of one and two minutes • At least 10 hours of workload running • Table 2 shows the comparison results • Ts: time in the sleeping state, Nd: number of shutdowns, St: longest sequence that cause delay ever 30 sec, Pa: average power in W, R: power consumption relative to TBPM.
Dynamic Voltage Scaling in processors • Processor usage model: • Compute-intensive: uses full throughput • Low-speed: needs only a fraction of full throughput; fast processing is not required • Idle • [Figure: desired throughput over time, relative to maximum processor speed. Labels: compute-intensive and short-latency processes; background and long-latency processes; system idle]
Why DVS • Design objective of a processor: provide the highest possible peak throughput for compute-intensive tasks while maximizing battery life for remaining low-speed and idle periods • Common Power saving technique: Reduce clock frequency during non-compute-intensive activity. • This reduces power but not the total energy consumption per task, since energy consumption is frequency independent to a first order approximation. • Conversely, reducing voltage improves the energy efficiency, but compromises peak throughput
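The first-order argument on this slide can be checked numerically. The sketch assumes the usual dynamic switching-energy model, in which energy per cycle is proportional to C * Vdd^2. This model is an assumption not stated on the slide, and the capacitance, voltages, and cycle count are made-up values.

```python
def task_energy(c_eff, vdd, cycles):
    # First-order dynamic energy: each cycle costs c_eff * vdd**2 joules.
    # Clock frequency does not appear in this expression.
    return c_eff * vdd ** 2 * cycles


def task_time(cycles, freq):
    return cycles / freq


C_EFF = 1e-9        # effective switched capacitance in farads (made-up value)
CYCLES = 1_000_000  # cycles needed by the task (made-up value)

# Halving the clock at a fixed voltage: same energy, twice the run time.
e_fast = task_energy(C_EFF, 3.3, CYCLES)
e_slow_same_v = task_energy(C_EFF, 3.3, CYCLES)
t_fast = task_time(CYCLES, 100e6)
t_slow = task_time(CYCLES, 50e6)

# Lowering the voltage as well (possible at the lower clock): energy drops.
e_slow_low_v = task_energy(C_EFF, 1.65, CYCLES)
ratio = e_fast / e_slow_low_v  # about 4, since energy scales with vdd**2
```

Frequency scaling alone only stretches the task in time; the energy saving comes from the lower supply voltage that a lower frequency permits.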
DVS • If both clock frequency and voltage are varied dynamically in response to computational load, the energy consumed per task can be reduced during periods of low computational demand while peak throughput is retained when required • This strategy, which achieves the highest possible energy efficiency for a time-varying computational load, is called DVS.
DVS overview • Key components: • an OS that can intelligently vary the processor speed, • a regulation loop that generates the minimum voltage required for the desired frequency, • a processor that can operate over a wide voltage range • Circuit characteristics? (F – V) • HW or SW control of processor speed? • SW, since HW cannot know whether the instruction being executed is part of a compute-intensive task! • Control from the application program? • No: an application cannot set the processor speed because it is unaware of other tasks. But applications can give useful information about their requirements.
DVS • As frequency varies, Vdd must vary in order to optimize energy consumption. But the SW is not aware of the minimum required supply voltage for a given speed; it is a function of the HW implementation, process variation, and temperature • A ring oscillator provides this translation • [Figure: DVS system block diagram. Labels: CPU, ARM8, 16KB cache, Co-proc, Write buffer, 64KB SRAM, I/O chip, bus, 3.3V, VCO, regulator, Fdesired, Vdd, Vbattery]
DVS • The transition time and transition energy must be known in order to determine the cost of interrupt and wake-up latency. • Voltage scheduler as a new OS component: • Controls the processor speed by writing the desired f to the system's control register, which the regulation loop uses to adjust the voltage and frequency • Operates the processor at the minimum throughput level required by the current tasks, minimizing energy consumption • Note: determining the optimal frequency and scheduling jobs are independent of each other. • Hence, the voltage scheduler can be retrofitted to the OS.
Voltage Scheduling Algorithm • Determines the optimal clock frequency by combining the computation requirements of all active tasks in the system, and ensures that latency requirements are met given the task ordering of the temporal scheduler. • The multiple-task case is complex. It involves predicting the workload, with the prediction updated by the VS at the end of each task. • Research issue!
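For the simplified case where each task's cycle count is known exactly, the lowest constant clock frequency that meets every deadline in a given task order can be sketched as below. This is not the algorithm from the slides, which also involves workload prediction. The function `min_frequency`, the task list, and the units (cycles, and deadlines in milliseconds from now) are assumptions.

```python
def min_frequency(tasks):
    """Lowest constant frequency (cycles per ms) that meets every deadline.

    tasks: list of (cycles, deadline_ms) in the order fixed by the
    temporal scheduler; deadlines are measured from the current time.
    """
    freq = 0.0
    cycles_so_far = 0
    for cycles, deadline_ms in tasks:
        # Everything up to and including this task must finish by its deadline.
        cycles_so_far += cycles
        freq = max(freq, cycles_so_far / deadline_ms)
    return freq


# Hypothetical task set, in scheduler order.
f_required = min_frequency([(2_000_000, 10), (1_000_000, 40), (6_000_000, 50)])
```

In this example the first task dominates: 2,000,000 cycles in 10 ms requires 200,000 cycles per ms (200 MHz), which also covers the later deadlines. The frequency is derived from a task order that is already fixed, which is consistent with the point that the voltage scheduler can be separate from the temporal scheduler.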