EMERALDS: a small-memory real-time microkernel

Explore the features and benefits of EMERALDS, a small-memory real-time microkernel for embedded applications. This paper covers task scheduling, efficient semaphore implementation, intertask communication, memory protection, and system calls.


Presentation Transcript


  1. EMERALDS: a small-memory real-time microkernel Khawar M. Zuberi, Padmanabhan Pillai, and Kang G. Shin University of Michigan September 22, 2005 Seo, Dongmahn

  2. Details of the Recent Paper EMERALDS: A Small-Memory Real-Time Microkernel Khawar M. Zuberi, Microsoft Corp. Kang G. Shin, University of Michigan IEEE Trans. on Software Engineering, vol 27, no. 10, pp. 909-928, October 2001

  3. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  4. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  5. Introduction • Real-time computing systems • predictability • real-time operating system (RTOS) • real-time tasks, deadlines • lower-priority tasks and communication activities must not block higher-priority tasks • wide variety of real-time environments • real-time applications • from multimedia to industrial automation control • hardware • from single-board computers to distributed systems to multiprocessors

  6. Introduction (cont) • RTOSs • Commercial RTOSs • pSOS • QNX • VxWorks • Research RTOSs • for multiprocessors • HARTOS • Spring Kernel • for distributed platforms • Harmony • RT-Mach

  7. Introduction (cont) • real-time computing today • no longer limited to high-powered, expensive applications • slow processors with tens of kilobytes of memory • slow fieldbus networks : 1~2 Mbit/s bandwidth • 2 main reasons for using such restricted hardware • To keep production costs down in mass-produced items such as home & portable electronics and automotive control • automotive engine & ABS controllers, cellular phones, camcorders • To keep weight and power consumption low in avionics and space applications

  8. Introduction (cont) • RTOS kernel • about 20 kbytes • services • task scheduling, system calls, interrupt handling • minimal overheads • EMERALDS • RTOS for small-memory embedded systems • achieving efficiency • relies not on carefully-crafted code but on new OS schemes and algorithms • focus on key OS services • Task scheduling, Semaphores, Intra-node message-passing, Memory protection and system call overhead

  9. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  10. Embedded Application Requirements • target embedded application • use single-chip microcontrollers • with slow processing cores running at 15~25 MHz • Motorola 68332, Intel i960 • all ROM and RAM on-chip, limited to 32~128 kbytes • uniprocessor or distributed, 5~10 nodes • RTOS must provide • Task scheduling • Task synchronization (semaphore) • Task communication (message-passing) • Memory protection • Interaction with external environment (interrupt handling) • Clock and timer services

  11. Task Scheduling • periodic task • 50~100 µs OS scheduling operation on slow processor • 10~20 tasks, 3~5 tasks with period less than 10 ms • 10~15% of CPU time • problems • Schedules must be calculated by hand • difficult, costly to modify • Heuristics can be used, but give non-optimal solutions • Cyclic schedulers give poor response times for high-priority aperiodic tasks • consume significant amounts of memory with short and long period tasks

  12. Task Scheduling (cont) • priority-driven schedulers • rate-monotonic (RM) • earliest-deadline-first (EDF) • no off-line analysis, easily handle changes and aperiodic tasks • 10~15% of the CPU time overhead

  13. Task Synchronization • OOP is ideal for designing real-time software • objects model real-world entities • internal data : physical state of the entity • temperature, pressure, position, RPM, etc. • methods : read or modify state • modeled by objects • sensors, actuators, controllers • real-time software • collection of threads of execution • invoking the methods of various objects • mutual exclusion • Semaphore • acquire & release • new and efficient schemes for implementing semaphore locking in EMERALDS

  14. Task Communication • traditional mechanism, mailbox • 2 major disadvantages • 50~100 µs overhead for each message • several thousand messages per second are needed • no multiple message send • global variables used by application designers • subtle, hard-to-trace bugs in the software • new mechanisms for intertask communication • state message paradigm • protected global variables • optimized basic state message scheme • reduce execution overhead and memory consumption

  15. Memory Protection • providing memory protection requires • maintaining page tables • programming the memory management unit • problem • size of the kernel • additional overhead in several kernel services • in embedded systems • all processes are cooperative and will never try to harm another process • BUT, bugs in application code • TRAP to the kernel and recovery action • software fault-tolerance • in EMERALDS • kernel is mapped into each user-level address space

  16. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  17. Overview of EMERALDS • microkernel RTOS written in the C++ language • EMERALDS’ salient features • Multi-threaded processes • Full memory protection between processes • Threads scheduled by the kernel • IPC based on message-passing and mailboxes, Shared-memory support • Optimized local message passing • Semaphores and condition variables for synchronization; priority inheritance for semaphores • Support for communication protocol stacks • Highly optimized context switching and interrupt handling • Support for user-level device drivers

  18. Overview of EMERALDS (cont) • small-sized kernel • less than 20 kbytes • no file system • only in-memory • no naming services • exchange short, simple messages over fieldbuses • talking directly to network device drivers • no built-in protocol stack • just 13 kbytes of code

  19. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  20. What Makes EMERALDS Different? • general-purpose microkernels • Mach, L3, SPIN • focus on optimizing kernel services • thread management, IPC, virtual memory management • EMERALDS • no virtual memory • different sources of overhead from a GPOS • thread management is the same as in a GPOS • system call • enter protected kernel mode • call kernel procedure • low-overhead transition between user and kernel modes • provide efficient RT scheduling of threads • IPC : inter-node networking at user level • Task synchronization : interested in uniprocessor locking

  21. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  22. Combined Static/Dynamic Scheduler • task scheduler overhead • run-time overhead • the time consumed by execution of scheduler code • schedulability overhead • 1 - U* (U* : ideal schedulable utilization) • EDF : U* = 1, high run-time overhead • RM : U* = 0.80 • static vs. dynamic priority schedulers • dynamic one is better for aperiodic tasks • static one is better for guaranteeing completion of critical tasks under processor overload situations

  23. Run-time Overhead • run-time overhead • parsing queues of tasks • adding/deleting tasks from queues • blocking overhead ∆tb • selection overhead ∆ts • unblocking overhead ∆tu • run-time overhead per task τi = ∆tb + ∆tu + 2∆ts every period • a run-time overhead of • utilization
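The per-task overhead on this slide can be folded into a utilization figure. The following is a minimal sketch, assuming each task is described by an execution time c and a period P (both names and all timing values are illustrative and not taken from the slides), and that every task pays ∆tb + ∆tu + 2∆ts once per period.

```python
def overhead_per_period(dt_b, dt_u, dt_s):
    """Scheduler overhead charged to one task per period:
    one block, one unblock, and two selections (as on the slide)."""
    return dt_b + dt_u + 2 * dt_s


def utilization_with_overhead(tasks, dt_b, dt_u, dt_s):
    """Total CPU utilization when every task's execution time is
    inflated by the scheduler overhead. `tasks` is a list of
    (c, P) pairs in the same time unit as the overheads
    (an assumed representation)."""
    dt = overhead_per_period(dt_b, dt_u, dt_s)
    return sum((c + dt) / p for c, p in tasks)


# Hypothetical workload: times in microseconds.
example_tasks = [(1000, 10000), (2000, 20000)]
example_u = utilization_with_overhead(example_tasks, dt_b=10, dt_u=10, dt_s=15)
```

With these made-up numbers the overhead is 50 µs per period per task, so the two tasks contribute 1050/10000 and 2050/20000 of the CPU.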

  24. Run-time Overhead (cont) • EDF, ∆ts = O(n), twice • RM, ∆tb = O(n), once • ∆ts is less for RM than it is for EDF • especially when n is large (20 or more)

  25. Schedulability Overhead

  26. CSD: a Balance between EDF and RM • the Combined Static/Dynamic (CSD) scheduler • EDF and RM • run-time overhead of CSD is less than that of EDF, a little more than that of RM • 2 queues of tasks • dynamic-priority (DP) queue by EDF • fixed-priority (FP) queue by RM

  27. Run-Time Overhead of CSD • Zero schedulability overhead of CSD • 4 cases for run-time overhead • DP task blocks ∆ts = O(r) ∆tb = O(1) • DP task unblocks ∆ts = O(r) ∆tu = O(1) • FP task blocks ∆ts = O(1) ∆tb = O(n-r) • FP task unblocks ∆ts = O(r) ∆tu = O(1) • total scheduler overhead for CSD • ∆tb + ∆ts_block + ∆tu + ∆ts_unblock per task block/unblock operation • for DP tasks, O(1) + O(r) + O(1) + O(r) = 2O(r) • for FP tasks, O(n-r) + O(1) + O(1) + O(r) = O(n) • significantly less than that of EDF • slightly greater than that of RM
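The four cases above can be illustrated with a small two-queue model. This is only a sketch under assumptions: the class and field names are hypothetical, the DP queue is kept unsorted and scanned for the earliest deadline, and the FP queue is kept sorted by period with an index to the highest-priority ready task. It is not the EMERALDS implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int
    deadline: int = 0     # absolute deadline of the current job (DP tasks)
    ready: bool = False
    fp_index: int = -1    # position in the FP queue; -1 marks a DP task


class CSDScheduler:
    def __init__(self, dp_tasks, fp_tasks):
        self.dp = list(dp_tasks)                             # r tasks, EDF
        self.fp = sorted(fp_tasks, key=lambda t: t.period)   # n - r tasks, RM
        for i, t in enumerate(self.fp):
            t.fp_index = i
        self.highest_fp = None   # index of highest-priority ready FP task

    def unblock(self, task, deadline=0):
        """O(1) for both queues."""
        task.ready = True
        if task.fp_index < 0:
            task.deadline = deadline
        elif self.highest_fp is None or task.fp_index < self.highest_fp:
            self.highest_fp = task.fp_index

    def block(self, task):
        """O(1) for DP tasks; O(n - r) scan when the top FP task blocks."""
        task.ready = False
        if task.fp_index >= 0 and task.fp_index == self.highest_fp:
            self.highest_fp = next(
                (i for i in range(task.fp_index + 1, len(self.fp))
                 if self.fp[i].ready), None)

    def select(self):
        """DP tasks always win over FP tasks: O(r) scan of the DP queue,
        otherwise O(1) lookup of the highest-priority ready FP task."""
        ready_dp = [t for t in self.dp if t.ready]
        if ready_dp:
            return min(ready_dp, key=lambda t: t.deadline)
        if self.highest_fp is not None:
            return self.fp[self.highest_fp]
        return None
```

A short usage run: with DP tasks `a`, `b` and FP tasks `c`, `d`, the scheduler picks the DP task with the earliest deadline while any DP task is ready, and falls back to the shortest-period ready FP task otherwise.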

  28. Schedulability Test • EDF • RM • CSD • start by assuming r = 0 and perform the schedulability test • if successful, then stop, otherwise keep increasing r
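The search for r described above can be sketched as follows. The test equations themselves are not preserved in this transcript, so the sketch substitutes textbook tests as an assumption: the utilization bound U ≤ 1 for the EDF-scheduled DP queue, and time-demand (response-time) analysis for each FP task with every shorter-period task treated as higher priority. Scheduler run-time overhead is ignored, deadlines are assumed equal to periods, and all function names are hypothetical.

```python
from fractions import Fraction


def fp_task_meets_deadline(j, tasks):
    """Time-demand analysis for task j, assuming all of tasks[:j]
    (DP tasks and shorter-period FP tasks) can preempt it.
    `tasks` is a period-sorted list of integer (c, P) pairs."""
    c_j, p_j = tasks[j]
    t = c_j + sum(c for c, _ in tasks[:j])
    while t <= p_j:
        # -(-t // p) is integer ceil(t / p)
        demand = c_j + sum(-(-t // p) * c for c, p in tasks[:j])
        if demand == t:
            return True
        t = demand
    return False


def find_dp_queue_length(tasks):
    """Start with r = 0 and grow the DP queue until the whole task
    set passes; returns r, or None if no split is schedulable."""
    tasks = sorted(tasks, key=lambda cp: cp[1])
    n = len(tasks)
    for r in range(n + 1):
        dp_ok = sum(Fraction(c, p) for c, p in tasks[:r]) <= 1
        fp_ok = all(fp_task_meets_deadline(j, tasks) for j in range(r, n))
        if dp_ok and fp_ok:
            return r
    return None
```

For example, a task set that RM alone can schedule yields r = 0, while a set whose second-shortest-period task fails under RM needs its two shortest-period tasks in the DP queue.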

  29. Reducing Run-Time Overhead of CSD • main advantage of CSD • EDF, good schedulable utilization • by keeping the DP queue short • if workload increases • length of DP queue also increases • degrades performance of CSD • modified CSD • to keep run-time overhead under control • as the number of tasks n increases

  30. Reducing Run-Time Overhead of CSD (cont) • Controlling DP Queue Run-Time Overhead • split DP queue into 2 queues DP1 and DP2 • CSD-3, since using 3 queues • Run-Time Overhead of CSD-3

  31. Reducing Run-Time Overhead of CSD (cont) • Allocating Tasks to DP1 and DP2 • 2 factors • balancing of 2 queues • balancing the run-time overhead and scheduling overhead between queues • exhaustive search to find best possible allocation of tasks to DP1, DP2, and FP • schedulability test O(n²) times for three queues • 2~3 minutes on a 167 MHz Ultra-1 Sun workstation for a workload with 100 tasks

  32. Schedulability Test for CSD-3 • EDF, DP1 • EDF, DP2 • FP

  33. Beyond CSD-3 • can be extended to have 4, 5, …, n queues • the best number of queues • the best number of tasks per queue • computationally-intensive task • the usefulness of the general CSD scheduling framework • beneficial in real systems

  34. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  35. Efficient Semaphore Implementation • providing full semaphore semantics with priority inheritance • optimize implementation of these semaphores • by exploiting certain features of embedded applications

  36. Standard Semaphore Implementation • standard procedure to lock a semaphore if (sem locked) { do priority inheritance; add caller thread to wait queue; block; /* wait for sem to be released */ } lock sem; • EDF • context switch overhead • focus on eliminating one or more context switches • FP • priority inheritance (PI) overhead • focus optimization efforts on the PI operations
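The standard locking procedure above can be modeled in a few lines. This is a simplified sketch, not the EMERALDS code: the `Thread` and `Semaphore` classes are hypothetical, a larger number is assumed to mean a more urgent priority, blocking is represented by a flag rather than a real context switch, and the holder's priority is simply reset to its base value on release (which is only correct when a thread holds one semaphore at a time).

```python
class Thread:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority     # larger value = more urgent (assumption)
        self.blocked = False


class Semaphore:
    def __init__(self):
        self.holder = None
        self.waiters = []

    def acquire(self, caller):
        """Returns True if the caller now holds the semaphore,
        False if it was queued and marked blocked."""
        if self.holder is not None:
            # Priority inheritance: the holder runs at the caller's
            # priority so that it releases the semaphore sooner.
            if caller.priority > self.holder.priority:
                self.holder.priority = caller.priority
            self.waiters.append(caller)
            caller.blocked = True    # stands in for a context switch
            return False
        self.holder = caller
        return True

    def release(self):
        """Undo inheritance and hand the semaphore to the most urgent
        waiter, if any. Returns the new holder or None."""
        self.holder.priority = self.holder.base_priority
        self.holder = None
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            nxt.blocked = False
            self.holder = nxt
        return self.holder
```

In this model, each failed `acquire` corresponds to the context switch and the priority-inheritance steps that the slide identifies as the costs to optimize.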

  37. Implementation in EMERALDS • eliminate context switch • code parser • add an extra parameter • optimize first PI • observation, parsing FP queue • optimize second PI • switch position when inheriting in first PI operation

  38. Applicability of the New Scheme • problems • may miss deadline • context switch is not saved • no benefit comes out of our semaphore scheme • problems can be resolved • Modification to the Semaphore Scheme • check if Semaphore is available or not • special queue associated with Semaphore • block before acquire_sem() • unblock after release_sem()

  39. Applicability of the New Scheme (cont) • Applicability under Various Blocking Situations • 2 types of blocking • Blocking for Internal Events • Blocking for External Events • can be periodic or aperiodic

  40. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  41. State Messages for Intertask Communication • global variables • ideal for sharing information between tasks • subtle bugs in the application code • State message • use global variables to pass messages • managed by code generated automatically by a software tool • mailbox-based message-passing interface • does not replace traditional message-passing • efficient alternative to traditional message-passing

  42. State-Message Semantics • State message • solve single-writer, multiple-reader communication problem • called SMmailboxes • differences of SMmailboxes • associated with writers • only one writer, multiple readers • new message overwrites previous message • reads do not consume messages • non-blocking reads and writes • reduce context switches

  43. Usefulness • message • a later message carries more recent, up-to-date state • each state message is associated with one writer task • a reader task always gets the most recent message • each time without blocking • valid, up-to-date, useful • single-writer, multiple-reader situation • blocking read operations are still necessary • when a task must wait for an event to occur • traditional message-passing and/or semaphores

  44. Previous Work • State messages • were used in MARS OS, ERCOS • half-written message problem • solved by using an N-deep circular buffer for each state message • writer : post message pointer • reader : latest message pointer • memory consumption of large N • reduce N to no more than 5~10 for all possible cases

  45. Implementation of State Messages in EMERALDS • Message • B : maximum number of bytes the CPU can read or write in a single operation • B = 4 bytes • message length L • case of L ≤ B is simple • case of L > B • N-deep circular buffer for each state message • each slot in the buffer is L bytes long • index I • Calculating Buffer Depth : N = max(2, xmax+1) • slow readers : use system call
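The N-deep circular buffer for the L > B case can be sketched as below. This is an illustrative single-writer, multiple-reader model with hypothetical names, not the generated EMERALDS code. The writer fills the next slot and only then publishes the index; readers fetch the index once and copy from that slot without consuming it. The depth formula is taken directly from the slide, where the definition of xmax is not given, so it is treated here as an input.

```python
def buffer_depth(x_max):
    """N = max(2, xmax + 1), as stated on the slide.
    The meaning of x_max is not defined in this transcript."""
    return max(2, x_max + 1)


class StateMessage:
    def __init__(self, depth, initial):
        if depth < 2:
            raise ValueError("buffer depth must be at least 2")
        self.slots = [initial] * depth
        self.index = 0               # index I of the latest complete message

    def write(self, value):
        """Single writer: fill the next slot, then publish it.
        The index update stands in for one atomic word write."""
        nxt = (self.index + 1) % len(self.slots)
        self.slots[nxt] = value
        self.index = nxt

    def read(self):
        """Any number of readers: non-blocking and non-consuming."""
        i = self.index
        return self.slots[i]


sm = StateMessage(buffer_depth(2), initial=b"\x00" * 8)
sm.write(b"rpm=3000")
sm.write(b"rpm=3100")
latest = sm.read()
```

Because the writer never touches the slot that the published index points to, a reader that is preempted mid-copy still sees a complete message, provided the buffer is deep enough that the writer does not wrap around onto that slot before the reader finishes.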

  46. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  47. Memory Protection and System Calls

  48. Memory Protection and System Calls (cont)

  49. Contents • Introduction • Embedded Application Requirements • Overview of EMERALDS • What Makes EMERALDS Different? • Combined Static/Dynamic Scheduler • Efficient Semaphore Implementation • State Messages for Intertask Communication • Memory Protection and System Calls • Performance Evaluation • Conclusions

  50. Performance Evaluation • implemented on the Motorola 68040 processor • 13 kbytes of code size • 25 MHz • 5 MHz on-chip timer • ported to • PowerPC 505 • Hitachi SuperH 2 (SH2) • Motorola 68332 microcontroller • evaluated by • Scientific Research Laboratory • Ford Motor Company • focus on basic OS overhead with 9 commercial RTOSs
