Commercial Real-time Operating Systems – An Introduction

Swaminathan Sivasubramanian, Dependable Computing & Networking Laboratory, swamis@iastate.edu

Outline • Introduction • RTOS Issues and functionalities • LynxOS • QNX/Neutrino • VRTX • VxWorks • Spring Kernel • Distributed RTOS • ARTS • MARS

Commercial and Research RTOS • Commercial RTOSes differ from traditional OSes in that they give more predictability • Used in areas such as: • Embedded systems or industrial control systems • Parallel and distributed systems • E.g. LynxOS, VxWorks, pSOS, Spring, ARTS, Maruti, MARS • Traditionally these systems are classified as uniprocessor, multiprocessor or distributed real-time OSes

RTOS – Issues • Real-Time POSIX API standard compliance • Whether preemptive fixed-priority scheduling is supported • Support for standard synchronization primitives • Support for lightweight real-time threads • APIs used for task handling • Scalability • Footprint of the kernel – how large is the kernel? • Can the kernel be scaled down to fit in the ROM of the system?

RTOS – Issues (contd..) • Modularity • How do services such as I/O, file systems and networking behave? • Can they be added or changed at run-time? • Can a new service be added at run-time? • Type of RTOS kernel • Monolithic kernel – less run-time overhead but not extensible • Microkernel – higher run-time overhead but highly extensible

RTOS – Issues (contd..) • Speed and Efficiency • Run-time overhead – most modern RTOSes are microkernels, but unlike traditional microkernel operating systems they have low overhead • Run-time overhead is decreased by eliminating unnecessary context switches • Important timings such as context switch time, interrupt latency and semaphore latency must be kept to a minimum • System Calls • Non-preemptable portions of kernel functions necessary for mutual exclusion are highly optimized and made short and deterministic

RTOS – Issues (contd..) • Interrupt Handling • Non-preemptable portions of the interrupt handler routines are kept small and deterministic • Interrupt handlers are scheduled and executed at an appropriate priority • Scheduling • Type of scheduling supported – RMS or EDF • Number of priority levels supported – at least 32 to be RT-POSIX compliant; many RTOSes offer 128 to 256 • Type of scheduling for equal-priority threads – FIFO or Round-Robin • Can thread priorities be changed at run-time?
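
The RMS versus EDF choice above can be made concrete with the classic utilization tests. The following is a minimal sketch, assuming independent, preemptable periodic tasks whose deadlines equal their periods; the function names and the sample task set are illustrative only and do not come from any particular RTOS.

```python
def rms_bound(n):
    """Liu-Layland utilization bound for n periodic tasks under RMS."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks):
    """Total utilization of tasks given as (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rms_guaranteed(tasks):
    """Sufficient (not necessary) test: True means RMS meets all deadlines."""
    return utilization(tasks) <= rms_bound(len(tasks))


def edf_feasible(tasks):
    """Exact test for EDF under the assumptions stated above."""
    return utilization(tasks) <= 1.0


# Hypothetical task set: (worst-case execution time, period)
tasks = [(1, 4), (2, 6), (3, 12)]
print(utilization(tasks), rms_bound(len(tasks)))
print(rms_guaranteed(tasks), edf_feasible(tasks))
```

For this task set the utilization is about 0.833, above the three-task RMS bound of about 0.780, so the RMS test is inconclusive while the EDF test passes. A failed RMS bound does not prove the set unschedulable under RMS; it only means a more precise analysis is needed.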

RTOS – Issues (contd..) • Priority Inversion Control • Does it support the Priority Inheritance or Priority Ceiling protocol? • Memory Management • Can provide virtual-to-physical address mapping • Traditionally does not do paging • Networking • Type of networking supported – deterministic network stack or not
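
Priority inheritance is easier to see in a toy model. The sketch below is not any RTOS's implementation: `Task`, `InheritanceMutex` and `pick_next` are hypothetical names, a larger number is assumed to mean a more urgent priority, and only a single mutex is modeled (so restoring the base priority on unlock is sufficient).

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority   # assigned priority
        self.priority = priority        # effective (possibly inherited) priority
        self.blocked_on = None          # mutex this task is waiting for, if any


class InheritanceMutex:
    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if the lock was taken, False if the task blocks."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        task.blocked_on = self
        # The owner inherits the priority of a more urgent waiter.
        if task.priority > self.owner.priority:
            self.owner.priority = task.priority
        return False

    def unlock(self, task):
        if self.owner is not task:
            raise RuntimeError("only the owner may unlock")
        task.priority = task.base_priority   # drop the inherited priority
        self.owner = None
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            nxt.blocked_on = None
            self.owner = nxt


def pick_next(tasks):
    """Fixed-priority dispatch: the most urgent task that is not blocked."""
    ready = [t for t in tasks if t.blocked_on is None]
    return max(ready, key=lambda t: t.priority)


low, med, high = Task("low", 1), Task("med", 2), Task("high", 3)
mutex = InheritanceMutex()
mutex.lock(low)                                # low enters its critical section
mutex.lock(high)                               # high blocks; low inherits priority 3
print(pick_next([low, med, high]).name)        # low runs, med cannot preempt it
mutex.unlock(low)
print(pick_next([low, med, high]).name)        # high now owns the mutex and runs
```

Without the inheritance step in `lock`, the medium-priority task would be dispatched ahead of the lock holder and delay the high-priority task for an unbounded time.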

LynxOS • Microkernel design • The kernel footprint is small • Only 28 kilobytes in size • The small kernel provides essential services in scheduling, interrupt dispatching and synchronization • The other services are provided by lightweight kernel service modules, called Kernel Plug-Ins (KPIs) • New KPIs can be added to the microkernel and can be configured to support I/O, file systems, TCP/IP, streams and sockets • Can function as a multipurpose UNIX OS

LynxOS (contd..) • KPIs are multi-threaded, which means each KPI can create as many threads as it wants • There is no context switch when sending a message to a KPI • For example, when an RFS (Request for Service) message is sent to a File System KPI, this does not require a context switch • Hence run-time overhead is minimal • Further, inter-KPI communication incurs minimal overhead, consuming only a few instructions • LynxOS is a self-hosted system – development can be done on the same system

LynxOS (contd..) • In such a self-hosted system, the OS must be protected from large, memory-consuming applications (compilers, debuggers) • LynxOS offers memory protection through hardware MMUs • Applications make I/O requests to the I/O system through system calls • The kernel directs each I/O request to the device driver • Each device driver has an interrupt handler and a kernel thread

LynxOS (contd..) • The interrupt handler carries out the first step of interrupt handling • If it does not complete the processing, it sets an asynchronous trap to the kernel • Later, when the kernel can respond to the software interrupt, it schedules an instance of the kernel thread to complete the interrupt processing
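
The two-step interrupt handling described above can be sketched as a short handler that defers the remaining work to a kernel thread. This is a toy model, not LynxOS code: the `Kernel` class, its method names and the rule that deferred work runs in priority order are assumptions made for illustration.

```python
import heapq


class Kernel:
    def __init__(self):
        self._pending = []   # deferred work: (-priority, sequence, device, data)
        self._seq = 0
        self.log = []

    def interrupt(self, device, data, priority):
        """First step: do the minimum, then request deferred processing."""
        self.log.append(("handler", device))
        # Stands in for setting the asynchronous trap to the kernel.
        heapq.heappush(self._pending, (-priority, self._seq, device, data))
        self._seq += 1

    def run_kernel_threads(self):
        """Second step: complete processing, most urgent request first."""
        while self._pending:
            _, _, device, data = heapq.heappop(self._pending)
            self.log.append(("thread", device))


kernel = Kernel()
kernel.interrupt("disk", b"block", priority=5)
kernel.interrupt("net", b"packet", priority=9)
kernel.run_kernel_threads()
print(kernel.log)
```

Keeping the handler short bounds the time during which other interrupts are held off, while the bulk of the work competes for the CPU like any other prioritized thread.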

QNX/Neutrino • SMP RTOS – requires high-end, networked SMP machines with GBs of physical memory • Microkernel design – the kernel provides essential thread and real-time services • Other services are treated as resource managers and can be added or removed at run-time • The footprint of the microkernel is 12 KB

QNX/Neutrino (contd..) • QNX is a message-passing operating system • Messages are the basic means of interprocess communication among all threads • Follows a message-based priority tracking feature • Messages are delivered in priority order and the service provider executes at the priority of the highest-priority client waiting for service • So, if the highest-priority task wants to read some data from a file, the file system resource manager will execute at this task's priority
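
Message-based priority tracking can be illustrated with a priority queue. The sketch below does not use the real QNX messaging API; `Channel` and its methods are hypothetical, and a larger number is assumed to mean a more urgent priority.

```python
import heapq
import itertools


class Channel:
    def __init__(self, server_base_priority=0):
        self._queue = []                     # min-heap keyed on negated priority
        self._seq = itertools.count()        # keeps FIFO order among equal priorities
        self.server_base_priority = server_base_priority

    def send(self, client_priority, request):
        heapq.heappush(self._queue, (-client_priority, next(self._seq), request))

    def server_priority(self):
        """Priority the server runs at: that of the most urgent waiting client."""
        if not self._queue:
            return self.server_base_priority
        return -self._queue[0][0]

    def receive(self):
        """Deliver the most urgent pending message as (priority, request)."""
        neg_priority, _, request = heapq.heappop(self._queue)
        return -neg_priority, request


ch = Channel()
ch.send(5, "append to log")
ch.send(20, "read sensor file")
print(ch.server_priority())   # the server is boosted to the urgent client's level
print(ch.receive())
print(ch.server_priority())
```

Because the server always runs at the priority of its most urgent waiting client, a low-priority request already in the queue cannot hold up a high-priority client for longer than the request currently being served.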

QNX/Neutrino (contd..) • When a service provider thread wants to provide a service, it creates a channel (for exchanging messages) and uses its service identifier for identification • To get a service from a provider, the client thread attaches itself to the provider's channel • Within the client, this connection is directly mapped to a file descriptor (so an RFS can be sent directly to the file descriptor) • QNX messages are blocking, unlike POSIX messages

VRTX • VRTX has two multitasking kernels • VRTXsa • Designed for performance • Provides priority inheritance and POSIX-compliant libraries • Supports multiprocessing • System calls are fully preemptable and deterministic • VRTXmc • Designed for low power consumption • Used for cellular phones and hand-held devices • Rather than providing optional components, it provides hooks for extensibility – an application can add its own system calls

VxWorks • Monolithic kernel • Leads to improved performance with less run-time overhead • However, scalability is poor, i.e. the footprint of the kernel is affected somewhat • Provides interfaces specified by RT-POSIX standards in addition to its own APIs • Though not a multiprocessor OS, provides shared-memory objects: shared binary and counting semaphores • Supports a standard MMU, as a modern OS does • Provides basic virtual-to-physical memory mapping • Allows new mappings to be added and portions of memory to be made non-cacheable

VxWorks (contd..) • Memory boards can be added dynamically to increase the address space for interprocess communication • The data is made non-cacheable to ensure cache consistency • Reduced context switch time • Saves only those register windows that are actually in use (on a SPARC) • When a task's context is restored, only the relevant register window is restored • To improve response time, it saves the register windows in a register cache – useful for recurring tasks

Spring Kernel • Goal – development of dynamic, distributed real-time systems • The system is a network of multiprocessors, each multiprocessor containing one or more processors and I/O subsystems • The I/O subsystem is a separate entity from the Spring kernel, handling non-critical I/O, slow I/O devices and fast sensors • Design Principle – Segmentation & Reflection

Spring Kernel (contd..) • Segmentation • Dividing the resources of the system into units • Size of a unit depends on application requirements • Helps in determining the resource constraints of online scheduling algorithms • Reflection • Concept of reasoning about its own state and its environment • Required for handling situations in highly dynamic environments (where handcrafting is infeasible)

Spring Kernel (contd..) • Scheduling – consists of 4 modules • Process-resident dispatcher – simply removes the task from the Global System Task Table (GSTT) • Local Scheduler (per processor) – responsible for locally guaranteeing that a new task can make its deadline and for ordering processor-specific tasks in the STT • Global Scheduler – finds a site for execution for any task that cannot be locally guaranteed • Meta Level Controller – can adapt various parameters by noticing significant changes
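
The local guarantee step can be sketched as an admission test. The Spring kernel's actual planner is not described in these slides, so the code below swaps in a much simpler technique: an earliest-deadline-first ordering of non-preemptable tasks that are all ready at time `now` on one processor, with no resource constraints. All names (`Task`, `plan`, `try_guarantee`) are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Task(NamedTuple):
    name: str
    wcet: float       # worst-case execution time
    deadline: float   # absolute deadline


def plan(tasks: List[Task], now: float = 0.0) -> Optional[List[Task]]:
    """Return a deadline-ordered schedule, or None if some deadline is missed."""
    order = sorted(tasks, key=lambda t: t.deadline)
    finish = now
    for task in order:
        finish += task.wcet
        if finish > task.deadline:
            return None
    return order


def try_guarantee(accepted: List[Task], new_task: Task,
                  now: float = 0.0) -> Tuple[bool, List[Task]]:
    """Accept new_task only if every task, old and new, still meets its deadline."""
    schedule = plan(accepted + [new_task], now)
    if schedule is None:
        # Not locally guaranteed: a global scheduler would look for another site.
        return False, accepted
    return True, schedule


accepted = [Task("a", 2, 4), Task("b", 3, 10)]
ok, accepted = try_guarantee(accepted, Task("c", 2, 7))
print(ok, [t.name for t in accepted])
ok, accepted = try_guarantee(accepted, Task("d", 4, 9))
print(ok, [t.name for t in accepted])
```

The key property is that a task is admitted only together with proof that all previously guaranteed tasks remain feasible; a rejected task leaves the existing schedule untouched.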

Spring Kernel (contd..) • Memory Management • OS is core-resident • No dynamic memory allocation, to eliminate large and unpredictable delays (due to page faults and page replacements) • Kernel pre-allocates a fixed number of instances of some of the kernel data structures • Tasks are accepted dynamically if the necessary data structures are available • Inter-Process Communication • Mailboxes and communication primitives are used for communication • No need for semaphores since mutual exclusion is taken care of by scheduling
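
Pre-allocation of kernel data structures can be sketched as a fixed-capacity pool. This is an illustration of the idea only; `FixedPool` and its methods are hypothetical names, and a real kernel would pre-allocate typed structures rather than generic slots.

```python
class FixedPool:
    """Pool whose slots are all created up front; nothing is allocated later."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._free = list(range(capacity))   # indices of unused slots
        self._in_use = set()

    def acquire(self, payload):
        """Return a slot index, or None when the pool is exhausted."""
        if not self._free:
            return None
        index = self._free.pop()
        self._slots[index] = payload
        self._in_use.add(index)
        return index

    def release(self, index):
        if index not in self._in_use:
            raise ValueError("slot is not in use")
        self._in_use.remove(index)
        self._slots[index] = None
        self._free.append(index)


pool = FixedPool(capacity=2)
first = pool.acquire("task control block 1")
second = pool.acquire("task control block 2")
print(pool.acquire("task control block 3"))   # None: the new task is not accepted
pool.release(first)
print(pool.acquire("task control block 3") is not None)
```

Both `acquire` and `release` take constant time and never touch a general-purpose allocator, which is what keeps task acceptance predictable.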

ARTS - Distributed OS • Distributed real-time OS – provides a predictable distributed real-time computing environment • Distributed computing environment • Heterogeneous computing environment • Need for a global view of the system and resources • No over-utilization or under-utilization of a particular node in a distributed system • Guaranteeing predictability in such a system is more difficult than in the multiprocessor case

ARTS (Contd..) • How to synchronize the clocks in a distributed system? • Scheduling • Integrated Time-Driven Scheduler (ITDS) • The ITDS provides an interface between the scheduling policies and the rest of the operating system • Allows different scheduling policies to exist (though only one can be used at a time) • Communication scheduling • Extended RMS for communication scheduling – integrating message and processor scheduling

MARS – Distributed RTOS • Maintainable Real-Time System (MARS) – focuses on fault tolerance in a distributed RTOS • Objective • To provide guaranteed timely response under peak load conditions • To support real-time testability by breaking up the system into subsystems • Time-Driven System – the system initiates activities at pre-determined times • Better performance than event-driven systems • Control signals are based on physical time; hence, in the presence of a global physical time, no control signals are needed across subsystem interfaces

MARS (Contd..) • System Architecture • A MARS application consists of a set of clusters (autonomous subsystems); the components of a cluster are connected by a real-time bus • Each component runs an identical copy of the operating system • Different clusters are connected through an inter-cluster interface, forming a network • A cluster consists of Fault-Tolerant Units (FTUs), each consisting of replicated components providing redundancy • Shadow components update their own internal state and monitor the operation of active components • A shadow becomes active when the active component fails • Each message is also sent twice on the real-time bus

MARS (Contd..) • Fault Tolerance • Addresses both transient and permanent faults • Messages have checksums and hardware components are self-checking • Uses robust storage structures • Application software detects errors by executing each task twice (catching transient faults) • MARS is fail-silent – a component is turned off on detecting the first error, to avoid fault propagation • Upon detection, a shadow component takes over from the failed one
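
Double execution and fail-silent behavior can be sketched together. The code below is a toy model, not MARS code: `run_twice`, `FailSilentComponent` and `TransientFault` are hypothetical names, and the task is assumed to be deterministic so that two differing results indicate a fault.

```python
class TransientFault(Exception):
    """Raised when two executions of the same task disagree."""


def run_twice(task, *args):
    """Execute a deterministic task twice and compare the results."""
    first = task(*args)
    second = task(*args)
    if first != second:
        raise TransientFault("results differ: %r vs %r" % (first, second))
    return first


class FailSilentComponent:
    """Produces either a checked result or nothing at all."""

    def __init__(self, task):
        self.task = task
        self.active = True

    def step(self, *args):
        if not self.active:
            return None              # silent: no output after the first error
        try:
            return run_twice(self.task, *args)
        except TransientFault:
            self.active = False      # turn the component off
            return None


component = FailSilentComponent(lambda x: 2 * x)
print(component.step(21))
```

Since a faulty component emits nothing rather than a wrong value, a shadow component only has to detect silence to know when to take over.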

MARS (Contd..) • Tasks and messages • Tasks (periodic and aperiodic) are scheduled by static scheduling schemes • Hard real-time tasks are run at specific intervals that are known during system initialization • Soft real-time tasks are run in intervals not used by hard real-time tasks • Communication is through message passing – also uses state messages (produced periodically at predetermined times), conveying the state of the system • To avoid the unpredictable delays of CSMA/CD protocols, MARS uses a TDMA protocol to provide collision-free access to Ethernet (at most one hard RT message for each slot – the remainder for soft RT messages)
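
Collision-free access under TDMA follows from every node computing the same slot ownership from global time. The sketch below assumes equal-length slots assigned to nodes in a fixed round-robin order and a sender that starts only at the beginning of its slot; the function names, node names and microsecond units are illustrative and not taken from MARS.

```python
def slot_owner(time_us, nodes, slot_us):
    """Node that may transmit at the given global time."""
    slot_index = (time_us // slot_us) % len(nodes)
    return nodes[slot_index]


def next_send_time(time_us, node, nodes, slot_us):
    """Earliest slot start at or after time_us that belongs to node."""
    cycle_us = slot_us * len(nodes)
    cycle_start = (time_us // cycle_us) * cycle_us
    start = cycle_start + nodes.index(node) * slot_us
    if start < time_us:
        start += cycle_us        # this cycle's slot has begun or passed
    return start


nodes = ["A", "B", "C"]          # hypothetical components on one bus
print(slot_owner(150, nodes, slot_us=100))
print(next_send_time(150, "A", nodes, slot_us=100))
```

The wait before a node may send is bounded by one full cycle (`slot_us * len(nodes)`), which is what makes message latency predictable, in contrast to the random backoff of CSMA/CD.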

MARS (Contd..) • MARS uses only one kind of interrupt – the periodic clock interrupt • Interaction with peripherals is through polling • Scheduling • Scheduling is done offline • Assumes that the running task will yield the CPU at the end of its quantum • Task switching is done by the major handler every 8 milliseconds • A change can be triggered by invoking a system call or receiving an appropriate message
