Windows CE Real-Time Performance Architecture. John Hatch, Program Manager for CE Kernel, Microsoft Corporation. Agenda: Real-Time Overview, Interrupt Model, Features, Taking Control, Measurement Tools.
Windows CE Real-Time Performance Architecture • John Hatch • Program Manager for CE Kernel • Microsoft Corporation
Agenda • Real-Time Overview • Interrupt Model • Features • Taking Control • Measurement Tools
Real-Time Overview • Real time • Applications where specific timings are requested • Hard real time • Applications where system fails if timings are not met • Soft real time • Applications where system tolerates large latencies • Actual timing requirements are system-specific
Real Time Defined By OMAC • OMAC represents Industrial Automation Community • [Figure: chart of cycle time (100 ms, 20 ms, 10 ms, 5 ms, 1 ms, 500 µs) against cycle variation or jitter (0, 100 µs, 1,000 µs, 5,000 µs, 10,000 µs), with regions labeled Soft Real-Time, Hard Real-Time, and 90% Apps, and markers for Windows NT, Windows CE 2.X, and Windows CE .net]
Real World Example • Consumers wanted to know if CE is HARD real-time • Wanted to know if CE was capable of running radio and UI • Concerned that CE was not HARD real-time enough to meet the requirements • Requirements • Run cellular radio DSP • Meet “tight” timing requirements • ARM9 250 MHz • Full Windows CE UI • And play video
Real World Timing Requirements • So what were the actual requirements? • Interrupt every 4.6 ms • Allowable jitter < 0.5 ms • [Figure: Actual Application Requirements, showing an interrupt every 4.6 ms with 0.5 ms jitter]
Windows CE Test Results • Response time test using the following configuration • Samsung SMDK2410 development board • 200 MHz ARM with 16x16 cache • Windows CE 5.0 with full UI • Running a WMV video • [Figure: Windows CE Real-Time Test Results, time in microseconds (µs)]
What We Learned • In terms of the 0.5 ms jitter alone • CE’s longest ISR response time was 13.3 µs • 2.6% of max allowed • CE’s longest IST response time was 103 µs • 20.6% of max allowed • Conclusion • CE’s response time was well within the requirements • Project went ahead and is progressing well
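The percentages on this slide follow directly from dividing each worst-case response time by the 0.5 ms (500 µs) jitter budget. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def share_of_budget(latency_us: float, budget_us: float) -> float:
    """Return a latency as a percentage of the allowed jitter budget."""
    if budget_us <= 0:
        raise ValueError("budget_us must be positive")
    return 100.0 * latency_us / budget_us


JITTER_BUDGET_US = 500.0  # 0.5 ms allowable jitter, from the requirements slide

isr_share = share_of_budget(13.3, JITTER_BUDGET_US)   # longest ISR response time
ist_share = share_of_budget(103.0, JITTER_BUDGET_US)  # longest IST response time

print(f"ISR: {isr_share:.2f}% of budget, IST: {ist_share:.2f}% of budget")
```

The ISR figure works out to 2.66%, which the slide shows truncated to 2.6%; the IST figure is 20.6%.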
Agenda • Real-Time Overview • Interrupt Model • Features • Taking Control • Measurement Tools
Definitions • Interrupt • Hardware signal indicating an event has happened and needs to be serviced • Latency • The time from when the interrupt occurred to when the event is serviced • Jitter • Range of allowable variation in service time
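To make the latency and jitter definitions concrete, here is a small sketch that derives both from paired timestamps. The function names and the sample timestamps are illustrative assumptions, not measurements from the talk. The code reports the observed spread of latencies, which is then compared with the allowable jitter (0.5 ms in the earlier example).

```python
from typing import List, Sequence


def service_latencies(irq_us: Sequence[int], serviced_us: Sequence[int]) -> List[int]:
    """Latency per interrupt: time serviced minus time the interrupt occurred."""
    if len(irq_us) != len(serviced_us):
        raise ValueError("need one service timestamp per interrupt")
    return [done - raised for raised, done in zip(irq_us, serviced_us)]


def observed_jitter(latencies_us: Sequence[int]) -> int:
    """Observed variation in service time: worst latency minus best latency."""
    return max(latencies_us) - min(latencies_us)


# Made-up timestamps in microseconds: an interrupt every 4.6 ms.
irq_us = [0, 4600, 9200]
serviced_us = [12, 4650, 9215]

latencies_us = service_latencies(irq_us, serviced_us)  # [12, 50, 15]
spread_us = observed_jitter(latencies_us)              # 38
within_budget = spread_us < 500                        # allowable jitter of 0.5 ms
```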
Threads, Processes, And Drivers • Thread • A unit of execution • A piece of code that can be scheduled to run by the kernel • May be launched by a process or a driver • Process • A collection of threads with a common execution environment • A process has at least one thread • Launched from an executable file • Can create threads to handle interrupts • Driver • A DLL (dynamic-link library) loaded into the device manager process • Supports the Device I/O Control Interface • Can create threads to handle interrupts
ISRs And ISTs • Interrupt Service Routine (ISR) • A piece of code loaded into the kernel • Assigned to a particular IRQ • Called immediately to handle the hardware interrupt • Should be written to run quickly with few outside dependencies • Can be chained together if multiple devices might use the same IRQ • Notifies the kernel which IST should run • Interrupt Service Thread (IST) • A thread registered to handle an interrupt • Can be created by either a process or a driver • Scheduled like any other thread on the system • Should be written to do the bulk of the interrupt handling work
ISRs And ISTs Work Together • ISRs and ISTs usually work as pairs • ISR handles the critical work • IST handles the bulk of the work • They synchronize by using an Event Object • The IST creates an Event Object • Uses the API WaitForSingleObject to sit and wait on that object to be signaled • The ISR tells the kernel which object to signal • Which unblocks the IST and makes it runnable • If the IST is the highest priority runnable thread, it will get scheduled to run immediately
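The ISR/IST handshake can be modeled on a desktop with an ordinary event object. The sketch below is a toy model in Python, not Windows CE code: `threading.Event` stands in for the CE event object, `event.wait()` plays the role of WaitForSingleObject, and the class and method names are hypothetical. On a real device the kernel, not the ISR itself, signals the event.

```python
import threading
from collections import deque


class InterruptModel:
    """Toy model: the ISR does the critical work, the IST does the bulk."""

    def __init__(self):
        self.event = threading.Event()  # stands in for the CE event object
        self.pending = deque()          # data captured by the ISR
        self.handled = []               # work completed by the IST
        self.stopping = False

    def isr(self, data):
        """Critical work only: capture the data and signal the event."""
        self.pending.append(data)
        self.event.set()

    def stop(self):
        """Ask the IST to exit after draining any pending work."""
        self.stopping = True
        self.event.set()

    def ist(self):
        """Block on the event, then do the bulk of the handling."""
        while True:
            self.event.wait()
            self.event.clear()
            stop_after_drain = self.stopping  # read before draining
            while self.pending:
                self.handled.append(self.pending.popleft())
            if stop_after_drain:
                return


def run_demo():
    model = InterruptModel()
    worker = threading.Thread(target=model.ist)
    worker.start()
    for sample in ("rx-1", "rx-2", "rx-3"):
        model.isr(sample)
    model.stop()
    worker.join()
    return model.handled
```

Calling `run_demo()` returns the three samples in arrival order, because the ISR only queues data and the IST is the single consumer.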
Priority Levels • Windows CE 5.0 has 256 levels of priority • Level 0 is the highest and 255 is the lowest • The old CE model of 8 levels now maps to the lowest 8 of the new model (248 through 255) • The default level for a thread is 251 • Levels 0 through 247 can be reserved by OEM
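Since the old 8-level model occupies the lowest 8 of the 256 levels, the mapping is a fixed offset. A minimal sketch, assuming the old levels are numbered 0 (highest) through 7 (lowest); the function and constant names are hypothetical:

```python
CE_PRIORITY_LEVELS = 256
LEGACY_LEVELS = 8
LEGACY_BASE = CE_PRIORITY_LEVELS - LEGACY_LEVELS  # 248, first of the lowest 8 levels


def legacy_to_ce_priority(legacy_level: int) -> int:
    """Map an old 8-level priority (0..7) onto the 256-level scale."""
    if not 0 <= legacy_level < LEGACY_LEVELS:
        raise ValueError("legacy priority must be in 0..7")
    return LEGACY_BASE + legacy_level


def is_higher_priority(a: int, b: int) -> bool:
    """Level 0 is the highest priority, so a smaller number wins."""
    return a < b
```

For example, `legacy_to_ce_priority(0)` gives 248 and `legacy_to_ce_priority(7)` gives 255, so every legacy priority ranks below any level an OEM assigns in the range above 248.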
Scheduler • Is responsible for determining which thread will run • Has a queue of threads for each priority level • Will always schedule the first thread at the highest priority level • A thread gets to run for a set length of time, called a quantum • Typically 100 milliseconds • A quantum of 0 means the quantum never runs out • The thread can run until blocked or interrupted • A thread runs until: • Its quantum runs out • It is interrupted by a higher priority thread • It is blocked by resource contention • Such as access to a critical section or a mutex
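The scheduling rule on this slide (one queue per priority level, always run the first thread at the highest level, rotate when a quantum expires) can be sketched as a small model. This is an illustration of the rule only, not the CE kernel's implementation; the class and method names are assumptions.

```python
from collections import deque


class Scheduler:
    """Toy priority scheduler: one FIFO queue per level, level 0 is highest."""

    def __init__(self, levels: int = 256):
        self.queues = [deque() for _ in range(levels)]

    def make_runnable(self, name: str, priority: int) -> None:
        self.queues[priority].append(name)

    def pick_next(self):
        """Return (priority, name) for the first thread at the highest level."""
        for priority, run_queue in enumerate(self.queues):
            if run_queue:
                return priority, run_queue[0]
        return None  # nothing runnable: the system would go idle

    def quantum_expired(self) -> None:
        """Move the running thread to the back of its own priority queue."""
        picked = self.pick_next()
        if picked is not None:
            self.queues[picked[0]].rotate(-1)

    def block(self, name: str, priority: int) -> None:
        """Remove a thread that is now waiting on a resource or event."""
        self.queues[priority].remove(name)
```

With threads `ui` and `worker` at level 250 and `ist` at level 10, `pick_next()` returns `ist` until it blocks; after that `ui` runs, and `worker` gets its turn once `ui`'s quantum expires.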
Fitting It All Together • Interrupt occurs • Interrupt handler calls registered ISR • ISR runs, tells kernel which event to signal • Kernel signals event, IST becomes runnable • Scheduler runs the IST • IST runs and resets the interrupt
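Interrupt Architecture • [Figure: timeline across the HW, Kernel, OAL, and Thread layers, showing the ISH, ISR, KCall + Scheduler (SetEvent), and IST stages, the ISR latency and IST latency spans, and which interrupts are enabled at each stage (All, All Except ID, All Higher-Priority)]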
Latency Behavior • [Figure: Maximum ISR Latency Path and Maximum IST Latency Path]
Where Latency Occurs • For an ISR • Time required for the kernel to vector to the ISR handler (normal) • Saving registers, etc. • The amount of time that interrupts are turned off (variation) • For an IST • Time to schedule a thread (normal) • Time spent in a KCall (variation) • KCall = Kernel code executing with pre-emption disabled
Worst Case IST Latency • General case • Executing in the thread scheduler KCall when an IRQ arrives that will trigger a different IST • Software-assisted TLB/cache miss on the IST thread
Improvements To Latency • Non-preemptable code reduced • Large KCalls split apart and state saved to resume correctly • Reduces the latency for an IST • Kernel data structures moved to statically mapped virtual addresses • This avoids any TLB misses associated with accessing their data • Special-cased ISTs • An event registered for an IST can only be used in a WaitForSingleObject • New priority inversion model reduces the upper bounds • Was a large KCall
Agenda • Real-Time Overview • Interrupt Model • Features • Taking Control • Measurement Tools
Nested Interrupts • Higher priority ISRs can preempt lower priority ISRs • Based on support by the CPU, additional hardware, and/or OEM code • ARM • Uses a vectored interrupt table • Single CPU interrupt level with an Interrupt register • No built-in concept of priority IRQ • Except FIQ • Interrupts are not turned on before entering ISR • OEM can re-enable CPU interrupt • OEMs can prioritize the interrupts with bit masks to turn on and off the different interrupts
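One way to picture the bit-mask approach: while an ISR runs, the OEM code enables only the interrupt sources that rank above it. The sketch below models that idea with plain integers. The bit layout (bit n enables IRQ n) and the ranking (lower IRQ number means higher priority) are assumptions made for illustration; real interrupt controllers differ, and the function names are hypothetical.

```python
def nested_enable_mask(current_irq: int, num_irqs: int = 8) -> int:
    """Enable mask to apply while servicing current_irq.

    Assumed convention: bit n set means IRQ n is enabled, and a lower
    IRQ number means a higher priority. Only higher-priority IRQs stay on.
    """
    if not 0 <= current_irq < num_irqs:
        raise ValueError("current_irq out of range")
    return (1 << current_irq) - 1


def can_preempt(new_irq: int, enable_mask: int) -> bool:
    """True if new_irq is enabled under the given mask."""
    return bool((enable_mask >> new_irq) & 1)


mask_while_in_irq3 = nested_enable_mask(3)  # 0b0111: IRQs 0, 1, and 2 stay enabled
```

Under these assumptions, an ISR for IRQ 3 can be preempted by IRQ 1 but not by IRQ 5, and the ISR for IRQ 0 runs with every source masked.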
Shared Interrupts • The hardware design might attach several devices to the same interrupt line • Multiple ISRs can be chained together to handle shared interrupts • Each ISR in turn determines if it can handle the interrupt • If it can, it does its work and either completes the interrupt or returns the SYSINTR indicating which IST is to run • If not, it returns SYSINTR_CHAIN indicating the kernel should try the next ISR in the chain
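The chaining rule can be sketched as a loop over ISRs that each either claim the interrupt or pass it on. This is a model only: the string constants are placeholders for the real SYSINTR values, the device names and helper functions are hypothetical, and returning a no-op result when no ISR claims the interrupt is an assumption about the fallback.

```python
SYSINTR_CHAIN = "SYSINTR_CHAIN"  # placeholder: "not mine, try the next ISR"
SYSINTR_NOP = "SYSINTR_NOP"      # placeholder: nothing further to schedule


def make_isr(device: str, sysintr: str, status: dict):
    """Build an ISR that claims the interrupt only if its device raised it."""
    def isr() -> str:
        if status.get(device):
            status[device] = False  # acknowledge the device
            return sysintr          # identifies which IST should run
        return SYSINTR_CHAIN
    return isr


def dispatch(chain) -> str:
    """Call each chained ISR in turn until one claims the interrupt."""
    for isr in chain:
        result = isr()
        if result != SYSINTR_CHAIN:
            return result
    return SYSINTR_NOP  # assumption: an unclaimed interrupt is ignored


# Two hypothetical devices sharing one interrupt line.
status = {"uart": False, "nic": True}
chain = [
    make_isr("uart", "SYSINTR_UART", status),
    make_isr("nic", "SYSINTR_NIC", status),
]
first = dispatch(chain)   # "SYSINTR_NIC": the UART ISR passed, the NIC ISR claimed it
second = dispatch(chain)  # "SYSINTR_NOP": nothing is pending any more
```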
Priority Inheritance • Higher priority threads can get stuck waiting for a lower priority thread to release a resource • Such as a critical section, semaphore, or mutex • Causes priority inversion • Kernel detects priority inversion and handles it with priority inheritance, or boosting • The lower priority thread inherits the higher priority thread’s priority • Its quantum is set to 0, which lets it run to completion • Supports only one level of inheritance • Kernel will only boost one thread • If the boosted thread is in turn blocked by a third thread, the third thread is not boosted
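The single-level rule is easiest to see in a small model: when a thread blocks on a resource, only the direct owner is boosted, and the boost does not propagate further down a chain of owners. The class and function names below are hypothetical, and the model ignores the quantum change mentioned above.

```python
class SimThread:
    """Minimal thread record: lower priority number means higher priority."""

    def __init__(self, name: str, priority: int):
        self.name = name
        self.base_priority = priority
        self.priority = priority
        self.blocked_on = None  # the thread that owns the resource we wait for


def block_on(waiter: SimThread, owner: SimThread) -> None:
    """Waiter blocks on a resource held by owner; boost the direct owner only."""
    waiter.blocked_on = owner
    if waiter.priority < owner.priority:
        owner.priority = waiter.priority  # one level of inheritance, no propagation


def release(owner: SimThread, waiter: SimThread) -> None:
    """Owner releases the resource and drops back to its base priority."""
    waiter.blocked_on = None
    owner.priority = owner.base_priority


high = SimThread("high", 10)
second = SimThread("second", 200)
third = SimThread("third", 150)

block_on(second, third)  # no boost: third already outranks second
block_on(high, second)   # second inherits priority 10; third stays at 150
```

After these two calls `second.priority` is 10 while `third.priority` is still 150, so `high` is effectively waiting behind a priority-150 thread. That is the case the slide warns about, and it is a reason to avoid nested resource dependencies in real-time threads.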
Thread Quantum • Per thread quantum • Default set by the OEM in the OAL • dwDefaultThreadQuantum • APIs to set Quantum • Ce(Set/Get)ThreadQuantum • Quantum of 0 sets thread to run-to-completion • At any priority • Preempted only by higher priority threads
System Tick • 1 ms timer tick in normal mode • Tick interrupt causes a reschedule • Will run next highest priority runnable thread • Sleep(N) will generally wake up in N to N + 1 ms • In Idle mode system tick is reset to next scheduled event • On system tick check for reschedule or nop
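The "N to N + 1 ms" behavior of Sleep(N) follows from wake-ups happening only on tick boundaries. The sketch below models that under a simplifying assumption: a sleeping thread wakes on the first 1 ms tick at or after the requested interval has elapsed. The function name is hypothetical and the real kernel's rounding may differ in detail.

```python
TICK_US = 1000  # 1 ms system tick in normal mode


def sleep_wake_time_us(now_us: int, sleep_ms: int) -> int:
    """Time of the first tick at or after now + sleep_ms (all in microseconds)."""
    if now_us < 0 or sleep_ms < 0:
        raise ValueError("times must be non-negative")
    target_us = now_us + sleep_ms * 1000
    ticks = -(-target_us // TICK_US)  # ceiling division
    return ticks * TICK_US


# Sleep(5) requested 250 us after a tick wakes at the 6 ms tick: 5.75 ms elapsed.
elapsed_mid_tick_us = sleep_wake_time_us(250, 5) - 250
# Sleep(5) requested exactly on a tick wakes 5 ms later.
elapsed_on_tick_us = sleep_wake_time_us(1000, 5) - 1000
```

In this model the elapsed time always falls between N and N + 1 ms, which matches the range on the slide.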
Full Kernel Mode • All threads are running in kernel mode • Security checks are disabled • No need to call SetKMode • Entire system is open to all processes • All statically mapped virtual addresses • Virtual protection is still in place • Optimizations for high traffic function • For example a router network box
Agenda • Real-Time Overview • Interrupt Model • Features • Taking Control • Measurement Tools
Taking Control • Real-time developers want to retain control at all times • Control of the schedule • Control is managed by understanding— • The hardware • The OS • Writing code to make optimal use of both features is key to real-time performance
Understanding The Hardware • Accessing hardware can delay ISRs and ISTs • Same CPUs on different boards can produce a wide range of results • Devices and associated drivers can produce a wide range of delays
Understand The Hardware • Understand device access • I/O-based access may incur a penalty • Certain devices can lock out a bus for many microseconds • For example on x86 avoid access to the CMOS RTC • Use a software RTC
Understanding The OS • Priority based preemptive thread scheduler • Virtual memory system • Provides protection • There is some overhead • Synchronization Objects • Critical Sections, Mutexes, Semaphores, MSQueues • Can cause your thread to block • System call interactions • Demand paging of non-XIP code • Stack memory reclaiming • Can delay thread execution • Going Idle can delay threads
Gaining Control • Separate User Interface operations from Real-time threads • Keeping UI calls out of the real-time threads prevents them from being blocked by the UI • User Interface involves many interactions across the OS • It can block threads • Performance of UI threads is affected by all UI applications • Use shared buffers or MSQueues to communicate between UI and RT threads
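A bounded queue is one way to pass data from a real-time thread to the UI without the real-time side ever waiting for space. The sketch below uses Python's standard `queue.Queue` as a stand-in for a shared buffer or message queue. The class name and the drop-on-full policy are assumptions for illustration, not something the talk prescribes.

```python
import queue


class TelemetryChannel:
    """Fixed-capacity channel from a real-time producer to a UI consumer."""

    def __init__(self, capacity: int):
        self._queue = queue.Queue(maxsize=capacity)  # capacity fixed up front
        self.dropped = 0

    def publish(self, sample) -> bool:
        """Called from the real-time thread: never waits for free space."""
        try:
            self._queue.put_nowait(sample)
            return True
        except queue.Full:
            self.dropped += 1  # UI is behind; drop rather than stall the RT thread
            return False

    def drain(self) -> list:
        """Called from the UI thread: take everything currently queued."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```

The design choice here is that a slow UI costs dropped samples, counted in `dropped`, instead of added latency on the real-time thread.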
Gaining Control • Memory and objects • Preallocate all memory • Preallocate all threads, sync objects • Thread scheduling • Set the appropriate priority • Set the appropriate quantum • Use a Quantum of 0 to ‘run-to-completion’ • Use DisableThreadLibraryCalls • Prevent thread notifications to DLLs
Gaining Control • Avoid making system calls on your real-time thread • Don’t use SetTimer as a real-time timer • Avoid priority inversion conditions • Use Event tracking/Kernel tracker • Use dwNKMaxPrioNoScav to prevent stack space recovery from real-time threads • Trusted Security model and real-time performance do not mix • Security checks slow down untrusted applications • Launch RT threads from a Trusted process or driver
Gaining Control • Disable Idle processing • When OS calls OEMIdle return immediately instead of sleeping the device • Disable demand paging • LoadDriver • Locks in a single DLL • Set configuration in CONFIG.BIB ROMFLAGS • Set to 0x0001 • Locks in all modules • File system block driver can disallow • Don’t set the flag DISK_INFO_FLAG_PAGEABLE
Agenda • Real-Time Overview • Interrupt Model • Features • Taking Control • Measurement Tools
ILTiming • ILTiming • Software-based real-time measurement tool • Measures both ISR and IST latencies • ISR latency • From IRQ to ISR • IST latency • From the end of the ISR to the start of the IST • Enabled for all sample platforms • Varying system loads
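Given the two definitions above (ISR latency from IRQ to ISR, IST latency from the end of the ISR to the start of the IST), a run of measurements reduces to a min/average/max summary. The sketch below shows that reduction on made-up timestamps. It does not reproduce ILTiming's actual output format, and the function name and sample tuple layout are assumptions.

```python
from statistics import mean


def summarize_latencies(samples):
    """Summarize ISR and IST latency from timestamp tuples.

    Each sample is (irq_us, isr_start_us, isr_end_us, ist_start_us).
    """
    isr_latency = [isr_start - irq for irq, isr_start, _, _ in samples]
    ist_latency = [ist_start - isr_end for _, _, isr_end, ist_start in samples]

    def stats(values):
        return {"min": min(values), "avg": mean(values), "max": max(values)}

    return {"isr": stats(isr_latency), "ist": stats(ist_latency)}


# Made-up timestamps in microseconds, not results from the talk.
samples = [
    (0, 2, 5, 40),
    (1000, 1004, 1007, 1107),
]
summary = summarize_latencies(samples)
```

For real-time work the `max` column is usually the one to watch, since the worst case is what must fit inside the jitter budget.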
OSBench • Scheduler performance-timing tests • Enables you to determine how long it takes to perform basic kernel tasks such as: • Acquire or release a critical section • Wait or signal an event • Create a semaphore or mutex • Yield a thread • Call system APIs
Kernel Tracker • Shows interaction between processes, threads, and interrupts • Track interrupts • TLB misses • Priority inversion • Thread state such as running, blocked, sleeping, and migrating
Summary • Windows CE is real-time • Windows CE provides all the functionality needed to qualify as a real-time operating system • Windows CE provides tools to optimize your real-time platform
Real World • ZMP Nuvo Robot
Real World • KUKA Roboter • Launching CeWin to help customers build blended real-time solutions based on Windows XP using Windows CE as the real-time scheduler
John.Hatch @ Microsoft.com © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.