The Windows Operating System
Goals • Hardware-portable • Used to support MIPS, PowerPC and Alpha • Currently supports x86, ia64, and amd64 • Multiple vendors build hardware • Software-portable • POSIX, OS/2, and Win32 subsystems • OS/2 is dead • POSIX is still supported as a separate product • Lots of Win32 software out there in the world
Goals • High performance • Anticipated PC speeds approaching minicomputers and mainframes • Async IO model is standard • Support for large physical memories • SMP was an early design goal • Designed to support multi-threaded processes • Kernel has to be reentrant
Process Model • Threads and processes are distinct • Process: • Address space • Handle table (Handles => file descriptors) • Process default security token • Thread: • Execution Context • Optional thread-specific security token
Tokens • “Who you are”—list of identities • Each identity is a SID • Also contains Privileges • Shutdown, Load drivers, Backup, Debug… • Can be passed through LPC ports and named pipe requests • Server side can use this to selectively impersonate the client.
Object Manager • Uniform interface to kernel mode objects. • Handles are 32bit opaque integers • Per-process handle table maps handles to objects and permissions on the objects • Implements refcount GC • Pointer count—total number of references • Handle count—number of open handles
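The two counts on this slide can be sketched in portable C. This is a minimal model, not the kernel's real object header: the `object_header` type, its field names, and the `obj_*` functions are hypothetical, and it assumes that every open handle also contributes one reference to the pointer count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object header; names are illustrative only. */
typedef struct {
    long pointer_count;  /* every reference, including those held by handles */
    long handle_count;   /* open handles only */
    bool destroyed;      /* set once the last reference is dropped */
} object_header;

/* Take a raw (kernel pointer) reference. */
static void obj_reference(object_header *o) {
    assert(!o->destroyed);
    o->pointer_count++;
}

/* Drop a raw reference; the object goes away when none remain. */
static void obj_dereference(object_header *o) {
    assert(o->pointer_count > 0);
    if (--o->pointer_count == 0)
        o->destroyed = true;
}

/* Opening a handle counts both as a handle and as a reference. */
static void obj_open_handle(object_header *o) {
    obj_reference(o);
    o->handle_count++;
}

/* Closing a handle undoes both counts. */
static void obj_close_handle(object_header *o) {
    assert(o->handle_count > 0);
    o->handle_count--;
    obj_dereference(o);
}
```

In this model an object can outlive its last handle as long as some kernel component still holds a pointer reference, which is why the two counts are tracked separately.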
Object Manager • Implements an object namespace • Win32 objects are under \BaseNamedObjects • Devices under \Device • This includes filesystems • Drive letters are symbolic links • \??\C: => the appropriate filesystem device • Some things have other names • Processes and threads are opened by specifying a CID: (Process.Thread)
Standard operations on handles • CloseHandle() • DuplicateHandle() • Takes source and destination process • Very useful for servers • WaitForSingleObject(), WaitForMultipleObjects() • Wait for something to happen • Can wait on up to 64 handles at once
Security Descriptors • Each object has a Security Descriptor • Owner—special SID, CREATOR_OWNER • Group—special SID, CREATOR_GROUP • DACL • Discretionary Access Control List • List of SIDs and granted or denied access rights • SACL • System Access Control List • List of SIDs and access rights to be audited
Access Rights
typedef struct _ACCESS_MASK {
    USHORT SpecificRights;
    UCHAR StandardRights;
    UCHAR AccessSystemAcl : 1;
    UCHAR Reserved : 3;
    UCHAR GenericAll : 1;
    UCHAR GenericExecute : 1;
    UCHAR GenericWrite : 1;
    UCHAR GenericRead : 1;
} ACCESS_MASK;
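The struct above is a picture of the bit layout. In the Windows headers `ACCESS_MASK` is a plain 32-bit integer, and C leaves bit-field ordering to the implementation, so portable code usually decodes the mask with shifts and masks. A small sketch, assuming the fields are laid out from the low bit upward as the slide lists them; the `AM_*` names and `am_*` helpers are hypothetical, the bit values are the documented ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions follow the slide's struct, low bits first. */
#define AM_ACCESS_SYSTEM_ACL 0x01000000u
#define AM_GENERIC_ALL       0x10000000u
#define AM_GENERIC_EXECUTE   0x20000000u
#define AM_GENERIC_WRITE     0x40000000u
#define AM_GENERIC_READ      0x80000000u

/* Low 16 bits: rights whose meaning depends on the object type. */
static uint32_t am_specific(uint32_t mask) {
    return mask & 0xFFFFu;
}

/* Next 8 bits: standard rights shared by all object types. */
static uint32_t am_standard(uint32_t mask) {
    return (mask >> 16) & 0xFFu;
}

/* True when every bit in `bits` is present in `mask`. */
static bool am_has(uint32_t mask, uint32_t bits) {
    assert(bits != 0);
    return (mask & bits) == bits;
}
```

For example, the mask `0x80100002` carries generic read, standard-rights bits `0x10` (SYNCHRONIZE), and specific right `0x0002`.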
Security Use • Objects are referred to via handles • Security checks occur when an object is opened • Open requests contain a mask of requested access rights • If granted to the token by the DACL, the handle contains those access rights • Access rights are checked on use • Just a bit test—very fast
Object Open evt = OpenEvent(EVENT_MODIFY_STATE, FALSE, "SomeName"); • Finds the event object by name • Walks the DACL, looking for token SIDs • Keeps looking until all permissions are granted • If access is granted, inserts a handle to the object into the process’s handle table, with EVENT_MODIFY_STATE access
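The DACL walk described on this slide can be sketched as follows. This is a simplified model, not the kernel's access check: SIDs are modeled as small integers, and the `ace` type and `access_check` function are hypothetical. It assumes entries are evaluated in order, that a matching deny entry rejects the request if it covers any right still outstanding, and that the walk stops as soon as every requested right has been granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACE_ALLOW, ACE_DENY } ace_type;

/* One DACL entry: a SID (modeled as an integer) plus an access mask. */
typedef struct {
    ace_type type;
    uint32_t sid;
    uint32_t mask;
} ace;

static bool token_has_sid(const uint32_t *sids, size_t n, uint32_t sid) {
    for (size_t i = 0; i < n; i++)
        if (sids[i] == sid)
            return true;
    return false;
}

/* Returns true only if every bit in `desired` is granted to the token. */
static bool access_check(const ace *dacl, size_t n_aces,
                         const uint32_t *token_sids, size_t n_sids,
                         uint32_t desired) {
    uint32_t remaining = desired;   /* rights not yet granted */
    assert(dacl != NULL || n_aces == 0);
    for (size_t i = 0; i < n_aces && remaining != 0; i++) {
        if (!token_has_sid(token_sids, n_sids, dacl[i].sid))
            continue;               /* entry is about someone else */
        if (dacl[i].type == ACE_DENY) {
            if (dacl[i].mask & remaining)
                return false;       /* an outstanding right is denied */
        } else {
            remaining &= ~dacl[i].mask;
        }
    }
    return remaining == 0;
}
```

Because the walk is order-dependent, a deny entry placed ahead of an allow entry wins for any token that carries both SIDs.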
Object Use SetEvent(evt); • SetEvent() requires EVENT_MODIFY_STATE access, and an event object. • The kernel looks up the handle in the process’s handle table. • Checks to make sure that it maps to an event object, and that the granted access bits contain the EVENT_MODIFY_STATE bit. • If all is good, the event is set.
Object Use WaitForSingleObject(evt, INFINITE); • WaitForSingleObject() requires a synchronization object (like an event) and SYNCHRONIZE access. • evt maps to an event object • SYNCHRONIZE access was not requested when the handle was inserted. • Even if the DACL permits it, the wait fails.
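The open-then-use sequence on the last three slides can be modeled with a toy handle table. Everything here is a sketch under stated assumptions: the table layout, the `object` type, and the functions are hypothetical; only the two access-right values (`0x0002` for EVENT_MODIFY_STATE and `0x00100000` for SYNCHRONIZE) are the documented Win32 constants. The point it illustrates is that use-time checks consult only the rights stored in the handle, never the DACL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACC_EVENT_MODIFY_STATE 0x00000002u
#define ACC_SYNCHRONIZE        0x00100000u
#define TABLE_SLOTS 8

typedef enum { TYPE_EVENT, TYPE_MUTEX } object_type;
typedef struct { object_type type; bool signaled; } object;

/* A handle is an index into this table; each slot stores granted rights. */
typedef struct { object *obj; uint32_t granted; } handle_entry;
typedef struct { handle_entry slots[TABLE_SLOTS]; } handle_table;

/* Called after a successful open; returns the new handle or -1 if full. */
static int insert_handle(handle_table *t, object *o, uint32_t granted) {
    assert(o != NULL);
    for (int h = 0; h < TABLE_SLOTS; h++) {
        if (t->slots[h].obj == NULL) {
            t->slots[h].obj = o;
            t->slots[h].granted = granted;
            return h;
        }
    }
    return -1;
}

/* Use-time check: a table lookup plus a bit test. */
static object *lookup(handle_table *t, int h, uint32_t required) {
    if (h < 0 || h >= TABLE_SLOTS || t->slots[h].obj == NULL)
        return NULL;
    if ((t->slots[h].granted & required) != required)
        return NULL;
    return t->slots[h].obj;
}

static bool set_event(handle_table *t, int h) {
    object *o = lookup(t, h, ACC_EVENT_MODIFY_STATE);
    if (o == NULL || o->type != TYPE_EVENT)
        return false;
    o->signaled = true;
    return true;
}

/* Waiting works on any object type but needs SYNCHRONIZE in the handle. */
static bool can_wait(handle_table *t, int h) {
    return lookup(t, h, ACC_SYNCHRONIZE) != NULL;
}
```

A handle opened with only EVENT_MODIFY_STATE can set the event but cannot wait on it; reopening with both rights is the fix.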
Types of Objects • Events • State is set or clear. • Can clear when a wait completes (auto-reset) • Mutexes • Can be acquired by a single thread at a time. • Automatically released when the owning thread exits. • Semaphores • Maintain a count • Waits decrement the count
More objects • Threads, Processes, Timers—like events • Registry Keys • Manipulate data in the registry—centralized store of system configuration info. • LPC Ports • Fast local RPC • Security tokens can transfer over LPC calls • Files
Files & IO • File objects maintain a current offset, and a pointer to the underlying stream. • Default internal model is asynchronous • Synchronous IO just waits for the IO to complete • Async IO can set an event, or run a callback in the thread which queued the IO, or post a message to an IO completion port. • Each request is an IRP
IRPs • Maintain state of IO requests, independent of the thread working on the IO • IRPs are handed off through the device stack to their destinations • Threads process IRPs • Initiating thread processes the IRP until a device returns STATUS_PENDING • Subsequent processing can be done in kernel worker threads
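The hand-off through a device stack can be sketched like this. It is a deliberately small model: the `irp` fields, the `io_status` values, and the two example drivers are hypothetical, and real drivers pass requests down explicitly rather than through a central loop. What it keeps from the slide is that the request's state lives in the IRP, so once a driver reports the request as pending, any thread can pick it up later.

```c
#include <assert.h>

typedef enum { IO_PASS_DOWN, IO_SUCCESS, IO_PENDING } io_status;

/* The IRP carries its own progress, independent of any thread. */
typedef struct {
    int level;    /* index of the driver that last saw the request */
    int touched;  /* how many drivers have processed it so far */
} irp;

typedef io_status (*dispatch_fn)(irp *);

/* Example filter driver: inspects the request and passes it down. */
static io_status filter_dispatch(irp *r) {
    r->touched++;
    return IO_PASS_DOWN;
}

/* Example disk driver: starts the hardware and returns immediately. */
static io_status disk_dispatch(irp *r) {
    r->touched++;
    return IO_PENDING;
}

/* Walk the stack top-down in the initiating thread until some driver
 * completes or pends the request. */
static io_status send_irp(const dispatch_fn *stack, int depth, irp *r) {
    assert(depth > 0);
    for (r->level = 0; r->level < depth; r->level++) {
        io_status s = stack[r->level](r);
        if (s != IO_PASS_DOWN)
            return s;
    }
    /* Simplification: nothing below the last driver, treat as done. */
    return IO_SUCCESS;
}
```

With a stack of `{filter_dispatch, disk_dispatch}`, the initiating thread gets `IO_PENDING` back and the IRP records that it stopped at level 1.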
Interrupts • IRQL (Interrupt Request Level): • 0 => PASSIVE_LEVEL: processor is running threads; all user-mode code runs at IRQL 0 • 1 => APC_LEVEL: running threads, with APCs disabled • 2 => DISPATCH_LEVEL • Running as the processor: can’t stop! • Can’t take a page fault • Only locks available are KSPIN_LOCKs
Interrupts • 3-26 => Device Interrupt Service Routines • Device interrupts are mapped to an IRQL and an interrupt service routine; the ISR is called at that IRQL • 27 => PROFILE_LEVEL: profiling • 28 => CLOCK2_LEVEL: clock interrupt • 29 => IPI_LEVEL: interprocessor interrupt • Requests another processor to do something • 30 => POWER_LEVEL: power failure • 31 => HIGH_LEVEL: interrupts disabled
Interrupts • Hardware signals an interrupt • Interrupt’s ISR runs at device IRQL • Has to be fast; get off the processor and allow other ISRs to run • Typically queues a DPC, acknowledges the interrupt, and returns • DPC: Deferred Procedure Call • Further processing at DISPATCH_LEVEL • Queues work to kernel worker threads
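The ISR-to-DPC hand-off can be simulated on a single modeled processor. This is a sketch, not kernel code: the `cpu` type, the fixed-size queue, and all function names are hypothetical, and it assumes queued DPCs drain at DISPATCH_LEVEL just before the processor drops below that level. The IRQL numbers are the ones from the slides.

```c
#include <assert.h>
#include <stdbool.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };
#define DPC_SLOTS 8

typedef void (*dpc_fn)(void *ctx);
typedef struct { dpc_fn fn; void *ctx; } dpc;

/* One modeled processor: current IRQL plus a ring buffer of DPCs. */
typedef struct {
    int irql;
    dpc queue[DPC_SLOTS];
    int head, count;
} cpu;

static bool queue_dpc(cpu *c, dpc_fn fn, void *ctx) {
    if (c->count == DPC_SLOTS)
        return false;
    int tail = (c->head + c->count) % DPC_SLOTS;
    c->queue[tail].fn = fn;
    c->queue[tail].ctx = ctx;
    c->count++;
    return true;
}

static int raise_irql(cpu *c, int new_irql) {
    assert(new_irql >= c->irql);
    int old = c->irql;
    c->irql = new_irql;
    return old;
}

/* Lowering below DISPATCH_LEVEL first drains the DPC queue. */
static void lower_irql(cpu *c, int new_irql) {
    assert(new_irql <= c->irql);
    if (new_irql < DISPATCH_LEVEL && c->irql >= DISPATCH_LEVEL) {
        c->irql = DISPATCH_LEVEL;
        while (c->count > 0) {
            dpc d = c->queue[c->head];
            c->head = (c->head + 1) % DPC_SLOTS;
            c->count--;
            d.fn(d.ctx);
        }
    }
    c->irql = new_irql;
}

/* A device interrupt: run the (trivial) ISR at device IRQL, defer the rest. */
static void device_interrupt(cpu *c, int device_irql, dpc_fn fn, void *ctx) {
    assert(device_irql > c->irql);  /* otherwise it would stay masked */
    int old = raise_irql(c, device_irql);
    queue_dpc(c, fn, ctx);          /* ISR body: just queue the DPC */
    lower_irql(c, old);
}

/* Example DPC routine: counts how many times it ran. */
static void count_dpc(void *ctx) {
    ++*(int *)ctx;
}
```

If the interrupt arrives while the processor is already at DISPATCH_LEVEL, the DPC stays queued until the IRQL is lowered, which is the behavior the model is meant to show.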
IO Completion • Driver calls IO Manager to complete the IRP • IO Manager queues a kernel mode APC to the initiating thread • APC: Asynchronous Procedure Call • Kernel mode APC preempts thread execution • Writes data back to user mode in the context of the thread which initiated the IO • Signals completion of the IO
IO Cache • Classic: block cache • Page mappings translate directly to blocks on the underlying partition. • Windows: stream cache • Page mappings are offsets within a stream. • IO Cache Manager uses the same mappings. • All cache management (trimming) is centralized in the memory manager • All modifications show up in mapped views.
Virtual Memory • Sections—another object type • Can be created to map a file • Can also be created off the pagefile • Optionally named, for shared memory • Reservation • Range of VA which will not be handed out for some other purpose • Committed • VA which actually maps to something
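The difference between reserved and committed address space can be shown with a page-state array. This is a toy model with hypothetical names (`address_space`, `va_reserve`, `va_commit`) and a 16-page address space; it assumes a range must be entirely free to be reserved and that only reserved (or already committed) pages can be committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

typedef enum { PAGE_FREE, PAGE_RESERVED, PAGE_COMMITTED } page_state;
typedef struct { page_state pages[NPAGES]; } address_space;

static bool range_ok(size_t first, size_t count) {
    return count > 0 && first < NPAGES && count <= NPAGES - first;
}

/* Reserve: claim the range so nothing else is handed out there. */
static bool va_reserve(address_space *as, size_t first, size_t count) {
    assert(as != NULL);
    if (!range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->pages[i] != PAGE_FREE)
            return false;
    for (size_t i = first; i < first + count; i++)
        as->pages[i] = PAGE_RESERVED;
    return true;
}

/* Commit: make pages inside a reservation actually map to something. */
static bool va_commit(address_space *as, size_t first, size_t count) {
    assert(as != NULL);
    if (!range_ok(first, count))
        return false;
    for (size_t i = first; i < first + count; i++)
        if (as->pages[i] == PAGE_FREE)
            return false;
    for (size_t i = first; i < first + count; i++)
        as->pages[i] = PAGE_COMMITTED;
    return true;
}
```

A caller can reserve a large range up front and commit pages inside it only as they are needed.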
Aside: CreateProcess • Just a user mode Win32 API
{
    NtCreateFile(&file, szImage);
    NtCreateSection(&sec, file);
    NtCreateProcess(&proc, sec);
    NtCreateThread(&thrd, proc);
}
WaitForSingleObject(proc);
Virtual Memory • Memory Manager maintains processor-specific page table entry mappings. • Some parts of the address space are shared between processes—for instance, the kernel’s address space and the per-session space. • On a pagefault, mm reads in the data • Pages can be mapped without the appropriate access… what to do?
Signals • With threads, signals don’t work very well. • Some software designs expect to touch inaccessible memory. • Large structured files • Concurrent garbage collection • SLists • Single global handler has to somehow know about all possible situations.
Structured Exception Handling • Exceptions unwind the stack • Almost like C++! • C++ matches against a type hierarchy • SEH calls exception filter code; filters are Turing-complete. • Two ways to deal with exceptions: • try/finally (spelled __try/__finally in Microsoft C) • try/except (spelled __try/__except)
try/finally
res = AllocateSomeResource();
try {
    SomeOperation(res);
} finally {
    if (AbnormalTermination()) {
        FreeSomeResource(res);
    }
}
return res;
try/except
try {
    SomeOperationWhichMayAV();
} except (Filter(GetExceptionCode(), GetExceptionInformation())) {
    DoSomethingElse();
}
try/except • GetExceptionCode() • A code indicating the cause of the exception • GetExceptionInformation() • Additional code-specific info • The full processor context • Filter decides what to do • EXCEPTION_EXECUTE_HANDLER • EXCEPTION_CONTINUE_SEARCH • EXCEPTION_CONTINUE_EXECUTION
Structured Exception Handling • On x86, TEB points to stack of EXCEPTION_REGISTRATION_RECORD • auto structs, pointing to handler code • pushed by function prolog • popped by function epilog • On exception, RtlDispatchException() walks the list. • Runs the filters to figure out what to do • Calls handler functions
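The list walk can be sketched in portable C. This models only the filter phase and is not the real RtlDispatchException(): the `registration` type, the frame ids, and the return convention of `dispatch_exception` are hypothetical. The three filter results use the same numeric values as the Microsoft C constants (-1, 0, 1), and `0xC0000005` is the access-violation status code.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as EXCEPTION_CONTINUE_EXECUTION / _SEARCH / _EXECUTE_HANDLER. */
enum {
    FILTER_CONTINUE_EXECUTION = -1,
    FILTER_CONTINUE_SEARCH    = 0,
    FILTER_EXECUTE_HANDLER    = 1
};

#define CODE_ACCESS_VIOLATION 0xC0000005u

typedef int (*filter_fn)(unsigned code);

/* One record per active try/except frame, innermost first. */
typedef struct registration {
    const struct registration *next;  /* enclosing frame, or NULL */
    filter_fn filter;
    int id;                           /* which handler this frame owns (> 0) */
} registration;

/* Returns the id of the frame whose handler should run, 0 if a filter
 * asked to resume at the faulting instruction, or -1 if nobody took it. */
static int dispatch_exception(const registration *head, unsigned code) {
    for (const registration *r = head; r != NULL; r = r->next) {
        assert(r->filter != NULL && r->id > 0);
        int decision = r->filter(code);
        if (decision == FILTER_EXECUTE_HANDLER)
            return r->id;
        if (decision == FILTER_CONTINUE_EXECUTION)
            return 0;
        /* FILTER_CONTINUE_SEARCH: keep walking outward. */
    }
    return -1;
}

/* Example filter: handles access violations only. */
static int only_av(unsigned code) {
    return code == CODE_ACCESS_VIOLATION ? FILTER_EXECUTE_HANDLER
                                         : FILTER_CONTINUE_SEARCH;
}

/* Example filter: handles everything. */
static int catch_all(unsigned code) {
    (void)code;
    return FILTER_EXECUTE_HANDLER;
}
```

With an inner frame using `only_av` and an outer frame using `catch_all`, an access violation is taken by the inner frame and any other code falls through to the outer one.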
Structured Exception Handling • On x86, there’s some overhead with pushing and popping the registration record • On ia64, there is no overhead • Stack traces are reliable • It’s always possible to look up the handler • Exception handling is very slow • Especially on ia64 • Used only for truly exceptional conditions
Structured Exception Handling • Used in kernel mode too! • Most user mode access will just work • Still need to validate address ranges & data • Works great for SMP when another thread might be in the middle of modifying the address space • Expected read exceptions are returned as status codes from system calls • Expected writes are returned as SUCCESS • Unexpected => buggy kernel => blue screen
Top-level Exception Filter • Top frame on each thread defines a catchall exception filter • Top-level exception filter: • Notifies the debugger (if being debugged) • Launches a just-in-time debugger (if set up) • Loads faultrep.dll to report the failure
Faultrep.dll • faultrep.dll offers to report the failure back to Microsoft • We analyze the failures • A significant number are recognized instantly; we can tell the user what happened and how to fix it. • The others go through the standard triage process; developers analyze the dumps and figure out what happened.
OCA • 67 million machines running XP • Tens of thousands of drivers • Over 100 drivers on any given machine • One bug in one driver => Crash • A significant number of crashes come from third-party drivers (some of which ship on the CD) • Lots of different problems, though
Driver Verifier • Controlled by verifier.exe • Uses special pool for allocations • Detects allocation overruns & use after free • Validates some behaviors • IRQL: touching paged memory? • DMA buffers • Can inject failures, which is useful for testing behavior under sub-optimal conditions
Stress • Every night, a couple hundred machines run stress on the latest build • Stress exercises filesystems, memory, GUI, scheduler, &c, trying to uncover low-memory handling problems and race conditions • Every morning, the stress test team triages failed machines • Developers debug the failures