IRP Voodoo Adrian J. Oney Kernel Developer Windows Base OS Microsoft Corporation
Overview • Theory • Request “Basics” • Asynchronous I/O • Stacking • Building IRPs • Miscellaneous
What’s Not Covered: As we only have an hour • MDLs • Start I/O • PnP, Power, WMI • Security • This is a big one! • See WinHEC 2002’s “Understanding Driver Security” talk
IRP Theory • The OS communicates with drivers by sending I/O Request Packets (IRPs) • IRPs are both: • A container for I/O requests, and • A thread-independent call stack • Both definitions will be discussed in detail!
IRPs as a Request Container: Definition #1 • Most driver requests are presented using IRPs • IRPs can be processed asynchronously • IRPs can be recalled (i.e. canceled) • IRPs are designed for I/O involving multiple drivers • The IRP structure packages together the data needed to respond to a request • Requests From Kernel Mode • Requests From User Mode
IRPs as a Request Container • IRPs are divided into two pieces • A global header for the main request • An array of parameters for sub-requests (“dispatches”) • A driver uses both pieces to handle the request [Diagram labels: IRP, DEVICE_OBJECT]
IRP Header Contents • The IRP header contains • Where to get the input • Where to write the final result • Driver safe locations or copies for each of the above • Scratch space for the current driver owning the request • A driver-supplied callback the OS uses to recall the IRP
IRP Parameters • The IRP header is followed by an array of sub-requests • Each of these sub-requests is represented by an IO_STACK_LOCATION structure • A field in the IRP Header identifies the currently used stack location • Fields in the IO_STACK_LOCATION include • Major and Minor Codes • Major/Minor specific arguments • Device Object • File Object • But why multiple sub-requests?
Thread Independent Call Stacks: IRP Definition #2 • Doing an operation often requires multiple drivers • Usually arranged in chains called Device Stacks • IRPs are WDM’s solution to asynchronous I/O • Specifically designed for I/O involving multiple drivers [Diagram labels: one IRP, four DEVICE_OBJECTs]
Asynchronous I/O • Goal: Let applications queue one or more requests without blocking • Example: Decompressing video frames from a CD • Dedicating two application threads is not an optimal solution • Thread switches are expensive • Policy needs to be in the driver stack • Only it knows whether a given request should be handled synchronously or asynchronously
Synchronous OS design [Diagram: App in User Mode calls Driver A, Driver B, and Driver C in Kernel Mode. The Thread Stack holds, in order: params for A and the return address to App; params for B and the return address to A; params for C and the return address to B. The Instruction Pointer and Stack Pointer are marked.]
I/O Request Packets as Thread Independent Call Stacks [Diagram: the same Thread Stack contents: params for A and the return address to App; params for B and the return address to A; params for C and the return address to B]
Calling Down The IRP Stack: Step by step • Set up the next location’s parameters • Use IoGetNextIrpStackLocation to get a pointer to the location • Fill in the appropriate parameters • Set a completion routine callback for post-processing if needed • Use IoSetCompletionRoutine • Call the next driver • IoCallDriver automatically advances the IRP’s stack pointer
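The call-down steps can be sketched with a toy user-mode model. This is not kernel code: every `Toy*` name is hypothetical, the stack is a plain array indexed downward from 0 (the top driver), and completion routines are left out. It only shows the order of operations: get the next location, fill it in, then let the call helper advance the stack pointer before the lower driver runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-mode model. Every Toy* name is hypothetical, not a WDM API. */
#define TOY_MAX_LOCATIONS 4

typedef struct ToyStackLocation {
    int majorCode;   /* what the driver at this location is asked to do */
    int argument;    /* one major-specific parameter */
} ToyStackLocation;

typedef struct ToyIrp {
    int stackCount;  /* locations allocated for this IRP */
    int current;     /* index of the current location; 0 is the top driver */
    ToyStackLocation stack[TOY_MAX_LOCATIONS];
} ToyIrp;

typedef int (*ToyDispatch)(ToyIrp *irp);

/* Plays the role of IoGetNextIrpStackLocation: the slot the next driver reads. */
ToyStackLocation *ToyGetNextLocation(ToyIrp *irp)
{
    assert(irp->current + 1 < irp->stackCount);
    return &irp->stack[irp->current + 1];
}

/* Plays the role of IoCallDriver: advance the stack pointer, then call down. */
int ToyCallDriver(ToyDispatch next, ToyIrp *irp)
{
    irp->current++;
    assert(irp->current < irp->stackCount);
    return next(irp);
}

/* Lowest driver: doubles the argument found in its own location. */
int ToyLowerDispatch(ToyIrp *irp)
{
    return irp->stack[irp->current].argument * 2;
}

/* Upper driver: fills in the next location, then calls the driver below. */
int ToyUpperDispatch(ToyIrp *irp)
{
    ToyStackLocation *next = ToyGetNextLocation(irp);
    next->majorCode = irp->stack[irp->current].majorCode;
    next->argument = irp->stack[irp->current].argument + 1;
    return ToyCallDriver(ToyLowerDispatch, irp);
}
```

The upper driver never touches its own location when calling down; it writes only the next one, which is why each driver sees its original parameters again when the request comes back up.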
Completing a Request • IoCompleteRequest moves the IRP stack pointer upwards • Completion routines get called when the stack pointer arrives back at their location • The current stack location points to the original parameters • Completion Routines are the IRP equivalent of return addresses
Completion Routines • Completion routines can return either • STATUS_MORE_PROCESSING_REQUIRED • This halts the upward movement of the IRP’s stack pointer • STATUS_CONTINUE_COMPLETION • This continues upward completion of the IRP • Actually, returning any value other than STATUS_MORE_PROCESSING_REQUIRED does this • Pseudo-code for some of IoCompleteRequest:
    while ( Can advance stack location ) {
        status = CompletionRoutine(…);
        if (status == STATUS_MORE_PROCESSING_REQUIRED) {
            return; // Stop upward completion
        }
    }
    // No more stack locations to complete…
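The pseudo-code can be made concrete with a small user-mode simulation. It is a sketch under stated assumptions, not the I/O Manager's implementation: all `Toy*` names are hypothetical, index 0 is the top driver, and each routine is stored in its owner's slot (the way the slides draw it) rather than one slot lower as in a real IRP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-mode model. Every Toy* name is hypothetical, not a WDM API. */
enum { TOY_CONTINUE_COMPLETION = 0, TOY_MORE_PROCESSING_REQUIRED = 1 };

#define TOY_MAX_LOCATIONS 8

struct ToyIrp;
typedef int (*ToyCompletionRoutine)(struct ToyIrp *irp, void *context);

typedef struct ToyStackLocation {
    ToyCompletionRoutine routine;   /* NULL if this driver registered none */
    void *context;
} ToyStackLocation;

typedef struct ToyIrp {
    int stackCount;   /* locations in use */
    int current;      /* current location; 0 is the top driver */
    int completed;    /* nonzero once completion has passed the top location */
    ToyStackLocation stack[TOY_MAX_LOCATIONS];
} ToyIrp;

/* Walks upward from the completing driver. Returns how many routines ran. */
int ToyCompleteRequest(ToyIrp *irp)
{
    int called = 0;
    assert(irp->current >= 0 && irp->current < irp->stackCount);
    while (irp->current > 0) {
        irp->current--;                       /* move up to the driver above */
        ToyStackLocation *loc = &irp->stack[irp->current];
        if (loc->routine != NULL) {
            called++;
            if (loc->routine(irp, loc->context) == TOY_MORE_PROCESSING_REQUIRED)
                return called;                /* stop upward completion */
        }
    }
    irp->completed = 1;                       /* no more locations to complete */
    return called;
}

/* Counts the call and lets completion continue upward. */
int ToyPassRoutine(ToyIrp *irp, void *context)
{
    (void)irp;
    ++*(int *)context;
    return TOY_CONTINUE_COMPLETION;
}

/* Counts the call and halts completion at its own location. */
int ToyHaltRoutine(ToyIrp *irp, void *context)
{
    (void)irp;
    ++*(int *)context;
    return TOY_MORE_PROCESSING_REQUIRED;
}
```

When a routine halts the walk, the model leaves `current` at that driver's location and `completed` at zero, which mirrors the idea that the driver has taken the IRP back and must complete or free it later.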
Completing I/O Requests • When every sub-request is complete… • The main request is complete too • The status is retrieved from Irp->IoStatus.Status • The number of bytes transferred is retrieved from Irp->IoStatus.Information • More details later…
Sub-Request Depth • Each IRP is allocated with a fixed number of sub-requests • This typically comes from the StackSize field of the top device object in a stack • Usually the number of device objects in the stack • Implication: Drivers must allocate a new IRP if they need to forward requests to another stack [Diagram: Device A (StackSize 3), Device B (StackSize 2), Device C (StackSize 1)]
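The 3, 2, 1 numbering falls out of a simple rule: each device needs one location more than the device beneath it. The sketch below models only that bookkeeping in user mode. `ToyDevice`, `ToyAttach`, and `ToyCanForward` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-mode model. Every Toy* name is hypothetical, not a WDM API. */
typedef struct ToyDevice {
    struct ToyDevice *lower;   /* device directly beneath, or NULL at the bottom */
    int stackSize;             /* locations an IRP needs to travel from here down */
} ToyDevice;

/* Attach `upper` on top of `lower` and derive its stack size. */
void ToyAttach(ToyDevice *upper, ToyDevice *lower)
{
    assert(upper != NULL);
    upper->lower = lower;
    upper->stackSize = (lower != NULL ? lower->stackSize : 0) + 1;
}

/* Can an IRP with `remaining` unused locations be sent into the stack under `top`? */
int ToyCanForward(int remaining, const ToyDevice *top)
{
    return remaining >= top->stackSize;
}
```

An IRP sized for stack A has no unused locations left by the time it reaches Device C, so `ToyCanForward(0, &otherTop)` is false for any other stack. That is the reason a driver must build a fresh IRP to talk to a different stack.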
Synchronous Requests • Most application requests are synchronous • But any I/O can respond asynchronously • Logic for sending a synchronous I/O request could be designed like this:
    // Register something that will set an event
    // (Not shown)
    // Send the IRP down
    IoCallDriver(nextDevice, Irp);
    // Wait on some event that will be signaled
    KeWaitForSingleObject( &event, ... );
    // Get the final status
    status = Irp->IoStatus.Status;
Design Problem • KeWaitForSingleObject grabs the Dispatcher Lock • Hottest lock in the operating system • This lock protects the signal state of events, semaphores, mutexes, etc • Also protects the thread scheduler • Except on Win2003 • And the wait usually isn’t needed • Most I/O responses are synchronous anyway!
NT Design Solution • Make IoCallDriver return a status, either • The IRP’s status when the driver completed it, or • STATUS_PENDING • Visualization trick – draw completion routines one slot higher in the IRP: Note: Completion routines shown one slot higher than in real IRP
NTSTATUS and IoCallDriver • To respond synchronously: • Completes IRP with status • Then returns that status • “Here’s what I completed with” • Caller above now has two ways to get result • From completion routine (Irp->IoStatus.Status) • From unwind of IoCallDriver (Status = IoCallDriver(…))
NTSTATUS and IoCallDriver • Driver C completes synchronously: Irp->IoStatus.Status = STATUS_SUCCESS; IoCompleteRequest(Irp, IO_NO_INCREMENT); return STATUS_SUCCESS; • Each caller sees the result twice: status = Irp->IoStatus.Status; and status = IoCallDriver(…); • Synchronous completion: right side, then left side [Diagram labels: STATUS_SUCCESS, STATUS_RETRY, STATUS_ERROR, each shown twice]
Synchronous Requests • Synchronous case can now avoid the KeWaitForSingleObject call • Only needs to be done if STATUS_PENDING is returned
    // Register something that will set an event (Not shown)
    // Send the IRP down
    status = IoCallDriver(nextDevice, Irp);
    if (status == STATUS_PENDING) {
        // Wait on some event that will be signaled
        KeWaitForSingleObject( &event, ... );
        // Get the final status
        status = Irp->IoStatus.Status;
    }
Asynchronous Responses • A driver returns STATUS_PENDING when it responds asynchronously to an I/O Request • Specifically, STATUS_PENDING is returned if • The dispatch routine might unwind before the IRP completes past the driver’s location • The IRP is completed past the driver’s location on another thread • The dispatch routine doesn’t know what the status code will be
IoMarkIrpPending • A driver must call IoMarkIrpPending before it returns STATUS_PENDING • This sets a bit (SL_PENDING_RETURNED) in the current stack location [Diagram: IoMarkIrpPending(Irp); sets SL_PENDING_RETURNED]
IoMarkIrpPending (cont) • Each time an IRP location is completed… • The I/O Manager copies its SL_PENDING_RETURNED bit to the IRP header’s PendingReturned field [Diagram label: SL_PENDING_RETURNED]
Return Result Rules • The Irp->PendingReturned field lets the completion routine know whether the lower driver responded asynchronously • Case 1: Synchronous response • Irp->PendingReturned is clear • IoCallDriver does not return STATUS_PENDING • Returns same value completion routine sees in Irp->IoStatus.Status [Diagram: Irp->PendingReturned = FALSE; IoCallDriver returns Irp->IoStatus.Status]
Return Result Rules (cont) • Case 2: Asynchronous Response • Irp->PendingReturned is set • IoCallDriver must return STATUS_PENDING • Completion routine gets the final result from Irp->IoStatus.Status [Diagram: Irp->PendingReturned = TRUE; IoCallDriver returns STATUS_PENDING]
Pending Bit Tricks • Our synchronous request code looked like this:
    KeInitializeEvent(&event, NotificationEvent, FALSE);
    // Set a completion routine that will catch the IRP
    IoSetCompletionRoutine(Irp, CatchIrpRoutine, &event, ...);
    // Send the IRP down
    status = IoCallDriver(nextDevice, Irp);
    if (status == STATUS_PENDING) {
        // Wait on some event that will be signaled
        KeWaitForSingleObject( &event, ... );
        // Get the final status
        status = Irp->IoStatus.Status;
    }
Pending Bit Tricks (cont) • Here’s the completion routine:
    NTSTATUS CatchIrpRoutine( IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp, IN PKEVENT Event )
    {
        if (Irp->PendingReturned) {
            // Release waiting thread
            KeSetEvent( Event, IO_NO_INCREMENT, FALSE );
        }
        return STATUS_MORE_PROCESSING_REQUIRED;
    }
• Event is set only if caller will see STATUS_PENDING • This avoids yet another hit on the Dispatcher Lock!
Golden Pending Rule • If a driver returns STATUS_PENDING, it must mark the IRP stack location pending • Similarly, if a driver marks the IRP stack location pending, it must return STATUS_PENDING • This is illegal:
    IoMarkIrpPending(Irp);
    if ( Some error condition ) {
        Irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_INSUFFICIENT_RESOURCES; // Illegal: must return STATUS_PENDING
    }
Forwarding Requests • Use IoCopyCurrentIrpStackLocationToNext to forward a request to another driver • The function looks like this:
    RtlCopyMemory( IoGetNextIrpStackLocation(Irp), IoGetCurrentIrpStackLocation(Irp), sizeof(IO_STACK_LOCATION));
    IoSetCompletionRoutine( Irp, ..., NULL, NULL );
• Code doing this by hand often misses the last step • And completion routines get called twice! • Return result of IoCallDriver to next device
Migrating the Pending Bit • Consider what happens if a driver has this code:
    // Forward request to next driver
    IoCopyCurrentIrpStackLocationToNext( Irp );
    // Send the IRP down
    status = IoCallDriver( nextDevice, Irp );
    // Return the lower driver’s status
    return status;
• If we return STATUS_PENDING, we must mark our stack location pending
Migrating the Pending Bit • This trick doesn’t work though • IoMarkIrpPending operates on the current stack location, and there isn’t one!
    // Forward request to next driver
    IoCopyCurrentIrpStackLocationToNext( Irp );
    // Send the IRP down
    status = IoCallDriver( nextDevice, Irp );
    if (status == STATUS_PENDING) {
        IoMarkIrpPending( Irp );
    }
    // Return the lower driver’s status
    return status;
Migrating the Pending Bit • This problem can be solved in the completion routine • This “boilerplate” code should only be used when returning the lower driver’s answer • If no completion routine is specified, the OS does this automatically
    NTSTATUS CompletionRoutine( ... )
    {
        if (Irp->PendingReturned) {
            // We’re returning the lower driver’s result. So if the
            // lower driver marked its stack location pending, so do we.
            IoMarkIrpPending( Irp );
        }
        return STATUS_CONTINUE_COMPLETION;
    }
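How the bit travels upward, and what happens when a completion routine forgets the boilerplate, can be traced in a small user-mode simulation. This is a sketch, not the I/O Manager's code: all `Toy*` names are hypothetical, index 0 is the top driver, routines are stored in their owner's slot as on the slides, and the final `pendingReturned` value stands for what the IRP's initiator would observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-mode model. Every Toy* name is hypothetical, not a WDM API. */
#define TOY_MAX_LOCATIONS 4

struct ToyIrp;
typedef void (*ToyCompletionRoutine)(struct ToyIrp *irp);

typedef struct ToyStackLocation {
    bool pendingMarked;             /* stands in for SL_PENDING_RETURNED */
    ToyCompletionRoutine routine;   /* NULL if this driver registered none */
} ToyStackLocation;

typedef struct ToyIrp {
    int stackCount;
    int current;                    /* current location; 0 is the top driver */
    bool pendingReturned;           /* stands in for Irp->PendingReturned */
    ToyStackLocation stack[TOY_MAX_LOCATIONS];
} ToyIrp;

/* Plays the role of IoMarkIrpPending: marks the current location. */
void ToyMarkPending(ToyIrp *irp)
{
    irp->stack[irp->current].pendingMarked = true;
}

/* The boilerplate routine: copy the lower driver's pending bit upward. */
void ToyMigratePending(ToyIrp *irp)
{
    if (irp->pendingReturned)
        ToyMarkPending(irp);
}

/* A routine that forgets the boilerplate. */
void ToyForgetfulRoutine(ToyIrp *irp)
{
    (void)irp;
}

/* Completes every location from the current one up to the top. */
void ToyCompleteRequest(ToyIrp *irp)
{
    assert(irp->current >= 0 && irp->current < irp->stackCount);
    while (irp->current > 0) {
        /* Each completed location publishes its pending bit in the header. */
        irp->pendingReturned = irp->stack[irp->current].pendingMarked;
        irp->current--;
        ToyStackLocation *loc = &irp->stack[irp->current];
        if (loc->routine != NULL)
            loc->routine(irp);
        else
            ToyMigratePending(irp);  /* no routine: migrated automatically */
    }
    /* What the initiator sees once the top location completes. */
    irp->pendingReturned = irp->stack[0].pendingMarked;
}
```

With `ToyMigratePending` in the middle slot, a pending mark made by the bottom driver reaches the top. With `ToyForgetfulRoutine` in the same slot, the mark is dropped one level up, so the initiator would believe the request completed synchronously.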
Forwarding Requests (cont) • If a driver returns the answer of a driver beneath it, and sets a completion routine… • The completion routine must migrate the pending bit • The completion routine must not change the status • The dispatch has committed to the lower driver’s answer • The completion routine must return STATUS_CONTINUE_COMPLETION • The IRP must not be completed asynchronously later, as • The lower driver’s response may have been synchronous
Building IRPs • There are two types of IRPs that can be built • “Threaded” IRPs • “Non-Threaded” IRPs • These are sometimes termed synchronous and asynchronous requests • Yes, this is confusing
Building Threaded IRPs • Threaded IRPs are bound to the current thread at build time • They are automatically canceled when their thread is terminated • Caller provides event, buffers for I/O Manager to set • The I/O Manager frees threaded IRP upon final completion • Functions • IoBuildSynchronousFsdRequest • IoBuildDeviceIoControlRequest
Threaded IRP Completion • When every sub-request is complete… • The main request is complete too • The I/O Manager performs post-processing • Copies kernel data to “user mode” pointers • Pointers passed to ZwReadFile, IoBuildDeviceIoControlRequest, etc. • Signals supplied event when done [Diagram: Kernel Mode Snapshots are copied to User Mode Buffers: Irp->IoStatus.Status to Irp->UserIosb->Status, and Irp->IoStatus.Information to Irp->UserIosb->Information; the User Mode Event is signaled when done]
Building Non-Threaded IRPs • Non-threaded IRPs are not associated with any thread • Initiator of request must catch IRP with a completion routine and free it • I/O Manager does not handle final completion for these IRPs • These are really meant for driver to driver communication • Functions • IoBuildAsynchronousFsdRequest • IoAllocateIrp
More Pending Bit Tricks • Clever code can use the pending bit to decide where to do post-processing work • Synchronous completion case: Initiator thread does post-processing • If IoCallDriver doesn’t return STATUS_PENDING • Asynchronous completion case: Completion Routine does post-processing • Only if Irp->PendingReturned is TRUE • This design is often more efficient • Less stack space, better cache locality • Windows does this for reads, writes, some IOCTLs • Pending bugs can cause post-processing to occur twice (crash), or never (hang)
Sub-Requests, or Dispatches • Major codes • IRP_MJ_CREATE • IRP_MJ_CLEANUP • IRP_MJ_CLOSE • IRP_MJ_READ • IRP_MJ_WRITE • IRP_MJ_DEVICE_CONTROL • IRP_MJ_INTERNAL_DEVICE_CONTROL • Others not discussed • IRP_MJ_PNP, IRP_MJ_POWER, IRP_MJ_SYSTEM_CONTROL, …
File Objects • File Objects are created when a device is opened • File Objects represent a virtual read/write head for an individual file • File Handles identify File Objects [Diagram labels: two Handles, two File Objects, one File (…010101…)]
IRP_MJ_CREATE • This request is sent when a device is opened • A User Mode application calls CreateFile • A kernel mode driver calls ZwCreateFile • The File Object is in the current stack location [Diagram labels: Flags, Security, Options; File Object]
IRP_MJ_CREATE • The file object specifies the name of the file to open • FileObject->FileName • The file object contains two pointers for driver use • FileObject->FsContext and FileObject->FsContext2 • In a PnP stack, only the FDO can use these • Filesystem stacks use special functions to share the fields among multiple drivers
IRP_MJ_CLEANUP • Cleanup requests are sent when the last handle to a file object has been closed • A driver should complete any pending IRPs using the specified file object • Releasing the IRPs allows the File Object to be destroyed [Diagram labels: File Object, Handle]
IRP_MJ_CLOSE • Close requests are sent when • All handles to a file object have been closed • All references have been dropped
Standard Lifetime • Create - new file object notification • Arrives in the context of the caller • Per-caller information can be recorded • Cleanup - closed file handle notification • For last handle only • Arrives in the context of the caller • Per-process information can be released here • Close - deleted file object notification • Does *not* arrive in the context of the caller!
Create, Close, and Cleanup • Trap: cleanup can appear in a different context than create! • A file handle can be duplicated into another process • say via fork()! • The new handle can be closed last • Drivers don’t get notified about this • Be very careful when using process information! [Diagram labels: two Handles, one File Object, one File (…010101…)]
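The trap is easier to see with the two counters written out. The sketch below is a user-mode model with hypothetical `Toy*` names and plain integer process IDs: cleanup is recorded when the handle count reaches zero, close when the reference count reaches zero, and each handle holds one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-mode model. Every Toy* name is hypothetical, not a WDM API. */
typedef struct ToyFileObject {
    int handleCount;      /* open handles, possibly in several processes */
    int referenceCount;   /* handles plus any other outstanding references */
    int createProcess;    /* process that opened the device */
    int cleanupProcess;   /* process that closed the last handle, or -1 */
    bool closed;          /* set when the last reference is dropped */
} ToyFileObject;

/* Create: the opening process gets the first handle. */
void ToyCreate(ToyFileObject *f, int process)
{
    f->handleCount = 1;
    f->referenceCount = 1;
    f->createProcess = process;
    f->cleanupProcess = -1;
    f->closed = false;
}

/* A non-handle reference, for example held on behalf of in-flight I/O. */
void ToyReference(ToyFileObject *f)
{
    f->referenceCount++;
}

/* Dropping the last reference is the "close" notification. */
void ToyDereference(ToyFileObject *f)
{
    assert(f->referenceCount > 0);
    if (--f->referenceCount == 0)
        f->closed = true;
}

/* Duplicating a handle into another process; the driver is not told. */
void ToyDuplicateHandle(ToyFileObject *f)
{
    f->handleCount++;
    f->referenceCount++;
}

/* Closing the last handle is the "cleanup" notification. */
void ToyCloseHandle(ToyFileObject *f, int process)
{
    assert(f->handleCount > 0);
    if (--f->handleCount == 0)
        f->cleanupProcess = process;
    ToyDereference(f);
}
```

If process 1 opens the device, the handle is duplicated into process 2, and process 1 closes first, cleanup is recorded against process 2. Any per-process state the driver stored at create time for process 1 would then be released in the wrong context.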
Transferring Data • There are three flavors of I/O in NT • Buffered • Driver works on a safe kernel snapshot of the data • Direct • Driver directly accesses user data via MDLs and kernel pointers • “Neither” • Driver directly accesses user data via user pointer