Why can’t we do ‘raw’ I/O? How the x86 stops user-programs from directly controlling devices, and how we devise a ‘workaround’
x86 Privilege Levels • To let multiple users perform multiple tasks while affording each some ‘protection’ against interference by the others, any modern CPU implements two or more separate levels of ‘privilege’ for its operations: an ‘unrestricted privileges’ arena for the code in its Master Control Program (its ‘kernel’), and a ‘restricted privileges’ realm for the code in users’ application programs
Four Privilege Rings • Ring 3: least-trusted level • Ring 2 • Ring 1 • Ring 0: most-trusted level
Suggested purposes • Ring0: operating system kernel • Ring1: operating system services • Ring2: custom extensions • Ring3: ordinary user applications
Unix/Linux and Windows • Ring0: operating system • Ring1: unused • Ring2: unused • Ring3: application programs
IOPL • The Intel x86 processor can either allow or prohibit accesses to system peripheral devices by code that executes in the various ‘privilege rings’ • A 2-bit field within the x86 FLAGS register controls whether or not ‘in’ and ‘out’ are allowed to execute • This field is known as the I/O Privilege Level (IOPL) field, and Linux normally sets its value to zero
The x86 API registers (Intel Core-2 Quad processor) • General registers: RAX RBX RCX RDX RSP RBP RSI RDI R8 R9 R10 R11 R12 R13 R14 R15 • Segment registers: CS DS ES FS GS SS • Instruction pointer and flags: RIP RFLAGS
The FLAGS register • Bit layout, from bit 15 down to bit 0: 0 | NT | IOPL (bits 13,12) | OF | DF | IF | TF | SF | ZF | 0 | AF | 0 | PF | 1 | CF • The register holds both status-flags and control-flags • Legend: ZF = Zero Flag, SF = Sign Flag, IOPL = I/O Privilege Level, CF = Carry Flag, NT = Nested Task, PF = Parity Flag, TF = Trap Flag, OF = Overflow Flag, IF = Interrupt Flag, AF = Auxiliary Flag, DF = Direction Flag
‘seeflags.cpp’ • This demo-program allows us to view the settings of bits in the RFLAGS register – and the IOPL-field in particular (bits 13,12) • When IOPL == 0, only ring0 code will be able to execute ‘in’ and ‘out’ instructions • When IOPL == 3, then code executing in any of the rings will be able to execute I/O • So – let’s change IOPL to 3 – but how?
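The IOPL test described above can be sketched in C. This is not the ‘seeflags.cpp’ source, only a minimal illustration; the names ‘iopl_of’, ‘io_allowed’ and ‘read_rflags’ are hypothetical, and the check ignores the I/O permission bitmap that the CPU may also consult.

```c
#include <assert.h>
#include <stdint.h>

#define IOPL_SHIFT 12
#define IOPL_MASK  (3ULL << IOPL_SHIFT)   /* bits 13,12 of RFLAGS (0x3000) */

/* Extract the 2-bit I/O Privilege Level from an RFLAGS image. */
unsigned iopl_of(uint64_t rflags)
{
    return (unsigned)((rflags & IOPL_MASK) >> IOPL_SHIFT);
}

/* 'in' and 'out' may execute when the current ring number (cpl)
 * does not exceed IOPL.  Simplification: the I/O permission bitmap
 * is not modelled here. */
int io_allowed(unsigned cpl, uint64_t rflags)
{
    assert(cpl <= 3);
    return cpl <= iopl_of(rflags);
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Read the live RFLAGS register with pushfq/popq (GCC or Clang on
 * x86-64 only).  The two 'lea' instructions step past the 128-byte
 * red zone so that the push cannot overwrite compiler-managed data;
 * 'lea' itself does not alter any flags. */
uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile (
        "lea -128(%%rsp), %%rsp\n\t"
        "pushfq\n\t"
        "popq %0\n\t"
        "lea 128(%%rsp), %%rsp"
        : "=r"(flags) : : "memory");
    return flags;
}
#endif
```

On an ordinary Linux system, calling ‘iopl_of( read_rflags() )’ from an application would be expected to give 0, matching the slide’s remark that Linux normally sets IOPL to zero.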
‘pushfq’/‘popfq’ • An idea suggested by the ‘inline’ assembly language in our ‘seeflags.cpp’ demo would be to just ‘pop’ a suitably designed value from the stack into the RFLAGS register • But the CPU will not allow that when it is executing ring3 code while IOPL is set to 0: ‘popfq’ leaves the IOPL-field unchanged, since altering it would compromise the system’s intended ‘protection’
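Why the ‘popfq’ idea fails can be illustrated with a small model of the CPU’s rule. This is only a sketch under simplifying assumptions: the function name ‘popfq_model’ is hypothetical, and only the IOPL and IF bits are treated as protected (other system bits are ignored).

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_IF    (1ULL << 9)    /* Interrupt Flag */
#define FLAG_IOPL  (3ULL << 12)   /* I/O Privilege Level, bits 13,12 */

/* Return the RFLAGS value that results when code running in ring
 * 'cpl' pops 'wanted' while the register currently holds 'current'.
 * Protected bits keep their current value instead of faulting. */
uint64_t popfq_model(uint64_t current, uint64_t wanted, unsigned cpl)
{
    assert(cpl <= 3);
    unsigned iopl = (unsigned)((current & FLAG_IOPL) >> 12);
    uint64_t locked = 0;

    if (cpl > 0)
        locked |= FLAG_IOPL;      /* only ring0 may change IOPL */
    if (cpl > iopl)
        locked |= FLAG_IF;        /* IF needs cpl <= IOPL       */

    return (wanted & ~locked) | (current & locked);
}
```

In this model, ‘popfq_model( 0x0202, 0x3202, 3 )’ gives back 0x0202: the ring3 attempt to raise IOPL is silently dropped, whereas the same call with a ring number of 0 yields 0x3202.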
Must do it from ring0! • Our classroom’s Linux systems will allow us to install our own code-module, as an ‘add-on’ to the running kernel, and such code could therefore be executed without any restrictions (i.e., at ring0) • This idea motivates us to explore briefly the programming ideas needed for writing our own LKM (Linux Kernel Module)
A module’s organization • my_info: the module’s ‘payload’ function • module_init and module_exit: the module’s two required administrative functions
Our ‘newproc.cpp’ utility • For the type of LKM that creates a pseudo-file in the ‘/proc’ directory, there is a ‘skeleton’ of C-language code we can start from, and then add our own specific functionality to that skeleton-code • You can quickly create this ‘skeleton’ file by using our ‘newproc.cpp’ utility-program
Software interrupts • One way for a user-program, which normally executes in ring3, to switch to ring0 (if it’s allowed) is by using a ‘software interrupt’ • This is how the 32-bit version of Linux did its various system-calls, with ‘int $0x80’ • We can craft an LKM whose ‘payload’ is an interrupt service routine that would be able to change the IOPL from 0 to 3
Systems programming • To accomplish this design-idea, we’ll need an understanding of our CPU’s interrupt mechanism, including some special data-structures located in kernel memory and some special CPU registers which allow the CPU to locate those data-structures
Descriptor Tables • IDTR and GDTR: special processor registers used by the CPU for locating its Descriptor Tables within the system’s memory • IDT: Interrupt Descriptor Table (256 Gate Descriptors), located via IDTR • GDT: Global Descriptor Table (Segment Descriptors), located via GDTR
IDT Descriptor-format • Each gate descriptor is four 32-bit words (numbered 3 down to 0): • Word 3: reserved (=0) • Word 2: offset 63..32 • Word 1: offset 31..16 | P | DPL | 0 | gate type | 00000 | IST • Word 0: segment selector | offset 15..0 • LEGEND: segment-selector (for the handler’s code-segment); offset within code-segment to handler’s entry-point; gate-type (0xE = Interrupt Gate, 0xF = Trap Gate); IST = Interrupt Stack Table (0..7); DPL = Descriptor Privilege Level; P = Present (1 = yes, 0 = no)
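The descriptor layout above maps naturally onto a C structure. The following is a minimal sketch, not the declaration used by ‘iokludge.c’ or by the Linux kernel; the names ‘gate_descriptor’, ‘make_gate’ and ‘gate_offset’ are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte gate descriptor, fields listed from lowest address up. */
struct gate_descriptor {
    uint16_t offset_15_0;    /* handler offset, bits 15..0            */
    uint16_t selector;       /* selector for handler's code-segment   */
    uint8_t  ist;            /* bits 2..0 = IST index, bits 7..3 = 0  */
    uint8_t  type_attr;      /* P | DPL (2 bits) | 0 | gate type      */
    uint16_t offset_31_16;   /* handler offset, bits 31..16           */
    uint32_t offset_63_32;   /* handler offset, bits 63..32           */
    uint32_t reserved;       /* must be zero                          */
};

_Static_assert(sizeof(struct gate_descriptor) == 16,
               "a 64-bit gate descriptor occupies 16 bytes");

/* Build a 'present' gate for the handler at address 'handler'. */
struct gate_descriptor make_gate(uint64_t handler, uint16_t selector,
                                 unsigned type, unsigned dpl, unsigned ist)
{
    assert(type <= 0xF && dpl <= 3 && ist <= 7);
    struct gate_descriptor g;
    g.offset_15_0  = (uint16_t)(handler & 0xFFFF);
    g.selector     = selector;
    g.ist          = (uint8_t)ist;
    g.type_attr    = (uint8_t)(0x80 | (dpl << 5) | type);  /* P = 1 */
    g.offset_31_16 = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_63_32 = (uint32_t)(handler >> 32);
    g.reserved     = 0;
    return g;
}

/* Reassemble the 64-bit entry-point from its three pieces. */
uint64_t gate_offset(const struct gate_descriptor *g)
{
    return ((uint64_t)g->offset_63_32 << 32)
         | ((uint64_t)g->offset_31_16 << 16)
         |  (uint64_t)g->offset_15_0;
}
```

For example, an Interrupt Gate (type 0xE) with DPL 3 gets the attribute byte 0xEE; the handler address and the selector value used in any such call are placeholders chosen by the caller.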
IDTR register-format • IDTR (80 bits): Base-Address of the IDT segment (64 bits) | segment limit (16 bits) • Special processor instructions are used to ‘load’ this 10-byte register from a memory-image (‘LIDT’), or to ‘store’ this register’s value (‘SIDT’) • The ‘LIDT’ instruction can only be executed by code running in Ring0, but the ‘SIDT’ instruction can be executed by code running at any privilege level
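The 10-byte memory-image used with ‘LIDT’ and ‘SIDT’ can be sketched as a packed C structure, with the 16-bit limit at the lower address followed by the 64-bit base. The names ‘idtr_image’, ‘idtr_for_table’ and ‘gate_count’ are hypothetical, and the ‘packed’ attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Memory image of IDTR: segment limit first, then base address. */
struct idtr_image {
    uint16_t limit;   /* offset of the table's last valid byte */
    uint64_t base;    /* address of the IDT                    */
} __attribute__((packed));

_Static_assert(sizeof(struct idtr_image) == 10,
               "the IDTR image is 80 bits wide");

/* Prepare an image describing a table of 'gates' 16-byte descriptors. */
struct idtr_image idtr_for_table(uint64_t base, unsigned gates)
{
    assert(gates >= 1 && gates <= 256);
    struct idtr_image r;
    r.limit = (uint16_t)(gates * 16 - 1);
    r.base  = base;
    return r;
}

/* Number of 16-byte gate descriptors covered by the limit. */
unsigned gate_count(const struct idtr_image *r)
{
    return ((unsigned)r->limit + 1) / 16;
}
```

A full table of 256 gates therefore has a limit of 4095, which is why a single 4096-byte page is enough to hold a complete IDT.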
Stack layout after an interrupt • Five 64-bit values are pushed onto the Ring0 stack (which begins at RSP0): • 32(%rsp): SS • 24(%rsp): RSP • 16(%rsp): RFLAGS • 8(%rsp): CS • 0(%rsp): RIP
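The stack layout above can be written as a C structure whose field offsets match the displacements from %rsp. This is only an illustrative sketch with hypothetical names (‘saved_frame’, ‘interrupted_ring’, ‘grant_iopl3’); it shows in C what an update to the saved RFLAGS image at offset 16 amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values the CPU pushes when an interrupt is delivered in 64-bit mode. */
struct saved_frame {
    uint64_t rip;      /*  0(%rsp) */
    uint64_t cs;       /*  8(%rsp) */
    uint64_t rflags;   /* 16(%rsp) */
    uint64_t rsp;      /* 24(%rsp) */
    uint64_t ss;       /* 32(%rsp) */
};

_Static_assert(offsetof(struct saved_frame, rflags) == 16,
               "the saved RFLAGS image sits at 16(%rsp)");

/* The low two bits of the saved CS give the interrupted code's ring. */
unsigned interrupted_ring(const struct saved_frame *f)
{
    return (unsigned)(f->cs & 3);
}

/* Set both IOPL bits (13,12) in the saved RFLAGS image, so that the
 * interrupted program resumes with IOPL == 3. */
void grant_iopl3(struct saved_frame *f)
{
    assert(f != NULL);
    f->rflags |= 0x3000;
}
```

Because the change is made to the saved image rather than to the live register, it only takes effect when the handler returns and the CPU reloads RFLAGS from the stack.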
Our interrupt-9 handler Our ‘iokludge.c’ kernel module uses this ‘inline’ assembly language to generate the machine-code for handling an interrupt-9. The handler merely sets the IOPL-field (in the saved image of the RFLAGS register) to 3, and then resumes execution of the interrupted application program.
//-------------------- INTERRUPT SERVICE ROUTINE -----------------
void isr_entry( void );
asm(" .text ");
asm(" .type isr_entry, @function ");
asm("isr_entry: ");
asm(" orq $0x3000, 16(%rsp) ");   // set IOPL bits 13,12 in the saved RFLAGS
asm(" iretq ");                   // return to the interrupted program
//----------------------------------------------------------------
Core-2 Quad system [Diagram: an Intel Core-2 Quad processor (CPU 0, CPU 1, CPU 2, CPU 3) connected by the system bus to system memory and to several I/O devices]
‘smp_call_function()’ • This Linux kernel ‘helper’ routine allows a CPU to request all other CPUs to execute a specified subroutine of type: void function( void *info ); • In our current Linux kernel (vers. 2.6.26.6) this helper-routine takes four arguments: • The address of the subroutine’s entry-point • The address of data the subroutine needs • A flag that indicates whether or not to ‘retry’ • A flag that indicates whether or not to ‘wait’ • (Note: Newer kernels omit the ‘retry’ argument)
Working with LKMs • Create an LKM skeleton using ‘newproc’ • Compile a new LKM using ‘mmake’ • Install an LKM’s compiled ‘kernel object’ using the Linux ‘/sbin/insmod’ command • Remove an LKM from the running kernel using the Linux ‘/sbin/rmmod’ command
‘iokludge.c’
module_init:
1) Allocate a kernel memory page, to be used as a new Interrupt Descriptor Table
2) Save the original contents of system register IDTR, so it can be restored later
3) Prepare a memory-image for the new value of register IDTR, referring to kpage
4) Set up pointers ‘oldidt’ and ‘newidt’ and copy the original IDT to our new page
5) Set up a Gate-Descriptor, to be installed as Gate 9 in our new IDT array
6) Activate the new Interrupt Descriptor Table on all the processors in our system
7) Return 0, to indicate a successful module-installation
module_exit:
1) Restore the original value to register IDTR in each of our system’s processors
2) Free the page of kernel memory that was previously allocated for use as an IDT
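The table-copying steps of module_init (steps 1, 4 and 5) can be rehearsed in an ordinary user-space program. This is a simulation under stated assumptions, not the ‘iokludge.c’ code: ‘malloc’ stands in for the kernel’s page allocation, the function names are hypothetical, a little-endian host is assumed, and DPL 3 is assumed for Gate 9 because a ring3 program can only invoke a gate via ‘int’ when the gate’s DPL permits it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IDT_GATES 256
#define GATE_SIZE 16
#define IDT_BYTES (IDT_GATES * GATE_SIZE)   /* 4096: exactly one page */

/* Steps 1 and 4: get a page-sized buffer and copy the original table.
 * Returns NULL if the allocation fails. */
uint8_t *clone_idt(const uint8_t *oldidt)
{
    uint8_t *newidt = malloc(IDT_BYTES);
    if (newidt != NULL)
        memcpy(newidt, oldidt, IDT_BYTES);
    return newidt;
}

/* Step 5: write a 'present' Interrupt Gate (type 0xE) into slot n. */
void install_gate(uint8_t *idt, unsigned n, uint64_t handler,
                  uint16_t selector, unsigned dpl)
{
    assert(idt != NULL && n < IDT_GATES && dpl <= 3);
    uint8_t *g = idt + n * GATE_SIZE;
    uint16_t lo   = (uint16_t)(handler & 0xFFFF);
    uint16_t mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    uint32_t hi   = (uint32_t)(handler >> 32);
    uint32_t zero = 0;

    memcpy(g + 0, &lo, 2);                   /* offset 15..0            */
    memcpy(g + 2, &selector, 2);             /* segment selector        */
    g[4] = 0;                                /* IST = 0                 */
    g[5] = (uint8_t)(0x80 | (dpl << 5) | 0xE); /* P, DPL, gate type     */
    memcpy(g + 6, &mid, 2);                  /* offset 31..16           */
    memcpy(g + 8, &hi, 4);                   /* offset 63..32           */
    memcpy(g + 12, &zero, 4);                /* reserved (=0)           */
}
```

Working on a copy leaves the original table untouched, which is what lets module_exit simply restore the saved IDTR value and free the page.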
‘tryiopl3.cpp’ • This demo-program is a modification of our earlier ‘seeflags.cpp’ example – but here we included the software interrupt instruction ‘int $9’ which, if ‘iokludge.ko’ has been installed, will allow us to check that indeed the RFLAGS register’s IOPL has been changed from 0 to 3 – thereby permitting ‘in’ and ‘out’ to be executed!
Homework exercise • Modify the ‘82573pci.cpp’ program that we weren’t able to execute, even with ‘sudo’, at our previous class meeting, replacing its call to Linux’s ‘iopl()’ library-function with the ‘inline’ assembly language statement for software interrupt 9, i.e. asm(" int $9 "); • Then try again to compile and execute our ‘82573pci.cpp’ demo-program, only this time with our ‘iokludge.ko’ LKM installed