Page-Faults in Linux How can we study the handling of page-fault exceptions?
Why page-faults happen • Trying to access a virtual memory-address • Instruction-operand / instruction-address • Read-data/write-data, or fetch-instruction • Maybe page is ‘not present’ • Maybe page is ‘not readable’ • Maybe page is ‘not writable’ • Maybe page is ‘not visible’
Page-fault examples
    movl %eax, (%ebx)    # writable?
    movl (%ebx), %eax    # readable?
    jmp ahead            # present?
Everything depends on the entries in the current page-directory and page-tables, and on the cpu’s Current Privilege Level
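To make the dependence on page-table entries and CPL concrete, here is a minimal user-space sketch of the check. The helper name access_faults() and the PTE_* macro names are hypothetical. The model is deliberately simplified: it looks at a single 32-bit page-table entry only (the page-directory entry must permit the access too), and it assumes kernel-mode writes honor the write-protect bit (CR0.WP set).

```c
#include <stdbool.h>
#include <stdint.h>

/* Low three flag bits of a 32-bit x86 page-table entry. */
#define PTE_PRESENT  0x001u   /* bit 0: page is mapped            */
#define PTE_WRITABLE 0x002u   /* bit 1: writes are allowed        */
#define PTE_USER     0x004u   /* bit 2: accessible when CPL == 3  */

/* Hypothetical helper: would this access raise a page-fault?
 * Simplified model: one table level, and CR0.WP assumed set. */
static bool access_faults(uint32_t pte, bool is_write, unsigned cpl)
{
    if (!(pte & PTE_PRESENT))
        return true;                      /* 'not present'              */
    if (cpl == 3 && !(pte & PTE_USER))
        return true;                      /* 'not visible' to user code */
    if (is_write && !(pte & PTE_WRITABLE))
        return true;                      /* 'not writable'             */
    return false;
}
```

For example, a user-mode write to a page whose entry has only the present and user bits set would be reported as a fault, while the same write with the writable bit also set would not.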
Current Privilege Level (CPL)
Layout of segment-register contents (16 bits):
    bits 15…3    segment-selector index
    bit  2       TI = Table-Indicator
    bits 1…0     RPL = Requested Privilege Level
CPL is determined by the value of the RPL field in CS and SS
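The three selector fields can be pulled apart with shifts and masks. This is a small user-space sketch; the function names are hypothetical and the selector values in the note below are only examples.

```c
#include <stdint.h>

/* Bits 1..0: Requested Privilege Level. */
static unsigned selector_rpl(uint16_t sel)
{
    return sel & 3u;
}

/* Bit 2: Table-Indicator (0 = GDT, 1 = LDT). */
static unsigned selector_ti(uint16_t sel)
{
    return (sel >> 2) & 1u;
}

/* Bits 15..3: index of the descriptor within the selected table. */
static unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}
```

A selector value of 0x0023, for instance, decodes to index 4, TI 0, RPL 3 (user mode), while 0x0010 decodes to index 2, TI 0, RPL 0 (kernel mode).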
What does the CPU do?
• Whenever the cpu detects a page-fault, its action depends on Current Privilege Level
• If CPL == 0 (executing in kernel mode):
    1) push EFLAGS register
    2) push CS register
    3) push EIP register
    4) push error-code
    5) jump to page-fault service-routine
Alternative action in user-mode
• If CPL == 3 (executing in user mode) the CPU will switch to its kernel-mode stack:
    0) push SS and ESP
    1) push EFLAGS
    2) push CS
    3) push EIP
    4) push error-code
    5) jump to the page-fault service-routine
How CPU finds new stack • Special CPU segment-register: TR • TR is the ‘Task Register’ • TR holds ‘selector’ for a GDT descriptor • Descriptor is for a ‘Task State Segment’ • So TR points indirectly to current TSS • TSS stores address of kernel-mode stack
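Stack Switching mechanism
[Diagram. user-space: user code (CS, EIP) and user stack (SS, ESP). kernel-space: IDTR locates the INTERRUPT DESCRIPTOR TABLE, whose gate descriptor leads to the kernel code; GDTR locates the GLOBAL DESCRIPTOR TABLE, whose TSS descriptor (selected by TR) leads to the TASK STATE SEGMENT, which holds SS0 and ESP0 for the kernel stack.]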
Let’s ‘intercept’ page-faults • Use our systems programming knowledge • We build a ‘new’ Interrupt Descriptor Table • With our own ‘customized’ interrupt-gates • Use a ‘new’ gate for page-fault exceptions • Other existing gates we can simply copy • Why not just modify the existing IDT? • It’s ‘write-protected’ in some Linux kernels • But we can still ‘read’ it (i.e., for copying)
Very delicate to implement • Will need to use some assembly language • Using C language doesn’t give full control • C Compiler designers didn’t plan for this! • (except they did allow for using assembly) • Assembly requires us to be very precise • So try keeping assembly to a minimum • We can use a mixture of assembly and C
Allocate a mapped page • Device interrupts are ‘asynchronous’ • CPU requires instant access to the IDT • We must ensure CPU can find new IDT • Cannot risk putting it in ‘high memory’ • We can use ‘get_free_page()’ function • With flags: GFP_KERNEL and GFP_DMA • (This ensures page will be always mapped) • No memory available? Cannot continue.
Must find address of current IDT • We’ll need it for copying the existing gates • We’ll need it for restoring old IDT upon exit • We can use the ‘sidt’ instruction to find it • But ‘sidt’ needs a 48-bit memory-operand • No such type is directly supported in C • We could use a 64-bit type (i.e., long long) • Better to use array of three 16-bit values
Getting hold of current IDT
• We need to declare a global variable
• Because ‘init_module()’ needs it
• And also ‘cleanup_module()’ needs it
• Use ‘static’ to make it private
• Use ‘short’ to get 16-bit array-entries
• Use ‘unsigned’ to avoid sign-extensions
    static unsigned short oldidtr[ 3 ];
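Once ‘sidt’ has filled the three-entry array, the table's limit and base address have to be reassembled from it. The 48-bit image holds the 16-bit limit first, followed by the 32-bit base address. Here is a user-space sketch of that unpacking; the helper names are hypothetical, and the sample values in the note below are made up for illustration.

```c
#include <stdint.h>

/* Entry 0 of the image is the table limit (size in bytes, minus 1). */
static uint16_t idtr_limit(const unsigned short idtr[3])
{
    return idtr[0];
}

/* Entries 1 and 2 are the low and high halves of the 32-bit base address. */
static uint32_t idtr_base(const unsigned short idtr[3])
{
    return ((uint32_t)idtr[2] << 16) | idtr[1];
}
```

With an image of { 0x07FF, 0x1000, 0xC03A }, the limit is 0x07FF (room for 256 eight-byte gates) and the base is 0xC03A1000.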
Activating a ‘new’ IDT
• When we’re ready, we can use ‘lidt’
• Instruction will change the IDTR register
• Instruction needs 48-bit memory operand
• So again we will declare a suitable array
    static unsigned short newidtr[ 3 ];
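The 48-bit operand for the instruction that loads IDTR (‘lidt’) uses the same layout that ‘sidt’ stores: limit first, then the base address. A minimal sketch of filling such an array, assuming a hypothetical helper named idtr_pack() and a page address supplied by the caller:

```c
#include <stdint.h>

/* Hypothetical helper: build the 48-bit IDTR image in a 3-entry array.
 * 'base' is the linear address of the new table; 'limit' is its size
 * in bytes minus 1. */
static void idtr_pack(unsigned short idtr[3], uint32_t base, uint16_t limit)
{
    idtr[0] = limit;                            /* 16-bit table limit    */
    idtr[1] = (unsigned short)(base & 0xFFFFu); /* base address, loword  */
    idtr[2] = (unsigned short)(base >> 16);     /* base address, hiword  */
}
```

For a 256-entry table of 8-byte gates the limit would be 256 * 8 - 1 = 0x07FF.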
Initializations
• We need to initialize our ‘idtr’ array
• We need to initialize new Descriptor Table
• Use ‘memcpy()’ for copying within kernel
• Page-Fault’s gate-descriptor must be built
• Must conform to CPU’s expected layout
• Need to use a local 64-bit variable
    unsigned long long gate_desc;
Format for a Gate Descriptor
Quadword (64-bits):
    bits 63…48   offset[ 31…16 ]
    bits 47…32   gate type
    bits 31…16   segment-selector
    bits 15…0    offset[ 15…0 ]
The address of the fault-handler is ‘split’ into a hiword and a loword
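The quadword can be assembled with shifts once the handler's address has been split. The following is a sketch only: make_gate() is a hypothetical helper, and the values in the note (selector 0x0010, gate type 0x8E00 for a present, DPL-0, 32-bit interrupt gate) are example inputs rather than values taken from the slides.

```c
#include <stdint.h>

/* Hypothetical helper: build a 64-bit gate descriptor.
 *   bits 63..48  handler offset, hiword
 *   bits 47..32  gate type (attribute word)
 *   bits 31..16  code-segment selector
 *   bits 15..0   handler offset, loword */
static unsigned long long make_gate(uint32_t handler, uint16_t selector,
                                    uint16_t gate_type)
{
    unsigned long long gate_desc;

    gate_desc  = (unsigned long long)(handler >> 16) << 48;
    gate_desc |= (unsigned long long)gate_type << 32;
    gate_desc |= (unsigned long long)selector << 16;
    gate_desc |= handler & 0xFFFFu;
    return gate_desc;
}
```

With a handler at 0xC0123456, selector 0x0010 and gate type 0x8E00, the result is 0xC0128E0000103456.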
Declaring our fault-handler
• Tell the C compiler our handler’s name:
    asmlinkage void isr0x0E( void );
• Its type and value are set by assembler:
    asm(" .text ");
    asm(" .type isr0x0E, @function ");
    asm("isr0x0E: ");
Save/Restore cpu registers
• Upon entering:
    asm(" pushal ");
    asm(" pushl %ds ");
    asm(" pushl %es ");
• Upon leaving:
    asm(" popl %es ");
    asm(" popl %ds ");
    asm(" popal ");
    asm(" jmp *old_isr ");
Handler must access kernel data
• Registers CS and SS get set up by the CPU
• But it’s our job to set up DS and ES registers
• Linux uses same segments for data and stack
    asm(" mov %ss, %eax ");
    asm(" mov %eax, %ds ");
    asm(" mov %eax, %es ");
• (Current kernel version doesn’t use FS or GS)
Transfer to a C function
• Handler will need some info from the stack
• The ‘error-code’ will be needed for sure
• So C function will need an ‘argument’
• So here’s our C function prototype:
    static void handler( unsigned long *tos );
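What such a C function might do with its argument can be sketched in user space. The sketch assumes that ‘tos’ holds the stack-pointer value right after the entry sequence (pushal, pushl %ds, pushl %es), so ES sits in slot 0 and the CPU's error-code in slot 10. The enum names, the PF_* macros, struct fault_summary and decode_fault() are all hypothetical. The error-code bits used are the three low-order ones defined by the CPU.

```c
/* Assumed slot of each saved value, counting upward from 'tos'. */
enum {
    TOS_ES = 0, TOS_DS = 1,
    TOS_EDI = 2, TOS_ESI, TOS_EBP, TOS_ESP,
    TOS_EBX, TOS_EDX, TOS_ECX, TOS_EAX,          /* saved by pushal  */
    TOS_ERROR_CODE = 10, TOS_EIP, TOS_CS, TOS_EFLAGS  /* pushed by CPU */
};

/* Page-fault error-code bits. */
#define PF_PROT  0x1u   /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2u   /* 0 = read or fetch,    1 = write                */
#define PF_USER  0x4u   /* 0 = fault at CPL 0,   1 = fault at CPL 3       */

struct fault_summary {
    int protection;      /* nonzero: page was present, access not allowed */
    int write;           /* nonzero: faulting access was a write          */
    int user;            /* nonzero: fault occurred in user mode          */
    unsigned long eip;   /* address of the faulting instruction           */
};

/* Hypothetical decoder for the saved frame described above. */
static struct fault_summary decode_fault(const unsigned long *tos)
{
    unsigned long code = tos[TOS_ERROR_CODE];
    struct fault_summary s;

    s.protection = (code & PF_PROT)  != 0;
    s.write      = (code & PF_WRITE) != 0;
    s.user       = (code & PF_USER)  != 0;
    s.eip        = tos[TOS_EIP];
    return s;
}
```

An error-code of 6, for example, describes a user-mode write to a page that was not present. If the entry sequence pushes a different set of registers, the slot numbers must change to match.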