Facilities for x86 debugging

Introduction to Pentium features that can assist programmers in debugging their software

Any project ‘bugs’?
• As you work on designing your solution for the programming assignment in Project #1, it is possible (likely?) that you will run into some program failures
• What can you do if your program doesn’t behave as you had expected it would?
• How can you diagnose the causes?
• Where does your problem first appear?

Single-stepping
• An ability to trace through your program’s code, one instruction at a time, often can be extremely helpful in identifying where a program flaw is occurring, and also why
• The Pentium processor provides hardware assistance for implementing a ‘debugging’ capability such as ‘single-stepping’

The EFLAGS register
    bit 16: RF (RESUME flag)
    bit 8:  TF (TRAP flag)
TF = TRAP flag (bit 8): By setting this flag-bit in the EFLAGS register-image that gets saved on the stack when ‘pushfl’ is executed, and then executing ‘popfl’, we make the CPU begin generating a ‘single-step’ exception after each instruction executes
RF = RESUME flag (bit 16): By setting this flag-bit in the EFLAGS register-image that got saved on the stack, we inhibit the CPU from generating yet another debug exception for the instruction that ‘iret’ resumes
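The two flag-bit positions above can be checked with a little arithmetic. The following is only a sketch in Python (not part of the ‘trydebug.s’ demo); the names `set_flag`, `clear_flag` and the sample EFLAGS image 0x00000202 are hypothetical, chosen for illustration.

```python
TF_BIT = 8    # TRAP flag position in EFLAGS
RF_BIT = 16   # RESUME flag position in EFLAGS

def set_flag(eflags_image: int, bit: int) -> int:
    """Return the 32-bit EFLAGS image with the given flag-bit set."""
    return (eflags_image | (1 << bit)) & 0xFFFFFFFF

def clear_flag(eflags_image: int, bit: int) -> int:
    """Return the 32-bit EFLAGS image with the given flag-bit cleared."""
    return eflags_image & ~(1 << bit) & 0xFFFFFFFF

# Hypothetical saved image: IF (bit 9) set, plus the always-1 bit 1
image = 0x00000202
with_tf = set_flag(image, TF_BIT)   # what 'btsl $8, (%esp)' does to the image
with_rf = set_flag(image, RF_BIT)   # what 'btsl $16, ...' does to the image
print(f"{with_tf:08X} {with_rf:08X}")
```

Setting a bit in the saved image has no effect by itself; the new value only reaches the CPU when ‘popfl’ or ‘iret’ reloads EFLAGS from the stack.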

TF-bit in EFLAGS
• Our ‘trydebug.s’ demo shows how to use the TF-bit to perform ‘single-stepping’ of a Linux application program (e.g., ‘hello’)
• The ‘popfl’ instruction is used to set TF
• The exception-handler for INT-1 displays information about the state of the task
• But single-stepping starts only AFTER the immediately following instruction executes

How to do it
• Here’s a code-fragment that we could use to initiate single-stepping from the start of our ‘ring3’ application-program:

    pushw   $userSS         # selector for ring3 stack-segment
    pushw   $userTOS        # offset for ring3 ‘top-of-stack’
    pushw   $userCS         # selector for ring3 code-segment
    pushw   $0              # offset for the ring3 entry-point
    pushfl                  # push current EFLAGS
    btsl    $8, (%esp)      # set image of the TF-bit
    popfl                   # modify EFLAGS to set TF
    lret                    # transfer to ring3 application

Using ‘objdump’ output
• You can generate an assembler ‘listing’ of the instructions in our ‘hello’ application
• You can then use the listing to follow along with the ‘single-stepping’ through that code
• Here’s how to do it:

    $ objdump -d hello > hello.u

• (The ‘-d’ option stands for ‘disassemble’)

A slight ‘flaw’
• We cannot single-step the execution of an ‘int $0x80’ instruction (used for Linux’s system-calls)
• Our exception-handler’s ‘iret’ instruction will restore the TF-bit to EFLAGS, but the single-step ‘trap’ doesn’t take effect until after the immediately following instruction
• This means we ‘skip’ seeing a display of the registers immediately after ‘int $0x80’

Fixing that ‘flaw’
• The Pentium offers a way to overcome the problem of a delayed effect when TF is set
• We can use the Debug Registers to set an instruction ‘breakpoint’ which will interrupt the CPU at a specific instruction-address
• There are six Debug Registers:
    DR0, DR1, DR2, DR3 (breakpoints)
    DR6 (the Debug Status register)
    DR7 (the Debug Control register)

Breakpoint Address Registers
    DR0, DR1, DR2, DR3 (each holds one breakpoint address)

Special ‘MOV’ instructions
• Use ‘mov %reg, %DRn’ to write into DRn
• Use ‘mov %DRn, %reg’ to read from DRn
• Here ‘reg’ stands for any one of the CPU’s general-purpose registers (e.g., EAX, etc.)
• These instructions are ‘privileged’ (i.e., can only be executed by code running in ring0)

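Debug Control Register (DR7)
Least significant word (bits 15..0):
    bit:    15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
    field:   0   0  GD   0   0   1  GE  LE  G3  L3  G2  L2  G1  L1  G0  L0
Most significant word (bits 31..16):
    bits:   31-30  29-28  27-26  25-24  23-22  21-20  19-18  17-16
    field:  LEN3   R/W3   LEN2   R/W2   LEN1   R/W1   LEN0   R/W0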

What kinds of breakpoints?
LEN field:
    00 = one byte
    01 = two bytes
    10 = undefined
    11 = four bytes
R/W field:
    00 = break on instruction fetch only
    01 = break on data writes only
    10 = undefined (unless DE set in CR4)
    11 = break on data reads or writes (but not on instruction fetches)
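To see how the LEN, R/W and enable bits combine into one DR7 value, here is a sketch in Python that computes the 32-bit word you would load into DR7. The function name `dr7_enable` and its parameters are hypothetical; the bit positions follow the DR7 layout and the encodings listed above. The check that an instruction breakpoint uses LEN = 00 reflects Intel’s documented requirement.

```python
LEN_CODES = {1: 0b00, 2: 0b01, 4: 0b11}                       # bytes -> LEN
RW_CODES = {"execute": 0b00, "write": 0b01, "readwrite": 0b11}  # kind -> R/W

def dr7_enable(dr7: int, n: int, kind: str = "execute",
               length: int = 1, local: bool = True) -> int:
    """Return a DR7 value with breakpoint n (0..3) configured and enabled."""
    if n not in range(4):
        raise ValueError("breakpoint number must be 0, 1, 2 or 3")
    if kind == "execute" and length != 1:
        raise ValueError("instruction breakpoints must use LEN = 00")
    shift = 16 + 4 * n                      # R/Wn in bits shift..shift+1,
    field = RW_CODES[kind] | (LEN_CODES[length] << 2)   # LENn just above
    dr7 &= ~(0xF << shift)                  # clear the old LENn and R/Wn
    dr7 |= field << shift
    dr7 |= 1 << (2 * n + (0 if local else 1))   # Ln is bit 2n, Gn is bit 2n+1
    return dr7 & 0xFFFFFFFF

# Instruction breakpoint in DR0, locally enabled (bit 10 of DR7 reads as 1)
print(f"{dr7_enable(0x00000400, 0):08X}")
# Four-byte data-write breakpoint in DR1, locally enabled
print(f"{dr7_enable(0, 1, 'write', 4):08X}")
```

For an instruction breakpoint in DR0 the LEN0 and R/W0 fields are both 00, so only the L0 bit changes, which is why a single ‘bts $0’ on the DR7 value suffices in that case.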

Control Register 4
• The Pentium uses Control Register 4 to activate certain extended features of the processor, while still allowing for backward compatibility of software written for earlier Intel x86 processors
• An example: Debug Extensions (DE-bit)
    CR4 layout: bit 3 = DE; the remaining bits (31..0) are other feature bits

Debug Status Register (DR6)
Least significant word (bits 15..0):
    bit:    15  14  13  12  11..4         3   2   1   0
    field:  BT  BS  BD   0  1 (all eight)  B3  B2  B1  B0
Most significant word (bits 31..16):
    unused (all bits here are set to 1)
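A debug exception-handler usually wants to know which of these status bits are set. The sketch below, in Python, decodes a DR6 value into the names used in the layout above; `decode_dr6` is a hypothetical helper, and the sample values are made up for illustration. (B0..B3 report a breakpoint match, BS reports a single-step trap, BT a task-switch trap, and BD a detected debug-register access.)

```python
def decode_dr6(dr6: int) -> list:
    """Return the names of the status bits that are set in a DR6 value."""
    causes = [f"B{n}" for n in range(4) if dr6 & (1 << n)]
    for name, bit in (("BD", 13), ("BS", 14), ("BT", 15)):
        if dr6 & (1 << bit):
            causes.append(name)
    return causes

print(decode_dr6(0xFFFF0FF1))   # breakpoint 0 matched
print(decode_dr6(0xFFFF4FF0))   # single-step trap
```

More than one bit can be set at once, for example when a single-step trap and a breakpoint match are reported by the same exception, so a handler should test each bit rather than compare the whole register.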

Where to set a breakpoint
• Suppose you want to trigger a ‘debug’ trap at the instruction immediately following the Linux software ‘int $0x80’ system-call
• Your debug exception-handler can use the saved CS:EIP values on its stack to check whether the next instruction to be executed is ‘int $0x80’
• Machine-code is: 0xCD, 0x80 (2 bytes)
• So set a ‘breakpoint’ at address EIP+2
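The check described above amounts to comparing two bytes and adding 2. Here is a sketch of that logic in Python, with the application’s code modeled as a `bytes` object indexed by offset; the name `breakpoint_after_syscall` and the sample code bytes are hypothetical.

```python
INT_N_OPCODE = 0xCD      # opcode byte for 'int imm8'
SYSCALL_VECTOR = 0x80    # Linux system-call interrupt number

def breakpoint_after_syscall(code: bytes, eip: int):
    """If the instruction at offset eip is 'int $0x80', return the offset
    of the instruction that follows it (eip + 2); otherwise return None."""
    if code[eip:eip + 2] == bytes([INT_N_OPCODE, SYSCALL_VECTOR]):
        return eip + 2
    return None

# Made-up fragment: mov $4, %eax ; int $0x80 ; nop
code = bytes([0xB8, 0x04, 0x00, 0x00, 0x00, 0xCD, 0x80, 0x90])
print(breakpoint_after_syscall(code, 5))   # saved EIP points at 'int $0x80'
print(breakpoint_after_syscall(code, 0))   # saved EIP points at the 'mov'
```

In a real handler the two bytes would be fetched through the saved CS selector, and the value written to the Debug Register has to be the address the CPU actually compares against, which equals EIP+2 only when the code-segment’s base-address is zero.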

How to set this breakpoint

isrDBG: push    %ebp
        mov     %esp, %ebp
        pushal

        # put breakpoint-address in DR0
        mov     4(%ebp), %eax
        add     $2, %eax
        mov     %eax, %dr0

Setting a breakpoint (continued)

        # enable local breakpoint for DR0
        mov     %dr7, %eax
        bts     $0, %eax        # set L0 (local enable for breakpoint 0)
        mov     %eax, %dr7
        …
        popal
        pop     %ebp
        iret

Detecting a ‘breakpoint’
• Your debug exception-handler can read DR6 to check for any occurrences of breakpoints

        mov     %dr6, %eax      # get debug status
        bt      $0, %eax        # breakpoint #0?
        jnc     notBP0          # no, another cause
        btsl    $16, 12(%ebp)   # set the RF-bit in the saved EFLAGS image
        # or disable breakpoint0 in register DR7
notBP0:

In-class exercise #1
• Our ‘trydebug.s’ demo illustrates the idea of single-stepping through a program, but after several steps it encounters a General Protection Exception (i.e., interrupt 0x0D)
• You will recognize a display of information from registers that get saved on the stack
• Can you determine why this fault occurs, and then modify our code to eliminate it?

The unlabeled stack layout
• Our ‘isrGPF’ handler doesn’t label its info:

    ----- ES
    ----- DS
    EDI
    ESI
    EBP
    ESP
    EBX
    EDX
    ECX
    EAX
    error-code
    EIP
    ----- CS
    EFLAGS
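When reading the unlabeled display it can help to attach the names yourself. The Python sketch below pairs a list of fourteen values with the labels in the order shown above. It assumes each slot is one 32-bit doubleword and that only the low 16 bits of the ES, DS and CS slots are meaningful (the ‘-----’ in the layout); `label_frame` and the sample values are hypothetical, not output from ‘trydebug.s’.

```python
GPF_FRAME_LABELS = ["ES", "DS", "EDI", "ESI", "EBP", "ESP", "EBX", "EDX",
                    "ECX", "EAX", "error-code", "EIP", "CS", "EFLAGS"]
SELECTOR_SLOTS = {"ES", "DS", "CS"}   # only the low word is a selector

def label_frame(dwords: list) -> list:
    """Pair each displayed doubleword with its name, top of stack first."""
    if len(dwords) != len(GPF_FRAME_LABELS):
        raise ValueError("expected exactly 14 doublewords")
    lines = []
    for label, value in zip(GPF_FRAME_LABELS, dwords):
        if label in SELECTOR_SLOTS:
            text = f"{value & 0xFFFF:04X}"     # show the selector only
        else:
            text = f"{value:08X}"
        lines.append(f"{label:>10} = {text}")
    return lines

# Made-up values 0..13, just to show which slot gets which name
for line in label_frame(list(range(14))):
    print(line)
```

The error-code slot is the one to look at first for a General Protection Exception: when it is nonzero it identifies the selector involved in the fault.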

Intel x86 instruction-format
• Intel’s instructions vary in length from 1 to 15 bytes, and are composed of five fields:

    instruction prefixes    0, 1, 2 or 3 bytes
    opcode field            1 or 2 bytes
    addressing mode field   0, 1 or 2 bytes
    address displacement    0, 1, 2 or 4 bytes
    immediate data          0, 1, 2 or 4 bytes

    Maximum number of bytes = 15

A few examples
• 1-byte instruction: in %dx, %al
• 2-byte instruction: int $0x16
• A prefixed instruction: rep movsb
• And here’s a 12-byte instruction: cmpl $0, %fs:0x400(%ebx, %edi, 2)
    • 1 prefix byte
    • 1 opcode byte
    • 2 address-mode bytes
    • 4 address-displacement bytes
    • 4 immediate-data bytes
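The byte-counts in these examples are just sums over the five fields of the instruction-format. As a sketch, the Python function below adds up the field sizes and rejects sizes outside the ranges listed for the instruction-format; `instruction_length` and its parameter names are hypothetical.

```python
ALLOWED_SIZES = {
    "prefixes": {0, 1, 2, 3},
    "opcode": {1, 2},
    "addressing": {0, 1, 2},
    "displacement": {0, 1, 2, 4},
    "immediate": {0, 1, 2, 4},
}

def instruction_length(prefixes=0, opcode=1, addressing=0,
                       displacement=0, immediate=0) -> int:
    """Total instruction length in bytes, given the size of each field."""
    sizes = {"prefixes": prefixes, "opcode": opcode, "addressing": addressing,
             "displacement": displacement, "immediate": immediate}
    for field, size in sizes.items():
        if size not in ALLOWED_SIZES[field]:
            raise ValueError(f"{field} cannot occupy {size} bytes")
    return sum(sizes.values())

print(instruction_length())                          # in %dx, %al
print(instruction_length(immediate=1))               # int $0x16
print(instruction_length(prefixes=1))                # rep movsb
print(instruction_length(prefixes=1, addressing=2,
                         displacement=4, immediate=4))   # the cmpl example
```

Taking the largest size for every field gives 3 + 2 + 2 + 4 + 4 = 15, which matches the stated maximum.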

In-class exercise #2
• Modify the debug exception-handler in our ‘trydebug.s’ demo-program so it will use a different Debug Register (i.e., DR1, DR2, or DR3) to set an instruction-breakpoint at the entry-point to your ‘int $0x80’ system-service interrupt-routine (i.e., at ‘isrDBG’)
• This can allow you to do single-stepping of your system-call handlers (e.g., ‘do_write’)
