LKCD: Linux Kernel Crash Dumps
Matt D. Robinson
matt@aparity.com
LKCD Overview
• Description
• Kernel Implementation
• Configuration
• Invocation/Kernel State
• User-Level Analysis (lcrash)
• lcrash Example Output
• Future Development/Evolution
Version 1.0
Description
LKCD is a set of kernel and application code to configure, implement, and analyze system crash dumps. These slides cover a high-level view of the kernel side of LKCD, with a brief introduction to the user-level analysis tools.
Kernel Implementation
• dump.o is the primary kernel driver; it can be built either as a module or into the kernel
• The dump driver is dormant until it is invoked, either for configuration or for dumping
• The configuration of the dump device determines what occurs on invocation
• Both disruptive and non-disruptive dumping are available
Kernel Implementation
• Dump compression is available through modules (or standalone): GZIP or RLE
• The dump driver is accessed through /dev/dump (major/minor device pair 227,0)
• panic() or die_if_kernel() will invoke the dumping process; dumping only occurs if dumps are configured
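The RLE option can be pictured with a generic run-length encoder. The sketch below only illustrates why run-length encoding suits memory pages that contain long runs of identical bytes, such as zero-filled pages. It is not LKCD's actual dump format: the function names and the (run length, byte value) pair layout are assumptions made for this example.

```python
def rle_encode(page: bytes) -> bytes:
    """Encode a page as (run_length, byte_value) pairs; runs are capped at 255.

    Hypothetical layout for illustration only, not LKCD's on-disk format.
    """
    out = bytearray()
    i = 0
    while i < len(page):
        value = page[i]
        run = 1
        # Extend the run while the next byte matches and the count fits in one byte.
        while i + run < len(page) and page[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Invert rle_encode by expanding each (run_length, byte_value) pair."""
    out = bytearray()
    for k in range(0, len(data), 2):
        run, value = data[k], data[k + 1]
        out += bytes([value]) * run
    return bytes(out)


zero_page = bytes(4096)            # a zero-filled 4 KiB page
encoded = rle_encode(zero_page)
print(len(zero_page), "->", len(encoded), "bytes")
```

With this pair layout, a page with no repeated bytes doubles in size, which is one reason a dump format might also offer a raw (uncompressed) option.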
Kernel Implementation
• The current dump path uses the existing I/O subsystem for dumping
• Disks (primarily swap) are used for now; the future direction will be MUCH different
Dump path diagram (elements in the order shown): panic(), die_if_kernel(), dump(), dump_execute(), dump_add_page(), dump_write_pages(), dump_compress_page(), I/O Subsystem (Disk, Network, Etc.)
Configuration
Dump configuration takes place via ioctl() to the kernel driver:
• DIOSDUMPLEVEL
    • DUMP_LEVEL_NONE – Don’t dump any pages
    • DUMP_LEVEL_ALL – Dump all memory pages
    • DUMP_LEVEL_KERN – Dump just kernel level pages
• DIOSDUMPFLAGS
    • DUMP_FLAGS_NONE – No flags set
    • DUMP_FLAGS_NONDISRUPT – Try to continue standard system operation after a dump takes place
Configuration
• DIOSDUMPCOMPRESS
    • DUMP_COMPRESS_NONE – Raw dump format
    • DUMP_COMPRESS_RLE – Use RLE compression
    • DUMP_COMPRESS_GZIP – Use GZIP compression
• DIOSDUMPDEV
    • This is the device to dump to (for example, /dev/sda4)
Which settings can be applied depends on the system state, for example whether a dump compression module is loaded into the kernel.
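The ioctl-based configuration described on these slides might be driven from user space roughly as follows. This is a sketch under stated assumptions, not LKCD's own configuration tool. The numeric values given to the request codes and option constants are placeholders (the real ones come from LKCD's kernel header). Passing the dump device as an integer device number, and issuing the requests in the order the slides list them, are also assumptions. Only `fcntl.ioctl` and `os.open` are real library calls here.

```python
import fcntl
import os

# PLACEHOLDER values: the real request codes and option values are defined by
# LKCD's kernel header and are not given in the slides.
DIOSDUMPLEVEL, DIOSDUMPFLAGS, DIOSDUMPCOMPRESS, DIOSDUMPDEV = 1, 2, 3, 4
DUMP_LEVEL_NONE, DUMP_LEVEL_KERN, DUMP_LEVEL_ALL = 0, 1, 2
DUMP_FLAGS_NONE, DUMP_FLAGS_NONDISRUPT = 0, 1
DUMP_COMPRESS_NONE, DUMP_COMPRESS_RLE, DUMP_COMPRESS_GZIP = 0, 1, 2


def plan_dump_config(device_number, level, flags, compress):
    """Return the (request, argument) pairs to send, in slide order (assumed)."""
    return [
        (DIOSDUMPLEVEL, level),
        (DIOSDUMPFLAGS, flags),
        (DIOSDUMPCOMPRESS, compress),
        (DIOSDUMPDEV, device_number),
    ]


def apply_dump_config(plan, path="/dev/dump"):
    """Send each planned request to the dump driver; raises OSError on failure."""
    fd = os.open(path, os.O_RDWR)
    try:
        for request, arg in plan:
            fcntl.ioctl(fd, request, arg)
    finally:
        os.close(fd)


# Build a plan for kernel-only pages, GZIP compression, hypothetical device 0x0804.
plan = plan_dump_config(0x0804, DUMP_LEVEL_KERN, DUMP_FLAGS_NONE, DUMP_COMPRESS_GZIP)
```

Separating the plan from the ioctl calls keeps the settings inspectable on a machine that has no /dev/dump; `apply_dump_config(plan)` would only succeed on a kernel with the dump driver present.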
User-Level Analysis (lcrash)
Linux Crash (lcrash) is used for analyzing system crash dumps. It is extremely powerful for support and engineering personnel for finding solutions to kernel crashes:
• Evaluates CPU state
    • Mode, register settings, etc.
• Displays all tasks
    • Includes which task is running on a given CPU
• Stack trace for each running task
    • This is accomplished WITHOUT frame pointers built into the kernel (-fomit-frame-pointer)
• Allows for memory dumping, struct analysis, finding symbols, etc.
• lcrash is amazingly versatile for problem analysis
• Crash dump reports can be created automatically on boot-up after a system crash
lcrash Example Output
>> stat | head
sysname : Linux
nodename : crashme.atmyhouse.com
release : 2.4.8
version : #9 SMP Mon Dec 10 00:05:19 PST 2001
machine : i686
domainname : (none)
LOG_BUF:
>> dump log_buf 10
0xc0332c60: 4c3e343c 78756e69 72657620 6e6f6973 : <4>Linux version
0xc0332c70: 342e3220 2820382e 746f6f72 74617740 : 2.4.8 (root@cra
0xc0332c80: 79657265 70612e65 : shme.atm
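The `dump log_buf` output shows memory as 32-bit words, with the corresponding ASCII text on the right. Because the machine is little-endian (i686), each word has to be byte-reversed before it reads as text. The short sketch below reproduces the text column for the first row. The function name is made up for this example and is not part of lcrash.

```python
def decode_dump_words(words: str) -> str:
    """Turn space-separated 32-bit hex words into the ASCII column of a dump row.

    Each word is printed most-significant byte first, so on a little-endian
    machine the bytes are reversed to recover their order in memory.
    """
    raw = b"".join(bytes.fromhex(word)[::-1] for word in words.split())
    # Non-printable bytes are shown as '.', as in the lcrash output.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


print(decode_dump_words("4c3e343c 78756e69 72657620 6e6f6973"))
# prints: <4>Linux version
```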
lcrash Example Output
>> task
ADDR        UID   PID  PPID  STATE  FLAGS  CPU  NAME
======================================================================
0xc02e4000    0     0     0      0      0    -  swapper
0xdfffc000    0     1     0      0  0x100    -  init
0xdfff2000    0     2     1      1   0x40    -  keventd
0xdffee000    0     3     0      0   0x40    -  ksoftirqd_CPU0
[ . . . ]
0xde47a000    0   867     1      1  0x100    -  mingetty
0xda0fe000    0  1017   660      0  0x140    -  sshd
0xd9c06000    0  1018  1017      1  0x100    -  bash
0xde4b4000    0  1101  1018      0  0x100    0  insmod
======================================================================
31 active task structs found
lcrash Example Output
>> t 0xda0fe000
=========================================================
STACK TRACE FOR TASK: 0xda0fe000(sshd)
 0 schedule+1040 [0xc0111250]
 1 schedule_timeout+121 [0xc0110d89]
 2 do_select+506 [0xc014251a]
 3 sys_select+820 [0xc01428c4]
 4 system_call+44 [0xc0106ed4]
=========================================================
>> fsym panic_timeout
ADDR        OFFSET  TYPE         NAME
============================================================
0xc0332804       0  GLOBAL_DATA  panic_timeout
============================================================
1 symbol found
>> od panic_timeout
0xc0332804: 00000005 : ....
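Each frame in the stack trace is printed as symbol+offset followed by the raw address. One common way to produce that form is to look up the nearest symbol at or below the address in a sorted symbol table. The sketch below illustrates that lookup. It is not a description of how lcrash does it internally. The two start addresses are not from a real symbol table: they are derived from the trace above by subtracting the printed offsets, assuming the offsets are decimal byte counts.

```python
import bisect

# Start addresses derived from the trace (address minus printed offset);
# a real tool would read them from the kernel's symbol table instead.
SYMBOLS = sorted([
    (0xC0110D10, "schedule_timeout"),   # 0xc0110d89 - 121
    (0xC0110E40, "schedule"),           # 0xc0111250 - 1040
])
ADDRESSES = [addr for addr, _ in SYMBOLS]


def symbolize(address: int) -> str:
    """Return 'name+offset' for the nearest symbol at or below the address."""
    index = bisect.bisect_right(ADDRESSES, address) - 1
    if index < 0:
        return hex(address)             # below every known symbol
    start, name = SYMBOLS[index]
    return f"{name}+{address - start}"


print(symbolize(0xC0111250))   # prints: schedule+1040
print(symbolize(0xC0110D89))   # prints: schedule_timeout+121
```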
lcrash Example Output
>> px ((struct task_struct *)0xd8abf000).thread.esp0
0x15a159
>> px ((struct task_struct *)0xd8abf000).thread.debugreg[0]
0x0
>> whatis user_struct
struct user_struct {
    atomic_t __count;
    atomic_t processes;
    atomic_t files;
    struct user_struct *next;
    struct user_struct **pprev;
    uid_t uid;
};
>> px (struct user_struct *)(((struct task_struct *)0xd8abf000).user).uid
0xfffff000
Future Development/Evolution
• The 2.5 implementation of LKCD will use dump methods to allow multiple dumping paths through the kernel (multiple devices!)
• Low-level device drivers will register their own set of dump functions, so that each driver does what it thinks is correct
• lcrash and the other LKCD utilities will be extended to support this functionality
• LKCD will be extended to work on other operating systems (such as FreeBSD)
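The idea of drivers registering their own dump functions can be pictured as a small registry that maps a dump target to a driver-supplied write routine. The sketch below is purely illustrative. The slides do not describe the planned interface, so every name here (`register_dump_method`, `dump_pages`, the target strings) is hypothetical, and the real mechanism would live in kernel C rather than Python.

```python
from typing import Callable, Dict, Iterable, List

# A dump method takes one page of memory and returns the number of bytes written.
DumpWriter = Callable[[bytes], int]

_dump_methods: Dict[str, DumpWriter] = {}   # hypothetical registry: target -> writer


def register_dump_method(target: str, writer: DumpWriter) -> None:
    """Let a driver register the routine that knows how to write to its device."""
    if target in _dump_methods:
        raise ValueError(f"dump method already registered for {target!r}")
    _dump_methods[target] = writer


def dump_pages(target: str, pages: Iterable[bytes]) -> int:
    """Send every page through the writer registered for the chosen target."""
    writer = _dump_methods[target]
    return sum(writer(page) for page in pages)


# Example: a stand-in "disk" driver that just collects pages in a list.
written: List[bytes] = []


def fake_disk_writer(page: bytes) -> int:
    written.append(page)
    return len(page)


register_dump_method("fake-disk", fake_disk_writer)
total = dump_pages("fake-disk", [bytes(4096), bytes(4096)])
print(total)   # prints: 8192
```

The point of such a design is that the generic dump code never needs to know how a device is written to; supporting a second target (a network device, for instance) means registering another writer.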
Questions/Comments?