Windows Crash Dump Analysis

Windows Crash Dump Analysis. Daniel Pearson David Solomon Expert Seminars. Daniel Pearson. Started working with Windows NT 3.51 Three years at Digital Equipment Corporation Supporting Intel and Alpha systems running Windows NT Seven years at Microsoft


Presentation Transcript


  1. Windows Crash Dump Analysis Daniel Pearson David Solomon Expert Seminars

  2. Daniel Pearson • Started working with Windows NT 3.51 • Three years at Digital Equipment Corporation • Supporting Intel and Alpha systems running Windows NT • Seven years at Microsoft • Senior Escalation Lead in Windows base team • Worked in the Mobile Internet sustained engineering team • Instructor for David Solomon, co-author of the Windows Internals book series

  3. Agenda • Causes of Windows crashes • What happens during a crash • Configuring Windows crash options • Writing a crash dump • Automated and manual crash analysis • Using Driver Verifier to detect errors • Attaching a kernel debugger • Portions of this session are based on material developed by Mark Russinovich and David Solomon

  4. Why Analyze a Crash? • When Windows Error Reporting has no solution or when it blames “a device driver”

  5. Why Does Windows Crash • A device driver or part of the operating system incurs an unhandled exception • A device driver or part of the operating system explicitly crashes the system due to an unrecoverable condition • A page fault occurs at an interrupt request level (IRQL) of DISPATCH_LEVEL or higher • A hardware condition occurs, such as a nonmaskable interrupt or faulty memory, disk, etc.

  6. Causes of Windows Crashes • Source: Microsoft Corporation. 2008. Online Crash Analysis research performed in September of 2008.

  7. What Happens During a Crash • When a condition is detected that requires a crash, the kernel API KeBugCheckEx is called • KeBugCheckEx accepts a bugcheck code that indicates the reason for the crash and four parameters that supply additional information KeBugCheckEx(    IN ULONG  BugCheckCode,    IN ULONG_PTR  BugCheckParameter1,    IN ULONG_PTR  BugCheckParameter2,    IN ULONG_PTR  BugCheckParameter3,    IN ULONG_PTR  BugCheckParameter4    );
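To make the "one code plus four parameters" shape concrete, here is a minimal Python sketch that formats a bugcheck the way a stop line summarizes it. The symbolic names in the table are documented bugcheck names, but the table is only a small subset, and `describe_bugcheck` and the parameter values in the example are hypothetical illustrations, not part of any Windows API.

```python
"""Sketch: summarize a bugcheck code and its four parameters."""

# A small, illustrative subset of documented bugcheck codes.
BUGCHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000C4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000E2: "MANUALLY_INITIATED_CRASH",
}


def describe_bugcheck(code, params):
    """Return a one-line summary of a bugcheck code and its four parameters."""
    if len(params) != 4:
        raise ValueError("a bugcheck carries exactly four parameters")
    # Codes missing from the subset above get a placeholder name.
    name = BUGCHECK_NAMES.get(code, "UNKNOWN_BUGCHECK")
    args = ", ".join(f"0x{p:016X}" for p in params)
    return f"STOP 0x{code:08X} {name} ({args})"


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(describe_bugcheck(0xD1, (0x28, 0x2, 0x0, 0x10)))
```

The meaning of the four parameters depends on the bugcheck code, so a real tool would need a per-code description of each parameter as well as the name.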

  8. Inside of KeBugCheckEx • KeBugCheckEx performs several functions • Disables interrupts • Notifies other CPUs to halt execution • Notifies registered drivers • Writes crash dump information to disk* • Restarts the system* • * Only if the system is configured to do so

  9. The Windows Stop Screen (annotated screenshot with callouts 1 to 5; image not included in this transcript)

  10. Bugcheck Codes • Shared by many components and drivers • The Windows Driver Kit currently documents over 250 unique bugcheck codes

  11. Memory Dump Types • Small memory dump • Records the smallest set of useful information • Kernel memory dump* • Records only kernel memory, which speeds up the process of writing a crash dump • Complete memory dump* • Records the entire contents of system memory • * If either a kernel or complete memory dump is selected, the system will also create a minidump and store it in the %SystemRoot%\minidump directory

  12. Configuring Debugging Information Options
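The dump type chosen in the dialog is stored in the registry, so it can also be inspected from a script. The sketch below is a minimal example, assuming the standard `CrashControl` key and the value names `CrashDumpEnabled`, `DumpFile`, `MinidumpDir` and `AutoReboot`; none of those value names appear in the slides, and the helper names are hypothetical. The live registry read only works on Windows, so the value-to-name mapping is kept as a separate pure function.

```python
"""Sketch: interpret the crash dump settings under CrashControl."""
import sys

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# CrashDumpEnabled values for the dump types described in the slides.
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
}


def describe_dump_type(value):
    """Map a CrashDumpEnabled value to the dump type it selects."""
    return DUMP_TYPES.get(value, f"Unrecognized CrashDumpEnabled value {value}")


def read_crash_control():
    """Read selected CrashControl values from the live registry (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # imported here so the module still loads on other platforms

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "MinidumpDir", "AutoReboot"):
            try:
                settings[name], _value_type = winreg.QueryValueEx(key, name)
            except OSError:
                settings[name] = None  # value not present on this system
    return settings


if __name__ == "__main__":
    if sys.platform == "win32":
        current = read_crash_control()
        print(current)
        print(describe_dump_type(current["CrashDumpEnabled"]))
    else:
        print(describe_dump_type(2))
```

Newer Windows releases may use additional `CrashDumpEnabled` values that this table does not cover; the fallback string makes that visible rather than guessing.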

  13. Writing a Crash Dump • Crash dump information is written to the paging file on the boot volume or to a dedicated dump file if specified • Too risky to create a new file on the system • How does the system know it’s safe? • The boot volume paging file’s on-disk mapping is obtained when the system starts • Critical crash components are checksummed • When a crash occurs, if the checksum doesn’t match, a memory dump is not written

  14. Why Would You Not Get a Dump? • Problems with page file configuration • The paging file on the boot volume is too small or one does not exist • The system crashed before the paging file was initialized • Critical crash components are corrupted • Windows didn’t crash! • The system spontaneously restarted • The system is hung

  15. Analyzing a Crash Dump • The Microsoft kernel debuggers can be used to open and analyze a crash dump • kd, a command line tool and WinDbg, a GUI tool • Available as part of the Debugging Tools for Windows http://www.microsoft.com/whdc/devtools/debugging/default.mspx • Configure the debugger to point to symbols srv*C:\SYMBOLS*http://msdl.microsoft.com/download/symbols
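The symbol path above has three parts separated by asterisks: the `srv` keyword, a local cache directory, and the symbol server URL. A minimal sketch of building that string and handing it to a debugger through the `_NT_SYMBOL_PATH` environment variable follows; the function names are hypothetical and the cache directory is simply the one from the slide.

```python
"""Sketch: build a symbol-server path and a debugger environment."""
import os

MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Return a path of the form srv*<local cache>*<symbol server>."""
    return f"srv*{cache_dir}*{server}"


def debugger_environment(cache_dir, base_env=None):
    """Copy an environment and set _NT_SYMBOL_PATH for kd or WinDbg."""
    env = dict(os.environ if base_env is None else base_env)
    env["_NT_SYMBOL_PATH"] = symbol_path(cache_dir)
    return env


if __name__ == "__main__":
    print(symbol_path(r"C:\SYMBOLS"))
```

Setting the variable in a copied environment, rather than globally, keeps the choice of cache directory local to the debugger process that is launched with it.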

  16. Automated Analysis • When you open a crash dump with WinDbg or kd, the debugger performs basic crash analysis* • Displays stop code and parameter information • Takes a guess at the offending driver • The analysis is the result of the automated execution of the !analyze debugger command • !analyze uses the bugcheck parameters and a set of heuristics to determine what component is the likely cause of the crash • * Set the environment variable DBGENG_NO_BUGCHECK_ANALYSIS=1 to disable

  17. Automated Analysis Using !analyze

  18. Memory Corruption • Occurs when a driver writes past the end of its memory allocation (an overrun) or before the beginning (an underrun) • Usually detected when overwritten data is referenced by the kernel or another driver • It’s possible there’s a long delay between corruption and detection
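The gap between corruption and detection can be illustrated with a toy model. The sketch below is not how the Windows pool is implemented: `ToyPool`, its guard pattern and the block tags are all hypothetical. It only shows that an unchecked write damages memory silently, and that the damage is noticed later, when something inspects the bytes next to the allocation.

```python
"""Sketch: a toy pool showing overrun, underrun and delayed detection."""

GUARD = b"\xAB" * 4  # hypothetical guard pattern placed around each block


class ToyPool:
    """A flat byte array standing in for a memory pool."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.next_free = 0
        self.blocks = {}  # tag -> (start, length) of the usable region

    def allocate(self, tag, length):
        """Reserve guard + length bytes + guard and return the usable start."""
        start = self.next_free + len(GUARD)
        end = start + length
        if end + len(GUARD) > len(self.memory):
            raise MemoryError("toy pool exhausted")
        self.memory[self.next_free:start] = GUARD
        self.memory[end:end + len(GUARD)] = GUARD
        self.blocks[tag] = (start, length)
        self.next_free = end + len(GUARD)
        return start

    def write(self, tag, offset, data):
        """Unchecked write, like a buggy driver: no bounds are enforced."""
        start, _length = self.blocks[tag]
        pos = start + offset
        self.memory[pos:pos + len(data)] = data

    def check(self, tag):
        """Inspect the guards around a block; this is the detection step."""
        start, length = self.blocks[tag]
        before = bytes(self.memory[start - len(GUARD):start])
        after = bytes(self.memory[start + length:start + length + len(GUARD)])
        if before != GUARD:
            return "underrun"
        if after != GUARD:
            return "overrun"
        return "ok"


if __name__ == "__main__":
    pool = ToyPool(64)
    pool.allocate("Drv1", 8)
    pool.allocate("Drv2", 8)
    pool.write("Drv1", 0, b"A" * 10)  # 2 bytes past the end: corruption happens here
    pool.write("Drv2", -2, b"ZZ")     # 2 bytes before the start
    # Nothing has failed yet; the damage is only seen when the guards are checked.
    print(pool.check("Drv1"), pool.check("Drv2"))
```

In a real system the "check" is usually accidental: some other component reads the overwritten data, which is why the component that crashes is often not the one that caused the corruption.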

  19. Viewing the Effects of Memory Corruption

  20. Crash Transformation • For crashes that are difficult to analyze • The “victim” crashed the system, not the culprit • The debugger points to ntoskrnl.exe, win32k.sys or other Windows components • You get many different crash dumps all pointing at different causes • Your goal isn’t to analyze difficult crashes … It’s to try to make an “unanalyzable” crash into one that can be easily analyzed

  21. Driver Verifier • Useful for identifying code defects in drivers • Performs more thorough checks on the system and device drivers as well as simulating failures • Support is built into the operating system • The requirements for the Windows logo program state that a driver must not fail while running under Driver Verifier

  22. Using Driver Verifier to Catch a Buffer Overrun

  23. Manual Analysis • Sometimes !analyze isn’t enough • It might not tell you anything useful • You want to know in more detail what was happening at the time of the crash • Several useful commands and techniques • Verify the time of the crash, .time • A short uptime value can mean frequent problems • Check the stack on each CPU; stacks are read from the bottom to the top • !cpuinfo will display a list of all the CPUs • Use ~Ns (for example, ~1s) to switch to a different CPU for investigation • k to display the stack

  24. Manual Analysis • Several useful commands and techniques • Look at memory usage, !vm • Make sure memory pools are not depleted or contain errors • Use !poolused to identify large users • Check the currently running thread, !thread • May or may not be related to the crash • Check pending I/O requests using !irp • List all processes on the system, !process 0 0 • Make sure you understand what was running at the time • List loaded drivers, lm t n • Make sure all the drivers are recognizable and up to date
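The commands from the two Manual Analysis slides can be collected into one batch and passed to kd when it opens a dump. The sketch below only builds the argument list; it assumes kd's `-z` (dump file), `-y` (symbol path) and `-c` (initial command) options and the use of semicolons to separate commands. The function name, the dump path and the exact command order are hypothetical choices, not something the slides prescribe.

```python
"""Sketch: build a kd command line that runs a fixed triage batch."""

# Commands named in the Manual Analysis slides, preceded by !analyze -v.
TRIAGE_COMMANDS = [
    "!analyze -v",
    ".time",
    "!cpuinfo",
    "k",
    "!vm",
    "!thread",
    "!process 0 0",
    "lm t n",
]


def kd_command_line(dump_path, symbol_path, commands=TRIAGE_COMMANDS, kd_exe="kd"):
    """Return an argument list that opens a dump, runs the batch, then quits."""
    script = "; ".join(list(commands) + ["q"])  # q exits the debugger at the end
    return [kd_exe, "-z", dump_path, "-y", symbol_path, "-c", script]


if __name__ == "__main__":
    args = kd_command_line(
        r"C:\Windows\MEMORY.DMP",  # assumed dump location
        r"srv*C:\SYMBOLS*http://msdl.microsoft.com/download/symbols",
    )
    print(args)
```

The resulting list can be given to `subprocess.run` on a machine with the Debugging Tools for Windows installed. Commands that need an argument taken from earlier output, such as `!irp` with an IRP address or switching CPUs, are left out because they cannot be scripted blindly.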

  25. Manual Analysis of a Crash Dump

  26. Attaching a Kernel Debugger • Required for debugging initialization failures and crashes where no dump file is created • Requires that the system be started with the debugger enabled to work • Support for using a null-modem, IEEE 1394 and USB 2.0 cable as well as virtual machines and over the network in Windows 7 • Limited support for local kernel debugging

  27. Attaching a Kernel Debugger to a Live System

  28. Hung Systems • Sometimes systems become unresponsive • Keyboard and mouse frozen • Two types of hangs • Instant lockup • Kernel synchronization deadlock • Infinite loop at a high IRQL or a very high priority thread • Slowly grinding to a halt • Resource depletion

  29. Initiating a Manual Crash • Using the keyboard • Requires a PS/2 keyboard + registry key • HKLM\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll • Using an NMI button • Requires specialized hardware + registry key • HKLM\SYSTEM\CurrentControlSet\Control\CrashControl\NMICrashDump • Using the debugger • Break in and execute the .crash command
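Both registry settings above are DWORD values, so they can be captured in a `.reg` file and imported on the target machine. The sketch below generates that file's text. It assumes each value is set to 1 to enable the feature, which the slide does not state explicitly, and the `reg_file` helper is hypothetical.

```python
"""Sketch: generate a .reg file that enables the manual crash triggers."""

MANUAL_CRASH_ENTRIES = [
    # (registry key, value name, DWORD data); data of 1 is assumed to mean "enabled"
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
     "CrashOnCtrlScroll", 1),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl",
     "NMICrashDump", 1),
]


def reg_file(entries):
    """Return .reg file text that sets each (key, name, dword) entry."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, value in entries:
        lines.append(f"[{key}]")
        lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")  # blank line between sections
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(reg_file(MANUAL_CRASH_ENTRIES))
```

Generating the file rather than writing to the registry directly lets the change be reviewed before it is applied, which matters for settings whose whole purpose is to crash the machine on demand.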

  30. Debugging a Hung System

  31. Additional Information • Windows Internals 5th edition • Debugging Tools for Windows documentation • Mark Russinovich’s Blog • http://blogs.technet.com/markrussinovich • Advanced Windows Debugging Blog • http://blogs.msdn.com/ntdebugging • Crash Dump Analysis and Debugging Portal • http://www.dumpanalysis.org

  32. Additional Information • David Solomon Expert Seminars offers training on Windows Internals both as public and private workshops and public webinars via the Internet • Currently scheduled upcoming classes • Public workshop in London, April 12th – April 16th • Public webinar, April 26th & April 28th • Public workshop in New York, May 3rd – May 7th • Public workshop in San Francisco, November 8th – November 12th • Visit http://www.solsem.com for further course descriptions and up to date information
