
ADM390 Microsoft ® Windows ® Crash Dump Analysis

ADM390 Microsoft ® Windows ® Crash Dump Analysis. Mark Russinovich Winternals Software David Solomon David Solomon Expert Seminars. About The Speakers. Authors of: Inside Windows 2000 , 3rd Edition (Microsoft Press) Inside Windows 2000/XP/2003 Interactive Internals Video Tutorial


Presentation Transcript


  1. ADM390 Microsoft® Windows® Crash Dump Analysis • Mark Russinovich, Winternals Software • David Solomon, David Solomon Expert Seminars

  2. About The Speakers • Authors of: • Inside Windows 2000, 3rd Edition (Microsoft Press) • Inside Windows 2000/XP/2003 Interactive Internals Video Tutorial • Used by Microsoft for worldwide internal training • David Solomon: • Teaches Windows internals classes (www.solsem.com) • Writes books and articles on Windows internals • Mark Russinovich: • Author of tools on www.sysinternals.com • Co-founder and Chief Software Architect for Winternals Software (www.winternals.com) • Teaches Windows internals classes • Writes books and articles on Windows internals

  3. Outline • What causes crashes? • Crash dump options • Analysis with WinDbg/Kd • Debugging hung systems • Microsoft On-line Crash Analysis • Using Driver Verifier • Live kernel debugging • Getting past a crash

  4. Introduction • Many systems administrators ignore Windows NT/Windows 2000’s crash dump options • “I don’t know what to do with one” • “It’s too hard” • “It won’t tell me anything anyway” • Basic crash dump analysis is actually pretty straightforward • Even if only 1 out of 5 or 10 dumps tells you what’s wrong, isn’t it worth spending a few minutes?

  5. Why Analyze Dumps? • The debuggers and Microsoft Online Crash Analysis (OCA) often solve crashes • Sometimes, however, they do not, so your analysis might tell you: • What driver to disable, update, or replace with different hardware • What OEM to send the dump to

  6. What Causes Crashes? • The system crashes when a fatal error prevents further execution • Any kernel-mode component can crash the system • Drivers and the OS share the same memory space • Therefore, any driver or OS component can, due to a bug, corrupt system memory • Note: The shared memory space is there for performance reasons and is the same on Linux, most Unix variants, VMS, etc.

  7. What Are The Root Causes? • Anecdotal evidence suggests: • Buggy drivers • Bugs in the OS • Hardware failure/error • Cosmic rays

  8. At The Crash • A component calls KeBugCheckEx, which takes five arguments: • Stop code • 4 stop-code defined parameters • KeBugCheckEx: • Turns off interrupts • Tells other CPUs to stop • Paints the blue screen • Notifies registered drivers of the crash • If a dump is configured: • Verifies checksums • Calls dump I/O functions

  9. Common Stop Codes • There are about 150 defined stop codes • Shared by many components and drivers • Common ones include: • IRQL_NOT_LESS_OR_EQUAL (0x0A) • Usually an invalid memory access • UNEXPECTED_KERNEL_MODE_TRAP (0x7F) and KMODE_EXCEPTION_NOT_HANDLED (0x1E) • Generated by executing garbage instructions • Usually caused when a stack is trashed • Documented in Debugger Tools help file • Often, multiple articles in Knowledge Base
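As a rough illustration of how a stop code and its four parameters fit together, the sketch below formats them in the style of a blue screen STOP line. The table holds only the three codes named on the slide (0x7F under its documented name, UNEXPECTED_KERNEL_MODE_TRAP), and `describe_stop` is a hypothetical helper, not part of any debugger.

```python
# Stop codes named on the slide; the table is deliberately incomplete.
STOP_CODES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
}


def describe_stop(code, params):
    """Format a stop code and its four parameters (hypothetical helper)."""
    if len(params) != 4:
        raise ValueError("a bug check carries exactly four parameters")
    name = STOP_CODES.get(code, "UNKNOWN_STOP_CODE")
    args = ", ".join(f"0x{p:08X}" for p in params)
    return f"*** STOP: 0x{code:08X} ({args}) {name}"


if __name__ == "__main__":
    print(describe_stop(0x0A, [0x00000000, 0x00000002, 0x00000000, 0x80001234]))
```

The meaning of the four parameters depends on the stop code, which is why the Debugging Tools help file is the place to look them up.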

  10. Dump Options • Complete memory dump (Windows NT 4, Windows 2000, Windows XP) • Full contents of memory written to <systemroot>\memory.dmp • Kernel memory dump (Windows 2000, Windows XP, Server 2003) • System memory written to <systemroot>\memory.dmp • Small memory dump (Windows 2000, Windows XP, Server 2003) • Also called a minidump or triage dump • 64KB of summary written to <systemroot>\minidump\MiniMMDDYY-NN.dmp
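The minidump naming pattern above (MiniMMDDYY-NN.dmp) can be sketched as follows. The function name `minidump_name` is hypothetical, and the two-digit zero padding of the sequence number is an assumption read off the `NN` in the pattern.

```python
from datetime import date


def minidump_name(crash_date, sequence):
    """Build a minidump file name following the MiniMMDDYY-NN.dmp pattern."""
    # date objects accept strftime codes in format specs: %m%d%y gives MMDDYY
    return f"Mini{crash_date:%m%d%y}-{sequence:02d}.dmp"


if __name__ == "__main__":
    # First crash recorded on 7 March 2004
    print(minidump_name(date(2004, 3, 7), 1))
```

Because the date and sequence number are part of the name, each crash gets its own file rather than overwriting a single dump.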

  11. Enabling Dumps • In Windows 2000/XP/2003:

  12. What Happens When Crash Dumps Are Enabled • When the system boots it checks HKEY_LOCAL_MACHINE\System\ CurrentControlSet\Control\CrashControl • The boot disk paging file’s on-disk mapping is obtained • Relevant components are checksummed: • Boot disk miniport driver • Crash I/O functions • Page file map
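A minimal sketch of how the settings under the CrashControl key might be interpreted. The value names (`CrashDumpEnabled`, `DumpFile`, `MinidumpDir`, `AutoReboot`) and the 0 to 3 numbering are assumptions taken from common Windows documentation rather than from the slides, and `summarize_crash_control` is a hypothetical helper that works on a plain dictionary instead of reading the live registry.

```python
# CrashDumpEnabled numbering as commonly documented (an assumption, not on the slide).
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def summarize_crash_control(values):
    """Summarize a dict of CrashControl registry values (hypothetical helper)."""
    dump_type = DUMP_TYPES.get(values.get("CrashDumpEnabled", 0), "unknown")
    if dump_type == "small memory dump":
        # Minidumps go to a directory, one file per crash
        target = values.get("MinidumpDir", r"%SystemRoot%\Minidump")
    else:
        # Complete and kernel dumps share a single file
        target = values.get("DumpFile", r"%SystemRoot%\MEMORY.DMP")
    reboot = "yes" if values.get("AutoReboot", 0) else "no"
    return f"type={dump_type}; target={target}; auto-reboot={reboot}"


if __name__ == "__main__":
    print(summarize_crash_control({"CrashDumpEnabled": 2, "AutoReboot": 1}))
```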

  13. At The Reboot • [Diagram: WinLogon, Session Manager, and SaveDump in user mode; NtCreatePagingFile and the paging file in kernel mode; Memory.dmp as the output; steps numbered 1 to 4]

  14. At The Reboot (steps 1 and 2) • Session Manager process (\windows\system32\smss.exe) initializes paging file • NtCreatePagingFile • NtCreatePagingFile determines if the dump has a crash header • Protects the dump from use • WinLogon calls NtQuerySystemInformation to tell if there’s a dump

  15. At The Reboot (steps 3 and 4) • If there’s a dump, Winlogon executes SaveDump (\windows\system32\savedump.exe) • Writes an event to the System event log • SaveDump writes contents to appropriate file • Crash dump portion of paging file is in use during copy, so virtual memory can run low

  16. Why Crash Dumps Fail • Most common reasons: • Paging file on boot volume is too small • Not enough free space for extracted dump • Less common: • The crash corrupted components involved in the dump process • Miniport driver doesn’t implement dump I/O functions • Windows storage drivers must implement dump I/O to get a Microsoft® digital signature
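The two most common failure reasons above can be checked ahead of time. The sketch below is a hypothetical pre-flight check; the caller supplies the expected dump size, since the slides do not give sizing rules (for a complete dump it is roughly the amount of physical memory).

```python
def dump_problems(expected_dump_mb, boot_pagefile_mb, free_space_mb):
    """Return the common reasons a dump of the given size would fail."""
    problems = []
    # The dump is first written into the paging file on the boot volume
    if boot_pagefile_mb < expected_dump_mb:
        problems.append("paging file on boot volume is too small")
    # SaveDump then needs room to extract it into the dump file
    if free_space_mb < expected_dump_mb:
        problems.append("not enough free space for extracted dump")
    return problems


if __name__ == "__main__":
    # 512 MB dump expected, 256 MB paging file, 1 GB free
    print(dump_problems(512, 256, 1024))
```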

  17. Microsoft On-line Crash Analysis (OCA) • By default, after a reboot XP/Server 2003 prompts you to send information to http://oca.microsoft.com • Can be configured with Computer Properties->Advanced->Error Reporting • Can be customized with Group Policies

  18. What Does OCA Do? • Server farm uses !analyze, but uses Microsoft’s Triage.ini file and database that includes information about known problems • Several ways to get OCA results: • Via e-mail • At the OCA site • Sometimes OCA will point you at KB articles that describe the problem • KB articles may tell you to use Windows Update to get newer drivers, a hotfix, or install a Service Pack

  19. Analyzing a Crash Dump • If OCA doesn’t help you, or you have an NT4 or Windows 2000 dump, then you need to open it with one of the kernel debuggers: • WinDbg (Windows program) • Kd (command-line program) • Both provide the same kernel debugger analysis commands • Part of the Debugging Tools for Windows • Free download from http://www.microsoft.com/whdc/ddk/debugging/default.mspx • Supports Windows NT 4, Windows 2000, Windows XP, Server 2003 • Check for updates frequently • Don’t use the older version on the install media

  20. Symbol Files • Before you can use any crash analysis tool you need symbol files • Symbol files contain global function and variable names • Symbols are service pack-specific and have an installer (default directory is \windows\symbols) • Windows NT 4: *.dbg • Windows 2000: *.dbg, *.pdb • Windows XP/2003: *.pdb • Note: Service Pack symbols only include updates

  21. Microsoft Symbol Server • WinDbg and Kd can download symbols automatically from Microsoft • Pick a directory to install symbols and add the following to the debugger’s symbol path: SRV*directory*http://msdl.microsoft.com/download/symbols • The debugger automatically detects the OS version of a dump and downloads the symbols on-demand
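The symbol path format above is easy to get wrong by hand, so here is a small sketch that assembles it. `symbol_path` is a hypothetical helper and `c:\symbols` is only an example directory.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Build an SRV*directory*server symbol path entry."""
    return f"SRV*{cache_dir}*{server}"


if __name__ == "__main__":
    # Example local cache directory (an assumption)
    print(symbol_path(r"c:\symbols"))
```

The directory acts as a local cache, so symbols downloaded for one dump are reused for later dumps from the same OS build.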

  22. Automated Analysis • When you open a crash dump with WinDbg or Kd you get a basic crash analysis: • Stop code and parameters • A guess at the offending driver • The analysis is the result of the automated execution of the !analyze debugger command

  23. Automated Analysis • Always execute !analyze with the -v option to get more information • Text description of stop code • Meaning (if any) of parameters • Stack dump • !analyze uses heuristics to walk up the stack and determine what driver is the likely cause of the crash • “Followup” is taken from optional triage.ini file
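For unattended analysis, Kd can be started with the dump, a symbol path, and an initial command. The sketch below only builds the argument list and does not launch anything; it assumes Kd's `-z` (dump file), `-y` (symbol path), and `-c` (initial command) options from the Debugging Tools documentation, and `kd_analyze_command` is a hypothetical helper.

```python
def kd_analyze_command(dump_file, symbol_dir):
    """Build a kd argument list that runs '!analyze -v' on a dump and quits."""
    symbols = f"SRV*{symbol_dir}*http://msdl.microsoft.com/download/symbols"
    return [
        "kd",
        "-z", dump_file,          # open a crash dump instead of a live target
        "-y", symbols,            # symbol path, using the Microsoft symbol server
        "-c", "!analyze -v; q",   # run verbose analysis, then quit
    ]


if __name__ == "__main__":
    print(kd_analyze_command(r"c:\windows\memory.dmp", r"c:\symbols"))
```

The resulting list could be handed to `subprocess.run` on a machine where the Debugging Tools are installed and `kd` is on the path.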

  24. Manual Analysis • Sometimes automated analysis isn’t enough • !analyze doesn’t tell you anything useful • You want to know what else was happening at the time of the crash • Useful commands: • Examine current thread: !thread • May or may not be related to the crash • List all processes: !process 0 0 • Make sure you understand what was running on the system • Examine a specific process: !process <pid> 7 • List loaded drivers: lm kv • Make sure drivers are all recognized and up to date • Look at memory usage: !vm • Create a smaller dump file: .dump • Additional commands: !help

  25. Driver Verifier • If you find a driver in a crash dump that looks like it might be the cause of the crash, turn on verification for it • If the Verifier detects a violation it crashes the system and identifies the driver • Use “Last Known Good” if the verifier detects a bug during the boot • If a bug is detected in a third-party product check for updates and/or contact the vendor’s support

  26. NotMyFault.exe • In order to demonstrate common crash scenarios, use NotMyFault.exe • Download from http://www.sysinternals.com/files/notmyfault.zip • It loads MyFault.sys • MyFault.sys has an IOCTL interface that implements different bugs • [Diagram: user mode above kernel mode, where MyFault.sys exposes its IOCTL interface]

  27. IRQL_NOT_LESS_OR_EQUAL • Run NotMyFault and select “High IRQL fault (kernel mode)” • Allocates paged pool buffer • Frees the buffer • Raises IRQL ≥ DISPATCH_LEVEL • Touches the buffer • Paged buffers that are marked “not present” but are touched when IRQL ≥ DISPATCH_LEVEL result in the IRQL_NOT_LESS_OR_EQUAL bug check • Memory Manager calls KeBugCheckEx from page fault handler • The IRQL is not less than or equal to the maximum IRQL at which the operation is legal (which is < DISPATCH_LEVEL)
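The rule behind this bug check can be modelled in a few lines. This is a toy simulation, not kernel code: the IRQL constants use the x86 values, and the `BugCheck` exception, the `touch` function, and the parameter layout are all illustrative assumptions.

```python
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2  # x86 value

IRQL_NOT_LESS_OR_EQUAL = 0x0A
PAGE_SIZE = 4096


class BugCheck(Exception):
    """Toy stand-in for KeBugCheckEx: a stop code plus its parameters."""

    def __init__(self, code, params):
        super().__init__(f"STOP 0x{code:02X}")
        self.code = code
        self.params = params


def touch(address, present_pages, irql):
    """Simulate touching memory at a given IRQL."""
    if address // PAGE_SIZE in present_pages:
        return "ok"
    if irql >= DISPATCH_LEVEL:
        # A page fault cannot be serviced at this IRQL, so the system crashes
        raise BugCheck(IRQL_NOT_LESS_OR_EQUAL, (address, irql, 0, 0))
    return "page fault serviced"


if __name__ == "__main__":
    print(touch(0x5000, set(), PASSIVE_LEVEL))
    try:
        touch(0x5000, set(), DISPATCH_LEVEL)
    except BugCheck as crash:
        print(crash, [hex(p) for p in crash.params])
```

The same access is harmless at low IRQL and fatal at DISPATCH_LEVEL, which is why the bug can hide until a paged buffer happens to be paged out.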

  28. Using the Stack in Analysis • !analyze easily identifies MyFault.sys by looking at the KeBugCheckEx parameters • The Memory Manager looked at the stack and determined the address that caused the page fault • !analyze often looks at the stack to determine the cause of a crash

  29. Stacks • Each thread has a user-mode and kernel-mode stack • The user-mode stack is usually 1 MB on x86 • The kernel-mode stack is typically 12 KB on x86 systems • Stacks allow for nested function invocation • Parameters can be passed on the stack • Stores return address • Serves as storage for local variables

  30. Stack Frames • [Diagram: stack frames for Function 1, Function 2, and Function 3; each frame holds the function’s parameters, return address, frame pointer, and local variables, with an arrow marking higher addresses]
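A frame layout like the one in the diagram is what lets a debugger walk the stack. The sketch below models it with a dictionary standing in for memory, assuming the conventional x86 layout where the saved frame pointer sits at `[ebp]` and the return address at `[ebp+4]`; `walk_stack` and the sample addresses are hypothetical.

```python
def walk_stack(memory, ebp, max_frames=64):
    """Follow the saved frame pointer chain and collect return addresses."""
    return_addresses = []
    while ebp != 0 and len(return_addresses) < max_frames:
        saved_ebp = memory[ebp]                    # caller's frame pointer
        return_addresses.append(memory[ebp + 4])   # where this frame returns to
        if saved_ebp != 0 and saved_ebp <= ebp:
            break  # callers live at higher addresses; anything else is corruption
        ebp = saved_ebp
    return return_addresses


if __name__ == "__main__":
    # Three nested frames; a saved frame pointer of 0 ends the chain
    memory = {
        0x1000: 0x1020, 0x1004: 0x401234,
        0x1020: 0x1040, 0x1024: 0x401500,
        0x1040: 0x0000, 0x1044: 0x401900,
    }
    print([hex(a) for a in walk_stack(memory, 0x1000)])
```

With symbols loaded, each return address maps to a function name, which is how a stack trace names the driver that was running.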

  31. Stacks • Other calling conventions make the stack hard to figure out • No frame pointer • Register arguments (fast calls) • Debugger requires symbol information to parse • The stack is the #1 analysis resource • It requires that a driver get “caught in the act” • Sometimes that’s not possible without the Driver Verifier’s help

  32. Stack Trashing • Stack trashes have several possible causes: • A driver pushing things on the stack causes the stack to overflow • A driver overruns a stack-allocated buffer • Usually results in garbage code being executed (KMODE_EXCEPTION_NOT_HANDLED) • Driver Verifier can’t determine cause • Since the stack is corrupted, analysis is especially hard

  33. Debugging Stack Trashes • Run NotMyFault and select “Stack Trash” • Allocates a buffer on the stack • Overruns the buffer • Returns to the caller • Crash doesn’t show much offhand • !analyze actually blames Win32K.sys, the Win32 kernel-mode subsystem • Stack doesn’t show anything except an exception handler • Look deeper • !thread shows an outstanding IRP • !irp <irp> shows that myfault.sys was the target of the IRP

  34. Buffer Overruns • Result when a driver goes past the end (overrun) or the beginning (underrun) of a buffer • Usually detected when overwritten data is referenced • Another driver or the kernel makes the reference • There can be a long delay between corruption and detection • [Diagram: the driver’s buffer, pool structures, and another driver’s buffer at successively higher addresses]

  35. Causing a Buffer Overrun • Run NotMyFault and select “Buffer Overrun” • Allocates a nonpaged pool buffer • Writes a string past the end • Note that you might have to run several times since a crash will occur only if: • The kernel references the corrupted pool structures • A driver references the corrupted buffer • The crash tells you what happened, but not why

  36. A Buffer Overrun Bluescreen • In this example, where the crash was the result of the kernel tripping on corrupt pool tracking structures, the Bluescreen tells you what to do:

  37. What is Special Pool? • Special pool is a kernel buffer area where buffers are sandwiched with invalid pages • Conditions for a driver allocating from special pool: • Driver Verifier is verifying driver • Special pool is enabled • Allocation is slightly less than one page (4 KB on x86) • [Diagram: page n+1 holds a signature followed by the buffer at the high end of the page, between invalid page n below and invalid page n+2 above]
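The reason special pool catches an overrun at the moment it happens can be shown with a toy address calculation: the buffer is pushed against the end of its page, so the first byte past it lands in the invalid page. The names below are hypothetical, and the model ignores the alignment padding a real allocator would add.

```python
PAGE_SIZE = 4096  # x86 page size


def special_pool_layout(size, page_base):
    """Place a buffer at the end of its page; return (buffer start, guard page start)."""
    if not 0 < size <= PAGE_SIZE:
        raise ValueError("allocation must fit in one page")
    guard_start = page_base + PAGE_SIZE   # first byte of the invalid page above
    return guard_start - size, guard_start


def write_is_caught(size, page_base, offset):
    """True if writing at buffer[offset] would hit the invalid page."""
    start, guard_start = special_pool_layout(size, page_base)
    return start + offset >= guard_start


if __name__ == "__main__":
    print(write_is_caught(100, 0x10000, 99))    # last valid byte
    print(write_is_caught(100, 0x10000, 100))   # one byte past the end
```

In ordinary pool the same one-byte overrun would silently land in a neighbouring allocation and surface much later, if at all.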

  38. Turning on Special Pool • Enable Special Pool verification on the suspect driver

  39. The Verifier Catching Buffer Overrun • The Driver Verifier catches the overrun when it occurs • The Bluescreen tells you whose fault it is • !analyze explains the crash and also tells you the buggy driver name • The stack shows where the driver bug is

  40. Code Overwrites • Caused when a bug results in a wild pointer • A wild pointer that points at invalid memory is easily detected • A wild pointer that points at data is similar to buffer overrun • Might not cause a problem for a long time • Crash makes it look like it’s something else’s fault • Driver Verifier doesn’t catch code overwrite • System code write protection catches code overwrite, but it’s not on if: • It’s a Windows 2000 system with > 127 MB memory • It’s a Windows XP or .NET Server system with > 255 MB • Something has disabled it

  41. Causing a Code Overwrite • Run NotMyFault and select “Code Overwrite” • Overwrites first bytes of nt!ntreadfile • Function is most common entry to I/O system so a random thread will cause the crash • The crash hints that the fault occurred in NtReadFile • The last user-mode address is ZwReadFile • The ebx register in the exception frame points at NtReadFile • NtReadFile’s start location looks scrambled (u ntreadfile)

  42. System Code Write Protection • Make sure system code write protection is on • Set under HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management: • LargePageMinimum REG_DWORD 0xFFFFFFFF • EnforceWriteProtection REG_DWORD 1 • Reboot to take effect • Rerun NotMyFault • Crash occurs immediately and even the blue screen points at MyFault.sys • !analyze shows the address of the write and the target (NtReadFile)
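One way to apply the two registry values is a .reg file imported with Registry Editor. The sketch below generates that text; `reg_file` is a hypothetical helper, and the "Windows Registry Editor Version 5.00" header is assumed to suit Windows 2000 and later.

```python
def reg_file(key, dword_values):
    """Render REG_DWORD values as the text of a .reg file."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dword_values.items():
        # .reg files write DWORDs as eight hex digits
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    key = (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control"
           r"\Session Manager\Memory Management")
    print(reg_file(key, {
        "LargePageMinimum": 0xFFFFFFFF,
        "EnforceWriteProtection": 1,
    }))
```

As the slide notes, the settings only take effect after a reboot.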

  43. Hung Systems • You can tackle a hung system, but only if you’ve prepared: • Boot in debug mode, or • Set the keystroke-crash Registry value • For debug mode you need a second system (the debugger host) connected to the target via serial cable • Run Windbg/Kd on the host • Edit the target’s boot.ini file: • /debugport=comX /baudrate=XXX • When the system hangs, connect with the debugger and hit Ctrl-C
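Editing boot.ini means appending the debug switches to the operating system line of the target. A minimal sketch, assuming a typical Windows XP entry and example port and baud rate values; `add_debug_switches` is a hypothetical helper that only transforms a string.

```python
def add_debug_switches(os_line, com_port, baud_rate):
    """Append /debugport and /baudrate to a boot.ini OS line, once."""
    if "/debugport=" in os_line.lower():
        return os_line  # already configured for debugging
    return f"{os_line.rstrip()} /debugport=com{com_port} /baudrate={baud_rate}"


if __name__ == "__main__":
    line = (r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS='
            '"Microsoft Windows XP Professional" /fastdetect')
    print(add_debug_switches(line, 1, 115200))
```

The baud rate chosen here has to match the one the debugger on the host system uses for its end of the serial cable.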

  44. Hung Systems • To configure keystroke-crash: • Set HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll to 1 • Hold down the right Ctrl key and press Scroll Lock twice to crash the system • Use !thread to see what’s running • Examine loaded drivers, IRQL, …

  45. Getting Past a Crash • Last-Known Good • Boots with driver/kernel configuration last used during a successful boot • Safe Mode • Boots the system with core set of drivers and services • Network and non-network • Recovery Console • Manually disable offending service, replace corrupt images, update files • ERD Commander 2003 • Registry Editor, Explorer, Driver/Service Manager, password changer, Event Log viewer, Notepad

  46. The Bluescreen Screen Saver • Scare your enemies and fool your friends with the Sysinternals Bluescreen Screen Saver • Be careful, your job may be on the line!

  47. More Information • Inside Windows 2000, 3rd edition • Section on System Crashes in chapter 4 • Debugging Tools help file • Knowledge Base Articles • http://www.microsoft.com/whdc/ddk/debugging/DBG-KB.mspx • Usenet newsgroup microsoft.public.windbg for discussion of debugger issues • The debugger team wants your feedback and bug reports - mail suggestions or bug reports to windbgfb@microsoft.com

  48. Community Resources • Community Resources: http://www.microsoft.com/communities/default.mspx • Most Valuable Professional (MVP): http://www.mvp.support.microsoft.com/ • Newsgroups (converse online with Microsoft Newsgroups, including Worldwide): http://www.microsoft.com/communities/newsgroups/default.mspx • User Groups (meet and learn with your peers): http://www.microsoft.com/communities/usergroups/default.mspx

  49. evaluations
