
The Case of the Unexplained…

Presentation Transcript


  1. WCL301 The Case of the Unexplained… Mark Russinovich Technical Fellow Windows Azure

  2. Outline • Introduction • Sluggish Performance • Error Messages • Blue Screens

  3. Case of the Unexplained… • This is the 2012 version of the “case of the unexplained” talk series • Previous versions covered different cases • Previous talks can be viewed as webcasts at Sysinternals->Mark’s Webcasts • Based on real case studies • Some of these have been written up on my blog

  4. Troubleshooting • Most applications do a poor job of reporting unexpected errors • Locked, missing or corrupt files • Missing or corrupt registry data • Permissions problems • Errors manifest in several different ways • Misleading error messages • Crashes or hangs

  5. Purpose of Talk • Show you how to solve these classes of problems by peering beneath the surface • Interpreting process, file and registry activity • Interpreting call stacks • You’ll learn tools and techniques to help you solve seemingly unsolvable problems

  6. Tools We’ll Use • Sysinternals: www.microsoft.com/technet/sysinternals (\\redmond\files\SYSINTERNALS\LBI\Latest) • Process Explorer – process/thread viewer • Process Monitor – file/registry/process/thread tracing • Procdump – process memory dumper • Autoruns – displays all autostart locations • SigCheck – shows file version information • PsExec – execute processes remotely or in the system account • TcpView – shows TCP/IP endpoints • Strings – dumps printable strings in any file • Zoomit – presentation tool I’m using • Microsoft downloads: • Debugging Tools for Windows: Windbg application and kernel debugger: www.microsoft.com/whdc/devtools/debugging (//dbg)

  7. The Sysinternals Administrator’s Reference • The official guide to the Sysinternals tools • Covers every tool, every feature, with tips • Written by Mark Russinovich (markruss) and Aaron Margosis (aaronmar) • Available in June • Full chapters on the major tools: • Process Explorer • Process Monitor • Autoruns • Other chapters by tool group • Security, process, AD, desktop, …

  8. Outline • Application Hangs • Sluggish Performance • Error Messages • Blue Screens

  9. Process Monitor • Process Monitor is a real-time file, registry, process and thread monitor • Works on Windows XP and higher, including 64-bit Windows • It replaces Filemon and Regmon, but you can still use Filemon and Regmon on older operating systems • Enhancements over Filemon/Regmon include: • More advanced filtering • Operation call stacks • Boot-time logging • Data mining views • Process tree to see short-lived processes • When in doubt, run Process Monitor! • It will often show you the cause of error messages • It often tells you what is causing sluggish performance

  10. Process Monitor Enhancements: Bookmarks • Bookmarking enables you to save markers in the trace: • Use F6 to find the next one, Shift+F6 to search up

  11. Process Monitor Enhancements: Environment Variables and Current Directory • Process start event now captures new process environment variables and current directory:

  12. The Case of the Slow IE Download Bar • User experienced a 40-second delay before IE’s download bar appeared after clicking on a download link • Ran IE with no add-ons: no change in behavior • Captured a Process Monitor trace of the hang

  13. The Case of the Slow IE Download Bar (Cont) • Used the Count Occurrences dialog to look for errors • Saw BAD NETWORK PATH: • The errors were references to an offline Media Center system:
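
The Count Occurrences step can also be reproduced offline on a saved trace. The following is a minimal Python sketch, assuming the trace was saved in CSV format and that the export includes a "Result" column; the sample rows, including the \\MEDIACENTER share name, are invented for illustration and are not from the actual trace.

```python
import csv
import io
from collections import Counter
from typing import Dict


def count_results(csv_text: str) -> Counter:
    """Tally the Result column of a Process Monitor CSV export."""
    return Counter(row["Result"] for row in csv.DictReader(io.StringIO(csv_text)))


def failures(counts: Counter) -> Dict[str, int]:
    """Keep every result except SUCCESS, e.g. BAD NETWORK PATH or NAME NOT FOUND."""
    return {result: n for result, n in counts.items() if result != "SUCCESS"}


# Invented sample rows; \\MEDIACENTER is a hypothetical offline machine.
sample = r"""Time of Day,Process Name,PID,Operation,Path,Result
10:00:01,iexplore.exe,1234,CreateFile,\\MEDIACENTER\Downloads\setup.exe,BAD NETWORK PATH
10:00:02,iexplore.exe,1234,RegQueryValue,HKCU\Software\Example\Setting,SUCCESS
10:00:41,iexplore.exe,1234,CreateFile,\\MEDIACENTER\Downloads\setup.exe,BAD NETWORK PATH
"""

print(failures(count_results(sample)))
```

When reading a real export from disk, opening the file with `encoding="utf-8-sig"` avoids a stray byte-order mark in the first column name if the file has one.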

  14. The Case of the Slow IE Download Bar: Solved • Saw references to the Media Center system in the download manager, left over from previous downloads: • Deleted the references in the download manager: problem solved

  15. The Case of the Hanging Paypal Emails • User started getting Outlook hangs of up to a minute when clicking on Paypal payment notification emails • Captured a Process Monitor trace • Added the Duration column: • One event stood out with a 3-second duration • A query of a file share via an IP address
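
Sorting on the Duration column is what made the slow event stand out. A minimal sketch of the same sort over a CSV export, assuming Duration was added as a column before saving and holds seconds as a decimal number; all rows are made up, and 192.0.2.10 is a documentation-only address standing in for the real one.

```python
import csv
import io
from typing import Dict, List


def slowest_events(csv_text: str, top: int = 5) -> List[Dict[str, str]]:
    """Return the top events by Duration (seconds), longest first.

    Rows with an empty Duration field are skipped.
    """
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r.get("Duration")]
    rows.sort(key=lambda r: float(r["Duration"]), reverse=True)
    return rows[:top]


# Invented sample rows, not taken from the actual trace.
sample = r"""Time of Day,Process Name,PID,Operation,Path,Result,Duration
9:15:01,OUTLOOK.EXE,4321,CreateFile,\\192.0.2.10\share,BAD NETWORK PATH,3.0012345
9:15:04,OUTLOOK.EXE,4321,ReadFile,C:\Users\demo\mail.pst,SUCCESS,0.0000412
9:15:04,OUTLOOK.EXE,4321,RegQueryValue,HKCU\Software\Example,SUCCESS,0.0000051
"""

for event in slowest_events(sample, top=1):
    print(event["Duration"], event["Operation"], event["Path"])
```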

  16. The Case of the Hanging Paypal Emails (Cont) • A web search revealed that the IP address belongs to the web statistics company Omniture • But no IP address was visible in the email, and image download was disabled:

  17. The Case of the Hanging Paypal Emails: Solved • Looked at the email source and found a domain name: • Outlook interprets the reference as a file server • Contacted Microsoft: not a security issue • Contacted Paypal: fixed email formats • In the meantime, added a hosts file entry: problem solved

  18. Outline • Application Hangs • Sluggish Performance • Error Messages • Blue Screens

  19. Process Explorer • Process Explorer is a Task Manager replacement • You can literally replace Task Manager with Options->Replace Task Manager • Hide-when-minimized to always have it handy • Hover the mouse to see a tooltip showing the process consuming the most CPU • Open System Information graph to see CPU usage history • Graphs are time stamped with hover showing biggest consumer at point in time • Also includes other activity such as I/O, kernel memory limits

  20. Process Explorer v15: GPU Monitoring and Windows 8 • Captures GPU utilization and memory usage • System-wide • Per-Process

  21. Process Explorer v15.2 • Process timelines • Autostart locations

  22. The Case of the Runaway Website • For years, Jrun.exe process on web server would sporadically max out a core: • Administrator saw Case of the Unexplained and decided to investigate

  23. Processes and Threads • A process represents an instance of a running program • Address space • Resources (e.g., open handles) • Security profile (token) • A thread is an execution context within a process • Unit of scheduling (threads run, processes don’t run) • All threads in a process share the same per-process address space • The System process is the default home for kernel mode system threads • Functions in OS and some drivers that need to run as real threads • E.g., need to run concurrently with other system activity, wait on timers, perform background “housekeeping” work • Other host processes: svchost, Iexplore, mmc, dllhost

  24. Viewing Threads • Task Manager doesn’t show thread details within a process • Process Explorer does on “Threads” tab • Displays thread details such as ID, CPU usage, start time, state, priority • Start address is where the thread began running (not where it is now) • Click Module to get details on module containing thread start address

  25. Thread Start Functions and Symbol Information • Process Explorer can map the addresses within a module to the names of functions • This can help identify which component within a process is responsible for CPU usage • Configure Process Explorer’s symbol engine: • Download the latest Debugging Tools for Windows from Microsoft (free) • Use dbghelp.dll from the Debugging Tools • Point at the Microsoft public symbol server (or internal symbol server if you have access)

  26. The Case of the Runaway Website (Cont) • Thread start address didn’t reveal anything:

  27. Viewing Call Stacks • Click Stack on the Threads tab to view a thread’s call stack • Note that the start address on the Threads tab is different from the first function shown in the stack • This is because all threads created by Windows programs start in a library function in Kernel32.dll, which calls the programmed start address

  28. The Case of the Runaway Website (Cont) • Looked at the stack and saw a ColdFusion DLL, Cfxneo.dll • No obvious reason for the CPU usage • Web search didn’t turn up anything

  29. The Case of the Runaway Website (Cont) • Ran Process Monitor and saw heavy enumeration of a particular key: • Opened the key in Regedit: Regedit hung
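
Spotting heavy enumeration of one key amounts to grouping registry enumeration events by path. A sketch, assuming a CSV export and Process Monitor’s RegEnumKey/RegEnumValue operation names; the key HKLM\SOFTWARE\ExampleVendor\Clients is hypothetical and stands in for the real key, which the slide shows only as a screenshot.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Assumed Process Monitor operation names for registry enumeration.
ENUM_OPERATIONS = ("RegEnumKey", "RegEnumValue")


def enumeration_hotspots(csv_text: str, top: int = 3) -> List[Tuple[str, int]]:
    """Rank registry paths by the number of enumeration events recorded against them."""
    counts = Counter(
        row["Path"]
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["Operation"] in ENUM_OPERATIONS
    )
    return counts.most_common(top)


# Invented trace: one hypothetical key enumerated 500 times, plus one unrelated open.
header = "Time of Day,Process Name,PID,Operation,Path,Result\n"
enum_row = "12:00:00,jrun.exe,2048,RegEnumKey,HKLM\\SOFTWARE\\ExampleVendor\\Clients,SUCCESS\n"
other_row = "12:00:01,jrun.exe,2048,RegOpenKey,HKLM\\SOFTWARE\\Other,SUCCESS\n"
sample = header + enum_row * 500 + other_row

for path, count in enumeration_hotspots(sample):
    print(f"{count:>6}  {path}")
```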

  30. The Case of the Runaway Website: Solved • Expansion finished after 10 minutes: tens of thousands of subkeys • Searched the ColdFusion documentation and found that the key stores browser client state • Option to use cookies instead • Made the configuration change: problem solved

  31. Outline • Application Hangs • Sluggish Performance • Error Messages • Blue Screens

  32. The Case of the Locked Folder • A company’s users complained of locked folders on their common network share: • Retrying would usually work

  33. The Case of the Locked Folder: Solved • Admin used Process Explorer’s search and saw that the thumbs.db file was in use by Explorer: • Did research and learned that if thumbs.db is present, Explorer opens it • Not clear why it was not closing it in a timely manner • Found a group policy that disables this behavior: • Applied the policy: problem solved

  34. The Case of the Missing .PPSX Details • An Office 2010 user complained that the Details tab of Explorer’s properties dialog was missing for .PPSX documents • Present for .PPS, though [screenshots: .PPS, .PPSX]

  35. The Case of the Missing .PPSX Details (Cont) • Captured Process Monitor traces of opening the Explorer properties of both file types • Compared them side by side • .PPSX trace had references to SystemFileAssociations\.ppsx • Corresponding key in .PPS trace missing
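
A side-by-side comparison can also be done mechanically: normalize each trace’s file extension to a placeholder, then diff the registry accesses. A sketch, assuming both traces were exported as CSV with Path, Operation and Result columns; the rows below are invented and do not reproduce the actual traces.

```python
import csv
import io
from typing import List, Set, Tuple

Access = Tuple[str, str]  # (normalized registry path, result)


def registry_accesses(csv_text: str, extension: str) -> Set[Access]:
    """Collect (path, result) pairs for registry operations in a CSV export.

    The extension under test is replaced with a placeholder so that
    equivalent keys from the two traces compare equal.
    """
    ext = extension.lower()
    return {
        (row["Path"].lower().replace(ext, "<ext>"), row["Result"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["Operation"].startswith("Reg")
    }


def diff_traces(working_csv: str, working_ext: str,
                broken_csv: str, broken_ext: str) -> Tuple[List[Access], List[Access]]:
    """Return accesses seen only in the working trace and only in the broken one."""
    working = registry_accesses(working_csv, working_ext)
    broken = registry_accesses(broken_csv, broken_ext)
    return sorted(working - broken), sorted(broken - working)


# Invented sample rows, not taken from the actual traces.
pps_trace = r"""Time of Day,Process Name,PID,Operation,Path,Result
8:00:00,explorer.exe,900,RegOpenKey,HKCR\SystemFileAssociations\.pps,SUCCESS
8:00:00,explorer.exe,900,RegOpenKey,HKCR\.pps,SUCCESS
"""
ppsx_trace = r"""Time of Day,Process Name,PID,Operation,Path,Result
8:01:00,explorer.exe,900,RegOpenKey,HKCR\SystemFileAssociations\.ppsx,NAME NOT FOUND
8:01:00,explorer.exe,900,RegOpenKey,HKCR\.ppsx,SUCCESS
"""

only_working, only_broken = diff_traces(pps_trace, ".pps", ppsx_trace, ".ppsx")
print("Only in .pps trace:", only_working)
print("Only in .ppsx trace:", only_broken)
```

The plain string replace keeps the sketch short; in the .PPS trace it would also turn any .ppsx path into "<ext>x", which still surfaces as a difference rather than hiding one.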

  36. The Case of the Missing .PPSX Details: Solved • Created the key and imported the settings from the .PPS key • Now could see details, but not edit them • Compared further and found a reference to HKCR\.pps missing • Repeated the export/import • Problem solved [screenshots: No Edit, Fixed]

  37. Outline • Application Hangs • Sluggish Performance • Error Messages • Blue Screens

  38. Blue Screen Crashes • Windows has various components that run in kernel mode, the highest privilege mode of the OS • OS components: Ntoskrnl.exe, Hal.dll • Drivers: Ntfs.sys, Tcpip.sys, device drivers • Kernel-mode components are privileged extensions to the OS and have to adhere to various rules: • Not accessing invalid memory • Accessing memory at the right “Interrupt Request Level” • Not causing resource deadlocks • When a kernel-mode component performs an illegal operation, Windows crashes (blue screens) • Crashing helps preserve the integrity of user data • A resource deadlock can hang the system

  39. Online Crash Analysis • When you reboot after a crash, Windows offers to upload the crash dump to Microsoft Online Crash Analysis (OCA) • An automated server generates a thumbprint of the crash and uses it as a key into a database • If the database has an entry, the user is told the cause and directed to a fix
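
The slide does not describe how OCA computes its thumbprint, so the following is only a hypothetical illustration of the general idea of a crash thumbprint used as a database key: a few identifying fields are hashed into a key, and the key is looked up in a table of known causes. The field choice (stop code, module, function), the driver name exampledrv.sys and the advice text are all invented.

```python
import hashlib
from typing import Optional


def crash_signature(stop_code: int, module: str, function: str) -> str:
    """Hash identifying crash fields into a short, deterministic lookup key."""
    text = f"{stop_code:#010x}|{module.lower()}|{function.lower()}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


# Hypothetical knowledge base mapping a crash signature to advice for the user.
KNOWN_CRASHES = {
    crash_signature(0xD1, "exampledrv.sys", "ExampleDrvDpcRoutine"):
        "Known bug in exampledrv.sys; install the vendor's updated driver.",
}


def look_up(stop_code: int, module: str, function: str) -> Optional[str]:
    """Return the recorded advice, or None if this crash has not been seen before."""
    return KNOWN_CRASHES.get(crash_signature(stop_code, module, function))


print(look_up(0xD1, "ExampleDrv.sys", "ExampleDrvDpcRoutine"))
print(look_up(0x50, "other.sys", "OtherRoutine"))
```

A miss (None) corresponds to the situation on the next slide, where OCA does not know the cause.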

  40. Basic Crash Dump Analysis • Often OCA doesn’t know the cause: • Basic crash dump analysis is easy, and it might tell you the cause • Requires Windbg and symbol configuration • Dump files are in either: • \Windows\Memory.dmp: Vista+ and servers • \Windows\Minidump: Windows 2000 Pro, Windows XP, Vista+
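
The same basic analysis can be scripted with kd.exe, the console debugger that ships alongside Windbg in Debugging Tools for Windows. kd opens a dump with -z, takes a symbol path with -y and runs commands with -c. A sketch, assuming kd.exe is on PATH and symbols are cached in C:\Symbols (an assumed local directory); it picks the newest file in the minidump folder and runs !analyze -v followed by q to quit.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumption: symbols cached under C:\Symbols, fetched from Microsoft's public symbol server.
SYMBOL_PATH = r"srv*C:\Symbols*https://msdl.microsoft.com/download/symbols"


def kd_command(dump_file: str, kd_exe: str = "kd.exe") -> List[str]:
    """Build a kd command line that opens a dump, runs !analyze -v, then quits."""
    return [kd_exe, "-y", SYMBOL_PATH, "-z", dump_file, "-c", "!analyze -v; q"]


def newest_minidump(folder: str = r"C:\Windows\Minidump") -> Optional[str]:
    """Return the most recently written .dmp file in folder, or None if there is none."""
    dumps = sorted(Path(folder).glob("*.dmp"), key=lambda p: p.stat().st_mtime)
    return str(dumps[-1]) if dumps else None


if __name__ == "__main__":
    dump = newest_minidump()
    if dump is None:
        print("No minidumps found")
    else:
        # Requires Debugging Tools for Windows with kd.exe on PATH.
        result = subprocess.run(kd_command(dump), capture_output=True, text=True)
        print(result.stdout)
```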

  41. The Case of the Crashing Hyper-V Systems • A Hyper-V cluster started having random crashes • Saw Case of the Unexplained, so opened a minidump • Executed !analyze -v:

  42. The Case of the Crashing Hyper-V Systems: Solved • Searched for the stop code: • Found a KB article with a fix that matched the symptoms: problem solved

  43. The Case of the GFI Backup Crash • Admin updated GFI Backup from the 2009 to the 2011 version • Reproducible system crash (BSOD) after ~1 minute when backing up to a network location • Analyzed the crash dump with Windbg:

  44. The Case of the GFI Backup Crash (Cont) • Checked online information on ncfsd.sys • The first Google hits linked to bogus websites trying to convince you to run their code in order to “fix” your “infection”:

  45. The Case of the GFI Backup Crash (Cont) • The second hit reported that the crash could be the result of low disk space: • Freed up 20 GB of disk space • The crash took longer to occur (~3 minutes)

  46. The Case of the GFI Backup Crash: Solved • Another hit pointed at the Novell Client driver as the problem: • Disabled the Novell service • Problem persisted • Uninstalled the Novell Client because it was not really needed: problem solved

  47. Summary and More Information • A few basic tools and techniques can solve seemingly impossible problems • I learn by always trying to determine the root cause • Resources: • Sysinternals Administrator’s Reference • Webcasts of two previous “Case of the Unexplained” talks • Sysinternals->Mark’s Webcasts • My blog • Windows Internals: understand the way the OS works • If you’ve solved one, send me a description, screenshots and log files!

  48. Windows 8 Bluescreens
