Performance and Production Debugging


Presentation Transcript


    1. .NET Performance and Production Debugging, i.e. WinDBG and SOS. Kellen Sunderland. About me: I’m actually from Edmonton, which is pretty rare in this industry. I went to school at UofA. I work mostly on .NET enterprise apps as an independent consultant, and I’m doing some corporate iPhone development work for a local company as well (for a change from .NET).

    2. What we’ll cover: A quick explanation of WinDBG and a demo of its features. Some real-world problems I’ve used WinDBG to solve. Making fun of Microsoft code. Capturing information for WinDBG. We’ll jump right into WinDBG soon so you can see for yourself what it does.

    3. What is WinDBG? It’s a command-driven debugger, similar to gdb or OllyDBG. It’s been used for over 15 years to debug the NT kernel, device drivers, and .NET code. It works with .NET through an extension called SOS, and it works with memory dumps (captures). Demo. WinDBG: Ask if anyone has used it before. SOS: SOS is a DLL that needs to be loaded into WinDBG. This DLL has to match your framework version and the bitness (32- or 64-bit) of the target process. DEMO: Make sure DebugDiag is turned off. Talk about how we’re keeping it simple and using Task Manager. Show off how to add the PID and User Name columns. Talk about how this is what a ‘normal’ program looks like, and how WinDBG shows its internals. Attach to the PID and break. Show how you can use g, Ctrl+Break, !clrstack and ~#s. Talk about how the commands don’t really matter; that’s why you have a cheat sheet. In an hour it’s impossible to pick up the commands, so focus on the problems you can solve. Show some ‘stepping’: click a link and show what the stack looks like now. Run !clrstack, !dumpheap -stat, !dumpheap -mt <MT>, !syncblk and !runaway. Show !dumpheap -stat and the exceptions that are already there (talk about the StackOverflowException). Run !dumpmt -md <address> (talk about how you can learn a lot about the internals of .NET). Quickly show that you can get a memory dump via Task Manager and load it in WinDBG. Ask if anyone wants to know more about any of the other commands.

    4. When should I use WinDBG? Only as a last resort (after logging, event logs, Visual Studio, Procmon, etc.). You’d usually only use it for a soul-crushing production problem, generally in high-stress scenarios. It typically comes down to four scenarios: memory leaks, high-CPU hangs, low-CPU hangs, and impossible-to-reproduce exceptions.
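
The four scenarios can be written down as a small triage table. This is a minimal sketch, not a tool from the talk: the `Symptoms` fields, the `triage` function and the 80% CPU cut-off are hypothetical. The command lists reuse the WinDBG/SOS commands from the demo; `~<n>s` (switch to thread n) and `~*e` (run a command on every thread) are standard WinDBG thread syntax.

```python
from dataclasses import dataclass

# First commands to try for each scenario (WinDBG with SOS loaded).
FIRST_COMMANDS = {
    "crash": ["~#s", "!clrstack"],
    "memory leak": ["!dumpheap -stat", "!dumpheap -mt <MT>", "!gcroot <address>"],
    "high-CPU hang": ["!runaway", "~<n>s", "!clrstack"],
    "low-CPU hang": ["!syncblk", "~*e !clrstack"],
}

HIGH_CPU_PERCENT = 80.0  # hypothetical cut-off between a high- and low-CPU hang


@dataclass
class Symptoms:
    crashed: bool         # the process died (e.g. the worker process restarted)
    cpu_percent: float    # sustained CPU while the problem is happening
    memory_growing: bool  # memory keeps climbing and never comes back down
    responding: bool      # requests still complete


def triage(s: Symptoms) -> str:
    """Map observed symptoms to one of the four WinDBG scenarios."""
    if s.crashed:
        return "crash"
    if s.memory_growing:
        return "memory leak"
    if not s.responding:
        return "high-CPU hang" if s.cpu_percent >= HIGH_CPU_PERCENT else "low-CPU hang"
    return "no WinDBG scenario yet"


if __name__ == "__main__":
    scenario = triage(Symptoms(crashed=False, cpu_percent=3.0,
                               memory_growing=False, responding=False))
    print(scenario, FIRST_COMMANDS[scenario])
```

Checking for a crash first, then memory growth, mirrors how the symptoms usually present: a leak often drags CPU up with it (through garbage collection), so CPU alone is a weaker signal.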

    5. Real World Problem #1: A 2-year project with 20 team members, 7 of them devs. This was the first time I came across WinDBG and Tess’s blog. It was an ASP.NET 2.0 Web Forms app using the MVP pattern. Symptoms: memory and CPU usage grew over time and with load. My problems are going to be almost ‘case studies’. There’s a fantastic set of tutorials by Tess Ferrandez on your sheets, but most of her examples are theoretical. They don’t feel ‘real’ to me, so I thought I’d give lots of background information on our problems for a little more context. Performance would degrade over time: CPU and memory usage both increased with prolonged use and with load. We managed this by recycling the app pool. Logs and event logs provided no real help; we added more and more logging, commented out code, etc. Worst-case scenario here: angry users and a delayed project. Eventually we figured out how to get it to crash reliably, which gave us a native exception code to Google.

    6. Demo: Say memory usage can bounce around a little. Explain garbage collection (use ZoomIt). Show the site a little bit. Run TinyGet. Talk about the ‘magic number’. Show the memory usage in Task Manager. Take a memory snapshot. Debug with !gcroot.
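
`!gcroot <address>` reports the chain of references from a GC root (a static field, a thread's stack, a handle) to an object, which is what tells you why a leaked object can't be collected. The idea can be modelled as a breadth-first search over a toy object graph. This is a conceptual Python sketch, not how SOS actually walks the heap, and the heap contents below are hypothetical.

```python
from collections import deque


def gcroot(graph, roots, target):
    """Shortest reference chain from any GC root to `target`, or None if unreachable.

    `graph` maps each object to the objects it references (a toy heap).
    """
    parent = {}
    queue = deque()
    for root in roots:
        if root not in parent:
            parent[root] = None
            queue.append(root)
    while queue:
        obj = queue.popleft()
        if obj == target:
            chain = []
            while obj is not None:
                chain.append(obj)
                obj = parent[obj]
            return chain[::-1]
        for ref in graph.get(obj, ()):
            if ref not in parent:  # visit each object once, so cycles are fine
                parent[ref] = obj
                queue.append(ref)
    return None  # unreachable: the GC is free to collect it


# Hypothetical heap: two pages are still subscribed to a static event, one isn't.
heap = {
    "static LanguageChanged": ["EventHandler invocation list"],
    "EventHandler invocation list": ["BasePage#1", "BasePage#2"],
    "BasePage#1": ["ViewState#1"],
    "BasePage#2": ["ViewState#2"],
    "BasePage#3": ["ViewState#3"],
}
roots = ["static LanguageChanged"]

print(gcroot(heap, roots, "ViewState#2"))
# ['static LanguageChanged', 'EventHandler invocation list', 'BasePage#2', 'ViewState#2']
print(gcroot(heap, roots, "ViewState#3"))  # None
```

Reading the chain from the root down is exactly how you'd read `!gcroot` output: the interesting line is usually the first non-framework object that holds on to everything below it.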

    7. Really it was MSDN’s fault. The problem was a memory leak caused by an EventHandler subscription that kept every object ever created reachable. Garbage collection then went crazy, causing the CPU spike. With event handlers you can also get code called multiple times, causing high CPU. I’ve seen this many times; it often happens with IoC containers or messaging infrastructure. If you’re subscribing to any kind of event or sending messages, and you’re working with a longer-lived (static) context, be very careful.
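
The leak is easy to model: a long-lived (static) event holds a strong reference to every handler, and a handler that is a method on a page holds the page. This hypothetical Python sketch (`Event`, `Page` and `language_changed` are made-up names; Python has no built-in events) uses `weakref` to check whether a page survives after its request is over, and also shows the duplicate-subscription case.

```python
import gc
import weakref


class Event:
    """Minimal .NET-style event: the invocation list holds strong references."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):    # like `+=` in C#
        self._handlers.append(handler)

    def unsubscribe(self, handler):  # like `-=` in C#
        self._handlers.remove(handler)

    def fire(self, *args):
        for handler in list(self._handlers):
            handler(*args)


# Module-level, so it lives as long as the process, like a static event.
language_changed = Event()


class Page:
    """Hypothetical per-request page that subscribes to the static event."""

    def __init__(self):
        self.calls = 0
        self.state = bytearray(1_000_000)  # stands in for controls, view state...
        language_changed.subscribe(self.on_language_changed)

    def on_language_changed(self, lang):
        self.calls += 1


# 1. The leak: the request is over, but the static event keeps the page alive.
page = Page()
alive = weakref.ref(page)
del page
gc.collect()
print("leaked page still alive:", alive() is not None)  # True

# Unsubscribing breaks the only remaining reference chain.
language_changed.unsubscribe(alive().on_language_changed)
gc.collect()
print("collected after -=:", alive() is None)  # True

# 2. Subscribing twice means the handler runs twice per event.
page = Page()
language_changed.subscribe(page.on_language_changed)  # accidental second +=
language_changed.fire("fr")
print("handler calls:", page.calls)  # 2
```

Multiply the 1 MB per page by every request since the last app pool recycle and you get the steadily growing memory, and the ever-busier garbage collector, described in problem #1.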

    8. http://msdn.microsoft.com/en-us/library/ms178425.aspx We had multiple headers; in our case the subscription was completely buried in a BasePage. This is why commenting out .cs code didn’t work. I think it was being used to update the breadcrumbs to French if the user was in French mode.
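
One way to fix this kind of leak is to pair every `+=` with a `-=` tied to the page's lifetime (in Web Forms, for example, unsubscribing when the page unloads). A minimal Python sketch of that pairing with hypothetical names (`Event`, `BasePage`, `request_scope`), using a context manager so the unsubscribe runs even if the request throws; it illustrates the pattern, not the project's actual fix.

```python
from contextlib import contextmanager


class Event:
    """Minimal event with a strong-reference invocation list."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def unsubscribe(self, handler):
        self.handlers.remove(handler)

    def fire(self, *args):
        for handler in list(self.handlers):
            handler(*args)


language_changed = Event()  # process-lifetime ("static") event


class BasePage:
    """Hypothetical base page that updates its breadcrumbs on a language change."""

    def __init__(self):
        self.breadcrumb_lang = "en"

    def on_language_changed(self, lang):
        self.breadcrumb_lang = lang


@contextmanager
def request_scope(page):
    """Subscribe for one request only; the finally block always unsubscribes."""
    handler = page.on_language_changed
    language_changed.subscribe(handler)
    try:
        yield page
    finally:
        language_changed.unsubscribe(handler)


for _ in range(1000):  # 1,000 simulated requests
    with request_scope(BasePage()) as page:
        language_changed.fire("fr")

print(len(language_changed.handlers))  # 0: nothing left holding old pages
```

Keeping the subscribe and unsubscribe in one place also makes the subscription visible, which is the opposite of having it buried in a base class.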

    9. After the fix: It worked great. We had profiled the code so much that it could now easily handle 15,000 unique visitors a day. We got to keep our jobs.

    10. Real World Problem #2: A larger, more important system for collecting student information at the start of the year, with some very hard deadlines. Other systems and business processes depended on it, and an outage would have disrupted funding for schools.

    11. Real World Problem #2 cont.: We had 24 cores on 3 production servers. These servers would run our app fine for a day or so and then all crash at the same time. The app would then take about 45 minutes to come back online. All the servers crashing at the same time was a new one for me. As a workaround we kept servers in reserve (we had 5-6 running at once). We ended up calling Microsoft (but thankfully we had good dump files).

    12. Demo: For the setup for this demo I’d just like to show a little bit of how ASP.NET handles exceptions. Show what happens when there’s a normal exception. Show 200,000 requests. Not great, but it will only show up on one user’s screen, and you can handle this behavior with custom error screens, better logging, and a message to the user. Show the event logs (nothing logged). Show 100,000. Show how the process went down. Show the event logs. Start the process back up by recycling it in IIS Manager. Set up DebugDiag. Show the IE / Firefox connection error screen: this is what happens when a TCP connection is closed. With default load balancer behavior, what happens when it sees a TCP connection close? It could be a power outage or a hardware failure, so it’s going to send you right to the next server. This means you can get a ‘killer request’ that takes down your entire load-balanced environment. Show the threads in WinDBG; it’s pretty obvious what the problem is. Analyze with DebugDiag: quick and easy, and it doesn’t require a call to Microsoft. VS2010 can also load .NET 4.0 memory dumps, which would tell you the problem (although it often doesn’t work for me).
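
The ‘killer request’ effect can be reproduced with a toy model: a load balancer that treats a dropped connection as a failed server and retries the same request elsewhere will walk a poison request across the whole farm. This is a hypothetical simulation (`Server`, `send` and `max_attempts` are made-up names); real retry behavior depends on the load balancer product and its configuration.

```python
class Server:
    """Hypothetical web server; a 'killer' request takes down its worker process."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def handle(self, request):
        if request == "killer":
            self.up = False  # e.g. an uncatchable stack overflow ends the process
            raise ConnectionResetError(self.name)
        return f"{self.name}: ok"


def send(servers, request, max_attempts=None):
    """Toy load balancer: on a dropped connection, retry on the next healthy server.

    max_attempts=None keeps retrying until every server has been tried.
    """
    attempts = 0
    for server in servers:
        if not server.up:
            continue
        if max_attempts is not None and attempts >= max_attempts:
            break
        attempts += 1
        try:
            return server.handle(request)
        except ConnectionResetError:
            continue  # looks like a power or hardware failure, so fail over
    return None  # the client sees a connection error


farm = [Server(f"web{i}") for i in range(1, 4)]
send(farm, "killer")
print([s.up for s in farm])  # [False, False, False]: one request, whole farm down

farm = [Server(f"web{i}") for i in range(1, 4)]
send(farm, "killer", max_attempts=1)
print([s.up for s in farm])  # [False, True, True]
```

Capping retries for a single request limits the blast radius, but it only buys time; the real fix is still finding out why the request kills the process.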

    13. This time it was our fault. A certain user would perform an action that got logged. We were creating a ‘User’ from our Thread.CurrentPrincipal in order to include some information with the log. Our Users got assigned to an organization. We had organizations stored in SQL, and one org had a trailing space. When our User class saw there was an invalid organization, it logged that info, which created another User, which logged again, and so on. Now you’d think there would be thousands of iterations of this before anything broke. That’s actually not the case: exceptions were involved along the way, and exception handling uses a lot of extra stack space, so the stack overflowed far sooner than you’d expect.
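
The cycle is easy to reproduce outside .NET. In this hypothetical Python sketch (`User`, `make_logger`, `make_guarded_logger` and the org name are illustrative, not the project's code), every log call builds a `User` from the current principal, and an unrecognized organization makes the `User` log again. Python surfaces the runaway recursion as a catchable `RecursionError`; since .NET 2.0 a `StackOverflowException` cannot be caught and terminates the process, consistent with the worker process simply going down in the demo. Trimming the bad data (`"North District ".strip()`) fixes this instance; a re-entrancy guard stops the logger from recursing into itself at all.

```python
import threading

VALID_ORGS = {"North District"}  # hypothetical org list loaded from SQL


class User:
    """Hypothetical user built from the current principal's organization."""

    def __init__(self, org, log):
        self.org = org
        if org not in VALID_ORGS:
            log(f"invalid organization {org!r}")  # logging builds another User...


def make_logger(current_org):
    """Unguarded logger: every message builds a User to enrich the log entry."""

    def log(message):
        user = User(current_org, log)  # ...which logs again, and so on
        return f"[{user.org}] {message}"

    return log


_guard = threading.local()


def make_guarded_logger(current_org, sink):
    """Same logger with a per-thread re-entrancy guard."""

    def log(message):
        if getattr(_guard, "active", False):
            sink.append(f"[nested] {message}")  # don't build another User
            return
        _guard.active = True
        try:
            user = User(current_org, log)
            sink.append(f"[{user.org}] {message}")
        finally:
            _guard.active = False

    return log


print(make_logger("North District")("saved"))  # [North District] saved
try:
    make_logger("North District ")("saved")  # note the trailing space
except RecursionError:
    print("runaway recursion")  # in .NET the process would be terminated

entries = []
make_guarded_logger("North District ", entries)("saved")
print(entries)
```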

    14. Resolution: We had 3-4 guys working nonstop on the problem. Eventually we called ‘Kyle’ at Microsoft, and he found the problem in 3 hours. We saved Kyle a bunch of time by having a whole series of memory dumps he could look at. Being able to reproduce the problem can be 90% of the challenge when you work with Microsoft. If you know enough to get a memory dump at the right time, you can save days of Microsoft’s time. It can be as simple as knowing you can right-click a process and save a memory dump.

    15. Other problems I’ve seen: Loggers often have problems, and the same goes for SQL connections in general. Be careful around any type of subscription, including messaging systems and IoC containers. Loggers: if you’re logging to an unreliable place like a file share or SQL table, make sure a logging failure can’t cause a stack overflow. Good article on how not to log.
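
The key rule for a logger writing to an unreliable destination is that a failed write must never be reported through the same logger, or one outage turns into infinite recursion. Python's standard `logging` module follows the same idea: a handler that fails calls `Handler.handleError`, which reports to stderr instead of logging. This sketch is a hypothetical custom handler (`UnreliableShareHandler` and `share_down` are made-up names) that records failures in a local fallback list.

```python
import logging


class UnreliableShareHandler(logging.Handler):
    """Hypothetical handler for a destination that can fail (file share, SQL table)."""

    def __init__(self, write):
        super().__init__()
        self.write = write  # callable that may raise OSError
        self.fallback = []  # local, always-available record of failures

    def emit(self, record):
        try:
            self.write(self.format(record))
        except OSError as exc:
            # Don't call logger.error(...) here: that routes straight back into
            # this handler and recurses. Record the failure somewhere local instead.
            self.fallback.append(f"log write failed ({exc}); lost: {record.getMessage()}")


def share_down(line):
    """Hypothetical write that always fails, like an unreachable file share."""
    raise OSError("share unreachable")


logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = UnreliableShareHandler(share_down)
logger.addHandler(handler)

logger.info("user saved")  # no exception, no recursion
print(handler.fallback)  # ['log write failed (share unreachable); lost: user saved']
```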

    16. Other problems continued: Be careful around resource contention. Don’t block quick-running database queries with long-running ones.
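
A common way to keep long-running queries from starving quick ones is to give them separate, bounded worker pools (or separate connection pools), so a burst of slow work can't occupy every slot. This hypothetical Python sketch uses two `ThreadPoolExecutor`s as stand-ins for database connection pools; `long_report`, `quick_lookup` and the pool sizes are assumptions for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def long_report(release):
    """Hypothetical long-running query: holds a worker until released."""
    release.wait()
    return "report"


def quick_lookup():
    """Hypothetical quick query."""
    return "lookup"


def short_query_finishes(separate_pools, timeout=0.5):
    """Saturate the long-query workers, then check whether a quick query still runs."""
    release = threading.Event()
    long_pool = ThreadPoolExecutor(max_workers=2)
    short_pool = ThreadPoolExecutor(max_workers=2) if separate_pools else long_pool
    try:
        for _ in range(2):  # occupy every long-query worker
            long_pool.submit(long_report, release)
        quick = short_pool.submit(quick_lookup)
        done, _ = wait([quick], timeout=timeout)
        return quick in done
    finally:
        release.set()  # let the long queries finish so the pools can shut down
        long_pool.shutdown(wait=True)
        if short_pool is not long_pool:
            short_pool.shutdown(wait=True)


print("shared pool:", short_query_finishes(separate_pools=False))
print("separate pools:", short_query_finishes(separate_pools=True))
```

With a shared pool the quick query queues behind both long ones and misses its deadline; with separate pools it completes immediately, at the cost of capping how many long queries can run at once.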

    17. More WinDBG: We’ve only scratched the surface, but a little WinDBG knowledge can go a long way. Tess’s blog. Advanced .NET Debugging (Greg) is a good week-long course. Just knowing the four types of problems you can solve (high-CPU hangs, low-CPU hangs, memory leaks, and crashes) can lead you down the right path. After taking the course and doing the tutorials you’ll still only know about 5% of what the PFEs know, so call them!

    18. That’s it. Thanks for coming. Any more questions?
