
Writing Rock-Solid Reliable Applications For Windows Vista And The CLR

Björn Levidow, Group Program Manager, and Brian Grunkemeyer, Software Design Engineer. Session FUN308, Microsoft Corporation. BjornL@microsoft.com, BrianGru@microsoft.com


Presentation Transcript


  1. Writing Rock-Solid Reliable Applications For Windows Vista And The CLR Björn Levidow, Group Program Manager Brian Grunkemeyer, Software Design Engineer FUN308 Microsoft Corporation BjornL@microsoft.com BrianGru@microsoft.com

  2. What You Will See • Customer-Focused Reliability Attributes • Windows Vista and CLR reliability goals • Windows Vista and CLR reliability features • Detailed resiliency discussion • Features and Tools • Summary • Call to Action • Takeaway: the Microsoft platform supports developing reliable applications, both native and managed

  3. Customer-Focused Reliability Attributes (attribute: definition; examples of failures) • Resilient: The system continues to provide service in the face of internal or external disruptions; examples: crashes, hangs • Recoverable: After disruption the system is easily restored to a previously known state with no data loss; examples: data corruption • Controlled: Provides timely and expected service whenever needed; examples: degraded response • Undisruptable: Required changes and upgrades do not impact the service; examples: update disruptions • ProductionReady: At release the system contains a minimum number of bugs, requiring a limited number of predictable patches/fixes; examples: patch size, frequency • Predictable: It works as advertised, what worked before works now; examples: compatibility failures

  4. Addressing Customer-Focused Reliability Attributes • Resilient (crashes, hangs): Process/App Domain Recycling, SafeHandle • Recoverable (data corruption): Transactional file system/Registry, Common log file system • Controlled (degraded response): Resource Exhaustion Diagnostics, I/O cancellation • Undisruptable (update disruptions): Restart Manager • ProductionReady (patch size, frequency): /Analyze, Safe C++ libraries, FxCop, App Verifier, Managed Debugging Assistant • Predictable (compatibility failures): Good versioning and installation practices • Slide labels: "Requires Application Design Consideration" and "OS or CLR features to plug into your app"

  5. Windows Vista Reliability Objectives • No loss of work, time, data or control • No Hangs, No Crashes, No Reboots • Reducing user disruptions and increasing availability • How we raised the bar on Windows Vista reliability • New processes to minimize bugs and design issues • Enhanced feedback using Windows Error Reporting for identifying product problems during development • New reliability features

  6. CLR Reliability Objectives • Write resilient applications • Improve application availability • Reduce user disruptions and increase availability • Resiliency against failures, crashes and hangs • Availability is great today. Let’s make it even better • How we raised the bar on CLR reliability • Tested product with fault injection • New reliability features • Hardened managed libraries

  7. How Much Reliability Do I Need? Different bars for different environments • Reliability of most software meets customer needs • A few bad apples spoil the overall experience • Reliability needs differ based on your application • Console applications and simple apps like calc.exe • Sophisticated application (Word, Photoshop) • Library code • Highly available server code • Library code’s reliability bar is dictated by the applications that use the library • Car

  8. Writing Reliable Code: Reliability Has A Cost • Writing reliable unmanaged code takes work • Requires discipline to handle out of memory problems • Failures in multi-threaded apps are hard to handle • Requires extensive testing (fault injection, stress runs) • Writing reliable managed code takes work • Under the covers, the CLR manages your code • Eliminates entire classes of bugs, like dangling pointers, memory leaks, most buffer overruns, etc. • However, CLR-induced failure points aren’t obvious • Asynchronous exceptions: OutOfMemoryException and ThreadAbortException

  9. Customer-Focused Reliability Attributes • Resilient: The system continues to provide service in the face of internal or external disruptions; examples: crashes, hangs • Recoverable • Controlled • Undisruptable • ProductionReady • Predictable

  10. How Do We Get Resiliency? Resiliency Approaches • Isolated extensibility models • Keep extensions in their own process space • Enables recycling • Process Recycling • Operating System resources are guaranteed to be freed • Relatively cheap and relatively easy • Requires a stateless, almost transactional model

  11. Process Recycling: Hosted programming model example • ASP.NET hosts applications • Uses process recycling for resiliency • Worker processes may encounter a resource leak or deadlock, and the host will kill them • Bugs could be anywhere in the process • Server is resilient to these failures • Session state must live in a database or out-of-proc • In-process session state is lost. Controllable via web.config • Cheap and good enough for a web server

  12. AppDomain Recycling: Another hosted programming model • Diagram: a SQL Server process containing the Default AppDomain, AppDomain 2, and AppDomain 3 • Application Domains are a unit of isolation • Static variables are per-appdomain • Avoid* mutating any cross-AD or cross-process state • SQL unloads and recycles AppDomains • Mitigates state corruption • Higher availability • SQL is transacted => no database corruption • Operating System (OS) resources must be freed, but the OS is AD-ignorant • Appdomain unloading must be clean!

  13. Problems For Hosted Code: How does a host hurt your reliability? • Hosted libraries make tradeoffs to guarantee availability • Thread aborts between two machine instructions • OutOfMemoryExceptions more common when hosted • Typical cleanup techniques aren’t guaranteed! • Finalizers and finally’s may be aborted • Hosted managed libraries should be hardened • Prevent leaking resources in aggressive hosts • Using hardened code is very forgiving • Example on slide: `IntPtr handle = CreateFile(…);` shown next to its IL, `call native int CreateFile(…)` followed by `stloc.2`; a thread abort between those two instructions would leave the handle unstored and leaked

  14. SafeHandle: Reliably releasing a handle • A reliable, convenient wrapper for OS handles • CLR guarantees your release code will run • Critical finalization • Benefits • Avoids races with your own finalizer • Reduced object graph promotion during GC • Type-safe manipulation of handles • Small perf costs • Another 20 bytes on x86, 32 bytes on 64 bit • Ref count when a thread is actively using a SafeHandle
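
The SafeHandle idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the code from the session's demo: the class name `SafeUnmanagedBuffer`, its `Allocate` helper, and the `ReleasedCount` counter are all hypothetical, and the resource is a block of unmanaged memory from `Marshal.AllocHGlobal` rather than a file handle so the sketch stays small.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper: owns a block of unmanaged memory from Marshal.AllocHGlobal.
public sealed class SafeUnmanagedBuffer : SafeHandleZeroOrMinusOneIsInvalid
{
    // Illustration only: counts how many times ReleaseHandle has run.
    public static int ReleasedCount;

    private SafeUnmanagedBuffer() : base(true) // true = this object owns the handle
    {
    }

    public static SafeUnmanagedBuffer Allocate(int byteCount)
    {
        var buffer = new SafeUnmanagedBuffer();
        // Assumption: storing the raw pointer by hand is acceptable for a sketch.
        // A P/Invoke signature that returns a SafeHandle type lets the runtime
        // store the raw handle for you, which closes the gap between
        // acquiring the resource and wrapping it.
        buffer.SetHandle(Marshal.AllocHGlobal(byteCount));
        return buffer;
    }

    // Called once by the runtime when the handle is disposed or finalized,
    // and only if the handle value is valid.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Increment(ref ReleasedCount);
        return true;
    }
}
```

Callers would normally wrap the object in a `using` statement; if they forget, the critical finalizer inherited from SafeHandle still runs `ReleaseHandle`.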

  15. SafeHandle Demo Brian Grunkemeyer Software Development Engineer Common Language Runtime

  16. Constrained Execution Regions: Limited guaranteed execution • For building hosts and changing cross-AD state • Hoist CLR-induced failures and delay thread aborts • Constraints on your code • Only call methods with reliability contracts • No allocations, virtual calls, acquiring locks, etc. • Perf and complexity cost • Code on slide:
      RuntimeHelpers.PrepareConstrainedRegions();
      try {
          // Arbitrary code: may fail
      } finally {
          // Constrained code: no virtual calls or allocs
      }

  17. When To Use SafeHandle And CERs • Use SafeHandle for • Libraries hosted in environments using appdomain recycling • Any code using P/Invoke to acquire OS resources • Use CERs for • Hosted code that manipulates cross-appdomain or cross-machine state • Still need to design for a power failure • Corner cases that SafeHandle doesn’t support • Marshaling out handles stored in a struct

  18. Customer-Focused Reliability Attributes • Resilient • Recoverable: After disruption the system is easily restored to a previously known state with no data loss; examples: data corruption • Controlled • Undisruptable • ProductionReady • Predictable

  19. Writing Recoverable Applications • Writing bug free apps is Nirvana, but… • Nobody’s perfect • Not all software controls nuclear power plants • Even if you get there, external factors affect you • Software installs, resource exhaustion, power failures • User uses your app in an unexpected way • So, writing recoverable apps is necessary • Expect the unexpected! • Apps should be journaled and designed to recover • Use transactions and journaling to persist data • Save data and state most important to your applications • Word is a good example • Saves user docs every 3 minutes to minimize loss • Document recovery as well
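
One way to approximate the periodic-save behavior described on this slide is sketched below. It is an assumption-laden illustration, not how Word does it: the `RecoverableSave` class, the `.tmp` suffix, and the `StartAutoSave` helper are hypothetical names, and `File.Move` with an overwrite flag requires .NET Core 3.0 or later. The sketch writes to a temporary file and then swaps it into place, so a crash in the middle of a write leaves the previous copy intact. It does not by itself guarantee durability across a power failure.

```csharp
using System;
using System.IO;
using System.Threading;

public static class RecoverableSave
{
    // Writes the new content to a temporary file first, then swaps it into
    // place, so an interrupted write never truncates the last good copy.
    public static void Save(string path, string content)
    {
        string tempPath = path + ".tmp"; // hypothetical naming convention
        File.WriteAllText(tempPath, content);
        File.Move(tempPath, path, true); // true = overwrite the existing file
    }

    // Saves a fresh snapshot on a fixed interval (the slide cites 3 minutes).
    // The caller keeps the returned Timer alive and disposes it to stop saving.
    public static Timer StartAutoSave(Func<string> snapshot, string path, TimeSpan interval)
    {
        return new Timer(_ => Save(path, snapshot()), null, interval, interval);
    }
}
```

A recovery path would then look for the saved file at startup and offer to restore it.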

  20. Transactions And Journaling: Tools to help build recoverable apps • Win32 • File and Registry Transactions (TxF) • Common Log File System (CLFS) • Managed • System.Transactions • Code on slide:
      SetCurrentTransaction(HANDLE hTransaction)
      using (TransactionScope scope = new TransactionScope(
                 TransactionScopeOption.Required,
                 new TransactionOptions(),
                 EnterpriseServicesInteropOption.Full))
      {
          if (!EnterTransactionScope()) throw new TransactionException("Bad");
          // Write to one or many files, etc.
          if (!ExitTransactionScope()) throw new TransactionException("Bad");
          scope.Complete();
      }
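
The commit-or-roll-back behavior of `TransactionScope` can be seen in isolation with a small sketch like the one below. The `ScopeDemo` class and its `Run` method are hypothetical, and no files or resource managers are enlisted; the sketch only observes how the ambient transaction ends depending on whether `Complete()` was called.

```csharp
using System;
using System.Transactions;

public static class ScopeDemo
{
    // Runs an empty unit of work inside a TransactionScope and reports how
    // the ambient transaction ended.
    public static TransactionStatus Run(bool complete)
    {
        TransactionStatus outcome = TransactionStatus.Active;
        using (var scope = new TransactionScope(TransactionScopeOption.Required))
        {
            // Record the final status when the transaction finishes.
            Transaction.Current.TransactionCompleted +=
                (sender, e) => outcome = e.Transaction.TransactionInformation.Status;

            // ... transactional work would go here ...

            if (complete)
            {
                scope.Complete(); // vote to commit
            }
        } // Dispose commits if Complete() was called; otherwise it rolls back
        return outcome;
    }
}
```

The design point is that the failure path needs no code: leaving the `using` block through an exception skips `Complete()`, and the work is rolled back.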

  21. Customer-Focused Reliability Attributes • Resilient • Recoverable • Controlled: Provides timely and expected service whenever needed; examples: degraded response • Undisruptable • ProductionReady • Predictable

  22. Resource Exhaustion Diagnosis • Give users control of their system by allowing them to take action before a low resource condition impacts them • Automatic detection and diagnosis of near-exhaustion of commit limit and memory leaks on client SKUs • Provide options for manual and automatic resolution to avoid exhaustion • Impact on Windows Vista applications • If GUI app uses lots of VM, will show up on list of applications to be closed by user • If service or CMD app, will be shut down by Windows when exhaustion has been hit • What you need to do • Be mindful of memory utilization: e.g. trim working set when unused

  23. I/O Cancellation Support • Apps shouldn’t hang • Apps should provide a cancel button • Ever see Outlook hang while downloading mail? • New Win32 Cancellation APIs for Windows Vista • Cancel specific async I/O requests for file handle • Cancel synchronous requests from another thread • No managed support until “Orcas” • Look for the CancellationRegion class • Caveats • Operation is only marked for cancellation • Some “meta APIs” aren’t cancelable (e.g. CopyFile; use CopyFileEx instead) • Slightly tricky to use • Win32 signatures: CancelIoEx(HANDLE hFile, LPOVERLAPPED lpOverlapped); CancelSynchronousIo(HANDLE hThread)
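
For managed code, the "cancel button" idea can be sketched with `CancellationToken`, which arrived in later .NET releases. Note the substitution: this is neither the `CancellationRegion` class the slide mentions nor a wrapper over `CancelIoEx`, and the `CancellableRead` class and `DrainAsync` method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableRead
{
    // Reads a stream to the end unless the token is cancelled first.
    // Returns the number of bytes read, or -1 if the read was cancelled.
    public static async Task<long> DrainAsync(Stream source, CancellationToken token)
    {
        var buffer = new byte[4096];
        long total = 0;
        try
        {
            int read;
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length, token)) > 0)
            {
                total += read;
            }
            return total;
        }
        catch (OperationCanceledException)
        {
            // The caller asked to stop; report that instead of hanging.
            return -1;
        }
    }
}
```

A UI would hold a `CancellationTokenSource` and call its `Cancel()` method from the cancel button's handler.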

  24. Customer-Focused Reliability Attributes • Resilient • Recoverable • Controlled • Undisruptable: Required changes and upgrades do not impact the service; examples: update disruptions • ProductionReady • Predictable

  25. Minimize Reboots When Installing Software • Use the Restart Manager APIs • Shuts down only required apps and services • Automatically detects and shuts down services in shared processes with a file in use • Prevents the need for a machine restart after apps or services have been shut down • Groups application, service and machine restarts • Design app “freeze-dry” functionality to return user to the state they were in before the restart • Use P/Invoke for managed applications • RegisterApplicationRestart( GetCommandLine(), 0 ); // Native • Users experience minimum disruption for application and patch installs for your application
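
The "use P/Invoke for managed applications" bullet could look roughly like the sketch below. The `RestartRegistration` class and `TryRegister` method are hypothetical; the declaration follows the Win32 signature `HRESULT RegisterApplicationRestart(PCWSTR pwzCommandline, DWORD dwFlags)`, and the platform check is an assumption added so the sketch does nothing on systems without that API.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RestartRegistration
{
    // Win32: HRESULT RegisterApplicationRestart(PCWSTR pwzCommandline, DWORD dwFlags);
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    private static extern int RegisterApplicationRestart(string commandLine, uint flags);

    // Registers the arguments the application should be relaunched with.
    // Returns true when the operating system accepted the registration.
    public static bool TryRegister(string restartArguments)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false; // the API exists only on Windows Vista and later
        }

        int hresult = RegisterApplicationRestart(restartArguments, 0);
        return hresult >= 0; // success HRESULTs are non-negative
    }
}
```

The arguments passed here would typically tell the relaunched process where its "freeze-dried" state was saved.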

  26. Customer-Focused Reliability Attributes • Resilient • Recoverable • Controlled • Undisruptable • ProductionReady: At release the system contains a minimum number of bugs, requiring a limited number of predictable patches/fixes; examples: patch size, frequency • Predictable

  27. Windows Error Reporting During Development • Errors are reported to Microsoft in real-time by customer choice (crashes, hangs) • Automatic analysis and signature matching to known issues • Problems available to registered developers through the Developer Portal • Known fixes provided to customers in real-time • API’s for failing quickly and reporting an error • Or, simply let an exception go unhandled, in both managed and native Environment.FailFast(String reason); // Managed “panic button”
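
As a rough illustration of the managed "panic button", the sketch below fails fast when an invariant is broken instead of continuing with corrupt state. The `Invariant` class and `Require` method are hypothetical names; only `Environment.FailFast` comes from the slide.

```csharp
using System;

public static class Invariant
{
    // Terminates the process immediately when the condition is false, so the
    // failure is reported at the point of detection rather than surfacing
    // later as corrupted data.
    public static void Require(bool condition, string description)
    {
        if (!condition)
        {
            Environment.FailFast("Invariant violated: " + description);
        }
    }
}
```

Because `FailFast` skips `finally` blocks and finalizers, it suits only cases where running more of the application's code is judged riskier than stopping.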

  28. Reliability Best Practices • If crash occurs, report the issue via Windows Error Reporting • Don’t use the IsBadWritePtr family of APIs • Turns debuggable crash into silent process exit • Replace the API with a simple `if (p == NULL)` check • Write multi-threaded code correctly • Use synchronization primitives for stopping and pausing threads • Don’t call TerminateThread • Avoid calling Thread.Abort • Don’t call Thread.Suspend
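
The advice to stop and pause threads with synchronization primitives, rather than `Thread.Abort` or `Thread.Suspend`, can be sketched as follows. The `PausableWorker` class is a hypothetical example: the worker thread checks two events at a safe point in its loop, so it is never interrupted in the middle of an operation.

```csharp
using System;
using System.Threading;

public sealed class PausableWorker
{
    private readonly ManualResetEvent _stop = new ManualResetEvent(false);
    private readonly ManualResetEvent _resume = new ManualResetEvent(true);
    private readonly Thread _thread;
    private int _iterations;

    public PausableWorker()
    {
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    public int Iterations => Volatile.Read(ref _iterations);

    public void Pause() => _resume.Reset();   // worker blocks at its next safe point
    public void Resume() => _resume.Set();

    public void Stop()
    {
        _stop.Set();      // request a cooperative exit
        _thread.Join();   // wait for the worker to finish its current step
    }

    private void Run()
    {
        var handles = new WaitHandle[] { _stop, _resume };
        // WaitAny returns the lowest signaled index, so a stop request
        // (index 0) wins even while the worker is paused or resumed.
        while (WaitHandle.WaitAny(handles) != 0)
        {
            Interlocked.Increment(ref _iterations); // stand-in for one unit of work
            Thread.Sleep(1);
        }
    }
}
```

The worker decides where it can be paused or stopped, which is what keeps locks and shared state consistent.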

  29. Recommended Tools For Making Code Production Ready • Unmanaged • Safe C++ Libraries (CRT, MFC, ATL) • C++ Compiler static analysis (/analyze) • C++ Compiler’s buffer overrun cookie (/GS) • Application Verifier • Managed • FxCop • Managed Debugging Assistants

  30. Summary • The Microsoft platform supports developing reliable applications, both native and managed • What is Reliability? • Customer taxonomy • Windows Vista and CLR reliability goals • Windows Vista and CLR reliability features • Detailed resiliency discussion • Features and Tools

  31. Call To Action • Design for resiliency as discussed • Use SafeHandle to free OS handles • Use Windows Vista’s transactions for recoverability • Use Windows Vista’s new Restart Manager API’s to minimize disruptions • Support cancellation to give users control • Use all the tools at your disposal to make your code production ready • E.g. FxCop, /Analyze, Windows Error Reporting

  32. More Information: Managed Resiliency Features • At PDC • Add-Ins and Versioning - FUN 309: “Designing managed addins for reliability, security, and versioning” w/ Jim Miller • Versioning – FUN 314: “Architecting your apps for the future” • After PDC • High-level overview: http://msdn.microsoft.com/msdnmag/issues/05/10/Reliability/ • SafeHandle: http://blogs.msdn.com/bclteam/archive/2005/03/16/396900.aspx • Constrained Execution Regions: http://blogs.msdn.com/bclteam/archive/2005/06/14/429181.aspx • Chris Brumme’s Hosting & Reliability blog posts: http://blogs.msdn.com/cbrumme/archive/2004/02/21/77595.aspx • http://blogs.msdn.com/cbrumme/archive/2003/06/23/51482.aspx

  33. More Information: Windows Vista reliability features • At PDC • Journaling – FUN034: Improving reliability with the new System.Transactions classes, file system, and registry transactions • Restart Manager and Versioning – FUN222: Windows Vista and "Longhorn" Server: What's New in Windows Installer (MSI) and ClickOnce • Feedback – FUN313: Windows Vista: Improving Quality through Windows Feedback Data • I/O cancellation – FUN302: Programming with Concurrency (Part 1): Concepts, Patterns, and Best Practices • After PDC • http://msdn.microsoft.com/windowsvista/reliability/ • http://www.microsoft.com/technet/windowsvista/webcasts.mspx • Resource Exhaustion: http://www.microsoft.com/technet/windowsvista/evaluate/admin/mntreli.mspx • I/O Cancellation • http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/cancelsynchronousio_func.asp

  34. © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
