CLR Reliability under Memory Exhaustion

CLR Reliability under Memory Exhaustion, by Solomon Boulos. Temporary memory exhaustion causes failures. An out-of-memory (OOM) condition is temporary and should not cause failure: the program can wait for memory to become available, or the system can take action to free memory. All managed code depends on the CLR, and testing is difficult.

Presentation Transcript


  1. CLR Reliability under Memory Exhaustion
     Solomon Boulos
     Windows Reliability Team

  2. Temporary Memory Exhaustion causes failures
     • Out of Memory (OOM) is temporary
         • Shouldn’t cause failure
         • Just wait for memory to become available
         • System can take action to free up memory
     • All managed code depends on the CLR
     • Testing is difficult
         • Exceptions are objects
         • Boxing (converting a value type to object)
         • JIT compilation

  3. Overview
     • Previous Work
         • Reliability Working Group
         • Improvements for Whidbey
     • OOM behavior
         • Everett (CLR v1.1)
         • Whidbey (CLR v2.0)
         • WinFX
     • Solutions
         • Transactions
         • Recovery

  4. Reliability Working Group
     • Discussion of CLR reliability issues
     • Interaction with Yukon and Avalon teams
     • FailFast Behavior
     • Controversial Decisions
     • Fault Injection

  5. Improvements for Whidbey
     • CLR hardened to Out of Memory (OOM)
     • Constrained Execution Regions (CERs)
         • Eagerly Prepared (No JIT Compiling)
         • Blocks ThreadAbort
     • Reliability Contracts
         • Describes reliability attributes of code
         • Allows for function calls within CER
     • Unhandled Exception Policy
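
The CER and reliability-contract bullets on slide 5 can be illustrated with the API as it shipped in .NET Framework 2.0. This is a minimal sketch, not code from the presentation: the type `CerSketch` and its members are hypothetical, the failure is simulated by throwing `OutOfMemoryException` explicitly, and the sketch assumes .NET Framework (on later .NET versions these calls are marked obsolete and have no effect).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

// Hypothetical example type; none of these names come from the presentation.
public static class CerSketch
{
    public static int Count;
    public static int Total;

    // Reliability contract: the author promises this method cannot fail and
    // leaves state consistent. The CLR does not verify the promise.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void Restore(int savedCount, int savedTotal)
    {
        Count = savedCount;
        Total = savedTotal;
    }

    public static void Update(Func<int, int> compute)
    {
        int savedCount = Count;
        int savedTotal = Total;
        bool committed = false;

        // Marks the finally block below as a CER, so it is prepared
        // (including JIT compilation) before the try block is entered.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
            Count++;                 // first half of the update
            Total = compute(Total);  // may throw, e.g. OutOfMemoryException
            committed = true;
        }
        finally
        {
            // Runs inside the CER: only calls code covered by a contract.
            if (!committed) Restore(savedCount, savedTotal);
        }
    }

    public static void Main()
    {
        Update(x => x + 5);
        try
        {
            // Simulated failure; a real OOM would come from the runtime.
            Update(x => { throw new OutOfMemoryException("simulated"); });
        }
        catch (OutOfMemoryException) { }
        Console.WriteLine(Count + " " + Total);
    }
}
```

Only the `finally` block is the constrained region here; the `try` body runs as ordinary code. The second call fails halfway through, and the prepared cleanup path restores both fields to their values from before that call.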

  6. My Approach
     • Exhaust Memory (Not fault injection)
     • Find failure points
     • Consistently reproduce results
     • Examine underlying causes
     • Develop solutions

  7. Everett OOM Behavior
     • Different classes of failures
         • Catchable Out of Memory (OOM) Exception
         • Type Initialization Exception
         • Invalid Program exception from JIT compiler
         • Fatal OOM Error
         • Fatal Execution Engine error

  8. Supporting Data
         void ManagedFunction() {
             Regex* myReg = new Regex("*");
         }

  9. Fault Injection Example
         static void Main(string[] args) {
             try {
                 // operations in here
             } catch (OutOfMemoryException) {
                 Console.WriteLine("Nothing should get past me.");
             }
         }

  10. Whidbey OOM Behavior
     • See OOM Exception instead of
         • TypeInit
         • InvalidProgram
     • Exception to Native host is COMPlusException
         • Not very helpful
     • Fatal OOM only during initialization
         • Initialization can be large though (e.g. 10MB)
     • CERs provide defense, but dangerous
         • CER { for (;;) } cannot be stopped
     • Reliability Contracts = Honor System

  11. WinFX Case Studies
     • Swallows exceptions
     • Shell
         • Crashes and restarts
     • WinFS
         • Silent Process Failure
     • Indigo
         • False Completion
     [Diagram labels: WinFX, Whidbey, Base OS]

  12. Shell Failure
     • Exhaust System Memory
     • CLR throws OOM Exception
     • Shell doesn’t catch
     • Escalates to unhandled Win32 exception
     • Shell crashes and restarts
     • Major disruption to user

  13. WinFS Test
     • Simple Contact Store Functions
         • AddContact
         • RenameContact
         • RemoveContact
         • ListContacts
         • ReachMemory

  14. WinFS Test Normal Execution
     • ListContacts(): “No Contacts Found”
     • AddContact(“Shane”): Shane is added
     • ListContacts(): “Shane”
     • RenameContact(“Shane”, “Bob”): Shane is now Bob
     • ListContacts(): “Bob”
     • RemoveContact(“Bob”): Bob is now deleted
     • ListContacts(): “No Contacts Found”

  15. WinFS Test Stressed Execution
     • ListContacts(): “No Contacts Found”
     • ReachMemory(8MB): 8MB Available
     • AddContact(“Shane”): Shane should be added
     • ListContacts(): “No Contacts Found”
     • Process Exits

  16. Indigo Test Specifications
     • Client::SendMessage():
         • Sends message to server and prints confirmation of sending.
     • Client::ReceiveMessage():
         • Prints received message.
     • Server::SendMessage():
         • Sends message to client and prints confirmation of sending.
     • Server::ReceiveMessage():
         • Prints message and responds with SendMessage()

  17. Indigo Test Behavior
     • Normal Execution
         • Client::SendMessage()
         • Server::ReceiveMessage()
         • Server::SendMessage()
         • Client::ReceiveMessage()
     • Execution with Memory Pressure
         • Client::SendMessage()
         • Server::ReceiveMessage()
         • Server::ExhaustMemory()
         • Server::SendMessage()
         • Client never receives message

  18. Solutions
     • Transactions
         • In Memory
         • Durable (backed by disk)
     • Recovery
         • Creates Recovery Log
         • Allows state restore

  19. Transaction Participant
         public TransactionParticipant(String _originalValue) {
             originalValue = _originalValue;
             result = originalValue;
         }

         public void Prepare(IPreparingEnlistment pe) {
             // do work for transaction
             result = "New Value";
             // all is well, vote prepared
             pe.Prepared();
         }

  20. Transaction Participant Continued
         public void Commit(IEnlistment e) {
             // no work to do, vote done
             e.EnlistmentDone();
         }

         public void Rollback(IEnlistment e) {
             // restore originalValue
             result = originalValue;
             if (null != e)
                 e.EnlistmentDone();
         }

  21. Simple Transaction Example
         TransactionParticipant tp = new TransactionParticipant(txtInput.Text);
         try {
             using (TransactionScope s = new TransactionScope()) {
                 Transaction.Current.VolatileEnlist(tp, false);
                 s.Consistent = true;
             }
         } catch (TransactionAbortedException) {}
         txtInput.Text = tp.Result;
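
The identifiers on slides 19 to 21 (`IPreparingEnlistment`, `VolatileEnlist`, `s.Consistent`) appear to come from a pre-release build of System.Transactions. As a hedged sketch for readers trying the idea today, the same in-memory participant can be written against the released API, where the closest equivalents are `IEnlistmentNotification`, `Transaction.EnlistVolatile`, and `TransactionScope.Complete()`. The names `ValueParticipant`, `TransactionSketch`, and `Run` are hypothetical, and a string parameter stands in for the slide's `txtInput.Text`.

```csharp
using System;
using System.Transactions;

// Hypothetical participant modeled on the slides, using released API names.
public class ValueParticipant : IEnlistmentNotification
{
    private readonly string originalValue;
    public string Result { get; private set; }

    public ValueParticipant(string originalValue)
    {
        this.originalValue = originalValue;
        Result = originalValue;
    }

    public void Prepare(PreparingEnlistment pe)
    {
        Result = "New Value";   // do the work for the transaction
        pe.Prepared();          // all is well, vote prepared
    }

    public void Commit(Enlistment e)
    {
        e.Done();               // no further work to do
    }

    public void Rollback(Enlistment e)
    {
        Result = originalValue; // restore the original value
        e.Done();
    }

    public void InDoubt(Enlistment e)
    {
        e.Done();
    }
}

public static class TransactionSketch
{
    // Runs one transaction; when complete is false the scope rolls back.
    public static string Run(string input, bool complete)
    {
        ValueParticipant tp = new ValueParticipant(input);
        try
        {
            using (TransactionScope scope = new TransactionScope())
            {
                Transaction.Current.EnlistVolatile(tp, EnlistmentOptions.None);
                if (complete) scope.Complete();
            }
        }
        catch (TransactionAbortedException) { }
        return tp.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Run("old", true));
        Console.WriteLine(Run("old", false));
    }
}
```

Calling `Run` with `complete` set to `false` leaves the scope without a call to `Complete()`, so disposing it rolls the transaction back and the participant keeps its original value.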

  22. rNotepad Techniques
     • Log user work
         • KeyPressed Records
         • Resize Records
     • Write work to log file every second
     • Write checkpoint every 30 seconds
     • Upon startup, recover
     • Checkpoint speeds up recovery
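
The log-plus-checkpoint technique on slide 22 can be sketched as follows. This is an illustration under stated assumptions, not rNotepad's code: the class `RecoveryLog`, the file names, and the record format are all hypothetical, only key-press records are modeled (no resize records), and records are written immediately instead of on the one-second and 30-second timers the slide describes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical recovery log: one numbered key-press record per line, plus a
// checkpoint file holding the full text and the last record it covers.
public sealed class RecoveryLog
{
    private readonly string logPath;
    private readonly string checkpointPath;
    private long nextSeq = 1;

    public RecoveryLog(string directory)
    {
        logPath = Path.Combine(directory, "work.log");
        checkpointPath = Path.Combine(directory, "work.chk");
    }

    // Appends a record of the form "<seq> K <char code>".
    public void LogKey(char c)
    {
        File.AppendAllText(logPath, nextSeq + " K " + (int)c + Environment.NewLine);
        nextSeq++;
    }

    // First line: last sequence number covered. Remainder: document text.
    public void Checkpoint(string text)
    {
        File.WriteAllText(checkpointPath, (nextSeq - 1) + "\n" + text);
    }

    // Rebuilds the text: load the checkpoint, then replay only newer records.
    public string Recover()
    {
        long covered = 0;
        StringBuilder text = new StringBuilder();
        if (File.Exists(checkpointPath))
        {
            string chk = File.ReadAllText(checkpointPath);
            int newline = chk.IndexOf('\n');
            covered = long.Parse(chk.Substring(0, newline));
            text.Append(chk.Substring(newline + 1));
            nextSeq = Math.Max(nextSeq, covered + 1);
        }
        if (File.Exists(logPath))
        {
            foreach (string line in File.ReadAllLines(logPath))
            {
                string[] parts = line.Split(' ');
                long seq = long.Parse(parts[0]);
                if (seq > covered && parts[1] == "K")
                    text.Append((char)int.Parse(parts[2]));
                nextSeq = Math.Max(nextSeq, seq + 1);
            }
        }
        return text.ToString();
    }

    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        RecoveryLog log = new RecoveryLog(dir);
        log.LogKey('a');
        log.LogKey('b');
        log.Checkpoint("ab");
        log.LogKey('c');
        // Simulate a restart: a fresh instance recovers from the files alone.
        Console.WriteLine(new RecoveryLog(dir).Recover());
        Directory.Delete(dir, true);
    }
}
```

Because the checkpoint records the last sequence number it covers, recovery skips every older record, which is one way a checkpoint can shorten recovery. A production version would also need to write the checkpoint atomically and trim old log records.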

  23. Conclusion
     • Testing is difficult but possible
     • Temporary memory pressure shouldn’t cause failures
     • Transactions and Recovery can provide resilient and recoverable solutions

  24. Questions?
     • More info at http://windows/sites/reliavuls/CLR/default.aspx
