CLR Reliability under Memory Exhaustion
Solomon Boulos
Windows Reliability Team

Temporary Memory Exhaustion Causes Failures
• Out of Memory (OOM) is temporary
  • Shouldn’t cause failure
  • Just wait for memory to become available
  • System can take action to free up memory
• All managed code depends on the CLR
• Testing is difficult
  • Exceptions are objects
  • Boxing (casting value type to object)
  • JIT compilation

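The "testing is difficult" bullets point at allocations that do not look like allocations in source code. A minimal sketch of the boxing case follows; `BoxingSketch` and `BoxesAreDistinct` are hypothetical names introduced only for illustration. It shows that each cast of a value type to `object` produces a separate heap object, so each cast is a point where memory can run out.

```csharp
using System;

static class BoxingSketch
{
    // Boxes the same value twice and reports whether the two boxes
    // are distinct objects on the managed heap.
    public static bool BoxesAreDistinct(int value)
    {
        object first = value;   // boxing: allocates a new heap object
        object second = value;  // boxing again: a second, separate allocation
        return !ReferenceEquals(first, second);
    }
}
```

Because the two references differ, even a line as innocuous as `object o = 42;` is a potential out-of-memory site.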
Overview
• Previous Work
  • Reliability Working Group
  • Improvements for Whidbey
• OOM behavior
  • Everett (CLR v1.1)
  • Whidbey (CLR v2.0)
  • WinFX
• Solutions
  • Transactions
  • Recovery

Reliability Working Group
• Discussion of CLR reliability issues
• Interaction with Yukon and Avalon teams
• FailFast Behavior
• Controversial Decisions
• Fault Injection

Improvements for Whidbey
• CLR hardened to Out of Memory (OOM)
• Constrained Execution Regions (CERs)
  • Eagerly Prepared (No JIT Compiling)
  • Blocks ThreadAbort
• Reliability Contracts
  • Describes reliability attributes of code
  • Allows for function calls within CER
• Unhandled Exception Policy

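The CER and reliability-contract bullets can be sketched with the API that shipped in .NET Framework 2.0: `RuntimeHelpers.PrepareConstrainedRegions()` marks the `finally` block that follows as a CER, and `ReliabilityContractAttribute` declares what a callee promises so it can be called from inside that region. The class and method names (`CerSketch`, `Publish`, `Update`) are hypothetical. Newer .NET runtimes mark these APIs obsolete, so treat this as an illustration for the Whidbey-era runtime only.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.ConstrainedExecution;

static class CerSketch
{
    private static int committed;

    // The contract is a declaration only: the method promises not to corrupt
    // state and to succeed. Nothing verifies that the body keeps the promise.
    [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
    private static void Publish(int value)
    {
        committed = value;
    }

    public static int Update(int value)
    {
        // Must immediately precede the try block; the finally block below
        // becomes the constrained execution region and is prepared eagerly.
        RuntimeHelpers.PrepareConstrainedRegions();
        try
        {
        }
        finally
        {
            // Runs inside the CER: no JIT compilation and no thread abort
            // is supposed to interrupt this call.
            Publish(value);
        }
        return committed;
    }
}
```

The empty `try` with all work in `finally` is the usual shape for this pattern, since only the handler blocks are constrained.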
My Approach
• Exhaust Memory (Not fault injection)
• Find failure points
• Consistently reproduce results
• Examine underlying causes
• Develop solutions

Everett OOM Behavior
• Different classes of failures
• Catchable Out of Memory (OOM) Exception
• Type Initialization Exception
• Invalid Program exception from JIT compiler
• Fatal OOM Error
• Fatal Execution Engine error

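One way a memory failure can surface as a Type Initialization Exception is when the failure happens inside a static constructor. The sketch below simulates that by throwing `OutOfMemoryException` by hand; it does not reproduce the actual Everett failures, and `FragileType` and `TypeInitSketch` are hypothetical names.

```csharp
using System;

static class FragileType
{
    public static readonly int Value;

    // Explicit static constructor so the failure happens on first access.
    static FragileType()
    {
        Value = Initialize();
    }

    private static int Initialize()
    {
        // Stand-in for an allocation that fails during type initialization.
        throw new OutOfMemoryException("simulated allocation failure");
    }
}

static class TypeInitSketch
{
    // Describes the exception a caller actually sees.
    public static string Observe()
    {
        try
        {
            return FragileType.Value.ToString();
        }
        catch (Exception ex)
        {
            string inner = ex.InnerException == null
                ? "none"
                : ex.InnerException.GetType().Name;
            return ex.GetType().Name + " wrapping " + inner;
        }
    }
}
```

A handler written as `catch (OutOfMemoryException)` would miss this failure, because the caller sees `TypeInitializationException` with the memory failure only as its inner exception.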
Supporting Data

    void ManagedFunction()
    {
        Regex* myReg = new Regex("*");
    }

Fault Injection Example

    static void Main(string[] args)
    {
        try
        {
            // operations in here
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Nothing should get past me.");
        }
    }

Whidbey OOM Behavior
• See OOM Exception instead of
  • TypeInit
  • InvalidProgram
• Exception to native host is COMPlusException
  • Not very helpful
• Fatal OOM only during initialization
  • Initialization can be large though (e.g. 10MB)
• CERs provide defense, but are dangerous
  • CER { for (;;) } cannot be stopped
  • Reliability Contracts = Honor System

WinFX Case Studies
• Swallows exceptions
• Shell
• Crashes and restarts
• WinFS
• Silent Process Failure
• Indigo
• False Completion
[Diagram labels: WinFX, Whidbey, Base OS]

Shell Failure
• Exhaust System Memory
• CLR throws OOM Exception
• Shell doesn’t catch it
• Escalates to unhandled Win32 exception
• Shell crashes and restarts
• Major disruption to user

WinFS Test
• Simple Contact Store Functions
  • AddContact
  • RenameContact
  • RemoveContact
  • ListContacts
  • ReachMemory

WinFS Test Normal Execution
• ListContacts(): “No Contacts Found”
• AddContact(“Shane”): Shane is added
• ListContacts(): “Shane”
• RenameContact(“Shane”, “Bob”): Shane is now Bob
• ListContacts(): “Bob”
• RemoveContact(“Bob”): Bob is now deleted
• ListContacts(): “No Contacts Found”

WinFS Test Stressed Execution
• ListContacts(): “No Contacts Found”
• ReachMemory(8MB): 8MB Available
• AddContact(“Shane”): Shane should be added
• ListContacts(): “No Contacts Found”
• Process Exits

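The slides do not show how ReachMemory is implemented. As a rough sketch of the underlying idea (create real memory pressure by allocating, rather than injecting faults), the hypothetical helper below holds on to blocks until it hits a caller-supplied cap or the runtime reports exhaustion. Unlike ReachMemory(8MB), it does not target a specific amount of remaining memory; `MemoryPressure`, `Consume`, and both parameters are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

static class MemoryPressure
{
    // Allocates up to maxBlocks blocks of blockBytes each and returns them.
    // The caller keeps the list alive for as long as pressure is wanted.
    public static List<byte[]> Consume(int blockBytes, int maxBlocks)
    {
        var held = new List<byte[]>();
        try
        {
            for (int i = 0; i < maxBlocks; i++)
            {
                held.Add(new byte[blockBytes]);
            }
        }
        catch (OutOfMemoryException)
        {
            // Memory is exhausted: stop allocating and keep what we hold.
        }
        return held;
    }
}
```

A test along the lines of the stressed run would call `Consume` first, then invoke the operation under test while the returned list is still referenced.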
Indigo Test Specifications
• Client::SendMessage():
  • Sends message to server and prints confirmation of sending.
• Client::ReceiveMessage():
  • Prints received message.
• Server::SendMessage():
  • Sends message to client and prints confirmation of sending.
• Server::ReceiveMessage():
  • Prints message and responds with SendMessage()

Indigo Test Behavior
• Normal Execution
  • Client::SendMessage()
  • Server::ReceiveMessage()
  • Server::SendMessage()
  • Client::ReceiveMessage()
• Execution with Memory Pressure
  • Client::SendMessage()
  • Server::ReceiveMessage()
  • Server::ExhaustMemory()
  • Server::SendMessage()
  • Client never receives message

Solutions
• Transactions
  • In Memory
  • Durable (backed by disk)
• Recovery
  • Creates Recovery Log
  • Allows state restore

Transaction Participant

    public TransactionParticipant(String _originalValue)
    {
        originalValue = _originalValue;
        result = originalValue;
    }

    public void Prepare(IPreparingEnlistment pe)
    {
        // do work for transaction
        result = "New Value";
        // all is well, vote prepared
        pe.Prepared();
    }

Transaction Participant Continued

    public void Commit(IEnlistment e)
    {
        // no work to do, vote done
        e.EnlistmentDone();
    }

    public void Rollback(IEnlistment e)
    {
        // restore originalValue
        result = originalValue;
        if (null != e)
            e.EnlistmentDone();
    }

Simple Transaction Example

    TransactionParticipant tp = new TransactionParticipant(txtInput.Text);
    try
    {
        using (TransactionScope s = new TransactionScope())
        {
            Transaction.Current.VolatileEnlist(tp, false);
            s.Consistent = true;
        }
    }
    catch (TransactionAbortedException) {}
    txtInput.Text = tp.Result;

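The participant and example above use pre-release Whidbey names (IPreparingEnlistment, VolatileEnlist, Consistent). For readers trying this on a released runtime, here is a sketch of the same idea against the System.Transactions API as it shipped: `IEnlistmentNotification`, `EnlistVolatile`, and `TransactionScope.Complete()`. The mapping from the slide's names to the released ones is an assumption, and `ValueParticipant` and `TransactionSketch` are hypothetical names.

```csharp
using System;
using System.Transactions;

sealed class ValueParticipant : IEnlistmentNotification
{
    private readonly string originalValue;
    public string Result { get; private set; }

    public ValueParticipant(string originalValue)
    {
        this.originalValue = originalValue;
        Result = originalValue;
    }

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        // Do the work for the transaction, then vote to commit.
        Result = "New Value";
        preparingEnlistment.Prepared();
    }

    public void Commit(Enlistment enlistment)
    {
        // Nothing left to do.
        enlistment.Done();
    }

    public void Rollback(Enlistment enlistment)
    {
        // Restore the original value.
        Result = originalValue;
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment)
    {
        enlistment.Done();
    }
}

static class TransactionSketch
{
    // Runs one transaction; 'complete' decides between commit and rollback.
    public static string Run(string input, bool complete)
    {
        var participant = new ValueParticipant(input);
        try
        {
            using (var scope = new TransactionScope())
            {
                Transaction.Current.EnlistVolatile(participant, EnlistmentOptions.None);
                if (complete)
                {
                    scope.Complete();
                }
            }
        }
        catch (TransactionAbortedException)
        {
            // The participant has already restored its original value.
        }
        return participant.Result;
    }
}
```

Calling `Run` with `complete` set to false leaves the scope without a commit vote, so the participant ends with its original value.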
rNotepad Techniques
• Log user work
  • KeyPressed Records
  • Resize Records
• Write work to log file every second
• Write checkpoint every 30 seconds
• Upon startup, recover
• Checkpoint speeds up recovery

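The log-plus-checkpoint scheme can be sketched as a replay routine. The record format here is invented for illustration and is not rNotepad's: a line starting with "C:" is a checkpoint carrying the full text, and "K:" followed by one character is a key press. Recovery finds the last checkpoint and replays only the records after it, which is why a checkpoint speeds recovery up.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RecoveryLog
{
    // Rebuilds the document text from an ordered list of log records.
    public static string Recover(IList<string> log)
    {
        // Find the most recent checkpoint; without one, replay from the start.
        int start = 0;
        for (int i = log.Count - 1; i >= 0; i--)
        {
            if (log[i].StartsWith("C:", StringComparison.Ordinal))
            {
                start = i;
                break;
            }
        }

        var text = new StringBuilder();
        for (int i = start; i < log.Count; i++)
        {
            string line = log[i];
            if (line.StartsWith("C:", StringComparison.Ordinal))
            {
                // Checkpoint: the full text as of that moment.
                text.Append(line.Substring(2));
            }
            else if (line.StartsWith("K:", StringComparison.Ordinal) && line.Length == 3)
            {
                // Key press: append the single recorded character.
                text.Append(line[2]);
            }
            // Anything else (for example a torn final record) is skipped.
        }
        return text.ToString();
    }
}
```

With the log "K:a", "K:b", "C:ab", "K:c", recovery starts at the checkpoint and replays one key press instead of three.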
Conclusion
• Testing is difficult but possible
• Temporary memory pressure shouldn’t cause failures
• Transactions and Recovery can provide resilient and recoverable solutions

Questions?
• More info at http://windows/sites/reliavuls/CLR/default.aspx