Acceptability-Oriented Computing
Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Traditional View of Correctness (diagram labels: Correct Execution, Execution Space)
Acceptability View (diagram labels: Acceptability Envelope, Correct Execution, Acceptable Executions, Unacceptable Execution, Execution Space)
Acceptable Execution (diagram labels: Acceptability Envelope, Correct Execution, Execution Space)
Fail Stop Execution (diagram labels: Acceptability Envelope, Correct Execution, STOP, Execution Space)
Safe Exit Execution (diagram labels: Acceptability Envelope, Correct Execution, Safe Exit Point, STOP, Execution Space)
Resilient Computing Execution (diagram labels: Acceptability Envelope, Correct Execution, Repaired Execution, Execution Space)
Questions
• How to identify acceptability envelope?
  • Set of acceptability properties
  • Basic properties that any execution must satisfy to be acceptable
• How to ensure program stays within envelope?
  • Acceptability monitoring
  • Acceptability enforcement
Resilient Computing Execution (diagram labels: Acceptability Envelope, Correct Execution, Repaired Execution, Acceptability Monitoring, Acceptability Enforcement, Execution Space)
Proposed Structure (diagram labels: Inputs, Input Filter, Core System, Output Filter, Outputs, Data Structure Repair, Repair Probe, Exception Recovery, Control Transfer, Output Rectification, Response Enforcement)
Monitoring and Enforcement Mechanisms
• Black Box
  • Do not affect core
  • Input/output filters and correlators
• White Box
  • New code and data into core
• Gray Box
  • No change to core program
  • Can change data structures and control flow
  • Mechanisms
    • Procedure call and system call interception
    • Ptrace interface, mmap to access address space
Reason for Acceptability-Oriented Computing: Difficulty of Delivering Perfect Software
• Difficulty in all areas of development effort
  • Understanding domain, obtaining requirements
  • Producing specification, developing software
• Change Aspiration of Development Process
  • Accept inevitability of imperfection
  • Goal is to deliver acceptable program
• Augment Development Activities
  • Identify crucial acceptability properties
  • Ensure that program does not violate them
Aspiring to Perfection Recognized as Harmful
• Defocuses development effort
  • All parts seen as equally important
  • No formal way to direct development effort to most important parts of code
• Produces brittle structure
  • Each piece of functionality implemented
    • Once (no redundancy)
    • Completely (hard and easy parts together)
  • No recovery or protection mechanisms
  • Program completely vulnerable to any error
Advantages of Acceptability-Oriented Computing
• Focused, prioritized development effort
  • Appropriately direct engineering activities
  • Ensure satisfaction of acceptability properties
• Resilient software structure
  • Redundant acceptability property enforcement
  • Mechanisms enforce partial properties
  • Simpler (easier to obtain acceptability) than complete modules in core software
  • Resulting software structure tolerates errors
Ideal Result
• Can build systems with less development effort
  • Can reduce testing effort for core
  • Can leave (infrequent) errors in system
• Can build systems with more functionality
  • Can invest saved development effort on increasing functionality of system
  • Can make larger system stable
  • Can use more aggressive, riskier algorithms
Map Example
Inputs (to Map Core): put x 10, put y 12, put z 11, get y, rem z
Outputs: 10 12 11 12 10
Acceptability Property: Output must be within min and max inputs
Unacceptable Output
Inputs (to Map Core)    Outputs
put x 10                10
put y 11                11
rem y                   11
put x 12                12
rem x                   12
get x                   2  (Unacceptable Output)
Input/Output Correlation
Components: Input Monitor, Map Core, Output Filter, Input/Output Correlator (Min: 10, Max: 12)
Inputs      Core outputs
put x 10    10
put y 11    11
rem y       11
put x 12    12
rem x       12
get x       2
First Option: Shut Down System (diagram: Map Core returns 2 for get x, outside Min: 10 and Max: 12; no output passes the Output Filter)
Second Option: Return Error Code (diagram: Map Core returns 2 for get x; the Output Filter returns the Error Code 0 instead)
Third Option: Return Min or Max Value (diagram: Map Core returns 2 for get x; the Output Filter returns the Min Value 10 instead)
When to Use Each Option
• Shut down system (Safe Exit) when
  • It is safe and acceptable
  • External intervention is available
• Return error code (Delegation) when
  • Client is able to deal with error code
• Return min or max (Resilient Computing) when
  • Not safe to shut down system
  • No external intervention available
  • Client not prepared to deal with error code
All options use black box mechanism
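The input/output correlation and the three response options can be sketched in C as follows. This is only an illustration under stated assumptions: the names `Correlator`, `Policy`, `monitor_put`, and `filter_output` are hypothetical and do not come from the slides, the error code 0 is taken from the Second Option slide, and "shut down" is modeled as setting a flag rather than halting the process so that the sketch stays easy to exercise.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; the slides give no implementation. */
typedef enum { POLICY_SHUT_DOWN, POLICY_ERROR_CODE, POLICY_CLAMP } Policy;

typedef struct {
    int min;        /* smallest value seen on any put */
    int max;        /* largest value seen on any put */
    bool seen_put;  /* false until the first put is observed */
    bool halted;    /* stands in for actually shutting the system down */
} Correlator;

#define ERROR_CODE 0  /* the slides show 0 returned as the error code */

void correlator_init(Correlator *c) {
    assert(c != NULL);
    c->min = INT_MAX;
    c->max = INT_MIN;
    c->seen_put = false;
    c->halted = false;
}

/* Input monitor: record the value carried by each put. */
void monitor_put(Correlator *c, int value) {
    assert(c != NULL);
    if (value < c->min) c->min = value;
    if (value > c->max) c->max = value;
    c->seen_put = true;
}

/* Output filter: pass outputs within [min, max], otherwise apply the policy. */
int filter_output(Correlator *c, int output, Policy policy) {
    assert(c != NULL);
    if (c->seen_put && output >= c->min && output <= c->max)
        return output;
    switch (policy) {
    case POLICY_SHUT_DOWN:
        c->halted = true;      /* a real filter would stop the system here */
        return ERROR_CODE;
    case POLICY_ERROR_CODE:
        return ERROR_CODE;
    case POLICY_CLAMP:
        if (!c->seen_put) return ERROR_CODE;  /* nothing to clamp to yet */
        return output < c->min ? c->min : c->max;
    }
    return ERROR_CODE;
}
```

With puts of 10, 11, and 12 recorded, a core output of 2 comes back as 10 under `POLICY_CLAMP` and as 0 under `POLICY_ERROR_CODE`, matching the values shown on the option slides. The filter never looks inside the core, which is what makes it a black box mechanism.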
Implementation Approach (diagram: Hash Table and Free List of entries)
Acceptability Property: Each entry has exactly one incoming reference
• From table, table entry, or free list
• Implies no cycles in table or free list
• Implies disjointness of table and free list
Checking for Acceptability Violations
• Auxiliary reference count for each entry
• Traverse data structures to compute counts
• Check that no count greater than one
• Complications
  • Invalid pointers (addressing violations)
  • Out of bounds array indices (more addressing violations)
  • Cycles (infinite traversal loops)
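A minimal sketch of the reference-count check, assuming entries live in a fixed pool and refer to each other by integer index, as the `int table[M]` and `int freelist` declarations in the put code suggest. The `Map` layout, the pool size `N`, the single `next` link shared by bin chains and the free list, and the function names are assumptions made for illustration, not the presenter's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define M 4          /* number of hash bins (hypothetical size) */
#define N 8          /* number of entries in the pool (hypothetical size) */
#define NOENTRY (-1)

typedef struct { int next; } Entry;  /* only the link matters for this check */

typedef struct {
    Entry pool[N];
    int table[M];   /* head of each bin chain, or NOENTRY */
    int freelist;   /* head of the free list, or NOENTRY */
} Map;

/* Walk one chain and bump the auxiliary counts.
 * Returns false as soon as a violation is found. */
static bool count_chain(const Map *m, int head, int count[N]) {
    for (int e = head; e != NOENTRY; e = m->pool[e].next) {
        if (e < 0 || e >= N) return false;  /* invalid reference or index */
        if (++count[e] > 1) return false;   /* second incoming reference */
    }
    return true;
}

/* True if no entry is referenced more than once from the table,
 * from another entry, or from the free list. */
bool map_is_acceptable(const Map *m) {
    assert(m != NULL);
    int count[N];
    memset(count, 0, sizeof count);
    for (int b = 0; b < M; b++)
        if (!count_chain(m, m->table[b], count)) return false;
    return count_chain(m, m->freelist, count);
}
```

Stopping at the first count above one also handles the cycle complication: a walk around a cycle must revisit an entry, so it ends after at most `N` steps instead of looping forever. The range test runs before any index is used, so an invalid reference is reported rather than dereferenced.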
Mechanisms for Accessing Data Structures
• White Box
  • Link monitor and checking code into core
  • Possibility of core corrupting checker (and vice-versa!)
• Gray Box
  • Checker uses ptrace interface (or mmap)
  • More cumbersome to access data structures
  • But checker isolated from core
Inconsistency Responses
• Fail stop – halt program, await intervention
  • Feasible when halting acceptable
  • And intervention practical
  • May actually decrease reliability
• Delegation – return error code to client
  • Feasible when client can deal with error
• Resilient computing – fix inconsistency, continue
  • Enables continued (acceptable) execution
  • Hides effect of inconsistency from clients
Code for Put Procedure in Map Example
/* Hash table and free list */
int table[M];
int freelist;
put(n, v)
  /* Allocate and initialize new hash table entry */
  e = alloc();
  value(e) = v;
  strcpy(name(e), n);
  /* Free old entry with same name */
  p = find(n);
  if (p != NOENTRY) free(p);
  /* Insert new entry into hash table */
  b = bin(n);
  next(e) = table[b];
  table[b] = e;
  return(v);
free(e)
  /* Insert entry into free list */
  value(e) = freelist;
  freelist = e;
Code for Put Procedure in Map Example (problems annotated)
/* Hash table and free list */
int table[M];
int freelist;
put(n, v)
  /* Allocate and initialize new hash table entry */
  e = alloc();                  /* Does not check for empty free list */
  value(e) = v;
  strcpy(name(e), n);
  /* Free old entry with same name */
  p = find(n);
  if (p != NOENTRY) free(p);    /* Leaves entry in table */
  /* Insert new entry into hash table */
  b = bin(n);
  next(e) = table[b];           /* Creates cycle if entry already in table */
  table[b] = e;
  return(v);
free(e)
  /* Insert entry into free list */
  value(e) = freelist;
  freelist = e;
Problem: Program crashes if free list is empty when put is called
New Acceptability Property: Free list is not empty
Acceptability Enforcement: Repair algorithm ensures free list is not empty
Data Structure Repair Goal (diagram labels, Map Core before repair: Invalid References, Cycle, Empty Free List; Map Core after repair: All References Valid, No Cycles, Entries in Free List)
Enforcing Consistency
• Hand-coded consistency algorithm
• Coding is difficult because the repair code must assume data structures can be arbitrarily corrupted
  • Invalid references, out of bounds indices
  • Cycles (can cause infinite loops in repair code)
• Two data structure traversals
  • First eliminates invalid references and indices
  • Second removes all but first reference to each entry (requires auxiliary marking data structure)
• Reconstruct free list
  • Any unreferenced entry put into list
  • If free list still empty, steal entry from table
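The repair steps above can be sketched as follows, again assuming an index-based entry pool with one `next` link per entry. The `Map` layout and all function names are hypothetical. Two choices are simplifications rather than something the slides specify: the first pass scans the pool directly instead of following links, which sidesteps cycles, and the free list is rebuilt from scratch rather than patched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define M 4          /* number of hash bins (hypothetical size) */
#define N 8          /* number of entries in the pool (hypothetical size) */
#define NOENTRY (-1)

typedef struct { int next; } Entry;
typedef struct { Entry pool[N]; int table[M]; int freelist; } Map;

static bool valid_ref(int r) { return r == NOENTRY || (r >= 0 && r < N); }

/* Follow one chain, cutting it at the first link to an already marked entry.
 * Every step marks a new entry, so the walk ends after at most N steps. */
static void keep_first_reference(Map *m, int *link, bool marked[N]) {
    while (*link != NOENTRY) {
        if (marked[*link]) { *link = NOENTRY; return; }
        marked[*link] = true;
        link = &m->pool[*link].next;
    }
}

void map_repair(Map *m) {
    assert(m != NULL);

    /* Pass 1: eliminate invalid references and out of bounds indices. */
    for (int b = 0; b < M; b++)
        if (!valid_ref(m->table[b])) m->table[b] = NOENTRY;
    for (int i = 0; i < N; i++)
        if (!valid_ref(m->pool[i].next)) m->pool[i].next = NOENTRY;

    /* Pass 2: keep only the first reference to each entry in the table. */
    bool marked[N] = { false };
    for (int b = 0; b < M; b++)
        keep_first_reference(m, &m->table[b], marked);

    /* Reconstruct the free list from every entry the table does not reach. */
    m->freelist = NOENTRY;
    for (int i = N - 1; i >= 0; i--) {
        if (!marked[i]) {
            m->pool[i].next = m->freelist;
            m->freelist = i;
        }
    }

    /* If the free list is still empty, steal the head of a non-empty bin. */
    for (int b = 0; b < M && m->freelist == NOENTRY; b++) {
        int e = m->table[b];
        if (e != NOENTRY) {
            m->table[b] = m->pool[e].next;
            m->pool[e].next = NOENTRY;
            m->freelist = e;
        }
    }
}
```

Stealing an entry drops a stored mapping, so the repaired execution may differ from the correct one; the slides accept that trade as long as the result stays within the acceptability envelope.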
Issues
• Replace failure with potentially suboptimal (but still acceptable) execution
• Checking overhead
  • Depends on properties and application
  • Subject to optimization
• Obscured errors
  • Record violations and updates in logs
  • Use logs to reconstruct actions
• Potential errors in checking and repair code
  • Acceptability enforcement code deals with simpler properties than core
  • Should be simpler and easier to get correct
Generalizations
• Process structure consistency
  • System structured as collection of processes
  • Monitor and regenerate processes to preserve consistency properties
• System configuration consistency
  • Difficult to get configuration settings correct
  • Monitor and update to satisfy properties
  • Properties may depend on running applications, attached devices, etc.
• Both involve structural properties
Next Problem
int table[M];
int freelist;
put(n, v)
  e = alloc();
  value(e) = v;
  strcpy(name(e), n);           /* Buffer Overrun */
  p = find(n);
  if (p != NOENTRY) free(p);
  b = bin(n);
  next(e) = table[b];
  table[b] = e;
  return(v);
free(e)
  value(e) = freelist;
  freelist = e;
Long Inputs Crash Core
Inputs (to Map Core)    Outputs
put x 10                10
put y 11                11
rem y                   11
put xxxxxxxxxxx 12
rem x
get xxxxxxxxxxx
Long Inputs Crash Core: Truncating Input Filter
Inputs                Filtered inputs (to Map Core)    Outputs
put x 10              put x 10                         10
put y 11              put y 11                         11
rem y                 rem y                            11
put xxxxxxxxxxx 12    put xxx 12                       12
rem x                 rem x                            10
get xxxxxxxxxxx       get xxx                          12
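A truncating input filter of the kind shown in the trace above might look like this in C. The function name `truncate_name` and the capacity `NAME_CAP` are assumptions; the value 4 is chosen only because the slide shows the long name cut to three characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the core's name buffer, including the NUL byte. */
#define NAME_CAP 4

/* Copy at most NAME_CAP - 1 characters of name into out and terminate it,
 * so the core's strcpy can never write past the end of its buffer. */
void truncate_name(const char *name, char out[NAME_CAP]) {
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len > NAME_CAP - 1) len = NAME_CAP - 1;
    memcpy(out, name, len);
    out[len] = '\0';
}
```

The filter sits in front of the core and needs no change to it, so it is another black box mechanism. One consequence to keep in mind is that distinct long names sharing a prefix become the same key after truncation.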