Memory Faults: Injection & Solutions Jeffrey Freschl, Di Xue
The Problem “Memory meets corruption, it happens every day, it could happen to you…” • -- famous quote modified from the People Store Commercial • Can Linux handle cheap memory? • Can we protect ourselves from memory faults?
Talk Outline • Some Preparation (The How) • Actual Corruption and Results • A Solution (Methods and Implementation)
Software Fault Injection • SWIFI (software-implemented fault injection) is a common way to validate a system's design. • SWIFI gives us the freedom we need.
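A minimal sketch of what a software-injected fault can look like, assuming the fault model is a single bit flip in a field's stored bytes. The slides do not show the injector itself, so `flip_bit` and the 4-byte `field` are hypothetical stand-ins, not the kernel code used in the talk.

```python
def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Flip one bit in place; bit 0 is the least significant bit of buf[0]."""
    byte, offset = divmod(bit_index, 8)
    buf[byte] ^= 1 << offset


# Hypothetical 4-byte field holding the value 120 (little-endian).
field = bytearray((120).to_bytes(4, "little"))

flip_bit(field, 3)                        # inject the fault
corrupted = int.from_bytes(field, "little")

flip_bit(field, 3)                        # flipping the same bit again undoes it
restored = int.from_bytes(field, "little")
```

Because the flip is an XOR, the same call both injects and reverts the fault, which is convenient when an experiment has to be repeated on the same field.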
What Do We Inject? task_struct • Process: an instance of a program in execution. • The kernel must know each process's state to manage it properly. • task_struct contains information about a process.
Data Members • prio: process's priority • run_list: address of the entry in the runqueue, which contains the list of TASK_RUNNING processes • time_slice: amount of time to run • lock_depth: locking for simultaneous access • policy: scheduling policy (FIFO, round robin, etc.) • mmap_base: below the stack's low limit (the base) • vm_start: start address of the VM area
Fault Propagation • EIP locates fault point • Call Trace illustrates path to fault
Part III – A Solution Protecting Linux from Di’s Corruption
Methods (Update & Access) • Error Correcting Codes (ECC) • Majority Vote What are the tradeoffs? Time? Space? Recoverability?
Intro to Hamming Code (Magic) • Hamming Rule: d + p + 1 ≤ 2^p (d is # of input bits, p is # of parity bits) • Generator Matrix G: G = [I : A], where I is the (d × d) identity and A is a (d × p) matrix • A must have unique rows and columns
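The Hamming rule can be turned into a small search for the fewest parity bits that cover d input bits. This is only an illustrative sketch; the function name `min_parity_bits` is made up for the example.

```python
def min_parity_bits(d: int) -> int:
    """Smallest p satisfying the Hamming rule d + p + 1 <= 2**p."""
    p = 0
    while d + p + 1 > 2 ** p:
        p += 1
    return p


# Parity bits needed for a few common field widths (in bits).
needed = {d: min_parity_bits(d) for d in (4, 8, 32, 64)}
```

The count grows roughly with log2(d), so doubling the width of a protected field typically costs only one extra parity bit.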
Hamming cont. (More Magic) • To encode an input string: codeword = input × G (all arithmetic mod 2) • To check whether a codeword is corrupt: H = [A^T : I], syndrome = H × codeword^T • If syndrome == 0, there is no corruption; otherwise, match the syndrome to a column of H to find the corrupted bit
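A worked sketch of the encode and check steps for d = 4 and p = 3. The particular matrix A below is an assumption (one valid choice, not necessarily the one used in the talk), and `encode`, `syndrome`, and `correct` are hypothetical names. All arithmetic is mod 2.

```python
D, P = 4, 3

# A is d x p. Every row is distinct and has at least two 1s, so each
# column of H = [A^T : I] is unique and non-zero.
A = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

# Columns of H: the rows of A, followed by the columns of the p x p identity.
H_COLUMNS = [tuple(row) for row in A] + [
    tuple(int(i == j) for j in range(P)) for i in range(P)
]


def encode(data):
    """codeword = data x G with G = [I : A]: the data bits, then parity bits."""
    parity = [sum(data[i] * A[i][j] for i in range(D)) % 2 for j in range(P)]
    return list(data) + parity


def syndrome(codeword):
    """syndrome = H x codeword^T (mod 2)."""
    return tuple(
        sum(H_COLUMNS[k][j] * codeword[k] for k in range(D + P)) % 2
        for j in range(P)
    )


def correct(codeword):
    """Return a copy with a single flipped bit repaired, if there is one."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if any(s):
        # The syndrome equals the column of H at the corrupted position.
        fixed[H_COLUMNS.index(s)] ^= 1
    return fixed


word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[2] ^= 1                # corrupt one bit
repaired = correct(damaged)
```

A non-zero syndrome is simply the column of H at the corrupted position, which is why the columns of H have to be distinct.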
Hamming (Back to Reality) • Redundancy: can only recover from a 1-bit corruption • Space: almost constant (optimal # of parity bits) • Time: lots of bitwise XORs and ANDs
Majority Vote • Time to update: very fast! • Space overhead! • Simple implementation!! • if (copy1 != copy2) use copy3, else everything is OK
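The voting rule on this slide can be sketched as below. The slide's version trusts copy3 whenever the first two copies disagree, which is safe only if at most one copy is corrupted. This sketch adds an explicit check for the case where all three disagree; that check is an addition for illustration, and `majority_vote` is a hypothetical name.

```python
def majority_vote(copy1, copy2, copy3):
    """Return the value that at least two of the three copies agree on."""
    if copy1 == copy2:
        return copy1               # first two agree: everything is OK
    if copy3 in (copy1, copy2):
        return copy3               # copy3 breaks the tie
    raise ValueError("all three copies disagree; cannot recover")


good = majority_vote(120, 120, 120)
recovered = majority_vote(120, 112, 120)   # copy2 was corrupted
```

The price of the simplicity is space: every protected member is stored three times.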
Design Goals • Want a “redundancy repository” for entire kernel • Minimize Programmer’s Pain! • On demand backup • Scalability
“Just give me a location and I’ll take care of you!” - Redundancy Repository
Redundancy Repository • Redundancy HashTable • Member Entry: int size, long id, char parity
How to Protect? Redundancy API • checkParity( addressOfMember, size ): add before a read access • updateParity( addressOfMember, addressOfNewValue, size ): add before an update
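A rough user-space sketch of how the two calls might fit together, under several assumptions not stated on the slides: memory is modeled as a `bytearray`, an "address" is an offset into it, the hash table is a plain dict keyed by that offset, and the stored parity is a one-byte XOR checksum that can detect but not repair corruption. The real implementation lives in the kernel and may use Hamming parity instead.

```python
from functools import reduce

memory = bytearray(16)   # hypothetical stand-in for kernel memory
repository = {}          # hash table: address -> member entry


def _parity(data) -> int:
    """XOR of all bytes (an assumed, detection-only parity)."""
    return reduce(lambda a, b: a ^ b, data, 0)


def update_parity(address: int, new_value: bytes, size: int) -> None:
    """Call before an update: record parity for the value about to be stored."""
    repository[address] = {
        "id": address,
        "size": size,
        "parity": _parity(new_value[:size]),
    }


def check_parity(address: int, size: int) -> bool:
    """Call before a read: True if the member still matches its entry."""
    entry = repository[address]
    current = _parity(memory[address:address + size])
    return entry["size"] == size and current == entry["parity"]


# Protected update of a 4-byte member at offset 0.
new_value = (120).to_bytes(4, "little")
update_parity(0, new_value, 4)
memory[0:4] = new_value
ok_before = check_parity(0, 4)

memory[1] ^= 0x10        # injected fault
ok_after = check_parity(0, 4)
```

Keying the table by location means callers pass only an address and a size, in the spirit of the "just give me a location" goal.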
Some Challenges • Dealing with different sized data members. • Originally focused on protecting address • Solution: Need to know size of data • What about recursive redundancy? • User Registration • Manual Integration
Updated Results Di + Kernel + Solution Harmony
Summary • 20% of the critical data members we tested caused a crash. • Finding every location that updates memory is difficult. • The system no longer crashed with our redundancy solution.
Thank You • Jeffrey Freschl jfreschl@cs.wisc.edu • Di Xue goldenspaceship@gmail.com