Collaborative Learning for Security and Repair in Application Communities
Performers: MIT and Determina
Michael Ernst
MIT Computer Science & Artificial Intelligence Lab
7 July 2006
Personnel
• MIT
  • Michael Ernst
  • Martin Rinard
  • Jeff Perkins
  • Stephen McCamant
  • Shay Artzi
  • … and others
• Determina
  • Sandy Wilbourn
  • Derek Bruening
  • Saman Amarasinghe
  • … and others
Vulnerable monocultures
• Problem: large installed bases of similar software
  • Susceptible to a single catastrophic attack
• Opportunity: a large community of cooperating applications
  • Share information about attacks and errors
  • Experiment with different response and recovery strategies
  • Disseminate successful approaches
Components of our solution
• Technical ideas:
  • Targeted bounds enforcement
  • Data structure consistency learning and enforcement
• Implementation platform:
  • Determina Managed Program Execution Environment (MPEE)
Cooperating communities
• Each computer is a sentry on watch for problems
• Each computer is a testbed for evaluating solutions
• Share information about problems and solutions
• The system learns: it performs better over time
• Example:
  • One machine notices an error or attack
  • Generate many distinct patches
  • Each machine loads a randomly chosen patch
  • Discard patches that do not yield acceptable behavior
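The patch-evaluation loop in the example above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the project's actual protocol: the names `Patch` and `evaluate_patches`, the fixed number of rounds, and the rule "discard a patch after any reported failure" are all hypothetical, and each machine's run is modeled by a `run_ok` callback that reports whether the patched program behaved acceptably.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Patch:
    """A hypothetical candidate patch and its running score."""
    name: str
    successes: int = 0
    failures: int = 0


def evaluate_patches(
    patches: list[Patch],
    machines: int,
    rounds: int,
    run_ok: Callable[[Patch], bool],
    rng: random.Random,
) -> list[Patch]:
    """Distribute randomly chosen patches across machines and filter them.

    In each round, every machine independently loads one surviving patch and
    reports whether the patched program behaved acceptably. Any patch with a
    reported failure is discarded. Survivors are returned, most-tested first.
    """
    alive = list(patches)
    for _ in range(rounds):
        if not alive:
            break  # every candidate failed somewhere; nothing left to test
        for _ in range(machines):
            patch = rng.choice(alive)  # each machine picks a patch at random
            if run_ok(patch):
                patch.successes += 1
            else:
                patch.failures += 1
        # Discard patches that did not yield acceptable behavior.
        alive = [p for p in alive if p.failures == 0]
    return sorted(alive, key=lambda p: p.successes, reverse=True)


if __name__ == "__main__":
    # Hypothetical community: patch "B" breaks the application, "A" and "C" do not.
    rng = random.Random(42)
    candidates = [Patch("A"), Patch("B"), Patch("C")]
    survivors = evaluate_patches(
        candidates, machines=100, rounds=3,
        run_ok=lambda p: p.name != "B", rng=rng,
    )
    print([(p.name, p.successes) for p in survivors])
```

Discarding a patch after a single failure report is the strictest possible filter and is an assumption of this sketch; if failure reports can be noisy, a threshold on each patch's failure rate would be more forgiving.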
Targeted bounds enforcement
• Bounds violations indicate program errors or injected code
• Generate patches that eliminate the bounds errors
• Evaluate the patches on many machines
• Filter out patches that do not eliminate the problem (or that cause new problems)
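To make "generate patches to eliminate bounds errors" concrete, the sketch below wraps a buffer so that every write is checked against its bounds, and offers several hypothetical repair policies for an out-of-bounds write. The names `Policy`, `CheckedBuffer`, `BoundsViolation`, and `copy_input` are illustrative assumptions. The `DISCARD` policy borrows the idea of failure-oblivious computing (ignore the invalid write and keep running); it is named here as a swapped-in technique, not as the project's confirmed method. Real enforcement would operate on x86 binaries, not on Python objects.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical responses a generated patch could apply to a bad write."""
    TERMINATE = "terminate"  # stop with an error (fail-safe baseline)
    DISCARD = "discard"      # drop the out-of-bounds write and keep running
    CLAMP = "clamp"          # redirect the write to the nearest valid index


class BoundsViolation(Exception):
    """Raised when an out-of-bounds write occurs under Policy.TERMINATE."""


class CheckedBuffer:
    """A fixed-size buffer whose every write is checked against its bounds."""

    def __init__(self, size: int, policy: Policy) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.data = [0] * size
        self.policy = policy
        self.violations = 0  # number of bounds errors detected so far

    def write(self, index: int, value: int) -> None:
        if 0 <= index < len(self.data):
            self.data[index] = value
            return
        self.violations += 1  # detection happens regardless of policy
        if self.policy is Policy.TERMINATE:
            raise BoundsViolation(f"write at index {index}, size {len(self.data)}")
        if self.policy is Policy.CLAMP:
            self.data[min(max(index, 0), len(self.data) - 1)] = value
        # Policy.DISCARD: ignore the write; execution continues.


def copy_input(buf: CheckedBuffer, payload: list[int]) -> None:
    """An unchecked copy loop: the classic shape of a buffer-overflow bug."""
    for i, value in enumerate(payload):
        buf.write(i, value)
```

For example, copying the six-element payload `[1, 2, 3, 4, 5, 6]` into a four-element buffer leaves `[1, 2, 3, 4]` under `DISCARD` and `[1, 2, 3, 6]` under `CLAMP`, with two violations detected in both cases. Each policy plays the role of one candidate patch; which behavior is acceptable for a given application is what evaluating patches on many machines has to decide.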
Data structure consistency learning and enforcement
• Monitor data structures in successful runs
• Machine learning generalizes to consistency properties
  • Use of a community minimizes over-fitting
• Monitor executions for violations
• Repair corrupt data structures
• Learn which repairs are most successful
  • Helps eliminate incorrect constraints
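A minimal sketch of the learn / monitor / repair cycle, assuming the simplest possible property class: a value range per data structure field, which is only one of many constraint forms a learner could infer. Snapshots of a data structure are modeled as dictionaries, and `learn_ranges`, `merge`, `check`, and `repair` are hypothetical names. Clamping into the learned range is just one possible repair strategy. Merging constraints learned on several machines widens each range, which illustrates one way a community can reduce over-fitting to a single machine's runs.

```python
Snapshot = dict[str, int]          # field name -> observed value
Ranges = dict[str, tuple[int, int]]  # field name -> (min, max) seen in good runs


def learn_ranges(snapshots: list[Snapshot]) -> Ranges:
    """Generalize observations from successful runs into range constraints."""
    ranges: Ranges = {}
    for snap in snapshots:
        for name, value in snap.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def merge(a: Ranges, b: Ranges) -> Ranges:
    """Combine constraints learned on two machines by widening each range."""
    merged = dict(a)
    for name, (lo, hi) in b.items():
        if name in merged:
            mlo, mhi = merged[name]
            merged[name] = (min(mlo, lo), max(mhi, hi))
        else:
            merged[name] = (lo, hi)
    return merged


def check(snap: Snapshot, ranges: Ranges) -> list[str]:
    """Return the fields whose values violate a learned constraint."""
    return [
        name for name, (lo, hi) in ranges.items()
        if name in snap and not lo <= snap[name] <= hi
    ]


def repair(snap: Snapshot, ranges: Ranges) -> Snapshot:
    """Return a repaired copy with each violating field clamped into range."""
    fixed = dict(snap)
    for name in check(snap, ranges):
        lo, hi = ranges[name]
        fixed[name] = min(max(fixed[name], lo), hi)
    return fixed
```

For instance, if one machine observes `{"len": 3, "cap": 8}` and `{"len": 5, "cap": 8}` and another observes `{"len": 0, "cap": 16}`, the merged constraints are `len` in [0, 5] and `cap` in [8, 16]; a corrupted snapshot `{"len": 4096, "cap": 8}` is flagged on `len` and repaired to `{"len": 5, "cap": 8}`. Whether such a repair actually lets the program continue correctly is what learning which repairs are most successful would measure; a range learned from too few runs is exactly the kind of incorrect constraint that feedback helps eliminate.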
COTS (commercial off-the-shelf) applications
• Pros:
  • Inexpensive, featureful, familiar, widely deployed
• Cons:
  • Contain many (exploitable) bugs
  • No source code or debug symbols
Determina managed execution
• Determina MPEE: Managed Program Execution Environment
  • Efficient emulation engine for x86 binaries
  • Typically under 5% overhead, which permits routine use
• API:
  • Arbitrarily patch and modify the executable
  • Examine instructions before execution
  • Set breakpoints at which to suspend execution
• Robust and scalable (e.g., Microsoft Office apps)
Productization
• Determina’s customers use its security products on commercial Windows applications
• The Determina partnership permits test and evaluation in COTS environments
• If successful, integrate into the Vulnerability Protection Suite™ product
Why this can succeed (now)
• The technologies (bounds enforcement, constraint learning, and constraint enforcement) have been demonstrated in the lab
  • Experiments were limited in some ways, but more thorough than typical initial research efforts
• The Determina toolset has unique capabilities
• An application community permits faster and more accurate learning, and permits experimentation by reducing the cost of any single failure
Metrics
• Tools for Windows binaries built on top of Determina products (MPEE, LiveShield™, etc.)
• Bounds enforcement detects 95% of injected-code attacks and (asymptotically) recovers from 60% of them
• Data structure constraint learning and repair detects 50% of attacks and errors that corrupt data, and recovers from 30% of such errors and attacks
Outline of the presentation
• Introduction/overview
• Previous work on learning and repair of data structure consistency constraints
• DARPA Self-Regenerative Systems program
• Details on learning and repair components
• Determina security products
• Determina monitoring framework
• Plans
Challenges
• Performing whole-program analysis
  • Determina tools are basic-block oriented
• Inferring types from the heap
  • Past work has relied on source code or debug symbols
• Scaling research tools to very large systems
  • Focus on small parts of interest
  • Distribute work among many machines
  • Scale back parts of the algorithms
• New repair algorithms: operate directly on data, tolerate potential conflicts among constraints
• Better tolerate mislabeled inputs to the learning algorithm
• Learning temporal sequences as well as data structure constraints
Activities
• Injected code detection
• Patch generation
• Patch evaluation and filtering
• Constraint learning
• Constraint monitoring
• Constraint repair
• Repair evaluation and filtering
• Infrastructure development
• Evaluation
Phases of the project
• Tool development
• Tool integration
• Experimentation
• Deployment
Deliverables (1)
• Enhanced Client Interface for MPEE (Determina)
• Injected Code Detection (MIT)
• Application State Probing (Determina)
• Learning for Binaries (Determina and MIT)
• LiveShield Constraint Creation Framework (Determina)
• Data Structure Consistency Checking (MIT)
• Patch Generation (MIT)
• LiveShield Coordination Center (Determina)
• Patch Distribution (MIT)
• Hybrid System for Binary Analysis (Determina)
Deliverables (2)
• Proactive Situation Awareness (Determina)
• Vulnerability Analysis (Determina)
• Custom Constraints (MIT)
• Integration, Testing, and Deployment (Determina)
• Alternative Repair Generation (MIT)
• Merging of learning (MIT)
• Type Inference for Heap Structures (MIT)
• Dynamic Constraint Update (MIT)
• Repair Evaluation and Filtering (MIT)
• Patch Testing (MIT)
• Patch Evaluation and Filtering (MIT)