CS598-YYZ: Reliable and Robust Software Topics Overview Yuanyuan (YY) Zhou
Good News! • Many (around 10) students dropped the class • My initial limit was set to be 20! • My scaring strategy actually works! • Implication: • The class can be more focused • I can spend more time with each group CS598YYZ-Fall 2005
Admin. Things • Some schedule changes • Project proposal • Guest lectures • Drop/move some paper presentations • Reminders: • 9/1, select 2 papers for presentations • 9/6, cell topic selection • Each cell topic has a TA assigned to help • Survey • Project • Poll • Auditing • Ambition for class projects • Topic Matrix: • Security row is blocked (no survey, no project), but we will read/present/discuss papers • Recovery is renamed to “Recovery or Tolerance or Avoidance” • No column survey! • Each cell has a TA assigned for weekly meetings • Survey • Project • I will participate bi-weekly
Outline • Goal of this class • Help you understand more about this class • Help you in paper selection and topic selection • Introduction to Software Reliability • Software development process • Software defects, detection and recovery • Software mis-configuration, detection and diagnosis • Security attacks, detection and recovery
Importance of Robustness and Reliability • Correctness-critical applications • Aircraft control, hospital monitoring systems • On-line transaction processing • Internet services • e.g., Google, Yahoo!, Amazon, eBay, etc. • Expectation of 24 x 7 availability, but service outages still happen! • “Sorry... We apologize for the inconvenience, but the system is currently unavailable. Please try your request in an hour. If you require assistance please call Customer Service at 1-866-325-3457.”
General Software Reliability • This Thursday: • Fundamental Concepts of Dependability • Why Do Computers Stop and What Can Be Done About It? • Quantitative Analysis of Faults and Failures in a Complex Software System • Which one is more interesting?
Causes of Computer Failures • [Chart of failure causes by category: Software, Hardware, Network, Administrator, Other]
Why Does Software Fail? • “Programs are really not much more than the programmer’s best guess about what a system should do” [Abbott90] • Four inherent properties make software hard • Complexity • Non-linear interaction among millions of states • Conformity • Being considered as the most conformable component in the system • Changeability • Being considered as extremely malleable • Invisibility • Hard to visualize • Comparison with hardware?
Causes of Software Failures • Software bugs • Configurations (operator errors) • Security attacks
Software Development • Software life cycles: Planning, Development, Supporting • Software development models • Linear sequential process model
Prototyping process model • Question: When is this model useful?
Processes in Software Development • Requirement capture • Functional, performance, interface and safety • Design • Software architecture, data structure, interface, procedures • Coding • Integration
Software Supporting • Verification • Search and report errors • Consists of: reviews, analysis, testing • Source code configuration management • Organize, control modifications to software • Note: different from our configuration topic • Quality assurance • Ensure that the software development organization does “the right thing at the right time in the right way”
But yet still… • “The vast majority of the software today is handcrafted by artisans using craft-based techniques that cannot produce consistent results…” • “As a result, software failure is a common occurrence, often with substantial societal and economic consequences.”
Dealing With Software Faults • Fault prevention • Good software design framework, languages, etc • Fault removal • Bug detection, testing, debugging, etc • Fault tolerance • Survive faults, transparent recovery, etc • Manual recovery • Done by operators • Fault forecasting • Real life analogy?
Software Quality Measures • Reliability: the probability of failures • MTBF = MTTF + MTTR • MTBF: mean time between failures • MTTF: mean time to failure • MTTR: mean time to repair • Availability: ??? • MTTF / (MTTF + MTTR) = MTTF / MTBF • Safety & security: the consequences of failures • Availability: Service unavailable • Integrity: Data loss, data tampering • Confidentiality: Information leaking
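The relationship between these measures can be checked with a few lines of arithmetic. The sketch below assumes the standard steady-state definition, availability = MTTF / (MTTF + MTTR), with MTBF = MTTF + MTTR. The function names and the example numbers are illustrative assumptions, not taken from the slides.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the long-run fraction of time the system is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    mtbf_hours = mttf_hours + mttr_hours  # MTBF = MTTF + MTTR
    return mttf_hours / mtbf_hours


def downtime_hours_per_year(avail):
    """Expected unavailable hours in a 365-day year."""
    return (1.0 - avail) * 365 * 24


# Hypothetical service: fails on average every 999 hours, takes 1 hour to repair.
a = availability(mttf_hours=999.0, mttr_hours=1.0)
print(f"availability = {a:.3f}")
print(f"downtime per year = {downtime_hours_per_year(a):.2f} hours")
```

With these assumed numbers the service is "three nines" available, which still leaves roughly 8.76 hours of downtime per year. Shrinking MTTR improves availability just as effectively as growing MTTF, which is one motivation for the fast-recovery techniques later in the course.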
Fault Injection: Measuring Fault Tolerance • Fault Injection (Next Week): • Fault Injection Tools and Techniques • Fault Injection for Dependability Validation: A Methodology and some Applications. • FERRARI: A Flexible Software-Based Fault and Error Injection System. • Predicting How Badly “Good” Software Can Behave
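To make the idea of measuring fault tolerance by injecting faults concrete, here is a toy software-level sketch. It is not how FERRARI or any tool in the readings works. The names `InjectedFault`, `inject_faults`, `with_retry`, and `survival_rate`, the 30% failure rate, and the retry mechanism are all assumptions chosen for illustration.

```python
import random


class InjectedFault(Exception):
    """Marks a failure that the harness injected on purpose."""


def inject_faults(func, failure_rate, rng):
    """Wrap func so that each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def with_retry(func, attempts):
    """A simple tolerance mechanism: try up to `attempts` times (assumed >= 1)."""
    def wrapper(*args, **kwargs):
        for remaining in range(attempts - 1, -1, -1):
            try:
                return func(*args, **kwargs)
            except InjectedFault:
                if remaining == 0:
                    raise  # out of attempts: the fault becomes a visible failure
    return wrapper


def survival_rate(call, trials):
    """Fraction of trials in which no injected fault reached the caller."""
    survived = 0
    for _ in range(trials):
        try:
            call()
            survived += 1
        except InjectedFault:
            pass
    return survived / trials


rng = random.Random(2005)  # fixed seed so the experiment is repeatable
flaky = inject_faults(lambda: "ok", failure_rate=0.3, rng=rng)
plain = survival_rate(flaky, trials=1000)
retried = survival_rate(with_retry(flaky, attempts=3), trials=1000)
print(f"without retry: {plain:.3f}, with retry: {retried:.3f}")
```

Comparing the two survival rates gives a crude measure of how much the retry wrapper helps against this one fault model. Real fault injectors corrupt memory, registers, or messages rather than raising tidy exceptions.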
Types of Software Faults (Defects) • Design defects • E.g. protocol errors • Implementation defects • Memory-related bugs • Memory leaks, memory corruption (buffer overruns, etc) • Concurrency bugs • Deadlock, data races • Semantic bugs • …
Defect Characteristics Papers to be discussed (9/13, 15): • A Comparison of Software Defects in Database Management Systems and Operating Systems. • An Empirical Study of Operating System Errors. • Whither generic recovery from application faults? • Defect Survey Presentation
Defect Detection and Diagnosis • Detect and diagnose design defects • Software testing and interactive debugging • Manual review • Model checking • Detect and diagnose implementation defects • Software testing and interactive debugging • Code review • Model checking • Static checking • Dynamic checking
Solution 1: Interactive Debugging • Examples: gdb, Microsoft Visual Studio • Pros: • Program-specific • Still the dominant method for debugging • Cons: • Time- and labor-intensive • Some bugs are hard to reproduce • Requires experience • Assistance • Flight data recorder • Deterministic replay
Solution 2: Static Checking • Examples: Model checking or compile-time checking • Pros: • No run-time overhead • Cons: • Need specification, annotation … • Limited by aliasing problems and other compile-time limitations, especially for C/C++
Solution 3: Dynamic Checking • Examples: assertions, Purify, KAI, Eraser, etc • Pros: • Accurate information at run-time • Cons: • Large run-time overhead (up to 40 times) • Some tools still limited by aliasing problems. • Find bugs only in exercised paths
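The last drawback, that dynamic checks only find bugs on exercised paths, is easy to see with the simplest dynamic checker on the list, an assertion. The sketch below is a made-up example (the functions `average` and `report` are hypothetical): the same bad input passes silently on one path and trips the check on the other.

```python
def average(values):
    # Dynamic check: evaluated only when this line actually runs.
    assert len(values) > 0, "average() requires a non-empty list"
    return sum(values) / len(values)


def report(values, verbose=False):
    if verbose:
        # The check inside average() is reachable only on this path.
        return f"mean={average(values):.2f}"
    return f"count={len(values)}"


# A test suite that never sets verbose=True will not see the problem.
print(report([], verbose=False))

try:
    report([], verbose=True)
except AssertionError as err:
    print("caught at run time:", err)
```

Note that Python skips `assert` statements entirely when run with the `-O` flag, a reminder that dynamic checks are often disabled in production precisely because of their run-time cost.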
Bug Detection Methods to be Studied • Oct 4-13 • CCured in the real world. • RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking. • PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code. • CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code • Bug isolation via remote program sampling • Purify: Fast detection of memory leaks and access errors. • AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-based Invariants. • iWatcher: Efficient Architecture Support for Software Debugging. • Defect Detection Survey
Software Failure Recovery • Group discussion • 5-6 per group • The winning group • Finds the largest number of recovery strategies • Note: TAs cannot participate
Software Failure Recovery • Rebooting Techniques • Whole-system rebooting, micro-rebooting, software rejuvenation • General checkpointing and recovery • Fail-over system, progressive retry, recovery block, n-version programming • Application-specific recovery • Multi-process model, exception handling • Recently proposed non-traditional mechanisms • Failure-oblivious computing, reactive immune systems, Rx
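Of the general techniques listed, the recovery block has a particularly compact structure: run a primary variant on a checkpointed copy of the state, check the result with an acceptance test, and fall back to an alternate variant if the test fails. The sketch below is a minimal illustration under that textbook description. `recovery_block`, the two sort variants, and the acceptance test are hypothetical names invented for this example.

```python
import copy


def recovery_block(state, variants, acceptance_test):
    """Try each variant in order on a fresh copy of state; return the first accepted result."""
    for variant in variants:
        checkpoint = copy.deepcopy(state)  # each attempt starts from the saved state
        try:
            result = variant(checkpoint)
        except Exception:
            continue  # a crash counts as a failed attempt; try the next variant
        if acceptance_test(result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def fast_sort(items):
    items.sort()
    return items[:-1]  # deliberately planted bug: drops the largest element


def simple_sort(items):
    return sorted(items)


data = [3, 1, 2]


def accept(result):
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and len(result) == len(data)


result = recovery_block(data, [fast_sort, simple_sort], accept)
print(result)
```

Because every variant works on its own copy, the buggy primary cannot damage `data` before the alternate runs. The technique is only as good as its acceptance test: a test that checked ordering but not length would have accepted the wrong answer.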
Recovery Techniques to be Studied • Nov 1-10 • Microreboot - A Technique for Cheap Recovery. • Rx: Treating bugs as allergies - a safe method to survive software failure • Selective Recovery • Software rejuvenation: Analysis, module and applications. • Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines • BASE: Using abstraction to improve fault tolerance. • Enhancing server availability and security through failure-oblivious computing. • Recovering device drivers. • Building a reactive immune system for software services. • Failure Recovery Survey
Another Cause of Failures: Operator Errors • A significant number of outages in Internet services are a result of operator actions [Oppenheimer03] • #1: Architecture is complex • #2: Systems are constantly evolving • #3: Lack of tools for operators to reason about the impact of their actions • Offline testing, emulation, simulation • Very little detail on operator mistakes • Details strongly guarded by companies and administrators
Operator Mistakes: Category vs. Impact • 64% of all mistakes had immediate impact on service performance • 36% resulted in latent faults • Obs. #1: A significant number of mistakes can be caught by testing in a realistic environment • Obs. #2: Undetectable latent errors will still require online recovery techniques
Operator Mistakes • Misconfigurations account for 57% of all errors • Configuration mistakes spanning multiple components are more likely • Obs. #1: Tools to manipulate and check configurations are crucial • Obs. #2: Be extremely careful when maintaining multiple versions of software
Why Do We Study Configurations?
Example: Windows Registry
Understanding Mis-Configurations • Sept 20-22 • Why PCs Are Fragile and What We Can Do About It: A Study of Windows Registry Problems • Understanding and Dealing with Operator Mistakes in Internet Services • STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support • Mis-configuration Survey Presentation
Possible Mis-Configuration Detection: Comparing with Others
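One way to picture "comparing with others" is a majority comparison: a setting on the troubled machine is suspicious when nearly all healthy peers agree on a different value. The sketch below is a simplified illustration of that idea only. It is not the algorithm of PeerPressure or any other paper in the readings, and the function name, the 0.8 agreement threshold, and the sample settings are assumptions.

```python
from collections import Counter


def suspicious_settings(sick, healthy_peers, min_agreement=0.8):
    """Return (key, sick_value, common_value, agreement) for settings that deviate from a strong peer consensus."""
    suspects = []
    for key, value in sick.items():
        peer_values = [peer[key] for peer in healthy_peers if key in peer]
        if not peer_values:
            continue  # no peer has this setting, so there is nothing to compare
        common_value, count = Counter(peer_values).most_common(1)[0]
        agreement = count / len(peer_values)
        # Flag only where peers largely agree; per-machine values are ignored.
        if agreement >= min_agreement and value != common_value:
            suspects.append((key, value, common_value, agreement))
    suspects.sort(key=lambda s: s[3], reverse=True)  # strongest consensus first
    return suspects


# Hypothetical configuration snapshots (flat key/value pairs).
peers = [
    {"proxy": "off", "dpi": 96, "home": "a"},
    {"proxy": "off", "dpi": 96, "home": "b"},
    {"proxy": "off", "dpi": 120, "home": "c"},
]
sick = {"proxy": "on", "dpi": 96, "home": "d"}
print(suspicious_settings(sick, peers))
```

Only `proxy` is reported. The `home` setting differs on every machine, so disagreement there carries no signal. This is why such approaches need many peers and some notion of how naturally diverse each setting is.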
Mis-configuration Detection • Oct 18-20 • Automatic Misconfiguration Troubleshooting with PeerPressure • Persistent-state Checkpoint Comparison for Troubleshooting Configuration Failures. • Discovering Correctness Constraints for Self-Management of System Configuration. • Gatekeeper: Monitoring Auto-Start Extensibility Points (ASEPs) for Spyware Management • Mis-configuration Detection Survey
Tolerating Mis-Configurations • [Figure: client requests flow through an online slice and a validation slice of a three-tier service. Labels: Web Server and Web Server Proxy (Tier 1), Application Server (Tier 2), Database and Database Proxy (Tier 3), Application State, Shunt, Compare]
Mis-Configuration Diagnosis • 11/15-17 • Performance Debugging for Distributed Systems of Black Boxes. • Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control • Configuration Debugging as Search: Finding the Needle in the Haystack • Pinpoint: Problem Determination in Large, Dynamic, Internet Services.
Security Attacks • More than 55% of security attacks are caused by software defects • Types of attacks • System hijacking • Information leaks • Denial of service • Data tampering • Identity theft • Internet worms • …
Security Attacks • 9/27-29 • Smashing the Stack for Fun and Profit • Once upon a free(). • An Experimental Study of Security Vulnerabilities Caused by Errors. • Using Memory Errors to Attack a Virtual Machine. • Security attack survey
Group Discussion • How can you detect security attacks? • How can you avoid attacks?
Attack Detection • Methods • Detecting bugs (buffer overrun, …) • Protecting critical data and locations • Logging and analysis • Pattern matching • Anomaly detection • …
Detecting Security Attacks • 10/25-27 • Detecting Past and Present Intrusions Through Vulnerability-Specific Predicates. • Using Programmer-Written Compiler Extensions to Catch Security Holes. • Revirt: Enabling intrusion analysis through virtual-machine logging and replay. • High Coverage Detection of Input-Related Security Faults. • Attack Detection Survey
Avoiding Security Attacks • 11/29 • Randomized instruction set emulation to disrupt binary code injection attacks. • Countering Code-Injection Attacks with Instruction-Set Randomization. • Automated Web Patrol with Strider Honey Monkeys: Finding Web Sites That Exploit Browser Vulnerabilities