Developing Dependable Systems by Maximizing Component Diversity and Fault Tolerance Jeff Tian, Suku Nair, LiGuo Huang, Nasser Alaeddine and Michael Siok Southern Methodist University US/UK Workshop on Network-Centric Operation and Network Enabled Capability, Washington, D.C., July 24-25, 2008
Outline • Overall Framework • External Environment Profiling • Component Dependability: • Direct Measurement and Assessment • Indirect Assessment via Internal Contributor Mapping • Value Perspective • Experimental Evaluation • Fault Injection for Reliability and Fault Tolerance • Security Threat Simulation • Summary and Future Work
Overall Framework • Systems made up of different components • Many factors contribute to system dependability • Our focus: Diversity of individual components • Component strength/weakness/diversity: • Target: Different dependability attributes and sub-attributes • External reference: Operational profile (OP) • Internal assessment: Contributors to dependability • Value perspective: Relative importance and trade-off • Maximize diversity => Maximize dependability • Combine strengths • Avoid/complement/tolerate flaws/weaknesses
Overall Framework (2) • Diversity: Four Perspectives • Environmental perspective: Operational profile (OP) • Target perspective: Goal, requirement • Internal contributor perspective: Internal characteristics • Value perspective: Customer • Achieving diversity and fault tolerance: • Component evaluation matrix per target per OP • Multidimensional evaluation/composition via DEA (Data Envelopment Analysis) • Internal contributor to dependability mapping • Value-based evaluation using single objective function
Terminology • Quality and dependability are typically defined in terms of conformance to customer’s expectations and requirements • Key concepts: defect, failure, fault, and error • Dependability: the focus in this presentation • Key attributes: reliability, security, etc. • Defect = some problem with the software • either with its external behavior • or with its internal characteristics
Failure, Fault, Error • IEEE STD 610.12 terms related to defect: • Failure: The inability of a system or component to perform its required functions within specified performance requirements • Fault: An incorrect step, process, or data definition in a computer program • Error: A human action that produces an incorrect result • Errors may cause faults to be injected into the software • Faults may cause failures when the software is executed
Reliability and Other Dependability Attributes • Software reliability = the probability of failure-free operation of a program for a specified time under a specified set of operating conditions (Lyu, 1995; Musa et al., 1987) • Estimated according to various models based on defect and time/input measurements • Standard definitions for other dependability attributes, such as security, fault tolerance, availability, etc.
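To make the model-based estimation above concrete, here is a minimal sketch of two common forms it can take: the Nelson input-domain estimate (fraction of failure-free runs under a given OP) and a constant failure-rate exponential model. The function names and sample numbers are illustrative only, not data from this study.

```python
import math

def nelson_reliability(runs: int, failures: int) -> float:
    """Nelson input-domain model: R-hat = 1 - f/n, the observed
    fraction of failure-free runs under a given operational profile."""
    if runs <= 0:
        raise ValueError("need at least one run")
    return 1.0 - failures / runs

def exponential_reliability(failure_rate: float, mission_time: float) -> float:
    """Constant failure-rate (exponential) model: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * mission_time)

# Illustrative numbers only.
print(nelson_reliability(runs=10_000, failures=23))                  # ~0.9977
print(exponential_reliability(failure_rate=0.002, mission_time=100))  # ~0.819
```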
Outline • Overall Framework • External Environment Profiling • Component Dependability: • Direct Measurement and Assessment • Indirect Assessment via Internal Contributor Mapping • Value Perspective • Experimental Evaluation • Fault Injection for Reliability and Fault Tolerance • Security Threat Simulation • Summary and Future Work
Diversity: Environmental Perspective • Dependability defined for a specific environment • Stationary vs. dynamic usage environments • Static, uniform, or stationary (reached an equilibrium) • Dynamic, changing, evolving, with possible unanticipated changes or disturbances • Single/overall OP for the former category • Musa or Markov variation • Single evaluation result possible per component per dependability attribute: e.g., component reliability R(i) • Environment Profiling for Individual Components • Environmental snapshots captured in Musa or Markov OPs • Evaluation matrix (later)
Operational Profile (OP) • An operational profile (OP) is a list of disjoint operations and their associated probabilities of occurrence (Musa, 1998) • The OP describes how users use an application: • Helps guide the allocation of test cases in accordance with use • Ensures that the most frequent operations receive more testing • Serves as the context for realistic reliability evaluation • Other usages, including diversity and internal-external mapping in this presentation
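As an illustration of how such a flat OP can drive usage-proportional testing, here is a small sketch; the operation names, usage counts, and test budget are hypothetical.

```python
from collections import Counter
import random

# Hypothetical usage counts harvested from operational logs.
usage_counts = Counter({"browse": 5200, "search": 2600, "order": 1300, "admin": 130})

total = sum(usage_counts.values())
op = {operation: count / total for operation, count in usage_counts.items()}  # the OP

def allocate_tests(op: dict[str, float], budget: int) -> dict[str, int]:
    """Allocate a fixed test budget in proportion to occurrence probability,
    so the most frequent operations receive the most testing."""
    return {operation: round(p * budget) for operation, p in op.items()}

def sample_operation(op: dict[str, float]) -> str:
    """Draw one operation according to the OP (usage-based random testing)."""
    return random.choices(list(op), weights=list(op.values()), k=1)[0]

print(op)
print(allocate_tests(op, budget=500))
```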
Markov Chain Usage Model • A Markov chain usage model is a set of states, transitions, and transition probabilities • An alternative to the Musa (flat) OP • Each link has an associated probability of occurrence • Models complex and/or interactive systems better • Unified Markov Models (Kallepalli and Tian, 2001; Tian et al., 2003): • Collection of Markov OPs in a hierarchy • Flexible application in testing and reliability improvement
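A minimal sketch of such a usage chain, and of generating usage sequences (test scenarios) by random walks over it; the states and transition probabilities are invented for illustration and do not come from the presentation.

```python
import random

# Hypothetical usage chain: state -> list of (next_state, probability).
usage_model = {
    "Start":  [("Login", 1.0)],
    "Login":  [("Browse", 0.7), ("Search", 0.3)],
    "Browse": [("Order", 0.2), ("Search", 0.3), ("Exit", 0.5)],
    "Search": [("Browse", 0.6), ("Exit", 0.4)],
    "Order":  [("Exit", 1.0)],
}

def generate_usage_sequence(model, start="Start", end="Exit"):
    """Random walk over the chain: each path is one usage scenario / test case."""
    state, path = start, [start]
    while state != end:
        next_states, probs = zip(*model[state])
        state = random.choices(next_states, weights=probs, k=1)[0]
        path.append(state)
    return path

print(generate_usage_sequence(usage_model))
```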
Operational Profile Development: Standard Procedure • Musa’s steps (1998) for OP construction: • Identify the initiators of operations • Choose a representation (tabular or graphical) • Create an operations “list” • Establish the occurrence rates of the individual operations • Establish the occurrence probabilities • Other variations • Original Musa (1993): 5 top-down refinement steps • Markov OP (Tian et al.): build the FSM, then estimate transition probabilities from log files
OPs for Composite Systems • Use the standard procedure whenever possible • For the overall stationary environment • For individual component usage => component OP • For a dynamic environment: • Snapshot identification • Sets of OPs for each snapshot • System OP from individual component OPs (see the sketch below) • Special considerations: • Existing test data or operational logs can be used to develop component OPs • Union of component OPs => system OP
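One possible reading of "system OP from individual component OPs" is a weighted union of the component profiles, renormalized into a single system-level profile. The sketch below assumes each component's share of overall traffic is known; the component names, operations, and weights are hypothetical.

```python
def system_op(component_ops: dict[str, dict[str, float]],
              component_weights: dict[str, float]) -> dict[str, float]:
    """Form a system OP as the weighted union of component OPs:
    p_system(op) = sum over components c of weight(c) * p_c(op), renormalized."""
    combined: dict[str, float] = {}
    for component, op in component_ops.items():
        w = component_weights[component]
        for operation, p in op.items():
            combined[operation] = combined.get(operation, 0.0) + w * p
    total = sum(combined.values())
    return {operation: p / total for operation, p in combined.items()}

# Hypothetical component OPs and traffic shares.
ops = {
    "web_ui":  {"browse": 0.6, "search": 0.3, "order": 0.1},
    "billing": {"invoice": 0.8, "order": 0.2},
}
print(system_op(ops, {"web_ui": 0.75, "billing": 0.25}))
```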
OP and Dependability Evaluation • Some dependability attributes defined with respect to a specific OP: e.g., reliability • For overall stationary environment: direct measurement and assessment possible • For dynamic environment: OP-reliability pairs • Consequence of improper reuse due to different OPs (Weyuker, 1998) • From component to system dependability: • Customization/selection of best-fit OP for estimation • Compositional approach (Hamlet et al., 2001)
Outline • Overall Framework • External Environment Profiling • Component Dependability: • Direct Measurement and Assessment • Indirect Assessment via Internal Contributor Mapping • Value Perspective • Experimental Evaluation • Fault Injection for Reliability and Fault Tolerance • Security Threat Simulation • Summary and Future Work
Diversity: Target Perspective • Component Dependability: • Component reliability, security, etc. to be scored/evaluated • Direct Measurement and Assessment • Indirect Assessment (later) • Under stationary environment: • Dependability vector for each component • Diversity maximization via DEA (data envelopment analysis) • Under dynamic environment: • Dependability matrix for each component • Diversity maximization via extended DEA by flattening out the matrix
Diversity Maximization via DEA • DEA (data envelopment analysis): • Non-parametric analysis • Establishes a multivariate frontier in a dataset • Basis: linear programming • Applying DEA • Dependability attribute frontier • Illustrative example (figure not shown) • N-dimensional: hyperplane
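As a rough sketch of the linear-programming basis mentioned above, the following computes DEA efficiency scores with the input-oriented CCR (constant returns to scale) multiplier model, rather than the BCC VRS model mentioned on the next slide, purely to keep the LP small; the component input/output data are invented.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(inputs: np.ndarray, outputs: np.ndarray, o: int) -> float:
    """Input-oriented CCR multiplier model for DMU `o`:
    maximize u.y_o  s.t.  v.x_o = 1,  u.y_j - v.x_j <= 0 for all j,  u, v >= 0.
    Decision variables are stacked as [u (per output), v (per input)]."""
    n_dmu, n_out = outputs.shape
    n_in = inputs.shape[1]
    c = np.concatenate([-outputs[o], np.zeros(n_in)])              # minimize -u.y_o
    A_ub = np.hstack([outputs, -inputs])                           # u.y_j - v.x_j <= 0
    b_ub = np.zeros(n_dmu)
    A_eq = np.concatenate([np.zeros(n_out), inputs[o]])[None, :]   # v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0])
    return -res.fun  # efficiency in (0, 1]; 1 means on the frontier

# Hypothetical component data: inputs = cost-like, outputs = dependability-like.
x = np.array([[2.0], [3.0], [4.0]])     # e.g., resource usage
y = np.array([[0.95], [0.99], [0.97]])  # e.g., reliability under a given OP
for i in range(3):
    print(f"component {i}: efficiency = {dea_ccr_efficiency(x, y, i):.3f}")
```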
DEA Example • Lockheed-Martin software project performance with regard to selected metrics and a production efficiency model • Inputs: Labor hours, Software Change Size; Outputs: Software Reliability At Release, Defect Density after test, Software Productivity; Efficiency = Output/Input • Measures efficiencies of decision making units (DMUs) using weighted sums of inputs and weighted sums of outputs • Compares DMUs to each other • Sensitivity analysis affords study of non-efficient DMUs in comparison • BCC VRS model used in initial study
DEA Example (2) • Using the production efficiency model for the Compute-Intensive dataset group • Ranked set of projects • Data showing distance and direction from the efficiency frontier
Diversity: Internal Perspective • Component Dependability: • Direct Measurement and Assessment: might not be available, feasible, or cost-effective • Indirect Assessment via Internal Contributor Mapping • Internal Contributors: • System design, architecture • Component internal characteristics: size, complexity, etc. • Process/people/other characteristics • Usually more readily available data/measurements • Internal => external mapping • Mapping procedure also takes the OP as input (e.g., fault => reliability)
Example: Fault-Failure Mapping for Dynamic Web Applications
Web Example: Fault-Failure Mapping • Input to analysis (and fault-failure conversion): • Anomalies recorded in web server logs (failure view) • Faults recorded during development and maintenance • Defect impact scheme (weights) • Operational profile • Product “A” is an ordering web application for telecom services • Consists of hundreds of thousands of lines of code • Runs on IIS 6.0 (Microsoft Internet Information Server) • Processes a couple of million requests per day
Web Example: Fault-Failure Mapping (Step 1) • Pareto chart for the defect classification of product “A” • The top three categories represent 66.26% of the total defect data
Web Example: Fault-Failure Mapping (Steps 4 & 5) • OP for product “A” and the corresponding numbers of transactions.
Web Example: Fault-Failure Mapping (Step 6) • Using the numbers of transactions calculated from the OP and the defined fault impact scheme, we calculated the fault exposure, i.e., the corresponding potential failure frequencies (see the sketch below)
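A rough sketch of what this step 6 calculation could look like: each fault's exposure is the usage frequency (transactions per the OP) of the operation it affects, weighted by its impact class. The impact weights, operation names, transaction counts, and fault records below are all hypothetical, not the product "A" data.

```python
# Hypothetical defect impact weights (the defect impact scheme).
impact_weight = {"critical": 1.0, "major": 0.5, "minor": 0.1}

# Hypothetical OP expressed as transactions per day for each operation.
transactions_per_day = {"submit_order": 40_000, "check_status": 15_000, "update_profile": 2_000}

# Hypothetical fault records: (fault id, operation affected, impact class).
faults = [
    ("F-101", "submit_order", "major"),
    ("F-102", "check_status", "critical"),
    ("F-103", "update_profile", "minor"),
]

def fault_exposure(faults, transactions_per_day, impact_weight):
    """Potential failure frequency per fault = usage frequency x impact weight."""
    exposure = {
        fid: transactions_per_day[op] * impact_weight[impact]
        for fid, op, impact in faults
    }
    # Rank the faults so the highest-exposure ones are fixed first.
    return sorted(exposure.items(), key=lambda kv: kv[1], reverse=True)

print(fault_exposure(faults, transactions_per_day, impact_weight))
```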
Web Example: Fault-Failure Mapping (Step 7)
Web Example: Fault-Failure Mapping (Result Analysis) • A large number of failures were caused by a small number of faults with high usage frequencies • Fixing faults with a high usage frequency and a high impact could achieve better efficiency in reliability improvement • By fixing the top 6.8% of faults, the total failures were reduced by about 57% • Similarly, fixing the top 10% of faults reduced failures by about 66%, the top 15% by 71%, and the top 20% by 75% • The defect data repository and the failures recorded in web server logs have insignificant overlap => both are needed for effective reliability improvement
Diversity: Value Perspective • Component Dependability Attributes: • Direct Measurement and Assessment: might not capture what customers truly care about • Different value attached to different dependability attributes • Value-based software quality analysis: • Quantitative model for software dependability ROI analysis • Avoid one-size-fits-all • Value-based process: experience at NASA/USC (Huang and Boehm), extended to dependability • Mapping to a value-based perspective is more meaningful to target customers
Value Maximization • Single objective function (see the sketch below): • Relative importance • Trade-offs possible • Quantification scheme • Gradient scale to select component(s) • Compare to DEA • General cases • Combination with DEA • Diversity as a separate dimension possible
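A minimal sketch of such a single objective function, under the assumption that it is a customer-weighted sum of per-attribute dependability scores (one simple quantification scheme among several possible); the attribute weights, candidate component names, and scores are illustrative.

```python
def value_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Single objective: weighted sum of dependability attribute scores,
    where the weights encode relative importance to the customer."""
    return sum(weights[attr] * scores.get(attr, 0.0) for attr in weights)

# Hypothetical customer weights and candidate component scores (0..1 scales).
weights = {"reliability": 0.5, "security": 0.3, "availability": 0.2}
candidates = {
    "component_A": {"reliability": 0.99, "security": 0.80, "availability": 0.95},
    "component_B": {"reliability": 0.95, "security": 0.95, "availability": 0.90},
}
best = max(candidates, key=lambda name: value_score(candidates[name], weights))
print(best, {n: round(value_score(s, weights), 3) for n, s in candidates.items()})
```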
Outline • Overall Framework • External Environment Profiling • Component Dependability: • Direct Measurement and Assessment • Indirect Assessment via Internal Contributor Mapping • Value Perspective • Experimental Evaluation • Fault Injection for Reliability and Fault Tolerance • Security Threat Simulation • Summary and Future Work
Experimental Evaluation • Testbed • Basis: OPs • Focus on system behavior under injected or simulated problems • Fault Injection for Reliability and Fault Tolerance • Reliability mapping for injected faults • Use of fault seeding models (see the sketch below) • Direct fault tolerance evaluation • Security Threat Simulation • Focus 1: likely scenarios • Focus 2: coverage via diversity
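As one concrete example of a fault seeding model, the sketch below uses the classical Mills-style (capture-recapture) estimate of native faults from the proportion of seeded faults rediscovered during testing; the numbers are invented for illustration.

```python
def seeded_fault_estimate(seeded: int, seeded_found: int, native_found: int) -> float:
    """Mills-style fault seeding estimate: assuming the same fraction of seeded and
    native faults is detected, total native faults N ~= native_found * seeded / seeded_found."""
    if seeded_found == 0:
        raise ValueError("no seeded faults found; cannot estimate")
    return native_found * seeded / seeded_found

# Illustrative numbers: 50 faults injected, 40 of them rediscovered,
# and 120 native faults found by the same test campaign.
total_native = seeded_fault_estimate(seeded=50, seeded_found=40, native_found=120)
remaining = total_native - 120
print(total_native, remaining)  # estimates ~150 native faults in total, ~30 remaining
```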
Summary and Future Work • Overall Framework • External Environment Profiling • Component Dependability: • Direct Measurement and Assessment • Indirect Assessment via Internal Contributor Mapping • Value Perspective • Experimental Evaluation • Fault Injection for Reliability and Fault Tolerance • Security Threat Simulation • Summary and Future Work