
Some thoughts for the industry session


Presentation Transcript


  1. Some thoughts for the industry session Prof. Kishor S. Trivedi Department of Electrical and Computer Engineering Duke University Durham, NC 27708-0291 Phone: (919)660-5269 e-mail: kst@ee.duke.edu At present: visiting Professor IIT Kanpur, CSE Dept. Cochin Conference Dec 18, 2002

  2. What does industry want? • Well trained students • Short term research problems solved • Short courses on timely topics

  3. What do faculty want? • Funding for 'their' research • Place their students in good company labs • Hope to get their research results transferred to industry • To get to know important and difficult problems that can drive their research

  4. Some lessons learned • Student placement should be guided by the advisor • Start early with a summer internship • Patience is needed in listening to problems from industry • Patience is needed in getting the IP problems resolved • Expect to do at least 50% more work than the funding provided • Tech transfer is a double-edged sword • Practical problems can give rise to respectable research papers • Short courses are ideal entry points

  5. Characteristics of the Systems being Studied Dependability (Reliability, Availability, Safety): • Redundancy: Hardware (Static, Dynamic), Information, Time • Fault Types: Permanent, Intermittent, Transient, Design • Fault Detection, Automated Reconfiguration • Imperfect Coverage • Maintenance: scheduled, unscheduled

  6. Characteristics of the Systems being Studied • Performance: • Resource Contention, Concurrency and Synchronization • Timeliness (Have to Meet Deadlines) • Composite Performance and Dependability: • Degradable Levels of Performance • Need Techniques and Tools that can Evaluate: • Systems with All the Characteristics Above • Explicitly Address Complexity

  7. MEASURES TO BE EVALUATED • Dependability • Reliability: R(t), System MTTF • Availability: Steady-state, Transient, Interval • Safety “Does it work, and for how long?” • Performance • Throughput, Loss Probability, Response Time “Given that it works, how well does it work?”

  8. MEASURES TO BE EVALUATED • Composite Performance and Dependability “How much work will be done (lost) in a given interval including the effects of failure/repair/contention?” • Need Techniques and Tools That Can Evaluate • Performance, Dependability and Their Combinations

  9. PURPOSE OF EVALUATION • Understanding a System • Observation: Operational Environment, Controlled Environment • Reasoning: A Model is a Convenient Abstraction

  10. PURPOSE OF EVALUATION • Predicting Behavior of a System: Need a Model; Accuracy Based on Degree of Extrapolation • All Models are Wrong; Some Models are Useful • Prediction is fine as long as it is not about the future

  11. Methods of Quantitative EVALUATION • Measurement-Based: Most believable, most expensive; not always possible or cost-effective during system design

  12. Methods of Quantitative Evaluation (Continued) • Model-Based: Less believable, less expensive 1. Discrete-Event Simulation vs. Analytic 2. State-Space Methods vs. Non-State-Space Methods 3. Hybrid: Simulation + Analytic (SPNP) 4. State Space + Non-State Space (SHARPE)

  13. Why MODEL? • Provides a framework for gathering, organizing, understanding and evaluating information about a system, e.g. Zitel, US&S, HP • A cost-effective means to evaluate a system, e.g. Boeing, US&S, HP, IBM, Motorola, Cisco, SUN

  14. Why MODEL? (continued) • Provides a means of evaluating a set of alternatives in a structured and quantitative manner, e.g. Zitel, DEC, HP • Sometimes needed due to legal and contractual obligations, e.g. FAA • Sometimes needed for business reasons: Motorola, SUN, Cisco

  15. Compare two CLIENT-SERVER Architectures (figure: Architecture 1 vs. Architecture 2)

  16. Compare Connection Reliabilities • Connection reliability R(t) is the probability that throughout the interval [0,t) at least one path exists from the client to the server on which all components are operational. • From R(t), the system mean time to failure can be computed as MTTF = ∫₀^∞ R(t) dt.
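A numerical illustration of this formula is sketched below. It assumes a hypothetical client-network-server connection with exponentially distributed component lifetimes, one architecture with a single network path and one with two redundant paths; the rates and architecture details are illustrative assumptions, not the configurations in the slides.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical component failure rates (per hour); illustrative only.
lam_client, lam_net, lam_server = 1e-4, 5e-4, 2e-4

def r_series(t, rates):
    """Reliability of a path whose components are all needed (series)."""
    return np.exp(-sum(rates) * t)

def r_arch1(t):
    # Assumed Architecture 1: a single client-network-server path.
    return r_series(t, [lam_client, lam_net, lam_server])

def r_arch2(t):
    # Assumed Architecture 2: two redundant, independent network paths
    # between the same client and server (parallel redundancy).
    one_path = r_series(t, [lam_net])
    nets = 1 - (1 - one_path) ** 2
    return r_series(t, [lam_client, lam_server]) * nets

for name, r in [("Architecture 1", r_arch1), ("Architecture 2", r_arch2)]:
    mttf, _ = quad(r, 0, np.inf)   # MTTF = integral of R(t) over [0, inf)
    print(name, "R(1000 h) =", round(r(1000.0), 4), " MTTF =", round(mttf, 1), "h")
```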

  17. Compare Connection Reliabilities

  18. Compare Connection Availabilities • Connection (instantaneous, transient or point) availability A(t) is the probability that at time t at least one path exists from the client to the server on which all components are operational. • A(t) ≥ R(t), and the limiting or steady-state availability is A = lim(t→∞) A(t).
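A minimal sketch of the steady-state version of this comparison, assuming each component is an independent two-state (up/down) repairable unit so that its availability is mu/(lambda + mu); the component names and rates below are illustrative, not taken from the slides.

```python
# Illustrative (failure rate 1/h, repair rate 1/h) pairs for a single path.
components = {
    "client":  (1e-4, 0.5),
    "network": (5e-4, 1.0),
    "server":  (2e-4, 0.25),
}

def steady_state_availability(fail_rate, repair_rate):
    # Two-state CTMC: A = mu / (lambda + mu) = MTTF / (MTTF + MTTR)
    return repair_rate / (fail_rate + repair_rate)

a_conn = 1.0
for name, (lam, mu) in components.items():
    a = steady_state_availability(lam, mu)
    print(f"{name}: A = {a:.6f}")
    a_conn *= a   # series connection: every component must be up

print("connection availability:", round(a_conn, 6))
print("downtime per year (minutes):", round((1 - a_conn) * 365 * 24 * 60, 1))
```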

  19. Compare Connection Availabilities

  20. MODELING THROUGHOUT SYSTEM LIFECYCLE • System Specification/Design Phase: Answer “What-if Questions” • Compare design alternatives (Zitel, HP, Motorola) • Performance-Dependability Trade-offs (DEC) • Design Optimization (wireless handoff)

  21. MODELING THROUGHOUT SYSTEM LIFECYCLE • Design Verification Phase: Use Measurements + Models, e.g. Fault Injection + Reliability Model (Union Switch and Signals, Boeing, Draper) • Configuration Selection Phase: DEC • System Operational Phase: Lucent • It is fun!

  22. CASE STUDY: ZITEL • Comparison of two different fault-tolerant RAMdisks. • The Stochastic Petri Net Package (SPNP) was used to model the reliability of the two systems.

  23. CASE STUDY: ZITEL • Trivedi worked with the designers directly: • Model validation was done using face validation and sanity checks. • Parameterization was easy due to the experience of the designers. • One difficult research problem originated from the study; it was subsequently solved and published in the Microelectronics and Reliability journal.

  24. CASE STUDY: VAXCLUSTER • Developed three models of the Processor Subsystem: • Two-Level Decomposition (IEEE-TR, Apr 89): inner level: 9-state Markov; outer level: n parallel diodes • A Detailed SPN Model (PNPM 89) • A Detailed SPN model for a Heterogeneous Cluster (Averesky book)

  25. CASE STUDY: VAXCLUSTER • Storage Subsystem Model: A fixed-point iteration over a set of Markov submodels. (IEEE-TR, to appear) • Observed that availability is maximized with 2 processors (HCSS 90) • Many interesting reliability, availability, performability measures computed
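The fixed-point iteration mentioned above can be shown in miniature. The sketch below assumes two hypothetical two-state subsystems that share a single repair facility; each submodel's solution feeds a parameter of the other, and the pair is iterated to a fixed point. It illustrates the technique only and is not the actual VAXcluster storage model.

```python
# Two 2-state Markov submodels coupled through a shared repairer (assumed example).
def down_probability(lam, mu_effective):
    # Two-state CTMC: P(down) = lambda / (lambda + mu_eff)
    return lam / (lam + mu_effective)

lam1, lam2, mu = 1e-3, 2e-3, 0.1   # illustrative rates (1/h)
p1, p2 = 0.0, 0.0                  # initial guesses for P(down)

for iteration in range(100):
    # Repair capacity seen by one subsystem shrinks while the other is down.
    new_p1 = down_probability(lam1, mu * (1 - p2))
    new_p2 = down_probability(lam2, mu * (1 - p1))
    if abs(new_p1 - p1) < 1e-12 and abs(new_p2 - p2) < 1e-12:
        break
    p1, p2 = new_p1, new_p2

print("converged after", iteration, "iterations")
print("P(subsystem 1 down) =", p1, " P(subsystem 2 down) =", p2)
```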

  26. Case Study: HP • Cluster Availability Modeling • Server Availability • Mass Storage Arrays Availability Modeling • Started with Markov chains via SHARPE • Progressed toward Stochastic Petri Nets and Stochastic Reward nets via SPNP

  27. CASE STUDY: LUCENT • A Validated Model of Hardware-Software Availability. • Worked with V. Mendiratta of Naperville. • The model is semi-Markov; solved using SHARPE. • Parameters collected from field data. • Model results validated against actual measurements.

  28. CASE STUDY: LUCENT, IBM, Motorola, SUN • Software Rejuvenation: • A technique to counter software “aging” and increase its availability to clients. • Evaluated the optimal rejuvenation interval that maximizes steady-state availability (minimizes expected cost) for the IBM cluster and the Motorola CMTS cluster. • Collected data from real systems to show aging and to determine proactive fault management strategies; work done in our lab and with SUN Microsystems.
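One way to see how an optimal rejuvenation interval arises is the renewal-reward sketch below. The Weibull aging distribution and the planned/unplanned downtimes are assumptions for illustration, not parameters or models from the IBM, Motorola, or SUN studies.

```python
import numpy as np
from scipy.integrate import quad

shape, scale = 2.0, 1000.0    # assumed Weibull aging (shape > 1 means aging)
d_fail, d_rejuv = 4.0, 0.25   # assumed hours of unplanned vs planned downtime

def reliability(t):
    return np.exp(-(t / scale) ** shape)

def availability(T):
    # Renewal cycle: run until failure or until rejuvenation at time T.
    uptime, _ = quad(reliability, 0.0, T)   # E[min(time to failure, T)]
    p_fail = 1.0 - reliability(T)           # failure before rejuvenation
    cycle = uptime + p_fail * d_fail + (1 - p_fail) * d_rejuv
    return uptime / cycle

candidates = np.linspace(50, 3000, 60)
best_T = max(candidates, key=availability)
print("best rejuvenation interval ~", round(best_T, 1), "h,",
      "availability =", round(availability(best_T), 6))
```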

  29. CASE STUDY: MOTOROLA • Availability & Performability Modeling: • Modeled several configurations of the Communication Enterprise Common Platform. • Practical approaches for approximating steady-state measures in large, repairable, and highly dependable systems: model decomposition, state-space truncation, etc. • Both SHARPE and SPNP were used.

  30. CASE STUDY: MOTOROLA • Recovery strategies in wireless handoff: • Proposed and modeled several strategies • A patent is being filed by Motorola • SPNP was used • A hierarchy of two-level models was used • Fixed-point iteration was used

  31. CASE STUDY: BELLCORE • Architecture-based software reliability: • proposed a methodology • applied the methodology to SHARPE • used Bellcore’s test coverage tool, ATAC, to parameterize the model • Bellcore is currently enhancing ATAC to incorporate our methodology

  32. CASE STUDY: DRAPER LAB • The overall aim was verification of a system with very high reliability/availability specifications; the prototype under consideration was the FTPP Cluster 3. • Hybrid approach proposed: • Fault-injection-based measurements. • Statistical analysis of measured data to enable parameterization of analytical models.

  33. CASE STUDY: DRAPER LAB • Reliability modeling of the prototype was done; parameterization was done with the aid of existing reliability databases. • Analytical solution provided exact closed-form expressions • Markov model solved using SHARPE • Petri net model solved using SPNP • Reliability bottlenecks found

  34. CASE STUDY: AT&T • GSHARPE: • A preprocessor to SHARPE developed at Bell Labs by a Duke student. • Users can specify Weibull failure times and lognormal and other repair-time distributions. • GSHARPE fits these to phase-type distributions and generates a Markov model for processing by SHARPE.

  35. CASE STUDY: BOEING • An Integrated Reliability Environment • A working prototype • Developed a high-level modeling language (SDM) • Designed and implemented an intelligent interpreter

  36. CASE STUDY: BOEING (Continued) • The interpreter determines which solution method is applicable • Five different modeling engines are integrated: • CAFTA, SETS, EHARP, SHARPE and SPNP.

  37. QUANTITATIVE EVALUATION TAXONOMY (figure): Closed-form solution vs. Numerical solution using a tool

  38. MODELING TAXONOMY

  39. STATE SPACE MODELING TAXONOMY

  40. ANALYTIC MODELING TAXONOMY • Non-State-Space Modeling Techniques: • Product-form queueing models • SP (series-parallel) reliability block diagrams • Non-SP reliability block diagrams
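A minimal sketch of the first technique named above, a series-parallel reliability block diagram, with illustrative component reliabilities:

```python
# Series-parallel reliability block diagram (RBD); component values are illustrative.
def series(*blocks):
    """All blocks must work."""
    r = 1.0
    for b in blocks:
        r *= b
    return r

def parallel(*blocks):
    """At least one block must work."""
    q = 1.0
    for b in blocks:
        q *= (1.0 - b)
    return 1.0 - q

cpu, mem, disk = 0.999, 0.995, 0.98
# Two mirrored disks in parallel, in series with the CPU and memory.
system = series(cpu, mem, parallel(disk, disk))
print("system reliability:", round(system, 6))
```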

  41. State Space Modeling Taxonomy • State space methods • Markovian modeling: discrete-time Markov chains, continuous-time Markov chains, Markov reward models • Non-Markovian modeling: semi-Markov models, Markov regenerative models, non-homogeneous Markov chains

  42. State-Space Based Models • Transition label: • Probability: (homogeneous) discrete-time Markov chain (DTMC) • Time-independent rate: homogeneous continuous-time Markov chain • Time-dependent rate: non-homogeneous continuous-time Markov chain • Distribution function: semi-Markov process • Two distribution functions: Markov regenerative process
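To make the homogeneous continuous-time case concrete, the sketch below solves a two-state CTMC (a single repairable unit) for its instantaneous availability A(t) using the matrix exponential; the rates are illustrative.

```python
import numpy as np
from scipy.linalg import expm

lam, mu = 1e-3, 0.1                 # illustrative failure and repair rates (1/h)
Q = np.array([[-lam,  lam],
              [  mu,  -mu]])        # infinitesimal generator (state 0 = up, 1 = down)

p0 = np.array([1.0, 0.0])           # start in the up state
for t in (10.0, 100.0, 1000.0):
    pt = p0 @ expm(Q * t)           # transient state probabilities at time t
    print(f"A({t:g}) = {pt[0]:.6f}")

print("steady-state A =", mu / (lam + mu))
```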

  43. IN ORDER TO FULFILL OUR GOALS OF • Modeling Performance, Dependability and Performability • Modeling Complex Systems We Need • Automatic Generation and Solution of Large Markov Reward Models

  44. IN ORDER TO FULFILL OUR GOALS OF • Facility for State Truncation, Hierarchical composition of Non-State-Space and State-Space Models, Fixed-Point Iteration • There are Two Tools that Potentially meet these Goals • Stochastic Petri Net Package (SPNP) • Symbolic Hierarchical Automated Rel. and Perf. Evaluator (SHARPE)

  45. MODELING SOFTWARE PACKAGES • HARP - Hybrid Automated Reliability Predictor (Duke Univ, funded by NASA Langley) • SAVE - System Availability Estimator (Duke Univ. funded by IBM) • SHARPE - Symbolic Hierarchical Automated Reliability and Performance Evaluator; installed at nearly 280 locations (GUI available) • SPNP - Stochastic Petri Net Package installed at nearly 120 locations (iSPN - GUI available) • D_RAMP for Union Switch and Signals by Duke, UVA and CMU • SDM - Boeing Integrated Reliability Modeling Environment (Jointly developed by Duke Univ., Univ. of Wash. and Boeing) • SDDS - Developed by Sohar with help from K. Trivedi • SREPT - Software Reliability Estimation and Prediction Tool

  46. Challenges in Modeling

  47. COMPLEXITIES OF MODELS • Large State Space • Model construction problem • Model solution problem • Model Stiffness: fast and slow rates acting together • Failure and Recovery/Repair • Performance and failure

  48. COMPLEXITIES OF MODELS • Modeling Non-Exponential Distributions • Combining performance and reliability • Believability/Understandability/Usability • Incorporation in the design process • Connection between measurements & models: • Parameterization • Validation

  49. LARGENESS TOLERANCE • Automated Model Construction • Stochastic Petri nets (GreatSPN, SPNP, SHARPE, DSPNexpress, ULTRASAN) • High level languages (SAVE, QNAP, ASSIST, SDM) • Fault-Tree + Recovery Info (HARP) • Object-Oriented Approaches (TANGRAM) • Loops in the specification of CTMC (SHARPE)

  50. LARGENESS TOLERANCE • Efficient numerical solution techniques • Sparse Storage • Accurate and Efficient Solution Methods • We have generated and solved models with 1,000,000 states (this number has gone up considerably recently) • Steady-State: near-optimal SOR • Transient: modified Jensen's method
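A small sketch of sparse steady-state solution: a birth-death CTMC is stored as a sparse generator and pi Q = 0 is solved with one balance equation replaced by the normalization constraint. The chain and its rates are illustrative, and a sparse direct solve stands in here for the near-optimal SOR solver mentioned in the slide.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

N = 100_000                          # number of states (illustrative)
lam, mu = 0.8, 1.0                   # birth and death rates (illustrative)

# Tridiagonal generator Q of the birth-death chain, stored sparsely.
main = np.zeros(N)
main[:-1] -= lam                     # every non-final state can move up
main[1:]  -= mu                      # every non-initial state can move down
Q = diags([main, np.full(N - 1, lam), np.full(N - 1, mu)],
          offsets=[0, 1, -1], format="csc")

# Solve pi Q = 0 with sum(pi) = 1: replace one balance equation
# of Q^T by the normalization row and use a sparse direct solver.
A = Q.T.tolil()
A[-1, :] = np.ones(N)
b = np.zeros(N)
b[-1] = 1.0
pi = spsolve(A.tocsc(), b)

print("P(state 0) =", pi[0], " (limit for large N:", 1 - lam / mu, ")")
```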
