Systems Prognostic Health Management EMIS 7305 March 28, 2006

Systems Engineering Program Systems Prognostic Health Management EMIS 7305 March 28, 2006 Christopher Thompson Senior Research Engineer Lockheed Martin Missiles and Fire Control Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.

Topic Outline • Introduction • Definitions • The Goal of Prognostic Health Management • PHM Stakeholders • PHM Modeling • Sensors • Prognostics Analysis Tools • Availability • Examples

Introduction Education B.S. in Electrical Engineering, SMU (1997) M.S. in Mechanical Engineering, SMU (2001) - Focus: Fatigue and Fracture Mechanics M.S. in Systems Engineering (one class remaining) - Focus: Reliability, Statistical Analysis Ph.D. in Applied Science (anticipated ~ 2008) - Proposed Dissertation Title: Sensor Optimization for Systems Prognostic-Diagnostic Health Management in an Unmanned Ground Combat Vehicle

Introduction Experience Lockheed Martin Missiles and Fire Control, Dallas TX Systems Engineer - Multifunction Utility/Logistics Equipment (MULE) Reliability Engineer - Army Tactical Missile System (TACMS) Lockheed Martin Aeronautics, Fort Worth TX Vehicle Systems - Prognostic Health Management - F-35 Joint Strike Fighter SMU School of Engineering - TA for Dr. Jerrell Stracener

Introduction Future Combat Systems MULE Program

Introduction Some keys to the successful fielding of the U.S. Army’s Future Combat Systems are: • Reducing the Logistics footprint • Increasing Availability • Reducing total cost of ownership • Implementing Performance Based Logistics • Improvements in the ‘ilities’ (RAM-T) • Reliability • Availability • Maintainability • Testability • Supportability

Some Definitions Prognostics - Of or relating to prediction; a sign of a future happening; a portent. Prognostics is the process of calculating and reporting an estimate of remaining useful life for a component, within sufficient time to repair or replace it before failure occurs.
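As a rough illustration of the remaining-useful-life estimate described above, the sketch below fits a straight line to a degrading health indicator and extrapolates it to a failure threshold. The linear degradation model, the function name `estimate_rul`, and the sample readings are assumptions made for illustration; they are not taken from the briefing.

```python
def estimate_rul(times, health, failure_threshold):
    """Extrapolate a least-squares line through (time, health) to the threshold.

    Returns the estimated time remaining after the last sample, or None when
    the fitted trend is not decreasing (no degradation to extrapolate).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_h = sum(health) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(times, health))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_t
    if slope >= 0:
        return None
    time_at_threshold = (failure_threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])


# Hypothetical readings: health index sampled every 10 operating hours.
hours = [0, 10, 20, 30]
health_index = [1.0, 0.9, 0.8, 0.7]
remaining = estimate_rul(hours, health_index, failure_threshold=0.5)
print(f"Estimated remaining useful life: {remaining:.1f} h")
```

A fielded system would also report the uncertainty of the estimate, since the definition requires the warning to arrive early enough to repair or replace the component.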

Some Definitions Prognostic Health Management (PHM) – The implementation of an integrated software and hardware system which monitors the health, status and performance of a vehicle or system, tracks consumables (oil, batteries, ammunition, filters, fuel, coolant…) and configuration (software versions, part history…), and determines remaining life of all safety and performance critical components, predicting failures before they occur, thereby enhancing logistics and maintenance activities. PHM consists of ‘on-board’ as well as ‘off-board’ components.

Some Definitions Diagnostics - The identification of a fault or failure condition of an element, component, sub-system or system, combined with the deduction of the lowest measurable cause of that condition through confirmation, localization, and isolation. • Confirmation is the process of validation that a failure/fault has occurred, the filtering of false alarms, and assessment of intermittent behavior. • Localization is the process of restricting a failure to a subset of possible causes. • Isolation is the process of identifying a specific cause of failure, down to the smallest possible ambiguity group.
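Localization and isolation can be pictured as narrowing a set of candidate causes. The sketch below assumes a single fault at a time and a table of test signatures (which tests are expected to fail for each component); the component names, test names and the function `ambiguity_group` are hypothetical.

```python
def ambiguity_group(signatures, failed_tests, passed_tests):
    """Return the components consistent with the observed test results.

    signatures maps each component to the set of tests expected to fail
    when that component (and only that component) has failed.
    """
    candidates = []
    for component, expected_failures in signatures.items():
        explains_failures = failed_tests <= expected_failures
        contradicts_passes = bool(expected_failures & passed_tests)
        if explains_failures and not contradicts_passes:
            candidates.append(component)
    return sorted(candidates)


# Hypothetical signatures for a small appliance.
SIGNATURES = {
    "heating_element": {"heat"},
    "thermostat": {"heat", "temp"},
    "timer": {"pop"},
    "power_cord": {"heat", "temp", "pop"},
}

# One failed test only localizes the fault to three candidates.
print(ambiguity_group(SIGNATURES, {"heat"}, set()))
# Additional test results isolate it to a single component.
print(ambiguity_group(SIGNATURES, {"heat", "temp"}, {"pop"}))
```

The size of the returned list is the ambiguity group size; more discriminating tests shrink it.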

Some Definitions Fault – A condition that renders an element unable to perform its required function at desired levels of performance, or in a degraded mode. Failure – The inability of a component, system or sub-system to perform its intended function as designed. Failure may be the result of one or more faults. Fault Tolerance – The design of a system so that it will continue to operate in a degraded or reduced level rather than failing completely, when some part of the system fails.

Some Definitions Failure Cascade – The result when a failure occurs in a system of interconnected components, where the successful operation of a component depends on the successful operation of a preceding component. In such a system, a failure can trigger the failure of successive parts, and potentially amplify the result or impact. Redundancy and fault tolerant design can reduce the criticality or impact of the cascade, but not necessarily prevent a failure.

Some Definitions Design Failures – These take place due to inherent errors or flaws in the system design. Infant Mortality Failures - These cause newly manufactured systems to fail, and can generally be attributed to errors in the manufacturing process, or poor material quality control. Random Failures - These can occur at any time during the entire life of a system. Electrical systems are more likely to fail in this manner. Wear Out Failures - As a system ages, degradation will cause systems to fail. Mechanical systems are more likely to fail in this manner.

Some Definitions One-To-One Redundancy - Each active component in a system has a redundant backup on standby. The active component is monitored at all times, and the standby component will activate if the primary component fails. Since the probability of both components failing at the same time is low, One-To-One Redundancy provides the highest level of availability, but at a considerable disadvantage of requiring double the size, weight, power and cost, while reducing reliability (more components which can fail).
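A short worked calculation may help reconcile the two claims above (higher availability, lower reliability). It assumes two identical, independent units with perfect switchover, each with availability a and probability r of surviving the period of interest; the value 0.9 is purely illustrative.

```latex
A_{\text{pair}} = 1 - (1 - a)^2, \qquad a = 0.9 \;\Rightarrow\; A_{\text{pair}} = 0.99
\\
P(\text{neither unit fails}) = r^2, \qquad r = 0.9 \;\Rightarrow\; r^2 = 0.81
```

Under these assumptions the function is more likely to be available, yet a failure somewhere in the pair, and therefore a maintenance action, is also more likely than with a single unit.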

Some Definitions N + X Redundancy – N components are required to perform a function, but the system is configured with N + X components. When any of the N components fails, one of the X modules activates. The advantage lies in reduced size, weight, power and cost of the system, in the case where X is smaller than N. In case of multiple component failures, this scheme provides lower system availability.
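The trade between spares and success probability can be sketched with the binomial k-out-of-n formula. This is a simplification: it treats all N + X units as identical, independent and equally stressed, with perfect switchover, which a real standby design may not satisfy. The function name and the numbers are illustrative assumptions.

```python
from math import comb


def k_of_n_reliability(k, n, r):
    """Probability that at least k of n independent units work, each with reliability r."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# N = 4 units required, each with an assumed reliability of 0.9.
for spares in (0, 1, 2):
    total = 4 + spares
    print(f"N=4, X={spares}: {k_of_n_reliability(4, total, 0.9):.4f}")
```

Setting k = 1 and n = 2 gives the one-to-one case for a single function.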

Some Definitions Load Sharing – Multiple components share a combined load. A higher level component manages load distribution, and monitors the health and status of the components. If one of the load sharing components fails, the load is re-distributed among the others, allowing for graceful performance degradation. In this scheme, there is almost no extra cost. The main disadvantage is that with multiple failures, system performance may degrade below an acceptable level.
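Graceful degradation under load sharing can be sketched as follows, assuming identical units, an even split of the load, and a hard capacity limit per unit. The function `share_load` and all figures are hypothetical.

```python
def share_load(total_load, unit_capacity, healthy_units):
    """Split a load evenly across healthy units, capped at each unit's capacity.

    Returns (load per unit, fraction of the total load actually carried).
    """
    if healthy_units == 0:
        return 0.0, 0.0
    per_unit = min(total_load / healthy_units, unit_capacity)
    delivered = per_unit * healthy_units
    return per_unit, delivered / total_load


# Three units rated at 40 each share a load of 90; units then fail one by one.
for healthy in (3, 2, 1):
    per_unit, fraction = share_load(90.0, 40.0, healthy)
    print(f"{healthy} healthy: {per_unit:.0f} per unit, {fraction:.0%} of load carried")
```

With these numbers the first failure already pushes the survivors to their limit, which is the multiple-failure weakness noted above.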

The Ultimate Goal of Prognostics The purpose of Prognostic Health Management is to repair systems before they fail, while maximizing useful life consumption, and to have the necessary parts, tools and maintainers waiting nearby to resolve the correct problem as quickly and efficiently as possible.

PHM Stakeholders

Systems Engineering’s Role in PHM • Requirements Development • System Integration • System Architecture • Interface Management • Risk Assessment • Performance Measures: TPMs & KPPs • System Modeling & Knowledge Integration • Functional Decomposition

PHM Requirements • The PHM system shall isolate X percent of all detected failures to a single component, within Y percent confidence interval. • The PHM system shall predict X percent of expected failures for the next Y hours of operation. • The PHM system shall predict all failures that can result in a Safety Critical Failure. • The PHM system shall incorporate sensors to assess platform health, status and performance. • The PHM system shall incorporate sensors to monitor platform consumables. • The PHM system shall record and store all sensor data in onboard memory.

The ‘Ilities’ & Product Support • Reliability - FMECA: Failure Modes, Effects & Criticality Analysis - FRACAS: Failure Reporting, Analysis & Corrective Action System - Measures: MTBF, MTBSA, MTBEFF, MTBUMA • Maintainability - Maintenance Ratio - Preventive Maintenance Checks - Condition Based Maintenance - Design for Maintainability • Availability - AO, AI, AA

The ‘Ilities’ & Product Support • Testability - Verification and Validation - Fault Insertion - Simulation • Supportability - Consumables Monitoring - Supply Planning and Prediction • System Safety - Single & Multiple Fault Tolerant Design - Safety Critical Failures - Human/Machine Interaction

PHM Modeling • eXpress Modeling Tool • Model Based Reasoning • Case Based Reasoning • Knowledge Bases • Prognostics Analysis Tools

eXpress Modeling Tool (diagram labels) • DATA MINING • DIAGNOSTIC, PROGNOSTIC & PHM DESIGN • SENSOR FUSION • REQUIREMENTS ANALYSIS • Run-Time Prognostic Health Management • Mission Assurance, Availability & Success • Performance Based Logistics • CONOPS, SPECS & LOGISTICS • RISK ASSESSMENT • LIFE CYCLE TRADE SPACE • FRACAS & FMECA DEVELOPMENT • BUSINESS CASES

Impact Technologies • Prognostics developed at Impact Technologies: - Gas Turbine Engines and Auxiliary Systems - Avionics PHM and Reasoning - Aircraft Actuators (EMA, EHA) - Switching Mode Power Supplies, GPS Receivers and Power Electronics - Generators and Electric Drive Systems - Bearings, Gears, Shafts, Drive Trains, and Clutches - Hydraulic, Lube Oil and Fuel Systems - Structures and Components - Diesel Engines

Impact Technologies • Prognostics modules have been developed and successfully tested on the following systems: - Pratt & Whitney F-100 engine on F-15 and F-22 - Engine, generator, lubrication system and gearbox on Honeywell F124 - Oil wetted components on GE F110-129, GE F404, Rolls Royce F405 - CH-47 T-55 engine and drive-train, and CH-60 intermediate gearbox - Blackhawk Carrier Plate Prognosis System - JSF Clutch Wear and Lift-Fan Prognosis System - Fuel system and Power generation system on DDG-class Navy Ships

Impact Technologies • A number of different techniques have been used in the development of these prognostics: • Analytical and stochastic physics of failure models • Advanced signal processing • Feature extraction methods • Health state estimation and prediction algorithms • Statistical reliability • Bayesian updating methods • Component damage accumulation models • Probabilistic remaining useful life estimation • Data driven modeling techniques

Model Based Reasoning Model Based Reasoning (MBR) is a qualitative scheme where a model of the system is combined with an inference engine that is able to accomplish fault detection and fault isolation. The qualitative model, or ‘Knowledge Base’, is used to describe the elements and components, interconnections, and input/output behavior of the system being diagnosed, and to establish an envelope of ‘correct behavior’. To accomplish diagnosis, the differences between the actual behavior of the system and the behavior predicted by the model are determined. The inference engine, using this comparison information, accomplishes the fault isolation task.
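The comparison step of MBR can be sketched as a residual check: predict each output from a model, subtract the prediction from the measurement, and flag anything outside the envelope. The briefing describes a qualitative model; the numeric model below, its coefficients, and the function `find_discrepancies` are simplifying assumptions for illustration.

```python
def find_discrepancies(model, inputs, measured, tolerance):
    """Compare measured outputs with model predictions.

    model maps an output name to a function of the inputs that returns the
    expected value; tolerance maps the same names to the allowed deviation
    (the envelope of correct behavior).
    """
    discrepancies = {}
    for name, predict in model.items():
        residual = measured[name] - predict(inputs)
        if abs(residual) > tolerance[name]:
            discrepancies[name] = residual
    return discrepancies


# Hypothetical heater model: current from Ohm's law, temperature from power.
model = {
    "current_a": lambda u: u["voltage_v"] / u["resistance_ohm"],
    "element_temp_c": lambda u: u["ambient_c"]
    + u["deg_c_per_watt"] * u["voltage_v"] ** 2 / u["resistance_ohm"],
}
inputs = {"voltage_v": 120.0, "resistance_ohm": 12.0, "ambient_c": 20.0, "deg_c_per_watt": 0.25}
measured = {"current_a": 10.2, "element_temp_c": 250.0}
tolerance = {"current_a": 0.5, "element_temp_c": 25.0}

print(find_discrepancies(model, inputs, measured, tolerance))
```

An inference engine would then reason over which components could explain the flagged outputs while leaving the others within tolerance.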

Case Based Reasoning Case Based Reasoning (CBR) is the process of solving problems based on past understanding of similar problems. The vast majority of this type of information is contained within the maintainers and operators – the experience and knowledge of the person using the system in question. CBR compares a case, forms an implicit generalization of the case, and then identifies commonalities between a retrieved case and the target problem.
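The retrieval step of CBR can be sketched as a nearest-case lookup. The example below scores past cases by Jaccard similarity between symptom sets, which is one of several possible measures and is chosen here only for brevity; the case base and the names `jaccard` and `retrieve_case` are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_case(case_base, symptoms):
    """Return the stored case whose symptoms best match the new problem."""
    return max(case_base, key=lambda case: jaccard(case["symptoms"], symptoms))


# Hypothetical maintenance history.
CASE_BASE = [
    {"symptoms": {"no_heat", "lever_latches"}, "fix": "replace heating element"},
    {"symptoms": {"lever_wont_latch"}, "fix": "replace latch solenoid"},
    {"symptoms": {"burnt_toast", "no_pop"}, "fix": "replace timer"},
]

best = retrieve_case(CASE_BASE, {"no_heat", "lever_latches", "burning_smell"})
print(best["fix"])
```

Capturing maintainer and operator experience in such a case base is the hard part; the lookup itself is simple.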

Knowledge Bases (diagram labels) • ‘inorganic’ sensor data • off-board prognostic trend analysis • ‘organic’ sensor data • KNOWLEDGE BASE • FMECA data • fault/failure propagation • system level interactions • functional interdependencies • physical interdependencies • design knowledge • prognostic trend analysis • CAD models • circuit layouts • Database Management: Data Mining & Feature Extraction • subsystem/LRU internal sensor data • sensor fusion and signal conditioning • BIT data • consumables monitors • maintainer inputs

Prognostic Analysis Tools • Learning Systems & Artificial Intelligence • Genetic Algorithms • Expert Systems • Fuzzy Logic • Neural Networks • Database Techniques • Feature Extraction • Data Mining • Mathematical Techniques • Kalman Filtering • Dempster-Shafer Method • Wavelets • Statistical Analysis • Chaos Math?
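To make one of the listed techniques concrete, here is a minimal scalar Kalman filter for smoothing a noisy sensor reading. It assumes a random-walk state model with known process and measurement variances; the function name and parameter values are illustrative and not drawn from the briefing.

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter for a slowly drifting quantity (random-walk model)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var        # predict: uncertainty grows between samples
        gain = p / (p + meas_var)  # weight given to the new measurement
        x = x + gain * (z - x)     # update the estimate toward the measurement
        p = (1 - gain) * p         # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Hypothetical noisy temperature readings around a slowly rising value.
readings = [70.2, 69.8, 70.5, 71.1, 70.9, 71.6]
print([round(v, 2) for v in kalman_1d(readings, 0.01, 0.25, readings[0], 1.0)])
```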

Prognostic Analysis Tools • Traditional Academic Solutions to PHM: • Run-to-Failure analysis of large, expensive systems, such as ship or rail engines • Analysis involves impractical, complex math models that require years of training to understand and interpret • Very expensive • Time consuming process • Rarely offer concrete design guidelines or solutions

Prognostic Analysis Tools • Why Engineers in Industry Need More: • We have bottom lines and schedules to meet! • We have customer requirements to satisfy! • Systems Engineers work with designers who don’t like impractical, complex math models that require years of training to understand and interpret! • We have program managers who don’t like very expensive, time consuming solutions! • We like concrete design guidelines and solutions!

Sensor Technology • BIT/BITE • Sensor Fusion and Virtual Sensors • Sensor Conditioning and Filtering • Smart Sensors

Availability Analysis • Availability, Achieved where MTBF = Mean Time Between Failure MTTR = Mean Time To Repair
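The formula itself is not reproduced in the text above. Assuming the usual steady-state ratio built from the two terms this slide defines, it would read as follows (a reconstruction under that assumption):

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
```

Many references call this ratio inherent availability (AI) and reserve achieved availability (AA) for a form based on mean time between maintenance, so the label used here may differ from other texts.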

Availability Analysis • Availability, Operational where MTBUMA = Mean Time Between Unscheduled Maintenance Actions ALDT = Administrative Logistical Down Time MTTR = Mean Time To Repair
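The formula is not reproduced in the text above. Assuming uptime is measured by MTBUMA and downtime is the sum of the two remaining terms this slide defines, a consistent form would be:

```latex
A_O = \frac{\mathrm{MTBUMA}}{\mathrm{MTBUMA} + \mathrm{ALDT} + \mathrm{MTTR}}
```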

Availability Analysis • MTBUMA = Mean Time Between Unscheduled Maintenance Actions where MTBF = Mean Time Between Failures MTBM = Mean Time Between Maintenance
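The formula is not reproduced in the text above. One plausible form, assuming that unscheduled maintenance actions arise from three independent sources (true failures, induced damage, and no-defect removals) whose rates add, is:

```latex
\mathrm{MTBUMA} = \frac{1}{\dfrac{1}{\mathrm{MTBF}} + \dfrac{1}{\mathrm{MTBM}_{\text{induced}}} + \dfrac{1}{\mathrm{MTBM}_{\text{no defect}}}}
```

The subscripted MTBM terms follow the induced and no-defect categories named elsewhere in this briefing.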

Availability Analysis • How can we improve AO? - By decreasing Administrative & Logistical Down Time (ALDT) - By increasing Mean Time Between Failures (MTBF) - By decreasing Mean Time To Repair (MTTR) - By increasing Mean Time Between Unscheduled Maintenance Actions (MTBUMA) – [by increasing MTBM induced and MTBM no defect]
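The effect of each lever on AO can be explored numerically. The sketch below assumes AO = MTBUMA / (MTBUMA + ALDT + MTTR) and that failure, induced and no-defect maintenance rates add; both forms are assumptions consistent with the terms defined in this briefing, and every number is hypothetical.

```python
def mtbuma(mtbf, mtbm_induced, mtbm_no_defect):
    """Combine three independent sources of unscheduled maintenance (rates add)."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def operational_availability(mtbuma_hours, aldt_hours, mttr_hours):
    """Uptime over uptime plus logistics delay plus repair time."""
    return mtbuma_hours / (mtbuma_hours + aldt_hours + mttr_hours)


# Hypothetical baseline, all values in hours.
base = operational_availability(mtbuma(200.0, 1000.0, 1000.0), aldt_hours=10.0, mttr_hours=2.0)
less_aldt = operational_availability(mtbuma(200.0, 1000.0, 1000.0), aldt_hours=5.0, mttr_hours=2.0)
fewer_false_alarms = operational_availability(mtbuma(200.0, 1000.0, 4000.0), aldt_hours=10.0, mttr_hours=2.0)

print(f"baseline           AO = {base:.4f}")
print(f"halved ALDT        AO = {less_aldt:.4f}")
print(f"fewer false alarms AO = {fewer_false_alarms:.4f}")
```

Running variations like these shows which lever buys the most availability for a given system before committing design effort.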

Availability Analysis • How can we decrease ALDT? - By improving Logistics Improve scheduling of inspections Improve commonality of parts Decrease time to get replacements - By improving Prognostics Replace parts before they fail, not after Maximize use of component life Improve off-board prognostics trending More sensors!!

Availability Analysis • How can we increase MTBF? - By improving Reliability Select more rugged components Improve life screening and testing Improve thermal management - By improving Quality Better parts screening Better manufacturing processes - By adding Redundancy At the cost of Size, Weight and Power!

Availability Analysis • How can we decrease MTTR? - By improving Maintainability Improve quality and efficacy of training Simplify fault isolation Decrease number of tools and special equipment Decrease access time (panels, connectors…) Improve Preventive Maintenance - By improving Diagnostics Improve BIT and BITE Decrease ambiguity group size Improve maintenance manuals and training

Availability Analysis • How can we increase MTBM (induced/no defect)? - By improving Safety Limit the potential for accidental damage - By improving Prognostics Improve PHM models to monitor induced damage - By improving Diagnostics Lower the false alarm rate Don’t repair/replace things which aren’t broken!

Sensor Example Engine Health/Performance Monitoring: Place an acoustic sensor on the engine housing. Establish ‘nominal’ operating parameters. Develop a library relating fault precedents (odd sounds which warn of impending failure) to failures. Monitor for ‘out of nominal’ acoustic signature.
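The monitoring steps above can be sketched with a single acoustic feature. The example assumes RMS amplitude per frame as the feature and a mean plus or minus k standard deviations band as the ‘nominal’ envelope; a real system would more likely use spectral features and a fault library. All names and sample values are hypothetical.

```python
import math
import statistics


def rms(frame):
    """Root-mean-square amplitude of one frame of acoustic samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def build_baseline(nominal_frames):
    """Mean and sample standard deviation of RMS over known-good frames (needs >= 2)."""
    values = [rms(frame) for frame in nominal_frames]
    return statistics.mean(values), statistics.stdev(values)


def is_out_of_nominal(frame, baseline, k=3.0):
    """Flag a frame whose RMS lies more than k standard deviations from nominal."""
    mean, std = baseline
    return abs(rms(frame) - mean) > k * std


# Hypothetical frames recorded while the engine was known to be healthy.
nominal = [
    [1.0, -1.0, 1.0, -1.0],
    [1.1, -1.1, 1.1, -1.1],
    [0.9, -0.9, 0.9, -0.9],
]
baseline = build_baseline(nominal)
print(is_out_of_nominal([2.0, -2.0, 2.0, -2.0], baseline))      # much louder frame
print(is_out_of_nominal([1.05, -1.05, 1.05, -1.05], baseline))  # within normal scatter
```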

PHM Example Consider a toaster: Not just any toaster, but the toaster on the first mission to Mars. NASA could only afford to send one, and it must work, every time, or else the astronauts won’t have toast. The toaster must also not endanger the mission by causing a safety hazard or wasting bread. Mission Critical Function: - make toast Safety Critical Functions: - don’t injure the astronauts - don’t damage the spaceship - don’t burn the toast!

PHM Example • Identify the elements of a toaster. • What are the failure modes? • What should we monitor for safety hazards? • What elements should we monitor for diagnostics? • What data should we collect for prognostics? • How would we optimize the sensor coverage and data collection?

Issues Related to PHM • Continually monitoring sensors and storing all that data for analysis will quickly consume available bandwidth and storage space. • Capturing ‘profound knowledge’ of a complex engineered system and its myriad failure modes is very difficult, and involves integrating knowledge which crosses discipline boundaries: SE, EE, ME, RAM-T, Safety, Software, Math, Statistics, Physics… • Prognostic analysis of data is a very difficult problem, with no easy or universal solution. • PHM is a relatively new field.

Final Remarks • Do I have any practical PHM suggestions? - Aim for the low hanging fruit Use the sensors you already have in creative ways. Only add sensors when you must. You can’t monitor everything, so don’t try. - Don’t reinvent the wheel Build on others’ work and experience. Find good tools to design your system.

Additional Prognostic Analysis Tool
