ASM: Abnormal Situation Management. Defining the way things will be.
The birth of ASM...
ASM grew from an initial focus on alarm management. Most sites are aware that operator overload and alarm floods are common during abnormal operations. As we analyzed the issues around alarm management, we discovered that operator problems with the alarm system were only a symptom of a more general issue: the design, implementation, and maintenance of many facilities, systems, and practices.
ASM Consortium
• Charter: Research the causes of abnormal situations and create technologies to address this problem
• Deliverables: Technology, best practices, application knowledge, prototypes, metrics
• History: Started in 1994; co-funded by US Govt (NIST); budget: +$16M USD
• Current Status: Committed through 2002; Honeywell leadership; expanding membership
• Current Membership: University Affiliates; Brad Adams Walker Architecture, P.C.
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to meet these basic requirements for safe operation.
A Look At Plant Operations
[Figure: a typical production profile for an asset-intensive facility over a calendar year. Axes: days per year against daily production (< 60%, 95%, 100%), with the production target set by the enterprise marked. Bar labels as extracted: 95, 79, 62, 47, 23, 30, 16, 8, and 5 days.]
Factors Affecting Plant Operations
[Figure: production profile (days per year against daily production, < 60% to 100%) annotated with the plant capacity limit, plant operating target, planning constraints, operational constraints, plant availability, plant incidents, production effectiveness, asset utilization, and agility/flexibility.]
Real Life Examples
• This plant had $24.2M in lost capacity due to asset availability & incidents!
• This plant had 5.8% in lost capacity!
• This plant lost $38.5M!
• And this plant lost $33.5M!
NEW EMPHASIS!!
Site studies have identified plant lost opportunity: between 3% and 15% in lost capacity is attributed to asset unavailability and incidents.
[Figure: production profile (days per year against daily production, < 60% to 100%) showing the plant capacity limit, plant operating target, planning constraints, operational constraints, plant availability, and plant incidents. Callouts: Asset Management (Reliability & CMMS); Production Management (DCS/APC/optimization efforts); Manufacturing Execution (Scheduling & ERP).]
Major Profit Potential
Emphasis on plant & equipment reliability improvements and reduced incidents can result in a recovery of 3-15% of lost capacity!
[Figure: production profile (days per year against daily production, < 60% to 100%) with the plant capacity limit, a higher plant operating target, fewer planning constraints, and fewer operational constraints.]
The Importance of Alarm Management Improvement Project
Alarm management is the proper design, implementation, operation, and maintenance of industrial manufacturing plant alarm systems.
Current alarming practices are leading to incidents. The major problems are:
• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms
Technology exists to significantly contribute to effective alarm systems and provide good situation awareness.
A Case
The lightning struck just before 9:00 AM on a Sunday. It immediately started a fire in the crude distillation unit of the refinery. The control operators on duty responded by calling out the fire brigade, and then had to divert their attention to a growing number of alarms while desperately trying to bring the crude unit to a safe emergency shutdown. Hydrocarbon flow was lost to the deethanizer in the FCCU recovery section, which fed the debutanizer further along. The system was arranged to prevent total loss of liquid level in the two vessels, so the falling level in the deethanizer caused the deethanizer discharge valve to close. This, in turn, caused the level in the debutanizer to drop rapidly and its discharge valve also closed. Heat remained on the debutanizer and the trapped liquid vaporized as the pressure rose, causing the pressure relief valve to “pop” (for the first of three times) into the flare KO drum and then immediately onto the flare itself.
In a matter of minutes, the board operator was able to restore flow to the deethanizer. This permitted the deethanizer discharge valve to be opened, allowing renewed flow forward to the debutanizer. The rising level in the debutanizer should have caused the debutanizer discharge valve to open (by the level controller action) and allow flow on to the naphtha splitter. Although the operators in the control room received a signal indicating the valve had opened, the debutanizer was nonetheless filling rapidly with liquid while the naphtha splitter was emptying. The operators were concentrating on the displays which focussed on the problems with the deethanizer and debutanizer, and had no overview of the process available to indicate that, even though the debutanizer discharge valve registered as open, there was no flow going from the debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-logged about an hour later and the pressure relief valve lifted for the second time, venting to the flare via the flare KO drum. Because there were enormous volumes of gas venting, the level of liquid in the flare KO drum was rising to a very high value. About 2-1/2 hours later, the debutanizer vented to the flare a third time AND CONTINUED VENTING FOR 36 MINUTES. The high level alarm for the flare drum was activated at this time. But with alarms going off every 2 to 3 seconds, there appears to be no evidence that that alarm was ever seen. By this time, the flare KO drum had filled with liquid well beyond its design capacity. The fast-flowing gas through the overfilled drum forced liquid out of the drum’s discharge pipe. The discharge line was not designed for liquid, so the force of the liquid caused a rupture at an elbow. This released over 20 tons of highly flammable hydrocarbon.
The ensuing release quickly formed an ominous drifting cloud of vapor and droplets. In a matter of minutes, this cloud found its ignition source 350 feet downwind. The resulting explosion was heard 80 miles away. In the town nearest the plant, few windows still held intact panes, so overpowering was the pressure shock wave from the blast. The last fires in the refinery were eventually extinguished 2 days later.
Interface between the organization & the individual
[Figure: accident causation model linking the organization (management) to the individual (workplace) and the defences. Labels as extracted:]
• Failure chain: source failure types, functional failure types, general failure types, condition tokens, precursors, unsafe acts (errors & violations), accidents, incidents, near-misses
• Stylistic or cultural indicators (top down): commitment, competence, cognizance
• Feedback: data collected & analyzed, diagnostic and remedial measures, near miss, auditing, Du Pont, training, 1-10 hit list, proactive design, SI projects, best practices, safety information system
• Workplace factors: workspace, motivation, attitude, group factors, working practice, alarms, human factors, control room design
• Precursors listed: poor workplace design, high workload, unsociable hours, inadequate training, poor perception of hazards
Managing Abnormal Situations: Anatomy of a Disaster from Operations Perspective
[Figure: layered diagram relating plant states, critical systems, operational goals, plant activities, and operational modes. Labels as extracted:]
• Plant states: Normal, Abnormal, Out of Control, Accident, Disaster
• Critical systems: Process Equipment, DCS, Automatic Controls; Plant Management Systems; Decision Support System; DCS Alarm System; Safety Shutdown, Protective Systems, Hardwired Emergency Alarms; Physical and Mechanical Containment System; Site Emergency Response System; Area Emergency Response System
• Operational goals: Keep Normal, Return to Normal, Bring to Safe State, Minimize Impact
• Plant activities: Preventative Monitoring & Testing; Manual Control & Troubleshooting; Firefighting; First Aid; Rescue; Evacuation
• Operational modes: Normal, Abnormal, Emergency
Summarized Production Data
Unexpected upsets cost 3-8% of capacity: ~$10 billion annually in lost production!
[Figure: production profile (days per year against daily production, < 60% to 100%) with the plant capacity limit, plant operating target, planning constraints, operational constraints, and optimization efforts.]
Major Profit Potential
Focused efforts can result in recovery of 3-8% of capacity: ~$10 billion potential to the bottom line!
[Figure: production profile (days per year against daily production, < 60% to 100%) with the plant capacity limit, a higher plant operating target, fewer planning constraints, and fewer operational constraints.]
Fault Tolerance Time
Timing diagram of DIN V 19251 as applicable for a single-channel SRS with ultimate self-tests executed within the PST.
[Figure: time axis (t) with the following labels: failure occurrence in the process or in the safeguarding system; system internal diagnostic time; failure is detected; time for corrective action; time for reaction of the process on the corrective action; safe status of the process assured; fault tolerance time of the process, or Process Safety Time (PST).]
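One way to read the timing diagram is as a budget: detection, corrective action, and process reaction must all finish within the PST. The sketch below checks that budget under this reading. The function name, parameter names, and the example values are illustrative assumptions, not figures from DIN V 19251.

```python
def fits_within_pst(diagnostic_s, corrective_action_s, process_reaction_s, pst_s):
    """Return True if the three intervals, taken in sequence, fit within the PST.

    All arguments are durations in seconds. The names are hypothetical labels
    for the intervals shown on the timing diagram.
    """
    total_s = diagnostic_s + corrective_action_s + process_reaction_s
    return total_s <= pst_s


# Illustrative numbers only: 2 s to detect, 3 s to act, 4 s for the process to react.
print(fits_within_pst(2, 3, 4, pst_s=10))
print(fits_within_pst(5, 5, 5, pst_s=10))
```

The second call exceeds the budget, so under this reading the safe state would not be assured within the fault tolerance time.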
Reliability Requirements for Alarms
No special requirements; however, the alarm system should be operated, engineered, and maintained to the good engineering standards identified in the EEMUA Guide.
EEMUA Alarm Systems Guide, page 17
CONCEPT 1: RISK REDUCTION
[Figure: scale of increasing risk showing the actual remaining risk, the risk to meet the required level of safety, and the EUC risk. The necessary minimum risk reduction (ΔR) is contrasted with the actual risk reduction. The risk reduction achieved by all SRSs and external risk reduction facilities is split into the partial risk covered by E/E/PES SRSs, the partial risk covered by other technology SRSs, and the partial risk covered by external risk reduction facilities.]
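SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES
Columns: safety integrity level (SIL); demand mode of operation (average probability of failure to perform its design function on demand); continuous/high demand mode of operation (average probability of a dangerous failure per year).
• SIL 4: 10^-5 to < 10^-4 (demand mode); 10^-5 to < 10^-4 (continuous/high demand mode)
• SIL 3: 10^-4 to < 10^-3 (demand mode); 10^-4 to < 10^-3 (continuous/high demand mode)
• SIL 2: 10^-3 to < 10^-2 (demand mode); 10^-3 to < 10^-2 (continuous/high demand mode)
• SIL 1: 10^-2 to < 10^-1 (demand mode); 10^-2 to < 10^-1 (continuous/high demand mode)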
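As a small illustration of how the demand-mode column of the table can be used, the sketch below maps an average probability of failure on demand to a SIL band. The band values are copied from the slide; the function and constant names are hypothetical.

```python
# Demand-mode bands as listed on the slide:
# (lower bound, inclusive; upper bound, exclusive; SIL)
DEMAND_MODE_BANDS = [
    (1e-5, 1e-4, 4),
    (1e-4, 1e-3, 3),
    (1e-3, 1e-2, 2),
    (1e-2, 1e-1, 1),
]


def sil_for_pfd(pfd_avg):
    """Return the SIL whose band contains pfd_avg, or None if no band does."""
    for low, high, sil in DEMAND_MODE_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


print(sil_for_pfd(5e-3))  # falls in the 10^-3 to < 10^-2 band
print(sil_for_pfd(0.5))   # outside every band on the slide
```

Returning None for values outside the table keeps the lookup from implying a rating the slide does not define.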
Reliability requirements for alarms
EEMUA Alarm Systems Guide, page 17
The Setting of a High Pre-Trip Alarm
[Figure: alarmed variable against time during a fault, with markers A and B. Labels: limit of largest normal operational fluctuation; alarm setting; abnormal operating region; limit at which protection operates; maximum rate of change of alarmed variable during fault; time for operator to respond to alarm and correct fault.]
EEMUA Alarm Systems Guide, page 17
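The relationship between the labels on this figure can be sketched as simple arithmetic: a high alarm should sit above the largest normal fluctuation, and far enough below the protection limit that the operator can respond before the variable, moving at its maximum rate, reaches the trip. The function below is a minimal sketch under that reading. All names and the example numbers are assumptions for illustration, not values from the EEMUA guide.

```python
def alarm_setting_window(normal_limit, trip_limit, max_rate, response_time):
    """Return (lowest, highest) acceptable high-alarm setting, or None.

    normal_limit:  limit of largest normal operational fluctuation
    trip_limit:    limit at which protection operates
    max_rate:      maximum rate of change of the alarmed variable during a fault
    response_time: time for the operator to respond to the alarm and correct the fault
    Units must be consistent (for example, % and %/s and s).
    """
    # Highest setting that still leaves the operator enough time before the trip.
    highest = trip_limit - max_rate * response_time
    lowest = normal_limit
    if highest < lowest:
        # No setting satisfies both constraints at once.
        return None
    return (lowest, highest)


# Illustrative numbers only.
print(alarm_setting_window(80.0, 100.0, max_rate=0.5, response_time=20.0))
print(alarm_setting_window(80.0, 100.0, max_rate=2.0, response_time=20.0))
```

When the function returns None, the margin between normal operation and the trip limit is too small for any alarm setting to meet both constraints.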
[Figure: gas concentration (percentage of LEL, 0 to 120) against time after onset of fault (0 to 80 seconds). Curves: actual gas concentration and measured gas concentration, with the error between them. Levels marked: gas concentration prior to fault, normal operating level, set trip point, actual trip point, Lower Explosive Limit (LEL), explosion. Intervals marked after the fault occurs: sampling delay, sensor delay, error delay, shut down system delay.]
Redesign Choices
• Redesign - redesign the plant or its controls to provide greater margin between the normal operating limits & the trip limits. This is the most desirable solution but is often impractical or too expensive.
• Setting within normal operating limits - setting the alarm within the limits of normal operating fluctuations & accepting that spurious alarms will occur during large normal disturbances. This is ergonomically very undesirable and will tend to increase alarm rates and reduce the operator's confidence in the alarm system. In effect, it increases the Average Probability of Failure on Demand (PFDavg) for the alarm system as a whole.
• Setting nearer trip limits - setting the alarm closer to the trip limits and accepting that some fast transients will not be corrected by the operator before they reach the trip level. This will increase the production losses due to plant trips and, because there are more demands on the protection system, tend to make the plant less safe. It also implies an increased PFDavg for the alarm system.
EEMUA Alarm Systems Guide, page 17
Different Kinds of Events
[Figure: potential impact of initiating event against time for three kinds of event: abrupt/catastrophic, manageable, and insidious.]
Impact of DCS Alarm System: Awareness of Disturbances
With typical alarm systems, orienting begins after an event creates an abnormal plant state. The extent of the problem can impact the operator's ability to be fully aware of the locations of process disturbances. As disturbances propagate, the number of conditions to be aware of increases, as do the response requirements and the likelihood of missing important information.
[Figure: potential impact of initiating event against time. Labels: failure occurrence in the process or in the safeguarding system; failure is detected; point of operator awareness; correct intervention causes return to normal; safe status of the process assured; incident.]
Impact of DCS Alarm System: Management of Problems
• Standing alarms interfere with orientation
• Alarm floods delay evaluation
• Inadequate filtering interferes with action
[Figure: potential impact of initiating event against time. Labels: point of operator awareness; correct intervention causes return to normal; incident.]
Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: potential impact of initiating event against time, showing the average shift in awareness with decision support.]
Impact of Protection System
[Figure: impact of initiating event against time, from SAFE to UN-SAFE. Labels as extracted: high alarm; operator diagnostic time; high; emergency; emergency alarm; trip from SIS; trip; incident; loss, quality, profit; Process Safety Time; FTT.]
FTT = Fault Tolerance Time
[Figure: potential impact of initiating event against time for four operator responses: no response, incorrect, suboptimal, and best.]
Impact of Decision Support System: Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: potential impact of initiating event against time.]
ASM Alarm Management Solutions
• Education for management, engineers, technicians, and operators
• Alarm performance assessment
• Requirement for alarm optimization tools
• Alignment with company & EEMUA guidelines
• Alarm rationalization
• User interface design
• Decision support activities
Alarm Management Optimization Objectives
• Enhance operator effectiveness
• Avoid alarm floods
• Identify root causes
• Eliminate nuisance alarms
• Enhance profitability
• Reduce variability
• Maximize plant up time
• Prevent damage to equipment
• Reduce risk of injury to personnel and environmental incidents
Alarm Management Optimization: The Process
[Figure: process diagram. Step labels as extracted: develop plant alarm management standards & philosophy; collect data; change management; analyze; implement; identify enhancements; verify against standards.]
Alarm Management Optimization
Increase the effectiveness of the existing alarm system through proven methodology:
• Analyze existing system performance
• Assist in developing an alarm strategy and educating operations staff
• Rationalize existing alarm system
• Recommend and apply new alarm management software
Software listed on the slide:
• UserAlert
• Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
Before: 30 points account for ~85% of all alarms (100K)
After: 30 points account for ~52% of all alarms (2K)
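The "30 points account for ~85% of all alarms" figure is a top-contributor share, a metric that is straightforward to compute from an alarm journal. The sketch below assumes the journal has already been reduced to a list of point tags, one entry per alarm. The function name and the sample tags are hypothetical, and this is not a description of how the Optimization Suite calculates its metrics.

```python
from collections import Counter


def top_n_share(alarm_tags, n=30):
    """Return the fraction of all alarms raised by the n most frequent points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _tag, count in counts.most_common(n))
    return top_total / total


# Tiny made-up journal: three points, ten alarms.
journal = ["FI101"] * 6 + ["LI202"] * 3 + ["PI303"] * 1
print(f"{top_n_share(journal, n=1):.0%}")
print(f"{top_n_share(journal, n=2):.0%}")
```

A high share concentrated in a few points suggests that rationalizing those points first gives the largest reduction in alarm load.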
Optimization Suite… Alarm Rationalization
Alarm priority (class) is based on severity, level of impact, and time.
Available priority options in TPS:
• No Action
• Journal
• Print
• Print & Journal
• Low
• High
• Emergency
Optimization Suite… Alarm Rationalization
• Recommends alarm priorities based on plant philosophy: severity of impact, time to respond, trip point
• Electronically captures plant alarm management philosophy: time to respond rules definition, impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
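A rule-based priority recommendation of the kind described here can be sketched as a lookup on severity of impact and time to respond, with a manual override. Everything in the sketch is an assumption for illustration: the severity categories, the time thresholds, and the rule table are invented, and only the priority names (Journal, Low, High, Emergency) come from the TPS options listed earlier. It does not reproduce the Optimization Suite's actual rules.

```python
# Hypothetical rule table: (severity of impact, urgency band) -> priority.
PRIORITY_RULES = {
    ("minor", "long"): "Journal",
    ("minor", "medium"): "Low",
    ("minor", "short"): "Low",
    ("major", "long"): "Low",
    ("major", "medium"): "High",
    ("major", "short"): "High",
    ("severe", "long"): "High",
    ("severe", "medium"): "Emergency",
    ("severe", "short"): "Emergency",
}


def urgency_band(minutes_to_respond):
    """Bucket the time available to respond. Thresholds are made up."""
    if minutes_to_respond < 5:
        return "short"
    if minutes_to_respond < 15:
        return "medium"
    return "long"


def recommend_priority(severity, minutes_to_respond, override=None):
    """Return the recommended priority, honoring a manual override if given."""
    if override is not None:
        return override
    return PRIORITY_RULES[(severity, urgency_band(minutes_to_respond))]


print(recommend_priority("severe", 3))
print(recommend_priority("minor", 30))
print(recommend_priority("major", 10, override="Low"))
```

Keeping the rules in a table rather than in nested conditionals makes the plant philosophy easy to review and change in one place.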