MAX (MYRRHA Accelerator eXperiment) – MAX School: Reliability Basics with Focus on the MYRRHA Linac Case
Adrian Pitigoi – EA (Spain)
A. Reliability Basics – Concepts & Definitions
B. Common techniques in Reliability Analysis
C. Modeling High-power Accelerators Reliability
   - SNS Linac case (SNS-ORNL)
   - MYRRHA Linac (MAX project)
A. Reliability Basics – Concepts & Definitions

Failure and repair
• Failure: the change from a functioning state to a failed state.
• Repair: the change from a failed state back to a functioning state.
• Repairing brings the component or system back to an “as good as new” condition.
• For a repairable system, the cycle repeats continually through the repair-to-failure and failure-to-repair processes.

Reliability analysis objectives
• Evaluate the failure rate of components and the overall system reliability.
• Evaluate design feasibility and compare design alternatives.
• Identify potential failure areas and track reliability improvement.

Reliability / Unreliability
• Reliability, R(t): probability that the component or system experiences no failures during the time interval 0 to t, given that it was in new condition or functioning at time zero.
• Unreliability, F(t): probability that the component or system experiences its first failure, or has failed one or more times, during the interval 0 to t, given that it was operating or repaired to a like-new condition at time zero.
• Both reliability and unreliability are probabilities, with values from 0 to 1.
• R(t) + F(t) = 1, so F(t) = 1 – R(t).

Availability / Unavailability
• Availability, A(t): probability that the component or system is operating at time t, given that it was operating at time zero.
• Unavailability, Q(t): probability that the component or system is not operating at time t, given that it was operating at time zero.
• A component or system must be either operating or not operating at any time, so A(t) + Q(t) = 1.
• For repairable items, the unavailability never exceeds the unreliability: Q(t) ≤ F(t).
Failure Rates
• Conditional failure rate or failure intensity, λ(t): the anticipated number of times an item will fail in a specified time period, given that it was as good as new at time zero and is functioning at time t.
• It is a calculated value that provides a measure of reliability for a product, normally expressed as failures per million hours (fpmh, i.e. failures per 10^6 hours).
• Example: a component with a failure rate of 2 failures/10^6 h is expected to fail 2 times in a million-hour time period.

Basic categories of failure rates:
• Mean time between failures (MTBF): the basic measure of reliability for repairable items. It is the expected time between two consecutive failures of a repairable component, assembly or system, under the condition of a constant failure rate: MTBF = 1/λ, with λ constant. It is a commonly used variable in reliability and maintainability analyses.
• Mean time to failure (MTTF): used for non-repairable systems.
• Mean time to repair (MTTR): the total amount of time spent performing all corrective or preventive maintenance repairs divided by the total number of those repairs. It is the expected span of time from a failure (or shutdown) to the completion of the repair or maintenance. This term is typically used only with repairable systems.
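The relation MTBF = 1/λ and the 2 failures/10^6 h example can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative and not taken from any reliability tool, and a constant failure rate is assumed throughout.

```python
def fpmh_to_per_hour(fpmh: float) -> float:
    """Convert failures per million hours (fpmh) to failures per hour."""
    return fpmh / 1e6


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """MTBF = 1/lambda; only meaningful for a constant failure rate."""
    if failure_rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate_per_hour


def expected_failures(failure_rate_per_hour: float, hours: float) -> float:
    """Expected number of failures in a period, for a constant failure rate."""
    return failure_rate_per_hour * hours


# Example from the slide: 2 failures per 10^6 hours.
lam = fpmh_to_per_hour(2.0)
print(f"MTBF = {mtbf_hours(lam):,.0f} h")
print(f"Expected failures in 1e6 h = {expected_failures(lam, 1e6):.1f}")
```

With these numbers the MTBF comes out at about half a million hours, consistent with two expected failures per million hours.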
Failure Frequencies and Relationships Between Failure Parameters
• Failure density, f(t): probability per unit time that the component or system experiences its first failure at time t, given that it was operating at time zero.
• Failure rate, r(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating at time zero and has survived to time t.
• Conditional failure intensity (CFI) or conditional failure rate, λ(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating, or was repaired to be as good as new, at time zero and is operating at time t.
• Unconditional failure intensity or failure frequency, ω(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating at time zero.
• Difference between r(t) and λ(t): the failure rate definition addresses the first failure of the component or system rather than any failure.
• Difference between λ(t) and ω(t): the CFI has the additional condition that the component or system has survived to time t.
• For most reliability and availability studies the unavailability Q(t) of components and systems is very much less than 1. In such cases ω(t) ≈ λ(t), since ω(t) = λ(t)[1 – Q(t)].
Constant failure rates
• If the failure rate is constant, the failure density follows an exponential distribution.

Repairable and Non-repairable Items
Non-repairable items
• Components or systems such as a light bulb, transistor, rocket motor, etc.
• Reliability is the survival probability over the item's expected life, or over a specific period of time during its life, when only one failure can occur.
• The instantaneous probability of the first and only failure is called the hazard rate or failure rate, r(t).
• Life values such as MTTF are used to characterize non-repairable items.
Repairable items
• Reliability is the probability that failure will not occur in the time period of interest; when more than one failure can occur, reliability can be expressed as the failure rate, λ.
• Reliability can be characterized by MTBF, but only under the condition of a constant failure rate.
• Availability, A(t), is affected by the rate of occurrence of failures (failure rate λ, or MTBF) and by maintenance time.
• A(t) is the probability that an item is in an operable state at any time.
• Maintenance can be corrective (repair) or preventive (reducing the likelihood of failure).
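For the constant failure rate case, the standard textbook expressions are R(t) = exp(-λt), F(t) = 1 - exp(-λt) and f(t) = λ exp(-λt). The sketch below evaluates them in Python; the numerical value of λ is a hypothetical example, not SNS or MYRRHA data.

```python
import math


def reliability(lam: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-lam * t)


def unreliability(lam: float, t: float) -> float:
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(lam, t)


def failure_density(lam: float, t: float) -> float:
    """f(t) = lambda * exp(-lambda * t), the exponential density."""
    return lam * math.exp(-lam * t)


lam = 2e-6  # hypothetical failure rate, failures per hour
for t in (1e3, 1e5, 1.0 / lam):
    print(f"t = {t:>9.0f} h  R = {reliability(lam, t):.4f}  "
          f"F = {unreliability(lam, t):.4f}  f = {failure_density(lam, t):.3e}")
```

Note that at t = MTBF = 1/λ the reliability is exp(-1), about 0.37: an item with a constant failure rate is more likely than not to have failed by the time its MTBF has elapsed.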
Redundancy
• Redundancy: the existence of two or more means, not necessarily identical, for accomplishing a given single function.
• Active redundancy (active standby / hot standby)
  - All items operate simultaneously in parallel.
  - There is no change in the failure rate of the surviving item after the failure of a companion item.
• Standby redundancy
  - Alternate items are activated upon failure of the first item.
  - Only one item is operating at a time to accomplish the function.
• Warm standby
  - Normally active or operational, but not under load.
  - The failure rate is lower because of the lower stress.
• Cold standby (passive)
  - Normally not operating.
  - Failure of an item forces the standby item to start operating.
• k-out-of-n systems
  - Redundant system of n items in which k of the n items must function for the system to function (voting decision).

Active, standby and passive redundancy
• Redundant components can be fully activated (active), partially activated (standby) or switched off completely (passive).
• A mix of the above activity levels is also possible.
• Certain failure modes of one component (short-circuit, major leakage, etc.) could lead to system failure.
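The k-out-of-n definition above can be made concrete with the binomial formula. This is a sketch under stated assumptions: the n items are identical, fail independently and are all active, and each has the same reliability r over the mission time. The function name is illustrative.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent items work.

    r is the reliability of a single item over the mission time.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


r = 0.9  # hypothetical single-item reliability
print(f"1-out-of-2 (simple parallel): {k_out_of_n_reliability(1, 2, r):.4f}")
print(f"2-out-of-3 (voting):          {k_out_of_n_reliability(2, 3, r):.4f}")
print(f"3-out-of-3 (series):          {k_out_of_n_reliability(3, 3, r):.4f}")
```

The two limiting cases are useful sanity checks: k = 1 reduces to a pure parallel arrangement and k = n to a pure series arrangement.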
B. Common techniques in Reliability Analysis

Reliability block diagrams
• A reliability block diagram shows the system reliability structure. It is made up of individual blocks, and each block corresponds to a system module or function.
• Blocks in a series or parallel structure can be merged into a new block whose reliability is given by equations a) and b).
• Advantage: ease of reliability expression and evaluation. It is a common system reliability analysis tool and is mission-success oriented.
[Figures: five parallel-series connected modules; k-out-of-n configuration; the merged blocks]
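Merging series and parallel blocks can be sketched with the standard textbook formulas: a series structure multiplies the block reliabilities, and a parallel structure multiplies the block unreliabilities. The five-module layout used below is a hypothetical arrangement chosen for illustration, since the slide's diagram is not reproduced here, and independent block failures are assumed.

```python
from math import prod


def series(reliabilities):
    """Series structure: every block must work, R = product of R_i."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Parallel structure: at least one block works, R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical layout: (M1 in series with M2) in parallel with
# (M3 in series with M4), and the merged block in series with M5.
r = {"M1": 0.9, "M2": 0.9, "M3": 0.9, "M4": 0.9, "M5": 0.9}

branch_a = series([r["M1"], r["M2"]])
branch_b = series([r["M3"], r["M4"]])
merged = parallel([branch_a, branch_b])
system = series([merged, r["M5"]])

print(f"merged parallel block: {merged:.4f}")
print(f"system reliability:    {system:.5f}")
```

Each merge step replaces a group of blocks with a single equivalent block, which is how a larger diagram is reduced one step at a time.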
Fault Tree Analysis
• Fault tree analysis is a common tool in system safety analysis. It has been adapted to a range of reliability applications and is mission-failure oriented.
• A fault tree diagram is the underlying graphical model in fault tree analysis. The fault tree shows which combinations of component failures will result in a system failure; it represents the logical 'AND' and 'OR' relationships among the failure events.
• The status of the output (top) event can be derived from the status of the input events and the connections of the logic gates.
• A fault tree diagram can describe the fault propagation in a system.
[Figure: fault tree for five modules]
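Deriving the top event status from the input events and the gate connections can be sketched as a small recursive evaluator. The tree below is a hypothetical five-module example (not the tree shown on the slide), and the nested-tuple representation is an illustrative choice rather than the format of any fault tree tool.

```python
def evaluate(node, failed):
    """Return True if the event represented by `node` occurs.

    A node is either a basic event name (str) or a tuple (gate, children),
    where gate is "AND" or "OR". `failed` is the set of failed basic events.
    """
    if isinstance(node, str):
        return node in failed
    gate, children = node
    results = [evaluate(child, failed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


# Hypothetical tree: the system fails if M5 fails, or if both redundant
# branches fail (branch A = M1 or M2, branch B = M3 or M4).
TOP = ("OR", [
    "M5",
    ("AND", [("OR", ["M1", "M2"]), ("OR", ["M3", "M4"])]),
])

print(evaluate(TOP, {"M1"}))        # one branch lost, system still up
print(evaluate(TOP, {"M1", "M3"}))  # both branches lost
print(evaluate(TOP, {"M5"}))        # single point of failure
```

An 'AND' gate corresponds to redundancy (all inputs must fail), while an 'OR' gate corresponds to a series dependency (any input failure propagates upward).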
C. Modeling High-power Accelerators Reliability- SNS Linac case (SNS-ORNL) - MyrrhaLinac (MAX project)
1. SNS Linac Modeling
[Figure: layout of the SNS Linac]
Objective: obtain feedback on the actual SNS reliability performance, in order to develop a reliability modeling tool for the MAX project.
Activities:
• Selection of the accelerator to be used for modeling (SNS)
• SNS design and reliability data collection
• Development of the SNS Linac RS reliability model
• Reliability analysis of the SNS Linac systems
Targets:
• Evaluate the SNS Linac model (model results vs. SNS operational data)
• Draw conclusions and recommendations on optimization and on increasing reliability
2. SNS Model - Input Data
[Figure: SNS BlockSim model]
SNS Design Data
• SNS main and auxiliary systems
• Number of components (by type)
• Data sources: SNS RAMI Static Model; SNS BlockSim model (ReliaSoft)
SNS Systems and Functions
• SNS parameters
• Systems and components
• System functions and interfaces
• Data sources: SNS website (http://neutrons.ornl.gov/facilities/SNS/, http://neutrons.ornl.gov/facilities/SNS/works.shtml); SNS Parameters (doc no. SNS 100000000-PL001R13, http://neutrons.ornl.gov/media/pubs/pdf/sns_parameters_list_june05.pdf); SNS Design Control Documents (DCD)
SNS Reliability Data
• Number of components (by type)
• Degree of redundancy
• Failure data: λ = 1/MTTF; MTTR (λ: failure rate; MTTF: mean time to failure; MTTR: mean time to repair)
• Data sources: RAMI Static Model; SNS BlockSim model
SNS Operating Status
• Component failures: cause, type of component, time to repair, etc.
• Availability data (component failures causing accelerator trips: cause, component and system concerned, duration of trip)
• Data sources: SNS operation data collection (http://status.sns.ornl.gov/beam.jsp)
3. Modeling Methodology
General assumptions
• SNS systems and components not modeled: Ring, RTBT, stripper foil, etc. (considered not relevant for MAX project purposes).
• The Risk Spectrum Type 1 reliability model (repairable, continuously monitored components) is used for all SNS Linac components.
• Failure and repair processes follow exponential distributions; failure and repair rates are constant.
• It is assumed that q = 0.
• λ = 1/MTTF (failure rate); µ = 1/MTTR (repair rate). The MTTF and MTTR data are taken from the BlockSim model.
• The "Mean Unavailability" type of calculation is used to obtain the unavailability values for the basic events: Q = λ/(λ + µ). The long-term average unavailability Q was calculated for each basic event.
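The mean unavailability formula Q = λ/(λ + µ) can be evaluated directly from MTTF and MTTR. The sketch below uses hypothetical MTTF and MTTR values rather than entries from the SNS BlockSim model, and the function name is illustrative.

```python
def mean_unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-term average unavailability Q = lambda / (lambda + mu).

    lambda = 1/MTTF is the constant failure rate and mu = 1/MTTR the
    constant repair rate, so Q is also equal to MTTR / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours <= 0:
        raise ValueError("MTTF and MTTR must be positive")
    lam = 1.0 / mttf_hours
    mu = 1.0 / mttr_hours
    return lam / (lam + mu)


# Hypothetical basic event: MTTF = 1000 h, MTTR = 4 h.
q = mean_unavailability(1000.0, 4.0)
print(f"Q = {q:.3e}, A = {1.0 - q:.4f}")
```

Because Q depends only on the ratio MTTR/MTTF, halving the repair time improves the mean unavailability about as much as doubling the time to failure when Q is small.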
4. SNS Reliability Model - Fault Tree Model
SNS Module 1, first modeling step: RFQ + MEBT + DTL
• Gradual development of the SNS Linac model
• In-depth understanding of the SNS design and functioning is needed for an accurate model.
4. SNS Reliability Model - Fault Tree Model
• SNS fault tree (complete model): a graphical representation of the functional structure of the SNS systems, describing undesired events ("system failures") and their causes.
• The fault tree consists of logic gates and basic events.
• A fault tree can be subdivided into several fault tree pages, bound together using transfer gates.
4. Modeling the SNS Linac SNS Linac Fault Tree Structure - Main levels of the fault trees - major parts of the SNS accelerator (Ion Source, LEBT, RFQ, MEBT, DTL-CCL-SCL, HEBT, CONV - auxiliary systems)
4. Modeling the SNS Linac • DTL RF Fault Tree Structure
4. Modeling the SNS Linac • CCL Transmitter Fault Tree Structure
5. SNS Systems - Reliability Analysis Results
Analysis case results
• Q = 2.60E-01 = 0.26 (26 %)
• A = 1 - Q = 73 % (the limit availability, i.e. the mean availability)
[Figures: minimal cut-sets (MCS); MCS contribution]
MCS analysis conclusions
• MCS analysis has been performed for the complete SNS Linac model (SNS ACC DOWN) and for different parts of the accelerator (SCL, etc.), with the following conclusions:
• The results show a wide range of failure modes for components and systems (wide dispersion of failures).
• The Linac (DTL-CCL-SCL) is the most affected part (Q = 1.25E-01; A = 87.5 %).
• The highest unavailability values:
  - SCL (Q = 9.85E-02; A = 90 %)
  - DGN&C (Q = 7.15E-02; A = 93 %)
  - Front-End (Q = 6.93E-02; A = 93 %)
• The most affected part of the SCL is the SCL RF system: Q = 6.33E-02; A = 94 % (primarily due to power supply failures and klystron failures, but also to cooling and vacuum malfunctions).
• The most affected parts of the Front-End are the LEBT (Q = 2.83E-02; A = 97 %) and the MEBT (Q = 2.82E-02; A = 97 %), more specifically the magnets and the vacuum systems.
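How a top-event unavailability can be obtained from minimal cut-sets may be sketched with the commonly used min-cut upper bound, Q_top ≤ 1 - Π(1 - Q_MCS), where each cut-set unavailability is the product of its basic event unavailabilities. The basic events, their values and the cut-sets below are hypothetical, and this is not necessarily the quantification method used by Risk Spectrum for the SNS model; independent basic events are assumed.

```python
from math import prod


def cut_set_unavailability(cut_set, q):
    """Unavailability of one minimal cut-set: all its basic events are down."""
    return prod(q[event] for event in cut_set)


def top_event_upper_bound(cut_sets, q):
    """Min-cut upper bound on the top event unavailability."""
    return 1.0 - prod(1.0 - cut_set_unavailability(cs, q) for cs in cut_sets)


def contributions(cut_sets, q):
    """Cut-sets ranked by their share of the summed cut-set unavailability."""
    values = {cs: cut_set_unavailability(cs, q) for cs in cut_sets}
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [(cs, value / total) for cs, value in ranked]


# Hypothetical basic event unavailabilities and minimal cut-sets.
q = {"KLYSTRON": 0.01, "PUMP": 0.005, "PS_A": 0.02, "PS_B": 0.02}
cut_sets = [("KLYSTRON",), ("PUMP",), ("PS_A", "PS_B")]

q_top = top_event_upper_bound(cut_sets, q)
print(f"Q_top <= {q_top:.6f}, A >= {1.0 - q_top:.6f}")
for cs, share in contributions(cut_sets, q):
    print(f"{' & '.join(cs):<12} {share:6.1%}")
```

In this toy example the single-event cut-sets dominate, while the redundant power supply pair contributes little, which is the kind of ranking an MCS contribution chart conveys.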
5. SNS Logbook Data – Accelerator trip failures
SNS reliability graphics (Logbook availability and failure data)
[Figures: accelerator trip failure frequency (by system); accelerator downtime contribution (by system); availability (Oct. 2011 - June 2012); SNS outages (Jan-Feb, June 2012)]
• RF system and electrical system failures are the most frequent.
• Electrical system failures make the largest contribution to total accelerator downtime (consistent with the conclusions from the SNS RS model runs).
5. SNS Logbook Data – Accelerator trip failures
[Figures: RF system failures (number and duration in hours); electrical subsystems' contribution to the accelerator downtime]
• The most affected subsystems of the SNS Linac (failures leading to accelerator trips):
  - SCL-HPRF (Superconducting Linac High Power Radiofrequency): frequency of short failures
  - HVCM (High Voltage Converter Modulator): duration of trips
• This is in accordance with the SCL RS analysis.
5. SNS Reliability Modeling – Model evaluation
SNS reliability considerations (from past operation experience)
• The reliability input data are a mix (RAMI static model, BlockSim model) drawn from several sources: staff engineers, manufacturers (e.g. Titan, Varian, Maxwell), design reviews, etc.
• A reliability program has been implemented at SNS, achieving a significant increase in the reliability of SNS installations in the past few years.
• Source: Accelerator Reliability Workshop, Cape Town, South Africa, April 2011 (G. Dodson talk)
SNS RS model limitations
• SNS reliability data (MTTF; MTTR): SNS data mix
• The reliability improvement program is not quantified or represented in the RS model.
• The LEBT and DGN&C modules are relatively less developed (lack of detailed information).
Model evaluation
• The availability results obtained by MCS analysis run separately for the different SNS Linac parts (IS, RFQ, MEBT, DTL, CCL, SCL, HEBT) match the SNS Logbook availability records very well, although the global result is A = 73 %. This is attributable to the MTTF and MTTR values used for model quantification, which may be too conservative, and to the other limitations listed above.
• Considering the reliability database used for quantification, and the fact that the reliability improvements of recent years have not been included in the model, it can be stated that the overall availability of the SNS Linac resulting from the RS model (A = 73 %) is confirmed by the SNS availability figures from the first years of SNS operation.
6. Conclusions
• The reliability results show that the most affected SNS Linac parts and systems are:
  - SCL, Front-End systems (IS, LEBT, MEBT), Diagnostics & Controls
  - RF systems (especially the SCL RF system)
  - Power supplies and PS controllers
• These results are in line with the records in the SNS Logbook.
• The reliability measure that most needs to be reinforced in the linac design is the redundancy of the systems, subsystems and components most affected by failures.
• Intelligent fail-over redundancy needs to be implemented in the controllers, for compensation purposes.
• Enough diagnostics have to be implemented to allow reliable functioning of the redundant solutions and to ensure the compensation function.
7. MAX Task 4.4 – MYRRHA Linac Reliability model
Overall approach
• Fault tree, based on the SNS model and the MAX design
  - Basic events: component / function failures
  - Undeveloped events / systems: reliability targets
• Reliability model: availability / failure frequency (linac shutdown)
• Reliability analysis: design optimization
Design & reliability database
• Data sources: SNS, MAX team, suppliers, conservative assumptions / reliability targets
• Support systems: modeled at a general level
7. MAX Task 4.4 – MYRRHA Linac Reliability (MTBF > 250 h)
Reliability challenges:
• Injector switch reliability and duration
  - Conditions: high reliability of the injectors, reduced MTTR and the possibility to perform maintenance without stopping the beam.
• Injector switch sequence:
  - fault detection and first action of the MPS
  - a few beam restart tries (with short pulses) by the MPS, and fault confirmation
  - full fault diagnostic and acknowledgement by the control system
  - dipole magnet switch
  - fast beam commissioning before reaching nominal beam
Reliability analysis objective: to determine the relation between MTTR and MTBF in a configuration of 2 injectors, 1 operational and 1 in hot standby.
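One simple way to explore an MTTR-MTBF relation for two injectors is the textbook Markov model of a two-unit active-redundant repairable system, whose mean time to first system failure is (3λ + µ)/(2λ²). This is only an idealized sketch and not the MAX reliability model: it assumes two identical injectors with the same constant failure rate λ (hot standby), one repair at a time with constant rate µ = 1/MTTR, and a perfect, instantaneous switch, so the switch reliability and duration highlighted above are deliberately ignored. The unit MTBF values are hypothetical; only the 250 h target comes from the slide.

```python
import math


def system_mtbf(unit_mtbf_h: float, mttr_h: float) -> float:
    """Mean time to system failure for two identical hot-standby units.

    States: both up -> one up (rate 2*lambda) -> both down (rate lambda),
    with repair from "one up" back to "both up" at rate mu.
    Result: (3*lambda + mu) / (2*lambda**2).
    """
    lam = 1.0 / unit_mtbf_h
    mu = 1.0 / mttr_h
    return (3.0 * lam + mu) / (2.0 * lam**2)


def max_mttr_for_target(unit_mtbf_h: float, target_mtbf_h: float) -> float:
    """Largest MTTR that still meets the target system MTBF.

    Returns math.inf when the target is met even without any repair.
    """
    lam = 1.0 / unit_mtbf_h
    mu_required = 2.0 * lam**2 * target_mtbf_h - 3.0 * lam
    if mu_required <= 0:
        return math.inf
    return 1.0 / mu_required


# Hypothetical injector MTBF of 100 h, target system MTBF of 250 h.
print(f"system MTBF with MTTR = 10 h: {system_mtbf(100.0, 10.0):.0f} h")
print(f"max MTTR for 250 h target:    {max_mttr_for_target(100.0, 250.0):.0f} h")
```

Under these idealized assumptions the system MTBF grows roughly in proportion to 1/MTTR, which illustrates why a short repair time and maintenance without stopping the beam matter for the injector configuration.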
7. MAX Task 4.4 – MYRRHA Linac Reliability (MTBF > 250 h)
Reliability challenges:
• Fault tolerance / compensation function (linac fault-recovery system)
  - Fault compensation imposes special conditions on the detuning system (CTS piezo detuning of the failed cavities); a higher failure rate (lower MTBF) should be considered.
• Fault detection and compensation sequence:
  - Recovery data processing: the Linac Control System defines new set-points (loaded or calculated)
  - RF field updating in the corrective cavities (by the CCSs)
  - CCS (LLRF loop + CTS)
  - fast beam commissioning before reaching nominal beam
7. MAX Task 4.4 – Next steps
• Development of the MYRRHA Linac reliability model, based on the SNS RS model and considering the SNS reliability analysis results and conclusions.
• Iterative process: the MYRRHA Linac model is to be updated during the design work.
• The MYRRHA Linac Risk Spectrum fault tree is currently under development.
• Reliability analysis is to be performed, with due consideration of the reliability challenges.
• Special attention is to be paid to the design of the (advanced) diagnostics and control systems.