1. Risk Implications of Digital RPS Operating Experience
For Presentation at IAEA Technical Meeting on Common-Cause Failures in Digital Instrumentation and Control Systems of Nuclear Power Plants, June 19-21, 2007, Bethesda, Maryland, USA
Dr. John H. Bickel
Evergreen Safety & Reliability Technologies, LLC
2. JHBickel - ESRT, LLC
Motivations for this work:
No prior risk or importance analysis of existing digital RPS failure experience exists
Prior NRC Research reports concluded LER data too sparse to use
Only found: 18 microprocessor failures, 4 software failures
Suggested need to consider data from aerospace, medical, transport systems
Lack of data implied: inability to risk-inform digital I&C applications and issues
My belief:
Much more data actually exists on CE CPCS
Risks from CPCS experience should be assessed
3. CE Digital Core Protection Calculator Basics:
CE High LPD, Low DNBR RPS design switched from analog Thermal Margin/Low Pressure Trip to digital Core Protection Calculators in the mid-1970s
Used 6 specially qualified minicomputers running stored computer software and addressable constants
CPCS performs static/dynamic projections of local power density and DNBR based upon:
Ex-core neutron flux
Pressurizer pressure
Reactor Tcold, Thot
RCP pump speed
Control rod positions
CPCS generates: alarms, pre-trip, and trip safety actions
Original system was licensed on ANO-2 in 1978
Subsequently utilized at: SONGS-2/3, Waterford-3, Palo Verde-1/2/3, and Korean Standard NPPs
4. CE Digital Core Protection Calculator Basics:
CPCS credited for reactor trip for the following events:
Uncontrolled Control Rod withdrawal from critical ($>10^{-4}$ power)
Uncontrolled Boron Dilution from critical ($>10^{-4}$ power)
Uncontrolled Control Rod withdrawal from power operation
Dropped, or mis-positioned Control Rods
Ejected Control Rods
Single RCP loss of flow
Single RCP shaft seizure
4-RCP loss of flow
Electrical grid under-frequency
Excess secondary steam flow (including turbine bypass valve malfunction)
Excess feedwater flow
Loss of feedwater heater
Steam line break
Single MSIV closure
Rapid increase in local power
5. CE Digital CPCS Software Basics:
Software Design: One Good Version, not N-Version
6. CE Digital CPCS Interchannel Communications Basics:
4 CPC computers evaluate LPD and DNBR using neutron flux, temperature, RCS flow, and control rod position inputs in each quadrant
2 CEA computers (CEACs) monitor all quadrants for CEA deviations within groups and generate Penalty Factors transmitted to all 4 CPCs
CEACs communicate to CPCs via one-way simplex communication links
7. CE Reactor Protection System PRA Basics:
PRA assessments of overall CE RPS have existed for some time (2001)
Component unavailabilities based on time averaged values
NUREG/CR-5500 Vol.10:
$Q_{RPS} = 7.2\times10^{-6}$ (Digital CPCS, w/o Operator Action)
$Q_{RPS} = 1.6\times10^{-6}$ (Digital CPCS, w/ Operator Action)
Relay and breaker CCF dominates predicted $Q_{RPS}$:
CCF of master trip relays (K-1 through K-4)
CCF of reactor trip breaker is not as significant on CE design due to configuration
8. How This Study Was Carried Out:
Failure experience from the on-line NRC LER database currently goes back to 1984 (NOTE: misses the first 6 years of ANO-2 experience)
Post-1984 CPC LERs on CE plants were evaluated
CPCS Failure experience categorized by subsystem
Size of operating experience pool:
141 LERs (1984-2005)
~145.5 Rx years (or: $1.27\times10^{6}$ Rx hr)
70 actual CPC reactor trip demands
26 events involving latent CCF (including: 1 latent software CCF)
Subsystem failure rates calculated via Bayesian estimation using Jeffreys non-informative prior
CCDP risk estimated via ASP approach
Method highlights CCDP impact of higher than average unavailability
9. How Component Population Was Estimated:
Total CPCS subsystem operating time estimation was based upon the component inventory per plant
Total CPCS operating time (for 4/4 Channel CCF estimation) was simply total plant operating time.
10. How Subsystem Operating Time Was Estimated:
Each of the 4 CPC computers and 2 CEAC computers contains: 1 processor board, 1 memory board, 1 multiplexer board, 1 external Watchdog Timer
Each of 4 CPC Channels contains: 1 PZR pressure sensor, 3 ex-core neutron flux inputs, 4 RCP speed sensors, 2 Tcold and 2 Thot inputs
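The operating-time bookkeeping on slides 9-10 amounts to multiplying a per-plant unit count by total reactor hours. The sketch below assumes the inventory multiplies out exactly as listed (6 computers, 4 channels); the dictionary keys and the `exposure_hours` helper are hypothetical names, not taken from the study.

```python
# Hypothetical sketch: names and structure are illustrative, not from the study.
HOURS_PER_YEAR = 8760.0
REACTOR_YEARS = 145.5                           # size of the operating pool (slide 8)
REACTOR_HOURS = REACTOR_YEARS * HOURS_PER_YEAR  # ~1.27e6 Rx-hr

CPC_CHANNELS = 4
CEAC_COMPUTERS = 2
COMPUTERS = CPC_CHANNELS + CEAC_COMPUTERS  # each has 1 processor, memory, mux board, watchdog

# Units per plant, assuming the counts on this slide simply multiply out.
units_per_plant = {
    "processor board": COMPUTERS,
    "memory board": COMPUTERS,
    "multiplexer board": COMPUTERS,
    "external watchdog timer": COMPUTERS,
    "PZR pressure sensor": CPC_CHANNELS * 1,
    "ex-core neutron flux input": CPC_CHANNELS * 3,
    "RCP speed sensor": CPC_CHANNELS * 4,
    "Tcold input": CPC_CHANNELS * 2,
    "Thot input": CPC_CHANNELS * 2,
    "whole CPCS (4/4 channel CCF)": 1,  # system-level CCF uses plant time only
}


def exposure_hours(units: int, reactor_hours: float = REACTOR_HOURS) -> float:
    """Population operating time for a subsystem: units per plant x reactor hours."""
    return units * reactor_hours


for name, n in units_per_plant.items():
    print(f"{name:30s} {n:3d}/plant -> {exposure_hours(n):.3g} hr")
```

The single-unit entry reflects slide 9's statement that 4/4 channel CCF uses total plant operating time.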
11. Subsystem failure rates were calculated via Bayesian estimation using Jeffreys non-informative prior:
Technique allows bounding failure rate estimation for 0 observed failures
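For $n$ failures observed over exposure time $T$, the standard conjugate result for a Poisson rate with a Jeffreys prior is sketched below. The slides do not state which posterior summary was reported; the numerators 2.5 and 0.5 used on slide 15 are consistent with the posterior mean.

```latex
\begin{aligned}
\pi(\lambda) &\propto \lambda^{-1/2}
  && \text{Jeffreys prior for a Poisson rate}\\
L(n \mid \lambda, T) &\propto \lambda^{n} e^{-\lambda T}
  && \text{$n$ failures in exposure time $T$}\\
\pi(\lambda \mid n, T) &\propto \lambda^{\,n - 1/2} e^{-\lambda T}
  \;\Rightarrow\; \lambda \mid n, T \sim \operatorname{Gamma}\!\left(n + \tfrac{1}{2},\; T\right)\\
E[\lambda \mid n, T] &= \frac{n + 0.5}{T},
  \qquad E[\lambda \mid 0, T] = \frac{0.5}{T}
\end{aligned}
```

The zero-failure case is what makes a finite estimate possible for failure modes never observed in the population.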
12. Failure Rate and Unavailability Estimation Issues:
Data Needs for Risk Estimation Process:
Ability to estimate CCDP given specific event demands and event-conditional system unavailabilities (such as RPS)
Includes conditional unavailability due to specific combinations of input conditions to digital system
Certain software bugs only triggered by unusual input sets
Overall RPS unavailability must consider combinations of random and CCF events
Operating experience estimates failure rates: $\lambda$
Conversion to RPS unavailability uses estimate of time to detect and restore: $P = \lambda \times (\text{fault duration})$
In many cases for latent Digital CCFs fault durations are many months
13. Actual Design Basis CPCS Trip Demands
14. Estimated CPCS Single Subsystem Failure Rates
15. CPCS Single Subsystem Failure Rates:
Also important to note:
Failure modes of recent regulatory concern which have not occurred in population exposure time
Recall that, for 0 observed failures, failure rates can be estimated as: $\lambda \approx 0.5/T$
Faults propagated via inter-channel communication:
2 events noted involving loss of CPCS -> Plant Computer communications link that resulted in failure to perform Tech. Spec. required cross-checks, $\lambda = 2.5 / (6 \times 1.27\times10^{6}\ \text{hr}) = 3.3\times10^{-7}$/hr
Other communication link failures without operational impairment likely occurred but were not reported in the LER database
Events involving a failure propagating to CPC or CEAC would be in LER data base if they occurred
0 events noted in which a communication link failure caused corruption of a CPC or CEAC channel, $\lambda \approx 0.5 / (6 \times 1.27\times10^{6}\ \text{hr}) \approx 6.6\times10^{-8}$/hr
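The two communication-link rates on this slide can be reproduced with a small Jeffreys-prior estimator. This is a minimal sketch, assuming the factor 6 in the exposure time is the 4 CPCs plus 2 CEACs per plant; `jeffreys_rate` and `jeffreys_sd` are hypothetical helper names.

```python
import math

REACTOR_HOURS = 1.27e6   # total plant operating time in the pool (slide 8)
COMPUTERS_PER_PLANT = 6  # assumption: the "6 x" factor = 4 CPCs + 2 CEACs


def jeffreys_rate(failures: int, exposure_hours: float) -> float:
    """Posterior mean of a Poisson failure rate under the Jeffreys prior.

    Posterior is Gamma(shape=failures + 0.5, rate=exposure_hours),
    so the mean is (failures + 0.5) / exposure_hours.
    """
    if failures < 0 or exposure_hours <= 0:
        raise ValueError("need failures >= 0 and exposure_hours > 0")
    return (failures + 0.5) / exposure_hours


def jeffreys_sd(failures: int, exposure_hours: float) -> float:
    """Posterior standard deviation of the Gamma posterior: sqrt(shape) / rate."""
    return math.sqrt(failures + 0.5) / exposure_hours


exposure = COMPUTERS_PER_PLANT * REACTOR_HOURS  # 7.62e6 computer-hours
comm_link_loss = jeffreys_rate(2, exposure)     # 2 observed events -> ~3.3e-7/hr
corruption = jeffreys_rate(0, exposure)         # 0 observed events -> ~6.6e-8/hr
print(f"link loss:  {comm_link_loss:.2e}/hr (sd {jeffreys_sd(2, exposure):.1e})")
print(f"corruption: {corruption:.2e}/hr (sd {jeffreys_sd(0, exposure):.1e})")
```

Reporting the posterior standard deviation alongside the mean is an addition here; it shows how wide the estimate remains when only 0 to 2 events have been seen.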
16. Estimated CPCS Double Event Failure Rates
17. Estimated CPCS System CCF Failure Rates
18. Results: CPCS System CCF Failure Rates:
Latent software CCF represents only about 4% of the CCF experience (1 of 26 latent CCF events)
Calibration errors and the generation or loading of incorrect data sets are the dominant sources of CCF
19. Types of Observed CCF Events:
Inaccurate cross-calibration of all Ex-core neutron flux (7 events) or all RCS flow channels (2 events)
Computer technicians insert wrong addressable constant data sets into all 4 CPCS channels (3 events)
Swapping addressable data sets between units
CE supplies erroneous data sets (2 events)
Software update provided to plant with incorrect logic for processing of indicated failed sensors (1 event)
20. Risk significance of this failure experience?
None of the actual CCF events resulted in core damage (all were latent faults without a triggering event)
Need to consider CCDP implications of specific failure modes
Intent: apply risk screening process similar to NRC ASP program which focuses on higher than average values of system unavailability
Use: ASP-type failure rate data, SPAR plant specific risk models, actual observed unavailability
$\text{CCDP} = \sum_i \lambda_i \times P_{CPCS\text{-}CCF} \times HEP_{NR}$
$P_{CPCS\text{-}CCF} = \lambda_{CPCS\text{-}CCF} \times (\text{duration of latent fault})$
First: How sensitive is CCDP to RPS Logic CCF?
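The two screening formulas on this slide can be sketched as plain functions. The function names are hypothetical, and the cap at 1.0 on the linear unavailability product is an added assumption, not part of the slide formula; the usage line plugs in the 1995 SONGS data-swap inputs from slide 24.

```python
from typing import Sequence


def latent_unavailability(rate_per_hr: float, fault_hours: float) -> float:
    """P_CPCS-CCF = lambda_CPCS-CCF x (duration of latent fault).

    The linear product is only meaningful while it is small; capping it
    at 1.0 is an added assumption, not part of the slide formula.
    """
    return min(1.0, rate_per_hr * fault_hours)


def screening_ccdp(initiator_freqs_per_yr: Sequence[float], p_ccf: float, hep_nr: float) -> float:
    """CCDP = sum_i lambda_i x P_CPCS-CCF x HEP_NR, as written on the slide."""
    return sum(initiator_freqs_per_yr) * p_ccf * hep_nr


# Usage with the 1995 SONGS data-swap inputs (slide 24):
p_swap = latent_unavailability(2.75e-6, 10_968)    # ~3.0e-2
ccdp_swap = screening_ccdp([0.488], p_swap, 0.01)  # ~1.5e-4
print(f"P = {p_swap:.2e}, CCDP = {ccdp_swap:.2e}")
```

Because the latent fault duration enters linearly, a fault left in place ten times longer raises both $P_{CPCS\text{-}CCF}$ and the screening CCDP tenfold.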
21. How sensitive is CCDP to RPS Logic CCF?
RPS failure considers:
Mechanical CCF jamming of control rods
Relay/Breaker CCF failure
RPS Logic CCF
Operators fail to manually trip
Operators fail to trip MG sets
Loss of Offsite Power generates reactor trip without RPS
Sensitivity studies conducted using NRC SPAR PRA models
22. How sensitive is CCDP to RPS Logic CCF?
Variations in RPS-LOGIC-CCF are not risk significant until $> 1\times10^{-3}$
23. Some example risk assessments of actual Digital CCF events
24. 1995 SONGS-2/3 Addressable Data Swapped:
Rod shadowing constants (on data disks) were swapped between adjacent SONGS units for 10,968 hours.
The units had different power and burnup histories, so their rod shadowing corrections differed.
Rod shadowing constants only impact power density predictions when control rods dropped, or partially inserted.
$P_{CPCS\text{-}CCF} = 2.75\times10^{-6}/\text{hr} \times 10{,}968\ \text{hr} = 3.0\times10^{-2}$
Summing over all initiating events involving dropped control rods and rod cycling tests yields:
$\text{CCDP} < 0.488/\text{yr} \times 3.0\times10^{-2} \times 0.01 = 1.5\times10^{-4}$
This is a bounding, conservative estimate: better knowledge of the duty cycle of rod cycling tests would likely reduce it by a factor of 10 or more.
25. 1984 Erroneous Fx,y factors supplied by CE and uploaded to SONGS-2:
Incorrect Fx,y factors generated by CE were used for CPCS LPD calculations from 2-7-84 to 3-20-84 (1,032 hrs).
Events such as this have occurred twice.
$P_{CPCS\text{-}CCF} = 1.96\times10^{-6}/\text{hr} \times 1{,}032\ \text{hr} = 2.0\times10^{-3}$
$\text{CCDP} = 0.488/\text{yr} \times 2.0\times10^{-3} \times 0.01 = 9.8\times10^{-6}$
26. 2005 Software Design Error in Software Upgrade at Palo Verde 2 for 2,736 hrs:
Original software design:
Trip CPC channel if sensor detected to be Failed Out of Range
Software/hardware upgrade:
Use inputs from two sets of instruments and multiplexers (primary and secondary)
Out of Range Sensor Failure:
Primary detected sensor failure results in switchover to secondary.
Out of Range Failure on secondary reverts to last stored good value
CCF of all sensors of one type could result in continuous use of last good value in all 4 CPCS channels rather than TRIP.
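$P_{CPCS\text{-}CCF} = 8 \times P_{Sensor\text{-}CCF} \times 2.75\times10^{-6}/\text{hr} \times 2{,}736\ \text{hr}$
$= 8 \times 8.4\times10^{-4} \times 2.75\times10^{-6}/\text{hr} \times 2{,}736\ \text{hr} = 5.0\times10^{-5}$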
Given CCF of instruments, no credit for operators, HEP=1.0
$\text{CCDP} = 0.289/\text{yr} \times 5.0\times10^{-5} \times 1.0 = 1.44\times10^{-5}$
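The difference between the original and upgraded sensor-failure handling on this slide can be illustrated with a toy selector. This is a hypothetical sketch: the valid range, the function and class names, and the choice to trip when no good value has ever been stored are all assumptions, not the actual CPCS software.

```python
from typing import Optional, Tuple

LOW, HIGH = 0.0, 100.0  # hypothetical valid range (percent of span)


def in_range(x: float) -> bool:
    return LOW <= x <= HIGH


def original_select(primary: float) -> Tuple[str, Optional[float]]:
    """Original design: a sensor failed out of range trips the CPC channel."""
    if in_range(primary):
        return ("USE", primary)
    return ("TRIP", None)


class UpgradedSelector:
    """Upgrade as described: primary -> secondary -> last stored good value."""

    def __init__(self) -> None:
        self.last_good: Optional[float] = None

    def select(self, primary: float, secondary: float) -> Tuple[str, Optional[float]]:
        if in_range(primary):
            self.last_good = primary
            return ("USE", primary)
        if in_range(secondary):  # primary failed: switch over to secondary
            self.last_good = secondary
            return ("USE", secondary)
        # Both failed out of range: hold the last good value instead of tripping.
        if self.last_good is None:
            return ("TRIP", None)  # assumption: nothing stored yet -> trip
        return ("USE", self.last_good)


channels = [UpgradedSelector() for _ in range(4)]
for ch in channels:
    ch.select(55.0, 55.2)  # normal operation stores 55.0 in every channel
# CCF: every primary and secondary sensor of this type fails out of range
results = [ch.select(-999.0, -999.0) for ch in channels]
print(results)                  # [('USE', 55.0), ('USE', 55.0), ('USE', 55.0), ('USE', 55.0)]
print(original_select(-999.0))  # ('TRIP', None)
```

No channel trips under the upgraded logic, which is the all-channel latent CCF path the slide describes.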
27. $P_{RPS\text{-}CCF}$ values from single events span many decades
Fault duration times drive $P_{RPS\text{-}CCF}$ values
Latent data uploading errors are dominant unavailability contributors
Data uploading errors are larger contributors than the relay and breaker CCF found in NUREG/CR-5500 (which used time-averaged values)
28. Event-specific CCDP is also dominated by data uploading errors
The latent software CCF event contributes less because its triggering condition is unlikely.
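The single-event values behind slides 27-28 can be recomputed directly from the factors listed on slides 24-26. The sketch below only multiplies those factors as written; the dictionary layout is a hypothetical choice. For the 1984 Fx,y event the product of the listed factors is on the order of $10^{-5}$.

```python
import math
from typing import Dict, Tuple

# Inputs copied from slides 24-26. "factor" is 1.0 except for Palo Verde 2,
# where the slide multiplies by 8 x P_Sensor-CCF = 8 x 8.4e-4.
EVENTS: Dict[str, Dict[str, float]] = {
    "1995 SONGS addressable data swap": {
        "rate": 2.75e-6, "hours": 10_968, "freq": 0.488, "hep": 0.01, "factor": 1.0},
    "1984 SONGS-2 erroneous Fx,y factors": {
        "rate": 1.96e-6, "hours": 1_032, "freq": 0.488, "hep": 0.01, "factor": 1.0},
    "2005 Palo Verde 2 software upgrade": {
        "rate": 2.75e-6, "hours": 2_736, "freq": 0.289, "hep": 1.0, "factor": 8 * 8.4e-4},
}


def evaluate(event: Dict[str, float]) -> Tuple[float, float]:
    """Return (P_CPCS-CCF, CCDP) using the slide formulas as written."""
    p = event["factor"] * event["rate"] * event["hours"]
    return p, event["freq"] * p * event["hep"]


results = {name: evaluate(ev) for name, ev in EVENTS.items()}

for name, (p, ccdp) in results.items():
    print(f"{name:36s} P = {p:.2e} (log10 {math.log10(p):.1f})  CCDP = {ccdp:.2e}")
```

Printing log10 of each unavailability makes the spread across orders of magnitude, and its dependence on fault duration, easy to see.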
29. Observations from this Total Picture of RPS:
Designers of Digital I&C are not particularly surprised by the relative dominance of:
Calibration problems and human errors uploading wrong data sets
CCF due to errors by vendor in generating data sets
These failure modes also existed in NPPs with Analog I&C
CCF Unavailability and event CCDP estimates from operating experience are dominated by latent events with very long fault duration intervals
Software-related CCF, while important, isn't the dominant CCF source when actual operating experience is evaluated
Likely because software V&V processes are more rigorous than operational controls after deployment at the NPP
Most-obvious software bugs generally caught by burn-in testing and qualification programs
Software bugs triggered by highly unlikely input combinations are not key sources of RPS unavailability or CCDP risk
30. What is Concluded from all this?
To assess Digital I&C risk, it's necessary to view the Total Picture of RPS, not just software or microprocessors:
Final trip relays and trip breakers - will still be there
Problems cross calibrating nuclear with thermal - will still be there
Human errors inputting set-points and coefficients - will still be there
When this is done - Total Picture of RPS risk emerges
NPPs with CPCS have been operating since 1978 in typical, controlled, nuclear operations environment, which includes:
Vendor generation of cycle specific constants, set-points
Routine hardware, software upgrades developed and installed
Routine operation, trouble alarms, and alarm response
Impact of Technical Specifications, Testing, Calibrations
Actual nuclear field reliability experience is better source of data than non-nuclear sources or theoretical models
The ability to estimate, or bound, risks of specific Digital I&C CCF failure modes thus clearly exists