PEP-II Reliability and Uptime

PEP-II Reliability and Uptime. Roger Erickson 10 October 2003 With thanks to C.W. Allen, W. Colocho, P. Schuh, M. Stanek, and the Operations staff members who collected the data. Excludes “long” downtimes and holiday shut-downs. Statistics: Causes of Unscheduled Down Time.


Presentation Transcript


  1. PEP-II Reliability and Uptime
     Roger Erickson, 10 October 2003
     With thanks to C.W. Allen, W. Colocho, P. Schuh, M. Stanek, and the Operations staff members who collected the data.

  2. Excludes “long” downtimes and holiday shut-downs.

  3. Statistics: Causes of Unscheduled Down Time
     • 3 PEP-II running periods considered: January 2000 through June 2003.
     • 22,936 total scheduled operating hours.
     • 2994 hours unscheduled down time.
     • 5469 reported malfunctions (“events”).
     • 1317 events directly tied to lost hours.
     We can sort the data by area of the machine (HER, linac, etc.), by system categories (RF, vacuum, etc.), by date, and by details of resolution.

  4. Accelerator Performance Statistics
     Definitions:
     • Revealed failures: malfunctions resulting in lost beam time. Also called “events”.
     • Unscheduled down time: hours lost from scheduled program due to malfunctions.
     • Mean Time to Fail: MTTF = (Scheduled beam time) / (Events)
     • Mean Time to Repair: MTTR = (Unscheduled down time) / (Events)
     • Availability = 1 - (Unscheduled down time) / (Scheduled beam time)
     NOTE: PEP-II aborts are not counted as downtime unless the event is reported; i.e., unless we stop to fix something and make a database entry.
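
As a rough illustration of these definitions, the Python sketch below applies them to the combined totals quoted in item 3. It assumes that the 1317 events directly tied to lost hours are the "events" in the MTTF and MTTR formulas; the slides do not say which event count was actually used. The function and variable names are hypothetical and do not come from the presentation.

```python
def reliability_metrics(scheduled_hours, down_hours, events):
    """Return (MTTF, MTTR, availability) following the definitions in item 4."""
    if scheduled_hours <= 0 or events <= 0:
        raise ValueError("scheduled_hours and events must be positive")
    mttf = scheduled_hours / events                      # hours of scheduled beam time per event
    mttr = down_hours / events                           # hours of down time per event
    availability = 1.0 - down_hours / scheduled_hours    # fraction of scheduled time not lost
    return mttf, mttr, availability


# Combined totals for January 2000 through June 2003, as quoted in item 3.
SCHEDULED_HOURS = 22936
DOWN_HOURS = 2994
EVENTS = 1317  # ASSUMPTION: events "directly tied to lost hours" are the revealed failures

mttf, mttr, availability = reliability_metrics(SCHEDULED_HOURS, DOWN_HOURS, EVENTS)
print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.2f} h, availability = {availability:.1%}")
```

Under that assumption the combined figures come out to roughly 17.4 hours between events, 2.3 hours per repair, and 87% availability. The per-run values in the talk would need the per-run totals, which the transcript does not reproduce.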

  5. PEP-II Run Totals
     • Run 1: 1/12/00 – 10/31/00
     • Run 2: 2/4/01 – 6/30/02
     • Run 3: 11/15/02 – 6/30/03
     Long annual downtimes and holiday shut-downs are not included.

  6. Hardware Availability by Run
     MTTF has been getting shorter (worse) each run. MTTR improved from Run 1 to Run 2, but got worse during Run 3.

  7. Unscheduled Downtime by Major System
     Unscheduled down time (percentage), sorted by responsible system.

  8. MTTR: PEP-II Rings

  9. Time Required for Repairs
     Combined data set from all three runs.

  10. PEP Rings Events Requiring > 2 hours to Repair
      Run 3 data:
      • 33% of PEP ring events require > 2 hours to repair.
      • These account for 81% of PEP ring down time.
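
The two percentages in item 10 are different ratios over the same event list: one counts events, the other weights them by repair time. A minimal Python sketch of that calculation follows. The function name and the sample repair durations are made up for illustration and are NOT PEP-II data; the sample was simply chosen to show how a minority of events can dominate the down time.

```python
def long_repair_share(repair_hours, threshold_hours=2.0):
    """Return (fraction of events, fraction of down time) for repairs longer than the threshold."""
    total = sum(repair_hours)
    if not repair_hours or total <= 0:
        raise ValueError("need at least one event with non-zero repair time")
    long_repairs = [h for h in repair_hours if h > threshold_hours]
    event_fraction = len(long_repairs) / len(repair_hours)
    downtime_fraction = sum(long_repairs) / total
    return event_fraction, downtime_fraction


# Hypothetical repair durations in hours (illustrative only, not from the PEP-II database).
sample_repairs = [0.5, 1.0, 1.5, 0.5, 6.0, 10.0]

event_fraction, downtime_fraction = long_repair_share(sample_repairs)
print(f"{event_fraction:.0%} of events account for {downtime_fraction:.0%} of down time")
```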

  11. Problems Requiring > 24 hours to Fix
      January 2000 – June 2003:
      • 5 vacuum chamber failures in PEP rings. Some known vulnerabilities were already receiving attention. Vacuum task force is studying options for upgrading some chambers.
      • 2 site-wide electrical power outages. These were outside SLAC’s control.
      • SLTR quadrupoles overheated when cooling water pump stopped, but power remained on.

  12. Recent Problems Requiring > 24 hours to Fix
      August 20, 2003: VVS transformer failure in linac.
      • Failure occurred during E158; no impact on PEP. Two days for full recovery.
      • Failure was in the only dry-type transformer among 16 VVS’s. Oil-filled, fixed-ratio replacement options being investigated.
      September 12, 2003: Site-wide power failure when tree grew too close to 230 kV line. Time lost to PEP program > 47 hours.
      • Tree trimming had not been done on established schedule.
      • SLAC now has new contract with tree-trimmer company, with option to renew for five years.

  13. Underlying Problems Sometimes Cross Technical and Jurisdictional Boundaries
      • Seasonal high ambient temperatures cause drift, jitter, timing shifts, spurious trips, and sometimes component failures in power supplies and sensitive electronics.
      • Plan to air-condition the electronics alcove at Linac Sector 0, which houses the master oscillator and electronics critical to accelerator timing. A contract has been awarded.
      • Several PEP support buildings have temperature control problems on hot days. More needs to be done to identify cost-effective improvements.
      An example of a problem not easily identified by counting malfunction reports.

  14. Injection and Tuning
      Normal top-off: Typically 4 to 5 minutes to fill at intervals of 40 to 50 min. Approx. 10% of scheduled run time.
      Why is 21% spent injecting and tuning? Beam aborts require fill from scratch; typically 15 to 25 minutes each time.
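
The "approx. 10%" top-off figure in item 14 follows from the fill time and fill interval. The Python sketch below works through that arithmetic and also shows what a single abort per day costs at the quoted refill times. It assumes the 40 to 50 minute interval is measured from the start of one fill to the start of the next, which the slide does not state; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def topoff_fraction(fill_minutes, interval_minutes):
    """Fraction of run time spent topping off, assuming a start-to-start fill interval."""
    return fill_minutes / interval_minutes


def refill_fraction(aborts_per_day, refill_minutes):
    """Fraction of run time spent refilling from scratch after aborts."""
    return aborts_per_day * refill_minutes / MINUTES_PER_DAY


# Range and midpoint of the normal top-off figures quoted on the slide.
topoff_low = topoff_fraction(4, 50)     # shortest fill, longest interval
topoff_mid = topoff_fraction(4.5, 45)   # midpoints of both ranges
topoff_high = topoff_fraction(5, 40)    # longest fill, shortest interval
print(f"top-off: {topoff_low:.1%} to {topoff_high:.1%}, midpoint {topoff_mid:.1%}")

# Cost of one abort per day at the quoted 15 to 25 minute refill times.
per_abort_low = refill_fraction(1, 15)
per_abort_high = refill_fraction(1, 25)
print(f"each abort/day costs {per_abort_low:.2%} to {per_abort_high:.2%} of run time")
```

The midpoint reproduces the 10% quoted on the slide. The per-abort cost is only a scale estimate; it is not a reconstruction of the percentages reported later in the talk.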

  15. Beware of double counting: an abort in one ring usually leads to an abort in the other.

  16. HER RF Aborts
      Station: Run 2 → Run 3 (aborts/day)
      • 12-1: 0.33 → 1.1
      • 12-3: 0.50 → 0.34
      • 8-1: 0.22 → 0.57
      • 8-3: 0.50 → 0.68
      • 8-5: 0.51 → 0.66
      • 12-6: 1.65* (Run 3 only)
      Total = 2.1 → 5.0 aborts/day
      All stations were worse in 2003, except 12-3.
      * 12-6 fault accounting only available since 10-May-2003.

  17. LER RF Aborts
      Station: Run 3 (aborts/day)
      • 4-3: 0.88*
      • 4-4: 0.55 (was 0.56 in 2002)
      • 4-5: 0.55 (was 0.53 in 2002)
      Total = 2 aborts per day
      * 4-3 fault accounting only available since 10-May-2003.
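
The totals in items 16 and 17 can be checked by summing the per-station rates. The Python sketch below does that and also lists which HER stations improved between runs. The dictionary layout and variable names are hypothetical; the numbers are those quoted on the two slides.

```python
# Per-station RF abort rates in aborts/day, as quoted in items 16 and 17.
her_run2 = {"12-1": 0.33, "12-3": 0.50, "8-1": 0.22, "8-3": 0.50, "8-5": 0.51}
her_run3 = {"12-1": 1.1, "12-3": 0.34, "8-1": 0.57, "8-3": 0.68, "8-5": 0.66, "12-6": 1.65}
ler_run3 = {"4-3": 0.88, "4-4": 0.55, "4-5": 0.55}

her_run2_total = sum(her_run2.values())
her_run3_total = sum(her_run3.values())
ler_run3_total = sum(ler_run3.values())

# HER stations with data in both runs whose abort rate went down in Run 3.
improved = [s for s in her_run2 if s in her_run3 and her_run3[s] < her_run2[s]]

print(f"HER Run 2: {her_run2_total:.2f}, HER Run 3: {her_run3_total:.2f} aborts/day")
print(f"LER Run 3: {ler_run3_total:.2f} aborts/day")
print("HER stations that improved:", improved)
```

The sums are 2.06, 5.00, and 1.98 aborts/day, consistent with the rounded totals on the slides, and 12-3 is the only HER station that improved. Note that the 12-6 and 4-3 rates cover only the period since 10-May-2003, so the Run 3 totals mix rates measured over different periods.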

  18. BaBar Radiation Aborts
      3-year trend, based on data latched by accelerator control system:
      • 2000: 5.6 aborts/day
      • 2001: 4.1
      • 2002: 3.6
      • 2002/3: 2.8

  19. Injection and Tuning Summary
      Percentages of scheduled operating hours:
      • Normal top-offs: 10%
      Fill from scratch following:
      • RF aborts: 6.3%
      • BaBar radiation aborts: 3.5%
      Approximate total: 20%
      Trickle charging could have significant beneficial impact!
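
The "approximate total" in item 19 is the sum of the three contributions. The short Python sketch below adds them up and converts the result to hours per 24-hour day of scheduled operation, which is an extra unit conversion not shown on the slide. The variable names are hypothetical.

```python
# Percentages of scheduled operating hours, as quoted in item 19.
contributions = {
    "normal top-offs": 10.0,
    "refill after RF aborts": 6.3,
    "refill after BaBar radiation aborts": 3.5,
}

total_percent = sum(contributions.values())
hours_per_day = total_percent / 100 * 24  # equivalent hours per scheduled 24-hour day

print(f"injection and tuning: {total_percent:.1f}% (about {hours_per_day:.1f} h per day)")
```

The sum is 19.8%, which the slide rounds to 20%.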

  20. Scheduled Off Time
      • No routine scheduled maintenance days.
      • Repair Opportunity Days (“RODs”) are launched when needed for show-stoppers or upgrade projects (typically 1/month).
      • As many ROD and SML jobs as possible are completed during program interruption (typically 50 to 100 identified jobs).

  21. Personnel Protection System (PPS) Testing
      • Formerly required approx. 3 months of beam-off, most of which was folded into long downtimes, but “verifications” were required at 6-month intervals.
      • Net impact on PEP program depended on interval between long downtimes. Typically about 2 weeks/year.
      • New policies and procedures have reduced testing to about 3 weeks once each year to coincide with long downtimes, plus operator interlock checks.

  22. Opportunities for Further PPS Testing Improvements
      • Add switches and indicators to further decouple zones/subsections/systems for testing purposes.
      • Further streamline test procedures (much progress made last year).
      • Train/authorize more staff members, so that testing can be done 24 hours/day when opportunities arise.
      Additional uptime to be gained? Possibly 1 week/year, depending on long downtime schedule and “opportunistic” down days.
      Long-range proposal: Replace linac and BSY PPS with modern system to facilitate testing and minimize downtime for diagnosing problems.

  23. How to Increase PEP-II Up Time: Challenges to Ourselves
      • Allocate resources among hardware projects to achieve optimal improvement in MTTF.
      • Identify common-mode or infrastructure projects that will improve overall uptime and stability.
      • Find ways to reduce frequency of aborts.
      • Minimize scheduled off time through policy and procedure changes and aggressive scheduling.
      • Reduce MTTR with improved procedures, diagnostic tools, and organizational efficiency.
