LL 93, 101 & 109 - Summary

Venkat Tirupati, Senior Reliability Engineer. Event Analysis Subcommittee – LL Update, November 1, 2013.


Presentation Transcript


  1. LL 93, 101 & 109 - Summary
     Venkat Tirupati, Senior Reliability Engineer
     Event Analysis Subcommittee – LL Update
     November 1, 2013

  2. LL 93 – Loss of Authentication
     Operators lost the ability to authenticate to the EMS system.
     Cause:
     • A firewall policy allowing authentication failover from the local to the remote authentication server was inadvertently removed, so authentication failed when the local server was taken out of service.
     Lesson:
     • EMS network design should, where possible, include a redundant local authentication server on the same internal network as the primary local authentication server.
     • The IT Change Management process should consider applying the following principles:
       • Apply a thorough test process, reviewed with the client, for all changes that could affect EMS function.
       • Test the design redundancy or back-out plan before implementing a change.
       • Make test plans comprehensive, including regression-level testing.

  3. LL 101 – SCADA Failure
     A SCADA failure resulted in reduced monitoring functionality.
     Cause:
     • A planned change to the security software policy configuration caused core operating system processes to be blocked on the SCADA servers.
     Lesson:
     • Security software configurations need careful analysis, design, testing, and implementation, as they may impact reliability in unpredictable ways.
     • Registered entities should consider a “multisite hosting” configuration, which provides flexibility and convenience for rapid recovery of EMS and SCADA functions.
     • Frequent exercise of, and training on, recovery plans helps ensure that actual event responses go according to plan and promptly mitigate operational impacts.

  4. LL 109 – EMS Failure
     The EMS failed while a database (DB) update was being performed.
     Cause:
     • During restoration of the DB after errors were noted, a stand-by communication server was restarted. The restart synchronized faulty data files across the integrated system servers, causing several problems while the system was being brought back.
     Lesson:
     • Training documents should be developed to document the revised steps for database updates and communication server restarts.
     • Database update testing procedures and documentation should be reviewed. EMS analysts will receive training on the existing and new procedures.
     • Maintain functional separation between the PCC and ACC: DB updates should be performed independently on each, to reduce the risk of anomalies at the PCC being propagated to the ACC.

  5. Questions and Answers
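
The LL 93 lesson about testing design redundancy before implementing a change can be illustrated with a small regression check. The following is only a minimal sketch under stated assumptions, not the method used by the entity in the lesson: the `Rule` model, the zone and server names (`ems-lan`, `auth-local`, `auth-remote`), and the port number are all hypothetical, and real firewall policies are far richer than exact-match triples.

```python
from dataclasses import dataclass


# Hypothetical, simplified rule model (an assumption for illustration only).
@dataclass(frozen=True)
class Rule:
    src: str   # source zone, e.g. "ems-lan"
    dst: str   # destination authentication server
    port: int  # destination TCP port


def allowed(rules, src, dst, port):
    """Return True if some rule permits src -> dst on the given port."""
    return Rule(src, dst, port) in set(rules)


def surviving_auth_paths(rules, src, auth_servers, port, out_of_service=()):
    """List the authentication servers still reachable from src
    when the servers named in out_of_service are down."""
    return [
        server
        for server in auth_servers
        if server not in out_of_service and allowed(rules, src, server, port)
    ]


def check_change(proposed_rules, src, auth_servers, port):
    """Regression check: for every single-server outage, the proposed
    policy must leave at least one reachable authentication server."""
    problems = []
    for down in auth_servers:
        if not surviving_auth_paths(proposed_rules, src, auth_servers, port, (down,)):
            problems.append(f"no authentication path if {down} is out of service")
    return problems


# Illustrative data: the proposed policy drops the failover rule by mistake.
servers = ["auth-local", "auth-remote"]
current = [Rule("ems-lan", "auth-local", 636), Rule("ems-lan", "auth-remote", 636)]
proposed = [Rule("ems-lan", "auth-local", 636)]

print(check_change(current, "ems-lan", servers, 636))
print(check_change(proposed, "ems-lan", servers, 636))
```

With the illustrative data, the current policy passes, while the proposed policy is flagged because nothing remains reachable once `auth-local` is taken out of service. A check of this kind would only complement, not replace, the live failover and back-out tests the lesson calls for.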