Remote Access Review
Safety Systems
Kelly Mahoney
December 1, 2010
Safety Systems
• Personnel Safety Systems (PSS)
  • Fully redundant PLC-based safety systems
  • Fail-safe
  • Separate from EPICS controls
• Machine Protection Systems (MPS)
  • Special fail-safe hardware managed by EPICS controls
  • Critical inputs/functions have additional protection
Safety Systems – PSS Now
The only communication with the PLCs is through proprietary HW/SW on dedicated machines. PCs can be connected to the accelerator network for patches.
[Network diagram. Labels: JLab IT Infrastructure Enclave; Controls Enclave; TCP/IP; Bridge/Router; PSS HMI; PSS Program/Monitor; PSS Development; Modbus (RS232); MB+ (~1 Mb/s); PLC A0 … PLC An-even. Fully redundant (1 of 2 systems shown).]
Safety Systems - PSS
PSS Remote Access:
• On-site (e.g. office)
  • Access only from dedicated machines, using special hardware/software (total of 4)
  • Additional password protection
  • PLCs in “Read Only” mode
• Off-site (e.g. home)
  • No off-site access
Safety Systems - PSS
Engineering Solutions:
• Segregation/Redundancy/Failsafe
• Multiple layers of protection
• Division A/B handshake through hardware
• No safety functions are performed through the network
• HW memory protect
• SW “Program” mode protection
• Equipment racks padlocked
• Working towards NIST SP 800-82
Policy/Procedures
• Policy
  • PSS Config Control Policy (currently under revision)
  • JLab Safety Configuration Management Board
  • JRRP review for major changes
  • External technical panel review for major architecture changes, e.g. 12GeV
• Procedure
  • PSS communication network well documented and under config control
  • Work control/work authorization procedures
  • PSS Certification Procedures
    • Includes specific steps for SW validation
    • Step to save ‘gold’ copy of software after certification
  • PSS is a line item in the ATLis work control system
Effective?
• PSS fiber optic trunk cut during construction
• Able to:
  • Initiate investigation/conduct forensics
  • Identify specific fibers/systems from documentation
  • Create temporary route, while maintaining config control
  • Use approved red-line markups for changes
  • Consult the SCMB on proposed changes
  • Create work-control documentation
  • Run temporary fiber trunks/terminate and QA
  • Re-certify affected systems
  …in < 11 hours after the cut
• PLC network drawings updated to reflect “temporary” route as new released revision level
• Updated drawings again for permanent fix
Training
• All SSG personnel receive classroom training from the PLC mfg.
• Augmented by specialty training from the mfg. in the form of DVDs, webinars, manuals, tutorials, etc.
• Additional training on JLab/SSG-specific implementation
• All engineers receive system safety training
• Typical training cycle is two years before personnel are considered competent in PSS policy/procedures
• Operators trained to spot suspicious/inconsistent behavior
  • Formal training
  • OJT
  • Written test includes “what if” questions
• SSG engineer is trained in safety network architecture
  • Will be taking the test for ProfiNet Engineer certification in December 2010
Assurance
• Risk-based/graded approach
  • JLab-developed risk assessment method applies to all types of software-based systems
• Emphasis on competency of SSG staff
• Emphasis on requirements
  • System requirements
  • Logic specification
  • Certification procedure directly derived from logic spec
  • Logic spec, program, and certification procedures all track at the same rev level
• Working toward NASA software assurance model
  • NASA-STD-8739.8
  • NASA-GB-A201
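The requirement that the logic spec, program, and certification procedures track at the same rev level lends itself to a mechanical check. The following is a minimal sketch only, assuming each artifact carries a revision string; the artifact names and revision values are hypothetical and do not reflect JLab's actual records or tooling.

```python
def revisions_consistent(revisions):
    """Check that every artifact is at the logic spec's revision level.

    `revisions` maps a (hypothetical) artifact name to its revision string.
    Returns (ok, mismatches), where mismatches holds the artifacts whose
    revision differs from that of "logic_spec".
    """
    reference = revisions.get("logic_spec")
    mismatches = {
        name: rev for name, rev in revisions.items() if rev != reference
    }
    ok = reference is not None and not mismatches
    return ok, mismatches


# Hypothetical example: the PLC program lags one revision behind.
ok, mismatches = revisions_consistent(
    {"logic_spec": "C", "plc_program": "B", "certification_procedure": "C"}
)
print(ok, mismatches)
```

A check like this would sit naturally at the start of a certification run, so that testing never proceeds against a procedure derived from a different spec revision.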
Performance Measures
• Very few with regard to remote access
• No recorded incidents of attempted cyber intrusion
• If there were an event, it would be treated like an accident investigation, with information recorded in the corrective action tracker system.
• SSG has a culture of copious self-reporting. Problems are recorded for future reference, and the information is analyzed for trends.
Residual Risks
• Malware targeted at industrial controls, à la Stuxnet (transferred through USB memory sticks)
  Solution:
  • Isolation
  • Disable “autorun”
  • Keep antivirus software current
  • Scan portable memory
  • Dedicated memory devices for PSS software data/license transfer
  • Stay connected to DHS CERT, manufacturer bulletins, and other sources of information on potential threats
  • Dedicated program/HMI PCs
• Access by unauthorized JLab “insider”
  • Separate authentication in addition to JLab/Controls Dept. requirements at both the PC and PLC level
  • Multiple layers
  • Independent/redundant systems
  • Locked racks
  • Least privileges
  • On-line program periodically compared to “gold” copy downloaded after last certification
  • PM program includes inspection of system racks
  • All PSS systems and infrastructure are labeled
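The periodic comparison of the on-line program against the “gold” copy can be illustrated with a digest check. This is a sketch under stated assumptions: the program image is taken to be available as raw bytes, SHA-256 is chosen here for illustration, and the function names are hypothetical. How the image is uploaded from the PLC is vendor-specific and not shown.

```python
import hashlib


def sha256_of(image: bytes) -> str:
    """Return the hex SHA-256 digest of a program image."""
    return hashlib.sha256(image).hexdigest()


def matches_gold(online_image: bytes, gold_digest: str) -> bool:
    """True if the on-line program is byte-identical to the certified copy.

    `gold_digest` is the digest recorded when the gold copy was saved
    after the last certification (an assumed workflow for this sketch).
    """
    return sha256_of(online_image) == gold_digest


# Hypothetical example: record the digest at certification, compare later.
gold_digest = sha256_of(b"certified program image")
print(matches_gold(b"certified program image", gold_digest))
print(matches_gold(b"modified program image", gold_digest))
```

Storing only the digest alongside the certification record keeps the comparison cheap, while the gold copy itself stays under configuration control.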
Residual Risks
• Restoration after loss of HW/SW
  • Active PCs have RAID redundant hard drives
  • Periodically perform image of HD
  • Development PC is a duplicate of the control room display PC
  • SW backup on JLab-managed file system
  • Hardcopy printout
• Obsolescence/compatibility
  • Dedicated test stands for burn-in/test of spares and new/refurbished HW/SW
  • Much of the existing I/O is obsolete
  • Attrition plan is to transfer to new safety PLC model – Re$ource Limited
• Programming mistake
  • Two independent programmers
  • PLC addressing scheme segregates divisions (A=Even, B=Odd)
  • PLC address hardcoded into PLC program
  • Version check at installation/test
  • Spread responsibilities for requirements/test procedures among personnel
• Induced trip, e.g. DNS, network storm, IT sniffing, …
  • Isolation from controls network
  • Bandwidth limits
  • Communication time allocation limits
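The even/odd addressing scheme and the hardcoded PLC address together guard against loading a program into the wrong division. The sketch below illustrates the idea only; in the real system the address is checked inside the PLC program itself, and the function names and example addresses here are hypothetical.

```python
def division_for_address(address: int) -> str:
    """Map a PLC network address to its division: even -> A, odd -> B."""
    if address < 0:
        raise ValueError("PLC address must be non-negative")
    return "A" if address % 2 == 0 else "B"


def installation_problems(program_division, program_address, target_address):
    """List reasons a program must not be installed on the target PLC.

    program_division: division ("A" or "B") the program was written for.
    program_address:  address hardcoded into the program.
    target_address:   address of the PLC about to receive the program.
    """
    problems = []
    if division_for_address(program_address) != program_division:
        problems.append("hardcoded address does not belong to the program's division")
    if program_address != target_address:
        problems.append("program was built for a different PLC address")
    return problems


# Hypothetical example: a division A program (address 4) offered to PLC 5.
print(installation_problems("A", 4, 5))
```

Because the parity rule is fixed, a single transposed address is enough to fail the check, which is the point of segregating the divisions this way.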
Safety Systems - MPS
• EPICS based: same access as for other controls
• Inputs may be masked to support multiple machine operating modes and configurations.
• Engineering controls
  • Hardware mask disable for always-critical inputs (always active)
  • Additional software controls for mask enable for critical inputs that require masking from time to time
  • Failsafe design deters tampering
  • Control room alarms for important run configuration items
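The masking rules above can be read as a small piece of logic: a mask request takes effect only if the hardware does not forbid it and the software mask enable is set. The following is an illustrative sketch, not the MPS implementation, which lives in fail-safe hardware managed by EPICS; all class, field, and input names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MpsInput:
    name: str
    faulted: bool
    mask_requested: bool = False
    # Always-critical inputs: the mask is disabled in hardware.
    hw_mask_disabled: bool = False
    # Critical inputs that are occasionally masked: masking works only
    # while this software enable is set.
    sw_mask_enabled: bool = True


def effective_fault(inp: MpsInput) -> bool:
    """A fault counts unless a mask is requested AND permitted."""
    masked = (
        inp.mask_requested
        and not inp.hw_mask_disabled
        and inp.sw_mask_enabled
    )
    return inp.faulted and not masked


def trip(inputs) -> bool:
    """Trip if any input has an unmasked fault."""
    return any(effective_fault(i) for i in inputs)


# Hypothetical inputs: masking an always-critical input has no effect.
always = MpsInput("always_critical", faulted=True, mask_requested=True,
                  hw_mask_disabled=True)
print(trip([always]))
```

Writing the rule so that a fault counts by default keeps the sketch aligned with the fail-safe intent: any missing permission leaves the input active.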
Safety Systems
• Future Plans
  • 12GeV Hall D
    • Dedicated/certified safety PLCs
    • Redundant ProfiNet running ProfiSafe protocol
    • Dedicated PSS industrial firewall and managed switches
    • Access through firewall requires manufacturer’s certificate
    • Safety PLCs are “network aware”
  • Work to SEI CMMI model for better metrics
12GeV Hall D Safety Systems
PCs must have a certificate from the PLC mfg. to communicate through the firewall.
[Network diagram. Labels: IT Router; Controls Domain; Potential DMZ; Safety Domain; PSS Firewall; Managed Switch (×5); Self-healing ring; ProfiSafe protocol running on ProfiNet; Safety PLCs and I/O.]
Note: System is redundant after first switch. Only one division shown for clarity.
Safety Systems
• Future Plans
  • Upgrade to safety PLCs using industrial Ethernet
  • Two-level authentication for all PCs with access to safety PLCs
  • Full implementation of suite of complementary standards
  • Slow and deliberate move to Windows 7 as support software is certified for use with this OS – will start with test stand.
  • Formal modeling of safety/security system properties
Safety Systems
• Comments?