ILC Controls: High Availability Software
Outline • Opening comments • ILC software architecture refresher • The HA stack • Primary and management protocols • HPI (Hardware Platform Interface) summary • AIS (Application Interface Specification) summary • Bottom-up, are these a good fit? • HPI and HPI-ATCA • AIS • Conclusions • A proposed “stack” for ILC HA research • Tasks
Opening Comments • Don’t build any critical path software infrastructure without access to source code • HA software is a hard problem • SAF specifications are an impressive unification of known techniques • SAF implementations won’t “solve” HA problem • You still have to determine what you want to do and encode it in the framework – this is where work lies • What are failures • How to identify failure • How to compensate (redundancy or reconfiguration or both) • How long for known reliable, SAF compliant products to come out? • Compare to time between OMG CORBA spec and good implementations… • Is resultant software complexity manageable? • Potential fix worse than the problem
SAF and ILC Controls Architecture • [Figure: three-tier architecture with failure reports flowing upwards between tiers] • Client Tier: GUI • Services Tier (middleware): AIS, Cluster Membership Service (CLM), container, object, checkpoints • Real-Time Tier: HPI, CPU1, CPU2, I/O 1, I/O 2, Shelf Manager (SM), sensor • Failure handling • Failed I/O card or power supply: fix locally (localization) • Hung task: escalate • Crashed middleware container: escalate
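The failure handling on this slide (fix locally versus escalate to the tier above) can be sketched as a small policy table. This is a minimal illustration only: the names `Tier`, `Fault`, `POLICY` and `handle` are hypothetical, and the rule that unknown faults escalate is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Tier(Enum):
    REAL_TIME = 1
    SERVICES = 2
    CLIENT = 3


class Action(Enum):
    FIX_LOCALLY = "fix locally"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Fault:
    kind: str
    detected_at: Tier


# Hypothetical policy table covering only the cases named on the slide.
POLICY = {
    "failed_io_card": Action.FIX_LOCALLY,
    "failed_power_supply": Action.FIX_LOCALLY,
    "hung_task": Action.ESCALATE,
    "crashed_container": Action.ESCALATE,
}


def handle(fault: Fault) -> Tuple[Action, Tier]:
    """Return the action and the tier that ends up responsible for it."""
    # Assumption: a fault with no policy entry is escalated.
    action = POLICY.get(fault.kind, Action.ESCALATE)
    if action is Action.FIX_LOCALLY or fault.detected_at is Tier.CLIENT:
        # Handled where it was detected (the client tier has nowhere to escalate to).
        return action, fault.detected_at
    return action, Tier(fault.detected_at.value + 1)
```

For example, `handle(Fault("hung_task", Tier.REAL_TIME))` hands the fault to the services tier, while a failed I/O card stays in the real-time tier.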
Primary and Management Protocols • How do they interact? • Primary connection management informed by management protocol • Specific actions carried out over primary protocol based on info from management protocol • [Figure: Level N and Level N+1 connected by the Primary Controls Protocol and the HA Management Protocol, which carries state info]
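One way to picture the interaction: health reports arrive over the management protocol, and the connection manager uses them to decide which endpoint the primary protocol talks to. The sketch below is an assumption-laden toy, not an ICE, EPICS or SAF API; `ConnectionManager`, `on_management_report` and the endpoint names are hypothetical.

```python
from typing import Dict, List, Optional


class ConnectionManager:
    """Chooses the primary-protocol endpoint from management-protocol state."""

    def __init__(self, endpoints: List[str]) -> None:
        self.endpoints = list(endpoints)  # in order of preference
        self.health: Dict[str, bool] = {e: True for e in self.endpoints}
        self.active: Optional[str] = self.endpoints[0] if self.endpoints else None

    def on_management_report(self, endpoint: str, healthy: bool) -> None:
        """Called when the management protocol reports endpoint state."""
        self.health[endpoint] = healthy
        # Re-select only if the endpoint in use is no longer healthy.
        if self.active is None or not self.health.get(self.active, False):
            self.active = next((e for e in self.endpoints if self.health[e]), None)

    def send(self, request: str) -> str:
        """Stand-in for a primary-protocol call to the active endpoint."""
        if self.active is None:
            raise ConnectionError("no healthy endpoint")
        return f"{self.active} <- {request}"
```

Note the design choice: this sketch does not fail back automatically when a preferred endpoint recovers, which avoids flapping; whether that is the right behavior is a requirements question.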
HPI (Hardware Platform Interface) Summary • HPI subsumes IPMI (established), SNMP, others • Sessions: client access to events • Domains manage Resources: RDR repository (SNMP OIDs) • Resources manage Entities: physical components • HPI passes info as IPMI packets over RMCP • HPI-ATCA • Expose ATCA entities through HPI (hot swap LEDs, etc.)
AIS (Application Interface Specification) Summary • C-code interface specification • No protocols or other language bindings given • AMF (Availability Management Framework) – the tie that binds • Object lifecycle state diagrams (behavior) • Services • Message – similar to JMS, MQSeries, Tuxedo • Log, Notification, Events • Cluster Membership – redundant instances within a “group” • Checkpoint – save my state so standby can take over • Distributed Lock – basic need of distributed, coordinated system • IMMS – what is out there configured and deployed • LDAP-like DN (Distinguished Names) identify resources
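The checkpoint idea ("save my state so standby can take over") can be illustrated with a toy example. This is not the SAF AIS Checkpoint Service API, which is a C interface; `CheckpointStore` and `Counter` are hypothetical names, and the store here is a single in-memory dictionary standing in for a replicated service.

```python
import copy
from typing import Any, Dict


class CheckpointStore:
    """In-memory stand-in for a replicated checkpoint service (assumption)."""

    def __init__(self) -> None:
        self._sections: Dict[str, Any] = {}

    def write(self, name: str, state: Any) -> None:
        # Copy so later changes in the writer do not leak into the checkpoint.
        self._sections[name] = copy.deepcopy(state)

    def read(self, name: str) -> Any:
        return copy.deepcopy(self._sections[name])


class Counter:
    """Toy component whose state must survive a failover."""

    def __init__(self, store: CheckpointStore, name: str) -> None:
        self.store = store
        self.name = name
        self.state = {"count": 0}

    @classmethod
    def take_over(cls, store: CheckpointStore, name: str) -> "Counter":
        """Build a standby instance that resumes from the last checkpoint."""
        instance = cls(store, name)
        instance.state = store.read(name)
        return instance

    def increment(self) -> int:
        self.state["count"] += 1
        self.store.write(self.name, self.state)  # checkpoint after every change
        return self.state["count"]
```

If the active instance increments twice and then fails, `Counter.take_over(store, name)` yields a standby whose next increment continues from the checkpointed count instead of restarting at zero.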
Bottom-up, are these a good fit? • HPI and HPI-ATCA • Yes! – IPMI and SNMP implementations all gravitating to HPI • Interoperability very useful to us here • Unified view of hardware resources • Front-end CPUs and I/O cards • Servers (database and application) • NADs (network attached devices) • AIS • Hard problem • Anyone claiming to have produced a solid, 100% compliant AIS product is probably exaggerating • C-code interface only so far • Not clear that components will be interoperable • Are we really going to be shopping for COTS control system middleware components?
HA Middleware: The Contenders (SAF presentation dated 4/26/05) (note: not a good story…) • Commercial Cluster SW • Pro: Transparent to application; ISV support • Con: Failover too slow; Proprietary • FT OS Single System Image • Pro: Transparent • Con: Scalability; Very complex to implement • FT CORBA • Pro: Reasonably Transparent; Industry Standard • Con: Failover times; Heterogeneity; Management • Telco HA Middleware • Pro: Fast Fail-over; Extensible; Management • Con: Intrusive; Non-Intuitive Model
FT-CORBA • No existing CORBA-based control system is HA • Tango – uses open-source JacORB • ACS – uses open-source ORBacus • NIF uses Visibroker with custom connection management • No Commercial FT-CORBA ORB as of beginning of 2004 • Spec out since 2001 – not a good sign • There exists very little open-source FT-CORBA (mostly academic) • GroupPAC • OCI (Object Computing Inc.) TAO
CORBA Alternative - ZeroC ICE • ICE (Internet Communications Engine) www.zeroc.com • High performance middleware • Open-Source GPL licensed • Multiple language bindings (C++, Java, PHP, Python, C# so far) • Used by Hewlett Packard and FCS (Future Combat Systems) • Very much like CORBA, but addresses substantial complexity and performance issues with CORBA (not designed by committee) • HA Features • Has explicit support for storing object state to db • Coarse-grain failover only so far (server to server) • Could possibly even use this to unify RTP (Real Time Protocol) and DOP (Distributed Object Protocol)
Options from world of Java Web Development • JBoss • Open source middleware container • Lots of sophisticated, solid features for redundant deployment • JINI • Java RMI service lookup/discovery protocol • Very useful for connection management • Spring Framework • Lightweight middleware container • Alternative to EJB 2.0 • EJB 3.0 • Response to Spring and flaws in EJB 2.0
Middleware HA – my conclusions • This is a hard problem to solve • It’s OK if this part of our efforts here takes longer to solidify • OS based clustering too slow and complex • SAF AIS specification is great on paper, but… • No implementations yet that offer full compliance • No bindings other than C so far as I can tell • FT-CORBA not looking good • Proprietary Telco solutions – need I say more • Success stories seem to use non-HA standards to build HA system • Use set of standards that matches your culture • i.e. Java (JINI/RMI) or non FT-CORBA • Build needed HA behavior custom to your requirements • Add in checkpointing, active/standby, connection mgmt, etc.
Middleware HA – conclusions (2) • My inclination is to look at ICE and/or standard CORBA • Build basic HA features following model of SAF AIS where reasonable • Need more knowledge to even evaluate SAF AIS compliant products • Wait for commercial and open-source implementations of AIS… • In the meantime, build a la carte from known stable frameworks
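As an illustration of the "build basic HA features" approach, active/standby selection can be driven by heartbeats with a timeout. The sketch below is a minimal, hypothetical example (`HeartbeatMonitor` and its methods are not part of ICE, CORBA or SAF AIS); time is passed in explicitly so the logic is easy to test, and the priority-order selection rule is an assumption.

```python
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks heartbeats and picks the first live member as active."""

    def __init__(self, members: List[str], timeout: float) -> None:
        self.members = list(members)  # priority order (assumption)
        self.timeout = timeout        # seconds without a heartbeat before a member is considered failed
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, member: str, now: float) -> None:
        self.last_seen[member] = now

    def alive(self, member: str, now: float) -> bool:
        seen = self.last_seen.get(member)
        return seen is not None and now - seen <= self.timeout

    def active(self, now: float) -> Optional[str]:
        """Return the highest-priority live member, or None if all are silent."""
        return next((m for m in self.members if self.alive(m, now)), None)
```

Choosing the timeout is where the real work lies: too short and a busy server is declared dead, too long and failover is slow.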
Proposed Stack for ILC HA Research • Java GUI applications • ICE protocol • ICE Middleware Tier • Examine suitability • Build prototype HA features • IPMI V1.5 over RMCP • Channel Access • Arrow ATCA Starter Kit • Pigeon Point shelf manager • Need SM SDK? • Dual (quad) x86 processors • We need board developers kit • Run EPICS iocCore on dual CPUs • [Figure: ATCA shelf with CPU1, CPU2 and SM; components marked COTS or custom]
Tasks • Study and document points of failure (look at FNAL project…) • How to identify failure • How to recover (redundancy and/or reconfiguration) • Port EPICS iocCore to ATCA CPUs • RTOS? • Explore redundancy and checkpointing within iocCore • Establish middleware server • Explore HA feature development within ICE • RMCP to ATCA shelf manager • Channel Access to ATCA CPUs • Look at custom hardware development in ATCA, including potential associated additions to shelf manager software