Middleware renovation – technical overview and plans for migration. CMW@GSI, 25th April 2013. Wojciech Sliwinski (BE-CO-IN) for the Middleware team: Felix Ehm, Kris Kostro, Joel Lauener, Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak]
Middleware renovation – technical overview and plans for migration
CMW@GSI, 25th April 2013
Wojciech Sliwinski (BE-CO-IN) for the Middleware team:
• Felix Ehm, Kris Kostro, Joel Lauener,
• Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak]
• Special thanks to: Vito Baggiolini and Pierre Charrue
Agenda
• Context & Motivation for Renovation
• Middleware Review process
• Technical evaluation of the transport layer
• Changes in the MW Architecture in LS1
• MW Upgrade milestones in 2013
• Risk assessment and mitigation
• Conclusions
Wojciech Sliwinski, Middleware Renovation
Agenda: Context & Motivation for Renovation
MW Mandate & Scope
• Standard set of MW solutions
• Centrally managed services
• Track & optimize runtime parameters
• Well defined feedback channel for users
• Provide support & follow up on issues
• Scope: CERN Accelerator Complex
• Operational 24*7*365
• Must be Reliable & High Quality
• 73’000 HW devices, 3’150 servers
• In all equipment groups (4 departments: BE, EN, GS, TE)
CMW in the Controls System
[Diagram labels]
• CMW client (C++/Java), JAPC: GUIs, LabView, RADE
• JMS client (Java): GUIs
• CMW client (Java), JAPC: Logging, LSA, InCA, SIS
• CMW client/server (C++/Java): Proxy, DIP, AlarmMon, AQ
• JMS client (Java): Servers: Logging, InCA, SIS
• CMW server (C++): PVSS (Cryo, Vacuum)
• CMW server (C++): FESA, FGC, GM
Motivations for MW Renovation
• Current CORBA-based CMW-RDA
  • Integrated in the Control system
  • Used to operate all CERN accelerators
  • Provides widely accepted Device/Property model
  • > 10 years old
• Why review & upgrade MW?
  • CORBA was chosen 15 years ago
  • Technical limitations of CORBA-based transport
  • Functional limitations of the current CMW-RDA
  • Codebase with long history: difficult to maintain, needs architecture review
  • Major issue of long-term support & future evolution
  • Evolution of technology over last 10 years: HW, OS, middleware, 3rd party libraries
  • Human factor: less & less CORBA expertise on the market
Technical limitations of CORBA transport
• Became legacy, not actively supported: maintenance issue
  • Shrinking community, slow response time
  • omniORB (C++) – 1 developer/maintainer, last release mid-2011
  • JacORB (Java) – few developers, small community
• Major technical limitations
  • Lack of fully asynchronous processing channel
  • Blocking communication: infamous JacORB blocking issue
  • Lack of low-level control of IO resources (sockets, request queues)
• Development issues
  • Difficult to extend the wire protocol: backward compatibility issue
  • Complex, error prone API
  • Heavy in memory usage
Summary: Why change CORBA?
• CORBA was chosen 15 years ago
• Not actively maintained: big risk for the MW project
• Better solutions exist on the market
• Invest in a future solution rather than maintaining the old one
Functional limitations of CMW-RDA
• Several pending operational issues
  • Difficult (or hardly possible) to resolve with current library
• Any major change very difficult to introduce
  • Technical Stops & Xmas breaks too short for massive deployment
  • High risk: major impact on front-end frameworks and applications
• No protection against ’slow/bad’ client applications
  • Misbehaving application may destabilise front-end server
  • Affects reliability of the subscription channel
  • Workaround: introduction of Proxy
• Poor scalability when many clients subscribed
  • Stability issues observed when >200 clients subscribed (even for Proxy)
  • Threading model doesn’t scale well with many clients
• Missing support for priority clients (e.g. SIS, PM, InCA, Logging)
  • Non-critical clients (e.g. GUIs) have the same communication priority
• + others…
Summary: Why change CMW-RDA?
• With current CORBA-based middleware we can’t solve the pending operational issues
• We can’t provide better scalability & reliability
• CMW-RDA is difficult to evolve & extend
Agenda: Middleware Review process
Middleware Renovation process
• MW Renovation = MW Review + MW Upgrade
  • MW Review aims to provide the most appropriate technical solution satisfying the user requirements
  • MW Upgrade establishes the plan & strategy for introduction of the new MW
  • Objective: LS1 is the unique opportunity for the major MW upgrade
• Middleware Review Process
  • Gathering of users’ feedback and requirements (2010-11)
  • Review of communication and serialization libraries (2011-12)
  • Prototyping using selected communication products (2012)
  • Design & implementation of new RDA3: Data, Client & Server (2012-13)
  • Testing & validation of core MW infrastructure (summer’13)
  • Upgrade of all dependent MW libraries & services (2013-14)
    • JAPC, Directory Service, Proxy, DIP Gateway
Review of users’ requirements
• 2010-11 – series of interviews with major users
  • Lars Jensen, Stephen Jackson (BI)
  • Andy Butterworth, Frode Weierud, Roman Sorokoletov (RF)
  • Brice Copy, Clara Gaspar (DIP, DIM)
  • Frederic Bernard, Herve Milcent, Alexander Egorov (PVSS)
  • Alexey Dubrovskiy (CTF), Kris Kostro (DIP gateways)
  • Marine Gourber-Pace, Nicolas Hoibian (Logging)
  • Nicolas De Metz-Noblat (Front-Ends), Alastair Bland (Infrastructure)
  • Michel Arruat (FESA), Stephen Page (FGC)
  • Niall Stapley, Mark Buttner, Marek Misiowiec (LASER & DIAMON)
  • Nicolas Magnin, Christophe Chanavat (ABT)
  • Stephane Deghaye, Jakub Wozniak (InCA, SIS)
  • Vito Baggiolini, Roman Gorbonosov (JAPC & DA systems)
• + regular feedback from OP
• + internal team input
• http://wikis/display/MW/Interviews+with+Experts
New RDA3: Accepted requirements (legend: New requirement)
• General
  • Java & C++ API, Win (64-bit) & Linux (SLC5 32-bit & SLC6 64-bit)
  • Accelerator Device Model (i.e. Device/Property)
  • Get, Set, Async-Get, Async-Set, Subscribe
  • Early detection of communication failures
  • Improve error reporting in all the layers: client, server, gateways
  • Admin interface & runtime diagnostics & statistics
• Data support
  • Data object: primitives, n-dim arrays, data structures
• Subscription mechanism
  • Subscription behaviour the same regardless of the condition of the server (active, down)
  • Several client subscription policies (default: continuous)
  • Provide subscription notification ordering
  • First-Update enforced via CMW on server-side
  • Provide callback to front-end framework for the server-side Get
  • Drop support for on-change flag
  • Standardise use of subscription filters and update flags (e.g. immediate update)
  • Add header for acquired Data: common metadata (e.g. acq. stamp, cycle name)
  • All loss of data (dropped updates) must be notified to clients
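The Device/Property operations and the First-Update requirement listed above can be illustrated with a minimal in-memory sketch. This is not the RDA3 API: the class `ToyDeviceServer`, its method names and the device/property names are hypothetical, and there is no networking. It only shows Get, Set and Subscribe on a (device, property) pair, with the first update delivered on subscription through a server-side Get and later updates delivered in order.

```python
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (device name, property name)
Listener = Callable[[Any], None]


class ToyDeviceServer:
    """In-memory stand-in for a Device/Property server (hypothetical, not RDA3)."""

    def __init__(self) -> None:
        self._values: Dict[Key, Any] = {}
        self._listeners: Dict[Key, List[Listener]] = {}

    def get(self, device: str, prop: str) -> Any:
        """Return the current value of device/property."""
        return self._values[(device, prop)]

    def set(self, device: str, prop: str, value: Any) -> None:
        """Store a new value and notify subscribers in subscription order."""
        key = (device, prop)
        self._values[key] = value
        for listener in self._listeners.get(key, []):
            listener(value)

    def subscribe(self, device: str, prop: str, listener: Listener) -> None:
        """Register a listener; enforce First-Update if a value already exists."""
        key = (device, prop)
        self._listeners.setdefault(key, []).append(listener)
        if key in self._values:
            # First-Update: the server side performs a Get on behalf of the client.
            listener(self.get(device, prop))


server = ToyDeviceServer()
server.set("Magnet1", "Current", 10.0)

received: List[Any] = []
server.subscribe("Magnet1", "Current", received.append)  # first update: 10.0
server.set("Magnet1", "Current", 12.5)                   # regular update
print(received)  # [10.0, 12.5]
```

Enforcing the first update inside `subscribe` means a client always starts from a known value, which is the point of handling First-Update in the middleware rather than in each server.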
New RDA3: Accepted requirements (legend: New requirement)
• Client side
  • RDA3 client API connects with both: RDA2 (old) & RDA3 (new) servers
  • Efficient mechanism for: connection, disconnection & reconnection
  • Must be able to recover from any interruption of communication with the server
    • Server restarts, IP address change, rename/move of a device to another server
  • Improved semantics of ArrayCalls, i.e. handling of individual parameters
  • Enhanced diagnostics & collection of statistics
• Server side
  • Policies for discarding notifications, i.e. deal with overflows and ’bad clients’
  • Instrument with counters & timings allowing to diagnose the notifications delivery
  • Prioritisation of Get/Set requests for high-priority clients
  • Server-side subscription tree fully managed by CMW
    • Server does not need to manage client subscriptions anymore
  • Manage the client connections, e.g. forced disconnect of a client
  • Client lifetime callbacks (i.e. connected, disconnected)
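One possible reading of the "policies for discarding notifications" requirement is a bounded per-client queue that drops the oldest update on overflow and reports the loss to the client. The sketch below assumes that drop-oldest policy; the class name, capacity and reporting format are hypothetical and not taken from RDA3.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class ClientNotificationQueue:
    """Bounded per-client queue with a drop-oldest policy (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._queue: Deque[Any] = deque()
        self.dropped_total = 0        # diagnostic counter over the queue lifetime
        self._dropped_unreported = 0  # losses not yet reported to the client

    def publish(self, update: Any) -> None:
        """Enqueue an update; on overflow discard the oldest pending one."""
        if len(self._queue) >= self._capacity:
            self._queue.popleft()
            self.dropped_total += 1
            self._dropped_unreported += 1
        self._queue.append(update)

    def drain(self) -> Tuple[int, List[Any]]:
        """Return (updates lost since the last drain, pending updates)."""
        lost = self._dropped_unreported
        updates = list(self._queue)
        self._queue.clear()
        self._dropped_unreported = 0
        return lost, updates


slow_client = ClientNotificationQueue(capacity=3)
for value in range(5):          # the server publishes faster than the client reads
    slow_client.publish(value)

lost, updates = slow_client.drain()
print(lost, updates)  # 2 [2, 3, 4]
```

Because the queue is bounded, a slow client costs the server a fixed amount of memory, and the `lost` count gives the client the notification of dropped updates that the requirements ask for.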
New RDA3: Accepted requirements (legend: New requirement)
• Server side (cont.)
  • Client discovery for the diagnostics purposes (i.e. connected clients with payload)
  • Enhanced diagnostics & collection of statistics
• Ongoing discussions (not accepted yet)
  • Prioritisation of subscription notifications for high-priority clients
• Technical notes
  • Invest in asynchronous & non-blocking communication
  • Prefer 0-copy & lock-free data structures, message queues
• http://wikis/display/MW/Design+of+New+RDA
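Prioritisation for high-priority clients, as listed among the server-side requirements, could be sketched with a priority queue. The two priority levels, the class `RequestQueue` and the request strings are illustrative assumptions; the client names (SIS, InCA, GUI) are borrowed from the earlier slides only as examples.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority levels: a lower number is served first.
HIGH, NORMAL = 0, 1


class RequestQueue:
    """Serves high-priority requests first, FIFO within one priority level."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._sequence = itertools.count()  # tie-breaker that keeps arrival order

    def submit(self, priority: int, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), request))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = RequestQueue()
queue.submit(NORMAL, "GUI: get Magnet1/Current")
queue.submit(HIGH, "SIS: get Magnet1/Status")
queue.submit(NORMAL, "GUI: set Magnet1/Current")
queue.submit(HIGH, "InCA: set Magnet1/Current")

served = [queue.next_request() for _ in range(len(queue))]
for request in served:
    print(request)
```

The sequence counter keeps requests of equal priority in arrival order, so non-critical clients are delayed but never reordered among themselves.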
New RDA3: Summary of requirements
• Unchanged
  • Device/Property model
  • Set of basic operations (Get, Set, Subscribe)
• Fixes & improvements
  • Subscription mechanism
  • Connection management
  • Diagnostics & statistics
• New functionality
  • Policies for subscription management (client & server)
  • Client priorities
  • Server-side subscription tree
  • Extended Data support
  • Standardise First-Update concept
Agenda: Technical evaluation of the transport layer
Middleware transport requirements
Desirable
• Lightweight
• Friendly API, documentation
• Request/reply & pub/sub patterns
• Asynchronous
• Performance & Scalability
• Stability, Maturity & Longevity
Mandatory
• Active community
• Open source license
• C++/Java
• Linux/Windows
• Over TCP/IP LAN
Fundamental
Evaluation process → our criteria
Evaluation steps:
• Appearance
  • Creators: specification, documentation
  • Users: forums, bug reports
  • Internet
• Simple usage
  • Download: licensing
  • Compile: Linux & gcc
  • Run examples
• Testing
  • Communication patterns
  • Performance
  • Exceptional situations
  • QoS
  • Configuration
Criteria:
• Resources, binary size, memory
• QoS
• Community, maturity
• API, look & feel, documentation
• Communications patterns
• Performance
Andrzej Dworak, ICALEPCS 2011
Evaluated middleware products
All opinions are based only on our knowledge and evaluation. Each of the products, depending on the requirements, may constitute a good solution.
OpenAMQ, CoreDX, RTI DDS, Qpid, ZeroMQ, OpenSplice DDS, RabbitMQ, YAMI, Ice, omniORB, MQTT RSMB, JacORB, Thrift, Mosquitto
Andrzej Dworak, ICALEPCS 2011
Products comparison (according to the criteria)
Andrzej Dworak, ICALEPCS 2011
Conclusions
• Several good middleware solutions available
• The choice is dictated by the most critical requirements
• Not easy: performance matters, but also ease of use, community, …
• Prototyping was done with the most promising candidates:
  • ZeroMQ, Ice & YAMI
• Finally we chose ZeroMQ (http://www.zeromq.org/)
  • Asynchronous & non-blocking communication
  • 0-copy & lock-free data structures, message queues
  • Nice API, good documentation & active community
New RDA3 Java – Sync Get round-trip time
Test setup: 1 kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
New RDA3 Java – subscription notification latency
Test setup: 1 kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
Agenda: Changes in the MW Architecture in LS1
Current MW Architecture
[Diagram labels, top to bottom; legend: User written, Middleware, Central services]
• Clients: Java Control Programs; VB, Excel, LabView; C++ Programs; Administration console
• JAPC API, Passerelle, C++ Clients
• RDA Client API (C++/Java), Device/Property Model
• Central services: Directory Service, RBAC (A1) Service, Configuration Database (CCDB)
• CMW Infrastructure: CORBA-IIOP
• RDA Server API (C++/Java), Device/Property Model
• CMW integration in each server
• Servers: Virtual Devices (Java), FESA Server, FGC Server, PS-GM Server, PVSS Gateway, More Servers
• Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)
Changes in MW Architecture in LS1
[Diagram: same components as the current MW architecture, with ZeroMQ in place of CORBA-IIOP in the CMW Infrastructure; legend: User written, Middleware, Central services, Upgrade in LS1]
Agenda: MW Upgrade milestones in 2013
MW Upgrade Milestones in 2013
[Timeline dates: July’13, July-Oct’13, September’13, December’13, Winter’13/14, August’14]
End-of-Life for RDA2: LS2
MW Upgrade strategy in LS1 and towards LS2
• No BIG-BANG migration but gradual
• Backward compatible (connection-wise) new RDA3 client library
  • New RDA3 clients can communicate with RDA2 & RDA3 servers
• FESA3 will exist with both: old RDA2 (FESA3.1) and new RDA3 (FESA3.2)
[Diagram labels]
• Clients: Old JAPC, New JAPC; Old RDA2 client, New RDA3 client
• RDA2 RDA3 Gateway
• Servers: Old RDA2 server (FESA2.10), Old RDA2 server (FESA3.1), New RDA3 server (FESA3.2)
• Annotations: Client apps will migrate during LS1; Only for justified, exceptional cases; FEC developers should migrate to FESA3.2 ASAP
LS1: Changes in JAPC
• New major JAPC version upgrade for RDA3 (September’13)
  • Public API backward compatible
  • Possible API extensions, but always compatible
  • Announcement via accsoft-java-announce list
• Required Actions for JAPC Users
  • Update JAPC jars (via CommonBuild)
  • Re-release your product (via CommonBuild)
  • New JAPC will support communication with RDA2 & RDA3 servers
LS1: Changes in RDA
• New major version: RDA3 (June’13 – alpha version)
  • Public API NOT backward compatible
  • New protocol, new architecture, new design
  • Same Device/Property model & Get/Set/Subscribe calls
  • Announcement via cmw-news & accsoft-java-announce lists
• Required Actions for RDA Users
  • For Java: Use new version of JAPC (API unchanged)
  • For Java: New JAPC will support communication with RDA2 & RDA3 servers
  • For C++: Upgrade user code to new RDA3 API
  • For C++: RDA3 will support communication with RDA2 & RDA3 servers
• Consequences if NO Action: staying with old RDA2
  • NOT possible to communicate with new RDA3 servers (FESA3, FGC, etc.)
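The connection rules stated on the migration slides (new RDA3 clients reach both RDA2 and RDA3 servers, old RDA2 clients cannot reach RDA3 servers) can be written down as a small lookup. This is only a summary sketch: the function name `can_connect` is hypothetical, the FESA-to-protocol mapping is the one given on the upgrade strategy slide, and the RDA2/RDA3 gateway mentioned there for exceptional cases is deliberately left out.

```python
from typing import Dict

# Server protocol per FESA version, as listed on the upgrade strategy slide.
FESA_SERVER_PROTOCOL: Dict[str, str] = {
    "FESA2.10": "RDA2",
    "FESA3.1": "RDA2",
    "FESA3.2": "RDA3",
}


def can_connect(client_library: str, fesa_version: str) -> bool:
    """Direct client-to-server connectivity (gateway not modelled)."""
    server_protocol = FESA_SERVER_PROTOCOL[fesa_version]
    if client_library == "RDA3":
        return True  # the new client library is backward compatible connection-wise
    if client_library == "RDA2":
        return server_protocol == "RDA2"  # the old client cannot reach RDA3 servers
    raise ValueError(f"unknown client library: {client_library}")


for client in ("RDA2", "RDA3"):
    for fesa in FESA_SERVER_PROTOCOL:
        print(f"{client} client -> {fesa} server: {can_connect(client, fesa)}")
```

The only failing combination is an RDA2 client against a FESA3.2 server, which is why the slides ask client applications to migrate during LS1.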
Agenda: Risk assessment and mitigation
Risk assessment and mitigation
Risk: Wrong product developed (wrong requirements)
Mitigation: Early and continuous involvement of clients & experts
• We have involved clients and experts since 2010
  • Requirements review with all major clients
  • Technical discussions with equipment experts
• Iterative development involving the Review team
  • Design meetings (API and internals) since January 2013
  • Alpha versions will be available for feedback and validation several months before the final release
  • Feedback is continuously integrated in development (= iterative)
Risk: Product is (too) late
Mitigation: Careful planning and follow-up; fall-back to less ambitious goals
• Planning prepared and followed by the MW team
  • Taking into account needs and priorities of other CO projects and clients
• Regular follow-up
  • In CO internally by TEC coordinator
  • In informal meetings with the MW experts (as done so far)
• Fall-back to less ambitious goals
  • Plan priorities of functionality
  • Drop (postpone) work with lower priority
Risk: Product has bugs or incompatibilities
Mitigation: Early, continuous testing (unit, functional & integration tests)
• Unit tests to assess quality inside the MW project
  • Required dev. phase in the MW team
• Functionality tests in CO Testbed
  • Functionality of CMW only
• Integration tests to check interoperability
  • Integration with FESA in CO Testbed
  • Integration with FGC in FGC Lab
Risk: Bugs affect operations
Mitigation: Gradual Migration (1)
• No BIG-BANG migration but gradual
• Backward compatible (connection-wise) new RDA3 client library
  • New RDA3 clients can talk to old RDA2 servers
• FESA3 will exist with both: old RDA2 and new RDA3
[Diagram labels]
• Clients: Old JAPC, New JAPC; Old RDA2 client, New RDA3 client
• Servers: Old RDA2 server (FESA2), Old RDA2 server (FESA3), New RDA3 server (FESA3)
Risk: Bugs affect operations
Mitigation: Gradual Migration (2)
• Deploy first on systems controlled by the MW team
  • E.g. Proxies, Gateways
  • Gain experience and confidence
• Start deployment with less critical systems first
Risk: Bugs affect operations
Mitigation: Fast deployment of bugfixes
• If (in spite of all) something goes wrong in operations
  • Fast reaction from the MW team
  • In CO, we will study the need and mechanisms to quickly upgrade servers as well
Conclusions
• We have to replace CORBA with a new solution
• We collected updated users’ requirements
• MW upgrade will be performed during LS1
• Interoperability between RDA2 & RDA3
• Gradual control system migration until LS2
• End-of-Life for RDA2: LS2