Computing in BE-CO Vito Baggiolini BE-CO-DO IT Technical Forum 4-May-2012
A few words about myself • Senior member of BE-CO • Software Engineer • Technical Coordinator of BE-CO • Section Leader of the new “DevOps” section
Outline • Overview of Accelerator Controls • Services used from IT department • BE-CO and the new DevOps section • 3 initiatives to further improve quality and availability of the control system V. Baggiolini, ITTF 4-May-2012
“The beam is our master” Vito Baggiolini, ITTF, 4-May-2012
Photo with CCC seen from above V. Baggiolini, TC 12-Mar-2010 “Operations are our main client”
Controls HW infrastructure (diagram) • Presentation tier: operator consoles, fixed displays, other operational consoles • Middle tier: application servers, PVSS servers, file servers, timing, Oracle • Resource tier: real-time Linux and LynxOS computers, WorldFIP systems, PLCs • Controlled equipment: beam instrumentation, kickers, machine protection, magnet control, RF, quench protection, cryo, vacuum, … • Networks: GPN and Technical Network (Technet)
Fixed Displays Operator Consoles IEEE-NSS-MIC - CERN Beams Controls Group
BE-CO-related Software (architecture diagram) • GUI layer: OP-specific GUIs, Expert GUIs, Sequencer, Fixed Displays Frame, DB Access • Middle layer (application servers, file servers, SCADA servers): Controls Middleware Java, Business Layer, Alarms (LASER), LHC Software Architecture Core, Post Mortem, Accelerator Logging, Timing Generation, Timing Management, Diagnostics and Monitoring (DIAMON, TIM), Software Interlock System, Data Concentrators, Role Based Access Control (RBAC), DB: Settings & Logging • Front-End Layer: Controls Middleware C/C++, Front-End FESA servers, device servers, RT Lynx/OS VME Front Ends, WorldFIP Front Ends, PLC • Layers linked by TCP/IP communication services over the CERN Gigabit Ethernet Technical Network • Legend: Developed by BE-CO / Using BE-CO Frameworks
Java Applications Vito Baggiolini, ITTF, 4-May-2012
Some Software-Related Numbers • GUI and Middle-tier (Java) • ~8 million lines of production code • > 1000 jar files in production • Combined to 400 different GUIs and 200 server programs • Up to 1000 processes running on 550 machines • Developed by 80 people from 10 different groups • Front-End Layer (C/C++) • 550 different device types • 70’000 device instances on 1000 different front-end machines • Developed by 80 people from 8 different groups
BE-CO has a preference for “open” solutions • Open standards and protocols • E.g. CORBA, JMS for middleware • Open source where possible • Linux as the recommended operational OS • Open-source 3rd party for C, C++ and Java • Open hardware initiative • The VHDL of our electronics is on the Internet (http://www.ohwr.org/)
Outline • Overview of Accelerator Controls • Services used from IT department • BE-CO and the new DevOps section • 3 initiatives to further improve quality and availability of the control system
IT services needed 24/7 for beam • Network: • The basis of our (distributed) system • Databases: • Oracle used online in core applications • Authentication/Authorization (LDAP, Active Directory): • Used for role-based access control in operational applications • DFS home directories • For some operational accounts (e.g. cryo) • Web Services • Used for accelerator status pages watched (also) by high management • Generally we’re very happy! We have good collaborations and get good services and support.
IT services not directly needed for beam • HyperV (Linux and Windows) for SW development on the Technical Network (TN) • Windows TS for expert access from home • DFS and AFS in our daily work • SVN for all sources and part of configuration • Yum (Linuxsoft) and CMF for software installations • ADSM (Tivoli) for automatic backups • CASTOR for long-term backup • Security services (CNIC, security scans) • IT Printer service • … • We heavily rely on your services and expertise!
Outline • Overview of Accelerator Controls • Services used from IT department • BE-CO and the new DevOps section • 3 initiatives to further improve quality and availability of the control system
The BE-CO group in context • “The beam is our master, BE-OP our main client” • Direct and personal support • Equipment groups • Use our real-time embedded computers and software frameworks • Beam Instrumentation (BE-BI), Radio Frequency (BE-RF), Power Converters (TE-EPC), Kickers and transfer lines (TE-ABT), Machine Protection (TE-MPE), Motorizations (TN-STI) • Industrial controls (EN-ICE) • Use machines provided by BE-CO • IT department as our main solution and service provider • BE-CO often acts as a “value added reseller”
CO mandate and internal structure • Group Mandate: contribute to producing the beam • Controls infrastructure for accelerator complex • Core controls sub-systems and applications • General purpose building blocks and frameworks • Operational support: 24/7 (piquet for HW, best effort for SW) • Development tools and advice • Sections:
BE-CO sections and their mandate • Hardware and Timing: (Javier Serrano) • General purpose electronic modules, • drivers + driver generation framework • Timing system (generation and distribution) • Front-end computing: (Marc vanden Eynden) • Real-time embedded computer platforms (racks, crates, CPUs) • FESA (Front-end SW Architecture - Framework to integrate devices) • OASIS (software-based, remote oscilloscopes) • Data and applications: (Katarina Sigerud) • All data management activities (mainly Oracle) • Core controls applications (Java) • SW frameworks and components (GUIs)
Infrastructure Section (Pierre Charrue) • HW procurement, installation and maintenance • File servers and backup service • User management, mapping from users to roles • Windows system admin • Windows Terminal Servers • HyperV PCs, server-side virtualization (upcoming) • Software projects • Controls Middleware (CMW) • Role-based access control (RBAC) • Diagnostics and Monitoring for OP (DIAMON) • Heavily based on IT products, services and expertise
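The user-to-role mapping behind role-based access control can be illustrated with a minimal sketch. This is not the BE-CO RBAC API: the class `RoleCheck`, its maps and the sample user, role and operation names are all hypothetical, chosen only to show the idea that a user is allowed an operation if at least one of the user's roles grants it.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a role-based access check (not the BE-CO RBAC API). */
class RoleCheck {
    private final Map<String, Set<String>> rolesByUser;
    private final Map<String, Set<String>> operationsByRole;

    RoleCheck(Map<String, Set<String>> rolesByUser, Map<String, Set<String>> operationsByRole) {
        this.rolesByUser = rolesByUser;
        this.operationsByRole = operationsByRole;
    }

    /** A user may perform an operation if any of the user's roles grants it. */
    boolean isAllowed(String user, String operation) {
        for (String role : rolesByUser.getOrDefault(user, Set.of())) {
            if (operationsByRole.getOrDefault(role, Set.of()).contains(operation)) {
                return true;
            }
        }
        return false; // unknown users and unknown roles get no access
    }

    public static void main(String[] args) {
        // Illustrative data only: user, role and operation names are made up.
        RoleCheck check = new RoleCheck(
                Map.of("alice", Set.of("operator")),
                Map.of("operator", Set.of("read-setting", "write-setting")));
        System.out.println(check.isAllowed("alice", "write-setting"));
        System.out.println(check.isAllowed("bob", "write-setting"));
    }
}
```

Keeping the users-to-roles map separate from the roles-to-operations map means a user can change role without any change to the permissions themselves.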
DevOps • Operations • Development • Quality assurance • DevOps: bridging the gap between Development and IT Operations by using agile techniques
DO section members Nicolas DMN Alastair Bland Mike Grosak Niall Stapley Pavel Tarasenko Donat Csikos Jeremy Nguyen Steen Jensen Vito Baggiolini
Development Tools (“Dev”) • “Individual” SW development tools • JDK, GCC, 3rd party libraries on Linux • Tailor-made Eclipse distribution (on Linux and Windows) • Issue tracking and configuration management tools • Atlassian suite (JIRA, Confluence, Fisheye, Crucible) • Repositories • SVN: “added value”: admin, Eclipse integration, best practices • NFS and Maven Repository (Nexus) for binary artifacts • Build/release tools and servers • Java: old (ANT) and new (Maven or Gradle) • C/C++: new build tool with dependency management • Testing and Quality Assurance tools • Continuous Integration servers (Atlassian Bamboo) • Testbed (miniature accelerator controls system)
Plans for dev tools in 2012: • New, tailor-made Eclipse distribution (3rd party tools?) • Dependency analysis plugin for Eclipse • New build and release tools for Java and C/C++ • Continuous integration also for C/C++ • Testbed becomes official part of release workflow for Control System core • Instrumentation / Usage data collection of applications, jars, Eclipse plugins, JIRA projects, etc.
Continuous Integration Triggered by changes in a dependency SIP -- The Software Improvement Process - K.Sigerud, 13th October Courtesy K. Sigerud
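A build triggered by a change in a dependency amounts to walking the dependency graph backwards from the changed project. The sketch below is only an illustration of that idea, not how Bamboo or the BE-CO build tools implement it. The class `RebuildPlanner` and the project names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: which projects need a rebuild after one project changes. */
class RebuildPlanner {

    /**
     * @param dependsOn maps each project to the projects it depends on directly
     * @param changed   the project whose sources changed
     * @return all projects that depend on {@code changed}, directly or transitively
     */
    static Set<String> projectsToRebuild(Map<String, List<String>> dependsOn, String changed) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                // A project is affected if it lists the current project as a dependency;
                // add() returns false for projects already seen, which also stops cycles.
                if (entry.getValue().contains(current) && affected.add(entry.getKey())) {
                    queue.add(entry.getKey());
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        // Illustrative project names only.
        Map<String, List<String>> dependsOn = Map.of(
                "middleware", List.of(),
                "business-layer", List.of("middleware"),
                "operator-gui", List.of("business-layer"),
                "logging", List.of());
        System.out.println(projectsToRebuild(dependsOn, "middleware"));
    }
}
```

In this example a change to `middleware` affects `business-layer` and, through it, `operator-gui`, while `logging` is left alone.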
Test Coverage Red = not covered Green = covered Courtesy K. Sigerud SIP -- The Software Improvement Process - K.Sigerud, 13th October
Operational Unix Platforms (“Ops”) • Operational Linux and LynxOS platforms • On front-ends, middle-tier servers, operational consoles, boot servers • Selection, configuration and validation of new OS versions and kernels from IT • OS installation, system administration + upgrades • OS expert user support + debugging • IT Security administration (in collaboration with S. Lueders) • Monitoring and Auditing of our systems • With BE-CO tool (DIAMON), Lemon or other tools • Deployment of operational controls software • Deployment scripts • Console Manager to launch GUIs • Wreboot / transfer.ref for process management • Heavily relying on IT services and support!
Plans for Ops (system admin) in 2012 • New SLC6-based kernel with MRG Realtime • Discuss and implement selected security improvements (in collaboration with S. Lueders) • Better/more monitoring, auditing, instrumentation • New deployment tools (Puppet/GLU?) • Prepare for Long Shutdown 1 • Document dependencies between operational machines • Isolate a limited set of machines/services to be run during LS1 • Study possibilities of geographical redundancy after LS1 • Prepare for legacy clean-up (e.g. LynxOS) • Prepare for other changes (e.g. IP multicast, IPv6)
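Once dependencies between machines are documented, the limited set to keep running during LS1 could be derived as everything the essential services depend on, directly or transitively. The following is a minimal sketch under that assumption. The class `Ls1Planner` and the machine names are hypothetical and do not describe the real infrastructure.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: smallest set of machines needed by a few essential services. */
class Ls1Planner {

    /**
     * @param dependsOn maps each machine to the machines it needs in order to work
     * @param essential machines that must stay up during the shutdown
     * @return the essential machines plus everything they need, directly or transitively
     */
    static Set<String> requiredMachines(Map<String, List<String>> dependsOn,
                                        Collection<String> essential) {
        Set<String> required = new LinkedHashSet<>();
        Deque<String> toVisit = new ArrayDeque<>(essential);
        while (!toVisit.isEmpty()) {
            String machine = toVisit.pop();
            // Only expand machines we have not seen yet, so cycles terminate.
            if (required.add(machine)) {
                toVisit.addAll(dependsOn.getOrDefault(machine, List.of()));
            }
        }
        return required;
    }

    public static void main(String[] args) {
        // Illustrative machine names only.
        Map<String, List<String>> dependsOn = Map.of(
                "cryo-console", List.of("app-server"),
                "app-server", List.of("file-server", "database"),
                "test-console", List.of("app-server"));
        System.out.println(requiredMachines(dependsOn, List.of("cryo-console")));
    }
}
```

Machines that do not appear in the result, such as `test-console` in the example, are candidates for being switched off.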
Challenges for Support • Dev Tools user community grew from ~40 people to the whole accelerator sector • Exploding JIRA: 500 users, 210 projects • Exploding SVN: 1200 active SVN projects, 200 committers • Rapidly growing number of machines (for LHC) • From 250 to 1000 embedded real-time computers • From 50 to 300 middle-tier servers (Linux + Windows) • From 100 to 450 consoles (Linux + Windows) • More support with same/fewer people • More (planned) maintenance work in general • More support requests • Too many interruptions
Strategy for support • Reduce diversity • Few official solutions, no/few special cases • Support only for official solutions • Consistent phasing out of old solutions • Optimize solutions for lighter support load • High quality, well-tested solutions that just work • Automation instead of documentation and human help • Documentation • Better tools for remote administration and diagnostics • Use APIs to IT services (e.g. programmatically create/delete NetOps entries, SVN repositories, e-groups, …) • “Delegate” • 1st line support people within user teams • Tell users to directly use IT solutions (e.g. JIRA)
Outline • Overview of Accelerator Controls • Services used from IT department • BE-CO and the new DevOps section • 3 initiatives to further improve quality and availability of the control system • Software Improvement Process • Accelerator Controls Exploitation Tools • Smooth Upgrades WG
Software Improvement Process (“SIP”) • Started in 2009 for Java (now also C++) • Objectives • Introduce quality assurance as an integral part of the everyday development work • Leverage tools to automate the process as much as possible • Establish guidelines and metrics to measure quality • Recommended/Obligatory Activities (“rules & tools”) • Interactive design reviews across team boundaries • Automatic code analysis (Eclipse Warnings, FindBugs) • Interactive code reviews of critical parts (Atlassian Crucible) • Unit tests with > 30% coverage (Atlassian Clover) • Continuous integration • 4 dedicated SIP days a year • Top/Flop lists as motivation ;-)
Automatic code analysis The ‘bug’ line indicated A list of ‘bugs’ ‘Bug’ explained SIP -- The Software Improvement Process - K.Sigerud, 13th October
TOP / FLOP lists SIP -- The Software Improvement Process - K.Sigerud, 13th October
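A Top/Flop list can be thought of as a ranking of projects by a quality metric, combined with a check against the SIP rule of more than 30% unit-test coverage. The sketch below assumes coverage percentages are already available per project. The class `TopFlopList` and the sample figures are hypothetical and are not taken from Clover or from real BE-CO data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rank projects by test coverage and check the 30% rule. */
class TopFlopList {
    /** Threshold in percent; the SIP rule asks for coverage above this value. */
    static final double MIN_COVERAGE_PERCENT = 30.0;

    /** Project names sorted by coverage, best first ("top" at the head, "flop" at the tail). */
    static List<String> rankByCoverage(Map<String, Double> coverageByProject) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(coverageByProject.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }

    /** True if the coverage is strictly above the threshold. */
    static boolean meetsCoverageRule(double coveragePercent) {
        return coveragePercent > MIN_COVERAGE_PERCENT;
    }

    public static void main(String[] args) {
        // Made-up coverage figures for illustration.
        Map<String, Double> coverage = Map.of("alarms", 62.5, "sequencer", 18.0, "logging", 41.0);
        for (String project : rankByCoverage(coverage)) {
            double percent = coverage.get(project);
            System.out.println(project + " " + percent + "% "
                    + (meetsCoverageRule(percent) ? "ok" : "below rule"));
        }
    }
}
```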
Accelerator Controls Exploitation Tools (ACET) • Better Troubleshooting and diagnostics tools • “Deepen and broaden” diagnostic scope • Enable operators to do first-line diagnostic (deepen) • Enable CO experts to look into each other’s systems (broaden) • Objectives • Documentation portal • Leverage and integrate existing diagnostic tools • Collect and use dependency information • Early-warning system based on our monitoring tools (DIAMON) • Collect and analyze logfiles centrally (Splunk?) • Timeline • Proof of concepts this year • Fully operational after LS1
Smooth Upgrades WG • We have to upgrade a running controls system • Low-risk, small or peripheral changes: anytime • Riskier, large-scale or core changes: during Technical Stops • Sector-wide working group (with all our clients and partners) led by me • Elaborate a workflow and spread good habits • Smooth Upgrades Workflow • Good preparation (analysis of impact, Plan B, e.g. roll-back) • Announcements to all relevant people, without spamming • Preference for backward compatible changes • Good testing (continuous integration, testbed) • Deployment with roll-back if problems • Follow-up and learning
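The "deployment with roll-back if problems" step can be sketched as: activate the new version, run a health check, and re-activate the previous version if the check fails. This is a minimal illustration, not the working group's actual tooling. The `Deployment` interface, the `upgrade` method and the version strings are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a deployment step with automatic roll-back. */
class SmoothDeployer {

    /** Stand-in for whatever mechanism actually switches the running version. */
    interface Deployment {
        void activate(String version);
    }

    /**
     * Activates {@code next}; if the health check fails, re-activates {@code current}.
     *
     * @return the version that is active when the method returns
     */
    static String upgrade(Deployment target, String current, String next,
                          BooleanSupplier healthCheck) {
        target.activate(next);
        if (healthCheck.getAsBoolean()) {
            return next;
        }
        target.activate(current); // Plan B: roll back to the known-good version
        return current;
    }

    public static void main(String[] args) {
        Deployment printer = version -> System.out.println("activating " + version);
        // The health check is hard-coded to fail here to show the roll-back path.
        String active = upgrade(printer, "1.0", "1.1", () -> false);
        System.out.println("active version: " + active);
    }
}
```

Passing the health check in as a parameter keeps the roll-back logic independent of how "problems" are detected, for example by a testbed run or a monitoring query.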
Conclusions and next steps • I hope you got an insight into what we do and what IT services we use • BE-CO and its new DevOps section • Good clients and “value-added resellers” of IT services • I hope this laid the basis for good future collaboration with the new DevOps section • I’ll come and talk to you individually • Please share ideas, feedback, etc. with us! • Possible collaborations • IT Services we don’t yet use • … Vito Baggiolini, ITTF, 4-May-2012