1 / 22

Commissioning the CERN IT Agile Infrastructure with experiment workloads

Commissioning the CERN IT Agile Infrastructure with experiment workloads. Ramón Medrano Llamas IT-SDC-OL 14.10.2013. Agenda. The Agile Infrastructure Workload Management Systems Dynamic provisioning Conclusions. The Agile Infrastructure. Private IaaS cloud OpenStack based

minya
Download Presentation

Commissioning the CERN IT Agile Infrastructure with experiment workloads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Commissioning the CERN IT Agile Infrastructure with experiment workloads Ramón Medrano Llamas IT-SDC-OL 14.10.2013

  2. Agenda The Agile Infrastructure Workload Management Systems Dynamic provisioning Conclusions Commissioning the CERN IT Agile Infrastructure with experiment workloads

  3. The Agile Infrastructure • Private IaaS cloud • OpenStack based • FederatesMeyrin and Wigner • 15,000 hypervisors by 2015 • 300,000 VMs by 2015 • Configuration management tools • Puppet, Foreman Commissioning the CERN IT Agile Infrastructure with experiment workloads

  4. Workload Management Systems Pilot based systems ATLAS: PanDA CMS: glideinWMS Both have an HTCondorbackend Using Nova and EC2 APIs Commissioning the CERN IT Agile Infrastructure with experiment workloads

  5. PanDA integration Manual HTCondor cluster deployment Long lived worker nodes Condor for pilot submission CVMFS + EOS Same setup on HLT, Helix Nebula, Rackspace… Commissioning the CERN IT Agile Infrastructure with experiment workloads

  6. glidein integration Dynamic cluster deployment (EC2) Worker node automatically managed Condor for batch orchestration CVMFS + EOS Commissioning the CERN IT Agile Infrastructure with experiment workloads

  7. Deployment Commissioning the CERN IT Agile Infrastructure with experiment workloads

  8. Support from the AI Team • Got 1,600 cores from the OpenStackteam • Testing Essex, Folsom, Grizzly • Complete freedom to access resources  • Consultancy at any time • Rapid bug report-solution cycle Commissioning the CERN IT Agile Infrastructure with experiment workloads

  9. Testing strategy Standard HammerCloud benchmark Compared with other clouds, bare metal 690,000 ATLAS jobs 337,000 CMS jobs Commissioning the CERN IT Agile Infrastructure with experiment workloads

  10. Testing summary Commissioning the CERN IT Agile Infrastructure with experiment workloads

  11. Performance testing: ATLAS, Ibex • Tested Late 2012 • Over commission of resources • CPU efficiency not reliable in this context Commissioning the CERN IT Agile Infrastructure with experiment workloads

  12. Performance testing: ATLAS, Grizzly • Tested Late 2013 • Good improvement in performance • And predictability Commissioning the CERN IT Agile Infrastructure with experiment workloads

  13. Performance testing: CMS, Ibex • Tested Mid 2013 • Reliability is incredible good Commissioning the CERN IT Agile Infrastructure with experiment workloads

  14. Performance testing: ATLAS wallclock Commissioning the CERN IT Agile Infrastructure with experiment workloads

  15. Reliability testing • Few failures • Infrastructure vs. WMS failure • Need new monitoring techniques • Difficult to measure with state of the art tools Commissioning the CERN IT Agile Infrastructure with experiment workloads

  16. Reliability testing Commissioning the CERN IT Agile Infrastructure with experiment workloads

  17. Reliability testing Commissioning the CERN IT Agile Infrastructure with experiment workloads

  18. Reliability testing Commissioning the CERN IT Agile Infrastructure with experiment workloads

  19. Dynamic provisioning • gLidein scales clusters automatically, • Slowness with non-batch requests • PanDA was still not ready for it • Studying APF and Cloud Scheduler Commissioning the CERN IT Agile Infrastructure with experiment workloads

  20. Conclusions Being able to successfully use the infrastructure Scalability tests passed  Commissioning the CERN IT Agile Infrastructure with experiment workloads

  21. Future work Unification of image lifetime Federation of other OpenStack clouds Accounting tools for clouds Better understanding of failures Commissioning the CERN IT Agile Infrastructure with experiment workloads

  22. Questions? Commissioning the CERN IT Agile Infrastructure with experiment workloads

More Related