
HammerCloud An introduction




Presentation Transcript


  1. HammerCloud An introduction 08.07.2013

  2. What is this?

  3. HammerCloud: framework for testing distributed systems. Functional probing; stress testing and fine tuning; site profiling optimization; operations automation.

  4. The testing framework: can test anything, as long as you can submit() and get_results(), e.g. Grid resources and Cloud resources. HammerCloud is a Ganga application. Active probing of infrastructures, on demand, on behalf of computing operations.
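
The submit() and get_results() contract on slide 4 can be pictured as a small backend interface. What follows is a minimal sketch under stated assumptions, not HammerCloud's or Ganga's actual code: the names `ResourceBackend`, `FakeBackend` and `run_probe`, and the shape of the result dictionaries, are all hypothetical.

```python
from abc import ABC, abstractmethod


class ResourceBackend(ABC):
    """Hypothetical contract: anything that can submit work and report results."""

    @abstractmethod
    def submit(self, job_spec):
        """Send one job to the resource and return an opaque job id."""

    @abstractmethod
    def get_results(self, job_id):
        """Return a dict describing the outcome of a submitted job."""


class FakeBackend(ResourceBackend):
    """In-memory stand-in for a Grid or Cloud resource (illustration only)."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job_spec):
        job_id = len(self._jobs) + 1
        # Pretend the job finished at once; a real backend would be asynchronous.
        self._jobs[job_id] = {"site": job_spec["site"], "status": "completed"}
        return job_id

    def get_results(self, job_id):
        return self._jobs[job_id]


def run_probe(backend, job_specs):
    """Submit every job, then collect the results in submission order."""
    job_ids = [backend.submit(spec) for spec in job_specs]
    return [backend.get_results(job_id) for job_id in job_ids]


results = run_probe(FakeBackend(), [{"site": "SITE_A"}, {"site": "SITE_B"}])
```

Because `run_probe` only relies on the two methods, the same driver could in principle exercise any resource for which such a backend can be written, which is the point the slide makes.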

  5. Operations automation: a stream of results is coming in, so why not act on it rather than just look at it? Autoexclusion → user experience; stress testing → site optimization; benchmarking → crosscheck measurements; nightlies → distributed continuous integration.
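
As an illustration of acting on the result stream from slide 5, here is a toy autoexclusion rule. The slides do not describe the real policies, so everything here is an assumption made for the example: the function name `autoexclude`, the sliding window of recent results, and the failure threshold are hypothetical.

```python
from collections import defaultdict, deque


def autoexclude(results, window=5, max_failures=3):
    """Return the set of sites whose recent failures reach max_failures.

    `results` is an iterable of (site, succeeded) pairs in arrival order.
    Only the last `window` results per site are considered.
    Both thresholds are made-up values, not HammerCloud settings.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for site, succeeded in results:
        recent[site].append(succeeded)
    return {
        site
        for site, outcomes in recent.items()
        if sum(1 for ok in outcomes if not ok) >= max_failures
    }


stream = [
    ("SITE_A", True), ("SITE_B", False), ("SITE_A", True),
    ("SITE_B", False), ("SITE_B", True), ("SITE_B", False),
]
excluded = autoexclude(stream)
```

With this sample stream only `SITE_B` accumulates three failures in its window, so it is the only site the sketch would exclude.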

  6. Fact sheet: Python 2.7, Django 1.5; 54 kLOC (52% JavaScript, 24% Python, 6% HTML); 1.65 commits/day; 4 database shards, in-memory; 102 GiB of MySQL raw data.

  7. History at a glance

  8. A bit of history: born inside ATLAS, in 2009. UAT tests: how big is our grid? Johannes Elmsheuser created runtest.py; Dan van der Ster added the AFTs; Massimo Paladin added a user interface; Mario Úbeda refactored v4.

  9. In 2011 we were young... Autoexclusion was the new thing. ATLAS was consolidated on v2. CMS was running just a few tests on v4. LHCb was thinking about it... 2011 saw 19,157,809 HC jobs submitted.

  10. 2012 consolidated things... All instances running a common codebase. ATLAS autoexclusion policies improved. CMS replaced JobRobot with HC. LHCb was thinking about it... 2012 saw 39,901,621 HC jobs submitted.

  11. Explosion of testing? Cloud will be the thing. ATLAS: continuous integration, FAX... CMS: testing glideIn, starting with CAF. LHCb is deploying new tests! Expecting more than 50 M jobs for 2013 (footnote: 21,777,745 jobs as of today).

  12. Some ATLAS numbers: HC v2 decommission; MySQL/Django race condition.

  13. Some CMS numbers: overload of local WMS; GangaCMS bug with CRAB 2; MySQL/Django race condition.

  14. Architecture and deployment

  15. Diagram labels: Web Interface; Testing Infrastructure; PanDA; The Grid; DB.

  16. HC infrastructure as of May 2011: web service, submission, internal services and databases on voatlas49, voatlas65, vocms38, volhcb29.

  17. The current infrastructure Web service Development and internal services voatlas148 voatlas159 voatlas286 ATLAS submission CMS submission LHCb submission voatlas49 voatlas65 vocms06 voatlas167 voatlas284 vocms207 volhcb29 voatlas285 vocms228 Main databases dev_atlas dev_cms dev_lhcb dev_core

  18. Forget cluster management

  19. Cluster management: an elastic manager will drive everything. Creation of VMs on demand; 1 VM per test (rather small); 1 day - 1 week life. Needs configuration management.
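
The elastic manager on slide 19 can be modelled in a few lines. This is a toy sketch under stated assumptions, not the planned implementation: the class `ElasticManager` and its methods are hypothetical, no cloud API is called, and the only facts taken from the slide are one VM per test and a lifetime between 1 day and 1 week.

```python
from datetime import datetime, timedelta


class ElasticManager:
    """Toy model: one VM record per test, retired when its lifetime ends."""

    def __init__(self):
        self.vms = {}  # test_id -> expiry time of that test's VM

    def start_test(self, test_id, now, lifetime=timedelta(days=1)):
        # A real manager would call a cloud API here; this only records intent.
        if not timedelta(days=1) <= lifetime <= timedelta(weeks=1):
            raise ValueError("lifetime must be between 1 day and 1 week")
        self.vms[test_id] = now + lifetime

    def reap(self, now):
        """Drop VMs whose lifetime has passed and return their test ids."""
        expired = sorted(t for t, expiry in self.vms.items() if expiry <= now)
        for test_id in expired:
            del self.vms[test_id]
        return expired


start = datetime(2013, 7, 8)
manager = ElasticManager()
manager.start_test("test-1", start)
manager.start_test("test-2", start, lifetime=timedelta(weeks=1))
retired = manager.reap(start + timedelta(days=2))
```

Two days after the start, the one-day VM for `test-1` is retired while the one-week VM for `test-2` stays in the pool. Short-lived, per-test machines like these are what make configuration management a requirement, since each one has to be set up from scratch.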

  20. HC infrastructure on the Agile Infrastructure Dynamic submission cluster Web service, internal services volhcb29 volhcb29 hammercloud hc-services vocms38 vocms38 submission submission Main databases dev_atlas dev_cms dev_lhcb dev_core

  21. The current status

  22. Current activities. Common core: statistics improvement; general speed bump on MySQL; AFS migration. ATLAS: FAX testing; SLC6 migration. CMS: starting to test the Common Analysis Framework; improvement of CMS specifics in custom reports. LHCb: deploying new tests.

  23. Current support schema. Common core, development and deployment: myself. ATLAS: myself, Johannes, Federica and Gianfranco. CMS: myself, Andrea and Duncan. LHCb: myself, Mario (and Stefan helping with specifics).

  24. Challenges and plans

  25. Three challenges: cluster management; development cycle; data mining.

  26. Cluster management: deployment of memcached cluster (done!); web service optimization (done!); introduction of Puppet (ongoing); migration to SLC 6 (ongoing); migration to EMI 2 (ongoing); migration to Agile Infrastructure (ongoing).

  27. Development cycle: migration to Python 2.7 (done!); integration tests for ATLAS (done!); Sentry central log management (ongoing); introduction of continuous integration (waiting).

  28. Data mining: general indexing review (planned); schema sustainability (planned); automatic online learning (waiting).

  29. HammerCloud Introduction ?
