HammerCloud An introduction 08.07.2013
HammerCloud Framework for testing distributed systems Functional probing Stress testing and fine tuning Site profiling optimization Operations automation
The testing framework Can test anything, as long as you can submit() and get_results(), e.g. Grid resources and Cloud resources HammerCloud is a Ganga application Active probing of infrastructures, on demand, on behalf of computing operations
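The "submit() and get_results()" contract above can be illustrated with a minimal sketch. Only the two method names come from the slide; the class names `ProbeBackend` and `EchoBackend`, the helper `run_probe`, and the result dictionary layout are hypothetical and are not the Ganga or HammerCloud API.

```python
from abc import ABC, abstractmethod
from itertools import count


class ProbeBackend(ABC):
    """Hypothetical contract: anything that can submit() and get_results()."""

    @abstractmethod
    def submit(self, payload):
        """Send one test job and return an identifier for it."""

    @abstractmethod
    def get_results(self, job_id):
        """Return the outcome of a previously submitted job."""


class EchoBackend(ProbeBackend):
    """Toy in-memory backend standing in for a Grid or Cloud resource."""

    def __init__(self):
        self._ids = count(1)   # monotonically increasing job identifiers
        self._jobs = {}        # job_id -> payload

    def submit(self, payload):
        job_id = next(self._ids)
        self._jobs[job_id] = payload
        return job_id

    def get_results(self, job_id):
        # A real backend would poll the remote system; this one always succeeds.
        return {"job_id": job_id, "payload": self._jobs[job_id], "status": "completed"}


def run_probe(backend, payloads):
    """Submit every payload, then collect the results in submission order."""
    job_ids = [backend.submit(p) for p in payloads]
    return [backend.get_results(j) for j in job_ids]


if __name__ == "__main__":
    for result in run_probe(EchoBackend(), ["site-a", "site-b"]):
        print(result)
```

Under this assumption, testing a new kind of resource only requires a new `ProbeBackend` subclass; the probing loop itself does not change.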
Operations automation Stream of results coming in Why not act on it instead of just looking at it? Autoexclusion → user experience Stress testing → site optimization Benchmarking → crosscheck measurements Nightlies → distributed continuous integration
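As an illustration of acting on the result stream, the sketch below shows one possible autoexclusion rule. The policy is an assumption made for this example (exclude a site after `window` consecutive failed tests, re-include it after `window` consecutive passes); the slides do not describe the actual ATLAS or CMS policies, and the class name `AutoExclusion` is hypothetical.

```python
from collections import defaultdict, deque


class AutoExclusion:
    """Hypothetical policy: act on the last `window` test results per site."""

    def __init__(self, window=3):
        self.window = window
        # Each site keeps only its most recent `window` pass/fail outcomes.
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.excluded = set()

    def record(self, site, passed):
        """Store one test outcome and return True if the site is now excluded."""
        recent = self.history[site]
        recent.append(passed)
        if len(recent) == self.window:
            if not any(recent):        # every recent test failed
                self.excluded.add(site)
            elif all(recent):          # every recent test passed
                self.excluded.discard(site)
        return site in self.excluded


if __name__ == "__main__":
    policy = AutoExclusion(window=3)
    for outcome in [False, False, False, True, True, True]:
        print(outcome, policy.record("SITE_A", outcome))
```

Requiring a full window of passes before re-inclusion is a deliberate choice in this sketch: it avoids a site flapping in and out on a single lucky result.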
Fact sheet Python 2.7, Django 1.5 54 kLOC 52% JavaScript 24% Python 6% HTML 1.65 commits/day 4 database shards, in-memory 102 GiB of MySQL raw data
A bit of history Born inside ATLAS, in 2009 UAT tests, How big is our grid? Johannes Elmsheuser created runtest.py Dan van der Ster added the AFTs Massimo Paladin added a user interface Mario Úbeda refactored v4
In 2011 we were young... Autoexclusion was the new thing ATLAS was consolidated on v2 CMS was running just a few tests on v4 LHCb was thinking about it... 2011 saw 19,157,809 HC jobs submitted
2012 consolidated things... All instances running a common codebase ATLAS autoexclusion policies improved CMS replaced JobRobot with HC LHCb was thinking about it... 2012 saw 39,901,621 HC jobs submitted
Explosion of testing? Cloud is the thing ATLAS: continuous integration, FAX... CMS testing glideIn, starting with CAF LHCb is deploying new tests! Expecting more than 50 M jobs for 2013 (21,777,745 jobs as of today)
Some ATLAS numbers HC v2 decommission MySQL/Django race condition
Some CMS numbers Overload of local WMS GangaCMS bug with CRAB 2 MySQL/Django race condition
Web Interface Testing Infrastructure PanDA The Grid DB
HC infrastructure as of May, 2011 Web service, submission, internal services and databases voatlas49 voatlas65 vocms38 volhcb29
The current infrastructure Web service Development and internal services voatlas148 voatlas159 voatlas286 ATLAS submission CMS submission LHCb submission voatlas49 voatlas65 vocms06 voatlas167 voatlas284 vocms207 volhcb29 voatlas285 vocms228 Main databases dev_atlas dev_cms dev_lhcb dev_core
Cluster management An elastic manager will drive everything: Creation of VMs on demand 1 VM per test (rather small) 1 day - 1 week lifetime Needs configuration management
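The elastic manager described above (one small VM per test, living between one day and one week) can be sketched as simple bookkeeping. This is a sketch under stated assumptions: the names `VM` and `ElasticManager` are hypothetical, no real cloud API is called, and clamping the requested duration into the one-day to one-week range is an assumption of this example rather than something the slides specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_LIFETIME = timedelta(days=1)   # lower bound from the slide
MAX_LIFETIME = timedelta(weeks=1)  # upper bound from the slide


@dataclass
class VM:
    test_id: int
    created: datetime
    lifetime: timedelta

    def expired(self, now):
        return now >= self.created + self.lifetime


class ElasticManager:
    """Hypothetical bookkeeping only: one VM per test, reaped after its lifetime."""

    def __init__(self):
        self.vms = {}  # test_id -> VM

    def ensure_vm(self, test_id, test_duration, now):
        """Create the VM for a test on demand; reuse it if it already exists."""
        if test_id not in self.vms:
            # Assumption: clamp the requested duration into [1 day, 1 week].
            lifetime = min(max(test_duration, MIN_LIFETIME), MAX_LIFETIME)
            self.vms[test_id] = VM(test_id, now, lifetime)
        return self.vms[test_id]

    def reap(self, now):
        """Drop expired VMs and return the test ids that were freed."""
        expired = [t for t, vm in self.vms.items() if vm.expired(now)]
        for t in expired:
            del self.vms[t]
        return expired


if __name__ == "__main__":
    start = datetime(2013, 7, 8)
    manager = ElasticManager()
    manager.ensure_vm(1, timedelta(hours=6), start)
    manager.ensure_vm(2, timedelta(days=30), start)
    print(manager.reap(start + timedelta(days=2)))
```

A real deployment would replace the dictionary updates with calls to the cloud provider and to the configuration management system, which is where a tool such as Puppet would come in.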
HC infrastructure on the Agile Infrastructure Dynamic submission cluster Web service, internal services volhcb29 volhcb29 hammercloud hc-services vocms38 vocms38 submission submission Main databases dev_atlas dev_cms dev_lhcb dev_core
Current activities Common Core Statistics improvement General speed bump on MySQL AFS migration ATLAS FAX testing SLC6 migration CMS Starting testing the Common Analysis Framework Improvement of CMS specifics in custom reports LHCb Deploying new tests
Current support schema Common Core, development and deployment Myself ATLAS Myself, Johannes, Federica and Gianfranco CMS Myself, Andrea and Duncan LHCb Myself, Mario (and Stefan helping with specifics)
Three challenges Cluster management Development cycle Data mining
Cluster management Deployment of memcached cluster done! Web service optimization done! Introduction of Puppet ongoing Migration to SLC 6 ongoing Migration to EMI 2 ongoing Migration to Agile Infrastructure ongoing
Development cycle Migration to Python 2.7 done! Integration tests for ATLAS done! Sentry central log management ongoing Introduction of continuous integration waiting
Data mining General indexing review planned Schema sustainability planned Automatic online learning waiting
HammerCloud Introduction ?