Using OpenStack and Puppet to deliver IaaS at CERN
Ben Jones, ben.dylan.jones@cern.ch
NEC'2013
Agile Infrastructure
• Why change the operating model?
  • Twice the compute, same staff levels
  • New DC at Wigner, Budapest
  • “We’re not special”
  • Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana
  • “Coffee time” provisioning of cloud servers
New Data Centre
• Data centre in Geneva at the limit of its electrical capacity: 3.5 MW
• New centre chosen in Budapest, Hungary
  • Additional 2.7 MW of usable power
  • Local on-site support for hardware maintenance and installations
What is Cloud?
• Technology model
  • virtualization of compute, network, storage
• Operational model
  • run your services in a certain way
• Consumption model
  • “don’t make me talk to IT”
  • delivered instantly* over the wire, variable price
What is IaaS?
Private Cloud Software
• We use OpenStack, an open source cloud project: http://openstack.org
• ATLAS and CMS High Level Trigger clouds
• HEP clouds at BNL, IN2P3, NeCTAR, FutureGrid, …
• Clouds at HP, IBM, Rackspace, eBay, PayPal, Yahoo!, Comcast, Bloomberg, Fidelity, NSA, CloudWatt, Numergy, Intel, Cisco …
OpenStack
Architecture diagram: OpenStack components (Horizon, Keystone, Nova compute and scheduler, Glance, Cinder) integrated with CERN services (CERN network database, block storage provider, account management system, Microsoft Active Directory, CERN DB on Demand)
Nova
• Cloud computing fabric controller
• Network manager modified for CERN
  • integration with network database
  • specific to our use case, not pushed upstream
• Nova Compute aware of CERN DNS & AD
• Multiple availability zones
  • special zone for Hyper-V
  • scheduler has filter based on image distribution metadata
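The metadata-based scheduler filter can be illustrated with a small Python sketch (Python being the language Nova is written in). This is not CERN's filter: real Nova filters subclass `BaseHostFilter` and implement `host_passes`, whereas this self-contained version only mimics the idea. The `Host` class, the `os_distro` image property, the `"kvm"`/`"hyperv"` labels and the zone names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Host:
    name: str
    availability_zone: str
    hypervisor_type: str  # hypothetical labels: "kvm" or "hyperv"


# Hypothetical mapping from an image's distribution metadata to the hypervisor
# type able to run it; any other distribution falls back to KVM.
DISTRO_TO_HYPERVISOR = {"windows": "hyperv"}
DEFAULT_HYPERVISOR = "kvm"


def host_passes(host: Host, image_properties: Dict[str, str]) -> bool:
    """Return True if this host can run an image with the given properties."""
    distro = str(image_properties.get("os_distro", "")).lower()
    wanted = DISTRO_TO_HYPERVISOR.get(distro, DEFAULT_HYPERVISOR)
    return host.hypervisor_type == wanted


def filter_hosts(hosts: Sequence[Host], image_properties: Dict[str, str]) -> List[Host]:
    """Keep only the candidate hosts that pass, as one step of a filter chain."""
    return [h for h in hosts if host_passes(h, image_properties)]


if __name__ == "__main__":
    hosts = [
        Host("hv001", "zone-kvm", "kvm"),
        Host("hv002", "zone-kvm", "kvm"),
        Host("hv101", "zone-hyperv", "hyperv"),
    ]
    print([h.name for h in filter_hosts(hosts, {"os_distro": "windows"})])  # ['hv101']
    print([h.name for h in filter_hosts(hosts, {"os_distro": "slc"})])      # ['hv001', 'hv002']
```

Keeping the decision in one pure function makes it easy to test in isolation, which is also how Nova's own filters are structured: each one answers a yes/no question per host.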
Glance
• Services for discovering, registering and retrieving VM images
• Aim for automated image creation / update
  • common process for Linux & Windows images
  • common tools: Aeolus Oz
  • CERN tools to hook up Oz & Glance API
• Images for all CERN supported OS
  • user-defined images supported
• Initial contextualization via cloud-init
  • Cloudbase contributed cloud-init for Windows
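Contextualization with cloud-init usually means handing the new VM a user-data document at boot. A minimal sketch, assuming a plain `#cloud-config` document is enough: `hostname`, `packages` and `runcmd` are standard cloud-config keys, but the example values (the `puppet` package and command) are hypothetical. Each value is emitted as a JSON string, which is also a valid YAML double-quoted scalar.

```python
import json
from typing import Sequence


def render_cloud_config(hostname: str,
                        packages: Sequence[str] = (),
                        commands: Sequence[str] = ()) -> str:
    """Build a minimal #cloud-config user-data document for a new VM."""
    # cloud-init recognises this format by its "#cloud-config" first line.
    lines = ["#cloud-config", f"hostname: {json.dumps(hostname)}"]
    if packages:
        lines.append("packages:")  # installed by cloud-init on first boot
        lines += [f"  - {json.dumps(p)}" for p in packages]
    if commands:
        lines.append("runcmd:")  # executed once, late in the first boot
        lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cloud_config("vm0042", ["puppet"], ["puppet agent --test"]))
```

The resulting text could then be supplied at boot, for example through the nova client's `--user-data` option.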
Keystone
• Identity service: authentication, authorization and service catalog
• Full integration with Active Directory via LDAP
  • CERN’s AD: 44K users & 29K groups
  • Minimal changes to AD
  • CERN submitting changes upstream
• Account management system integration for project creation / deletion
• SSL for everything
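The account-management integration boils down to a reconciliation: compare the projects that should exist with those Keystone already has. A hedged sketch, assuming both sides can be reduced to sets of project names; `PROTECTED_PROJECTS` and the sample names are hypothetical, and the actual Keystone create/delete calls are deliberately left out.

```python
from typing import Iterable, List, Tuple

# Hypothetical: projects the automatic sync must never delete.
PROTECTED_PROJECTS = frozenset({"admin", "service"})


def plan_project_sync(desired: Iterable[str],
                      existing: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (to_create, to_delete) so that Keystone matches the account system."""
    desired_set, existing_set = set(desired), set(existing)
    to_create = sorted(desired_set - existing_set)
    # Only delete projects the account system no longer knows about,
    # and never touch the protected ones.
    to_delete = sorted(existing_set - desired_set - PROTECTED_PROJECTS)
    return to_create, to_delete
```

Computing a plan first and applying it separately keeps the sync safe to re-run: a second pass over an already synchronised state yields two empty lists.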
Operational practices evolving
• Security incidents
  • old: reinstall, new: replace with new VM
• Misconfiguration requiring reboot
• Resize a service
  • lxplus.cern.ch: add VMs to serve demand
  • resize VMs (or rather, replace with bigger)
• In future, resize services automatically
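The two ways of resizing a service can be sketched as two small calculations: scale out by adding identical VMs, or scale up by replacing a VM with one of a larger flavor. This is an illustration under stated assumptions: the flavor table is modelled on the stock Nova `m1.*` flavors (CERN's actual flavors are not given on the slide), and `users_per_vm` is a hypothetical capacity figure.

```python
import math
from typing import List, Tuple

# (name, vCPUs, RAM in MB), modelled on the stock Nova m1.* flavors, smallest first.
FLAVORS: List[Tuple[str, int, int]] = [
    ("m1.tiny", 1, 512),
    ("m1.small", 1, 2048),
    ("m1.medium", 2, 4096),
    ("m1.large", 4, 8192),
    ("m1.xlarge", 8, 16384),
]


def vms_needed(concurrent_users: int, users_per_vm: int) -> int:
    """Scale out: number of identical VMs needed to serve the demand (at least one)."""
    return max(1, math.ceil(concurrent_users / users_per_vm))


def replacement_flavor(min_vcpus: int, min_ram_mb: int) -> str:
    """Scale up: smallest flavor meeting the new requirement.

    The service is then moved to a fresh VM of this flavor rather than
    resized in place.
    """
    for name, vcpus, ram_mb in FLAVORS:
        if vcpus >= min_vcpus and ram_mb >= min_ram_mb:
            return name
    raise ValueError("no flavor is large enough")
```

For example, 950 concurrent users at a hypothetical 100 users per VM gives 10 VMs, and a requirement of 2 vCPUs and 3000 MB selects `m1.medium`.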
Service Models
• Pets are given names like pussinboots.cern.ch
  • They are unique, lovingly hand-raised and cared for
  • When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
  • They are almost identical to other cattle
  • When they get ill, you get another one
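The cattle model can be shown with a toy "herd" that numbers its members and replaces a sick one instead of repairing it. A minimal sketch: the `Herd` class is hypothetical and only tracks names; a real deployment would delete and boot instances through the Nova API.

```python
import itertools
from typing import List


class Herd:
    """Hypothetical tracker for cattle-style VMs named vmNNNN.<domain>."""

    def __init__(self, domain: str = "cern.ch") -> None:
        self.domain = domain
        self._numbers = itertools.count(1)  # numbers are never reused
        self.members: List[str] = []

    def add(self) -> str:
        """Start a new, identical VM with the next free number."""
        name = f"vm{next(self._numbers):04d}.{self.domain}"
        self.members.append(name)
        return name

    def replace(self, sick: str) -> str:
        """Retire a sick VM and start a fresh one in its place; no nursing."""
        self.members.remove(sick)
        return self.add()
```

Never reusing a number keeps logs and monitoring unambiguous: `vm0042` always refers to one specific, short-lived instance.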
Some other use cases…
• Hippos are cattle with block storage: useful where the application provides its own redundancy, e.g. MongoDB, Cassandra.
• Canaries are cattle placed at higher risk to give early warning of failures: fail fast and fix.
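One way canaries can be used is a staged rollout, sketched here under the assumption that "at high risk" means the canaries receive a change before the rest of the fleet. The `apply` and `healthy` callbacks are hypothetical stand-ins for a configuration push and a health check.

```python
from typing import Callable, List, Sequence


def staged_rollout(hosts: Sequence[str],
                   canaries: Sequence[str],
                   apply: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Apply a change to the canaries first and continue only if all stay healthy.

    Returns the hosts that actually received the change.
    """
    done: List[str] = []
    for host in canaries:
        apply(host)
        done.append(host)
    if not all(healthy(h) for h in canaries):
        return done  # fail fast: stop and fix before touching the rest
    for host in hosts:
        if host not in canaries:
            apply(host)
            done.append(host)
    return done
```

If a canary breaks, only the canaries are affected, which is the "fail fast and fix" behaviour the slide describes.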
Heat
• Heat orchestrates composite cloud apps (stacks)
• HA (restarts resources) & “auto-scaling”
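Heat's auto-scaling is configured declaratively in a template rather than written as code; the sketch below only shows the kind of decision such a policy encodes. The 80 %/20 % CPU thresholds, the one-instance step and the cooldown are hypothetical values, and `ScalingGroup` is not a Heat class.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    min_size: int
    max_size: int
    size: int
    cooldown_s: float
    last_change: float = float("-inf")


def evaluate(group: ScalingGroup, cpu_percent: float, now: float,
             high: float = 80.0, low: float = 20.0) -> int:
    """Threshold policy: add one instance above `high`, remove one below `low`.

    The size never leaves [min_size, max_size], and alarms arriving during
    the cooldown after a change are ignored. Returns the resulting size.
    """
    if now - group.last_change < group.cooldown_s:
        return group.size
    step = 1 if cpu_percent > high else -1 if cpu_percent < low else 0
    new_size = min(group.max_size, max(group.min_size, group.size + step))
    if new_size != group.size:
        group.size, group.last_change = new_size, now
    return group.size
```

The cooldown prevents flapping: a freshly started instance needs time to take load before the metric is trusted again.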
Configuration Management
• Adopted Puppet
  • widely used, large community, scales
• Needed to make reproducible services in the CERN CC
• Simplify the configuration of OpenStack itself
  • community modules from RH, Puppet Labs, users
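What makes Puppet suitable for reproducible services is its desired-state model: you declare what a resource should look like, and a run changes only what differs. Puppet has its own declarative language; this Python sketch merely illustrates the idempotent convergence idea, using an in-memory dict (path to content) as a hypothetical stand-in for the file system.

```python
from typing import Dict, List


def converge(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Bring `actual` in line with `desired`, Puppet-style.

    Only resources whose current state differs are changed, so running
    it a second time is a no-op. Returns the paths that were changed.
    """
    changed: List[str] = []
    for path, content in sorted(desired.items()):
        if actual.get(path) != content:
            actual[path] = content
            changed.append(path)
    return changed
```

Because the second run reports no changes, a configuration run can be repeated safely at any time, which is what lets the same manifests rebuild a service from scratch.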
Accounting
• CERN computing is funded from CERN central budgets: no billing, but quotas
• Experiments don’t have credit cards
• What to do when quota is exceeded?
• Unused capacity?
  • low-SLA usage to plug the gaps?
  • Fair share across the cloud?
    • Worked for supercomputers but heavy for clouds at scale
  • Bursting to public clouds?
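Quotas replace billing with a simple yes/no admission check per project. A hedged sketch: the resource names echo Nova's default quota names (`instances`, `cores`, `ram` in MB), but the `Quota` class and its all-or-nothing behaviour are assumptions for illustration; a resource without a configured limit is treated as a limit of zero.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Quota:
    limits: Dict[str, int]  # e.g. {"instances": 10, "cores": 20, "ram": 40960}
    used: Dict[str, int] = field(default_factory=dict)

    def request(self, **wanted: int) -> bool:
        """Grant the request only if every resource stays within its limit.

        Either everything is reserved or nothing is: no partial grants.
        """
        for resource, amount in wanted.items():
            if self.used.get(resource, 0) + amount > self.limits.get(resource, 0):
                return False
        for resource, amount in wanted.items():
            self.used[resource] = self.used.get(resource, 0) + amount
        return True
```

The open questions on the slide (what to do once a quota is exceeded, how to use spare capacity) are exactly the cases where this check returns `False`.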
Ceilometer
• Accounting for OpenStack by project
• Collects statistics from each compute node
  • common OpenStack message bus
• Sharded MongoDB store
  • 2 GB / day
• Hyper-V in Havana
• Cinder statistics upcoming
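Per-project accounting is essentially a group-by over metering samples. In this sketch, samples are modelled as simple `(project, meter, volume)` tuples, which is a simplification and not Ceilometer's actual schema; the meter name `vcpus` is illustrative. The second function turns the quoted 2 GB/day into a yearly figure, ignoring replication and retention policies.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# (project, meter, volume): a simplification, not Ceilometer's data model.
Sample = Tuple[str, str, float]


def usage_by_project(samples: Iterable[Sample]) -> Dict[str, Dict[str, float]]:
    """Sum sample volumes per project and per meter."""
    totals: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    for project, meter, volume in samples:
        totals[project][meter] += volume
    return {project: dict(meters) for project, meters in totals.items()}


def yearly_storage_gb(gb_per_day: float = 2.0, days: int = 365) -> float:
    """Raw growth of the metering store at a constant daily rate."""
    return gb_per_day * days
```

At 2 GB/day the store grows by about 730 GB per year, which helps explain the choice of a sharded MongoDB backend.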
CERN Status
• CERN IT OpenStack Cloud
  • Folsom-based service: ~500 hypervisors on KVM and Hyper-V
  • New “Grizzly” production service opened late July
    • 280 hypervisors, 600 VMs, 50 projects and growing rapidly
  • High-availability components using load balancing
    • i.e. 3 Nova controllers per cell
  • OpenStack configuration entirely managed by Puppet
• LHC experiment farms
  • CMS currently running 1,300 hypervisors with 50,000 cores
  • ATLAS starting to ramp up to a similar size
• Other science grid sites moving to private cloud on OpenStack
  • Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
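The slide does not say how the load balancing across the three controllers is implemented. As an illustration only, round-robin with a health check looks like this; the controller names are hypothetical, and in practice this job is usually handled by a dedicated load balancer rather than application code.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Spread requests over redundant controllers, skipping unhealthy ones."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]) -> None:
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> str:
        """Return the next healthy backend, trying each at most once per call."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy controller available")
```

With `ctl2` down, successive calls alternate between `ctl1` and `ctl3`, so losing one controller of three degrades capacity but not availability.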
Outlook
• Track stable Grizzly releases in Red Hat RDO
  • Up to date but not too close to the leading edge
• Scaling
  • Expect 15,000 hypervisors, 150,000 VMs by 2015
• Manageability
  • Metering, orchestration with Heat, bare metal
• Functionality
  • Load balancing, high-availability storage and pets
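A quick calculation from the figures on the status and outlook slides puts the scaling target in perspective. It uses only the slides' own numbers, and the comparison is indicative since the services differ: the 2015 expectation implies about 10 VMs per hypervisor, against roughly 2 on the new Grizzly service, while the CMS farm works out to roughly 38 cores per hypervisor.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Simple density figure, e.g. VMs per hypervisor."""
    return numerator / denominator


# Figures quoted on the "CERN Status" and "Outlook" slides.
grizzly_vms_per_hv = ratio(600, 280)        # new Grizzly production service
target_vms_per_hv = ratio(150_000, 15_000)  # expectation for 2015
cms_cores_per_hv = ratio(50_000, 1_300)     # CMS farm

print(f"{grizzly_vms_per_hv:.1f} {target_vms_per_hv:.1f} {cms_cores_per_hv:.1f}")  # 2.1 10.0 38.5
```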
What have we learnt?
• Automate everything from the beginning
  • Puppet and Stackforge are a great help
  • Distributions and appliances make getting started much easier
• Constant rate of change requires a different approach
  • Focus on core technologies and keep up to date
  • Track new projects but don’t adopt too early unless strategic
• Many of our users are cloud aware
  • Culture changes for legacy application coding and IT services
• Communities are major motivators
  • But administrators need to engage and adapt rather than re-invent
Conclusions
• CERN IT is re-engineering to deliver additional capacity to 11,000 physicists within fixed resources
• Cloud models can simplify current large-scale computing infrastructure
• OpenStack and its ecosystem allow us to meet this challenge and help others through open source
Questions?
Preproduction Service
Diagram of the tool chain: mcollective, yum; Bamboo; Puppet; AIMS/PXE; Foreman; JIRA; OpenStack Nova; git; Koji, Mock; Yum repo; Pulp; Active Directory / LDAP; Hardware database; Lemon / Hadoop / LogStash / Kibana; PuppetDB
Training for Newcomers
• Buy the book rather than guru mentoring
Job Opportunities