xCAT+Moab Cloud Egan Ford Team Lead, ATS STaCC (Scientific, Technical, and Cloud Computing) Project Leader, xCAT egan@us.ibm.com
Agenda • Objectives • Stateless/Statelite • xCAT • Moab • xCAT+Moab
Objectives • Increase ROI • Increase utilization. • Reduce management overhead. • Reduce downtime/Increase availability. • Installation • Maintenance • Rapid data & application-based provisioning. • Better cross departmental use of computing resources. • Grid, On-Demand, Utility Computing, Cloud • Reduce Investment • Reduce Footprint • Reduce Power Usage
Make Environment Changeable with Stateless, Traditional or Virtual Provisioning • Non-virtualizable Environments: Stateless/Statelite or Traditional Full Provisioning • Multi-purpose Scale-out Computing: Virtual Machines • [Chart, 2008 to 2012; labels: 60%, *12%] • *Virtualization of workloads by virtual machine growth data, Gartner Data Center Event, Las Vegas, Dec. 2008.
What is stateless? • Stateless is not a new concept. • The processors and memory (RAM) subsystems in modern machines do not maintain any state between reboots, i.e., they have no information about what occurred previously. • Stateless provisioning takes this concept to the next level and removes the need to store the operating system and the operating system state locally. • Bproc/Beoproc • BlueGene/L • SAN/iSCSI provisioning and NFS-root-RW are not stateless. • State is maintained remotely (disk-elsewhere).
What is stateless provisioning? • Stateless provisioning loads the OS over the network into memory without the need to install to direct-attached disk. • OS state is not maintained locally or remotely after a reboot, much like booting an OS from CD (e.g. Live CD). • The initial start state will always be the same for any nodes using the same stateless image, as if reinstalling between reboots. • SAN-boot, iSCSI-boot, NFS-root-RW are not stateless. • Think of your nodes/servers as preprogrammed appliances that serve a fixed or limited purpose. • E.g. DVD Player, toaster, etc...
Stateless is not diskless, diskfree, or disk-elsewhere • Stateless provisioning can leverage local disk, SAN, or iSCSI for /tmp, /var/tmp, scratch, application data, and swap. • If possible, diskfree is recommended. • Reduced power • Reduced cooling • Reduced downtime (Increased system MTBF) • Reduced space • Future diskfree-only nodes. E.g. BlueGene. • Stateless does not change the way applications access data. • NFS, SAN, GPFS, local disk, etc. are supported.
Why stateless provisioning? • Less (horizontal) software to maintain. • No inconsistencies over time. • Less (vertical) software to maintain. • Small fixed-purpose images vs. large general-purpose images. • Less risk of a software component having a security hole. • Reduced complexity. • Greater security. • No locally stored authentication data. • Initial large installations and upgrades can be reduced to minutes of boot time versus hours or days of operating system installation time. • Reprovisioning/repurposing a large number of machines can be accomplished in a few minutes.
Why stateless provisioning? • Provides a framework for automated per-application provisioning using intelligent application schedulers. • Change server function as needed (On-Demand/Utility/Cloud computing). • Increase node utilization. • A stateless image can be easily shared across the enterprise, enabling the promise of grid computing. • This can be automated with grid scheduling solutions. • The migration of a running virtual machine between physical machines has no large OS image to migrate.
Stateless Customer Set Limitations • Operating System must boot and operate as OS vendor intended and must not require modification. • Support • Vendor • Application • Administrator • Community • Images must be easy to create and maintain. • RPM/YUM/YAST • A real file system layout for manual configuration. • Must support a system or method of per node unique configuration. • IP Addresses, NFS mounts, etc... • License Files • Authentication configuration and credentials (must not be in image). • Avoid reengineering existing or adding new networks. • Predictive Performance • Untethered (e.g. No SAN) • May be unavoidable
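The per-node unique configuration requirement above can be pictured as a shared image plus a small per-node overlay applied at boot. The following is a minimal sketch under stated assumptions: the names `SHARED_IMAGE`, `PER_NODE`, and `boot_config`, and all values in them, are hypothetical illustrations and are not xCAT's actual schema or mechanism.

```python
# Hypothetical data for illustration only; this is not xCAT's real table layout.
# Settings that live in the one stateless image shared by every node.
SHARED_IMAGE = {
    "os": "linux",
    "packages": ("openssh", "nfs-utils"),
}

# Settings that must stay out of the image because they differ per node.
PER_NODE = {
    "node01": {"ip": "10.0.0.1", "nfs_mounts": ("/home",)},
    "node02": {"ip": "10.0.0.2", "nfs_mounts": ("/home", "/scratch")},
}


def boot_config(node: str) -> dict:
    """Merge the shared image settings with one node's unique settings."""
    return {**SHARED_IMAGE, **PER_NODE[node]}


if __name__ == "__main__":
    for name in sorted(PER_NODE):
        print(name, boot_config(name))
```

Keeping the unique values outside the image is what lets every node boot the same image while still ending up with its own address and mounts.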
What is xCAT? • Extreme Cluster(Cloud) Administration Toolkit • Open Source Linux/AIX/Windows Scale-out Cluster Management Solution • Design Principles • Build upon the work of others • Leverage best practices • Scripts only (no compiled code) • Portable • Source • Vox Populi -- Voice of the People • Community requirements driven • Do not assume anything
What does xCAT do? • Remote Hardware Control • Power, Reset, Vitals, Inventory, Event Logs, SNMP alert processing • xCAT can even tell you which light path LEDs are lit up remotely • Remote Console Management • Serial Console, SOL, Logging / Video Console (no logging) • Remote Destiny Control • Local/SAN Boot, Network Boot, iSCSI Boot • Remote Automated Unattended Network Installation • Auto-Discovery • MAC Address Collection • Service Processor Programming • Remote Flashing • Kickstart, Autoyast, Imaging, Stateless/Diskless, iSCSI • Scales! Think 100,000 nodes. • xCAT will make you lazy. No need to walk to datacenter again.
xCAT Past, Present, Future • October 1999 • xCAT Zero created for Web 1.0 • January 2000 – Present • xCAT used WW for scale-out Linux and Windows clusters • xCAT Community: 342 members from at least 29 countries • October 2007 • xCAT 1.3.0 released • xCAT 2.0-alpha • Linux Only • 2008-2010 • xCAT 2.0, 2.1, 2.2, 2.3, 2.4 released • xSeries, pSeries, zSeries • Linux, Windows, and AIX • Open Source • CLI and Web
xCAT xCAT 2 Director Director/CM CSM Add CSM Value into xCAT & Director Large HPC, Parallel batch jobs Commercial clusters SMB, Departmental, or Heterogeneous clusters 2007 2008 2009 • Open source • Flexible, scalable • Full IBM support available • Expertise required • IBM fully involved in open src development • Full IBM product & support • GUI & CLI • Easier learning curve • All IBM platforms
Where is xCAT in use today? • NSF Teragrid (teragrid.org) • ~1500 IA64 nodes (2x proc/node), 4 sites, Myrinet • A Bank in America • n clouds @ 252 – 1008 iDPX nodes each, multi-site, rollout on-going, 10 GE • University of Toronto (SciNet) • Hybrid 3864 iDPX/Linux (30,912 cores) and 104 P6/AIX (3,328 cores) • Weta Digital (xCAT -- one tool to rule them all) • 1200 Xeon blades (2x proc/node), Gigabit Ethernet • LANL Roadrunner • Phase 1: 2016 Opteron Nodes (8 core/node), IB, Stateless • Phase 3: 3240 LS21, 6480 QS22, IB, Stateless • Anonymous Petroleum Customer • 30,000 nodes, 20,000 in largest single cluster. Was Windows, now Linux. • IBM On-Demand Center • IBM GTS • "They can have my xCAT when they pry it from my cold dead hands." -- Douglas Myers, IBM GS Special Events
xCAT 2 Support Requirements Attributes of support offering • 24x7 support • Worldwide • Close to traditional L1/L2/L3 model • Identical support mechanism for system x and p • Begins with xCAT 2.0 (9/2008)
xCAT 2 Team Members & Responsibilities • Egan Ford (architecture, scaling expert, customer input, marketing, cloud, etc...) • Jarrod Johnson (architecture & development, system x HW control, 1350 test, ESX, Xen, KVM, Linux Containers) • Bruce Potter (architecture, GUI) • Linda Mellor (development, HPC integration) • Lissa Valletta (documentation, general management functions) • Norm Nott (AIX deployment, AIX porting & open source) • Ling Gao (PERCS, monitoring, scaling) • Scot Sakolish (system p HW control) • Shujun Zhou (RoadRunner cluster setup/admin) • Jay Urbanski (Open source approval process) • Adaptive Computing (Hyper-V, Moab, cloud) • Sumavi • Other IBMers, BPs, and customers
xCAT 2.x Architecture • Everything is a node • Physical Nodes • Virtual Machines/LPARs/zVM • Xen, KVM, ESXi, ScaleMP • rpower, live migration, console logging, Linux and Windows guests. • Infrastructure • Terminal Servers • Switches • Storage • HMC
xCAT 2.x Scale Infrastructure • A single xCAT management node with multiple service nodes providing boot services to increase scaling. • Can scale to 1000s and 10000s of nodes. • xCAT already provides this support for large diskful clusters and it can be applied to stateless as well. • The number of nodes and network infrastructure will determine the number of DHCP/TFTP/HTTP servers required for a parallel reboot with no DHCP/TFTP/HTTP timeouts. • The number of DHCP servers does not need to equal the number of TFTP or HTTP servers. TFTP servers NFS mount read-only the /tftpboot and image directories from the management node to provide a consistent set of kernel, initrd, and file system images.
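The sizing point above is essentially a ceiling division per service type. The sketch below is a rough illustration, not an xCAT tool: the function name `servers_needed` and the per-server capacities (2000 nodes per DHCP server, 250 per TFTP server) are hypothetical numbers that would have to be measured on the actual network.

```python
def servers_needed(node_count: int, nodes_per_server: int) -> int:
    """Smallest server count so that no server handles more than nodes_per_server nodes."""
    if node_count < 0 or nodes_per_server <= 0:
        raise ValueError("node_count must be >= 0 and nodes_per_server must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-node_count // nodes_per_server)


# Hypothetical capacities for a 10,000-node parallel reboot.
NODES = 10_000
plan = {
    "dhcp": servers_needed(NODES, 2000),
    "tftp": servers_needed(NODES, 250),
}

if __name__ == "__main__":
    print(plan)
```

Because each service is sized from its own capacity, the DHCP and TFTP counts come out different, which matches the slide's note that they do not need to be equal.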
xCAT 2.x Provisioning • Supported Architectures • x86/x86_64 • Power/PPC/Cell • Supported OSes • Linux • Red Hat • CentOS, Fedora, Scientific Linux • SuSE • AIX • Windows • Provisioning Methods • Local Disk/SAN/Solid State • Stateless (RAM Root, Linux and AIX (xCAT 2.5)) • Statelite (NFS Root, Linux and AIX (xCAT 2.5)) • VM (data dedupe (copy-on-write), VM copy) • iSCSI (Windows and Linux) • x86/x86_64 does not require firmware-based iSCSI initiator. xCAT simulates with netboot.
xCAT Provisioning Methods • Stateful, Diskful: Local (HD, Flash) • Stateful, Disk-Elsewhere: SAN, iSCSI • Stateless, Disk Optional: Memory (RAM, CRAM, NFS) • Statelite • [Diagram labels: xCAT, OS Installer, OS Installer, Image Push, Node (HD, Memory), OS, SAN/iSCSI/NAS]
Provisioning Methods • Install to HD (old school). Supported by AIX (NIM), Red Hat and Red Hat-like (Kickstart), SuSE (AutoYast), Windows (native installer and imaging support (imagex)). (xCAT 2.3 partimage cloning for Windows and Linux). • Install to SAN (same as HD, old school for rich kids). • Install to Solid State (same as HD, old school for very rich kids). • Install to iSCSI. For x86 and x86-64 xCAT provides its own software iSCSI initiator based on gPXE. This allows xCAT to install any Linux or Windows that supports iSCSI without any iSCSI hardware or firmware. For PPC-based machines only Linux iSCSI is supported and gPXE is not used. • Stateless. Linux/AIX (xCAT 2.5) only. Since 2005 this has been our recommended cluster provisioning method. Just netboot the OS directly into memory and run it. No state to maintain. This is how the current Top500 #1 #2 system operates. It can boot in about 10 minutes.
IBM-HW Automagic Discovery • One-Button Provisioning • Simplified Service • No need for skilled staff in datacenter. • Simplified Cluster Expansion • Nodes must be predefined! • Idiot Proof • Even managers can do it. • Complements IBM’s Hot Swap/Add Initiatives. xCAT
xCAT 2.x Virtualization Support • KVM and Xen (Paravirtualization and Classic) (libvirt driven) • Allocate on Demand, Provision Linux/Windows Guest • Live Migration • Serial and VGA console • ESXi (VMware API driven) (xCAT 2.4) • Allocate on Demand, Provision Linux/Windows Guest • Live Migration (vCenter required) • No console access (WIP, xCAT 2.5) • Microsoft Hyper-V (xCAT 2.5) • ScaleMP (xCAT 2.4) • Power LPAR • zVM • Linux Containers (Think Solaris Zones) (xCAT 2.5 – 2.6) (libvirt driven) • RH6, SLES11 • Allocate on Demand • Provision/Boot Linux Guest Only • Suspend/Resume (WIP) • Live Migration (WIP) • http://lxc.sf.net
What is Moab? • The Brain: Intelligent Management and Automation Engine. • Policy and Service Level Enforcer. • Provides simple Web-based job management, graphical cluster administration, and management reporting tools.
xCAT and Moab Synergistically • Provision OS images • provisioning and reconfiguration of operating system to match workload needs based on policy • Start and stop compute resources • reduce power consumption through workload-aware power and temperature management • Monitor and balance workload to fulfill SLAs • dynamically and automatically reallocate resources according to business priorities • keep system failures invisible to employees, partners, and customers and assure continuing smooth operations • Manage and trigger virtualization • increase resource utilization • assure high availability • provision new virtual servers in minutes to handle shifting workloads • enable live migration of virtual environments
xCAT + Moab Suite • Moab Cloud Manager/Visual Cloud (MCM/VC) • GUI • Moab Workload Manager (MWM) • Scheduler • The Brain • Moab Service Manager (MSM) • Queue • Lock Management • Universal Translator • xCAT • Actions • The Muscle • The Senses • [Diagram labels: IBM-HW, VMs, MCM/VC, MWM, MSM, xCAT]
Thermal Balancing • Moab: • Job Impact • Node Information • Temperature and Policies • Sweet Spot • Node Consolidation • [Diagram labels: Moab, xCAT, Intense Upcoming Workload, Less Intense]
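One possible reading of the thermal balancing slide is that the most intense upcoming work goes to the coolest eligible nodes. The sketch below illustrates that idea only; it is not Moab's actual policy engine. The classes `Node` and `Job`, the function `place_jobs`, and the `max_temp_c` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    temp_c: float  # current node temperature, e.g. as reported by hardware vitals


@dataclass
class Job:
    name: str
    intensity: float  # assumed relative heat/compute impact, higher is more intense


def place_jobs(jobs, nodes, max_temp_c=35.0):
    """Pair the most intense jobs with the coolest nodes under the temperature policy."""
    eligible = sorted(
        (n for n in nodes if n.temp_c <= max_temp_c), key=lambda n: n.temp_c
    )
    ordered = sorted(jobs, key=lambda j: j.intensity, reverse=True)
    # zip stops at the shorter list, so jobs without an eligible node stay unplaced.
    return {job.name: node.name for job, node in zip(ordered, eligible)}


if __name__ == "__main__":
    nodes = [Node("n1", 30.0), Node("n2", 25.0), Node("n3", 40.0)]
    jobs = [Job("light", 0.2), Job("heavy", 0.9)]
    print(place_jobs(jobs, nodes))
```

In this toy run the node above the threshold is skipped entirely, and the heavy job lands on the coolest remaining node.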
Idle Node Management • Moab: • Workload Prediction • Power Control • Energy Savings • [Diagram labels: Moab, xCAT]
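The combination of workload prediction and power control can be sketched as a simple rule: keep enough idle nodes powered on to cover predicted demand plus a reserve, and power off the rest. This is an illustrative sketch, not Moab's algorithm; `nodes_to_power_off`, `predicted_demand`, and `spare` are hypothetical names, and in a real deployment the resulting list would be handed to xCAT's power control rather than printed.

```python
def nodes_to_power_off(idle_nodes, predicted_demand, spare=1):
    """Return the idle nodes that can be powered off.

    predicted_demand: number of idle nodes the upcoming workload is expected to need.
    spare: extra idle nodes kept powered on as a safety margin (assumed policy knob).
    """
    keep = max(0, predicted_demand + spare)
    # Sort for a deterministic choice; the nodes past the 'keep' count are surplus.
    return sorted(idle_nodes)[keep:]


if __name__ == "__main__":
    idle = ["n03", "n01", "n02", "n04"]
    print(nodes_to_power_off(idle, predicted_demand=1))
```

With four idle nodes, a predicted demand of one, and one spare, two nodes stay on and the other two are candidates for power-down.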
Obstacle Avoidance • Moab: • Workload Prediction • LED Reporting • Higher Job Throughput • Energy Savings • [Diagram labels: Moab, xCAT]
Green with xCAT+Moab (6/6/2008) • Reduce your overall power and cooling costs and decrease your organization’s carbon footprint • Moab automatically and seamlessly . . . • Places idle servers in power-saving modes (power capping or power down) via xCAT. • Schedules nodes based on node temperatures and cost per watt. • Maximizes utilization of all CPU cores by orchestrating virtual environments (such as Xen, KVM, ESXi). • Reports on energy used by user, project, or resource to give you greater control of energy consumption and accountability.
Historical Resource Management Linux • Multiple Silos • Additional Management • Complexity • Under Utilization Windows
Moab Dynamic Hybrid Linux Moab • Unified System • Dynamic Resources • Management • Ease of Use • Increased Utilization xCAT Windows Windows Linux
Dynamic Service Nodes • [Diagram labels: Moab, Provisioning Policy, xCAT, Image Push, SN, SN, SN, etc.]
Virtual Machine Automation • Create/Delete VMs • Dynamic Add/Remove • Live Migration (KVM, Xen, ESXi) • Stateless Hypervisor (KVM, Xen, ESXi, ScaleMP) • Multiple HV Support • Balancing • Consolidation • Route Around Current/Future Problems
Moab and xCAT Intelligent Management • Provisioning • Virtualization • Power • Workload Mixed Workloads • Green • Power down idle servers • Dynamic Provisioning • Respond to workload surges and priorities QoS/SLA Assurance • Apps • Users • Projects Automated Fault Avoidance and Recovery Real-time Policy-driven Resource Allocation
Benefits of xCAT + Moab • 90-99% Server Utilization* • Simplified Administration & Management – Easy to use interface and intelligent automation • Guarantee QoS & SLA delivery to applications and projects • Green Computing Facilities • Intelligent On-Demand Provisioning • Automated Failure Recovery • Dynamic Allocation of Resources to meet high priority needs of jobs • Rapid Deployment and Scale-out • Reporting & Charting Facilities • [Diagram labels: Future Needs, Policies, Current Workload, Historical Data, xCAT & Moab, Scheduling/Provisioning]
Case Studies: IBM Systems w/Moab • SciNet • Moab provides key adaptive scheduling of xCAT’s on-demand environment provisioning. • Cluster Resources and IBM Canada jointly worked to respond to the RFP, present to the customer, and secure this win. • RoadRunner • 1-petaFLOP System • Cell-based processors • IBM’s VLP • Moab enables IBM to host resources for software partners to enable testing.
SciNet • Jobs are orchestrated based on the node types that are most ideal for the job and on what is available. • Example node types: Shared Memory, High Memory, Visualization, Specialized App Environment, etc. • [Diagram labels: Jobs, Moab, LoadLeveler, TORQUE, xCAT, Power, iDataPlex]
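The SciNet slide's rule, "most ideal node type for the job, subject to what is available", can be sketched as a preference list checked against current availability. This is a hedged illustration rather than Moab's scheduler: `pick_node_type`, the type names, and the availability counts are hypothetical.

```python
from typing import Optional


def pick_node_type(preferred_types, available) -> Optional[str]:
    """Return the first preferred node type that currently has free nodes, else None.

    preferred_types: node types ordered from most to least ideal for the job.
    available: mapping of node type to the number of free nodes of that type.
    """
    for node_type in preferred_types:
        if available.get(node_type, 0) > 0:
            return node_type
    return None


if __name__ == "__main__":
    # Hypothetical snapshot of free nodes per type.
    free = {"high_mem": 0, "shared_mem": 3, "visualization": 1}
    print(pick_node_type(["high_mem", "shared_mem"], free))
```

A job that prefers high-memory nodes falls back to shared-memory nodes here because no high-memory node is free; a job whose preferred types are all unavailable gets `None` and would wait or trigger provisioning.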
Virtual Loaner Program • Key Benefits • Dynamic Set Up • Dynamic Clean Up • 3X More Usage • Fraction of Cost • Key Concept: Initiates, Saves and Restores Reservations, Hardware Set Up, Software Images, Configuration, Etc. • [Diagram labels: Moab; Series I, Power 5 16-way, X 86 Series X, Power 4 32-way; Network; External Data Storage; External Users; Portal View of Available Resources; VPN; SSH; Vendor Portal “WebSphere”; Resource Availability Query; Reservation Commitment Database “DB2”; Provisioning Manager “Tivoli”; Store Reservation Records; Request Resources; Query Resources; Custom RM “HMC”; Provision Resources; Middleware (DB2, WebSphere, ….); Operating System (AIX, i5OS, SLES, RHEL); Storage (Software Source Storage, Persistent Storage)]
Summary • Available Today • xCAT 2.3 • Moab (xSeries Part Number) • Available Soon • xCAT 2.4 (April 30 2010) • Available not-so-soon • xCAT 2.5 (October 31 2010)
Who’s responsible for this stuff? • Blame me: • Egan Ford • egan@us.ibm.com • egan@sense.net (email both for faster service)