Session E: Zoni
Zoni
Richard Gass, Intel
Sessions
• (A) Intro 8.30-9.00
• (B) Hadoop 9.00-10.00
• Break 10.00-10.30
• Hadoop 10.30-12.00
• Lunch 12.00-1.30
• Pig 1.30-2.00
• (D) Tashi 2.00-3.00
• Break 3.00-3.30
• Zoni 3.30-4.45
• Wrap up 4.45-5.00
Agenda
• Overview
• Plans/Status
• User View
• Administration
• Installation
• Summary
Open Cirrus Stack
[Figure, built up over four slides: compute + network + storage resources, a management and control subsystem, and power + cooling, all managed by the Physical Resource set (Zoni) service. Zoni clients, each with their own "physical data center", run on the Zoni service: Eucalyptus, Tashi/HDFS, an NFS storage service, and experiments. Tashi hosts virtual clusters, and an application (a web service or BigData app) runs on Hadoop, on a Tashi virtual cluster, on Zoni, on real hardware. Credit: John Wilkes (HP)]
Open Cirrus stack: Zoni
• Initial PRS (Physical Resource Set) implementation from HP
• Rewrite from Intel (in collaboration with HP), soon to be contributed to the Apache Software Foundation
• Zoni service goals
  • Provide mini-datacenters to users
  • Isolate mini-datacenters from each other
• Zoni service approach
  • Allocate sets of physical co-located nodes, isolated inside VLANs
  • Allow running without virtualization overhead
  • Necessary for predictable QoS (e.g. no cache interference from co-located VMs)
Goals
• Reduce complexity in allocating physical resources
• Gain user confidence
  • Show users that we can efficiently allocate/deallocate resources
• Stop the squatting
• Incentives
  • HP's Tycoon (economic model)
  • Simple points scheme for good behavior or early return
Responsibilities of Zoni
• Isolate domains: VLAN
• Provision system software: PXE
• Provide platform control (on/off): IPMI
• Provide boot debug: IPMI
VLAN
• Virtual LAN technology allows a single physical network to appear as several isolated networks
• Ethernet frames are tagged with a VLAN ID
• Switches and NICs enforce the policies associated with each VLAN
• By associating Zoni domains with different VLANs, the domains can be isolated from each other
• The Zoni system provides the interfaces needed to abstract switch configuration across multiple switch vendors
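To make the tagging concrete, the sketch below builds the 4-byte IEEE 802.1Q tag that carries the VLAN ID inside an Ethernet frame: the TPID 0x8100 followed by a 16-bit tag control field (3-bit priority, 1-bit drop-eligible flag, 12-bit VLAN ID). It illustrates the standard only and is not Zoni code; the function name `vlan_tag` is hypothetical.

```shell
# Sketch: print the 802.1Q tag as hex. Layout: TPID 0x8100, then the
# TCI = PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits).
# vlan_tag is a hypothetical helper, not part of Zoni.
vlan_tag() {
    pcp=$1; dei=$2; vid=$3
    # VLAN IDs 0 and 4095 are reserved; usable IDs are 1-4094
    if [ "$vid" -lt 1 ] || [ "$vid" -gt 4094 ]; then
        echo "invalid VLAN id: $vid" >&2
        return 1
    fi
    tci=$(( (pcp << 13) | (dei << 12) | vid ))
    printf '8100%04x\n' "$tci"
}

vlan_tag 0 0 300   # prints 8100012c (VLAN 300 = 0x12c)
```

Because the ID field is 12 bits wide, a single switch fabric can distinguish at most 4094 usable VLANs, which bounds how many isolated Zoni domains can share it.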
Preboot eXecution Environment (PXE)
• Enables provisioning of an OS image over the network
• On machine boot, the NIC firmware locates a PXE server through DHCP and loads the appropriate kernel and initrd from it
• Once loaded, the init scripts in the initrd can pull the filesystem onto the machine
• In our environment, we download the desired filesystem from an NFS server into a ramdisk, enabling very rapid provisioning (30 seconds or less) while leaving the host filesystem undisturbed
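One common way to give each node its own kernel and initrd is a per-machine boot config. The sketch below assumes the PXE server uses PXELINUX from the Syslinux project (the slides do not name the boot loader) and prints the order in which PXELINUX looks for a config file under `pxelinux.cfg/`, most specific first. The UUID lookup PXELINUX tries before these is omitted; the MAC address is made up, and the IP address is taken from the Zoni Register example.

```shell
# Sketch (assumes PXELINUX): config file names tried for one client.
# pxelinux_search_order is a hypothetical helper.
pxelinux_search_order() {
    mac=$1; ip=$2
    # 01 = ARP hardware type for Ethernet; MAC lowercased, ':' -> '-'
    echo "01-$(echo "$mac" | tr 'A-F' 'a-f' | sed 's/:/-/g')"
    # Client IPv4 address as 8 uppercase hex digits
    hex=$(echo "$ip" | awk -F. '{ printf "%02X%02X%02X%02X", $1, $2, $3, $4 }')
    # Then the hex address with one trailing digit dropped at a time
    while [ -n "$hex" ]; do
        echo "$hex"
        hex=${hex%?}
    done
    echo "default"
}

pxelinux_search_order 00:1E:C9:AB:CD:EF 172.16.129.100
```

Pointing a node at a different image then only requires rewriting its `01-<mac>` file, while the `default` file can serve any unregistered machine.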
Intelligent Platform Management Interface (IPMI)
• Defines a standardized, abstracted, message-based interface to intelligent platform management hardware
• Defines standardized records for describing platform management devices and their characteristics
• Operates independently of the operating system
• Enables cross-platform management
Some History
• Previous prototype developed at HP Labs
• Focus on economic model
• Nice web interface, which will be available once the code bases reconverge
Zoni Roadmap
• Stage 1
  • Manages all cluster hardware
  • Handles resource provisioning
  • Provides interfaces for VLAN definition/programming
  • Administrator is still in the allocation decision-making loop
• Stage 2
  • Introduces a request queue and primitive scheduler
  • Admin may still be in the loop, definitely for special cases
  • Enables provisioning of OS to local disk
  • Enables virtual disk conversion to physical
• Stage 3
  • Incentives module added (Tycoon)
  • Tashi integration
Zoni Roles
• Admin: root of all authority
  • Controls the physical resources
• User: requests domains
  • Controls the domain, once allocated
Domains
• A domain is the unit of Zoni isolation
• A simple domain is a set of compute nodes gathered into a single VLAN
• Nodes are allocated from pools of available resources
Zoni Domains
[Figure: Domain 0 and Domain 1, separated by isolation. Each domain has its own services (Domain 0: DNS, PXE, DHCP, HTTP) and its own server pool (Server Pool 0, Server Pool 1), plus a gateway.]
The Zoni Interface
• Users and admins currently interact with the Zoni system through a command-line interface
• This interface both:
  • Queries and updates records in the Zoni database
  • Wraps the various commands that must be issued to effect changes in the cluster
• Zoni is currently a centralized system; users log into the Zoni manager to issue commands
• An RPC interface is planned for the near future
Zoni Usage
Usage: zoni <options>
Standard options:
  --help [show this help message and exit]
  --version [show program's version number and exit]
  --verbose [be verbose]
Common options:
  --nodeName <name> [Specify node]
  --switchPort <port> [Specify switch port as switchname:portnum]
Image Management Interface
  --addImage <img> [Add image to Zoni]
  --delImage <img> [Delete image]
User Allocation Interface
  --createDomain <name>
    • May fail if the name already exists
  --submitDomainRequest <name>
  --destroyDomain --domain <name>
  --requestNodes --domain <name> [--count <N>] [--nodeName <name>] [--cores <n> …]
    • Add the requested nodes to the domain
  --assignImage <kernel> <image>
    • Assign image to resource
  --associateNewVlan --domain <name>
    • Allocate an unused VLAN number to the domain
  --createReservation <YYYYMMDD> <YYYYMMDD>
    • Specify the duration of the node reservation; the start time may be "ASAP"
  --reservationNotes "notes"
  --updateReservation
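The flags above are meant to be chained into one request. The dry-run sketch below shows one plausible order, using only flags listed on this slide; the order, the domain name, kernel, image and dates are assumptions, and `request_domain` is a hypothetical wrapper. `ZONI` defaults to `echo ./zoni`, so the commands are printed rather than run.

```shell
# Hypothetical dry-run wrapper; set ZONI=./zoni on the Zoni manager to
# actually issue the commands.
request_domain() {
    domain=$1; count=$2; kernel=$3; image=$4; start=$5; end=$6
    zoni=${ZONI:-echo ./zoni}
    # Stop early if the domain name is already taken
    $zoni --createDomain "$domain" || return 1
    $zoni --requestNodes --domain "$domain" --count "$count"
    # Give the domain its own VLAN so its nodes are isolated
    $zoni --associateNewVlan --domain "$domain"
    $zoni --assignImage "$kernel" "$image"
    # Start may be "ASAP"; end date is YYYYMMDD
    $zoni --createReservation "$start" "$end"
}

request_domain demo 4 vmlinuz-demo demo-image ASAP 20090801
```

Keeping the wrapper in dry-run mode by default makes it safe to review the exact command sequence before touching the cluster.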
Admin Allocation Interface
  --allocateNode [Assign node to a user]
  --releaseNode [Release node allocation]
  --vlanIsolate <vlanid> [Specify VLAN for isolation]
Hardware Control
  --hardware [Make hardware call]
  --powerStatus [Get power status]
  --rebootNode [Reboot node (soft)]
  --powerCycle [Power cycle (hard)]
  --powerOff [Power off node]
  --powerOn [Power on node]
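For nodes whose management controller speaks IPMI over LAN, the power flags map naturally onto `ipmitool`'s `chassis power` subcommands. The sketch below is a hypothetical mapping, not Zoni's actual backend; the BMC host name and user are placeholders. `--rebootNode` is left out because a soft reboot is normally handled by the OS, and `ipmitool`'s `soft` subcommand is an ACPI shutdown rather than a reboot.

```shell
# Hypothetical mapping from Zoni hardware actions to ipmitool calls.
# Prints the command instead of running it; pipe to sh to execute.
ipmi_power_cmd() {
    action=$1; bmc=$2; user=$3
    case "$action" in
        powerStatus) sub=status ;;
        powerCycle)  sub=cycle ;;
        powerOff)    sub=off ;;
        powerOn)     sub=on ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    # -E makes ipmitool read the password from $IPMI_PASSWORD,
    # keeping it off the command line
    echo ipmitool -I lanplus -H "$bmc" -U "$user" -E chassis power "$sub"
}

ipmi_power_cmd powerCycle r1r1u25-ipmi admin
```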
Query Interface
  --showReservations [Show current node reservations]
  --showResources [Show available resources to choose from]
    --procs <N> [Filter by number of processors]
    --clock <N> [Filter by processor clock]
    --memory <N> [Filter by amount of memory (bytes)]
    --cpuflags "flags" [Filter by CPU flags]
    --cores <N> [Filter by number of cores]
  --showPxeImages [Show available PXE images to choose from]
  --showPxeImageMap [Show PXE images host mapping]
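A sketch of what the `--cores`/`--memory` filters of `--showResources` might do over a node inventory. The treatment of filters as minimums and the inventory format (name, cores, memory in GB, disk in TB) are assumptions; memory is in GB here for readability, whereas the slide's `--memory` filter takes bytes. The rows come from the allocation example in this deck.

```shell
# Hypothetical inventory: name cores memory_GB disk_TB
inventory='node1 8 16 6
node2 8 16 6
node5 8 8 2
node6 8 8 2'

# Print nodes with at least MIN_CORES cores and MIN_MEM_GB memory
filter_nodes() {
    min_cores=$1; min_mem_gb=$2
    echo "$inventory" | awk -v c="$min_cores" -v m="$min_mem_gb" \
        '$2 + 0 >= c + 0 && $3 + 0 >= m + 0 { print $1 }'
}

filter_nodes 8 16   # prints node1 and node2
```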
Administration Interface
  --admin [Enter admin mode]
  --addPxeImage [Add PXE image to database]
  --enableHostPort [Enable a switch port]
  --disableHostPort [Disable a switch port]
  --removeVlan <vlanId> [Remove VLAN from all switches]
  --createVlan <vlanId> [Create a VLAN on all switches]
  --addNodeToVlan <vlanId> [Add node to a VLAN]
  --removeNodeFromVlan <vlanId> [Remove node from a VLAN]
  --setNativeVlan <vlanId> [Configure native VLAN]
  --restoreNativeVlan [Restore native VLAN]
  --removeAllVlans [Remove all VLANs from a switch port]
  --sendSwitchCommand "<command>" [Send raw switch command, BE CAREFUL]
  --interactiveSwitchConfig "<switchname>" [Interactively configure a switch]
  --showSwitchConfig <nodename> [Show switch config for node]
Typical Workflow
• Admin queries available systems
• Admin requests systems with the desired user configuration
  • e.g., cores, memory, image, duration
• Request goes into the queue
• Zoni locates resources and provides a list to the admin/Tashi
• Admin/Tashi moves VMs off the selected resources to free them
  • Adds the node to a blacklist and tells Hadoop to reload
• Zoni allocates the resources
  • Provides an estimated time to get the resources
  • User can query the request status
• Zoni sends a notification when the resources are allocated
• Zoni reclaims the resources on expiration and adds them back into their respective pools
• User may extend the time period before expiration
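The reclaim and extend steps come down to comparing the current time with the end of the reservation. The sketch below assumes reservations are stored as a start time in Unix seconds plus a duration in days; the helper names are hypothetical. Times are passed in explicitly so the logic is easy to test, whereas a real scheduler would use `$(date +%s)` for the current time.

```shell
# Hypothetical reservation-expiry helpers.
reservation_end() {
    # start (Unix seconds) + duration (days)
    echo $(( $1 + $2 * 86400 ))
}

is_expired() {
    # Succeeds once NOW ($3) has reached the reservation's end
    [ "$3" -ge "$(reservation_end "$1" "$2")" ]
}

# A 30-day reservation starting at t=0 is due for reclaim at day 30;
# extending it to 37 days keeps it allocated at that moment.
is_expired 0 30 2592000 && echo "reclaim"
is_expired 0 37 2592000 || echo "still allocated"
```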
[Figure sequence: a Zoni client, the Zoni server and its DB, a PXE server, the Tashi Cluster Manager, management servers, and system servers running VMs, driven by an administrator or cluster manager]
• The Zoni client queries the Zoni server for available resources
• Zoni queries the DB to locate available resources, and the results are sent back to the client:
  • Node 1: 8 cores, 16G memory, 6TB disk, 30 days
  • Node 2: 8 cores, 16G memory, 6TB disk, 30 days
  • Node 3: 8 cores, 16G memory, 6TB disk, 90 days
  • Node 4: 8 cores, 16G memory, 6TB disk, 1 day
  • Node 5: 8 cores, 8G memory, 2TB disk, 90 days
  • Node 6: 8 cores, 8G memory, 2TB disk, 90 days
  • Node 7: 8 cores, 8G memory, 2TB disk, 90 days
  • Node 8: 8 cores, 8G memory, 2TB disk, 90 days
  • Node 9: 8 cores, 8G memory, 2TB disk, 90 days
  • Node 10: 8 cores, 8G memory, 2TB disk, 30 days
  • …
• The user chooses machine attributes and submits a request for the resources for some time period
Request Queue
• The request (R1) enters the request queue
• Zoni processes the request and identifies physical machines that satisfy the user request
• Zoni sends a request to Tashi to free the selected nodes, and Tashi moves virtual machines off them
• Tashi notifies Zoni that migration of the virtual machines has completed
• The virtual disk image is converted to a PXE image
• Zoni allocates the physical machines to the requesting user and isolates them from the network using VLANs
• Zoni sets the PXE image to the user's image and reboots the physical machines, which boot up with that PXE image
• Zoni updates the reservation database
• The Zoni client queries the server for the allocation, and the user connects to the machines and starts running experiments
After allocation
• A returned Zoni node is typically untrusted, so the system must be restored to default settings:
  • Clean the physical node by PXE booting a reset image
  • Restore all settings to defaults (address, IPMI passwords)
  • Repartition and format disks
  • (Optional) Trust images from some users; no reformat needed
  • Clean the network configuration (VLAN)
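The cleanup steps can be sketched as a dry-run sequence built only from flags that appear in this deck. `reclaim_node` and the image name `reset-image` are hypothetical, and whether the VLAN flags address a node via `--nodeName` (rather than `--switchPort`) is an assumption. `ZONI` defaults to `echo ./zoni`, so nothing is executed.

```shell
# Hypothetical dry-run reclaim of a returned node.
reclaim_node() {
    node=$1; trusted=${2:-no}
    zoni=${ZONI:-echo ./zoni}
    # Clean network configuration: drop user VLANs, restore the default
    $zoni --admin --removeAllVlans --nodeName "$node"
    $zoni --admin --restoreNativeVlan --nodeName "$node"
    if [ "$trusted" != yes ]; then
        # Untrusted node: PXE boot a reset image that restores defaults
        # (address, IPMI passwords) and repartitions/formats the disks
        $zoni --assignImage reset-image --nodeName "$node"
        $zoni --hardware --rebootNode --nodeName "$node"
        # In practice, wait for the reset image to finish before releasing
    fi
    # Return the node to its pool
    $zoni --releaseNode --nodeName "$node"
}

reclaim_node r1r1u25
```

Releasing the node last means it only reappears in the free pool after its network and disk state have been cleaned.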
Example: Minicluster
./zoni --addimage amd64-rgass-testing:hardy:8.03
./zoni --assignimage amd64-rgass-testing --nodename r1r1u25
./zoni --allocatenode --nodename r1r1u25 --username rgass --reservationDuration 30 --vlanisolate 300 --notes "Practice allocation"
./zoni --addnodetovlan 300 --nodename r1r1u25
./zoni --hardware --rebootnode --nodename r1r1u25
Example: CloudConnect 1
• Network-isolate a rack of machines and PXE boot them with a user's kernel and initrd
• Create a VM that acts as an SSH gateway and a NAT for the private cluster
• Dynamically configure switches to support the networking experiment
[Figure: CloudConnect network topology. Racks A, B, C and D, each in its own region; VLAN #1: Electrical; VLAN #2: Optical; 100Mb/s, 1Gb/s and 4Gb/s switches; a 4x1Gb trunk link; legend: server, switch, manager (M)]
Example: CloudConnect 2
for i in r1r1u12 r1r1u13 r1r1u14 r1r1u15; do
    ./zoni --admin --setnativevlan 300 -n "${i}"
    ./zoni --admin --addnodetovlan 800 -n "${i}"
    ./zoni --admin --addnodetovlan 801 -n "${i}"
    ./zoni --admin --addnodetovlan 802 -n "${i}"
done
./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface range ethernet g(25-28); spanning-tree disable"
./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g25;switchport mode trunk;exit"
./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g26;switchport mode trunk;exit"
./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g27;switchport mode trunk;exit"
./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"
./zoni --admin --switchport sw0-r1r1:25 --setnativevlan 802 -v
./zoni --admin --switchport sw0-r1r1:26 --setnativevlan 804 -v
./zoni --admin --switchport sw0-r1r1:27 --setnativevlan 806 -v
./zoni --admin --switchport sw0-r1r1:28 --setnativevlan 808 -v
for i in $(seq 12 16); do
    ./zoni --hardware --rebootnode -n "r1r1u${i}"
done
Future Work
• Introduce a request queue and primitive scheduler
• Enable provisioning of OS to local disk
• Enable virtual disk conversion to physical
• Integration with Tashi…
  • Would enable free exchange of resources between the Tashi pool and the free pool
Necessary Components
• DHCP server
• PXE server
• NFS server
• DNS server (optional)
• Configurable switches
  • New switch types may require new Zoni modules
• Hardware access method
  • e.g. IPMI/iLO/DRAC
  • IP-addressable PDUs enable rescue if IPMI becomes compromised
Zoni Register
• Gather a unique identifier from the system
  • MAC address / Dell service tag
• Assign a hostname (r1r2u24)
• Record switch/PDU info
Example
• J3GPGD r1r2u24 172.16.129.100 tashi_nm sw0-r1r2:9 pdu0-r1r2:18
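The example record packs several fields into one line, and its switch and PDU ports follow the `switchname:portnum` convention from the usage slide. The sketch below splits such a record into named fields. The field meanings (service tag, hostname, IP, pool, switch port, PDU port) are inferred from the slide; reading `tashi_nm` as a pool name is an assumption, and `parse_register` is hypothetical.

```shell
# Hypothetical parser for one register record.
parse_register() {
    echo "$1" | {
        read -r tag host ip pool switchport pduport
        # Split name:port pairs into their two halves
        switch=${switchport%%:*}; port=${switchport#*:}
        pdu=${pduport%%:*}; outlet=${pduport#*:}
        echo "host=$host tag=$tag ip=$ip pool=$pool switch=$switch port=$port pdu=$pdu outlet=$outlet"
    }
}

parse_register "J3GPGD r1r2u24 172.16.129.100 tashi_nm sw0-r1r2:9 pdu0-r1r2:18"
```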