UCB Millennium and the Vineyard Cluster Architecture

Phil Buonadonna, University of California, Berkeley. http://www.millennium.berkeley.edu

Presentation Transcript


  1. UCB Millennium and the Vineyard Cluster Architecture
     Phil Buonadonna, University of California, Berkeley
     http://www.millennium.berkeley.edu

  2. Millennium Project
     • Hierarchical “Cluster of Clusters”
     [Diagram labels: DLIB, ½ TB; Ninja; PIII-X 64x4; PIII 32x2; PII; PIII; Gigabit Ethernet (GbE); PII 8x2 (five times); Astro; Math; Physics; Bio; CE]
     UC Berkeley Millennium

  3. Millennium Agenda
     • Investigate recent PC technologies in Clusters
       • NT/Linux
       • VI Architecture / GbE / Distributed I/O
     • Harvest the lessons learned from NOW
       • Robust, flexible remote execution
       • Distributed resource management
     • Investigate clusters that span administrative units
       • Turn-key cluster deployment
       • Sense of ownership
     • Investigate the “Computational Economy” Approach
       • Resource management with a natural sense of ownership
       • Enough heterogeneous interests to be worthwhile
     • Form basis for Sci. Computing, Internet Services, etc.

  4. Vineyard Cluster Architecture
     • Distributed resource utilization and management in a “Vineyard” of Clusters.
     [Layer diagram labels: Applications / Services; Mgmt / Monitoring; PBS; I/O; MPI; VEXEC; TOOLS; REXEC; VIA / GM, GbE; Multicast; NT / Linux (2.2.x); Stride Scheduler; Rootstock Distribution]

  5. Outline
     • Millennium Project
     • Vineyard Cluster SW Architecture
     • Important Component Technologies
       • Rootstock cluster SW distribution facility
       • REXEC: Robust Linux Remote Execution
       • Economic-based Resource allocation
       • CAN communication over VIA
       • IO Rivers
     • Directions and Discussion

  6. Rootstock
     • Disseminate easy-to-build PC cluster system software
     • Variety of cluster designs
       • well-engineered high-performance clusters
       • low-cost casual workgroup clusters
       • server farms
       • scalable internet servers
     • Root Cluster Server (CS)
       • Provides cluster software stock
     • Second-level customized distribution within each cluster from its own CS node

  7. Rootstock Cluster
     • Collection of nodes with IP connectivity
       • can be dedicated subnet, w/ or w/o NAT, or any collection
       • run nfsd (within cluster), httpd, ssl
     • One node designated as Cluster Root
       • serves as the root of administrative operations and mgmt.
       • may be same or different from other nodes
       • may participate in normal cluster operation or not
       => is trusted by other nodes and has storage for dialtone
     • May have designated front-end nodes or not
     • May have dedicated cluster-area-network (e.g. Myrinet) or not.

  8. Rootstock Mechanics
     1. Cluster Stock
        - Rootstock build pages
        - Full Current Linux
        - all fixes and pckgs
        - SSL, SSH
        - Cluster Drivers
        - Cluster System Layers
        - rexec, mpe, pbs
        - Optional SW ($)
        - Cluster Kernel Mods
     2. Make the CS “graft”
        - specify IP address
        - pckg removes
        - dhcp, dns, nis, ...
        - sanity check and build
        - resolv.conf, /etc/hosts, ...
        - constructs cluster build (lease)
        - download CS build floppy
     3. CS power-on build
        - xfer and localize DT
        - add local admin scripts
        - node build floppy
     4. Node power-on build
        - local stock from CS
     5. Cluster Update button (future)
        - 2nd dialtone, CF engine, rolling update
     [Diagram labels: Cluster System Distribution Center; cluster stock (build, os, drvrs, mill SW, os mods); cs; Cluster leased builds; K; IP network; CAN]

  9. Computational Economy
     • Market-based approach to resource allocation
     • Optimizes for user value
     [Diagram labels: TimeShare; API; API; BatchQueue; Economic F.E.; Access Modules; Resources; Apps (Value); Resource Managers]

  10. REXEC Remote Execution
      • Secure, decentralized remote execution environment
      • Features
        • Decouples resource discovery and selection
          • Multiple Allocation Policies (VEXECs)
        • Decentralized control
          • Each client rexec is the root for a distributed task.
        • Dynamic discovery and configuration
          • Resource announcements on a cluster multicast channel
        • All Soft State
          • Simple, well-defined failure and cleanup models
          • “They all fall down”
        • Secure
        • Translates Pricing Mechanism to Resource Allocation
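
The "All Soft State" and "Resource announcements" bullets can be illustrated with a small sketch. This is not REXEC code: the class name, the `ttl` parameter, and passing the clock in explicitly are all assumptions made for illustration. It only shows the general soft-state idea, where a node stays known as long as its announcements keep arriving and simply disappears once they stop.

```python
from dataclasses import dataclass, field


@dataclass
class SoftStateTable:
    """Hypothetical table of node announcements that expire unless refreshed."""

    ttl: float  # seconds an announcement stays valid (assumed parameter)
    _last_seen: dict = field(default_factory=dict)

    def announce(self, node: str, now: float) -> None:
        # Record the time of the latest announcement heard from this node.
        self._last_seen[node] = now

    def live_nodes(self, now: float) -> list:
        # Forget nodes whose announcements stopped more than ttl seconds ago;
        # no explicit "node left" message is needed.
        self._last_seen = {
            node: seen
            for node, seen in self._last_seen.items()
            if now - seen <= self.ttl
        }
        return sorted(self._last_seen)


table = SoftStateTable(ttl=3.0)
table.announce("A", now=0.0)
table.announce("B", now=0.0)
table.announce("A", now=2.0)          # only A keeps announcing
nodes_early = table.live_nodes(now=2.5)
nodes_late = table.live_nodes(now=4.0)
```

Because cleanup is just expiry, a crashed node and a cleanly removed node look the same to the listener, which is one reading of the "simple, well-defined failure and cleanup models" bullet.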

  11. REXEC / VEXEC
      • Components
        • rexecd, rexec & vexecd
      [Diagram labels: Node A, Node B, Node C, Node D, each running rexecd; Cluster IP Multicast Channel; vexecd (Policy A); vexecd (Policy B); “Node A”; run indexer on Nodes AB at 3 credits/min; minimum $; rexec]
      % rexec -n 2 -r 3 indexer
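
The slide does not say how a vexecd policy chooses nodes. The sketch below is one hypothetical "minimum $" style policy, written only to make the division of labor concrete: the function name, the `competing_rate` table, and the tie-breaking rule are assumptions, not the actual vexecd interface.

```python
def select_nodes(competing_rate: dict, n: int) -> list:
    """Pick the n nodes with the lowest total competing bid rate.

    competing_rate maps a node name to the sum of bid rates (credits/min)
    already running there. This data shape is assumed for illustration.
    """
    if n > len(competing_rate):
        raise ValueError("not enough nodes announced")
    # Sort by competing rate, then by name so ties are deterministic.
    ranked = sorted(competing_rate, key=lambda node: (competing_rate[node], node))
    return ranked[:n]


# Hypothetical cluster state for the four nodes in the diagram.
rates = {"A": 0, "B": 1, "C": 6, "D": 9}
chosen = select_nodes(rates, n=2)
```

With these made-up rates, a request for two nodes lands on the two least contended ones, in the spirit of the slide's `rexec -n 2 -r 3 indexer` example.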

  12. Interactive Pricing Mechanism
      • Most work on “economic mechanisms” focuses on single item or batch case
        • hold auctions (e.g., second-price sealed bid)
        • integrated into Vineyard PBS
      • interactive case needs to be very simple
        • Bidder i gets b_i / Σ_k b_k of CPU at rate b_i
        • enforced by stride scheduler
      • Running cluster mirror usage experiment
        • two identical clusters for one user community with $ accounts
        • one free and uncontrolled
        • one for bid and controlled
        • which is more desirable to use?
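
A short sketch may help connect the share formula to the stride scheduler that enforces it. It follows the textbook form of stride scheduling (each client advances a "pass" value by a stride inversely proportional to its tickets, and the lowest pass runs next), using bids as tickets. The function names and the positive-integer-bid assumption are illustrative and are not taken from the Millennium code.

```python
import heapq
from fractions import Fraction


def proportional_shares(bids: dict) -> dict:
    """Share of CPU for each bidder: b_i divided by the sum of all bids."""
    total = sum(bids.values())
    return {name: Fraction(bid, total) for name, bid in bids.items()}


def stride_schedule(bids: dict, quanta: int) -> dict:
    """Hand out `quanta` time slices; return how many each bidder received.

    Assumes every bid is a positive integer.
    """
    # Heap entries are (pass value, name, stride); the lowest pass runs next.
    heap = [(Fraction(1, bid), name, Fraction(1, bid)) for name, bid in bids.items()]
    heapq.heapify(heap)
    counts = {name: 0 for name in bids}
    for _ in range(quanta):
        pass_value, name, stride = heapq.heappop(heap)
        counts[name] += 1
        # A larger bid means a smaller stride, so the bidder runs more often.
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return counts


bids = {"alice": 3, "bob": 1}
shares = proportional_shares(bids)
slices = stride_schedule(bids, quanta=400)
```

Over many quanta, the slice counts track the proportional shares, which is why a stride scheduler is a natural way to enforce the b_i / Σ_k b_k rule.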

  13. Communication / VIA
      • Multiple Physical Layers
        • Fast Ethernet
        • Gigabit Ethernet (Inter & Intra cluster net)
        • Myrinet w/ Lanai7 (Intra cluster net)
      • Transports
        • IP, IP Multicast
        • VI Architecture / GM
      • Explore integrated IPC and distributed I/O

  14. AM Architecture
      • Components
        • Endpoints
        • Virtual Networks
        • Bundles
      • Operations
        • Request / Reply
        • Short, Med, Long
        • Create, Map, Free
        • Poll, Wait
      • Credit based flow control
      [Diagram labels: Proc A; Proc B; Proc C]
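
The "Credit based flow control" bullet can be sketched generically. The class below is a hypothetical sender-side model, not the AM API: a request consumes a credit, the matching reply returns it, and requests issued with no credits left wait in a backlog. All names here are assumptions.

```python
from collections import deque


class CreditedChannel:
    """Hypothetical sender side of a request/reply channel with credits."""

    def __init__(self, credits: int):
        self.credits = credits   # requests that may still be put in flight
        self.backlog = deque()   # requests waiting for a credit
        self.sent = []           # stands in for the network

    def request(self, msg) -> bool:
        # Without a credit the receiver may have no free buffer, so hold back.
        if self.credits == 0:
            self.backlog.append(msg)
            return False
        self.credits -= 1
        self.sent.append(msg)
        return True

    def on_reply(self) -> None:
        # A reply means the receiver freed the buffer its request occupied.
        self.credits += 1
        if self.backlog:
            self.request(self.backlog.popleft())


channel = CreditedChannel(credits=2)
results = [channel.request(m) for m in ("a", "b", "c")]
channel.on_reply()
```

The sender can never have more requests outstanding than the receiver has agreed to buffer, so no message is dropped for lack of receive space.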

  15. AM-VIA Architecture
      • VI Queue (VIQ)
        • Logical channel for AM message type
        • VI & independent Send/Receive Queues
        • Independent request credit scheme (counter n)
      • MAP Object
        • Container for 3 VIQ’s
          • Short, Medium, Long
        • Single Registered Memory Region
      [Diagram label: MAP Object]

  16. AM-VIA Integration
      • Endpoints: Collection of MAP objects
        • Virtual network emulated by point-to-point connections
      • Bundle: Pair of VI Completion Queues
        • Send/Receive
      [Diagram labels: Proc A; Proc B; Proc C]
