Rewriting The Rules For Enterprise IT: Enterprise Grid Orchestrator
Christof Westhues, SE Manager EMEA, Platform Computing
2007/03/01, National Grid Meeting, Ankara
Platform Enterprise Grid Orchestrator: boosting EU-Grid Technology exploitation
• Agenda
• Increasing the industrial impact of the EU Grid Technologies Programme
• About Platform Computing
• Understanding Industry requirements
• Unified Grid resource layer
• Integrate your Grid solution with Platform EGO
• Platform Collaborations – EGEE, DEISA etc.
• Conclusion – open for new ideas
Platform Enterprise Grid Orchestrator: boosting EU-Grid Technology exploitation
• Increasing the industrial impact of the EU Grid Technologies Programme with Platform Enterprise Grid Orchestrator
• The EU Grid Technologies Programme targets the logical next step: 'From Vision to Impacts in Industry and Society'. How do we make this real?
• Platform Computing holds what is probably the largest commercially productive installed base of Grid infrastructure in industry worldwide.
• It is now introducing the Enterprise Grid Orchestrator (EGO), the first Grid SOI (Service Oriented Infrastructure) rolled out at large scale for technical as well as business computing.
• Platform Computing invites all Grid technology solutions to integrate with EGO's unified Grid resource layer.
Platform Computing
• The leading systems infrastructure software company, accelerating applications and delivering I.T. agility to High Performance Data Centers
• Gartner Group 2006 "Cool Vendor" award in I.T. Operations Management
• 14 years of grid computing experience
• Global network of offices, resellers & partners
• 7x24 world-wide support and consulting
Our Customers: from all verticals
• Financial Services: Fidelity Investments, HSBC, JP Morgan Chase, Mass Mutual, Royal Bank of Canada, Sal. Oppenheim, Société Générale, Lehman Brothers
• Electronics: AMD, ARM, ATI, Broadcom, Cadence, HP, IBM, Motorola, NVIDIA, Qualcomm, Samsung, ST Micro, Synopsys, Texas Instruments, Toshiba
• Industrial Manufacturing: BMW, Boeing, Bombardier, Airbus, Daimler Chrysler, GE, GM, Lockheed Martin, Pratt & Whitney, Toyota, Volkswagen
• Life Sciences: AstraZeneca, Bristol-Myers Squibb, Celera, Dupont, GSK, Johnson & Johnson, Merck, Novartis, Pfizer, Wellcome Trust Sanger Institute, Wyeth
• Government & Research: ASCI, CERN, DoD (US), DoE (US), ENEA, Fleet Numeric, Max Planck, SSC (China), TACC, Univ Tokyo
• Other Business: Bell Canada, Cablevision, Ebay, Starwood Hotels, Telecom Italia, Telefonica, Sprint, GE, IRI, Cadbury Schweppes
Understanding Industry requirements
• Let's have a look at the users
• Industry, generically: professional users aiming to create results (€, $, ₤) using the tool "Grid"
• Call them customers (a change of perspective)
• Grid value: shared resources & shared usage
• Unify many different users AND multiple different workload types
• Avoid building "Grid silos": don't become part of the problem
• The primary target is "agility" – speed & ease of change
• Driven by business process & business change needs
• Handling all workload in the Grid – orchestration, scaling, acceleration – results in agility
Understanding Industry requirements
• Quality requirements
• Reliability: self-healing, recovery from incidents, policy-driven proactive problem containment; no job loss during operation, in error conditions, during reconfiguration or during failover
• Performance: tens of millions of jobs per day throughput with 90% job-slot utilization at 15-minute job runtimes; at most 5 minutes for failover
• Scalability: thousands of users and hosts, millions of jobs in one logical cluster at any time, tens of millions of jobs per day throughput, jobs thousands of ways parallel (a rough sizing sketch follows below)
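To give the performance bullet a concrete scale, here is a back-of-envelope sizing sketch in Python; the slot count is our own arithmetic from the figures stated above, not a Platform specification.

```python
import math

# Back-of-envelope check of the stated throughput target (the input
# numbers come from the slide above; the slot count is derived here).
jobs_per_day = 10_000_000      # "n*10 millions jobs per day", with n = 1
job_runtime_min = 15           # average job runtime in minutes
utilization = 0.90             # target job-slot utilization

minutes_per_day = 24 * 60
# Total slot-minutes of work per day, divided by the usable
# slot-minutes each slot contributes at 90% utilization:
slots_needed = math.ceil(
    jobs_per_day * job_runtime_min / (minutes_per_day * utilization))
print(f"job slots required: {slots_needed:,}")  # ~115,741 slots
```

At 10 million 15-minute jobs per day, the cluster must sustain on the order of 115,000 busy job slots, which is why scheduler scalability dominates this requirement list.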
LSF Roadmap
• The LSF product roadmap is based on feedback from, and interviews with, 75+ customers including: Agilent, Airbus, AMD, ARM, Apple, ATI, BASF, BMS, Boehringer Ingelheim, Boeing, Broadcom, Caltech, CEA, Cineca, Cinesite, Conexant, Daimler Chrysler, DEISA, Devon Energy, Disney, DoD (ARL, ASC, ERDC, MHPCC, NAVO), DoE (LANL, LLNL, Sandia), Dreamworks, Emulex, Engineous, Ferrari, Fleet Numerics, Ford, Freescale, GE, GM, Halliburton/Landmark, Harvard, Hilti, HP, IDT, Intel, J&J, LandRoverJaguar, Lockheed Martin, LSILogic, Magma, Merck, Motorola, MSC, MTU, NCAR, NCSA, Nissan, NOAA, Novartis, NovoNordisk, NVidia, Philips, Pratt & Whitney, Pfizer, PSA, Qlogic, Qualcomm, RBC, Renesas, Samsung, Sandisk, Seagate, Shell, Skyworks, Synopsys, TACC, TenorNetworks, TI, Toshiba and Volvo
Understanding Industry requirements
• Quality requirements
• Why scaling counts: performance and scalability translate into reliability
• Reliability can be measured as "MTBF" – Mean Transactions (= Jobs) Between Failure
• Platform technology meets this requirement – the technology leader
• Support 24/7 around the globe
• Non-technical quality requirements
• Focus on Grid technology – commitment
• A reliable partner: experienced, stable, profitable
Enterprise Grid problem: workload characteristics
[diagram: HPC & enterprise applications generate unpredictable, infinite demand against a finite, heterogeneous enterprise resource pool – network, bandwidth, servers, licenses, data, storage]
• Result: under-provisioning or over-provisioning
IT Architectures Are Still Statically Coupled and Silo'd
[diagram: core applications in the data center, each statically bound to its own resources – unpredictable, infinite demand against finite computing resources]
• With multiple engineering groups collaborating on multiple designs, core and business applications can consume vast amounts of computing resources
• Applications are "siloed", often procured out of different budgets at different times for different purposes
Results of a statically Coupled and Silo'd Infrastructure
• The need is for variable resources to meet variable business demand
• Data Center business "pain points":
• Underutilized resources – some server silos have insufficient capacity while there is excess capacity in others
• Difficulty meeting SLAs – it is difficult to meet application SLAs because resources may not be available when required
• Costly I.T. environment – with application silos underutilized, excess capacity, cooling, space and power are required
• Complex – coordination of resources is complex, time-consuming and error-prone
• Unpredictable – hardware failures, outages or insufficient capacity make the environment unpredictable
Create a Shared Pool of Computer Systems
• Decouple resources from applications
[model architecture diagram: core applications in the data center drawing on a shared resource pool – unpredictable, infinite demand; computing resources are finite]
Platform Enterprise Grid Orchestrator: an open & decoupled architecture (SOA over SOI)
[architecture diagram, top to bottom:]
• Applications (MDA, EDA, CAE, FSI, LS, VMs, J2EE, DBs, ERP, CRM, BI) connecting via API/CLI
• Application workload management: Platform LSF, Platform LSF HPC, Platform Symphony, Platform VOVMO & ASE, 3rd-party middleware integrations
• Platform EGO SDK/API
• Platform EGO standard services: deployment service, service director, logging service, portal service, event service
• Platform EGO kernel (system resource orchestration): manage, allocate, execute, fail-over
• Infrastructure plug-ins: license, storage, interconnect (e.g. Infiniband), SNMP, data cache, security
• Resource plug-ins: Linux, Solaris, AIX, Windows desktops, servers, and grid devices (H/W)
Example: Dynamic Resource Allocations – a live SOI
[diagram: the Platform EGO foundation spanning host groups, e.g. Linux 2.4 and Windows NT]
• Platform EGO responds to requests from consumers and allocates supply according to policy – Service Oriented Infrastructure
• Resource allocation: min, max, conditions, resource requirements
• Dynamic response: resource re-allocation based on policies (=> SLAs) – "lend & borrow" (see the sketch below)
• Dynamic response: acquisition of additional resources
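A minimal sketch of the lend & borrow idea, in hypothetical Python; this is not the EGO API, only an illustration of the policy: each consumer owns a guaranteed minimum, may grow to a maximum, and idle owned slots are lent to consumers with pending demand.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    name: str
    owned: int    # guaranteed minimum ("owned" slots)
    maximum: int  # cap on this consumer's total allocation
    demand: int   # slots currently requested

def allocate(consumers, total_slots):
    """Toy lend&borrow policy (our invention, not EGO code): satisfy
    owned shares first, then lend leftover capacity to consumers whose
    demand exceeds their share, up to their maximum."""
    alloc = {c.name: min(c.demand, c.owned) for c in consumers}
    free = total_slots - sum(alloc.values())
    for c in consumers:  # borrowers pick up capacity lent by idle owners
        extra = min(c.demand, c.maximum) - alloc[c.name]
        lend = min(extra, free)
        alloc[c.name] += lend
        free -= lend
    return alloc

demo = [Consumer("EDA", owned=40, maximum=80, demand=70),
        Consumer("CAE", owned=40, maximum=80, demand=10)]
print(allocate(demo, total_slots=100))  # {'EDA': 70, 'CAE': 10}
```

The point of the demo: CAE's idle owned slots are not fenced off; EDA "borrows" them until CAE's demand returns, which is exactly the breathing allocation the slide describes.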
Integrate your Grid solution with Platform EGO: 3rd-party middleware integration
Integrate your Grid solution with Platform EGO • Meet industrial quality requirements AND deploy innovative technologies and methods • Specific and targeted solutions as well as general purpose workload adapters can join one unified resource Grid • Reliability (self-healing, recovery from incidents, policy driven proactive problem containment) • Dynamic Resource Allocation – peak power on demand • Scalability & Performance
Integrate your Grid solution with Platform EGO
• Platform EGO offers policy-based access to all resources in the Grid through an open API/SDK
• Access the same resource Grid from & for all workload types and Grid solutions
• No Grid silos!
• Access to resources on EGO includes dynamic allocations within SLA guarantees
• "Breathing" resource allocations: SLA minimum and maximum – lend & borrow
• This may well replace traditional static advance reservations, which build up "virtual silos" – a virtual, grid-based flavor of the silo'd infrastructure that Grid technology was supposed to make redundant
• No Grid silos – not even virtual!
Platform Engagements and Collaborations
• Currently, Platform Computing is engaged in:
• QosCosGrid
• DEISA
• EGEE
• …
What is QosCosGrid?
• Quasi-Opportunistic Supercomputing for Complex Systems in Grid Environments (QosCosGrid)
• A research project proposal to the European Union Commission: IST Specific Targeted Research Project (STREP), IST Call 5, FP6-2005-IST-5
• 9 academic partners & Platform Computing SARL form a consortium
What is QosCosGrid?
• Target & definition, from the proposal paper:
• "… Whereas supercomputing resources are more or less dependable, the grid approach is characterized by an opportunistic sharing of resources as they become available. This distributed quasi-opportunistic supercomputing, while not offering the quality of service of a supercomputer, will be to some degree better than the pure opportunistic grid approach. Furthermore it will enable users to develop applications with supercomputing requirements without the need to deploy supercomputers themselves. … QosCosGrid is, therefore, an effort to use the best from two worlds: the opportunistic approach of grid technology to sharing and using resources whenever they become available, and the reliant or dependable approach of supercomputing. By developing an infrastructure for quasi-opportunistic supercomputing, QosCosGrid aims at providing reliable, effortless and cost-effective access to the enormous computational and storage resources required across a wide range of CS research areas, application domains and industrial sectors."
• Prof. Dr. Dubitzky, University of Ulster
What is QosCosGrid?
• The proposal to the EU Commission
• Why Platform Computing?
• Researchers from the initiating University of Ulster remembered Platform Computing from D-Grid (German e-science initiative) working groups and asked for Platform's participation
• EU Commission funding rule: each research project must have a commercial partner
• Platform is invited to enter the academic IT research scene in Europe and thereby increase its success in a currently under-developed market
• Platform was offered a package of 45 person-months with a total of over €400,000 in funding
QosCosGrid Project Plan
[project plan chart: 30 months runtime; Platform (PCC) work packages marked]
QosCosGrid Technology Stack & LSF
• QosCosGrid research and development efforts will be based on existing grid technology (such as GT4 [i], gLite [ii] and LSF [iii] from PCC), and will focus on three additional layers, as depicted in the figure below
• To achieve that, one of the first activities in the project will be the roll-out of a world-spanning Platform LSF MultiCluster grid – from Ireland across Europe, Israel and Australia
• [i] GT4: www.globus.org/toolkit
• [ii] gLite: glite.web.cern.ch/Glite
• [iii] LSF: www.platform.com/Products
Heterogeneous job submissions and co-allocation capability
• Develop and extend heterogeneous job submission capability (UNIVERSUS) over a virtualized infrastructure: IBM LoadLeveler, OpenPBS / PBSPro, Platform LSF, NEC NQS (optional)
• Co-allocation: heterogeneous multi-site resource allocation
• Example: "Give me 200 CPUs on Site1 and 300 CPUs on Site2 at the same time" (a sketch of both ideas follows below)
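To make the two capabilities concrete, here is an illustrative Python sketch in the spirit of UNIVERSUS; the JobSpec type and function names are our own invention, and the submit flags shown are only the commonly documented basics of each scheduler.

```python
from dataclasses import dataclass

@dataclass
class JobSpec:
    script: str
    cpus: int

def submit_command(spec: JobSpec, scheduler: str) -> str:
    """Translate one generic job description into the native submit
    command of each scheduler in the virtualized infrastructure."""
    if scheduler == "lsf":            # Platform LSF
        return f"bsub -n {spec.cpus} {spec.script}"
    if scheduler == "pbs":            # OpenPBS / PBSPro
        return f"qsub -l nodes={spec.cpus} {spec.script}"
    if scheduler == "loadleveler":    # IBM LoadLeveler: resources are
        return f"llsubmit {spec.script}"  # declared inside the command file
    raise ValueError(f"unknown scheduler: {scheduler}")

def co_allocate(request: dict, free: dict):
    """All-or-nothing co-allocation: grant only if EVERY site can
    satisfy its share of the request at the same time."""
    if all(free.get(site, 0) >= n for site, n in request.items()):
        return dict(request)
    return None

print(submit_command(JobSpec("run.sh", 200), "lsf"))
# "200 CPU on Site1 and 300 CPU on Site2 at the same time":
print(co_allocate({"site1": 200, "site2": 300},
                  {"site1": 512, "site2": 256}))  # None – site2 is short
```

The all-or-nothing check is the essence of co-allocation: a partial grant (Site1 only) would be useless to a job that needs both halves simultaneously.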
Platform Computing – EGEE Business Associate
• The collaboration "Plan", Step 1
• Immediate improvements for the EGEE users and resource providers
• Technology boost:
• SLA scheduling
• Parallel job control and accounting
• Resource-aware scheduling – double compute efficiency
• What's next? Step 2
• Mid-term target: a production Grid unifying all resources AND all users
• Enable & integrate new user groups and their resources
• All kinds of applications: commercial code, complex systems
• Long-term target: SOA/SOI for Service Oriented Science
• "IT agility" for scientific computing
• Introduce novelties faster
• Respond to changing requests in time
EGEE & Platform: the "Plan"
• The collaboration "Plan", Step 1: 4 actions
• 1st action: improve the LSF–gLite integration
• Platform LSF is one of the supported batch systems of gLite; currently about 45% of all CPUs in EGEE run on LSF
• May include version maintenance as well as performance improvements
• Will include improved documentation and communication
• Leads to a better understanding of LSF's capabilities, so that complex algorithms can benefit from information passing and use all the features of LSF
EGEE & Platform: the "Plan"
• The collaboration "Plan", Step 1: 4 actions
• 2nd action: SLA scheduling
• Exploit LSF and gLite features to enhance user and resource provider capabilities
• SLA scheduling helps both sides:
• For the user, it provides guaranteed result delivery – on time or at a given throughput
• For the resource provider, it translates to "least-impact scheduling": serving the SLA user while there is still room left to host other requests; in other words, handling different service levels and different customers at the same time
• Expected results:
• Resource providers will offer more resources to EGEE users under well-defined SLAs
• Users perceive predictable result delivery and predictable behaviour of the Grid
EGEE & Platform: the "Plan"
• The collaboration "Plan", Step 1: 4 actions
• 3rd action: parallel application support
• gLite today supports sequential jobs and provides basic support for parallel jobs based on MPICH
• Exploit LSF-HPC features:
• LSF-HPC allows control of MPI parallel jobs down to task level
• Provides a signalling layer for management or workflow control signals
• Delivers accounting that includes all children of a parallel application
• Supports multiple MPI types in one cluster
• Is parallel application support in EGEE easy? No.
• LSF-HPC might be the best choice to start with
• We may identify topics worth a research project / support action, e.g. parallel application checkpoint / restart
EGEE & Platform: the "Plan"
• The collaboration "Plan", Step 1: 4 actions
• 4th action: resource-aware scheduling – double compute efficiency
• Exploit LSF features:
• LSF supports a generic resource concept, thus data is a resource, too
• All resources can be used for scheduling decisions
• The scheduling paradigm "job-follows-data" results in up to 50% gain in compute power
• Is resource-aware scheduling in EGEE easy? No.
• EGEE supports co-location of data and computation based on sites, but not for computation scheduling within a site
• Major topics in the operations model
• Medium topics for the compute resources: re-think, re-build, re-budget
• Maybe switch to a mid-term horizon …
SLA Scheduling for EGEE
• LSF service-level agreement (SLA) scheduling:
• Is a goal-oriented "just-in-time" scheduling policy that lets the user focus on the "what and when" of a project instead of "how" the resources need to be allocated to satisfy various workloads
• Defines an agreement between LSF administrators and users
• Helps configure workload so that jobs complete on time
• Reduces the risk of missed deadlines
• The three types of service-level goals are:
• Deadline
• Velocity
• Throughput
• or a combination of these goals (the arithmetic behind the deadline goal is sketched below)
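As an illustration of the "just-in-time" arithmetic behind a deadline goal, here is a small Python sketch; the formula is our reading of the goal semantics (uniform runtimes assumed), not LSF scheduler source.

```python
import math

def slots_for_deadline(pending_jobs: int, runtime_min: float,
                       minutes_to_deadline: float) -> int:
    """How many job slots must run concurrently so that all pending
    jobs finish by the deadline, assuming uniform job runtimes."""
    total_work = pending_jobs * runtime_min      # slot-minutes of work
    return math.ceil(total_work / minutes_to_deadline)

# Example: 960 pending 15-minute jobs, deadline 8 hours away:
print(slots_for_deadline(960, 15, 8 * 60))  # 30 slots
```

This is why a deadline SLA frees capacity for other workload: the scheduler only needs to hold enough slots to keep the goal on track, rather than grabbing the whole cluster opportunistically.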
SLA Scheduling for EGEE: SLA "Deadline"
[chart: a cluster of 8 job slots over time; classical opportunistic scheduling fills the cluster to 100% ("I need to work now!"), while a deadline SLA ("early enough for me") lets SLA 1 consume 50% of cluster time ahead of its deadline, leaving free resources for dialog users, real-time requests, online sessions and other workload]
SLA Scheduling for EGEE: SLA "Throughput"
[chart: SLA 2, a throughput goal ("I am a scientist, I need just as many results as I can process per time interval"), delivers a steady 4 results/hr and consumes 25% of cluster time; the remaining capacity is free for dialog users, real-time requests, online sessions, other workload and other SLAs – room for more EGEE users]
EGEE High Performance Parallel Computing
• Distributed computation
• "Imperfectly parallel" – the real world: inter-task communication at runtime
• Often implemented using MPI – Message Passing Interface
• MPI – Many Possible Implementations
• Different communication patterns:
• "Neighbour" tasks (defined by the problem decomposition topology), as in the sketch below
• "All to all", "some to many" (= N-to-M)
• Central instance to tasks (commercial code, …)
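A minimal mpi4py sketch of the "neighbour" pattern on a 1-D ring; it assumes mpi4py and an MPI runtime are installed, and the halo payload is an invented stand-in for real boundary data.

```python
# Run with e.g.: mpirun -np 4 python ring.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

left = (rank - 1) % size   # neighbours defined by the decomposition topology
right = (rank + 1) % size

# Each task sends its halo to the right neighbour while receiving the
# left neighbour's halo; sendrecv avoids deadlock on the ring.
halo = {"owner": rank, "boundary": [float(rank)] * 4}
received = comm.sendrecv(halo, dest=right, source=left)
print(f"task {rank}: got boundary data from task {received['owner']}")
```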
LSF-HPC – LSF for High Performance Computing
• LSF plus additional functionality:
• Topology-aware scheduling for large SMPs and large clusters
• Task-granular control for parallel computation
• Generic and vendor-specific MPI integrations
• Signal forwarding to all tasks
• Resource usage accounting for all tasks
• Limit enforcement: time, memory, threads, …
• Scalability: 8,000+ in LSF 6.2 / 16,000+ in LSF 7.0
Platform LSF/HPC – generic integration
[diagram: a non-integrated PJL starting tasks directly on the 1st and 2nd execution hosts]
• Architecture: running a parallel job using a non-integrated PJL
• Without the generic PJL framework, the PJL starts tasks directly on each host and manages the job itself. Even if the MPI job was submitted through LSF, LSF never receives information about the individual tasks, so it cannot track job resource usage or provide job control.
• The same happens if PAM is simply replaced with a parallel job launcher that is not integrated with LSF: LSF loses control of the processes.
Platform LSF/HPC – generic integration
[diagram: job submission flows from mbatchd and mbschd on the LSF master host to sbatchd and mpirun.lsf on the 1st execution host, where PAM starts the PJL through a PJL wrapper; on each execution host, RES and a TS (task starter) sit between the PJL and every task]
• Architecture: using the generic PJL framework
• PAM is the resource manager for the job
• The key step in the integration is to place TS in the job startup hierarchy, just before the task starts: TS must be the parent process of each task in order to collect the task process ID (PID) and pass it to PAM (see the schematic task starter below)
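To make concrete why TS must be the task's parent, here is a schematic task starter in Python (POSIX-only, our illustration and not Platform's actual TS; stdout stands in for the reporting channel to PAM).

```python
import os
import sys

def task_starter(argv):
    """Fork the real task as our child so we can capture its PID,
    report it upstream, and collect its resource usage when it exits."""
    pid = os.fork()
    if pid == 0:
        os.execvp(argv[0], argv)          # child process becomes the task
    print(f"TS: started task pid={pid}", flush=True)
    # Only the parent of a process can reap it and read its usage;
    # this is exactly why TS must sit directly above each task.
    _, status, usage = os.wait4(pid, 0)
    print(f"TS: exit={os.waitstatus_to_exitcode(status)}, "
          f"cpu={usage.ru_utime + usage.ru_stime:.2f}s")

if __name__ == "__main__":
    task_starter(sys.argv[1:] or ["echo", "hello from the task"])
```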
LSF-HPC – LSF for High Performance Computing
• Advantages for EGEE, users and resource providers
• Freedom to integrate and use:
• All MPI types
• All compute architectures
• May implement optional automated MPI selection, dependent on actual availability – the best possible choice
• Full application control, ready to implement optional parallel preemption – important to guarantee service levels (a sketch of the underlying primitive follows):
• Suspend/resume
• Checkpoint/migrate/restart
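A sketch of the suspend/resume primitive beneath preemptive scheduling: on POSIX systems, a lower-priority parallel job's tasks can be paused with SIGSTOP and continued with SIGCONT. Fanning the signal out to every task is what a signalling layer like LSF-HPC's provides; the PID list here is assumed to have been collected by the task starters.

```python
import os
import signal

def preempt(task_pids):
    """Freeze every task of a parallel job in place."""
    for pid in task_pids:
        os.kill(pid, signal.SIGSTOP)

def resume(task_pids):
    """Let the tasks of a previously preempted job continue."""
    for pid in task_pids:
        os.kill(pid, signal.SIGCONT)
```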
Resource aware scheduling for EGEE
• 4th action: resource-aware scheduling – double compute efficiency (recap of the previous slide)
EGEE: data handling in the resource center
[diagram: EGEE users submit jobs over the LAN to compute nodes; data nodes connect to a tape storage robot with multiple drives and a controller]
• EGEE example operations model:
• A job arrives and is started on a compute node
• The requested data is ordered from the storage robot
• The tape is mounted and the content ("data set") is provided to the compute node via NFS
• Net effect: 2 nodes are allocated for 1 job while the compute node waits for its data
Resource-aware scheduling – up to double compute efficiency
[diagram: EGEE jobs queue at the LSF cluster; compute & data nodes attach to the storage robot and controller; LSF's mbschd drives the numbered sequence; resource: "data", value: "identifier"]
• Resource-aware scheduling:
• 1. A job arrives and is queued with a resource requirement, e.g. "data=#4711"
• 2. LSF orders the requested data set "#4711" from the storage robot
• 3. The tape is mounted
• 4. The LSF resource "data" on the target host is updated to "data=#4711"
• 5. As soon as the resource requirements are satisfied, the job is dispatched to the right host, holding the right data locally (a toy matcher illustrating this follows)
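A toy "job-follows-data" matcher in Python, illustrating the idea rather than LSF's scheduler internals; the hostnames and data-set identifiers are invented examples.

```python
def dispatch(job_requirement: str, hosts: dict):
    """Dispatch a job only to a host whose 'data' resource matches the
    job's requirement. hosts maps hostname -> currently staged data set."""
    for host, staged in hosts.items():
        if staged == job_requirement:
            return host
    return None  # stay queued; meanwhile the scheduler orders the tape mount

hosts = {"node01": "#0815", "node02": "#4711"}
print(dispatch("#4711", hosts))  # node02: the job runs where the data is
```

Because the job waits in the queue instead of idling on a compute node while its tape is staged, no slot is wasted on data transfer time, which is where the "up to double compute efficiency" claim comes from.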