
The GRID Era, Vanguard, Miami, 23 September 2002


Presentation Transcript


  1. The GRID Era. Vanguard, Miami, 23 September 2002. Gordon Bell, gbell@microsoft.com, Bay Area Research Center, Microsoft Corporation

  2. Grid Technology… • Background • Taxonomy • Grids: from seti@home to arbitrary cluster platform • Grid-type examples and web services • Summary…

  3. Bright spots in the evolution… from prototypes to early suppliers • Early efforts • UC/Berkeley NOW; U of WI Condor; NASA: Beowulf > Airframes • Argonne (Foster et al.): Grid & Globus Toolkit, Grid Forum • Entropia startup (Andrew Chien) • Andrew Grimshaw - Avaki • Making the Legion vision real. A reality check. • United Devices MetaProcessor Platform • UK e-Science research program. Apps-based funding. Web-services-based Grid & data orientation. • Nimrod at Monash University • Parameter scans… other low-hanging fruit • Encapsulate apps! "Excel"-style language/control management. • "Legacy apps. No time or resources to modify code… independent of age, author, or language, e.g. Java." • Grid Services: Gray et al., SkyServer and TerraService • Goal: providing a web service must be as easy as publishing and using a web page… and it will occur!!!

  4. Grid Taxonomy c. 2002 • Taxonomy… interesting vs. necessary • Cycle scavenging and object evaluation (e.g. seti@home, QCD) • File distribution/sharing for IP theft, e.g. Napster • Databases & programs for a community (astronomy, bioinformatics, CERN, … NCAR) • Workbenches: web workflow for chemistry, biology, … • Exchanges… many sites operating together • Single, large objectified pipeline, e.g. NASA • Grid as a cluster platform! Transparent & arbitrary access, including load balancing • Homogeneous/heterogeneous computers • Fixed or variable network loading • Intranet, extranet, internet (many organizations) [Slide annotation: "Web SVCs"]

  5. Grids: Ready for prime time. • Economics… thief, scavenger, power, efficiency, or resource, e.g. program and database sharing? • Embarrassingly parallel apps, e.g. parameter scans… the "killer apps" • Coupling large, separated apps • Entry points for "web services" • Research funding… that's where the money is.

  6. Grid Computing: Concepts, Applications, and Technologies. Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, and Department of Computer Science, The University of Chicago. www.mcs.anl.gov/~foster/talks.htm. Grid Computing in Canada Workshop, University of Alberta, May 1, 2002

  7. Globus Toolkit™ • A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications • Offer a modular set of orthogonal services • Enable incremental development of grid-enabled tools and applications • Implement standard Grid protocols and APIs • Available under liberal open source license • Large community of developers & users • Commercial support

  8. Globus Toolkit: Core Services (work in progress since 1996) • Small, standards-based set of protocols, embedded in an open-source toolkit, enabling web services and applications • Scheduling (Globus Resource Allocation Manager) • Low-level scheduler API • Information (Directory Service) … UDDI • Uniform access to structure/state information • Communications (Nexus) • Multimethod communication + QoS management • Security (Globus Security Infrastructure) • Single sign-on, key management • Health and status (Heartbeat Monitor) • Remote file access (Global Access to Secondary Storage)

  9. Living in an Exponential World (1): Computing & Sensors • Moore's Law: transistor count doubles every 18 months [Image: magnetohydrodynamics simulation of star formation]

  10. The 13.6 TF TeraGrid: Computing at 40 Gb/s • Sites: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, Argonne [Diagram: site resources with HPSS and UniTree archives, connected by external networks] • www.teragrid.org

  11. Access Grid • High-end group work and collaboration technology • Grid services being used for discovery, configuration, authentication • O(50) systems deployed worldwide • Basis for the SC Global event at SC'2001 in November 2001 • www.scglobal.org • www.accessgrid.org [Photo labels: presenter mic, presenter camera, ambient mic (tabletop), audience camera]

  12. Grids at NASA: Aviation Safety [Diagram of coupled simulation models] • Wing models: lift capabilities, drag capabilities, responsiveness • Stabilizer and airframe models: deflection capabilities, responsiveness • Crew capabilities: accuracy, perception, stamina, reaction times, SOPs • Engine models: thrust performance, reverse-thrust performance, responsiveness, fuel consumption • Landing gear models: braking performance, steering capabilities, traction, dampening capabilities

  13. A Large Virtual Organization: CERN’s Large Hadron Collider 1800 Physicists, 150 Institutes, 32 Countries 100 PB of data by 2010; 50,000 CPUs?

  14. Life Sciences: Telemicroscopy [Pipeline diagram: imaging instruments → data acquisition → processing/analysis → advanced visualization, linked by the network to computational resources and large databases]

  15. Nimrod/G and GriddLeS: Grid Programming with Ease David Abramson Monash University DSTC

  16. Building on Legacy Software • Nimrod • Supports parametric computation without programming • High-performance distributed computing • Clusters (1994-1997) • The Grid (1997- ), which added QoS through a computational economy • Nimrod/O - optimisation on the Grid • Active Sheets - spreadsheet interface • GriddLeS • General Grid applications using legacy software • Whole applications as components • No new primitives in the application

  17. Parametric Execution • Study the behaviour of output variables across a range of input scenarios • Allows real-time analysis for many applications • More realistic simulations • More rigorous science • More robust engineering

  18. Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray) • You can FTP 1 MB in 1 sec and 1 GB in a minute; you can GREP 1 GB in a minute • You can GREP 1 TB in 2 days; FTPing it takes 2 days and $1K • You can GREP 1 PB in 3 years; FTPing it takes 3 years and $1M • 1 PB is ~10,000 disks (>> 1,000 disks): at some point you need indices to limit search, plus parallel data search and analysis • Goal, using databases: make it easy to • Publish: record structured data • Find data anywhere in the network • Get the subset you need! • Explore datasets interactively • The database becomes the file system!!!
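
A quick back-of-the-envelope check of the slide's scan times. Only the ~1 GB/minute GREP rate comes from the slide; the rest is arithmetic, and the idealized results land on the same order of magnitude as Gray's figures:

```python
# Scan-time arithmetic, assuming a sustained sequential rate of ~1 GB/minute
# (the slide's GREP rate); real seek, sharing, and restart overheads only
# push these numbers toward the slide's "2 days" and "3 years".
SCAN_GB_PER_MIN = 1.0

for label, gigabytes in [("1 GB", 1e0), ("1 TB", 1e3), ("1 PB", 1e6)]:
    minutes = gigabytes / SCAN_GB_PER_MIN
    print(f"GREP {label}: {minutes:>9,.0f} min = {minutes / (60 * 24):8.2f} days"
          f" = {minutes / (60 * 24 * 365):6.2f} years")
# 1 TB scans in under a day at the idealized rate; 1 PB takes ~1.9 years.
```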

  19. SkyServer: delivering a web service to the astronomy community. A prototype for other sciences? Gray, Szalay, et al. • First paper on the SkyServer: http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc • Later, more detailed paper for the database community: http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc

  20. What can be learned from SkyServer? • It's about data, not about harvesting flops • 1-2 hr. query programs versus 1-wk. programs based on grep • 10-minute runs versus 3-day computes & searches • Database viewpoint: 100x speed-ups • Avoid costly re-computation and searches • Use indices and PARALLEL I/O; Read/Write >> 1 • Parallelism is automatic, transparent, and depends only on the number of computers/disks • Limited experience and talent for using databases
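
The "use indices" point is easy to demonstrate with any relational engine. A toy sketch, using SQLite purely for self-containedness (SkyServer itself ran on SQL Server, and the table and column names below are invented):

```python
# Toy demonstration that an index replaces a full table scan with a lookup.
# Schema and data are invented; SkyServer's real schema is far richer.
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)")
con.executemany(
    "INSERT INTO obj (ra, dec, mag) VALUES (?, ?, ?)",
    [(random.uniform(0, 360), random.uniform(-90, 90), random.uniform(10, 25))
     for _ in range(100_000)],
)
con.execute("CREATE INDEX idx_mag ON obj (mag)")

# The query plan now reports a search of idx_mag, not a full table scan.
for row in con.execute("EXPLAIN QUERY PLAN SELECT COUNT(*) FROM obj WHERE mag < 12"):
    print(row)
```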

  21. Sloan Digital Sky Survey analysis: what is the size distribution of galaxy clusters? Chimera Virtual Data System + iVDGL Data Grid (many CPUs) [Figure: galaxy cluster size distribution]

  22. Network concerns • Very high cost • $(1 + 1)/GByte to send on the net; FedEx and 160-GByte disk shipments are cheaper • DSL at home is $0.15-$0.30 • Disks cost less than $2/GByte to purchase • Low availability of fast links (the last-mile problem) • Labs & universities have DS3 links at most, and they are very expensive • Traffic: instant messaging, music stealing • Performance at the desktop is poor • 1-10 Mbps; very poor communication links • Manage: trade in fast links for cheap links!!
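
The cost claim is simple arithmetic; a sketch using only the numbers on the slide (the 160 GB figure is the shipped-disk example above):

```python
# Network transfer vs. shipping a disk, using the slide's own numbers.
net_dollars_per_gb = 1 + 1   # $1 sender + $1 receiver per GB on the net
disk_dollars_per_gb = 2      # purchase price of disk, per the slide
shipment_gb = 160            # the FedEx'd disk in the example

print(f"Send {shipment_gb} GB over the net: ${net_dollars_per_gb * shipment_gb}")
print(f"Buy the {shipment_gb} GB disk:      ${disk_dollars_per_gb * shipment_gb}")
# Transferring the data costs as much as the disk that holds it, before
# counting the slow, expensive last mile; hence "FedEx is cheaper".
```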

  23. For More Information • Grid concepts, projects: www.gridtoday.com • Ian Foster: www.mcs.anl.gov/~foster • The Globus Project™: www.globus.org • Open Grid Services Arch.: www.globus.org/ogsa • Global Grid Forum: www.gridforum.org • GriPhyN project: www.griphyn.org • Avaki, Entropia, UK e-Science, Condor, … • Grid books in press… [Book cover: published July 1998]

  24. The End. Are GRIDs already a real, useful computing structure? When will Grids be ubiquitous?

  25. Toward a Framework for Preparing and Executing Adaptive Grid Programs An Overview of the GrADS Project Sponsored by NSF NGS Ken Kennedy Center for High Performance Software Rice University http://www.cs.rice.edu/~ken/Presentations/GrADSOverview.pdf

  26. GrADS Vision • Build a national problem-solving system on the Grid • Transparent to the user, who sees a problem-solving system • Software support for application development on Grids • Goal: design and build programming systems for the Grid that broaden the community of users who can develop and run applications in this complex environment • Challenges: • Presenting a high-level application development interface • If programming is hard, the Grid will not reach its potential • Designing and constructing applications for adaptability • Late mapping of applications to Grid resources • Monitoring and control of performance • When should the application be interrupted and remapped?

  27. Today: Globus • Developed by Ian Foster and Carl Kesselman • Grew from the I-Way (SC-95) • Basic Services for distributed computing • Resource discovery and information services • User authentication and access control • Job initiation • Communication services (Nexus and MPI) • Applications are programmed by hand • Many applications • User responsible for resource mapping and all communication • Existing users acknowledge how hard this is

  28. GrADSoft Architecture: Program Preparation System [Architecture diagram: a source application passes through a whole-program compiler to become a configurable object program, which the binder links with libraries; a resource negotiator and scheduler negotiate with the Grid runtime system; a real-time performance monitor feeds performance feedback and performance-problem reports back to the software components]

  29. Configurable Object Program • Goal: Provide minimum needed to automate resource selection and program launch • Code • Today: MPI program • Tomorrow: more general representations • Mapper • Defines required resources and affinities to specialized resources • Given a set of resources, maps computation to those resources • “Optimal” performance, given all requirements met • Performance Model • Given a set of resources and mapping, estimates performance • Serves as objective function for Resource Negotiator/Scheduler
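
A minimal sketch of the mapper/performance-model split this slide describes. This is not GrADS code: the resource attributes, the proportional mapping policy, and the runtime estimate are all invented for illustration.

```python
# Sketch: a mapper places work on resources; a performance model scores the
# mapping, serving as the objective function for a resource negotiator.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpus: int
    flops: float  # relative per-CPU speed

def mapper(work_units: int, resources: list[Resource]) -> dict[str, int]:
    """Map computation onto resources in proportion to aggregate speed."""
    total = sum(r.cpus * r.flops for r in resources)
    return {r.name: round(work_units * r.cpus * r.flops / total) for r in resources}

def performance_model(mapping: dict[str, int], resources: list[Resource]) -> float:
    """Estimate runtime as the most-loaded resource's finish time."""
    speed = {r.name: r.cpus * r.flops for r in resources}
    return max(units / speed[name] for name, units in mapping.items())

resources = [Resource("sp2", 64, 1.0), Resource("origin", 32, 1.5)]
mapping = mapper(1000, resources)                      # {'sp2': 571, 'origin': 429}
print(mapping, performance_model(mapping, resources))  # negotiator minimizes this
```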

  30. GrADSoft Architecture: Execution Environment [The same architecture diagram as slide 28, here highlighting the execution half: resource negotiator, scheduler, Grid runtime system, binder, libraries, and the real-time performance monitor that provides performance feedback]

  31. Grid n. An arbitrary, distributed cluster platform: a geographical, multi-organizational collection of diverse computers, dynamically configured as cluster platforms, responding to arbitrary, ill-defined jobs "thrown" at it. • Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer the data • Latency and bandwidth are non-deterministic: worse than a cluster, with unknown, dynamic parameters • Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources • Large datasets & I/O-bound programs need to be with their data, or become database accesses… • But are the resources there to share? • Costs may vary, depending on the organization

  32. Cactus (Allen, Dramlitsch, Seidel, Shalf, Radke) • Modular, portable framework for parallel, multidimensional simulations • Construct codes by linking • a small core ("flesh"): management services • selected modules ("thorns"): numerical methods, grids & domain decompositions, visualization and steering, etc. • Custom linking/configuration tools • Developed for astrophysics, but not astrophysics-specific • www.cactuscode.org [Diagram: thorns surrounding the Cactus "flesh"]

  33. Cactus Example: Terascale Computing • Solved Einstein's equations for gravitational waves (real code) • Tightly coupled; communication required through derivatives • Must communicate 30 MB/step between machines; a time step takes 1.6 sec • Used 10 ghost zones along the direction between machines: communicate every 10 steps • Compression/decompression on all data passed in this direction • Achieved 70-80% scaling, ~200 GF (only 14% scaling without these tricks) [Diagram: SDSC IBM SP, 1024 procs (5x12x17 = 1020), and the NCSA Origin array, 256+128+128 procs (5x12x(4+2+2) = 480), linked internally by Gig-E at 100 MB/sec but between sites by an OC-12 line delivering only 2.5 MB/sec]
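
The ghost-zone trick generalizes: with G layers of ghost cells, a nearest-neighbor stencil code can take G steps between exchanges, because each step invalidates only the outermost remaining layer. A schematic 1-D sketch (not Cactus code; the diffusion stencil and sizes are invented):

```python
# Deep ghost zones: exchange every G steps instead of every step.
import numpy as np

G = 10                         # ghost-zone depth (the slide used 10)
N = 1000                       # interior cells owned by this machine
u = np.random.rand(N + 2 * G)  # interior plus G ghost cells per side

def exchange_ghosts(u):
    # Placeholder for the expensive inter-machine transfer; per the slide,
    # Cactus also compressed the data crossing the slow OC-12 link.
    pass

for step in range(100):
    k = step % G
    if k == 0:
        exchange_ghosts(u)     # one exchange buys G valid local steps
    # k layers have gone stale since the last exchange, so shrink the
    # updated window by one cell per side each step.
    lo, hi = k + 1, len(u) - (k + 1)
    u[lo:hi] += 0.25 * (u[lo + 1:hi + 1] - 2.0 * u[lo:hi] + u[lo - 1:hi - 1])
```

After G local steps the valid region has shrunk exactly to the interior this machine owns, so the next exchange restores the ghosts just in time.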

  34. Grid Projects in eScience

  35. Nimrod/G and GriddLeS: Grid Programming with Ease David Abramson Monash University DSTC

  36. Distributed computing comes to the rescue…. For each scenario: • generate the input files • copy them to a remote node • run the SMOG model • post-process the output files • copy the results back to the root (a sketch of this manual loop follows)
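
A sketch of what doing this "by hand" means, in the spirit of the slide. The host names, file formats, and the smog binary are illustrative assumptions; this is not Nimrod code:

```python
# The manual per-scenario workflow the slide describes, via ssh/scp.
import subprocess

scenarios = [{"wind": w, "temp": t} for w in (5, 10, 15) for t in (280, 300)]
nodes = ["node1.example.org", "node2.example.org"]

for i, params in enumerate(scenarios):
    node = nodes[i % len(nodes)]
    infile, outfile = f"scenario{i}.in", f"result{i}.out"
    # 1. Generate the input file for this scenario.
    with open(infile, "w") as f:
        f.write("".join(f"{k} = {v}\n" for k, v in params.items()))
    # 2. Copy it to the remote node.
    subprocess.run(["scp", infile, f"{node}:run.in"], check=True)
    # 3. Run the SMOG model there.
    subprocess.run(["ssh", node, "./smog run.in run.out"], check=True)
    # 4. Copy the results back to the root machine for post-processing.
    subprocess.run(["scp", f"{node}:run.out", outfile], check=True)
```

Every failure mode (a down node, a lost file, a stalled job) lands on the scientist, which is exactly the next slide's complaint.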

  37. It's just too hard! • Doing it by hand: a nightmare!! • Programming with (say) MPI: • overkill • no fault tolerance • codes no longer work as stand-alone code • Scientists don't want to know about the underlying technologies

  38. Building on Legacy Software • Nimrod • Supports parametric computation without programming • High-performance distributed computing • Clusters (1994-1997) • The Grid (1997- ), which added QoS through a computational economy • Nimrod/O - optimisation on the Grid • Active Sheets - spreadsheet interface • GriddLeS • General Grid applications using legacy software • Whole applications as components • No new primitives in the application

  39. Parametric Execution • Study the behaviour of output variables across a range of input scenarios • Allows real-time analysis for many applications • More realistic simulations • More rigorous science • More robust engineering

  40. In Nimrod, an application doesn't know it has been Grid-enabled [Diagram: the root machine performs substitution on input files, farms jobs out to the computational nodes, and collects their output files]

  41. How does a user develop an application using Nimrod? A description of the parameters (a PLAN FILE) is expanded into many independent jobs [Diagram: plan file → Job 1 … Job 18]. A sketch of such a plan file follows.
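
For flavor, a hedged sketch in the style of a Nimrod plan file: the parameter/task structure follows Nimrod's documented plan-file design, but the exact keywords, file names, and the smog model are assumptions, not a verified example.

```
parameter temp float range from 280 to 300 step 10;
parameter wind integer range from 5 to 15 step 5;

task main
    copy smog.skel node:.
    node:substitute smog.skel smog.in
    node:execute ./smog smog.in smog.out
    copy node:smog.out results/smog.$jobname
endtask
```

Each point in the cross-product of temp and wind becomes one job; the substitute step rewrites the skeleton input with that job's parameter values, so the legacy code runs unmodified.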

  42. GriddLeS • A significant body of useful applications is not Grid-enabled • Lessons from Nimrod: • users will avoid rewriting applications if possible • applications need to function both in the Grid and standalone • users are not experts in parallel/distributed computing • General Grid computations have much more general interconnections than are possible with Nimrod • Legacy applications are components!

  43. GriddLeS … • Specifies the interconnections between components • Provides interfaces for discovering resources and mapping the computations to them • Locates data files in the grid and connects the applications to them • Schedules computations on the underlying platforms, making sure the network bandwidth is available • Monitors the progress of the grid computation and reassigns work to other parts of the Grid as necessary

  44. Today: Condor • Support for matching application requirements to resources • The user and the resource provider write ClassAd specifications • The system matches application ClassAds with resource ClassAds • It selects the "best" match based on a user-specified priority • Can extend to the Grid via Globus (Condor-G) • What is missing? • The user must handle application mapping tasks • No dynamic resource selection • No checkpoint/migration (resource re-selection) • Performance matching is straightforward: priorities are coded into the ClassAds
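
For concreteness, a hedged sketch of the user's side: a Condor submit description whose requirements/rank expressions are ClassAd-style. The attribute names (Arch, OpSys, Memory, Mips) follow common Condor usage, but treat the details as illustrative rather than a verified example.

```
# Sketch of a Condor submit description (details illustrative).
executable   = smog
arguments    = run.in run.out
requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Memory >= 512)
rank         = Mips
queue 1
```

The matchmaker pairs this ad against machine ads and picks the highest-ranked machine that satisfies the requirements, which is exactly the static, user-specified matching the slide says falls short of dynamic resource selection.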
