The Grid: a brief briefing

The Grid: a brief briefing. Carole Goble, Information Management Group. Roadmap: What is the Grid? Example projects. Relationship to the Semantic Web. Example architectures. The international programme.

Presentation Transcript


  1. The Grid: a brief briefing • Carole Goble, Information Management Group

  2. Roadmap • What is the Grid? • Example projects • Relationship to the Semantic Web • Example architectures • The international programme

  3. Take Home • The Grid is an international activity • The Grid has attracted high profile industrial and government support and funding • The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web • The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery.

  4. So what’s the Grid? Isn’t it just High Performance Computing for High Energy Physicists?

  5. Why Grids? • Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. • The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. From Bill Johnston 27 July 01

  6. CERN: Large Hadron Collider (LHC) • CMS Detector • Raw data: 1 Petabyte/sec • Filtered: 100 Mbyte/sec = 1 Petabyte/year = 1 million CD-ROMs
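
A rough check of the arithmetic on this slide, as a minimal Python sketch. The 10^7 seconds of data taking per year and the 650 MB CD-ROM capacity are assumptions chosen for illustration, not figures from the slide, and decimal units (1 Petabyte = 10^15 bytes) are assumed throughout.

```python
# Back-of-envelope check of the LHC data volumes quoted on the slide.
# ASSUMPTIONS (not from the slide): about 1e7 seconds of data taking per
# year, 650 MB per CD-ROM, and decimal units (1 PB = 1e15 bytes).
MB = 10**6
PB = 10**15

RAW_RATE_BYTES_PER_S = 1 * PB           # from the slide
FILTERED_RATE_BYTES_PER_S = 100 * MB    # from the slide
RUN_SECONDS_PER_YEAR = 10**7            # assumption
CD_CAPACITY_BYTES = 650 * MB            # assumption


def yearly_volume_bytes(rate_bytes_per_s, seconds=RUN_SECONDS_PER_YEAR):
    """Bytes accumulated in one year of running at a constant rate."""
    return rate_bytes_per_s * seconds


def cds_needed(volume_bytes, cd_bytes=CD_CAPACITY_BYTES):
    """Number of CD-ROMs needed to hold volume_bytes (rounded up)."""
    return -(-volume_bytes // cd_bytes)  # ceiling division on integers


if __name__ == "__main__":
    yearly = yearly_volume_bytes(FILTERED_RATE_BYTES_PER_S)
    print("filtered volume per year (PB):", yearly / PB)
    print("CD-ROMs per year:", cds_needed(yearly))
    print("raw-to-filtered reduction factor:",
          RAW_RATE_BYTES_PER_S // FILTERED_RATE_BYTES_PER_S)
```

Under these assumptions the filtered stream adds up to 1 Petabyte per year and about 1.5 million CD-ROMs, the same order of magnitude as the slide's round figure.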

  7. Why Grids? • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour; • A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions; • 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data • Civil engineers collaborate to design, execute, & analyze shake table experiments From Steve Tuecke 12 Oct. 01

  8. Why Grids? (contd.) • Climate scientists visualize, annotate, & analyze terabyte simulation datasets • An emergency response team couples real time data, weather model, population data • A multidisciplinary analysis in aerospace couples code and data in four companies • A home user invokes architectural design functions at an application service provider From Steve Tuecke 12 Oct. 01

  9. Why Grids? (contd.) • An application service provider purchases cycles from compute cycle providers • Scientists working for a multinational soap company design a new product • A community group pools members’ PCs to analyze alternative designs for a local road From Steve Tuecke 12 Oct. 01

  10. The Grid Vision • “…flexible, secure, coordinated resource-sharing among dynamic collections of individuals, institutions, and resources–what we refer to as virtual organisations” • “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” Foster, Kesselman and Tuecke, 2001

  11. The Grid Problem • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of… • central location, • central control, • omniscience, • existing trust relationships. From Steve Tuecke 12 Oct. 01

  12. [Diagram] • Multi-disciplinary simulation • Decision support and optimization • Virtual prototyping • Collaborative analysis and visualization • Large scale distributed data management • Large scale distributed computation • High speed communications • Dynamic collaborative virtual organisations • Diagram labels: Visualisation, Data, Computation, Large scale, stretch

  13. [Diagram] • What is it? • Where is it? • How to get it? • When did it happen? • Who knows it? • Why does it? • What are you doing? • Diagram labels: interrogation, results, workflows, Governance & Control, Technology Grid, Collaboration Grid

  14. Online Access to Scientific Instruments • Advanced Photon Source • wide-area dissemination • desktop & VR clients with shared controls • archival storage • real-time collection • tomographic reconstruction • DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago From Steve Tuecke 12 Oct. 01

  15. Supernova Cosmology

  16. Network for Earthquake Engineering Simulation • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other • On-demand access to experiments, data streams, computing, archives, collaboration • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC From Steve Tuecke 12 Oct. 01

  17. Home Computers Evaluate AIDS Drugs • Community = • 1000s of home computer users • Philanthropic computing vendor (Entropia) • Research group (Scripps) • Common goal = advance AIDS research From Steve Tuecke 12 Oct. 01

  18. myGrid • Personalised extensible environments for data-intensive in silico experiments in biology • Straightforward discovery, interoperation, sharing • Workflow oriented • provenance • propagating change • Individual creativity & collaborative working • personalisation

  19. myGrid resources • Question: Nucleotide binding protein in mouse • Answer: P12345 in Swiss-Prot is an ATPase • Terri Attwood is an expert on this • Jackson Labs have a database but you need to register • A paper has just been published in Proteins by the Stanford lab on this.

  20. GeoDISE – engineering design optimisation • Access to knowledge repository • Access to optimisation and search tools • Industrial analysis codes • Distributed computing and data resources in design optimisation • Applied to industrial problems - large scale CFD codes • Demonstrate scalability across distributed computational and data resources and teams of designers

  21. GeoDISE • Modern engineering firms are global and distributed • “Not just a problem of using HPC” • How to … ? • … improve design environments • … cope with legacy code / systems • … produce optimized designs • … integrate large-scale systems in a flexible way • … archive and re-use design history • … capture and re-use knowledge • Diagram labels: CAD and analysis tools, user interfaces, PSEs, and Visualization; Optimisation methods; Management of distributed compute and data resources; Data archives (e.g. design / system usage); Knowledge repositories & knowledge capture and reuse tools

  22. Virtual Sky http://virtualsky.org/

  23. Broader Context • “Grid Computing” has much in common with major industrial thrusts • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing… • Sharing issues not adequately addressed by existing technologies • Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” • High performance: unique demands of advanced & high-performance systems From Steve Tuecke 12 Oct. 01
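
The “complicated requirements” bullet above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the `Request` fields, the two policy functions, and the site, program, and user names are all hypothetical, and nothing here corresponds to a Globus API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Request:
    """'Run program X at site Y ... providing access to data at Z'."""
    user: str
    program: str       # program X
    compute_site: str  # site Y
    data_site: str     # location Z of the data


# A policy is any predicate over a request (hypothetical representation).
Policy = Callable[[Request], bool]


def community_policy(req: Request) -> bool:
    """Hypothetical policy P: only approved programs may run at a site."""
    approved = {("site-y", "program-x")}
    return (req.compute_site, req.program) in approved


def data_policy(req: Request) -> bool:
    """Hypothetical policy Q: only listed users may read data at a site."""
    readers = {"site-z": {"alice"}}
    return req.user in readers.get(req.data_site, set())


def authorize(req: Request, policies: Iterable[Policy]) -> bool:
    """Sharing is conditional: every applicable policy must agree."""
    return all(policy(req) for policy in policies)


if __name__ == "__main__":
    ok = Request("alice", "program-x", "site-y", "site-z")
    denied = Request("bob", "program-x", "site-y", "site-z")
    print(authorize(ok, [community_policy, data_policy]))
    print(authorize(denied, [community_policy, data_policy]))
```

The point of the sketch is that policies P and Q are owned by different parties (the community and the data holder), so a request succeeds only when both independently agree.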

  24. Elements of the Problem • Resource sharing • Computers, storage, sensors, networks, … • Sharing always conditional: issues of trust, policy, negotiation, payment, … • Coordinated problem solving • Beyond client-server: distributed data analysis, computation, collaboration, … • Dynamic, multi-institutional virtual organisations • Community overlays on classic org structures • Large or small, static or dynamic • Problem Solving Environments From Steve Tuecke 12 Oct. 01

  25. Broader Context • “Grid Computing” has much in common with major industrial thrusts • Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing… • Sharing issues not adequately addressed by existing technologies • Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” • High performance: unique demands of advanced & high-performance systems From Steve Tuecke 12 Oct. 01

  26. The Globus Project™ • Close collaboration with real Grid projects in science and industry • Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure • Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing • The Globus Toolkit™: Open source, reference software base for building grid infrastructure and applications • Global Grid Forum: Development of standard protocols and APIs for Grid computing From Steve Tuecke 12 Oct. 01

  27. Doesn’t Globus solve it all? • Globus Toolkit is focused on the Data/Computational layer • No database connectivity • Little brokering, and static not dynamic • Weak metadata management, workflow • Trashes firewalls • No, not everything is JCL, FTP and LDAP • Distributed computation dominates, etc.

  28. Is it done? • NASA Information Power Grid (IPG) is the only one really working • http://www.ipg.nasa.gov • Linking similar supercomputers owned by the same organisation • Computation-focused • High Energy Physics is atypical

  29. Example Application Projects • AstroGrid: astronomy, etc. (UK) • Earth Systems Grid: environment (US DOE) • EU DataGrid: physics, environment, etc. (EU) • EuroGrid: various (EU) • Fusion Collaboratory (US DOE) • GridLab: astrophysics, etc. (EU) • Grid Physics Network (US NSF) • MetaNEOS: numerical optimization (US NSF) • NEESgrid: civil engineering (US NSF) • RealityGrid (UK) • DAME (UK) • Comb-e-Chem (UK) • GeoDISE (UK) • iVDGL, StarLight (US/EU) • DiscoveryNet (UK) • myGrid (UK) • GridPP (UK) • Particle Physics Data Grid (US DOE) • etc…

  30. Condor • “… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …” Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems”, Ph.D. thesis, July 1983.

  31. Condor: Every Community needs a Matchmaker! • Condor uses Matchmakers to build Computing Communities out of Commodity Components • … someone has to bring together community members who have requests for goods and services with members who offer them. • Both sides are looking for each other • Both sides have constraints • Both sides have preferences
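
The matchmaking idea on this slide (both sides have constraints, both sides have preferences) can be sketched in a few lines of Python. This is a toy illustration, not Condor's ClassAd language or API: the `Ad` class, the attribute names, and the example machines are all hypothetical, and preferences are reduced to a single numeric rank per side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Ad:
    """A hypothetical advertisement from one side of the match."""
    name: str
    attrs: Dict[str, object]
    requirements: Callable[[Dict[str, object]], bool]  # constraint on the other side
    rank: Callable[[Dict[str, object]], float]         # preference over the other side


def compatible(a: Ad, b: Ad) -> bool:
    """Both sides have constraints: each must accept the other."""
    return a.requirements(b.attrs) and b.requirements(a.attrs)


def match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Pick the compatible offer the requester prefers most.

    Ties on the requester's rank are broken by the offer's own
    preference for the requester.
    """
    candidates = [o for o in offers if compatible(request, o)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda o: (request.rank(o.attrs), o.rank(request.attrs)))


# Hypothetical resource offers.
machines = [
    Ad("slow", {"memory_mb": 1024, "mips": 100},
       requirements=lambda job: True, rank=lambda job: 0.0),
    Ad("fast", {"memory_mb": 2048, "mips": 500},
       requirements=lambda job: job["owner"] != "guest", rank=lambda job: 0.0),
    Ad("tiny", {"memory_mb": 256, "mips": 900},
       requirements=lambda job: True, rank=lambda job: 0.0),
]


def make_job(owner: str, min_memory_mb: int) -> Ad:
    """Hypothetical request: needs enough memory, prefers faster machines."""
    return Ad("job-" + owner, {"owner": owner},
              requirements=lambda m: m["memory_mb"] >= min_memory_mb,
              rank=lambda m: float(m["mips"]))


if __name__ == "__main__":
    for owner in ("alice", "guest"):
        chosen = match(make_job(owner, 512), machines)
        print(owner, "->", chosen.name if chosen else None)
```

In this example the fastest machine is excluded by the job's memory constraint, and the second fastest excludes guest users through its own constraint, so the two owners end up matched with different machines.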

  32. Let’s look at some Architectures

  33. Desiderata (adapted from Globus) • Software development toolkits e.g. Globus toolkit • Standard protocols, services & APIs • A modular “bag of technologies” • Enable incremental development of grid-enabled tools and applications • Reference implementations • Learn through deployment and applications • Open source • Layer diagram (top to bottom): Applications, Diverse global services, Core services, Local OS

  34. Globus Layered Grid Architecture • Application • Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services • Resource: “Sharing single resources”: negotiating access, controlling use • Connectivity: “Talking to things”: communication (Internet protocols) & security • Fabric: “Controlling things locally”: access to, & control of, resources • Shown alongside the Internet Protocol Architecture: Application, Transport, Internet, Link • CERN - High Energy Physics From Steve Tuecke 12 Oct. 01
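
The layered architecture on this slide can be written down as a small Python data structure. The layer names and quoted descriptions come from the slide; the alignment with the Internet protocol layers is a reading of the side-by-side diagram and should be treated as an assumption, as should the helper function names.

```python
# Grid layers from the slide, listed top to bottom. The Application layer
# carries no description on the slide, hence None.
GRID_LAYERS = [
    ("Application", None),
    ("Collective", "Coordinating multiple resources"),
    ("Resource", "Sharing single resources"),
    ("Connectivity", "Talking to things"),
    ("Fabric", "Controlling things locally"),
]

# ASSUMPTION: rough alignment with the Internet Protocol Architecture,
# read off the side-by-side layout of the diagram.
INTERNET_COUNTERPARTS = {
    "Application": ["Application"],
    "Collective": ["Application"],
    "Resource": ["Application"],
    "Connectivity": ["Transport", "Internet"],
    "Fabric": ["Link"],
}


def layer_names():
    """Names of the Grid layers, top to bottom."""
    return [name for name, _ in GRID_LAYERS]


def layers_below(name):
    """Layers that a given layer builds on, nearest first."""
    names = layer_names()
    return names[names.index(name) + 1:]


if __name__ == "__main__":
    for name, role in GRID_LAYERS:
        print(f"{name:12} {role or '':34} ~ {', '.join(INTERNET_COUNTERPARTS[name])}")
    print(layers_below("Resource"))
```

Listing the layers in order makes the dependency direction explicit: each layer uses only the services of the layers beneath it.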

  35. Keith Jeffery

  36. Three Layer Grid Abstraction • Diagram labels: Interoperability, higher level ontologies, reasoning, discovery; Reasoning services, Discovery services; Fulfillment Grid; Scientific Problems; Knowledge; Knowledge / capability; Processes; Information; Value chain; Semantics / process; Jobs and Data; Data; Data / applications; Raw Resources • "Reproduced by permission of the IT Innovation Centre, University of Southampton." http://www.it-innovation.soton.ac.uk

  37. Architecture of a Grid • Discipline Specific Portals and Scientific Workflow Management Systems • Applications: Simulations, Data Analysis, etc. • Toolkits: Visualization, Data Publication/Subscription, etc. • Grid Common Services: Standardized Services and Resources Interfaces • Collaboration and Remote Instrument Services; Grid Information Service; Uniform Resource Access; Co-Scheduling; Network Cache; Authentication; Authorization; Security Services; Communication Services; Global Queuing; Global Event Services; Data Cataloguing; Uniform Data Access; Fault Management; Monitoring; Brokering; Auditing • Legend entry: “= Globus services” • Distributed Resources: clusters; national supercomputer facilities; tertiary storage; national user facilities; Condor pools; network caches • High-speed Networks and Communications Services

  38. Architecture of a Grid – upper layers • Problem Solving Environments • Knowledge based query • Tools to implement the human interfaces, e.g. SciRun, ECCE, WebFlow, ..... • Mechanisms to express, organize, and manage the workflow of problem solutions (“frameworks”) • Access control • Applications and Supporting Tools: application codes; visualization toolkits; collaboration toolkits; instrument management toolkits; data publish and subscribe toolkits • Application Development and Execution Support: Condor-G; Java/Jini; Globus MPI; CORBA; DCOM; Grid enabled libraries (security, communication services, data access, global event management, etc.) • Grid Common Services • Distributed Resources

  39. “Knowledge Based” Data Grids (National Partnership for Advanced Computational Infrastructure) • Columns: Ingest Services; Management; Access Services • Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse • Information: Attributes, Semantics; Information Repository; Attribute-based Query • Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Grids; Feature-based Query • Other diagram labels: XTM DTD; Rules - KQL (Model-based Access); XML DTD; SDLIP; (Data Handling System - SRB); MCAT/HDF

  40. Astronomy Sky Survey Data Grid • 1. Portals and Workbenches • 2. Knowledge & Resource Management: Bulk Data Analysis; Metadata View; Data View; Catalog Analysis • 3. Standard APIs and Protocols: Concept space • 4. Grid Security; Caching; Replication; Backup; Scheduling; Information Discovery; Metadata delivery; Data Discovery; Data Delivery • 5. Standard Metadata format, Data model, Wire format • 6. Catalog Mediator; Data mediator; Catalog/Image Specific Access; Compute Resources; Catalogs; Data Archives; Derived Collections • 7.

  41. [NSDL architecture diagram] • Portals & Clients • Referenced Items & Collections • NSDL Collections • NSDL Services; Other NSDL Services • Core Services: annotation; metadata normalizing • Core Collection-Building Services: metadata harvesting; persistent storage • CI Services: query transform; topic-map registry; personalization; discussion; visualization... • Diagram labels: User Interfaces; NSDL Usage Enhancement; Delivery; Presentation; Aggregation - Channels; Information about collections; Core NSDL Bus; Meta-data delivery; Data delivery; Query; Global Ids; Security; Network; Metadata & data access-based services; Virtual Collections & Mediators; Collection Building

  42. ERA Concept model

  43. The De Roure Triangle • Grid Computing • ? • e-Science • Agents • Web Services • Semantic Web • e-Business

  44. Roy Williams, Paul Messina, California Institute of Technology

  45. So what is going on? • UK: http://www.escience-grid.org.uk/ • International: http://www.gridforum.org/

  46. E-Science Programme • DG Research Councils • Grid TAG • E-Science Steering Committee • Director: Director’s Management Role; Director’s Awareness and Co-ordination Role • Generic Challenges: EPSRC (£15m), DTI (£15m) • Academic Application Support Programme: Research Councils (£74m), DTI (£5m) • PPARC (£26m); BBSRC (£8m); MRC (£8m); NERC (£7m); ESRC (£3m); EPSRC (£17m); CLRC (£5m) • £80m Collaborative projects • Industrial Collaboration (£40m) From Tony Hey 27 July 01

  47. Key Elements of UK Grid Development Plan • Development of Generic Grid Middleware • Network of Grid Core Programme e-Science Centres • National Centre http://www.nesc.ac.uk/ • Regional Centres http://www.esnw.ac.uk/ • Grid IRC Grand Challenge Project • Support for e-Science Pilots • Short term funding for e-Science demonstrators • Grid Network Team • Grid Engineering Team • Grid Support Centre • Task Forces Adapted from Tony Hey 27 July 01

  48. Take Home • The Grid is an international activity • The Grid has attracted high profile industrial and government support and funding • The Information/Knowledge Grid is in many ways indistinguishable from the Semantic Web • The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and hackery.
