Grid Computing: Concepts, Applications, and Technologies. Dheeraj Bhardwaj Department of Computer Science and Engineering Indian Institute of Technology, Delhi. Outline. The technology landscape Grid computing The Globus Toolkit Applications and technologies
Grid Computing: Concepts, Applications, and Technologies • Dheeraj Bhardwaj • Department of Computer Science and Engineering, Indian Institute of Technology, Delhi
Outline • The technology landscape • Grid computing • The Globus Toolkit • Applications and technologies • Data-intensive; distributed computing; collaborative; remote access to facilities • Grid infrastructure • Open Grid Services Architecture • Global Grid Forum • Summary and conclusions
Living in an Exponential World: (1) Computing & Sensors • Moore’s Law: transistor count doubles every 18 months • Figure labels: magnetohydrodynamics, star formation
Living in an Exponential World: (2) Storage • Storage density doubles every 12 months • Dramatic growth in online data (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes) • 2000 ~0.5 petabyte • 2005 ~10 petabytes • 2010 ~100 petabytes • 2015 ~1000 petabytes? • Transforming entire disciplines in physical and, increasingly, biological sciences; humanities next?
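As a rough consistency check on these estimates, the doubling time implied by two data points follows from T = Δt · ln 2 / ln(V2/V1). The sketch below applies that formula to the slide's own figures; the function name is illustrative, and the inputs are the slide's approximate projections, not measurements.

```python
import math

def doubling_time_years(v_start, v_end, years):
    """Doubling time implied by exponential growth from v_start to v_end."""
    return years * math.log(2) / math.log(v_end / v_start)

# Online data estimates taken from the slide, in petabytes.
estimates = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}

years = sorted(estimates)
implied = {}
for y0, y1 in zip(years, years[1:]):
    implied[(y0, y1)] = doubling_time_years(estimates[y0], estimates[y1], y1 - y0)
    print(f"{y0}-{y1}: doubling every {implied[(y0, y1)] * 12:.1f} months")
```

Under these inputs, the 2000 to 2005 estimate implies a doubling roughly every 14 months, close to the 12-month figure for storage density, while the later projections imply roughly 18 months.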
Data Intensive Physical Sciences • High energy & nuclear physics • Including new experiments at CERN • Gravity wave searches • LIGO, GEO, VIRGO • Time-dependent 3-D systems (simulation, data) • Earth Observation, climate modeling • Geophysics, earthquake modeling • Fluids, aerodynamic design • Pollutant dispersal scenarios • Astronomy: Digital sky surveys
Ongoing Astronomical Mega-Surveys • Large number of new surveys • Multi-TB in size, 100M objects or larger • In databases • Individual archives planned and under way • Multi-wavelength view of the sky • > 13 wavelength coverage within 5 years • Impressive early discoveries • Finding exotic objects by unusual colors • L,T dwarfs, high redshift quasars • Finding objects by time variability • Gravitational micro-lensing • Surveys shown: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
Coming Floods of Astronomy Data • The planned Large Synoptic Survey Telescope will produce over 10 petabytes per year by 2008! • All-sky survey every few days, so will have fine-grain time series for the first time
Data Intensive Biology and Medicine • Medical data • X-Ray, mammography data, etc. (many petabytes) • Digitizing patient records (ditto) • X-ray crystallography • Molecular genomics and related disciplines • Human Genome, other genome databases • Proteomics (protein structure, activities, …) • Protein interactions, drug delivery • Virtual Population Laboratory (proposed) • Simulate likely spread of disease outbreaks • Brain scans (3-D, time dependent)
A Brain is a Lot of Data! (Mark Ellisman, UCSD) • And comparisons must be made among many • We need to get to one micron to know the location of every cell. We’re just now starting to get to 10 microns; Grids will help get us there and further
An Exponential World: (3) Networks (Or, Coefficients Matter …) • Network vs. computer performance • Computer speed doubles every 18 months • Network speed doubles every 9 months • Difference = order of magnitude per 5 years • 1986 to 2000 • Computers: x 500 • Networks: x 340,000 • 2001 to 2010 • Computers: x 60 • Networks: x 4000 • Figure: Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (January 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers.
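The "order of magnitude per 5 years" claim can be checked directly from the two doubling periods. The sketch below assumes idealized exponential growth with exactly the 18-month and 9-month periods quoted on the slide; the function and variable names are illustrative.

```python
def growth_factor(months, doubling_months):
    """Multiplicative growth over `months` for a given doubling period."""
    return 2 ** (months / doubling_months)

FIVE_YEARS = 60  # months

computer = growth_factor(FIVE_YEARS, 18)  # computers double every 18 months
network = growth_factor(FIVE_YEARS, 9)    # networks double every 9 months
gap = network / computer                  # how much faster networks improve

print(f"computers: x{computer:.1f}, networks: x{network:.1f}, gap: x{gap:.1f}")
```

Over five years computers improve by about a factor of 10 and networks by about a factor of 100, so the gap between them widens by about a factor of 10.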
Evolution of the Scientific Process • Pre-electronic • Theorize &/or experiment, alone or in small teams; publish paper • Post-electronic • Construct and mine very large databases of observational or simulation data • Develop computer simulations & analyses • Exchange information quasi-instantaneously within large, distributed, multidisciplinary teams
Evolution of Business • Pre-Internet • Central corporate data processing facility • Business processes not compute-oriented • Post-Internet • Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B) • Outsourcing becomes feasible => service providers of various sorts • Business processes increasingly computing- and data-rich
The Grid “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
A Comparison • SERIAL • Fetch/Store • Compute • PARALLEL • Fetch/Store • Compute/Communicate • Cooperative game • GRID • Fetch/Store • Discovery of resources • Interaction with remote application • Authentication / Authorization • Security • Compute/Communicate • Etc.
Distributed Computing vs. GRID • Grid is an evolution of distributed computing • Dynamic • Geographically independent • Built around standards • Internet backbone • Distributed computing is an “older term” • Typically built around proprietary software and networks • Tightly coupled systems/organizations
Web vs. GRID • Web: uniform naming access to documents • Grid: uniform, high-performance access to computational resources • Figure labels: software catalogs, sensor nets, colleges/R&D labs
Is the World Wide Web a Grid? • Seamless naming? Yes • Uniform security and authentication? No • Information service? Yes or No • Co-scheduling? No • Accounting & authorization? No • User services? No • Event services? No • Is the browser a global shell? No
What does the World Wide Web bring to the Grid? • Uniform naming • A seamless, scalable information service • A powerful new meta-data language: XML • XML will be the standard language for describing information in the Grid • SOAP: Simple Object Access Protocol • Uses XML for encoding, HTTP for transport • SOAP may become a standard RPC mechanism for Grid services • Portal ideas
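To make the SOAP bullet concrete, here is a minimal sketch of an RPC-style request encoded as a SOAP 1.1 envelope using Python's standard library. The operation name `submitJob`, its parameters, and the service namespace are hypothetical and chosen only for illustration; they are not taken from any Grid toolkit. Only the envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:grid-job"  # hypothetical service namespace

def build_request(operation, params):
    """Wrap an RPC-style call in a SOAP envelope and return it as XML text."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical call: run an executable on four processors.
request_xml = build_request("submitJob", {"executable": "/bin/hostname", "count": 4})
print(request_xml)
```

In a typical deployment this XML text would travel as the body of an HTTP POST, which is the split the slide describes: XML for encoding, HTTP as the carrier.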
The Ultimate Goal • In the future, I will not know or care where my application is executed, because I will acquire and pay for these resources as I need them
Why Grids? • Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. • The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
An Example Virtual Organization: CERN’s Large Hadron Collider • 1800 physicists, 150 institutes, 32 countries • 100 PB of data by 2010; 50,000 CPUs?
Grid Communities & Applications: Data Grids for High Energy Physics
• Figure (tiered data flow, top to bottom):
• Online System: input at ~PBytes/sec
• There is a “bunch crossing” every 25 nsecs; there are 100 “triggers” per second; each triggered event is ~1 MByte in size
• ~100 MBytes/sec: Offline Processor Farm (~20 TIPS)
• ~100 MBytes/sec: Tier 0, CERN Computer Centre
• ~622 Mbits/sec, or air freight (deprecated): Tier 1, FermiLab (~4 TIPS) and the France, Germany, and Italy Regional Centres
• ~622 Mbits/sec: Tier 2, Caltech (~1 TIPS) and other Tier2 Centres (~1 TIPS each)
• ~622 Mbits/sec: Institutes (~0.25 TIPS each)
• Physics data cache, ~1 MBytes/sec: Tier 4, physicist workstations (Pentium II 300 MHz)
• The figure also shows HPSS storage systems alongside the centres
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
• Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
• www.griphyn.org • www.ppdg.net • www.eu-datagrid.org
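The trigger figures quoted above determine the recorded data rate, and a short calculation shows how that rate compares with the ~622 Mbits/sec links. The sketch below uses only numbers from the slide; the annual volume additionally assumes continuous year-round running and decimal units (1 PB = 10^9 MB), which is an upper-bound assumption rather than a statement about actual operations.

```python
BUNCH_SPACING_S = 25e-9  # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100     # events kept per second
EVENT_SIZE_MB = 1.0      # ~1 MByte per triggered event
LINK_MBIT_S = 622        # wide-area link speed from the slide

crossing_rate_hz = 1 / BUNCH_SPACING_S                    # crossings per second
crossings_per_trigger = crossing_rate_hz / TRIGGERS_PER_S
rate_mb_s = TRIGGERS_PER_S * EVENT_SIZE_MB                # recorded data rate
rate_mbit_s = rate_mb_s * 8                               # 8 bits per byte
link_fraction = LINK_MBIT_S / rate_mbit_s                 # share one link can carry

# Assumption: the experiment records continuously for a full year.
SECONDS_PER_YEAR = 365 * 24 * 3600
pb_per_year = rate_mb_s * SECONDS_PER_YEAR / 1e9

print(f"crossing rate: {crossing_rate_hz / 1e6:.0f} MHz")
print(f"one trigger per {crossings_per_trigger:,.0f} crossings")
print(f"recorded rate: {rate_mb_s:.0f} MB/s = {rate_mbit_s:.0f} Mbit/s")
print(f"a 622 Mbit/s link carries {link_fraction:.0%} of that rate")
print(f"upper bound on yearly volume: {pb_per_year:.2f} PB")
```

The recorded rate matches the ~100 MBytes/sec shown in the figure, and at 800 Mbit/s it exceeds a single 622 Mbit/s link, which is consistent with distributing subsets of the data across the lower tiers rather than mirroring everything.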
The Grid: A Brief History • Early 90s • Gigabit testbeds, metacomputing • Mid to late 90s • Early experiments (e.g., I-WAY), academic software projects (e.g., Globus, Legion), application experiments • 2002 • Dozens of application communities & projects • Major infrastructure deployments • Significant technology base (esp. Globus Toolkit™) • Growing industrial interest • Global Grid Forum: ~500 people, 20+ countries
The Grid World: Current Status • Dozens of major Grid projects in scientific & technical computing/research & education • www.mcs.anl.gov/~foster/grid-projects • Considerable consensus on key concepts and technologies • Open source Globus Toolkit™ a de facto standard for major protocols & services • Industrial interest emerging rapidly • IBM, Platform, Microsoft, Sun, Compaq, … • Opportunity: convergence of eScience and eBusiness requirements & technologies
Grid Technologies: Resource Sharing Mechanisms That … • Address security and policy concerns of resource owners and users • Are flexible enough to deal with many resource types and sharing modalities • Scale to large numbers of resources, many participants, many program components • Operate efficiently when dealing with large amounts of data & computation
Aspects of the Problem • Need for interoperability when different groups want to share resources • Diverse components, policies, mechanisms • E.g., standard notions of identity, means of communication, resource descriptions • Need for shared infrastructure services to avoid repeated development, installation • E.g., one port/service/protocol for remote access to computing, not one per tool/application • E.g., Certificate Authorities: expensive to run • A common need for protocols & services
The Hourglass Model • Focus on architecture issues • Propose set of core services as basic infrastructure • Use to construct high-level, domain-specific solutions • Design principles • Keep participation cost low • Enable local control • Support for adaptation • “IP hourglass” model • Figure (hourglass, top to bottom): applications; diverse global services; core services; local OS
Layered Grid Architecture (By Analogy to Internet Architecture)
• Grid layers, top to bottom:
• Application
• Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources
• Internet Protocol Architecture, top to bottom: Application, Transport, Internet, Link
Globus Toolkit™ • A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications • Offer a modular set of orthogonal services • Enable incremental development of grid-enabled tools and applications • Implement standard Grid protocols and APIs • Available under liberal open source license • Large community of developers & users • Commercial support
General Approach • Define Grid protocols & APIs • Protocol-mediated access to remote resources • Integrate and extend existing standards • “On the Grid” = speak “Intergrid” protocols • Develop a reference implementation • Open source Globus Toolkit • Client and server SDKs, services, tools, etc. • Grid-enable wide variety of tools • Globus Toolkit, FTP, SSH, Condor, SRB, MPI, … • Learn through deployment and applications
Key Protocols • The Globus Toolkit™ centers around four key protocols • Connectivity layer: • Security: Grid Security Infrastructure (GSI) • Resource layer: • Resource Management: Grid Resource Allocation Management (GRAM) • Information Services: Grid Resource Information Protocol (GRIP) and Index Information Protocol (GIIP) • Data Transfer: Grid File Transfer Protocol (GridFTP) • Also key collective layer protocols • Info Services, Replica Management, etc.
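The protocol list above can be read as a small lookup table from layer and function to protocol. The sketch below encodes exactly the names given on the slide; the class and function names are illustrative, not part of the Globus Toolkit, and the full names are copied as the slide writes them.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class GridProtocol:
    acronym: str
    full_name: str
    layer: str
    function: str

# Entries transcribed from the slide; nothing here is queried from a toolkit.
PROTOCOLS = [
    GridProtocol("GSI", "Grid Security Infrastructure", "Connectivity", "Security"),
    GridProtocol("GRAM", "Grid Resource Allocation Management", "Resource", "Resource management"),
    GridProtocol("GRIP", "Grid Resource Information Protocol", "Resource", "Information services"),
    GridProtocol("GIIP", "Index Information Protocol", "Resource", "Information services"),
    GridProtocol("GridFTP", "Grid File Transfer Protocol", "Resource", "Data transfer"),
]

def by_function(protocols):
    """Group protocol acronyms by (layer, function)."""
    groups = defaultdict(list)
    for p in protocols:
        groups[(p.layer, p.function)].append(p.acronym)
    return dict(groups)

groups = by_function(PROTOCOLS)
for (layer, function), acronyms in groups.items():
    print(f"{layer:<12} {function:<20} {', '.join(acronyms)}")
```

Grouping by function yields four entries, matching the slide's count of four key protocol areas, with the two information protocols sharing one slot.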