Introduction to the Grid • Roy Williams, Caltech
Enzo Case Study • Simulated dark matter density in the early universe
Enzo Features • N-body gravitational dynamics (particle-mesh method) • Hydrodynamics with PPM and ZEUS finite-difference • Up to 9 species of H and He • Radiative cooling • Uniform UV background (Haardt & Madau) • Star formation and feedback • Metallicity fields
Adaptive Mesh Refinement (AMR) • multilevel grid hierarchy • automatic, adaptive, recursive • no limits on depth or complexity of grids • C++/F77 • Bryan & Norman (1998) • Source: J. Shalf
Distributed Computing Zoo • Grid Computing • Also called High-Performance Computing • Big clusters, Big data, Big pipes, Big centers • Globus backbone, which now includes Services and Gateways • Decentralized control • Cluster Computing • local interconnect between identical CPUs • Peer-to-Peer (Napster, Kazaa) • Systems for sharing data without a central server • Internet Computing • Screensaver cycle scavenging • e.g. SETI@home, Einstein@home, ClimatePrediction.net, etc. • Access Grid • A videoconferencing system • Globus • A popular software package to federate resources into a grid • TeraGrid • A $150M award from NSF to the supercomputer centers (NCSA, SDSC, PSC, etc.)
What is the Grid? • The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations • In contrast, the Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
What is the Grid? • “Grid” was coined by Ian Foster and Carl Kesselman in “The Grid: Blueprint for a New Computing Infrastructure”. • Analogy with the electric power grid: plug in to computing power without worrying where it comes from, like a toaster. • The idea has been around under other names for a while (distributed computing, metacomputing, …). • The technology is now in place to realise the dream on a global scale.
How will it work? • The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers and different parts of the world • The Grid search engine will not only find the data the scientist needs, but also the data processing techniques and the computing power to carry them out • It will distribute the computing task to wherever in the world there is spare capacity, and send the result to the scientist
How will it work? • The Grid middleware: • Finds convenient places for the scientist’s “job” (computing task) to be run • Optimises use of the widely dispersed resources • Organises efficient access to scientific data • Deals with authentication to the different sites • Interfaces to local site authorisation / resource allocation • Runs the jobs • Monitors progress • Recovers from problems • … and … • Tells you when the work is complete and transfers the result back!
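To make the middleware's role concrete, here is a minimal sketch of the kind of job description a user hands to a resource broker such as Condor-G. The host name, executable, and file names are hypothetical, and the grid_resource contact string will differ from site to site.

    # Condor-G submit description (hypothetical host, executable and file names)
    universe                = grid
    grid_resource           = gt2 tg-login.example.org/jobmanager-pbs
    executable              = analyse.sh
    arguments               = field0042.fits
    transfer_input_files    = field0042.fits
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = analyse.out
    error                   = analyse.err
    log                     = analyse.log
    queue

Given such a description, Condor-G authenticates with the user's grid proxy, submits the job to the remote site's local batch system, monitors progress, retries after failures, and transfers the output back when the job completes.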
Benefits for Science • More effective and seamless collaboration of dispersed communities, both scientific and commercial • Ability to run large-scale applications comprising thousands of computers, for a wide range of applications • Transparent access to distributed resources from your desktop, or even your mobile phone • The term “e-Science” has been coined to express these benefits
Five Big Ideas of Grid • Federated sharing • independent management • Trust and Security • access policy, authentication, authorization • Load balancing and efficiency • Condor, queues, prediction, brokering • Distance doesn’t matter • 20 Mbyte/sec, global certificates • Open standards • NVO, FITS, MPI, Globus, SOAP
Grid as Federation • Grid as a federation • independent centers give flexibility • a unified interface gives power and strength • a large-state/small-state compromise, as in a political federation
Grid projects in the world • NASA Information Power Grid • DOE Science Grid • NSF National Virtual Observatory • NSF GriPhyN • DOE Particle Physics Data Grid • NSF TeraGrid • DOE ASCI Grid • DOE Earth Systems Grid • DARPA CoABS Grid • NEESGrid • DOH BIRN • NSF iVDGL • UK e-Science Grid • Netherlands – VLAM, PolderGrid • Germany – UNICORE, Grid proposal • France – Grid funding approved • Italy – INFN Grid • Eire – Grid proposals • Switzerland - Network/Grid proposal • Hungary – DemoGrid, Grid proposal • Norway, Sweden - NorduGrid • DataGrid (CERN, ...) • EuroGrid (Unicore) • DataTag (CERN,…) • Astrophysical Virtual Observatory • GRIP (Globus/Unicore) • GRIA (Industrial applications) • GridLab (Cactus Toolkit) • CrossGrid (Infrastructure Components) • EGSO (Solar Physics)
The TeraGrid Vision: Distributing the resources is better than putting them at one site • Recently awarded $150M by NSF • Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications • New hardware, new networks, new software, new practices, new policies • Expand centers to support cyberinfrastructure • Distributed, coordinated operations center • Exploit unique partner expertise and resources to make the whole greater than the sum of its parts • Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization • Run a single job across the entire TeraGrid • Move executables between sites
TeraGrid Allocations Policies • Any US researcher can request an allocation • Policies/procedures posted at: • http://www.paci.org/Allocations.html • Online proposal submission: • https://pops-submit.paci.org/ • NVO has an account on the TeraGrid • (just ask RW)
Wide Variety of Usage Scenarios • Tightly coupled simulation jobs storing vast amounts of data, performing visualization remotely, and making data available through online collections (ENZO) • Thousands of independent jobs using data from a distributed data collection (NVO) • Science Gateways – "not a Unix prompt"! • from a web browser, with security • SOAP client for scripting • from an application, e.g. IRAF or IDL
Cluster Supercomputer • [Diagram] The user logs in to a login node and submits jobs through a queueing system (Condor, PBS, …) to 100s of compute nodes • Parallel I/O goes to a parallel file system with a metadata node • /scratch is large but purged; /home is smaller and backed up
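As a concrete illustration, a minimal batch script of the sort submitted from the login node might look like the following (PBS syntax; the job name, node counts, walltime, paths, and executable are all illustrative):

    #!/bin/sh
    # request 16 nodes with 2 processors each, for 4 hours of wall-clock time
    #PBS -N enzo_run
    #PBS -l nodes=16:ppn=2
    #PBS -l walltime=04:00:00
    #PBS -o enzo_run.out

    # run from the large, purged /scratch parallel file system, not from /home
    cd /scratch/$USER/run01
    mpirun -np 32 ./enzo amr.param

The script is handed to the queueing system with qsub; the scheduler starts it on the compute nodes when the requested resources become free.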
MPI parallel programming • Each node runs the same program • it first finds its own number (“rank”) • and the number of coordinating nodes (“size”) • Laplace solver example • Algorithm: each value becomes the average of its neighbors’ values • Serial: for each point, compute the average; remember the boundary conditions • Parallel: each node runs the same algorithm on its own piece of the domain, padded with ghost points; messages exchange the ghost-point values between neighboring nodes (see the sketch below)
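A minimal sketch of this ghost-point exchange in C with MPI, assuming a 1-D split of the domain across the nodes; the array size, step count, and boundary values are illustrative:

    /* 1-D Laplace relaxation with ghost-point exchange (illustrative sizes) */
    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define NX     64      /* interior points owned by each node */
    #define NSTEPS 100     /* number of relaxation sweeps */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's number */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of coordinating nodes */

        /* u[0] and u[NX+1] are ghost points holding the neighbors' edge values */
        double u[NX + 2], unew[NX + 2];
        for (int i = 0; i <= NX + 1; i++) u[i] = 0.0;
        if (rank == 0)        u[0]      = 1.0;  /* fixed boundary condition */
        if (rank == size - 1) u[NX + 1] = 0.0;  /* fixed boundary condition */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        for (int step = 0; step < NSTEPS; step++) {
            /* exchange ghost points with the neighboring nodes */
            MPI_Sendrecv(&u[1],      1, MPI_DOUBLE, left,  0,
                         &u[NX + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[NX],     1, MPI_DOUBLE, right, 1,
                         &u[0],      1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* each interior value becomes the average of its two neighbors */
            for (int i = 1; i <= NX; i++)
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
            memcpy(&u[1], &unew[1], NX * sizeof(double));
        }

        if (rank == 0) printf("done after %d steps on %d nodes\n", NSTEPS, size);
        MPI_Finalize();
        return 0;
    }

MPI_PROC_NULL turns the exchanges at the two ends of the global domain into no-ops, so the fixed boundary values are left untouched; compile with mpicc and launch with mpirun.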
Storage Resource Broker (SRB) • A single logical namespace over distributed archival storage resources • Effectively infinite storage • Data replication • Parallel transfers • Interfaces: command-line, API, SOAP, web/portal
Storage Resource Broker (SRB): Virtual Resources, Replication • Similar to the NVO VOStore concept • [Diagram] Clients (browser, SOAP client, command-line, …) authenticate with a certificate and see one logical collection • Behind it sit many physical resources (e.g. CasJobs at JHU, tape at SDSC, “myDisk”) • A file may be replicated across resources • A file comes with metadata, which may be customized
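A hedged sketch of what the SRB command-line interface (the "S-commands") looks like in practice; the collection path and file names are made up, and option details differ between installations:

    # start a session using the credentials configured under ~/.srb
    Sinit
    # store a local file into a logical collection (path is hypothetical)
    Sput cube0042.fits /home/rw.caltech/enzo
    # list the collection and retrieve the file from whichever replica is convenient
    Sls /home/rw.caltech/enzo
    Sget /home/rw.caltech/enzo/cube0042.fits .
    # end the session
    Sexit

The user names only the logical path; SRB decides which physical resource (disk, tape, remote archive) actually serves the data.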
Globus • Security • Single sign-on, certificate handling, CAS, MyProxy • Execution Management • Remote jobs: GRAM and Condor-G • Data Management • GridFTP, reliable file transfer, third-party transfers • Information Services • aggregating information from federated grid resources • Common Runtime Components • the new web-services core shared by the other components
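A brief sketch of how these pieces look from the command line with the Globus Toolkit clients; the host names and paths are hypothetical:

    # single sign-on: create a short-lived proxy from your grid certificate
    grid-proxy-init

    # run a remote job through GRAM on a (hypothetical) TeraGrid login node
    globus-job-run tg-login.example.org/jobmanager-pbs /bin/hostname

    # third-party GridFTP transfer directly between two sites
    globus-url-copy gsiftp://siteA.example.org/scratch/cube0042.fits \
                    gsiftp://siteB.example.org/scratch/cube0042.fits

The proxy created by grid-proxy-init is what lets the job submission and the third-party transfer proceed without typing a password at each site.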
Public Grids for Astronomy • Data Pipelines • split the work into independent pieces and send them to a scheduler • Condor, PBS, Condor-G, DAGMan, Pegasus (see the workflow sketch below) • big data storage • infinite tape, purged disk, scratch disk • no permanent TByte disk • Services • VOStore, SIAP • Science gateways • asynchronous, secure, web, scripted
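For the pipeline case, a minimal sketch of a Condor DAGMan workflow description; the job names and submit files are hypothetical, and each .sub file would be an ordinary Condor or Condor-G submit description like the one shown earlier:

    # pipeline.dag – split the data, process the pieces independently, then merge
    JOB  split    split.sub
    JOB  piece_a  process.sub
    JOB  piece_b  process.sub
    JOB  merge    merge.sub
    PARENT split            CHILD piece_a piece_b
    PARENT piece_a piece_b  CHILD merge

Submitting this with condor_submit_dag runs the jobs in dependency order, and after a failure the workflow can be resumed from the point where it stopped.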
Public Grids for Astronomy • Databases • Not really supported (note: ask audience if this is true) • VO effort for this (CasJobs, VOStore) • Simulation • Forward: 100s of synchronized nodes, MPI • Inverse: independent trials, 1000s of jobs