Or, "Science 2.0":
Cyberinfrastructure and the Role of Grid Computing

Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
"Web 2.0"
• Software as services
  • Data- & computation-rich network services
• Services as platforms
  • Easy composition of services to create new capabilities ("mashups"), which themselves may be made accessible as new services
• Enabled by massive infrastructure buildout
  • Google projected to spend $1.5B on computers, networks, and real estate in 2006
  • Dozens of others are spending substantially
  • Paid for by advertising
Declan Butler, Nature
Science 2.0: E.g., Virtual Observatories
[Figure: users reach discovery tools and analysis tools through a gateway fronting distributed data archives. Figure: S. G. Djorgovski]
Science 2.0: E.g., Cancer Bioinformatics Grid
[Figure: a BPEL engine executes a <BPEL Workflow Doc> against <Workflow Inputs>, linking a data service @ uchicago.edu with analytic services @ duke.edu and @ osu.edu, and returning <Workflow Results>.]
caBIG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
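To make the orchestration concrete, here is a minimal Python sketch of what the BPEL engine does: take workflow inputs, call the data service and the two analytic services in order, and return the results. The endpoint URLs and the JSON request convention are hypothetical placeholders; the actual caBIG services are SOAP/WSRF services driven by a BPEL workflow document.

```python
# Minimal stand-in for the BPEL engine's role: invoke a chain of
# network services. All endpoints and payload formats are hypothetical.
import json
from urllib.request import Request, urlopen

def invoke(endpoint: str, payload: dict) -> dict:
    """POST one request to a service in the workflow and return its reply."""
    req = Request(endpoint, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def run_workflow(workflow_inputs: dict) -> dict:
    # Step 1: pull input data from the data service.
    data = invoke("https://data.uchicago.example/query", workflow_inputs)
    # Step 2: run the two analytic services on that data, in sequence.
    result = invoke("https://analytic.duke.example/analyze", data)
    result = invoke("https://analytic.osu.example/analyze", result)
    # Step 3: return the workflow results to the caller.
    return result
```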
The Two Dimensions of Science 2.0
• Function: decompose across the network; clients integrate dynamically
  • Select & compose services
  • Select "best of breed" providers
  • Publish results as new services
• Resource: decouple resource & service providers
[Figure: users, discovery tools, analysis tools, data archives. Fig: S. G. Djorgovski]
Technology Requirements: Integration & Decomposition
• Service-oriented applications
  • Wrap applications & data as services
  • Compose services into workflows
• Service-oriented Grid infrastructure
  • Provision physical resources to support application workloads
[Figure: users compose workflows; invocation drives application services, which provisioning maps onto physical resources.]
"The Many Faces of IT as Service", ACM Queue, Foster & Tuecke, 2005
Globus Software Enables Grid Infrastructure
• Web service interfaces for behaviors relating to integration and decomposition
  • Primitives: resources, state, security
  • Services: execution, data movement, …
• Open source software that implements those interfaces
  • In particular, the Globus Toolkit (GT4)
  • All standard Web services
• "Grid is a use case for Web services, focused on resource management"
Open Source Grid Software: Globus Toolkit v4 (www.globus.org)
[Component map, by area:]
• Security: Authentication & Authorization, Credential Mgmt, Delegation, Community Authorization
• Data Mgmt: GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration
• Execution Mgmt: Grid Resource Allocation & Management, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol
• Info Services: Index, Trigger, WebMDS
• Common Runtime: Java Runtime, C Runtime, Python Runtime
Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005
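As a concrete example of the data-movement layer, the sketch below drives GT4's globus-url-copy client from Python to perform a third-party transfer between two GridFTP servers. The hostnames and paths are hypothetical; it assumes a valid proxy credential and the GT4 command-line clients on PATH.

```python
# Sketch: third-party GridFTP transfer via GT4's globus-url-copy client.
# Hostnames and paths are hypothetical; assumes grid-proxy-init has been
# run and globus-url-copy is installed.
import subprocess

SRC = "gsiftp://gridftp.siteA.example/data/run42/output.dat"
DST = "gsiftp://gridftp.siteB.example/replica/run42/output.dat"

def replicate(src: str, dst: str, streams: int = 4) -> None:
    """Copy src to dst using `streams` parallel TCP streams (-p)."""
    subprocess.run(["globus-url-copy", "-p", str(streams), src, dst],
                   check=True)  # raise CalledProcessError on failure

if __name__ == "__main__":
    replicate(SRC, DST)
```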
dev.globus (http://dev.globus.org)
• Guidelines (Apache)
• Infrastructure (CVS, email, bugzilla, Wiki)
• Projects include …
dev.globus — Community-Driven Improvement of Globus Software, NSF OCI
Hosted Science Services
1) Integrate services from external sources
  • Virtualize "services" from providers
2) Coordinate & compose
  • Create new services from existing ones
[Figure: community content and community services layered over provider services and provider capacity.]
"Service-Oriented Science", Science, 2005
The Globus-Based LIGO Data Grid
• LIGO Gravitational Wave Observatory
• Replicating >1 Terabyte/day to 8 sites (including Birmingham, Cardiff, AEI/Golm)
• >40 million replicas so far
• MTBF = 1 month
www.globus.org/solutions
Data Replication Service
• Pull "missing" files to a storage system
[Figure: the Data Replication Service consumes a list of required files; data location uses Local Replica Catalogs and Replica Location Indexes; data movement uses the Reliable File Transfer Service and GridFTP.]
"Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System", Chervenak et al., 2005
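The service's core loop is easy to sketch: diff the required-file list against what the local replica catalog already holds, look up a source replica for each gap, and queue a transfer. The three helper functions below are hypothetical stand-ins for the Replica Location Service and Reliable File Transfer interfaces, not real GT4 APIs.

```python
# Sketch of the Data Replication Service's "pull missing files" behavior.
# Helper functions are placeholders for RLS catalog queries and RFT/GridFTP
# data movement.

def local_replicas() -> set:
    """Logical file names already registered in the Local Replica Catalog."""
    raise NotImplementedError  # would query the LRC

def find_source(lfn: str) -> str:
    """Ask a Replica Location Index for a physical URL holding `lfn`."""
    raise NotImplementedError  # would query the RLI, then a remote LRC

def transfer(src_url: str, dst_url: str) -> None:
    """Hand the copy to Reliable File Transfer (GridFTP underneath)."""
    raise NotImplementedError

def replicate_missing(required: list, storage_base: str) -> None:
    have = local_replicas()
    for lfn in required:
        if lfn in have:
            continue  # already present on the local storage system
        src = find_source(lfn)
        transfer(src, f"{storage_base}/{lfn}")
        # A real DRS would also register the new replica in the LRC here.
```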
Example: Biology
• Public PUMA Knowledge Base
  • Information about proteins analyzed against ~2 million gene sequences
• Back-office analysis on the Grid
  • Millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
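A back-office run like this amounts to fanning many independent jobs out to Grid execution services. Below is a hedged sketch that submits hypothetical BLAST batches through GT4's globusrun-ws client; the GRAM host, factory type, and paths are illustrative, not PUMA's actual configuration, which runs through schedulers on OSG and TeraGrid.

```python
# Sketch: fan independent BLAST runs out to a GRAM execution service using
# GT4's globusrun-ws client. Host, factory type, and paths are hypothetical.
import subprocess

GRAM_HOST = "grid.example.org"  # hypothetical GRAM endpoint
BATCHES = [f"/data/seqs/batch{i:03d}.fasta" for i in range(100)]

for batch in BATCHES:
    # -batch submits without waiting, so the loop can queue all jobs.
    subprocess.run(
        ["globusrun-ws", "-submit", "-batch", "-F", GRAM_HOST, "-Ft", "PBS",
         "-c", "/usr/local/bin/blastall", "-p", "blastp", "-i", batch],
        check=True)
```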
Example: Earth System Grid
• Climate simulation data
  • Per-collection control
  • Different user classes
  • Server-side processing
• Implementation (GT)
  • Portal-Based User Registration (PURSE)
  • PKI, SAML assertions
  • GridFTP, GRAM, SRM
• >2000 users
• >100 TB downloaded
www.earthsystemgrid.org — DOE OASCR
Example: Astro Portal Stacking Service
• Purpose
  • On-demand "stacks" of random locations within a ~10 TB dataset (Sloan data)
• Challenge
  • Rapid access to 10-10K "random" files
  • Time-varying load
• Solution
  • Dynamic acquisition of compute & storage
[Figure: many image cutouts summed (+ + + … =) into one stacked image; access via Web page or Web Service.]
Joint work with Ioan Raicu & Alex Szalay
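In outline, a "stack" is just a pixel-wise sum of small cutouts drawn from many different files, which is why random-file access dominates the workload. A minimal numpy sketch, with a hypothetical read_cutout standing in for the service's actual FITS access path:

```python
# Minimal stacking sketch: sum same-sized cutouts from many image files.
# `read_cutout` is a hypothetical stand-in for the service's FITS access;
# with astropy it could be fits.getdata(path)[y-h:y+h, x-h:x+h].
import numpy as np

def read_cutout(path: str, x: int, y: int, half: int) -> np.ndarray:
    """Return a (2*half, 2*half) pixel cutout centred at (x, y)."""
    raise NotImplementedError  # e.g. astropy.io.fits in a real service

def stack(locations: list, half: int = 16) -> np.ndarray:
    """Pixel-wise sum of one cutout per (file, x, y) location."""
    total = np.zeros((2 * half, 2 * half))
    for path, x, y in locations:
        total += read_cutout(path, x, y, half)  # each read hits a "random" file
    return total
```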
Preliminary Performance (TeraGrid, LAN GPFS)
[Performance chart]
Joint work with Ioan Raicu & Alex Szalay
Example: CyberShake
• Calculate hazard curves by generating synthetic seismograms from an estimated rupture forecast
[Figure: pipeline from Rupture Forecast and Strain Green Tensor through Synthetic Seismogram and Spectral Acceleration to Hazard Curve and Hazard Map; a data-flow sketch follows below.]
Tom Jordan et al., Southern California Earthquake Center
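The pipeline's structure translates directly into a staged computation. Here is a hedged Python sketch of the data flow only: each stage function is a placeholder for a large simulation or post-processing code, and only the ordering of stages is taken from the figure, not from SCEC's actual interfaces.

```python
# Data-flow sketch of the CyberShake pipeline stages named in the figure.
# Every body is a placeholder; `combine_into_curve` is a hypothetical
# aggregation step.

def rupture_forecast(site):            # probabilistic set of candidate ruptures
    raise NotImplementedError

def strain_green_tensor(site):         # wave propagation computed for the site
    raise NotImplementedError

def synthetic_seismogram(rupture, sgt):    # seismogram for one rupture
    raise NotImplementedError

def spectral_acceleration(seismogram):     # peak response for the seismogram
    raise NotImplementedError

def combine_into_curve(sa_values):         # exceedance probabilities
    raise NotImplementedError

def hazard_curve(site):
    """Hazard curve at one site, combined over all forecast ruptures."""
    sgt = strain_green_tensor(site)
    sa_values = [spectral_acceleration(synthetic_seismogram(r, sgt))
                 for r in rupture_forecast(site)]
    return combine_into_curve(sa_values)

# A hazard map is then hazard_curve(site) evaluated over a grid of sites.
```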
Enlisting TeraGrid Resources
[Figure: a workflow scheduler/engine, fed by data, provenance, and VO service catalogs, maps SCEC jobs onto TeraGrid compute and storage through a VO scheduler; results land in SCEC storage.]
• 20 TB, 1.8 CPU-years
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute
Science 1.0 → Science 2.0
• Gigabytes → Terabytes
• Tarballs → Services
• Journals → Wikis
• Individuals → Communities
• Community codes → Science gateways
• Supercomputer centers → TeraGrid, OSG, campus
• Makefile → Workflow
• Computational science → Science as computation
• Physical sciences → All sciences (& humanities)
• Computational scientists → All scientists
• NSF-funded → NSF-funded
Science 2.0 Challenges
• A need for new technologies, skills, & roles
  • Creating, publishing, hosting, discovering, composing, archiving, explaining … services
• A need for substantial software development
  • "30-80% of modern astronomy projects is software" —S. G. Djorgovski
• A need for more & different infrastructure
  • Computers & networks to host services
  • Can we leverage commercial spending? To some extent, but it is not straightforward
Acknowledgements
• Carl Kesselman for many discussions
• Many colleagues, including those named on slides, for research collaborations and/or slides
• Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
• Globus Alliance members for Globus software R&D
• DOE OASCR, NSF OCI, & NIH for support
For More Information
• Globus Alliance: www.globus.org
• dev.globus: dev.globus.org
• Open Science Grid: www.opensciencegrid.org
• TeraGrid: www.teragrid.org
• Background: www.mcs.anl.gov/~foster
• The Grid, 2nd Edition: www.mkp.com/grid2
Thanks to DOE, NSF, and NIH for research support!