Enabling eScience: Open Software, Standards, Infrastructure Ian Foster Argonne National Laboratory University of Chicago Globus Alliance www.mcs.anl.gov/~foster UK eScience Meeting, Nottingham, September 2, 2004
The Grid Meets the BBC “The Grid is an international project that looks in detail at a terrorist cell operating on a global level and a team of American and British counter-terrorists who are tasked to stop it” Gareth Neame, BBC's head of drama
A Better Characterization? “The Grid is an international project that looks in detail at scientific collaborations operating on a global level and a team of computer scientists who are tasked to enable them” But perhaps not as telegenic?
eScience & Grid: 6 Theses • Scientific progress depends increasingly on large-scale distributed collaborative work • Such distributed collaborative work raises challenging problems of broad importance • Any effective attack on those problems must involve close engagement with applications • Open software & standards are key to producing & disseminating required solutions • Shared software & service infrastructure are essential application enablers • A cross-disciplinary community of technology producers & consumers is needed
Implication: A Problem-Driven, Collaborative R&D Methodology [Diagram: a design → build → deploy → apply → analyze cycle connecting computer science, software & standards, infrastructure, and discipline advances]
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
Why Open Software Matters • eScience requires sophisticated functionality but is a small “market” • Commercial software does not meet needs • Open software can help jumpstart development by reducing barriers to entry • Encourage adoption of common approaches to key technical problems • Enable broad Grid technology ecosystem • A basis for international cooperation • A basis for cooperation with industry
“Open Software” isUltimately about Community • Contributors: design, development, packaging, testing, documentation, training, support • United by common architectural perspective • Users • May be major contributors via, e.g., testing • Governance structure • To determine how the software evolves • Processes for coordinating all these activities • Packaging, testing, reporting, … • An ecosystem of complementary components • Enabled by appropriately open architecture
“Ecosystem”? Not a monoculture … or Cambrian explosion … but a web of components
E.g., Globus Alliance & Toolkit(Argonne, USC/ISI, Edinburgh, PDC, NCSA) • An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit • Design, engineering, support, governance • Academic Affiliates make major contributions • EU: CERN, MPI, Poznan, INFN, etc. • AP: AIST, TIT, Monash, etc. • US: SDSC, TACC, UCSB, UW, etc. • Significant industrial contributions & adoption • 1000s of users worldwide, many contribute
Broader Ecosystem*: Example Complementary Projects • NSF Middleware Initiative • Packaging, testing, additional components • Virtual Data Toolkit (GriPhyN + PPDG) • GT, Condor, Virtual Data System, etc. • EGEE and “gLite” • Close collaboration with Globus + Condor • TeraGrid, Earth System Grid, NEESgrid, … • Consume and produce components • Open Middleware Infrastructure Institute • Collaboration on components, testing, etc. * See tutorial by Lee Liming: AHM, GGF, SC’2004.
Broader Ecosystem: E.g., NMI Distributed Test Facility (NSF Middleware Initiative’s GRIDS Center)
How Grid Software Works: NSF Network for Earthquake Engineering Simulation (NEES) Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes
Building a NEES Collaboratory: What the User Wants Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a web browser)
How it Really Happens (A Simplified View) [Diagram: web browser and client tools (simulation tool, data viewer, chat tool); web portal and registration service; data catalog, credential repository, and certificate authority; compute servers, database services, cameras, and a telepresence monitor] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
How it Really Happens (without Grid Software) [Diagram: the components of the simplified view, with each compute server and database service (labeled A–E) exposing its own ad hoc interface] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
How it Really Happens (with Grid Software) [Diagram: the same picture with Grid components in place — Globus GRAM in front of each compute server, Globus DAI in front of each database service, CHEF portal and chat teamlet, Globus Index Service, Globus MCS/RLS, MyProxy, and a certificate authority] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
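The four layers on this slide can be sketched in code. The following is a minimal, purely illustrative Python sketch (all class and method names are hypothetical, not a Globus API): resources expose one standard interface, a collective service aggregates them, and an application service brokers access for the user.

```python
# Illustrative sketch of the four-layer picture; names are hypothetical,
# not the Globus Toolkit API.

class Resource:
    """Resource layer: implements a standard access & management interface."""
    def __init__(self, name, capacity):
        self.name, self.capacity = name, capacity

    def describe(self):               # standard management interface
        return {"name": self.name, "capacity": self.capacity}

    def submit(self, job):            # standard access interface
        return f"{self.name} ran {job}"

class IndexService:
    """Collective layer: aggregates &/or virtualizes resources."""
    def __init__(self, resources):
        self.resources = resources

    def best(self):
        # Virtualization: callers never name a specific resource.
        return max(self.resources, key=lambda r: r.capacity)

class PortalService:
    """Application layer: organizes a VO & enables access for users."""
    def __init__(self, index):
        self.index = index

    def run(self, job):
        return self.index.best().submit(job)

# User layer: a client application talks only to the portal.
portal = PortalService(IndexService([Resource("A", 4), Resource("B", 16)]))
print(portal.run("simulation"))
```

The point of the layering is visible in the sketch: because every resource implements the same interface, the collective service can pick among them, and the client never needs site-specific code — which is exactly what the "without Grid software" picture lacks.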
NEESgrid Multisite Online Simulation Test (July 2003) [Diagram: coupled experiment spanning Illinois (simulation), Colorado, and Illinois sites]
NEESgrid Summary • A successful “turn of the crank” • S/w produced & deployed on time & budget, and new applications enabled • A producer as well as consumer of Grid s/w • Many sociopolitical “learning opportunities” • 4 tasks: develop s/w, engineer s/w, elicit requirements, educate community • Experiment-driven deployment™ was key • “No victory is final”: challenges remain • Hand off s/w to separate operations team • Sharing of facilities, data: politically charged
Software: Summary • Good software arises from trying to solve real problems in real projects—& then generalizing • E.g., Globus: security, job submission/mgmt, data movement, monitoring, etc. • The result is solutions that make sense within a wide variety of applications • Solve real problems, but not every problem • Resulting software is not a “turnkey” solution for any significant application • “Turnkey” solutions require integration • Factoring can extract higher-level “solutions”
Example “Solutions” • Portal-based User Registration System (PURSE) • Source: Earth System Grid, PDC • Web-based A&A management • Lightweight Director Replicator • Source: LIGO • Data replication management • Workflow execution & management • DAGman + Condor-G + Globus components • Source: Virtual data toolkit • Service monitoring & fault detection • Source: Earth System Grid
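The workflow "solution" above composes DAGMan-style dependency ordering with per-task execution. A toy sketch of that pattern, using only the Python standard library (this is the ordering idea DAGMan provides, not the DAGMan or Condor-G API):

```python
# Minimal sketch of DAG-ordered workflow execution, the pattern DAGMan +
# Condor-G provide on top of Globus job submission. Illustrative only.
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(deps, execute):
    """deps maps each task to the set of tasks it depends on;
    execute is called once per task, dependencies first."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        execute(task)                 # in a real system: submit to a resource
        order.append(task)
    return order

# Tiny diamond-shaped workflow: extract -> two transforms -> combine.
deps = {"combine": {"t1", "t2"}, "t1": {"extract"}, "t2": {"extract"}}
log = []
order = run_workflow(deps, log.append)
```

A real workflow system adds what the toy omits: remote submission, retries on failure, and data staging between tasks; but the dependency-respecting execution order is the core abstraction.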
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
“Standards”: Examples of Success • Grid Security Infrastructure • Broadly used, multiple implementations, WS-Security • Rich Grid security ecosystem, with linkages to MyProxy, OTP, KX509, Shibboleth, … • GridFTP • Broadly used, multiple implementations • WSDL/SOAP • Facilitating service-oriented architectures • OGSI/WSRF • Many find these encode useful patterns & behaviors
Standards: Status • Open Grid Services Architecture (OGSA)—the lighthouse by which we steer • Defines requirements & priorities • But far from complete • W3C, OASIS, GGF, DMTF, IETF • Good things are happening in many areas • WS-Agreement, DAIS, SRM, …, … • But for those building systems today? • Problem areas: monitoring, policy, data, etc. • Ad hoc approaches: will cost us big later “Experiment-driven deployment” on intl. scale to drive interoperability of infrastructure, code
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
Infrastructure • Broadly deployed services in support of virtual organization formation and operation • Authentication, authorization, discovery, … • Services, software, and policies enabling on-demand access to important resources • Computers, databases, networks, storage, software services,… • Operational support for 24x7 availability • Integration with campus infrastructures • Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds
Grid2003: An Operational Grid • 28 sites (2100–2800 CPUs) & growing, including a site in Korea • 400–1300 concurrent jobs • 8 substantial applications + CS experiments • Running since October 2003 • http://www.ivdgl.org/grid2003
Grid2003 Software Stack (“Virtual Data Toolkit”) [Stack, top to bottom: Application • Chimera Virtual Data System • DAGMan and Condor-G • Globus Toolkit (GSI, GRAM, GridFTP, etc.) • Site schedulers and file systems • Clusters and storage systems] Three levels of deployment: + Site services: GRAM, GridFTP, etc. + Global & virtual organization services + IGOC: iVDGL Grid Operations Center
Grid2003 Applications To Date • CMS proton-proton collision simulation • ATLAS proton-proton collision simulation • LIGO gravitational wave search • SDSS galaxy cluster detection • ATLAS interactive analysis • BTeV proton-antiproton collision simulation • SnB biomolecular analysis • GADU/Gnare genome analysis • Various computer science experiments www.ivdgl.org/grid2003/applications
Example Grid3 Application: NVO Mosaic Construction • Construct custom mosaics on demand from multiple data sources • User specifies projection, coordinates, size, rotation, spatial sampling • NVO/NASA Montage: a small (1200-node) workflow • Work by Ewa Deelman et al., USC/ISI and Caltech
Next Step: Open Science Grid • U.S. (international?) consortium to provide services to a broad set of sciences • Grid3 as a starting point, expanding to include many more sites • A major focus is the MOU/SLA structure required to sustain & scale operations • Resource providers • Resource consumers • Virtual organizations • We hope to collaborate with TeraGrid, EGEE, UK NGS, etc.
Infrastructure: Summary • Encouraging progress • Real understanding of how to operate Grid infrastructures is emerging • Production infrastructures are appearing and are being relied upon for real science • Significant areas of concern remain • Security is going to get harder • International interoperability still elusive • We haven’t got the right model for sustained infrastructure development & support
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
Community • Big picture is extremely positive • The “eScience”/“Grid” community is large, enthusiastic, smart, and diverse • Significant exchange of ideas, software, personnel, experiences • Real application-CS cooperation • We can do better in various specific areas • Not clear we’re always focusing on the real problems: often viewed as “mundane”?? • CS community could be even more engaged • Software development a community effort
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
What’s New in GT 4.0 (January 31, 2005) • For all: • Additions: data, security, execution, XIO, … • Improved packaging, testing, performance, usability, documentation, standards compliance (phew) • WS components ready for broader use • For the end user: • More complementary tools & solutions • C, Java, Python APIs; command-line tools • For the developer: • Java (Axis/Tomcat) hosting greatly improved • Python (pyGlobus) hosting for the first time
Apache Axis Web Services Container • Good news for Java WS developers: GT 4.0 works with standard Axis* and Tomcat* • GT provides Axis-loadable libraries, handlers • Includes useful behaviors such as inspection, notification, lifetime mgmt (WSRF) • Others implement GRAM, etc. • Major Globus contributions to Apache • ~50% of WS-Addressing code • ~15% of WS-Security code • Many bug fixes • WSRF code a possible next contribution [Diagram: GT bits & app bits layered on security, addressing, and Axis] * Modulo Axis and Tomcat release-cycle issues
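Two of the WSRF behaviors named on this slide — property inspection and lifetime management — can be sketched concretely. The following is a conceptual Python sketch of the pattern only; the class and method names are hypothetical and do not reflect the GT4 Java API.

```python
# Conceptual sketch of WSRF-style stateful resources: inspectable resource
# properties plus scheduled-destruction lifetime management. Hypothetical
# names; not the Globus Toolkit interface.
import time

class WSResource:
    def __init__(self, properties, lifetime_s):
        self.properties = dict(properties)
        # Scheduled destruction: the resource declares when it may be
        # garbage-collected unless a client extends its lifetime.
        self.termination_time = time.time() + lifetime_s

    def get_property(self, name):          # inspection
        return self.properties[name]

    def set_termination_time(self, t):     # lifetime management
        self.termination_time = t

    def expired(self, now=None):
        return (now if now is not None else time.time()) >= self.termination_time

r = WSResource({"state": "idle"}, lifetime_s=3600)
print(r.get_property("state"), r.expired())
```

The design point is that state lives in named, inspectable resources with explicit lifetimes, so a container can reclaim abandoned state without out-of-band cleanup protocols.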
Standards Compliance • Web services: WS-I compliance • All interfaces support WS-I Basic Profile, modulo use of WS-Addressing • Security: a) WS-I Basic Security Profile (plaintext), b) IETF RFC 3820 proxy certificates • GridFTP • GGF GFD.020 • Others in progress & being tracked • WSRF (OASIS), WS-Addressing (W3C), OGSA-DAI (GGF), RLS (GGF)
Globus Ecosystem (Just a Few Examples Listed Here) • Tools provide higher-level functionality • Nimrod-G, MPICH-G2, Condor-G, Ninf-G • NTCP telecontrol • GT4IDE Eclipse IDE • Packages integrate GT with other s/w • VDT, NMI, CTSS, NEESgrid, ESG • Solutions package a set of functionality • VO management, monitoring, replica mgmt • Documentation, e.g. • Borja Sotomayor’s tutorial
We’re Getting a Lot of Help, But Could Do with a Lot More • Testing and feedback • Users, developers, deployers: plan to use the software now & provide feedback • Tell us what is missing, what performance you need, what interfaces & platforms, … • Ideally, also offer to help meet needs (-: • Related software, solutions, documentation • Adapt your tools to use GT4 • Develop new GT4-based components • Develop GT4-based solutions • Develop documentation components
Overview • How are we doing? • Software • Standards • Infrastructure • Community • An advertorial, and request for input • Globus Toolkit version 4 • Summary
eScience & Grid: 6 Theses • Scientific progress depends increasingly on large-scale distributed collaborative work • Such distributed collaborative work raises challenging problems of broad importance • Any effective attack on those problems must involve close engagement with applications • Open software & standards are key to achieving a critical mass of contributors • Shared software & service infrastructure are essential application enablers • A cross-disciplinary community of technology producers & consumers is vital
Overall, We are Doing Well • Communities & individuals are, increasingly, using the Grid to advance their science • Broad consensus on many key architecture concepts, if not always their implementation • Significant base of open source software, widely used in applications & infrastructure • Service-oriented architecture facilitates cooperation on software development & code reuse • Grid standards are making a difference on a daily basis: e.g., GSI, GridFTP
Overall, We are Doing Well (2) • A real understanding of how to operate Grid infrastructures is emerging • Production infrastructures are appearing and are being relied upon for real science • Productive international cooperation is occurring at many levels • A vibrant community has formed and shows no signs of slowing down • Real connections have been formed between computer science & applications
Problem-Driven, Collaborative R&D Methodology [Diagram: the design → build → deploy → apply → analyze cycle connecting computer science, software & standards, infrastructure, and discipline advances, spanning a global community]