Grid-related High Performance Middleware and Laboratories
Dr. Carl Kesselman, Director, Center for Grid Technologies
How do we solve problems?
• Communities committed to common goals
• Virtual organizations
• Teams with heterogeneous members & capabilities
• Distributed geographically and politically
• No location/organization possesses all required skills and resources
• Adapt as a function of the situation
• Adjust membership, reallocate responsibilities, renegotiate resources
The Grid Vision
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
• On-demand, ubiquitous access to computing, data, and services
• New capabilities constructed dynamically and transparently from distributed services
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder)
A Little History (U.S. Perspective)
• Early 90s: gigabit testbeds, metacomputing
• Mid to late 90s: early experiments (e.g., I-WAY), software projects (e.g., Globus), application experiments
• 2001: major application communities emerging; major infrastructure deployments underway; rich technology base constructed; Global Grid Forum with >1000 people on mailing lists, 192 organizations at the last meeting, 28 countries
Selected Major Grid Projects
Also many technology R&D projects, e.g., Condor, NetSolve, Ninf, NWS
See also www.gridforum.org
The Grid World: Current Status
• Dozens of major Grid projects in scientific & technical computing / research & education
• Considerable consensus on key concepts and technologies
• Open source Globus Toolkit™ a de facto standard for major protocols & services: far from complete or perfect, but out there, evolving rapidly, with a large tool/user base
• Industrial interest emerging rapidly
• Opportunity: convergence of eScience and eBusiness requirements & technologies
Layered Grid Architecture
Grid layers, shown alongside the Internet protocol architecture (Application, Transport, Internet, Link):
• Application
• Collective: “Coordinating multiple resources”; ubiquitous infrastructure services, app-specific distributed services
• Resource: “Sharing single resources”; negotiating access, controlling use
• Connectivity: “Talking to things”; communication (Internet protocols) & security
• Fabric: “Controlling things locally”; access to, & control of, resources
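As a quick orientation, here is a minimal sketch of where the Globus pieces named later in this talk sit in these layers. The placements follow the examples in “The Anatomy of the Grid”; the listing is illustrative only, not exhaustive or normative.

```python
"""Illustrative mapping of Grid architecture layers to example protocols/services.
Placements follow the examples in "The Anatomy of the Grid"; summary aid only."""

GRID_LAYERS = {
    "Application":  ["user applications, portals, problem-solving environments"],
    "Collective":   ["MDS-2 index services (GIIS)", "replica catalog / replica management",
                     "DUROC co-allocation", "Community Authorization Service"],
    "Resource":     ["GRAM (job submission)", "GridFTP (data access & transfer)",
                     "MDS-2 per-resource information reporters"],
    "Connectivity": ["GSI (authentication, delegation)", "IP / TCP / TLS"],
    "Fabric":       ["local schedulers, storage systems, networks"],
}

for layer, examples in GRID_LAYERS.items():
    print(f"{layer:>12}: {', '.join(examples)}")
```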
Globus Toolkit
• The Globus Toolkit is the source of many of the protocols described in the “Grid architecture”
• Adopted by almost all major Grid projects worldwide as a source of infrastructure
• Open source, open architecture framework encourages community development
• Active R&D program continues to move technology forward
• Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions
• www.globus.org
Globus Toolkit Components Include …
• Core protocols and services: Grid Security Infrastructure (GSI), Grid Resource Access & Management (GRAM), MDS information & monitoring, GridFTP data access & transfer
• Other services: Community Authorization Service, DUROC co-allocation service
• Other Data Grid technologies: replica catalog, replica management service
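To make this concrete, here is a rough sketch of driving these components from a script. The command names (grid-proxy-init, globus-job-run, globus-url-copy) are the Globus Toolkit 2 command-line clients as I recall them, and the host and file names are invented, so treat the whole thing as an illustration to check against an actual installation rather than a recipe.

```python
"""Rough sketch of driving Globus Toolkit 2 components from a script.
Command names are the GT2 command-line clients as recalled; hosts and
paths are made up for illustration."""
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. GSI: create a short-lived proxy credential from the user's X.509 certificate
#    (prompts for the certificate passphrase).
run(["grid-proxy-init"])

# 2. GRAM: run a simple job on a remote gatekeeper (hypothetical host).
run(["globus-job-run", "gatekeeper.example.edu", "/bin/hostname"])

# 3. GridFTP: copy an output file back (hypothetical paths).
run(["globus-url-copy",
     "gsiftp://gatekeeper.example.edu/scratch/result.dat",
     "file:///home/user/result.dat"])
```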
The Globus Toolkit in One Slide
• Grid protocols (GSI, GRAM, MDS-2, GridFTP, …) enable resource sharing within virtual organizations; the toolkit provides the reference implementation of these services
• GSI (Grid Security Infrastructure): the user authenticates once and creates a proxy credential; the proxy is then used for all further GSI-authenticated remote service requests
• GRAM (Grid Resource Allocation & Management): the gatekeeper acts as a factory, creating user processes (each with its own delegated proxy) on the resource
• MDS-2 (Meta Directory Service): processes and services register, via soft-state registration, with a reporter (registry + discovery); a GIIS (Grid Information Index Server) indexes reporters for discovery; exchanges use reliable remote invocation
• Other services (e.g., GridFTP) plug into the same security and invocation mechanisms
• Protocols (and APIs) enable other tools and services for membership, discovery, data management, workflow, …
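“Soft state” registration above means an entry lapses unless it is periodically refreshed, so crashed services simply drop out of the index. The toy sketch below illustrates just that pattern; it is not the MDS protocol or API, and every name in it is invented.

```python
"""Toy illustration of soft-state registration and discovery, the pattern
MDS-2 uses between reporters and a GIIS. Not the actual MDS protocol or API."""
import time

REGISTRATION_TTL = 30.0  # seconds a registration stays valid without a refresh

class IndexServer:
    """Stands in for a GIIS: holds registrations that expire unless renewed."""
    def __init__(self):
        self._entries = {}  # service name -> expiry time

    def register(self, service_name):
        # Soft state: each (re-)registration just pushes the expiry forward.
        self._entries[service_name] = time.time() + REGISTRATION_TTL

    def discover(self):
        # Only services whose registration has not lapsed are returned,
        # so stale or crashed services silently disappear from the index.
        now = time.time()
        return [name for name, expiry in self._entries.items() if expiry > now]

giis = IndexServer()
giis.register("gram://gatekeeper.example.edu")   # a reporter announcing a resource
giis.register("gridftp://data.example.edu")
print(giis.discover())
```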
Globus Toolkit Structure
• GRAM (with its job managers) runs over GSI on a compute resource; GridFTP runs over GSI on a data resource; other services or applications sit over GSI as well, each alongside MDS
• Common concerns recur in each service: service naming, soft-state management, reliable invocation, notification
• Lots of good mechanisms, but (with the exception of GSI) not that easily incorporated into other systems
NSF Middleware Initiative
• NSF-funded project to build national middleware infrastructure
• Participants: USC/ISI, SDSC, U. Wisconsin, ANL, NCSA, Internet2
• Software integration (NMI software releases): interoperability, testing, install/configure/manage
• University campus infrastructure integration: campus authentication / GSI, enterprise directories / GSI and MDS
• Use NMI as the TeraGrid baseline; specialize for TeraGrid-unique aspects (e.g., visualization resources)
NMI-R1 Software Components
• Globus Toolkit
• Condor-G
• Network Weather Service
• KX.509 / KCA
• Certificate Profile Maker
• Pubcookie
• Grid Packaging Tools
U.S. GRIDS Center
• GRIDS = Grid Research, Integration, Deployment, & Support
• NSF-funded center to provide:
• State-of-the-art middleware infrastructure to support national-scale collaborative science and engineering
• Integration platform for experimental middleware technologies
• Participants: ISI, NCSA, SDSC, UC, UW + commercial partners
• www.grids-center.org
NEESgrid: Network for Earthquake Engineering Simulation
• National infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
• www.neesgrid.org: Argonne, Michigan, NCSA, UIUC, USC
SCEC Modeling Environment (architecture diagram spanning four areas)
• Knowledge representation & reasoning: knowledge server (knowledge base access, inference), translation services (syntactic & semantic translation), knowledge base (ontologies, curated taxonomies, relations & constraints), pathway models (pathway templates, models of simulation codes)
• Digital libraries: navigation & queries (versioning, topic maps), mediated collections (federated access), code repositories (FSM, RDM, AWM, SRM), data collections, data & simulation products
• Knowledge acquisition: acquisition interfaces (dialog planning, pathway construction strategies), pathway assembly (template instantiation, resource selection, constraint checking)
• Grid: pathway execution (policy, data ingest, repository access), grid services (compute & storage management, security), pathway instantiations, storage, computing
Data Intensive Physical Sciences
• High energy & nuclear physics, including new experiments at CERN
• Gravity wave searches: LIGO, GEO, VIRGO
• Time-dependent 3-D systems (simulation, data)
• Earth observation, climate modeling
• Geophysics, earthquake modeling
• Fluids, aerodynamic design
• Pollutant dispersal scenarios
• Astronomy: digital sky surveys
National Virtual Observatory
• http://virtualsky.org/, from Caltech CACR, Caltech Astronomy, and Microsoft Research
• Virtual Sky holds 140,000,000 tiles (140 GByte)
• Viewers can change scale and theme, e.g., optical (DPOSS) and X-ray (ROSAT) views of the Coma cluster
Grid Physics Network (GriPhyN)
• Enabling R&D for advanced data grid systems, focusing in particular on the Virtual Data concept
• Participating experiments: ATLAS, CMS, LIGO, SDSS
• www.griphyn.org; see also www.ppdg.net, www.eu-datagrid.org
Data Grids for High Energy Physics (image courtesy Harvey Newman, Caltech)
• Online system: a “bunch crossing” every 25 nsecs and ~100 “triggers” per second, each triggered event ~1 MByte; ~PBytes/sec off the detector reduced to ~100 MBytes/sec into Tier 0
• Tier 0 (CERN Computer Centre): offline processor farm, ~20 TIPS; ~622 Mbits/sec (or air freight, deprecated) to Tier 1
• Tier 1 regional centres with HPSS mass storage (FermiLab ~4 TIPS; France, Germany, Italy regional centres): ~622 Mbits/sec onward to Tier 2
• Tier 2 centres: ~1 TIPS each (e.g., Caltech)
• Tier 3: institute servers (~0.25 TIPS) with physics data caches feeding workstations at ~1 MBytes/sec
• Tier 4: physicist workstations
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
• Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
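To make the tiered caching concrete, the sketch below shows one way an institute might resolve a dataset request against the hierarchy, preferring the nearest tier that already holds the data and falling back toward Tier 0 otherwise. The site names, dataset names, and catalogue contents are invented for illustration; this is not code from any experiment's data-management system.

```python
"""Toy sketch of tiered data access in a Tier0..Tier4 hierarchy.
All names and catalogue contents are invented for illustration."""

# Which tier holds which datasets, listed nearest-first for one institute.
TIER_CATALOGUE = [
    ("tier3-institute-cache", {"channel-B/2002-run7"}),
    ("tier2-caltech",         {"channel-A/2002-run3", "channel-B/2002-run7"}),
    ("tier1-fnal",            {"channel-A/2002-run3", "channel-B/2002-run7",
                               "channel-C/2002-run1"}),
    ("tier0-cern",            None),  # None: Tier 0 holds the full dataset
]

def locate(dataset):
    """Return the nearest tier that can serve `dataset`."""
    for site, holdings in TIER_CATALOGUE:
        if holdings is None or dataset in holdings:
            return site
    raise LookupError(dataset)

print(locate("channel-B/2002-run7"))  # served from the institute cache
print(locate("channel-C/2002-run1"))  # falls back to Tier 1
print(locate("channel-D/2003-run2"))  # only Tier 0 has it
```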
Laser Interferometer Gravitational-Wave Observatory (LIGO)
Listening to collisions of black holes and neutron stars
LIGO Hardware
LIGO Architecture (GriPhyN, LDAS, and Grid components)
• Clients (e.g., web, script, agent) send text requests / virtual data requests to a Request Manager
• Catalog services behind the Request Manager: virtual data catalog, transformation catalog, replica catalog, replica management
• Execution & data movement: gatekeeper (GRAM), Condor jobs, Globus RPC, GridFTP data movement among HPSS, local disk, and other LDAS sites
• LDAS (LIGO Data Analysis System): science algorithms, software collaboratory, parallel computing
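The virtual data idea behind this architecture is that a requested product is either fetched from an existing replica or (re)derived by running a registered transformation on its inputs. Below is a minimal sketch of that decision, with invented catalog contents and dataset names; it is not the GriPhyN Virtual Data Toolkit API.

```python
"""Minimal sketch of the virtual data idea: satisfy a request either from an
existing replica or by planning a derivation from a registered transformation.
Catalog contents and names are invented."""

REPLICA_CATALOG = {
    "ligo/strain/2002-03-01": ["gsiftp://ldas.example.edu/data/strain-20020301"],
}

TRANSFORMATION_CATALOG = {
    # derived product -> (transformation program, inputs it needs)
    "ligo/pulsar-candidates/2002-03-01": (
        "pulsar_search", ["ligo/strain/2002-03-01"]),
}

def plan(product):
    """Return the list of steps that will materialize `product`."""
    if product in REPLICA_CATALOG:
        return [("fetch", REPLICA_CATALOG[product][0])]
    if product in TRANSFORMATION_CATALOG:
        program, inputs = TRANSFORMATION_CATALOG[product]
        steps = []
        for dep in inputs:             # recursively materialize the inputs first
            steps.extend(plan(dep))
        steps.append(("run", program, inputs))
        return steps
    raise LookupError(f"no replica or transformation for {product}")

for step in plan("ligo/pulsar-candidates/2002-03-01"):
    print(step)
```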
iVDGL: A Global Grid Laboratory
“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (from the NSF proposal, 2001)
• International Virtual-Data Grid Laboratory
• A global Grid laboratory (US, Europe, Asia, South America, …)
• A place to conduct Data Grid tests “at scale”
• A mechanism to create common Grid infrastructure
• A laboratory for other disciplines to perform Data Grid tests
• A focus of outreach efforts to small institutions
• U.S. part funded by NSF (2001-2006): $13.7M (NSF) + $2M (matching)
iVDGL Components
• Computing resources: 2 Tier1 laboratory sites (funded elsewhere), 7 Tier2 university sites (software integration), 3 Tier3 university sites (outreach effort)
• Networks: USA (TeraGrid, Internet2, ESNET), Europe (Géant, …), transatlantic (DataTAG), transpacific, AMPATH?, …
• Grid Operations Center (GOC): joint work with TeraGrid on GOC development
• Computer science support teams: support, test, upgrade the GriPhyN Virtual Data Toolkit
• Education and outreach
• Coordination, management
iVDGL Components (cont.)
• High level of coordination with DataTAG: transatlantic research network (2.5 Gb/s) connecting EU & US
• Current partners: TeraGrid, EU DataGrid, EU projects, Japan, Australia
• Experiments/labs requesting participation: ALICE, CMS-HI, D0, BaBar, BTeV, PDC (Sweden)
Initial US-iVDGL Data Grid
• Site types: Tier1 laboratory (Fermilab), proto-Tier2, and Tier3 university sites
• Sites on the map: SKC, BU, Wisconsin, PSU, BNL, Fermilab, Hampton, Indiana, JHU, Caltech, UCSD, Florida, Brownsville
• Other sites to be added in 2002
iVDGL Map (2002-2003)
• Map legend: Tier0/1, Tier2, and Tier3 facilities; 10 Gbps, 2.5 Gbps, 622 Mbps, and other links; SURFnet, DataTAG
• Later: Brazil, Chile?, Pakistan, Russia, China
The TeraGrid: Site Resources
• Four sites (NCSA/PACI, SDSC, Caltech, Argonne), each with local site resources, external network connections, and archival storage (HPSS; UniTree at NCSA)
• NCSA/PACI: 8 TF, 240 TB
• SDSC: 4.1 TF, 225 TB
Summary
• Grid infrastructure is becoming widespread: major deployments based on common technology, with significant new deployment activities
• Consensus-building mechanisms in place: Global Grid Forum (www.gridforum.org)
• Industrial buy-in starting: IBM, Entropia, more to come
For More Information
• The book (Morgan Kaufmann): www.mkp.com/grids
• Globus: www.globus.org; see “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
• GRIDS Center: www.grids-center.org
• Global Grid Forum: www.gridforum.org