Open Science Grid Linking Universities and Laboratories in National CyberInfrastructure. Paul Avery University of Florida avery@phys.ufl.edu. SURA Infrastructure Workshop Austin, TX December 7, 2005. Bottom-up Collaboration: “Trillium”. Trillium = PPDG + GriPhyN + iVDGL
Open Science Grid
Linking Universities and Laboratories in National CyberInfrastructure
Paul Avery, University of Florida, avery@phys.ufl.edu
SURA Infrastructure Workshop, Austin, TX, December 7, 2005
Bottom-up Collaboration: “Trillium”
• Trillium = PPDG + GriPhyN + iVDGL
  • PPDG: $12M (DOE) (1999 – 2006)
  • GriPhyN: $12M (NSF) (2000 – 2005)
  • iVDGL: $14M (NSF) (2001 – 2006)
• ~150 people with large overlaps between projects
  • Universities, labs, foreign partners
• Strong driver for funding agency collaborations
  • Inter-agency: NSF – DOE
  • Intra-agency: Directorate – Directorate, Division – Division
• Coordinated internally to meet broad goals
  • CS research, developing/supporting Virtual Data Toolkit (VDT)
  • Grid deployment, using VDT-based middleware
  • Unified entity when collaborating internationally
Common Middleware: Virtual Data Toolkit
[Diagram labels: VDT; Sources (CVS); Many Contributors; NMI Build & Test Condor pool, 22+ Op. Systems; Build Binaries; Test; Package; Patching; Pacman cache; RPMs; GPT src bundles]
A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
VDT Growth Over 3 Years (1.3.8 now)
www.griphyn.org/vdt/
[Chart: # of components per VDT release. Annotations: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (Switch to Globus 2.2); VDT 1.1.8 (First real use by LCG); VDT 1.1.11 (Grid3)]
Components of VDT 1.3.5
Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509, System Profiler, GSI OpenSSH 3.4, Monalisa 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, New GridFTP Server, GUMS 1.0.1
VDT Collaborative Relationships
[Diagram labels: Computer Science Research; Techniques & software; Virtual Data Toolkit; Tech Transfer; Science, ENG, Education Communities; Requirements; Prototyping & experiments; Deployment, Feedback]
• Partner science projects
• Partner networking projects
• Partner outreach projects
• U.S. Grids: Globus, Condor, NMI, TeraGrid, OSG
• International: EGEE, WLCG, Asia, South America
• Outreach: QuarkNet, CHEPREO, Digital Divide
• Other linkages
  • Work force
  • CS researchers
  • Industry
Major Science Driver: Large Hadron Collider (LHC) @ CERN
• 27 km Tunnel in Switzerland & France
• 2007 – ?
Search for
• Origin of Mass
• New fundamental forces
• Supersymmetry
• Other new particles
[Figure labels: ATLAS, CMS, ALICE, LHCb, TOTEM]
LHC: Petascale Global Science
• Complexity: Millions of individual detector channels
• Scale: PetaOps (CPU), 100s of Petabytes (Data)
• Distribution: Global distribution of people & resources
BaBar/D0 Example - 2004
• 700+ Physicists
• 100+ Institutes
• 35+ Countries
CMS Example - 2007
• 5000+ Physicists
• 250+ Institutes
• 60+ Countries
LHC Global Data Grid (2007+)
• 5000 physicists, 60 countries
• 10s of Petabytes/yr by 2008
• 1000 Petabytes in < 10 yrs?
[Diagram labels, in source order: CMS Experiment; Online System; CERN Computer Center; 150 - 1500 MB/s; Tier 0; 10-40 Gb/s; Tier 1; >10 Gb/s; Tier 2; 2.5-10 Gb/s; Tier 3; Tier 4; Physics caches; PCs]
[Site labels: Korea, Russia, UK, USA, U Florida, Caltech, UCSD, Iowa, FIU, Maryland]
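As a rough cross-check of the numbers on this slide, the sketch below converts a sustained data rate into a yearly volume and estimates how long a petabyte takes to cross one of the quoted links. It rests on assumptions that are not in the slides: the 150 - 1500 MB/s figure is treated as a sustained rate, running is continuous unless a `duty_cycle` is given, links are fully used, and units are decimal (1 PB = 10^15 bytes). The function names are illustrative only.

```python
# Back-of-envelope arithmetic for the tiered data grid figures.
# Assumptions (not from the slides): sustained rates, decimal units,
# and a 365-day year; real accelerator duty cycles are lower.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, leap years ignored


def petabytes_per_year(rate_mb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Convert a sustained rate in MB/s to PB/yr (1 PB = 1e9 MB)."""
    return rate_mb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e9


def days_to_transfer(petabytes: float, link_gbps: float) -> float:
    """Days needed to move `petabytes` over a fully used link of `link_gbps` Gb/s."""
    bits = petabytes * 1e15 * 8          # bytes to bits
    seconds = bits / (link_gbps * 1e9)   # Gb/s to bits/s
    return seconds / 86400


if __name__ == "__main__":
    for rate in (150, 1500):
        print(f"{rate} MB/s sustained: {petabytes_per_year(rate):.1f} PB/yr")
    for link in (2.5, 10, 40):
        print(f"1 PB over {link} Gb/s: {days_to_transfer(1, link):.1f} days")
```

Under these assumptions, 150 MB/s comes to roughly 4.7 PB/yr and 1500 MB/s to roughly 47 PB/yr, which is in line with the slide's "10s of Petabytes/yr", and one petabyte needs a little over nine days on a saturated 10 Gb/s link.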
Grid3 and Open Science Grid
Grid3: A National Grid Infrastructure
• October 2003 – July 2005
• 32 sites, 3,500 CPUs: Universities + 4 national labs
• Sites in US, Korea, Brazil, Taiwan
• Applications in HEP, LIGO, SDSS, Genomics, fMRI, CS
www.ivdgl.org/grid3
Grid3 Lessons Learned
• How to operate a Grid as a facility
  • Tools, services, error recovery, procedures, docs, organization
  • Delegation of responsibilities (Project, VO, service, site, …)
  • Crucial role of Grid Operations Center (GOC)
• How to support people-to-people relations
  • Face-to-face meetings, phone cons, 1-1 interactions, mail lists, etc.
• How to test and validate Grid tools and applications
  • Vital role of testbeds
• How to scale algorithms, software, process
  • Some successes, but “interesting” failure modes still occur
• How to apply distributed cyberinfrastructure
  • Successful production runs for several applications
http://www.opensciencegrid.org
Open Science Grid: July 20, 2005
• Production Grid: 50+ sites, 15,000 CPUs “present” (available but not at one time)
• Sites in US, Korea, Brazil, Taiwan
• Integration Grid: 10-12 sites
[Map labels: Taiwan, S. Korea, São Paulo]
OSG Operations Snapshot November 7: 30 days
OSG Participating Disciplines
OSG Grid Partners
Example of Partnership: WLCG and EGEE
OSG Technical Groups & Activities
• Technical Groups address and coordinate technical areas
  • Propose and carry out activities related to their given areas
  • Liaise & collaborate with other peer projects (U.S. & international)
  • Participate in relevant standards organizations
  • Chairs participate in Blueprint, Integration and Deployment activities
• Activities are well-defined, scoped tasks contributing to OSG
  • Each Activity has deliverables and a plan
  • … is self-organized and operated
  • … is overseen & sponsored by one or more Technical Groups
TGs and Activities are where the real work gets done
OSG Technical Groups (deprecated!)
OSG Activities
OSG Integration Testbed: Testing & Validating Middleware [Map labels: Taiwan, Brazil, Korea]
Networks
Evolving Science Requirements for Networks (DOE High Performance Network Workshop). See http://www.doecollaboratory.org/meetings/hpnpw/
UltraLight: Integrating Advanced Networking in Applications
http://www.ultralight.org
10 Gb/s+ network
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int’l partners
• Level(3), Cisco, NLR
Education, Training, Communications
iVDGL, GriPhyN Education/Outreach
Basics
• $200K/yr
• Led by UT Brownsville
• Workshops, portals, tutorials
• Partnerships with QuarkNet, CHEPREO, LIGO E/O, …
Grid Training Activities
• June 2004: First US Grid Tutorial (South Padre Island, TX)
  • 36 students, diverse origins and types
• July 2005: Second Grid Tutorial (South Padre Island, TX)
  • 42 students, simpler physical setup (laptops)
• Reaching a wider audience
  • Lectures, exercises, video, on web
  • Students, postdocs, scientists
  • Coordination of training activities
  • “Grid Cookbook” (Trauner & Yafchak)
  • More tutorials, 3-4/year
  • CHEPREO tutorial in 2006?
QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
CHEPREO: Center for High Energy Physics Research and Educational Outreach
Florida International University
• Physics Learning Center
• CMS Research
• iVDGL Grid Activities
• AMPATH network (S. America)
• Funded September 2003
  • $4M initially (3 years)
  • MPS, CISE, EHR, INT
Grids and the Digital Divide
Background
• World Summit on Information Society
• HEP Standing Committee on Inter-regional Connectivity (SCIC)
Themes
• Global collaborations, Grids and addressing the Digital Divide
• Focus on poorly connected regions
• Brazil (2004), Korea (2005)
Science Grid Communications
Broad set of activities
• (Katie Yurkewicz)
• News releases, PR, etc.
• Science Grid This Week
• OSG Newsletter
• Not restricted to OSG
www.interactions.org/sgtw
Grid Timeline
[Timeline figure, 2000 to 2007. Labels: First US-LHC Grid Testbeds; GriPhyN, $12M; PPDG, $9.5M; iVDGL, $14M; VDT 1.0; Grid3 operations; CHEPREO, $4M; UltraLight, $2M; DISUN, $10M; OSG operations; LIGO Grid; Grid Communications; Grid Summer Schools; Digital Divide Workshops; Start of LHC]
Future of OSG CyberInfrastructure
• OSG is a unique national infrastructure for science
  • Large CPU, storage and network capability crucial for science
• Supporting advanced middleware
  • Long-term support of the Virtual Data Toolkit (new disciplines & international collaborations)
• OSG currently supported by a “patchwork” of projects
  • Collaborating projects, separately funded
• Developing workplan for long-term support
  • Maturing, hardening facility
  • Extending facility to lower barriers to participation
  • Oct. 27 presentation to DOE and NSF
OSG Consortium Meeting: Jan 23-25
• University of Florida (Gainesville)
• About 100 – 120 people expected
• Funding agency invitees
• Schedule
  • Monday Morning: Applications plenary (rapporteurs)
  • Monday Afternoon: Partner Grid projects plenary
  • Tuesday Morning: Parallel
  • Tuesday Afternoon: Plenary
  • Wednesday Morning: Parallel
  • Wednesday Afternoon: Plenary
  • Thursday: OSG Council meeting
Disaster Planning / Emergency Response
Grids and Disaster Planning / Emergency Response
• Inspired by recent events
  • Dec. 2004 tsunami in Indonesia
  • Aug. 2005 Katrina hurricane and subsequent flooding
  • (Quite different time scales!)
• Connection of DP/ER to Grids
  • Resources to simulate detailed physical & human consequences of disasters
  • Priority pooling of resources for a societal good
  • In principle, a resilient distributed resource
• Ensemble approach well suited to Grid/cluster computing
  • E.g., given a storm’s parameters & errors, bracket likely outcomes
  • Huge number of jobs required
  • Embarrassingly parallel
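The ensemble idea on this slide can be illustrated with a minimal sketch: draw many storm parameter sets from their stated uncertainties, run one independent job per set, and report a band that brackets the outcomes. Everything specific here is an assumption made for illustration: the parameter names, the Gaussian errors, the 5%/95% band, and above all `toy_surge_m`, which is a made-up placeholder formula and not a physical storm-surge model. A local thread pool stands in for a Grid scheduler.

```python
# Sketch of an embarrassingly parallel ensemble run.
# All names and numbers are hypothetical; toy_surge_m is NOT a real model.
import random
from concurrent.futures import ThreadPoolExecutor


def sample_members(n, wind_mean, wind_err, track_mean, track_err, seed=0):
    """Draw n (wind speed, track offset) pairs from Gaussian uncertainties."""
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    return [
        (rng.gauss(wind_mean, wind_err), rng.gauss(track_mean, track_err))
        for _ in range(n)
    ]


def toy_surge_m(member):
    """Placeholder 'simulation': one independent job per ensemble member."""
    wind_ms, offset_km = member
    return 0.002 * wind_ms ** 2 / (1.0 + abs(offset_km) / 50.0)


def bracket(values, lo=0.05, hi=0.95):
    """Return the (lo, hi) quantile values that bracket the outcomes."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lo * last)], ordered[int(hi * last)]


def run_ensemble(n=1000, workers=8):
    members = sample_members(
        n, wind_mean=60.0, wind_err=5.0, track_mean=0.0, track_err=30.0
    )
    # Members share no state, so they can be farmed out in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        heights = list(pool.map(toy_surge_m, members))
    return bracket(heights)


if __name__ == "__main__":
    low, high = run_ensemble()
    print(f"Toy surge height, 5% to 95% band: {low:.2f} m to {high:.2f} m")
```

Because no member depends on another, the same structure scales from a thread pool to thousands of batch jobs; only the job submission layer would change.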
DP/ER Scenarios
• Simulating physical scenarios
  • Hurricanes, storm surges, floods, forest fires
  • Pollutant dispersal: chemical, oil, biological and nuclear spills
  • Disease epidemics
  • Earthquakes, tsunamis
  • Nuclear attacks
  • Loss of network nexus points (deliberate or side effect)
  • Astronomical impacts
• Simulating human responses to these situations
  • Roadways, evacuations, availability of resources
  • Detailed models (geography, transportation, cities, institutions)
  • Coupling human response models to specific physical scenarios
• Other possibilities
  • “Evacuation” of important data to safe storage
DP/ER and Grids: Some Implications
• DP/ER scenarios are not equally amenable to Grid approach
  • E.g., tsunami vs hurricane-induced flooding
  • Specialized Grids can be envisioned for very short response times
  • But all can be simulated “offline” by researchers
  • Other “longer term” scenarios
• ER is an extreme example of priority computing
  • Priority use of IT resources is common (conferences, etc.)
  • Is ER priority computing different in principle?
• Other implications
  • Requires long-term engagement with DP/ER research communities
  • (Atmospheric, ocean, coastal ocean, social/behavioral, economic)
  • Specific communities with specific applications to execute
  • Digital Divide: resources to solve problems of interest to 3rd World
  • Forcing function for Grid standards?
  • Legal liabilities?
Grid Project References
• Open Science Grid: www.opensciencegrid.org
• Grid3: www.ivdgl.org/grid3
• Virtual Data Toolkit: www.griphyn.org/vdt
• GriPhyN: www.griphyn.org
• iVDGL: www.ivdgl.org
• PPDG: www.ppdg.net
• CHEPREO: www.chepreo.org
• UltraLight: www.ultralight.org
• Globus: www.globus.org
• Condor: www.cs.wisc.edu/condor
• WLCG: www.cern.ch/lcg
• EGEE: www.eu-egee.org
Extra Slides
Grid3 Use by VOs Over 13 Months
CMS: “Compact” Muon Solenoid [Figure label: Inconsequential humans]
LHC: Beyond Moore’s Law [Plot labels: LHC CPU Requirements; Moore’s Law (2000)]
Grids and Globally Distributed Teams
• Non-hierarchical: Chaotic analyses + productions
• Superimpose significant random data flows
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN [Figure labels: Sloan Data; Galaxy cluster size distribution]
The LIGO Scientific Collaboration (LSC) and the LIGO Grid
• LIGO Grid: 6 US sites + 3 EU sites (Cardiff/UK, AEI/Germany)
[Map labels: Cardiff, AEI/Golm, Birmingham]
* LHO, LLO: LIGO observatory sites
* LSC: LIGO Scientific Collaboration