Building Large-Scale Cyberinfrastructure Through Collaboration: Opportunities and Challenges
Cyberinfrastructure Workshop, National Science Foundation, Jan. 29-30, 2007
Paul Avery, University of Florida, avery@phys.ufl.edu
Open Science Grid: July 20, 2005
• Consortium of many organizations (multiple disciplines)
• Production grid cyberinfrastructure
• 75+ sites, 24,000+ CPUs: US, Korea, Brazil, Taiwan
The Open Science Grid Consortium
[Diagram: the Open Science Grid at the center, drawing together science projects & communities, LHC experiments, U.S. grid projects, university facilities, regional and campus grids, education communities, multi-disciplinary facilities, laboratory centers, computer science, and technologists (network, HPC, …)]
Motivation: Data Intensive Science
• 21st century scientific discovery
  • Computationally & data intensive
  • Theory + experiment + simulation
  • Internationally distributed resources and collaborations
• Dominant factor: data growth (1 petabyte = 1000 terabytes)
  • 2000: ~0.5 petabyte
  • 2007: ~10 petabytes
  • 2013: ~100 petabytes
  • 2020: ~1000 petabytes
• Powerful cyberinfrastructure needed
  • Computation: massive, distributed CPU
  • Data storage & access: distributed high-speed storage
  • Data movement: international optical networks
  • Data sharing: global collaborations (100s - 1000s)
  • Software: managing all of the above
• How to collect, manage, access and interpret this quantity of data? (rough bandwidth sketch below)
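To make the data-growth numbers concrete, here is a back-of-the-envelope sketch (mine, not from the slides) of the average network bandwidth needed just to move each year's data volume once. The petabyte figures come from the list above; the decimal petabyte convention and everything else in the snippet are assumptions.

```python
# Back-of-the-envelope: sustained bandwidth needed to move a year's data once.
# Petabyte-per-year figures are from the slide; 1 PB = 10^15 bytes is assumed.

SECONDS_PER_YEAR = 365 * 24 * 3600

for year, petabytes in [(2000, 0.5), (2007, 10), (2013, 100), (2020, 1000)]:
    bits = petabytes * 1e15 * 8              # total bits in the year's data
    gbps = bits / SECONDS_PER_YEAR / 1e9     # average rate in gigabits per second
    print(f"{year}: ~{petabytes:>6} PB/yr  ->  ~{gbps:6.1f} Gb/s sustained average")
```

Even as an average, the 2013-2020 volumes imply tens to hundreds of gigabits per second; real traffic is bursty and data is replicated to many sites, which is why the multi-10 Gb/s tier links discussed later in the talk are needed.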
Open Science Grid Basics
• Who
  • Comp. scientists, IT specialists, physicists, biologists, etc.
• What
  • Common, shared computing and storage resources
  • High-speed production and research networks
  • Meeting place for research groups and facility organizations
• Vision
  • Maintain and operate a premier distributed computing facility
  • Provide education and training opportunities in its use
  • Expand reach & capacity to meet needs of stakeholders
  • Dynamically integrate new resources and applications
• Members and partners
  • Members: HPC facilities, campus, laboratory & regional grids
  • Partners: interoperation with TeraGrid, EGEE, NorduGrid, etc.
Principal Science Drivers
[Timeline on slide, 2001-2009: community growth and data growth]
• High energy and nuclear physics
  • Several petabytes, 2005
  • 100s of petabytes (LHC), 2007
• LIGO (gravity wave search)
  • 0.5 - several petabytes, 2002
• Digital astronomy
  • 10s of terabytes, 2001
  • 10s of petabytes, 2009
• Other sciences coming forward
  • Bioinformatics (10s of petabytes)
  • Nanoscience
  • Environmental
  • …
OSG Virtual Organizations
OSG Virtual Organizations (2)
Defining the Scale of OSG: Experiments at the Large Hadron Collider
[Diagram: the LHC at CERN, a 27 km tunnel spanning Switzerland & France, with the ATLAS, CMS, ALICE, LHCb and TOTEM experiments]
• Search for
  • Origin of mass
  • New fundamental forces
  • Supersymmetry
  • Other new particles
• Running 2007 - ?
OSG and the LHC Global Grid (CMS experiment shown)
• 5000 physicists, 60 countries
• 10s of petabytes/yr by 2009 (see throughput sketch below)
• CERN / Outside = 10-20%
[Tiered data-flow diagram: Online System -> CERN Computer Center (Tier 0) at 200-1500 MB/s; Tier 0 -> Tier 1 centers at 10-40 Gb/s; Tier 1 -> OSG Tier 2 centers at >10 Gb/s; Tier 2 -> Tier 3 at 2.5-10 Gb/s; Tier 4: physics caches and PCs. Sites shown include Korea, Russia, UK, FermiLab, U Florida, Caltech, UCSD, Maryland, Iowa, FIU]
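As a rough illustration (not from the slide itself), the link speeds quoted above translate into the following daily transfer capacities at full, never-achieved utilization; the tier-to-link mapping follows the diagram as read above.

```python
# Rough daily capacity of each tier-to-tier link, using the rates quoted above.
# Illustration only: assumes 100% sustained utilization, which real links never reach.

links_gbps = {
    "Online system -> Tier 0 (1500 MB/s upper end)": 1.5 * 8,   # ~12 Gb/s
    "Tier 0 -> Tier 1 (10-40 Gb/s, upper end)":       40,
    "Tier 1 -> Tier 2 (>10 Gb/s, lower bound)":       10,
    "Tier 2 -> Tier 3 (2.5-10 Gb/s, lower end)":      2.5,
}

for name, gbps in links_gbps.items():
    tb_per_day = gbps * 86400 / 8 / 1000   # Gb/s -> TB/day (decimal units)
    print(f"{name}: ~{tb_per_day:,.0f} TB/day")
```

At ~430 TB/day, a fully used 40 Gb/s Tier 0 to Tier 1 path could in principle carry roughly 150 PB/yr, so the "10s of petabytes/yr by 2009" figure above is plausible once realistic utilization is factored in.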
Crucial Ingredients in Building OSG
• Science "push": ATLAS, CMS, LIGO, SDSS
  • 1999: foresaw overwhelming need for distributed cyberinfrastructure
• Early funding: the "Trillium" consortium
  • PPDG: $12M (DOE), 1999-2006
  • GriPhyN: $12M (NSF), 2000-2006
  • iVDGL: $14M (NSF), 2001-2007
  • Supplements + new projects: UltraLight, CHEPREO, DISUN ($17M)
• Social networks: ~150 people with many overlaps
  • Universities, labs, SDSC, foreign partners
• Coordination: pooling resources, developing broad goals
  • Common middleware: Virtual Data Toolkit (VDT)
  • Multiple grid deployments/testbeds using VDT
  • Unified entity when collaborating internationally
  • Historically, a strong driver for funding agency collaboration
Standing on the Shoulders of Giants
• Previous NSF and DOE investments
  • NMI, Globus, Condor, MonALISA, …
  • DOEGrids Certificate Authority + KX509 infrastructure
• Open source software components
  • Linux (thank you, Mr. Redhat)
  • Apache, GNU tools, myProxy, mySQL, OpenSSH, Perl, Python, Squid, Tcl, UberFTP, Wiki, etc.
• Technologies created by members & partners
  • DRM, Gratia, GUMS, Netlogger, Pacman, Pegasus, PRIMA, pyGlobus, ROCKS, SRM, VDS, VOMS, …
• Integration through the Virtual Data Toolkit (next slide)
  • Our unique contribution
Integration via the Virtual Data Toolkit
[Flow diagram: computer science research feeds techniques & software into the Virtual Data Toolkit; requirements, deployment and feedback flow back from science, engineering and education communities; prototyping & experiments run on member & partner projects, grid deployments, and grid & networking testbeds; tech transfer reaches U.S. grids (Globus, Condor, NMI, TeraGrid, OSG), international partners (EGEE, WLCG, Asia, South America), and outreach programs (QuarkNet, CHEPREO, Digital Divide)]
• Other linkages
  • Work force
  • CS researchers
  • Industry
Communications: International Science Grid This Week (SGTW / iSGTW)
• ~2 years
• Diverse audience
• >1000 subscribers
• www.isgtw.org
Project Challenges
• Technical constraints
  • Commercial tools fall far short, require (too much) invention
  • Integration of advanced CI, e.g. networks (see following slides)
• Financial constraints (see funding slide)
  • Fragmented & short-term funding injections (most recently $30M / 5 years)
  • Fragmentation of individual efforts
• Distributed coordination and management
  • Tighter organization within member projects than across OSG as a whole
  • Coordination of schedules & milestones
  • Many phone/video meetings, travel
  • Knowledge dispersed; few people have a broad overview
Collaboration with National Lambda Rail (www.nlr.net)
• Optical, multi-wavelength, community owned or leased "dark fiber" (10 GbE) networks for R&E
• Spawning state-wide and regional networks (FLR, SURA, LONI, …)
LHCNet: Transatlantic Link to CERN (NSF/IRNC, DOE/ESnet, 2006/2007)
[Network map: LHCNet data network, 4 x 10 Gbps to the US, connecting CERN to US hubs with high-speed cross connects to Internet2/Abilene; ESnet IP core (>=10 Gbps enterprise IP traffic) and ESnet Science Data Network second core (30-50 Gbps circuit-based transport over National Lambda Rail) linking DOE Office of Science sites (BNL, FNAL) via metropolitan area rings and hubs (SEA, SNV, DEN, ALB, ELP, SDG, CHI, NYC, DC, ATL); international peerings via GEANT2 and SURFnet (the NSF/IRNC GVA-AMS circuit), IN2P3, and links to Europe, Japan, Asia-Pacific and Australia]
Funding & Milestones: 1999 - 2007
[Timeline spanning 2000-2007; legend: grid & networking projects, large experiments, education/outreach/training]
• Projects: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), CHEPREO ($4M), UltraLight ($2M), DISUN ($10M), OSG ($30M, NSF & DOE)
• Milestones: grid communications, first US-LHC grid testbeds, VDT 1.0, Grid3 start, VDT 1.3, OSG start, LIGO Grid, LHC start
• Education & outreach: Grid Summer Schools (2004, 2005, 2006), Digital Divide Workshops (2004, 2005, 2006)
Challenges from Diversity and Growth
• Management of an increasingly diverse enterprise
  • Science/engineering projects, organizations and disciplines as distinct cultures
  • Accommodating new member communities (expectations?)
• Interoperation with other major grids
  • TeraGrid
  • International partners (EGEE, NorduGrid, etc.)
  • New models of interoperation and service, e.g. caBIG
• Education, outreach and training
  • Training for researchers, students
  • … but also project PIs, program officers
• Operating a rapidly growing cyberinfrastructure
  • Management of and access to rapidly increasing data stores (see following slide)
  • Monitoring, accounting, achieving high utilization (slide)
  • Scalability of the support model (slide)
Rapid Cyberinfrastructure Growth: LHC
• Meeting LHC service challenges & milestones
• Participating in worldwide simulation productions
[Chart: projected CPU growth across CERN, Tier-1 and Tier-2 centers; callout: 2008: ~140,000 PCs; scale: 3 GHz P4 ~ 1 kSI2000]
(A rough conversion of the 2008 callout into aggregate capacity is sketched below.)
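A quick conversion (mine, not on the slide) of the chart's 2008 callout into aggregate compute capacity, using the slide's own rule of thumb that one 3 GHz Pentium 4 is roughly 1 kSI2000.

```python
# Convert the chart's ~140,000-PC callout into aggregate SPECint2000 capacity,
# using the slide's rule of thumb: one 3 GHz P4 ~ 1 kSI2000. Illustrative only.
pcs_2008 = 140_000       # approximate PC count called out for 2008
ksi2000_per_pc = 1.0     # 3 GHz P4 ~ 1 kSI2000 (from the slide)

total_ksi2000 = pcs_2008 * ksi2000_per_pc
print(f"~{total_ksi2000:,.0f} kSI2000 (~{total_ksi2000 / 1000:.0f} MSI2000) aggregate in 2008")
```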
Jobs Snapshot: 90 Days
• 5000 simultaneous jobs from multiple VOs
[Plot covering October through January]
OSG Operations: Distributed Model
• Scalability! (VOs, sites, providers)
• Rigorous problem tracking & routing
• Security
• Provisioning
• Monitoring
• Reporting
• Partners with EGEE operations
Extra Slides
Cyberinfrastructure and Grids
• Grid: geographically distributed computing resources configured for coordinated use
• Fabric: physical resources & networks provide raw capability
• Ownership: resources controlled by owners and shared with others
• Middleware: software ties it all together: tools, services, etc.
• Enhancing collaboration via transparent resource sharing
[Illustration: the US-CMS "Virtual Organization"]
(A toy sketch of these layers follows below.)
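A toy sketch (illustrative only, not OSG software) of the layering described above: sites are the "fabric", each site's owner decides which virtual organizations it admits, and a minimal matchmaker stands in for the middleware that routes a VO's jobs to eligible resources. The site and VO names are taken from elsewhere in the talk; the data structures and matching rule are assumptions.

```python
# Toy model of the grid layering above (not OSG code):
#   fabric     = sites with CPUs
#   ownership  = per-site policy on which VOs may use them
#   middleware = a matchmaker that routes a VO's job to an eligible site
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int
    allowed_vos: set            # ownership policy set by the resource owner

def match_site(job_vo, sites):
    """Pick the eligible site with the most free CPUs (a crude 'broker')."""
    eligible = [s for s in sites if job_vo in s.allowed_vos and s.free_cpus > 0]
    return max(eligible, key=lambda s: s.free_cpus, default=None)

sites = [
    Site("FermiLab", 1200, {"uscms", "sdss"}),
    Site("UFlorida",  300, {"uscms", "ligo"}),
    Site("Caltech",     0, {"uscms"}),
]
print(match_site("uscms", sites).name)  # -> FermiLab (most free CPUs)
print(match_site("ligo",  sites).name)  # -> UFlorida (only site admitting LIGO)
```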
My Background and This Talk
• Professor of Physics: high energy physics
  • CLEO experiment at Cornell (DOE)
  • CMS experiment at CERN (DOE)
• Director of two NSF/ITR grid projects
  • GriPhyN (2000-2006), iVDGL (2001-2007)
  • Co-P.I. of several others
• Today: experiences in the Open Science Grid
  • Building and sustaining national-scale grid cyberinfrastructure
  • Serving multiple science research communities
Common Middleware: Virtual Data Toolkit
[Build-and-test flow: contributors' sources (Globus, Condor, myProxy, …) are pulled from CVS into the NMI (NSF-supported) build & test system, which runs on a Condor pool of ~100 computers spanning >20 operating systems; binaries are built, tested and patched, then packaged as RPMs and a Pacman cache for users]
• VDT: package, test, deploy, support, upgrade, troubleshoot (toy sketch below)
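A toy sketch (purely illustrative, not the real VDT tooling) of the cycle in the diagram: take upstream components, build and test them per platform, and publish the surviving binaries as one versioned bundle. The component names and the 1.3 version come from these slides; the platform names and function shapes are assumptions.

```python
# Toy sketch of the VDT cycle: build & test each upstream component on each
# platform, then bundle what passes into a versioned release. Illustrative only;
# the real VDT used the NMI build/test facility and Pacman caches.

COMPONENTS = ["Globus", "Condor", "myProxy"]                  # from the slide
PLATFORMS = ["RHEL-3_x86", "RHEL-4_x86_64", "Debian-3_x86"]   # assumed examples

def build_and_test(component: str, platform: str) -> bool:
    """Stand-in for the NMI build-and-test run on the ~100-node Condor pool."""
    return True   # pretend every build and test passes

def make_release(version: str) -> dict:
    binaries = {
        comp: [p for p in PLATFORMS if build_and_test(comp, p)]
        for comp in COMPONENTS
    }
    return {"vdt_version": version, "binaries": binaries}

print(make_release("1.3"))   # version number taken from the milestones slide
```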
LHC Global Collaborations (ATLAS, CMS)
• 2000 – 3000 physicists per experiment
• USA is 20–31% of total
Long-Term Trends in Network Traffic Volumes: 300-1000x per 10 Years
[Chart: ESnet accepted traffic, 1990-2005, in terabytes per month; exponential growth averaging +82%/yr over the last 15 years, with "progress in steps"; 2 x 10 Gbit/s capacity noted; sources: L. Cottrell, W. Johnston]
• 2005 SLAC traffic ~400 Mbps
• Growth in steps (ESnet limit): ~10x / 4 years
(Quick consistency check of the growth rate below.)
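A quick consistency check (mine, not on the slide): compounding the quoted +82%/yr average growth over a decade gives a factor inside the 300-1000x range in the title.

```latex
% Compound the quoted average growth rate over ten years:
(1 + 0.82)^{10} \;=\; 1.82^{10} \;\approx\; 4.0 \times 10^{2}
% i.e. roughly a 400x increase per decade, within the quoted 300-1000x range.
```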
Evolving Science Requirements for Networks (DOE High Performance Network Workshop)
• See http://www.doecollaboratory.org/meetings/hpnpw/
OSG Organization
e-Lab and i-Lab Projects
• Subsumed under the I2U2 program
• www.i2u2.org/
CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
• Physics Learning Center
• CMS research
• Cyberinfrastructure
• WHREN network (S. America)
• Additional initiatives: CyberBridges, Global CyberBridges, networking initiatives, etc.
• www.chepreo.org
Digital Divide Effort
• Background: ICFA/SCIC (Standing Committee on Inter-regional Connectivity)
• Themes: global collaborations, grids, and addressing the Digital Divide
• Focus on poorly connected regions
• Workshops: Brazil (2004), Korea (2005), Poland (2006)
Grid Summer Schools
• Summer 2004, 2005, 2006
  • One week at South Padre Island, Texas
  • Lectures plus hands-on exercises for ~40 students
  • Students of differing backgrounds (physics + CS), minorities
• Reaching a wider audience
  • Lectures, exercises, video, on the web
  • More tutorials, 3-4 per year
  • Students, postdocs, scientists
  • Agency-specific tutorials