Building Large-Scale Cyberinfrastructure Through Collaboration: Opportunities and Challenges
Cyberinfrastructure Workshop, National Science Foundation, Jan. 29-30, 2007
Paul Avery, University of Florida, avery@phys.ufl.edu


Presentation Transcript


1. Building Large-Scale Cyberinfrastructure Through Collaboration: Opportunities and Challenges
   Cyberinfrastructure Workshop, National Science Foundation, Jan. 29-30, 2007
   Paul Avery, University of Florida, avery@phys.ufl.edu

2. Open Science Grid: July 20, 2005
   • Consortium of many organizations (multiple disciplines)
   • Production grid cyberinfrastructure
   • 75+ sites, 24,000+ CPUs: US, Korea, Brazil, Taiwan

3. The Open Science Grid Consortium
   [Diagram: Open Science Grid at the center, connected to science projects & communities, U.S. grid projects, LHC experiments, university facilities, regional and campus grids, education communities, multi-disciplinary facilities, computer science, laboratory centers, and technologists (network, HPC, …)]

4. Motivation: Data Intensive Science
   • 21st century scientific discovery
     • Computationally & data intensive
     • Theory + experiment + simulation
     • Internationally distributed resources and collaborations
   • Dominant factor: data growth (1 petabyte = 1000 terabytes); see the rate check after this slide
     • 2000: ~0.5 petabyte
     • 2007: ~10 petabytes
     • 2013: ~100 petabytes
     • 2020: ~1000 petabytes
   • Powerful cyberinfrastructure needed
     • Computation: massive, distributed CPU
     • Data storage & access: distributed hi-speed storage
     • Data movement: international optical networks
     • Data sharing: global collaborations (100s – 1000s)
     • Software: managing all of the above
   How to collect, manage, access and interpret this quantity of data?
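As a rough check of the data-growth bullets above, the slide's projection from ~0.5 PB in 2000 to ~1000 PB in 2020 implies roughly a 46% compound annual growth rate. A minimal Python sketch, using only the figures quoted on the slide:

```python
# Compound annual growth rate implied by the slide's projections:
# ~0.5 petabyte in 2000 growing to ~1000 petabytes in 2020.
start_pb, end_pb = 0.5, 1000.0
years = 2020 - 2000
cagr = (end_pb / start_pb) ** (1.0 / years) - 1.0
print(f"Implied data growth: ~{cagr:.0%} per year")   # ~46% per year, compounding
```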

5. Open Science Grid Basics
   • Who
     • Comp. scientists, IT specialists, physicists, biologists, etc.
   • What
     • Common, shared computing and storage resources
     • High-speed production and research networks
     • Meeting place for research groups and facility organizations
   • Vision
     • Maintain and operate a premier distributed computing facility
     • Provide education and training opportunities in its use
     • Expand reach & capacity to meet needs of stakeholders
     • Dynamically integrate new resources and applications
   • Members and partners
     • Members: HPC facilities, campus, laboratory & regional grids
     • Partners: interoperation with TeraGrid, EGEE, NorduGrid, etc.

6. Principal Science Drivers
   [Chart: community growth and data growth, 2001 – 2009]
   • High energy and nuclear physics: several petabytes (2005), 100s of petabytes (LHC, 2007)
   • LIGO (gravity wave search): 0.5 to several petabytes (2002)
   • Digital astronomy: 10s of terabytes (2001), 10s of petabytes (2009)
   • Other sciences coming forward: bioinformatics (10s of petabytes), nanoscience, environmental, …

7. OSG Virtual Organizations

8. OSG Virtual Organizations (2)

9. Defining the Scale of OSG: Experiments at the Large Hadron Collider
   • 27 km tunnel in Switzerland & France
   • [Diagram: LHC @ CERN with the ATLAS, CMS, ALICE, LHCb and TOTEM experiments]
   • Search for: origin of mass, new fundamental forces, supersymmetry, other new particles
   • 2007 – ?

10. OSG and LHC Global Grid (CMS Experiment)
   • 5000 physicists, 60 countries
   • 10s of petabytes/yr by 2009
   • CERN / outside = 10-20%
   [Diagram: CMS online system feeds the CERN computer center (Tier 0) at 200 - 1500 MB/s; Tier 0 to Tier 1 centers (FermiLab, Korea, Russia, UK) at 10-40 Gb/s; Tier 1 to OSG Tier 2 sites (U Florida, Caltech, UCSD, Maryland, Iowa, FIU) at >10 Gb/s; Tier 2 to Tier 3 at 2.5-10 Gb/s; Tier 4 is physics caches and PCs]
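To give a sense of what the link speeds in the tier diagram mean in practice, the sketch below estimates how long one petabyte takes over a dedicated 10 Gb/s Tier 0 to Tier 1 link. This is an illustrative back-of-the-envelope calculation, not part of the original slide; it ignores protocol overhead and contention.

```python
# Back-of-the-envelope transfer time for 1 PB over a 10 Gb/s link
# (low end of the 10-40 Gb/s Tier 0 -> Tier 1 range quoted on the slide).
petabyte_bits = 1e15 * 8              # 1 PB expressed in bits
link_bps = 10e9                       # 10 Gb/s, ignoring protocol overhead
seconds = petabyte_bits / link_bps
print(f"1 PB at 10 Gb/s takes ~{seconds / 86400:.1f} days")   # ~9.3 days
```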

11. Crucial Ingredients in Building OSG
   • Science "push": ATLAS, CMS, LIGO, SDSS
     • 1999: foresaw overwhelming need for distributed cyberinfrastructure
   • Early funding: "Trillium" consortium
     • PPDG: $12M (DOE) (1999 – 2006)
     • GriPhyN: $12M (NSF) (2000 – 2006)
     • iVDGL: $14M (NSF) (2001 – 2007)
     • Supplements + new projects: UltraLight, CHEPREO, DISUN ($17M)
   • Social networks: ~150 people with many overlaps
     • Universities, labs, SDSC, foreign partners
   • Coordination: pooling resources, developing broad goals
     • Common middleware: Virtual Data Toolkit (VDT)
     • Multiple grid deployments/testbeds using VDT
     • Unified entity when collaborating internationally
     • Historically, a strong driver for funding agency collaboration

12. Standing on the Shoulders of Giants
   • Previous NSF and DOE investments
     • NMI, Globus, Condor, MonALISA, …
     • DOEGrids Certificate Authority + KX509 infrastructure
   • Open source software components
     • Linux (thank you, Mr. Redhat)
     • Apache, GNU tools, myProxy, mySQL, OpenSSH, Perl, Python, Squid, Tcl, UberFTP, Wiki, etc.
   • Technologies created by members & partners
     • DRM, Gratia, GUMS, Netlogger, Pacman, Pegasus, PRIMA, pyGlobus, ROCKS, SRM, VDS, VOMS, …
   • Integration through the Virtual Data Toolkit (next slide)
     • Our unique contribution

13. Integration via Virtual Data Toolkit
   [Diagram: computer science research supplies techniques & software to the Virtual Data Toolkit; the VDT is deployed to science, engineering and education communities, with requirements, deployment feedback, and prototyping & experiments flowing back from member & partner projects, grid deployments, and grid & networking testbeds; tech transfer to U.S. grids (Globus, Condor, NMI, TeraGrid, OSG), international partners (EGEE, WLCG, Asia, South America), and outreach (QuarkNet, CHEPREO, Digital Divide); other linkages include work force, CS researchers, and industry]

14. Communications: International Science Grid This Week (SGTW → iSGTW)
   • ~2 years
   • Diverse audience
   • >1000 subscribers
   • www.isgtw.org

15. Project Challenges
   • Technical constraints
     • Commercial tools fall far short, require (too much) invention
     • Integration of advanced CI, e.g. networks (slides)
   • Financial constraints (slide)
     • Fragmented & short-term funding injections (recent $30M / 5 years)
     • Fragmentation of individual efforts
   • Distributed coordination and management
     • Tighter organization within member projects compared to OSG
     • Coordination of schedules & milestones
     • Many phone/video meetings, travel
     • Knowledge dispersed, few people have broad overview

16. Collaboration with Internet2 (www.internet2.edu)

17. Collaboration with National Lambda Rail (www.nlr.net)
   • Optical, multi-wavelength, community-owned or leased "dark fiber" (10 GbE) networks for R&E
   • Spawning state-wide and regional networks (FLR, SURA, LONI, …)

18. LHCNet: Transatlantic Link to CERN (NSF/IRNC, DOE/ESnet, 2006/2007)
   [Network map: LHCNet data network (4 x 10 Gbps to the US) linking CERN to FNAL and BNL over an NSF/IRNC circuit, with the GVA-AMS connection via SURFnet or GEANT2; ESnet production IP core (≥10 Gbps, enterprise IP traffic) and ESnet Science Data Network second core (30-50 Gbps on National Lambda Rail, 40-60 Gbps circuit-based transport); metropolitan area rings, existing and new ESnet hubs (SEA, SNV, DEN, ALB, ELP, SDG, CHI, NYC, DC, ATL), major DOE Office of Science sites, high-speed cross connects with Internet2/Abilene, and international links to Europe (GEANT2, SURFnet, IN2P3), Japan, AsiaPac, and Australia]

19. Funding & Milestones: 1999 – 2007
   [Timeline, 2000-2007: PPDG $9.5M; GriPhyN $12M; iVDGL $14M; CHEPREO $4M; UltraLight $2M; DISUN $10M; OSG $30M (NSF, DOE); grid communications; first US-LHC grid testbeds; Grid3 start; OSG start; LHC start; VDT 1.0 and VDT 1.3; LIGO Grid; Grid Summer Schools 2004, 2005, 2006; Digital Divide Workshops 2004, 2005, 2006]
   • Grid, networking projects
   • Large experiments
   • Education, outreach, training

20. Challenges from Diversity and Growth
   • Management of an increasingly diverse enterprise
     • Sci/Eng projects, organizations, disciplines as distinct cultures
     • Accommodating new member communities (expectations?)
   • Interoperation with other major grids
     • TeraGrid
     • International partners (EGEE, NorduGrid, etc.)
     • New models of interoperation and service, e.g. caBIG
   • Education, outreach and training
     • Training for researchers, students
     • … but also project PIs, program officers
   • Operating a rapidly growing cyberinfrastructure
     • Management of and access to rapidly increasing data stores (slide)
     • Monitoring, accounting, achieving high utilization (slide)
     • Scalability of support model (slide)

21. Rapid Cyberinfrastructure Growth: LHC
   • Meeting LHC service challenges & milestones
   • Participating in worldwide simulation productions
   [Chart: projected CPU capacity at CERN, Tier-1 and Tier-2 centers; 2008: ~140,000 PCs (3 GHz P4 ~ 1 kSI2000)]
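Using the slide's rule of thumb that a 3 GHz Pentium 4 delivers about 1 kSI2000, the 2008 projection of ~140,000 PCs corresponds to roughly 140 MSI2000 of aggregate capacity. A minimal sketch of that arithmetic, with the conversion factor taken as stated on the slide:

```python
# Aggregate capacity implied by the 2008 projection on the slide.
pcs_2008 = 140_000            # ~140,000 PCs projected for 2008
ksi2000_per_pc = 1            # slide's rule of thumb: 3 GHz P4 ~ 1 kSI2000
total_ksi2000 = pcs_2008 * ksi2000_per_pc
print(f"~{total_ksi2000:,} kSI2000 (~{total_ksi2000 / 1000:.0f} MSI2000) in 2008")
```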

22. Jobs Snapshot: 90 Days
   [Chart, Oct – Jan: ~5000 simultaneous jobs from multiple VOs]

23. OSG Operations
   • Distributed model: scalability!
     • VOs, sites, providers
   • Rigorous problem tracking & routing
   • Security
   • Provisioning
   • Monitoring
   • Reporting
   • Partners with EGEE operations

24. Extra Slides

25. Cyberinfrastructure and Grids
   • Grid: geographically distributed computing resources configured for coordinated use
   • Fabric: physical resources & networks provide raw capability
   • Ownership: resources controlled by owners and shared with others
   • Middleware: software ties it all together (tools, services, etc.)
   • Enhancing collaboration via transparent resource sharing
   [Image: US-CMS "Virtual Organization"]

26. My Background and This Talk
   • Professor of Physics: High Energy Physics
     • CLEO experiment at Cornell (DOE)
     • CMS experiment at CERN (DOE)
   • Director of two NSF/ITR Grid projects
     • GriPhyN (2000-2006), iVDGL (2001-2007)
     • Co-P.I. of several others
   • Today: experiences in Open Science Grid
     • Building and sustaining national-scale grid cyberinfrastructure
     • Serving multiple science research communities

27. Common Middleware: Virtual Data Toolkit
   [Diagram: contributors (Globus, Condor, myProxy, …) supply sources (CVS); NMI (NSF supported) builds and tests binaries on a Condor pool (~100 computers, >20 operating systems); binaries are tested, patched, and packaged into RPMs and a Pacman cache for users]
   • VDT: package, test, deploy, support, upgrade, troubleshoot

28. LHC Global Collaborations: ATLAS and CMS
   • 2000 – 3000 physicists per experiment
   • USA is 20–31% of total

29. Long Term Trends in Network Traffic Volumes: 300-1000X per 10 years
   [Chart: ESnet accepted traffic 1990 – 2005 in terabytes per month; exponential growth averaging +82%/yr for the last 15 years, with progress in steps; 2 x 10 Gbit/s; W. Johnston, L. Cottrell]
   • 2005 SLAC traffic ~400 Mbps
   • Growth in steps (ESnet limit): ~10X per 4 years
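The two growth figures on this slide are mutually consistent: +82% per year compounds to roughly 400x over a decade, inside the quoted 300-1000X range. A quick Python check using only the numbers from the slide:

```python
# Does +82%/yr average growth match the headline "300-1000X per 10 years"?
annual_growth = 0.82
factor_10yr = (1 + annual_growth) ** 10
print(f"+82%/yr for 10 years -> ~{factor_10yr:.0f}x")   # ~400x, within 300-1000X
```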

30. Evolving Science Requirements for Networks (DOE High Performance Network Workshop)
   • See http://www.doecollaboratory.org/meetings/hpnpw/

31. OSG Organization

32. e-Lab and i-Lab Projects
   • Subsumed under the I2U2 program
   • www.i2u2.org/

33. CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
   • Physics Learning Center
   • CMS research
   • Cyberinfrastructure
   • WHREN network (S. America)
   • Additional initiatives: CyberBridges, Global CyberBridges, networking initiatives, etc.
   • www.chepreo.org

34. Digital Divide Effort
   • Background: ICFA/SCIC (Standing Committee on Inter-regional Connectivity)
   • Themes: global collaborations, Grids, and addressing the Digital Divide
   • Focus on poorly connected regions
   • Brazil (2004), Korea (2005), Poland (2006)

35. Grid Summer Schools
   • Summer 2004, 2005, 2006
     • 1 week @ South Padre Island, Texas
     • Lectures plus hands-on exercises for ~40 students
     • Students of differing backgrounds (physics + CS), minorities
   • Reaching a wider audience
     • Lectures, exercises, video on the web
     • More tutorials, 3-4/year
     • Students, postdocs, scientists
     • Agency-specific tutorials
