High Energy & Nuclear Physics Experiments and Advanced Cyberinfrastructure
www.opensciencegrid.org
Internet2 Meeting, San Diego, CA, October 11, 2007
Paul Avery, University of Florida, avery@phys.ufl.edu
Context: Open Science Grid
• Consortium of many organizations (multiple disciplines)
• Production grid cyberinfrastructure
• 75+ sites, 30,000+ CPUs: US, UK, Brazil, Taiwan
OSG Science Drivers
• Experiments at the Large Hadron Collider: new fundamental particles and forces, 100s of petabytes, 2008 – ?
• High Energy & Nuclear Physics experiments: top quark, nuclear matter at extreme density, ~10 petabytes, 1997 – present
• LIGO: search for gravitational waves, ~few petabytes, 2002 – present
Future Grid resources
• Massive CPU (PetaOps)
• Large distributed datasets (>100 PB)
• Global communities (1000s)
• International optical networks
(Slide chart: community growth and data growth, 2001–2009)
OSG History in Context
Primary Drivers: LHC and LIGO
(Slide timeline, 1999–2009: PPDG (DOE), GriPhyN (NSF), and iVDGL (NSF) combine as Trillium/Grid3 and evolve into OSG (DOE+NSF); in parallel, LIGO moves from preparation to operation and the LHC from construction, preparation, and commissioning to operations, alongside the European Grid + Worldwide LHC Computing Grid and campus/regional grids)
LHC Experiments at CERN
• 27 km tunnel in Switzerland & France
• Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
Search for:
• Origin of mass
• New fundamental forces
• Supersymmetry
• Other new particles
• 2008 – ?
Collisions at LHC (2008?)
• Proton–proton: 2835 bunches/beam, ~10^11 protons/bunch
• Beam energy: 7 TeV x 7 TeV
• Luminosity: 10^34 cm^-2 s^-1
• Bunch crossing every 25 nsec (~20 collisions/crossing)
• Collision rate ~10^9 Hz
• New physics rate ~10^-5 Hz
• Selection: 1 in 10^14 (a quick consistency check follows below)
(Slide figure: proton–proton collision producing partons (quarks, gluons), leptons, and jets; example signatures include Higgs, Z0 pairs, SUSY, …)
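A quick consistency check of the rates above, using only the numbers on this slide (order-of-magnitude arithmetic added here, not from the original talk):

\[
R_{\mathrm{coll}} \approx \frac{\sim 20\ \text{collisions}}{25\ \mathrm{ns}} = 20 \times 4\times 10^{7}\ \mathrm{Hz} = 8\times 10^{8}\ \mathrm{Hz} \sim 10^{9}\ \mathrm{Hz},
\qquad
R_{\mathrm{new\ physics}} \sim 10^{9}\ \mathrm{Hz} \times 10^{-14} \approx 10^{-5}\ \mathrm{Hz}.
\]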
LHC Data and CPU Requirements (ATLAS, CMS, LHCb)
Storage
• Raw recording rate 0.2 – 1.5 GB/s (rough yearly volume estimated below)
• Large Monte Carlo data samples
• 100 PB by ~2012
• 1000 PB later in decade?
Processing
• PetaOps (> 600,000 3 GHz cores)
Users
• 100s of institutes
• 1000s of researchers
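A rough scale check of the storage figures (a sketch only; the ~10^7 live seconds per year is a common assumption for such estimates and is not taken from the slide):

\[
(0.2\text{–}1.5)\ \mathrm{GB/s} \times 10^{7}\ \mathrm{s/yr} \approx 2\text{–}15\ \mathrm{PB/yr}\ \text{of raw data per experiment},
\]

which, once Monte Carlo samples and derived datasets are added across the experiments, is consistent with the ~100 PB by ~2012 figure quoted above.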
LHC Global Collaborations (ATLAS, CMS)
• 2000 – 3000 physicists per experiment
• USA is 20–31% of total
LHC Global Grid (CMS example)
• 5000 physicists, 60 countries
• 10s of petabytes/yr by 2009
• CERN / Outside = 10-20%
Tier structure and link speeds shown on the slide (see the transfer-time sketch below):
• CMS Experiment online system to CERN Computer Center (Tier 0): 200 - 1500 MB/s
• Tier 0 to Tier 1 centers (FermiLab, Korea, Russia, UK, …): 10-40 Gb/s
• Tier 1 to OSG Tier 2 sites (Caltech, U Florida, UCSD, Maryland, …): >10 Gb/s
• Tier 2 to Tier 3/Tier 4 (Iowa, FIU, …; physics caches, PCs): 2.5-10 Gb/s
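A minimal sketch of what these link speeds imply for bulk data movement between tiers. The dataset size and sustained-utilization fraction are hypothetical; the link speeds come from the slide. This is illustrative arithmetic, not OSG or CMS software:

```python
# Back-of-envelope: time to move a dataset over the tier links quoted above.
# The dataset size and sustained-utilization fraction are hypothetical inputs.

def transfer_days(dataset_tb: float, link_gbps: float, utilization: float = 0.5) -> float:
    """Days needed to move dataset_tb terabytes over a link_gbps link at a sustained utilization."""
    bits = dataset_tb * 1e12 * 8                      # TB -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)  # effective bits per second
    return seconds / 86400

# Example: a (hypothetical) 100 TB sample moved Tier-1 -> Tier-2 over a 10 Gb/s link
print(f"{transfer_days(100, 10):.1f} days")           # ~1.9 days at 50% utilization
```

At the 2.5 Gb/s low end quoted for the lower tiers, the same 100 TB would take roughly a week under the same assumptions.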
LHC Global Grid (figure: J. Knobloch)
• 11 Tier-1 sites
• 112 Tier-2 sites (growing)
• 100s of universities
LHC Cyberinfrastructure Growth: CPU
• Multi-core boxes
• AC & power challenges
(Slide chart: projected growth to ~100,000 cores, broken out by CERN, Tier-1, and Tier-2)
LHC Cyberinfrastructure Growth: Disk
(Slide chart: projected disk growth to ~100 petabytes, broken out by CERN, Tier-1, and Tier-2)
LHC Cyberinfrastructure Growth: Tape
(Slide chart: projected tape growth to ~100 petabytes, broken out by CERN and Tier-1)
HENP Bandwidth Roadmap for Major Links (in Gbps)
• Paralleled by ESnet roadmap
(Slide table: projected bandwidths of major HENP links by year)
HENP Collaboration with Internet2 (www.internet2.edu)
• HENP SIG
HENP Collaboration with NLR (www.nlr.net)
• UltraLight and other networking initiatives
• Spawning state-wide and regional networks (FLR, SURA, LONI, …)
US LHCNet, ESnet Plan 2007-2010: 30-80 Gbps US-CERN
• US-LHCNet (NY-CHI-GVA-AMS), 2007-10: 30, 40, 60, 80 Gbps (3 to 8 x 10 Gbps US-CERN)
• ESnet4 SDN core: 30-50 Gbps; Science Data Network circuit transport 40-60 Gbps; production IP ESnet core ≥10 Gbps for enterprise IP traffic
• ESnet MANs to FNAL & BNL; dark fiber to FNAL; peering with GEANT
• High-speed cross connects with Internet2/Abilene
• NSF/IRNC circuit; GVA-AMS connection via SURFnet or GEANT2
(Slide map: ESnet hubs, metropolitan area rings, and major DOE Office of Science sites, with international links to GEANT2, SURFNet, IN2P3, Japan, Australia, and AsiaPac)
Tier1–Tier2 Data Transfers: 2006–07
(Slide chart: transfer rates from Sep. 2006 through Sep. 2007, including CSA06, reaching ~1 GB/sec)
US: FNAL Transfer Rates to Tier-2 Universities (Computing, Offline and CSA07)
• ~1 GB/s sustained to one well-configured site (Nebraska, June 2007)
• But ~10 such sites are expected in the near future: a network challenge (see the arithmetic below)
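The "network challenge" is simple unit arithmetic (the factor of ~10 sites is the slide's own near-term estimate):

\[
1\ \mathrm{GB/s} \approx 8\ \mathrm{Gb/s}\ \text{per well-configured Tier-2};\qquad
\sim 10\ \text{such sites} \Rightarrow \sim 80\ \mathrm{Gb/s}\ \text{aggregate out of FNAL},
\]

comparable to the full multi-10 Gb/s US LHCNet/ESnet capacity planned for 2007-2010.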
Current Data Transfer Experience
• Transfers are generally much slower than expected, or stop altogether
• Potential causes difficult to diagnose
  • Configuration problem? Loading? Queuing?
  • Database errors, experiment S/W error, grid S/W error?
  • End-host problem? Network problem? Application failure?
• Complicated recovery
  • Insufficient information
  • Too slow to diagnose and correlate at the time the error occurs
• Result
  • Lower transfer rates, longer troubleshooting times
  • Need intelligent services, smart end-host systems (sketched below)
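As an illustration of the kind of "intelligent service" the slide calls for, here is a minimal, hypothetical health check that flags a transfer as degraded or stalled when its achieved rate falls well below expectation. The class, function names, thresholds, and example numbers are all assumptions for illustration; real experiment transfer systems are far more involved:

```python
# Hypothetical sketch of a transfer health check: flag slow or stalled transfers
# so the causes listed above can be triaged sooner. Not an actual OSG/experiment tool.
from dataclasses import dataclass

@dataclass
class Transfer:
    name: str
    bytes_moved: float      # bytes transferred so far
    elapsed_s: float        # seconds since the transfer started
    expected_mbps: float    # expected rate for this link, in MB/s

def diagnose(t: Transfer, stall_frac: float = 0.05, slow_frac: float = 0.5) -> str:
    rate = t.bytes_moved / t.elapsed_s / 1e6          # achieved rate in MB/s
    if rate < stall_frac * t.expected_mbps:
        return f"{t.name}: STALLED at {rate:.1f} MB/s - check end hosts, grid services, network path"
    if rate < slow_frac * t.expected_mbps:
        return f"{t.name}: DEGRADED at {rate:.1f} MB/s (expected ~{t.expected_mbps:.0f} MB/s)"
    return f"{t.name}: OK at {rate:.1f} MB/s"

# Example with made-up numbers: 200 GB moved in one hour on a link expected to do 200 MB/s
print(diagnose(Transfer("FNAL -> Tier-2", bytes_moved=2e11, elapsed_s=3600, expected_mbps=200)))
```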
UltraLight: Integrating Advanced Networking in Applications (http://www.ultralight.org)
• Funded by NSF
• 10 Gb/s+ network
• Caltech, UF, FIU, UM, MIT
• SLAC, FNAL
• Int'l partners
• Level(3), Cisco, NLR
UltraLight Testbed (www.ultralight.org)
• Funded by NSF
(Slide map of testbed sites and links)
Many Near-Term Challenges
• Network
  • Bandwidth, bandwidth, bandwidth
  • Need for intelligent services, automation
  • More efficient utilization of network (protocols, NICs, S/W clients, pervasive monitoring)
• Better collaborative tools
• Distributed authentication?
• Scalable services: automation
• Scalable support
END
Extra Slides
The Open Science Grid Consortium
(Slide diagram: Open Science Grid at the center, connected to science projects & communities, U.S. grid projects, LHC experiments, university facilities, regional and campus grids, education communities, multi-disciplinary facilities, computer science, laboratory centers, and technologists (network, HPC, …))
CMS: "Compact" Muon Solenoid
(Slide photo of the detector, with "inconsequential humans" labeled for scale)
Collision Complexity: CPU + Storage
(Slide event displays: all charged tracks with pT > 2 GeV vs. reconstructed tracks with pT > 25 GeV, with +30 minimum bias events superimposed)
• 10^9 collisions/sec, selectivity: 1 in 10^13 (resulting signal rate worked out below)
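The selectivity quoted here translates directly into a signal rate (order-of-magnitude arithmetic only):

\[
10^{9}\ \mathrm{Hz} \times 10^{-13} = 10^{-4}\ \mathrm{Hz} \approx 1\ \text{selected event every}\ \sim 3\ \text{hours of running}.
\]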
LHC Data Rates: Detector to Storage (physics filtering chain)
• Detector output: 40 MHz, ~TBytes/sec
• Level 1 Trigger (special hardware): 75 KHz, 75 GB/sec
• Level 2 Trigger (commodity CPUs): 5 KHz, 5 GB/sec
• Level 3 Trigger (commodity CPUs): 100 Hz, 0.15 – 1.5 GB/sec
• Raw data to storage (+ simulated data)
(Implied rejection factor and event size are worked out below)
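The rejection factor and per-event size implied by this chain (derived here from the rates above; the event-size number is an inference, not stated on the slide):

\[
\frac{40\ \mathrm{MHz}}{100\ \mathrm{Hz}} = 4\times 10^{5}\ \text{overall online rejection};\qquad
\frac{0.15\ \mathrm{GB/s}}{100\ \mathrm{Hz}} \approx 1.5\ \mathrm{MB}\ \text{per stored event}.
\]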
LIGO: Search for Gravity Waves
• LIGO Grid: 6 US sites, 3 EU sites (UK & Germany: Birmingham, Cardiff, AEI/Golm)
* LHO, LLO: LIGO observatory sites
* LSC: LIGO Scientific Collaboration
Is HEP Approaching the Productivity Plateau?
(Slide chart, from Les Robertson: the Gartner Group technology hype cycle applied to HEP grids, with expectations plotted against CHEP conferences: Padova 2000, Beijing 2001, San Diego 2003, Interlaken 2004, Mumbai 2006, Victoria 2007)
Challenges from Diversity and Growth
• Management of an increasingly diverse enterprise
  • Sci/Eng projects, organizations, disciplines as distinct cultures
  • Accommodating new member communities (expectations?)
• Interoperation with other grids
  • TeraGrid
  • International partners (EGEE, NorduGrid, etc.)
  • Multiple campus and regional grids
• Education, outreach and training
  • Training for researchers, students
  • … but also project PIs, program officers
• Operating a rapidly growing cyberinfrastructure
  • 25K → 100K CPUs, 4 → 10 PB disk
  • Management of and access to rapidly increasing data stores (slide)
  • Monitoring, accounting, achieving high utilization
  • Scalability of support model (slide)
Collaborative Tools: EVO Videoconferencing
• End-to-end self-managed infrastructure
REDDnet: National Networked Storage
• NSF funded project (Vanderbilt)
• 8 initial sites (Brazil?)
• Multiple disciplines
  • Satellite imagery
  • HENP
  • Terascale Supernova Initiative
  • Structural biology
  • Bioinformatics
• Storage
  • 500 TB disk
  • 200 TB tape
OSG Operations Model
Distributed model
• Scalability!
• VOs, sites, providers
• Rigorous problem tracking & routing
• Security
• Provisioning
• Monitoring
• Reporting
Partners with EGEE operations