High Energy Physics and Data Grids
Paul Avery, University of Florida
http://www.phys.ufl.edu/~avery/  avery@phys.ufl.edu
US/UK Grid Workshop, San Francisco, August 4-5, 2001
Essentials of High Energy Physics
• Better name: "Elementary Particle Physics"
• Science: elementary particles and fundamental forces
    • Particles: leptons, quarks
    • Forces: strong (gluon), electroweak (γ, W, Z0), gravity (graviton)
• Goal: a unified theory of nature
    • Unification of forces (Higgs, superstrings, extra dimensions, …)
    • Deep connections to the large-scale structure of the universe
    • Large overlap with astrophysics, cosmology, nuclear physics
HEP Short History + Frontiers
• 1900-: Quantum mechanics, atomic physics (~10^-10 m, ~10 eV)
• 1940-50: Quantum electrodynamics
• 1950-65: Nuclei, hadrons; symmetries, field theories (~10^-15 m, MeV-GeV)
• 1965-75: Quarks; gauge theories (~10^-16 m, >> GeV)
• 1970-83: SPS; electroweak unification, QCD (~10^-18 m, ~100 GeV)
• 1990: LEP; 3 families, precision electroweak
• 1994: Tevatron; top quark, origin of masses
• 2007: LHC; Higgs? supersymmetry? (~10^-19 m)
• The next steps: grand unified theories? proton decay? (underground experiments, ~10^16 GeV); quantum gravity? superstrings? the origin of the universe (~10^19 GeV, ~10^-35 m, the Planck scale)
• (The original figure also maps each scale onto the corresponding epoch of the early universe, from >300,000 years down to ~10^-43 sec after the Big Bang.)
HEP Research
• Experiments are primarily accelerator based
    • Fixed target, colliding beams, special beams
    • Detectors: small, large, general purpose, special purpose
• … but a wide variety of other techniques
    • Cosmic rays, proton decay, g-2, neutrinos, space missions
• Increasing scale of experiments and laboratories
    • Forced on us by ever higher energies
    • Complexity, scale, costs → large collaborations
• International collaborations are the norm today
    • Global collaborations are the future (LHC)
• The LHC is discussed in the next few slides
The CMS Collaboration
• 1809 physicists and engineers, 144 institutions, 31 countries
• Laboratories: Member States 58, Non-Member States 50, USA 36, Total 144
• Scientists: Member States 1010, Non-Member States 448, USA 351, Total 1809
• Associated institutes: 5 laboratories, 36 scientists
• Participating countries: Armenia, Austria, Belarus, Belgium, Bulgaria, China, China (Taiwan), Croatia, Cyprus, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, India, Italy, Korea, Pakistan, Poland, Portugal, Russia, Slovak Republic, Spain, Switzerland, Turkey, UK, Ukraine, USA, Uzbekistan, plus CERN
CERN LHC Site
• (Figure: the CERN LHC site, showing the CMS, LHCb, ALICE, and ATLAS experiments around the ring.)
High Energy Physics at the LHC
• (Figure: the "Compact" Muon Solenoid (CMS) detector at the LHC (CERN), shown next to the Smithsonian "standard man" for scale.)
Collisions at the LHC (2007?)
• Proton-proton collisions
    • Bunches per beam: 2835
    • Protons per bunch: 10^11
    • Beam energy: 7 TeV (7x10^12 eV)
    • Luminosity: 10^34 cm^-2 s^-1
• Bunch crossing rate: 40 MHz (every 25 nsec)
• Proton collision rate: ~10^9 Hz (average ~20 collisions per crossing)
• New physics rate: ~10^-5 Hz
• Selection: 1 in 10^13
• (Figure: partons (quarks, gluons) in a collision producing a Higgs decaying via two Z0 bosons to lepton pairs, plus jets; SUSY and other new physics shown schematically.)
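A quick back-of-envelope check of the rates quoted above, written as a minimal Python sketch; the inputs are simply the slide's numbers, nothing beyond what the slide states:

```python
# Back-of-envelope check of the LHC rates quoted above.
bunch_crossing_rate = 40e6       # Hz (one crossing every 25 ns)
collisions_per_crossing = 20     # average pile-up per crossing
selection_power = 1e-13          # ~1 interesting event in 10^13

collision_rate = bunch_crossing_rate * collisions_per_crossing
print(f"proton collision rate ~ {collision_rate:.0e} Hz")                     # ~8e8, i.e. ~10^9 Hz
print(f"new-physics rate      ~ {collision_rate * selection_power:.0e} Hz")   # ~1e-4 to 1e-5 Hz
```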
HEP Data
• Scattering is the principal technique for gathering data
    • Collisions of beam-beam or beam-target particles
    • Typically caused by a single elementary interaction
    • … but background collisions also occur → they obscure the physics
• Each collision generates many particles: an "event"
    • Particles traverse the detector, leaving an electronic signature
    • Information is collected and put into mass storage (tape)
    • Each event is independent → trivial computational parallelism
• Data-intensive science
    • Size of raw event record: 20 KB - 1 MB
    • 10^6 - 10^9 events per year
    • 0.3 PB per year (2001): BaBar (SLAC)
    • 1 PB per year (2005): CDF, D0 (Fermilab)
    • 5 PB per year (2007): ATLAS, CMS (LHC)
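The per-year volumes above follow directly from event counts and event sizes; a minimal Python sketch of the arithmetic, with illustrative combinations drawn from the ranges quoted on the slide:

```python
# Rough yearly raw-data volume implied by an event count and an event size.
def yearly_volume_pb(events_per_year, event_size_bytes):
    """Raw data volume in petabytes per year."""
    return events_per_year * event_size_bytes / 1e15

print(yearly_volume_pb(1e9, 20e3))   # 10^9 events of 20 KB  -> 0.02 PB/yr
print(yearly_volume_pb(5e9, 1e6))    # 5x10^9 events of 1 MB -> 5 PB/yr (LHC scale)
```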
Data Rates: From Detector to Storage
• Detector output: 40 MHz, ~1000 TB/sec (physics filtering begins here)
• Level 1 trigger (special hardware): 75 kHz, 75 GB/sec
• Level 2 trigger (commodity CPUs): 5 kHz, 5 GB/sec
• Level 3 trigger (commodity CPUs): 100 Hz, 100 MB/sec
• Raw data to storage
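A small Python sketch of the rejection factors and output bandwidths implied by the trigger chain above, assuming ~1 MB per accepted event (the 40 MHz detector output quoted as ~1000 TB/sec corresponds to the much larger unsuppressed readout):

```python
# Rate reduction through the trigger chain and the implied output bandwidth,
# assuming ~1 MB per accepted event.
event_size_bytes = 1e6
chain = [
    ("Level 1 trigger (special hardware)", 75e3),
    ("Level 2 trigger (commodity CPUs)",   5e3),
    ("Level 3 trigger (commodity CPUs)",   100),
]
prev_rate = 40e6  # bunch-crossing rate into Level 1
for name, out_rate in chain:
    print(f"{name}: {prev_rate:.0e} Hz -> {out_rate:.0e} Hz "
          f"(rejection x{prev_rate / out_rate:.0f}, "
          f"~{out_rate * event_size_bytes / 1e9:.1f} GB/s out)")
    prev_rate = out_rate
```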
LHC Data Complexity
• "Events" resulting from beam-beam collisions:
    • The signal event is obscured by ~20 overlapping, uninteresting collisions in the same crossing
    • CPU time does not scale from previous generations
• (Figure: simulated event displays comparing 2000 and 2007 conditions.)
Example: Higgs Decay into 4 Muons
• 40M events/sec; selectivity: 1 in 10^13
LHC Computing Challenges
• Complexity of the LHC environment and the resulting data
• Scale: petabytes of data per year (100 PB by ~2010); millions of SpecInt95s of CPU
• Geographical distribution of people and resources: 1800 physicists, 150 institutes, 32 countries
Transatlantic Net WG (HN, L. Price): Tier0 - Tier1 BW Requirements [*]
[*] Installed BW in Mbps; maximum link occupancy 50%; work in progress
(The year-by-year bandwidth table itself is not reproduced here.)
Hoffmann LHC Computing Report 2001: Tier0 - Tier1 Link Requirements
(1) Tier1 → Tier0 data flow for analysis: 0.5 - 1.0 Gbps
(2) Tier2 → Tier0 data flow for analysis: 0.2 - 0.5 Gbps
(3) Interactive collaborative sessions (30 peak): 0.1 - 0.3 Gbps
(4) Remote interactive sessions (30 flows peak): 0.1 - 0.2 Gbps
(5) Individual (Tier3 or Tier4) data transfers: 0.8 Gbps (limited to 10 flows of 5 Mbytes/sec each)
TOTAL per Tier0 - Tier1 link: 1.7 - 2.8 Gbps
• Corresponds to ~10 Gbps baseline BW installed on the US-CERN link
• Adopted by the LHC experiments (Steering Committee report)
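The 1.7 - 2.8 Gbps total is just the sum of the five components; a minimal Python sketch of that sum, with the component values transcribed from the table and the 0.8 Gbps figure taken as quoted:

```python
# Sum of the per-link bandwidth components in the table above (Gbps).
components = {
    "Tier1 -> Tier0 analysis flow":       (0.5, 1.0),
    "Tier2 -> Tier0 analysis flow":       (0.2, 0.5),
    "Interactive collaborative sessions": (0.1, 0.3),
    "Remote interactive sessions":        (0.1, 0.2),
    "Individual Tier3/Tier4 transfers":   (0.8, 0.8),  # limited to 10 flows of 5 MB/s (value as quoted)
}
low  = sum(lo for lo, hi in components.values())
high = sum(hi for lo, hi in components.values())
print(f"Total per Tier0 - Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8 Gbps
```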
LHC Computing Challenges
• Major challenges associated with:
    • Scale of computing systems
    • Network-distribution of computing and data resources
    • Communication and collaboration at a distance
    • Remote software development and physics analysis
• Result of these considerations: Data Grids
Global LHC Data Grid Hierarchy
• Tier0: CERN
• Tier1: national laboratory
• Tier2: regional center (university, etc.)
• Tier3: university workgroup
• Tier4: workstation
• Key ideas:
    • Hierarchical structure
    • Tier2 centers
    • Operate as a unified Grid
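One way to picture the hierarchy is as a simple tree of sites; the Python sketch below is purely illustrative, and the site names and counts are placeholders rather than the actual deployment plan:

```python
# Illustrative tree model of the tiered hierarchy described above.
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    tier: int                      # 0 = CERN, 1 = national lab, 2 = regional center, ...
    children: list = field(default_factory=list)

cern = Site("CERN", tier=0)
for lab in ["Fermilab", "RAL", "IN2P3"]:              # placeholder Tier1 national labs
    t1 = Site(lab, tier=1)
    t1.children.append(Site(f"{lab}-regional-T2", tier=2))
    cern.children.append(t1)

def count_sites(site):
    """Count all sites in the hierarchy rooted at `site`."""
    return 1 + sum(count_sites(c) for c in site.children)

print(count_sites(cern))   # 7 sites in this toy hierarchy
```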
Example: CMS Data Grid
• CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1) : (sum of Tier2) ~ 1:1:1
• Online system: bunch crossing every 25 nsec, ~100 triggers per second, each event ~1 MByte; the experiment produces ~PBytes/sec, with ~100 MBytes/sec flowing to the Tier 0+1 CERN computer center (>20 TIPS, HPSS mass storage)
• Tier 1: national centers (France, Italy, UK, USA), each with HPSS, linked to CERN at 2.5 Gbits/sec
• Tier 2: regional centers, linked to Tier 1 at 2.5 Gbits/sec
• Tier 3: institutes (~0.25 TIPS) with physics data caches, linked at ~622 Mbits/sec
• Tier 4: workstations and other portals, linked at 100 - 1000 Mbits/sec
• Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
Tier1 and Tier2 Centers
• Tier1 centers
    • National laboratory scale: large CPU, disk, and tape resources
    • High speed networks
    • Many personnel with broad expertise
    • Central resource for a large region
• Tier2 centers
    • New concept in the LHC distributed computing hierarchy
    • Size ~ [national lab x university]^(1/2) (see the sketch below)
    • Based at a large university or small laboratory
    • Emphasis on small staff, simple configuration & operation
• Tier2 role
    • Simulations, analysis, data caching
    • Serve a small country, or a region within a large country
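A toy Python illustration of the geometric-mean sizing rule mentioned above; the capacity numbers are made up purely for the example:

```python
# The "Tier2 size ~ [national lab x university]^(1/2)" rule of thumb is a geometric mean.
import math

national_lab_capacity = 1000.0   # hypothetical capacity, arbitrary units
university_capacity   = 10.0     # hypothetical capacity, arbitrary units
tier2_capacity = math.sqrt(national_lab_capacity * university_capacity)
print(tier2_capacity)            # 100.0 -- intermediate between the two scales
```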
LHC Tier2 Center (2001)
• (Figure: a prototype Tier2 configuration: data server with RAID storage, a Gigabit Ethernet switch feeding several Fast Ethernet switches of commodity nodes, a Fast Ethernet router with a high-speed channel to the WAN, and tape backup.)
Hardware Cost Estimates
• Buy late, but not too late: phased implementation
    • R&D phase: 2001-2004
    • Implementation phase: 2004-2007
• R&D to develop capabilities and the computing model itself
• Prototyping at increasing scales of capability & complexity
• (Figure: cost projection chart; component price/performance curves improve on timescales of 1.1, 1.2, 1.4, and 2.1 years.)
HEP-Related Data Grid Projects
• Funded projects
    • GriPhyN (USA): NSF, $11.9M + $1.6M
    • PPDG I (USA): DOE, $2M
    • PPDG II (USA): DOE, $9.5M
    • EU DataGrid (EU): $9.3M
• Proposed projects
    • iVDGL (USA): NSF, $15M + $1.8M + UK
    • DTF (USA): NSF, $45M + $4M/yr
    • DataTAG (EU): EC, $2M?
    • GridPP (UK): PPARC, > $15M
• Other national projects
    • UK e-Science (> $100M for 2001-2004)
    • Italy, France, (Japan?)
(HEP-Related) Data Grid Timeline
• Milestones from Q2 2000 through Q3 2001 include:
    • Submit GriPhyN proposal ($12.5M); GriPhyN approved ($11.9M + $1.6M)
    • Outline of US-CMS Tier plan; Caltech-UCSD install prototype Tier2
    • EU DataGrid approved ($9.3M)
    • Submit PPDG proposal ($12M); PPDG approved ($9.5M)
    • Submit DataTAG proposal ($2M); DataTAG approved
    • Submit DTF proposal ($45M); DTF approved?
    • Submit iVDGL preproposal; submit iVDGL proposal ($15M); iVDGL approved?
    • 1st and 2nd Grid coordination meetings
Coordination Among Grid Projects
• Particle Physics Data Grid (US, DOE)
    • Data Grid applications for HENP
    • Funded 1999-2000 ($2M); funded 2001-2004 ($9.4M)
    • http://www.ppdg.net/
• GriPhyN (US, NSF)
    • Petascale Virtual-Data Grids
    • Funded 9/2000 - 9/2005 ($11.9M + $1.6M)
    • http://www.griphyn.org/
• European Data Grid (EU)
    • Data Grid technologies, EU deployment
    • Funded 1/2001 - 1/2004 ($9.3M)
    • http://www.eu-datagrid.org/
• In common: HEP; focus on infrastructure development & deployment; international scope
• Now developing a joint coordination framework (GridPP, DTF, iVDGL very soon?)
Data Grid Management
(Figure: PPDG brings together the data management teams of BaBar, D0, CDF, ATLAS, CMS, the HENP Grand Challenge, and nuclear physics experiments with the Condor, SRB, and Globus teams and their user communities.)
EU DataGrid Project
PPDG and GriPhyN Projects
• PPDG focuses on today's (evolving) problems in HENP
    • Current HEP: BaBar, CDF, D0
    • Current NP: RHIC, JLAB
    • Future HEP: ATLAS, CMS
• GriPhyN focuses on tomorrow's solutions
    • ATLAS, CMS, LIGO, SDSS
    • Virtual data, "petascale" problems (petaflops, petabytes)
    • Toolkit, export to other disciplines, outreach/education
• Both emphasize
    • Application science drivers
    • CS/application partnership (reflected in funding)
    • Performance
• Explicitly complementary
PPDG Multi-site Cached File Access System
• Primary site: data acquisition, tape, CPU, disk, robot
• Satellite sites: tape, CPU, disk, robot
• University sites: CPU, disk, users
• Grid services: resource discovery, matchmaking, co-scheduling/queueing, tracking/monitoring, problem trapping + resolution
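A highly simplified Python sketch of the cached-access idea above: try the local university cache first, then any satellite site, then fall back to the primary site. The class and function names (`Cache`, `access`, `has_file`, `fetch`) are hypothetical, not PPDG's actual interfaces:

```python
# Toy model of multi-site cached file access: nearest copy wins.
class Cache:
    def __init__(self, name, files=None):
        self.name = name
        self.files = set(files or [])

    def has_file(self, lfn):
        return lfn in self.files

    def fetch(self, lfn):
        return f"{lfn} served from {self.name}"

def access(lfn, university, satellites, primary):
    """Resolve a logical file name against the cache hierarchy."""
    for site in [university, *satellites, primary]:
        if site.has_file(lfn):
            return site.fetch(lfn)
    raise FileNotFoundError(lfn)

primary = Cache("primary", ["run1.raw", "run2.raw"])
satellite = Cache("satellite-A", ["run1.raw"])
university = Cache("uni-florida")
print(access("run1.raw", university, [satellite], primary))  # hit at satellite-A
```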
GriPhyN: PetaScale Virtual-Data Grids
• Users: production teams, individual investigators, workgroups (~1 petaflop, ~100 petabytes)
• Interactive user tools
• Request planning & scheduling tools; request execution & management tools; virtual data tools
• Underlying services: resource management, security and policy, other Grid services
• Transforms applied to distributed resources (code, storage, CPUs, networks) and raw data sources
Virtual Data in Action
• A data (item) request may:
    • Compute locally or compute remotely
    • Access local data or access remote data
• Scheduling based on local policies, global policies, and cost (see the sketch below)
• Requests flow across major facilities and archives, regional facilities and caches, and local facilities and caches
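A toy cost-based planner for the compute-versus-fetch decision sketched above; the cost model, names, and numbers are illustrative assumptions, not GriPhyN's actual planner:

```python
# Toy "virtual data" planner: use a local copy if one exists, otherwise choose
# the cheaper of recomputing the product and fetching a remote copy.
def plan(item, local_cache, recompute_cpu_cost, transfer_cost):
    """Return the cheapest way to satisfy a virtual-data request."""
    if item in local_cache:
        return "use local copy"          # already materialized locally
    return ("recompute locally" if recompute_cpu_cost < transfer_cost
            else "fetch from remote cache/archive")

print(plan("higgs_4mu.ntuple", local_cache=set(),
           recompute_cpu_cost=5.0, transfer_cost=2.0))   # -> fetch from remote cache/archive
```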
GriPhyN Goals for Virtual Data
• Transparency with respect to location
    • Caching and catalogs in a large-scale, high-performance Data Grid
• Transparency with respect to materialization
    • Exact specification of algorithm components
    • Traceability of any data product
    • Cost of storage vs. CPU vs. networks
• Automated management of computation
    • Issues of scale, complexity, transparency
    • Complications: calibrations, data versions, software versions, …
• Explore the concept of virtual data and its applicability to data-intensive science
Data Grid Reference Architecture
• Application layer: discipline-specific Data Grid applications
• Collective layer:
    • Request planning, request management, usage management, consistency, and accounting services
    • Replica selection, replica management, system monitoring, and resource brokering services
    • Community authorization, online certificate repository, distributed catalog, information, and co-allocation services
• Resource layer: storage, compute, network, catalog, and code repository management protocols; service enquiry and registration protocols
• Connectivity layer: communication, service discovery (DNS), authentication, delegation
• Fabric layer: storage systems, compute systems, networks, catalogs, code repositories
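The same layering can be written out as a small Python mapping from layer to representative services (service names transcribed from the slide; this is just a restatement of the figure, not an API):

```python
# The layered reference architecture above as a simple layer -> services mapping.
LAYERS = {
    "Application":  ["discipline-specific data grid applications"],
    "Collective":   ["request planning", "request management", "usage management",
                     "consistency services", "accounting services",
                     "replica selection", "replica management",
                     "system monitoring", "resource brokering",
                     "community authorization", "online certificate repository",
                     "distributed catalog", "information services", "co-allocation"],
    "Resource":     ["storage mgmt protocol", "compute mgmt protocol",
                     "network mgmt protocol", "catalog mgmt protocol",
                     "code repository mgmt protocol", "service enquiry protocol"],
    "Connectivity": ["communication", "service discovery (DNS)",
                     "authentication", "delegation"],
    "Fabric":       ["storage systems", "compute systems", "networks",
                     "catalogs", "code repositories"],
}
for layer, services in LAYERS.items():
    print(f"{layer}: {', '.join(services)}")
```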