Explore the enigmas of the universe and the data challenges facing particle physicists in resolving them. Discover how REDDnet enables innovative approaches for maximizing productivity in this fascinating realm.
Data Logistics in Particle Physics: Ready or Not, Here it Comes… Prof. Paul Sheldon, Vanderbilt University
Outline • How Strange is the Universe? 5 Modern Mysteries. • In trying to resolve these mysteries, particle physicists face a significant data logistics problem. • Solution should be flexible enough to encourage the creative approaches that will maximize productivity. • REDDnet breaks “data-tethered” compute model, allows unfettered access w/o strong central control.
Is the Universe Even Stranger Than We Have Imagined? • One piece of evidence: rotational velocities of stars in galaxies • Pick a star, how fast is it moving around galactic center? 1st Year Physics! • Mass of galaxy is much, much larger than you get by counting the stars in the galaxy
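The "1st Year Physics" step can be sketched numerically: for a star on a circular orbit, gravity supplies the centripetal force, so the mass enclosed by the orbit is M = v²r / G. The orbital speed and radius below are illustrative assumptions (roughly Sun-like values for the Milky Way), not numbers from the slides.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.0857e19    # one kiloparsec, m


def enclosed_mass(v_m_per_s, r_m):
    """Mass inside a circular orbit of speed v and radius r: M = v^2 r / G."""
    return v_m_per_s ** 2 * r_m / G


def keplerian_speed(mass_kg, r_m):
    """Orbital speed if all of mass_kg lies inside radius r: v = sqrt(G M / r)."""
    return math.sqrt(G * mass_kg / r_m)


# Assumed, illustrative inputs: a star 8 kpc from the center moving at 220 km/s.
v = 220e3
r = 8 * KPC

m_in = enclosed_mass(v, r)
print(f"enclosed mass: {m_in / M_SUN:.2e} solar masses")

# If no further mass lay beyond r, the speed at 4r would drop to v/2.
print(f"expected speed at 4r: {keplerian_speed(m_in, 4 * r) / 1e3:.0f} km/s")
```

If measured speeds at large radii stay well above the falling prediction, the enclosed mass must keep growing with radius, which is the sense in which the galaxy's mass exceeds what counting stars gives.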
We Don’t Know What The Majority of Matter in the Universe Is. • This “extra” matter is 90% of the matter in the Universe (~10% normal matter, 90% “other” matter)! • Conventional explanations (planets, dust, …) have mostly been ruled out • Most of the matter in the Universe is probably an exotic form of matter, heretofore unknown! • But there is a good chance particle physicists will make some soon at the LHC at CERN!
5 Mysteries for a New Millennium • What is the majority of matter in the universe made of? • Does space have more than three dimensions? • Where is all the anti-matter created by the Big Bang? • What is this bizarre thing called “Dark Energy?” • Why do things have mass?
Answering These Questions Presents Many Challenges… • CERN Large Hadron Collider: 2007 start • Experiments require significant infrastructure, large collaborations • 2500 Physicists! [Figure labels: CMS; 27 km tunnel in Switzerland & France (100 m below ground)]
Petascale Computing Required • CMS will generate Petabytes of data per year and require Petaflops of CPU… (2008: ~50,000 8 GHz P4s) • But physics is done in small groups, geographically distributed
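To give "Petabytes of data per year" a sense of scale, the sketch below converts a yearly volume into the average sustained rate it implies. The specific volumes (1, 5, 10 PB) are assumptions for illustration; the slide only says "Petabytes".

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year):
    """Average sustained rate (MB/s) implied by a yearly data volume (decimal units)."""
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


# Assumed yearly volumes, for illustration only.
for pb in (1, 5, 10):
    rate = average_rate_mb_per_s(pb)
    print(f"{pb:>2} PB/year -> {rate:6.1f} MB/s ({rate * 8 / 1000:.2f} Gbps) on average")
```

These are year-long averages; actual rates would be burstier, since data taking and analysis do not run evenly through the year.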
Distributed Resources, People • CMS Collaboration: >37 Countries, >163 Institutes • Why Distributed Resources? Sociology, Politics, Funding • To maximize the quality and rate of scientific discovery, all physicists must have equal ability to access and analyze the experiment's data…
LHC Data Grid Hierarchy (>10 Tier1 and ~100 Tier2 Centers)
[Figure: tiered diagram of the data grid. Recoverable labels, top to bottom:]
• Experiment / Online System: ~PByte/sec
• Tier 0+1: CERN Center (PBs of disk; tape robot), ~150-1500 MB/s
• Tier 1: FNAL, IN2P3, INFN, RAL Tier1 centers, 10-40+ Gbps
• Tier 2: Caltech, UERJ, and other Tier2 centers, 10 Gbps
• Tier 3: Institutes (including Vanderbilt Tier3), 1-10 Gbps; physics data cache
• Tier 4: Workstations/Laptops, 1 to 10 Gbps
The small analysis groups doing the physics work at the Tier 3/4 level.
Data Logistics Yin and Yang • Uncertainty reigns at the most important level: where the physics will get done. • Physicists will evolve novel use cases that will not jibe with expectations or any plans/rules/edicts.
Use Cases: What We Do Know • Physicists will: • need access to 10-100 TB data sets for short periods. • run over this data many times, refining and improving their analysis. • use local computing resources where they may not have much storage available. • make “opportunistic use” of compute resources at Tier 3 sites and Grid sites. • perform “production runs” at Tier 2 sites.
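A back-of-the-envelope sketch of what these use cases mean for data movement: how long it takes to stage a 10-100 TB data set over the 1-10 Gbps links quoted for Tier 3 sites in the grid hierarchy. The function name and the `efficiency` parameter (the fraction of nominal bandwidth actually achieved) are assumptions introduced for illustration.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Hours to move size_tb terabytes over a link_gbps link (decimal units).

    efficiency is a hypothetical knob for the fraction of the nominal
    bandwidth that is actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for size_tb in (10, 100):
    for link_gbps in (1, 10):
        hours = transfer_hours(size_tb, link_gbps)
        print(f"{size_tb:>3} TB over {link_gbps:>2} Gbps: {hours:6.1f} h")
```

Even at full line rate, a 100 TB data set ties up a 1 Gbps link for more than nine days, which is why repeated passes over the same data favor keeping a working copy staged close to the compute resources.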
REDDnet at Tier 3 • Opportunistic computing vs data-tethered computing • CMS has no formal solution for Tier 3 storage • Compute on resources, even those where the data is not hosted • On-demand working storage • Improve data logistics • Acts local: familiar user tools • Demonstrate at a Tier 3: performance, reliability, … and convenience
Near Term Plan of Work • Provide T3 scratch space • Host/mirror popular datasets on REDDnet • Participate in Data and Service Challenges • Summer 07 Challenge starting soon • Network and data transfer load tests • Integrate with existing CMS tools • Develop a Tier 3 analysis environment • Initial small test community • Test with individual analyses • Run on the Grid [Figure: REDDnet SC06 Depots]