


  1. “Data Handling in HEP”: Towards LHC computing… EPS Conference, July 22, 2003. David Stickland, Princeton University

  2. Overview
  • Why GRIDs?
  • High Level Triggers
    • Demonstration that LHC Physics can be selected by purely software triggers and “commodity” computing farms
  • Software
    • GEANT4 almost ready for prime time
    • Persistency and Core Framework Software
  • Computing
    • PASTA Summary of Computing Costs and Projections
    • Planning for LHC Computing
    • Deploying an HEP 24x7 Operational GRID

  3. Why GRIDs?
  • The Computing must be distributed:
    • Politics, Economics, Physics, Manpower, …
  • Optimized use of globally distributed resources:
    • Retrieve data from the remote disks with the best availability
    • Submit jobs to the centers for which they are best suited
    • Base these decisions on the current system status
  • Requires:
    • Common protocols for data and status-information exchange
    • The GRID

  4. Without GRIDs?
  • Major experiments (CDF, D0, BaBar, Belle) are running now using only some GRID base components.
  • Production tasks can always be made to work (if there are enough resources)
    • The collaboration members need to be “organized” to avoid resource contention suffocating the system.
  • D0 developed the “SAM” system, an advanced Data Management system for HEP analysis
    • (now also adopted by CDF)
    • Regional Analysis Centers are becoming operational based on SAM.
    • An important tool for the Tevatron (and a testing ground for the LHC)
  • LHC data rates to tape are >20 times those of the Tevatron
  • Widely spread, and large, collaborations require efficient access to what is expected to be a very rich physics environment

  5. The LCG: LHC Computing GRID
  • Building a Production GRID service
    • “LCG1” ready for deployment now for the 2003/4 Data Challenges
  • Developing and maintaining some of the base framework software
    • Infrastructure: Savannah, SCRAM, External Software, …
    • LCG Projects: POOL (Persistency), SEAL (Framework Services), …
    • LCG Contributions/Collaboration: ROOT, GEANT4, …
    • Deeply collaborating with the Experiment Software teams
  • Integrating the worldwide GRID prototype software
    • GLOBUS, EDG, VDT, …
  • Collaborating with the recently approved EGEE EU project to build a heterogeneous and interoperable European GRID
  • Managed by the LHC Experiments, CERN and the Regional Centers

  6. p-p collisions at LHC
  • Crossing rate: 40 MHz
  • Event rate: ~10⁹ Hz
  • Max LV1 trigger rate: 100 kHz
  • Event size: ~1 MByte
  • Readout network: 1 Terabit/s
  • Filter Farm: ~10⁷ SI2k
  • Trigger levels: 2
  • Online rejection: 99.9997% (100 Hz from 50 MHz)
  • System dead time: ~ %
  • Event selection: ~1/10¹³
  • Luminosity: Low 2×10³³ cm⁻²s⁻¹, High 10³⁴ cm⁻²s⁻¹
  [Figure: event rate, Level-1 trigger rate, rate to tape, and “discovery” rate]
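
A quick consistency check on these numbers (my arithmetic, not from the slide): 100 kHz of ~1 MByte events gives the quoted ~1 Terabit/s of readout bandwidth, and keeping ~100 Hz out of ~50 MHz of interactions gives the quoted online rejection. A minimal sketch:

// Back-of-envelope check of the slide's DAQ numbers (illustrative only).
#include <cstdio>

int main() {
    const double lv1RateHz  = 100e3;  // max Level-1 accept rate
    const double eventBytes = 1e6;    // ~1 MByte per event
    // 100 kHz * 1 MB * 8 bits = 0.8 Tbit/s, quoted as "1 Terabit/s"
    std::printf("Readout network: %.2f Terabit/s\n",
                lv1RateHz * eventBytes * 8.0 / 1e12);

    const double inputHz  = 50e6;     // ~50 MHz of interactions
    const double outputHz = 100.0;    // ~100 Hz written to tape
    // 1 - 100/5e7 = 99.9998%, quoted as 99.9997%
    std::printf("Online rejection: %.4f%%\n",
                (1.0 - outputHz / inputHz) * 100.0);
    return 0;
}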

  7. HLT Muon track reconstruction
  • Standalone Muon Reconstruction: “Level-2”
    • Seeded by Level-1 muons
    • Kalman filtering technique applied to DT/CSC/RPC track segments (see the sketch below)
    • GEANE used for propagation through the iron
    • Trajectory building works from the inside out
    • Track fitting works from the outside in
    • Fit track with beam constraint
  • Inclusion of Tracker Hits: “Level-3”
    • Define a region of interest through the tracker based on the L2 track with parameters at the vertex
    • Find pixel seeds, and propagate from the innermost layers out, including the muon hits
  [Figure: Level-3 algorithmic efficiency vs. η for single muons with 10 < pT < 100 GeV/c]
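
Since the slide leans on the Kalman filtering technique, here is a deliberately minimal sketch of the measurement-update step it is built on. This is a scalar toy under stated assumptions, not CMS reconstruction code: a real fit propagates a five-parameter track state with a full covariance matrix and uses GEANE for material effects, all of which this omits.

// Toy scalar Kalman filter update, the core of the track-fitting technique
// named on the slide (illustrative; not the CMS implementation).
#include <cmath>
#include <cstdio>
#include <vector>

struct State { double x; double P; };  // track parameter and its variance

// Fold one measurement m (variance R) into the predicted state.
State update(State pred, double m, double R) {
    const double K = pred.P / (pred.P + R);  // Kalman gain
    return { pred.x + K * (m - pred.x),      // filtered estimate
             (1.0 - K) * pred.P };           // shrunk variance
}

int main() {
    State s{0.0, 100.0};  // vague seed, e.g. from a Level-1 muon candidate
    for (double hit : std::vector<double>{1.2, 0.9, 1.1, 1.0})
        s = update(s, hit, /*R=*/0.25);      // one station/layer at a time
    std::printf("fitted parameter: %.3f +- %.3f\n", s.x, std::sqrt(s.P));
    return 0;
}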

  8. Inclusive b tagging at HLT
  • Regional Tracking: look only in the jet-track matching cone, with a loose Primary Vertex association
  • Conditional Tracking: stop a track as soon as a pixel seed is found (PXL) / 6 hits are found (Trk), or if pT < 1 GeV with high C.L.
  • Timing: ~300 ms at low luminosity, ~1 s at high luminosity
  • Performance of simple signed-IP “track counting” tags is ~the same as after full track reconstruction (see the sketch below)
  • Use tracks to define the jet axis (relying on the L1 Calo Jet would ~randomize the signed IP)
  • Inclusive b tag at HLT is possible, provided alignment is under control
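
For readers unfamiliar with the “track counting” tag the slide compares against full reconstruction: a jet is tagged if enough tracks have a large signed impact-parameter significance. The sketch below is illustrative; the cut values and structure are assumptions, not the CMS HLT selection.

// Illustrative signed-IP "track counting" b tag: tag the jet if at least
// minTracks tracks exceed an IP-significance cut. Thresholds are placeholders.
#include <cstdio>
#include <vector>

struct Track {
    double ip;       // impact parameter, signed w.r.t. the jet axis
    double ipError;  // its uncertainty
};

bool trackCountingTag(const std::vector<Track>& tracks,
                      double sigCut = 3.0, int minTracks = 2) {
    int n = 0;
    for (const auto& t : tracks)
        if (t.ipError > 0.0 && t.ip / t.ipError > sigCut) ++n;
    return n >= minTracks;
}

int main() {
    // Two displaced tracks and one prompt track: should be tagged.
    std::vector<Track> jetTracks = {{0.030, 0.008}, {0.045, 0.010}, {-0.002, 0.009}};
    std::printf("b-tagged: %s\n", trackCountingTag(jetTracks) ? "yes" : "no");
    return 0;
}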

  9. HLT table: LHC start…
  • Level-1 rate (“DAQ staging”): 50 kHz
  • Total HLT output rate: 105 Hz
  • Average HLT CPU: 300 ms on a 1 GHz processor
  • Improvements are possible
  • HLT performance: priority to the discovery channels

  10. HLT: CPU usage
  • All numbers are for a 1 GHz Intel Pentium-III CPU
  • Total: 4092 s for 15.1k events → 271 ms/event
    • Time is completely dominated by the slow GEANE extrapolation in the muons; this will improve!
    • Consider a ~50% uncertainty!
  • Today: ~300 ms/event on a 1 GHz Pentium-III CPU
  • Physics start-up (50 kHz LVL1 output): need 15,000 CPUs
  • Moore’s Law: 8× faster CPUs in 2007
    • ~40 ms/event in 2007, ~2,000 CPUs
    • ~1,000 dual-CPU boxes in the Filter Farm (arithmetic check below)
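
The farm sizing is straightforward rate-times-time arithmetic; this minimal check reproduces the slide's numbers (the 8× speedup is the slide's own Moore's-law assumption):

// Reproduce the slide's filter-farm sizing arithmetic.
#include <cstdio>

int main() {
    const double lvl1RateHz = 50e3;   // LVL1 output at physics start-up
    const double secPerEvt  = 0.300;  // ~300 ms/event on a 1 GHz PIII
    // 50 kHz * 0.3 s = 15,000 concurrently busy CPUs
    std::printf("CPUs needed now: %.0f\n", lvl1RateHz * secPerEvt);

    const double speedup2007 = 8.0;                      // assumed 8x in 2007
    const double sec2007     = secPerEvt / speedup2007;  // ~40 ms/event
    // ~1,900 CPUs, i.e. ~1,000 dual-CPU boxes
    std::printf("CPUs in 2007: %.0f (~%.0f ms/event)\n",
                lvl1RateHz * sec2007, sec2007 * 1e3);
    return 0;
}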

  11. CMS Full Detector Simulation with GEANT4
  • Going into production for Data Challenge DC04 (now)
  [Figure: G4 vs. G3 tracker reconstructed hits as a function of η]

  12. Muon Energy Loss in Liquid Argon
  • The Geant4 simulation (+ electronic noise) describes the beam-test data well
  • Electromagnetic Barrel Calorimeter, EMB (Liquid Argon/Lead Accordion)
  • Hadronic EndCap Calorimeter, HEC (Liquid Argon/Copper Parallel Plate)
  [Figures: fraction of events per 0.1 GeV vs. reconstructed energy (0.1-1 GeV) for Eμ = 100 GeV, ημ ≈ 0.975; the data-simulation difference in %/0.1 GeV; and events vs. calorimeter signal in nA for 180 GeV muons]

  13. Electron shower shapes in EMB
  • Geant4 electromagnetic showers for 20-245 GeV electrons in the EMB are more compact longitudinally than in Geant3
  • The latest comparison with data shows that Geant4 does a better job at all η
  • Small discrepancy in the last sampling (2X₀ out of the 24X₀ full calorimeter depth), but the energy deposition there is very small and the uncertainty is large
  [Figure: 250 GeV e⁻ energy in the longitudinal samplings (presampler and samplings 1-3, as a % of E_beam) vs. η, comparing Geant4, Geant3, and data]

  14. LCG Blueprint: Software Decomposition
  • Building a Common Core Software Environment for the LHC Experiments

  15. The LCG Persistency Framework
  • POOL is the LCG Persistency Framework
    • Pool of persistent objects for LHC
    • Started in April ’02
  • A common effort in which the experiments take a major share of the responsibility
    • for defining the system architecture
    • for development of POOL components
  • The LCG POOL project provides a hybrid store integrating object streaming (e.g. ROOT I/O) with RDBMS technology (e.g. MySQL/Oracle) for consistent metadata handling
  • Strong emphasis on component decoupling and well-defined communication/dependencies
  • Transparent cross-file and cross-technology object navigation via C++ smart pointers (see the sketch below)
  • Integration with Grid technology (via EDG-RLS)
    • but preserving the networked and grid-decoupled working models
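
To make the smart-pointer navigation concrete, here is a hypothetical sketch of the idea; the Ref template, token format, and catalog below are stand-ins, not POOL's actual interface. The point is that dereferencing a persistent reference resolves a token through a catalog and loads the target object on demand, whichever file or storage technology it lives in.

// Hypothetical sketch of cross-file object navigation via smart pointers
// (illustrative; NOT the real POOL API).
#include <cstdio>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct RawEvent { int run = 7; };

// Stand-in for the file catalog + storage service: token -> loader.
std::map<std::string, std::function<std::shared_ptr<RawEvent>()>> gCatalog = {
    {"raw:file1#42", [] { return std::make_shared<RawEvent>(); }}
};

template <typename T>
class Ref {
public:
    explicit Ref(std::string token) : token_(std::move(token)) {}
    T* operator->() {
        if (!obj_) obj_ = gCatalog.at(token_)();  // lazy load on first use
        return obj_.get();
    }
private:
    std::string token_;       // persistent address of the target object
    std::shared_ptr<T> obj_;  // cached transient instance
};

struct DstEvent { Ref<RawEvent> raw{"raw:file1#42"}; };  // back-pointer to raw

int main() {
    DstEvent dst;  // imagine this was read from a DST or ntuple file
    std::printf("navigated back to raw event of run %d\n", dst.raw->run);
    return 0;
}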

  16. POOL, Files and Navigation
  • The GRID mostly deals with data at file-level granularity
    • The File Catalog connects POOL to Grid resources, e.g. via the EDG-RLS backend (a toy sketch of this role follows below)
  • The POOL Storage Service deals with intra-file structure
    • needs a connection via standard Grid file access
  • Both file-based and object-based collections are seen as important end-user concepts
    • POOL offers a consistent interface to both types
  • The goal is transparent navigation back from an object in an “ntuple” through the DST and even back to the Raw Data
    • Gives the possibility to do a complex selection, deep-copy only the relevant data, and run a new calibration or reconstruction pass
  • A functionally complete POOL V1.1 release was produced in June
    • CMS and ATLAS are integrating and testing it now (CMS hopes to have POOL in production by the end of the summer)
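
The file catalog named above is, at heart, a mapping from logical file names to physical replicas. A toy sketch of that role follows; the names and URLs are made up, and the real backend here is EDG-RLS, whose API this does not reproduce.

// Toy file catalog: logical file name (LFN) -> physical replicas (PFNs).
// Illustrates the role described on the slide; not the EDG-RLS interface.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

class FileCatalog {
public:
    void registerReplica(const std::string& lfn, const std::string& pfn) {
        replicas_[lfn].push_back(pfn);
    }
    // Pick a replica; a real broker would use site status, not front-of-list.
    const std::string& lookup(const std::string& lfn) const {
        return replicas_.at(lfn).front();
    }
private:
    std::map<std::string, std::vector<std::string>> replicas_;
};

int main() {
    FileCatalog cat;
    cat.registerReplica("lfn:dst/run42", "gridftp://tier1.example/data/run42");
    cat.registerReplica("lfn:dst/run42", "gridftp://cern.example/data/run42");
    std::printf("open %s\n", cat.lookup("lfn:dst/run42").c_str());
    return 0;
}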

  17. PASTA III Technology Review
  • A: Semiconductor Technology. Ian Fisk (UCSD/CMS), Alessandro Machioro (CERN), Don Petravik (Fermilab)
  • B: Secondary Storage. Gordon Lee (CERN), Fabien Collin (CERN), Alberto Pace (CERN)
  • C: Mass Storage. Charles Curran (CERN), Jean-Philippe Baud (CERN)
  • D: Networking Technologies. Harvey Newman (Caltech/CMS), Olivier Martin (CERN), Simon Leinen (Switch)
  • E: Data Management Technologies. Andrei Maslennikov (Caspur), Julian Bunn (Caltech/CMS)
  • F: Storage Management Solutions. Michael Ernst (Fermilab/CMS), Nick Sinanis (CERN/CMS), Martin Gasthuber (DESY)
  • G: High Performance Computing Solutions. Bernd Panzer (CERN), Ben Segal (CERN), Arie Van Praag (CERN)
  • Chair: David Foster. Editor: Gordon Lee
  • http://lcg.web.cern.ch/LCG/PEB/PASTAIII/pasta2002Report.htm

  18. Basic System Components: Processors
  • Performance evolution and the associated cost evolution for both high-end machines (15K$ for a quad processor) and low-end machines (2K$ for a dual CPU)

  19. Network Progress
  • Network backbones are advancing rapidly to the 10 Gbps range
  • “Gbps” end-to-end throughput data flows will be in production soon (in 1-2 years)
  • Wide-area data migration/replication is now feasible and affordable
    • Tests of multiple streams to the US, running over 24 hrs at the full capacity of 2 Gbit/s, were successful (see the scale check below)
  • Network advances are changing the view of the networks’ roles
    • This is likely to have a profound impact on the experiments’ Computing Models and bandwidth requirements
  • Advanced integrated applications, such as Data Grids, rely on seamless, “transparent” operation of our LANs and WANs
    • with reliable, quantifiable (monitored), high performance
  • Networks need to be integral parts of the Grid(s) design
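
For scale (my arithmetic, not the slide's): a stream sustained at 2 Gbit/s for 24 hours moves roughly 21 TB, which is why day-scale wide-area replication of multi-terabyte datasets becomes practical at these rates.

// What a sustained 2 Gbit/s day-long transfer actually moves.
#include <cstdio>

int main() {
    const double rateBps  = 2e9;          // 2 Gbit/s sustained
    const double seconds  = 24 * 3600.0;  // one day
    const double tb = rateBps * seconds / 8.0 / 1e12;
    std::printf("moved in 24 h: %.1f TB\n", tb);  // ~21.6 TB
    return 0;
}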

  20. HENP Major Links: Bandwidth Roadmap (in Gbps)
  • Continuing the trend: ~1000 times bandwidth growth per decade (i.e., roughly a factor of two per year)
  [Table: bandwidth roadmap per major HENP link, in Gbps]

  21. LHC Computing Outlook
  • The HEP trend is to fewer and bigger experiments
    • Multi-Petabytes, GB/s, MSI2k
    • Worldwide collaborations, thousands of physicists, …
  • The LHC experiments will be extreme cases
    • But CDF, D0, BaBar and Belle are approaching the same scale and tackling the same problems even now
  • (Worldwide) hardware computing costs at LHC will be in the region of 50 M€ per year
    • Worldwide software development for the GRID in HEP is also in this ballpark
  • With so few experiments, so many collaborators, so much money:
    • We have to get this right (enough)…

  22. LHC Data Grid Hierarchy
  • CERN/Outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~1:1:1
  • Online System (at the experiment) → Tier 0+1, the CERN Center (PBs of disk; tape robot), at ~100-1500 MBytes/s; ~PByte/s inside the online system
  • Tier 1: FNAL, IN2P3, INFN and RAL Centers, connected at ~2.5-10 Gbps
  • Tier 2: Tier2 Centers, connected at ~2.5-10 Gbps
  • Tier 3: institutes, with a physics data cache, at 0.1 to 10 Gbps
  • Tier 4: workstations
  • Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later
  • Emerging vision: a richly structured, global dynamic system

  23. Scheduled Computing
  • Organized, scheduled simulation and large-scale event reconstruction is a task we understand “well”
  • We can make reasonably accurate estimates of the computing required
  • We can perform simple optimizations to share the work between the large computing centers

  24. Chaotic Computing
  • Data Analysis is a “Feeding Frenzy”
    • Data is widely dispersed and may be geographically mismatched to the available CPU
  • Choosing between data and job movement?
    • How/when will we have the information to motivate those choices?
  • Move Job to Data
    • The information required to describe the data requirements can (will) be complex and poorly described
    • Difficult for a resource broker to make good scheduling choices
    • Current Resource Brokers are quite primitive
  • Move Data to Job
    • Move only those parts of the data that the user really needs: all of some events, or some parts of some events? Very different resource requirements
    • Web Services / Web Caching may be the right technologies here
  • Balancing the many priorities internal to an experiment is essential
    • Completing the a-priori defined critical physics as quickly and correctly as possible
    • Enabling the collaboration to explore the full Physics richness
  • Build a Flexible System, Avoid Optimizations Now

  25. (Some) Guiding Principles for LHC Computing
  • Access to Data is more of a bottleneck than access to CPU
    • Make multiple distributed copies as early as possible
  • The experiment needs to be able to enact its Priority Policy
    • Stream the data from Raw onwards; some overlap allowed
    • Partition CPU according to experiment priorities
  • The initial detailed analysis steps will be run at the T1s
    • They need access to large data samples
  • T2s have (by definition?) more limited Disk/Network than the T1s
    • Good for final analysis on small (TB) samples; make sure these can be replicated locally with rapid access
    • Perfect for Monte-Carlo Production
  • User Analysis tasks are equal in magnitude to Production tasks
    • 50% of resources for each
    • A self-correcting fraction (when a user task gets too big, there is strong motivation to make it a common production task)

  26. Data Challenge CMS DC04
  • Pre-Challenge Production (HLT filter?): 50M events, 75 TByte, 1 TByte/day for 2 months, through a disk cache into archive storage
  • DC04 T0 challenge: a fake DAQ at CERN runs at 25 Hz × 1.5 MB/evt ≈ 40 MByte/s (3.2 TB/day) into a CERN disk pool of ~40 TByte (~20 days of data), with 25 Hz × 1 MB/evt raw and 25 Hz × 0.5 MB reconstructed DST; 1st-pass reconstruction produces event streams (Higgs DST, SUSY background DST, TAG/AOD at 20 kB/evt) and feeds the CERN tape archive (rate check below)
  • DC04 calibration challenge: a calibration sample and calibration jobs at a T1 against a MASTER Conditions DB, with replica Conditions DBs at the T0 and T2s
  • DC04 analysis challenge: TAG/AOD replicas and event servers at the T2s, e.g. a Higgs background study requesting new events
  • The calibration challenge is starting now; the “true” DC04 runs in Feb. 2004
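
A quick check of the headline rates (my arithmetic): 25 Hz at 1.5 MB/event is 37.5 MByte/s, quoted as ~40 MByte/s, or about 3.2 TB/day; and at 25 Hz of 1 MB raw events, the ~40 TByte disk pool indeed holds roughly 20 days of data.

// Check the DC04 throughput numbers on the slide.
#include <cstdio>

int main() {
    const double rateHz = 25.0;
    const double daySec = 86400.0;

    const double fullMB = 1.5;   // MB/evt through the fake DAQ
    std::printf("DAQ: %.1f MB/s, %.2f TB/day\n",
                rateHz * fullMB,                   // 37.5, quoted ~40 MByte/s
                rateHz * fullMB * daySec / 1e6);   // ~3.2 TB/day

    const double rawMB  = 1.0;   // MB/evt of raw data kept on disk
    const double poolTB = 40.0;  // CERN disk pool
    std::printf("disk pool lasts ~%.0f days\n",
                poolTB * 1e6 / (rateHz * rawMB * daySec));  // ~19 (~20 quoted)
    return 0;
}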

  27. Deployment Goals for LCG-1, a First Production GRID
  • Production service for the Data Challenges in the second half of 2003 & 2004
    • Initially focused on batch production work
  • Gain experience in close collaboration between the Regional Centers
    • Must have wide enough participation to understand the issues
  • Learn how to maintain and operate a global grid
  • Focus on a production-quality service
    • Robustness, fault-tolerance, predictability, and supportability take precedence; additional functionality gets prioritized
  • LCG should be integrated into the sites’ physics computing services; it should not be something apart

  28. The Goal is the Physics, not the Computing…
  • Motivation: at L₀ = 10³³ cm⁻²s⁻¹,
    • 1 fill (6 hrs) ~ 13 pb⁻¹
    • 1 day ~ 30 pb⁻¹
    • 1 month ~ 1 fb⁻¹
    • 1 year ~ 10 fb⁻¹
  • Most of the Standard-Model Higgs range can be probed within a few months; ditto for SUSY
  • Turn-on of the detector, computing and software will be crucial
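
These integrated luminosities fold in duty factors: an ideal 6-hour fill at 10³³ cm⁻²s⁻¹ would deliver about 22 pb⁻¹, versus the ~13 pb⁻¹ quoted. A minimal conversion check, using 1 pb⁻¹ = 10³⁶ cm⁻²:

// Instantaneous -> integrated luminosity (1 pb^-1 = 1e36 cm^-2).
// The slide's smaller numbers fold in machine/experiment duty factors.
#include <cstdio>

int main() {
    const double L = 1e33;              // cm^-2 s^-1
    const double fillSec = 6 * 3600.0;  // one 6-hour fill
    std::printf("ideal 6 h fill: %.1f pb^-1 (slide: ~13)\n",
                L * fillSec / 1e36);    // ~21.6
    std::printf("ideal day:      %.1f pb^-1 (slide: ~30)\n",
                L * 86400.0 / 1e36);    // ~86.4
    return 0;
}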
