High-Energy Physics Data: Delivering Data in Science
ICSTI Winter Workshop
Tim Smith – CERN/IT Department
Delivering Data in HEP
• Data Storage and Stewardship
• Distribution and Access
• Interpretability, Reusability and Citability
LHC and the Data Deluge
• 150 million sensors
• 40 million times/sec
• 22 PB in 2012
Just a Drop in the Ocean! … Selection
• Bunch crossing rate: 40 million/sec
• Protons/bunch: 10^11
• Proton collision rate: 1 billion/sec
• Filter to 200/sec
[Figure: bunch → proton → parton (quark, gluon) → particle; process labels: e+ e-, Z0, Higgs, SUSY…]
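The rates on this slide imply a very large reduction factor. The short Python sketch below only redoes that arithmetic; the variable and function names are illustrative, and the three rates are the rounded figures quoted on the slide, not precise machine parameters.

```python
# Rounded rates quoted on the slide (per second).
CROSSING_RATE = 40e6    # bunch crossings per second
COLLISION_RATE = 1e9    # proton collisions per second
KEPT_RATE = 200         # events kept per second after filtering


def reduction_summary(crossing=CROSSING_RATE, collision=COLLISION_RATE, kept=KEPT_RATE):
    """Return simple ratios derived from the three quoted rates."""
    return {
        # Average number of proton collisions in each bunch crossing.
        "collisions_per_crossing": collision / crossing,
        # Fraction of collisions that survive the filter.
        "kept_fraction": kept / collision,
        # How many collisions occur for every event that is kept.
        "collisions_per_kept_event": collision / kept,
    }


summary = reduction_summary()
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

With these inputs, roughly 25 collisions occur per crossing and about one collision in five million is kept.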
Data Storage 6 GB/s
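For a sense of scale, the 6 GB/s figure can be set against the 22 PB recorded in 2012. The sketch below is arithmetic only and rests on assumptions: decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), and no claim about whether 6 GB/s is a peak or a sustained rate, which the slide does not state. All names are illustrative.

```python
SECONDS_IN_2012 = 366 * 24 * 3600   # 2012 was a leap year
ANNUAL_VOLUME_BYTES = 22e15         # 22 PB, assuming decimal petabytes
QUOTED_RATE_BYTES_PER_S = 6e9       # 6 GB/s, assuming decimal gigabytes


def average_rate(volume_bytes, seconds):
    """Average data rate in bytes/s if the volume were spread evenly over the period."""
    return volume_bytes / seconds


# Year-averaged rate implied by 22 PB in 2012.
avg_rate = average_rate(ANNUAL_VOLUME_BYTES, SECONDS_IN_2012)

# Days of continuous running at the quoted rate needed to accumulate 22 PB.
days_at_quoted_rate = ANNUAL_VOLUME_BYTES / QUOTED_RATE_BYTES_PER_S / 86400

print(f"year-averaged rate: {avg_rate / 1e9:.2f} GB/s")
print(f"days at 6 GB/s to reach 22 PB: {days_at_quoted_rate:.1f}")
```

Under these assumptions the year-averaged rate is about 0.7 GB/s, well below 6 GB/s, which is consistent with data not being written at the quoted rate all year round.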
Data Stewardship: Migration
• LHC era: 60 PB
• LEP era: 100 TB
Data Distribution & Access
Worldwide LHC Computing Grid
• 11 T1
• 140 T2
• T3s
Data Access
• 70 PB worldwide
• 22 PB
[Figure: data pyramid. Data levels: Publication data, Derived physics data, Analysis Object Data, Reconstructed Data, Raw Data / Simulated Data. Multiplicity labels: x N, x tens, x few. Tier labels: T3, T2, T1, T0.]
Data Access ≠ Data Usability
Data Reuse: Raw/Processed Data
• Reuse of the Reconstructed & Analysis Object Data
• Calibrations, configurations
• Conditions DBs: tens of TBs
• Reconstruction and identification algorithms
• Detector response parameterizations
• Software: millions of lines of code
Data Reuse: Publication Data
• Published observables
• Model-independent measurements
• Distributions and cross-sections
• HEPData: tabular
• DOIs
• Rivet routines
• Parameterize analysis acceptance
• Compare simulated & measured data
• http://rivet.hepforge.org/
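Comparing simulated and measured data, as mentioned above, can be illustrated with a simple per-bin chi-square on a tabulated distribution. This is a minimal sketch and not the Rivet API or the HEPData format: the function name and the numbers are made up for illustration, and it assumes uncorrelated per-bin uncertainties, which real measurements often do not satisfy.

```python
def chi2_per_bin(measured, errors, predicted):
    """Chi-square between a measured and a predicted binned distribution.

    Assumes independent (uncorrelated) uncertainties in each bin.
    Returns (chi2, chi2 divided by the number of bins).
    """
    if not (len(measured) == len(errors) == len(predicted)):
        raise ValueError("all inputs must have the same number of bins")
    if any(e <= 0 for e in errors):
        raise ValueError("uncertainties must be positive")
    chi2 = sum(((m - p) / e) ** 2 for m, e, p in zip(measured, errors, predicted))
    return chi2, chi2 / len(measured)


# Made-up numbers, for illustration only (not taken from any publication).
measured = [10.0, 8.0, 5.0, 2.0]    # measured value in each bin
errors = [1.0, 1.0, 0.5, 0.5]       # total uncertainty in each bin
predicted = [9.0, 8.5, 5.5, 2.0]    # simulated prediction in each bin

chi2, chi2_per_n = chi2_per_bin(measured, errors, predicted)
print(f"chi2 = {chi2:.2f}, chi2 per bin = {chi2_per_n:.4f}")
```

A comparison that accounts for correlated systematic uncertainties would need the full covariance matrix instead of per-bin errors.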
Data Reuse: Derived Physics Data
• Access, ability to reinterpret
• Reanalysis with new QCD calculations
• Combination with data from future colliders
• …serendipitous discovery
• Pitfalls: large investment of effort required
• Correlations, efficiencies, systematic uncertainties
• Backgrounds estimated from data-driven techniques
• Intertwined with event selection criteria
• Searches…
Data Reuse: Derived Physics Data
• RECAST
• Limits of an existing search for an alternative hypothesis
• Brokering service
• Collaboration
• Archives the analysis code
• Provides authority
• Digital Preservation in HEP
• http://www.dphep.org/
WLCG: Delivering HEP data to scientists around the world