Disciplinary Case Study 1: Physical Sciences – Particle Physics at CERN and elsewhere
Jürgen Knobloch, CERN/IT
ERPANET/CODATA International Archiving Workshop on the Selection, Appraisal, and Retention of Digital Scientific Data
Biblioteca Nacional, Lisbon, Portugal, 15-17 December 2003
Overview
• Physics data retention, past and future
• Particle physics laboratory – CERN
• Particle physics – data flow, data types, data volume
• Keeping digital data available
  – In the longer term
  – For public use
• Some examples
  – LEP data long-term availability
  – QUAERO – opening data to the public
  – Particle Data Group
• Limits of re-using data
• Conclusion – points for discussion
Physics data … “Laboratory”, Karnak, Egypt > 3000 years
… are not always cast in stone: law of motion, Galileo’s notebook, ~1638
Enrico Fermi - 1942 Notebook recording the first controlled, self-sustaining nuclear chain reaction, December 2, 1942; Records of the Atomic Energy Commission; Record Group 326; National Archives.
… or on film … Bubble chamber, CERN, 1973
… now it is all electronic! CERN – UA1 1983
CERN (founded 1954) = “Conseil Européen pour la Recherche Nucléaire”, the “European Organisation for Nuclear Research” – a particle physics laboratory with a 27 km circumference tunnel
• Annual budget: ~1000 MSFr (~700 M€)
• Staff members: 2650, plus 225 Fellows, 270 Associates and 6000 CERN users
• Member states: 20
CERN – where the Web was born: Tim Berners-Lee; the first Web server; WSIS, Geneva, December 10-12, 2003
CERN’s 20 member states CERN Convention: … shall provide … research of pure scientific and fundamental character… … shall have no concern with work for military requirements and the results of its experimental and theoretical work shall be published or otherwise made generally available.
… and scientists from the rest of the world:
• Observers: UNESCO, EU, Israel, Turkey
• Special observers (for LHC): USA, Japan, Russia
Particle Physics: establish a periodic system of the fundamental building blocks and understand the forces
Methods of Particle Physics: the most powerful microscope – creating conditions similar to the Big Bang
Particle physics data – from raw data to physics results (diagram): raw data (streams of digitised numbers) are reconstructed by converting the detector response into physics quantities – applying calibration and alignment, pattern recognition and particle identification – and then passed to physics analysis to produce results. In parallel, simulation (Monte Carlo) follows the same chain starting from the basic physics process (e.g. e+e- → Z0 → fermion pair), through fragmentation and decay, interaction with the detector material, and the detector response.
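To make the reconstruction step concrete, here is a minimal sketch in Python. The class names, calibration constants and values are invented for illustration and do not come from any CERN software; it simply shows how per-channel pedestal subtraction and gain calibration turn raw ADC counts like those shown above into physical energies.

```python
# Minimal, hypothetical sketch (not CERN software): reconstruction turns raw
# detector readings into physics quantities by applying calibration constants.
from dataclasses import dataclass
from typing import List

@dataclass
class RawHit:
    channel: int      # electronics channel that fired
    adc_count: int    # digitised detector response

@dataclass
class CalibratedHit:
    channel: int
    energy_gev: float  # physical quantity after calibration

def reconstruct(raw_hits: List[RawHit], gain: dict, pedestal: dict) -> List[CalibratedHit]:
    """Apply per-channel calibration constants (illustrative values only)."""
    out = []
    for hit in raw_hits:
        energy = (hit.adc_count - pedestal[hit.channel]) * gain[hit.channel]
        out.append(CalibratedHit(hit.channel, energy))
    return out

if __name__ == "__main__":
    raw = [RawHit(0, 2037), RawHit(1, 2446), RawHit(2, 1733)]   # numbers from the slide
    gain = {0: 0.001, 1: 0.001, 2: 0.0012}       # GeV per ADC count (made up)
    pedestal = {0: 50, 1: 48, 2: 52}             # electronics baseline (made up)
    for h in reconstruct(raw, gain, pedestal):
        print(f"channel {h.channel}: {h.energy_gev:.3f} GeV")
```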
HEP data analysis (workflow diagram): the detector description, alignment and calibration feed the simulation and reconstruction geometries and the reconstruction parameters; events are either generated and simulated (Monte Carlo) or recorded as raw data, then reconstructed into Event Summary Data (ESD), reduced to Analysis Object Data (AOD) and analysed.
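The tiered data model (raw data → ESD → AOD) can be illustrated with a toy example. This is a hypothetical sketch, not any experiment’s actual event data model – the field names and reduction steps are invented – but it conveys the idea in the diagram that each tier is smaller and closer to the physics.

```python
# Illustrative sketch of the tiered HEP data model (RAW -> ESD -> AOD);
# field names are invented for this example.

def make_esd(raw_event: dict) -> dict:
    """Event Summary Data: output of reconstruction (e.g. tracks, clusters)."""
    return {
        "run": raw_event["run"],
        "event": raw_event["event"],
        "tracks": raw_event["hits"],   # stand-in: pretend hits became tracks
    }

def make_aod(esd_event: dict) -> dict:
    """Analysis Object Data: compact physics objects for end-user analysis."""
    return {
        "run": esd_event["run"],
        "event": esd_event["event"],
        "n_tracks": len(esd_event["tracks"]),
    }

raw = {"run": 1, "event": 42, "hits": [2037, 2446, 1733, 1699]}
aod = make_aod(make_esd(raw))
print(aod)   # each tier is smaller and closer to the physics
```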
CERN data archiving policy
• “CERN is not just another laboratory. It is an institution that has been entrusted with a noble mission which it must fulfil not just for tomorrow but for the eternal history of human thought.” (Albert Picot, 3rd Session of CERN Council, Geneva, 1955)
• “Rules applicable to archival material and archiving at CERN” – CERN Operational Circular No. 3 (1997)
  – Historical and scientific archives
  – Implemented by the CERN archivist, Anita Hollier
  – Does not specifically cover digital physics data
Data archiving – LEP
• The Large Electron-Positron collider (LEP) ran from 1989 to 2000 (80 – 200 GeV)
• Accelerator and experiments dismantled to make room for the LHC
• Four experiments: ALEPH, DELPHI, L3, OPAL
• Officially “terminating” in the near future
• The experiments request that analysis of the data be kept alive (as long as possible/reasonable)
  – In case other experiments (e.g. at the LHC) see new phenomena that are within the reach of LEP
  – They also need to be able to re-run the simulation (Monte Carlo)
LEP data agreement
• Keep the (FORTRAN) software running “as is”
  – No further development or maintenance of central software such as CERNLIB
• Have the required data available in the standard CERN hierarchical mass storage system, CASTOR
  – Carry the data forward in case of MSS evolution
• Have the software and data access running on a special cluster of Linux computers
  – Carry the software forward with the operating system and compiler as much as possible, without major effort
• Run the whole as a “museum system”
Issues and concerns
• Event displays of some experiments depend on external commercial software
• Security updates for a given OS version are available only for a limited time
• The system requires – hopefully limited – manpower for central support
• The experiments need to keep it alive and perform regular tests (a minimal sketch of such a test follows below)
• Major changes in hardware technology cannot be accommodated (what about emulators?)
• The data cannot be analyzed in a meaningful way by people who were not involved in the original collaboration
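As a rough illustration of the kind of regular test mentioned above, the sketch below (placeholder paths and commands, not the actual LEP setup) re-runs a small reference job on the preserved software stack and compares its output against a checksum recorded when the archive was created.

```python
# Hedged sketch of a "museum system" health check: re-run a preserved
# reference analysis job and verify that it still reproduces its output.
import hashlib
import subprocess
import sys

REFERENCE_JOB = ["./run_reference_analysis.sh"]   # placeholder command
OUTPUT_FILE = "reference_output.dat"              # placeholder output file
EXPECTED_SHA256 = "0" * 64                        # recorded at archiving time

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main() -> int:
    subprocess.run(REFERENCE_JOB, check=True)     # run the preserved job
    if sha256_of(OUTPUT_FILE) != EXPECTED_SHA256:
        print("MISMATCH: preserved software no longer reproduces the reference output")
        return 1
    print("OK: archived software and data still reproduce the reference result")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```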
QUAERO – making HEP data publicly available
• Developed by Bruce Knuteson
PDG: “Rosenfeld tables”
• First review of particle properties and data: “Hyperons and Heavy Mesons (Systematics and Decay)” by M. Gell-Mann and A. H. Rosenfeld, Ann. Rev. Nucl. Sci. 7 (1957) 407
• Separate efforts at LBL and CERN joined in 1964: Rev. Mod. Phys. 36 (1964) 977
• Particle Data Group:
  – Maintains a database of experimental results – more than 20,000 measurements from 6,000 papers (an illustrative averaging sketch follows below)
  – The Review and the Booklet are published in even-numbered years
  – The Web version is updated between printed editions
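The core of such a compilation is combining independent measurements of the same quantity into a world average. The sketch below shows a plain inverse-variance weighted mean with made-up numbers; the actual PDG procedure is more elaborate (it applies, for example, scale factors when measurements are inconsistent).

```python
# Illustrative only: inverse-variance weighted average of several measurements
# of the same quantity -- the basic ingredient of review compilations.
import math

def weighted_average(measurements):
    """measurements: list of (value, uncertainty) tuples."""
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)

data = [(139.57, 0.05), (139.60, 0.08), (139.55, 0.06)]   # fictitious inputs
value, error = weighted_average(data)
print(f"world average: {value:.3f} +/- {error:.3f}")
```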
PDG: Compilations – cross-sections, structure functions
PDG: Following history … can be revealing!
The next step: LHC (Large Hadron Collider) – 2000 M€ cost, 27 km circumference, 100 m underground
Challenge 1: Large, distributed community (ATLAS, CMS, LHCb)
• “Offline” software effort: 1000 person-years per experiment
• Software life span: 20 years
• ~5000 physicists around the world – around the clock
Challenge 2: Data volume
• Annual data storage: 12-14 PetaBytes/year
• For scale (50 CD-ROMs = 35 GB, a stack ~6 cm high): a CD stack holding one year of LHC data would be ~20 km tall – higher than Concorde’s cruising altitude (15 km) or Mont Blanc (4.8 km), and approaching a high-altitude balloon (30 km)
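The comparison can be checked with back-of-envelope arithmetic, using only the figures on the slide (0.7 GB and ~1.2 mm per CD, 12-14 PB per year):

```python
# Back-of-envelope check of the slide's CD-stack comparison.
GB_PER_PB = 1_000_000          # gigabytes per petabyte (decimal units)
CD_CAPACITY_GB = 35 / 50       # 0.7 GB per CD (from the slide)
CD_THICKNESS_M = 0.06 / 50     # ~1.2 mm per disc (from the slide)

for petabytes in (12, 14):
    n_cds = petabytes * GB_PER_PB / CD_CAPACITY_GB
    height_km = n_cds * CD_THICKNESS_M / 1000
    print(f"{petabytes} PB/year -> {n_cds:.2e} CDs, stack ~{height_km:.0f} km")
```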
Challenge 3a: Find the needle in a haystack
• Rare phenomena such as the Higgs lie some 9 orders of magnitude below the rate of all interactions
• Huge background, complex events
Challenge 3b: Provide mountains of CPU
• Calibration, reconstruction, simulation, analysis
• For LHC computing, some 100 million SPECint2000 are needed!
• 1 SPECint2000 = 0.1 SPECint95 = 1 CERN unit = 4 MIPS; a 3 GHz Pentium 4 delivers ~1000 SPECint2000
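Using only the conversion factors quoted on the slide, the CPU requirement translates into roughly one hundred thousand Pentium 4 class processors:

```python
# Back-of-envelope: how many 3 GHz Pentium 4 processors deliver the
# ~100 million SPECint2000 quoted for LHC computing (figures from the slide).
TOTAL_SPECINT2000 = 100_000_000
P4_SPECINT2000 = 1_000            # ~1000 SPECint2000 per 3 GHz Pentium 4

n_cpus = TOTAL_SPECINT2000 / P4_SPECINT2000
print(f"~{n_cpus:,.0f} Pentium 4 class processors")   # ~100,000 CPUs
```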
Limits of re-using data
• Example: Fischbach et al., re-analysing after 100 years the data from Eötvös’ classic experiment on the proportionality of inertial and gravitational masses
  – Eötvös reached an accuracy of 1 part in 100,000,000
  – Fischbach et al. used the data to claim the discovery of a fifth force!
  – The influence of the architecture of the building, the local geology and people moving about makes that conclusion rather difficult to draw – to say the least!
Points for discussion
• Experimental physics is publicly funded and expensive
  – The LHC costs ~10^10 €
  – Not easy to repeat
• Experimental data are useless without documentation, metadata and software
• Technology evolution is a serious burden – data have to be kept alive (re-copying)
• What is the right level of archiving?
  – À la Particle Data Group – reliable, long-term
  – Four-vectors – limited possibilities, not always useful
  – (Mini-)DSTs – require more expertise
• Making raw and intermediate data publicly available
  – Education – a clear case, pursued by several groups
  – In general, scientists are rather hesitant to do it