Data Production, Data Analysis at CERN
Nestlé Research Center, 23 May 2007
René Brun / CERN
The Large Hadron Collider (LHC) is being built in a circular tunnel 27 km in circumference. The tunnel is buried 50 to 175 m underground and straddles the Swiss-French border on the outskirts of Geneva.
Data Production & Analysis at CERN
Overview of CERN's accelerator layout
CERN accelerators lifetime
[Figure: timeline from 1975 to 2015 showing the design/simulation, construction and run phases of the SPS, LEP and LHC]
Lowering a dipole magnet (one of 1232) into the tunnel
The key element: the 1232 dipoles bend the beam around the 27 km circumference
The LHC collaborations (1995->20XX)
• ALICE, ATLAS, CMS, LHCb
• > 5000 physicists
• > 500 universities or labs
• Many PetaBytes per year
• Billions of events
A typical detector component
More than 10 million electronic channels per experiment
A simulated collision
How Much Data is Involved?
[Figure: Level-1 rate (Hz, 10^2 to 10^6) versus event size (bytes, 10^4 to 10^7) for UA1, NA49, LEP, H1, ZEUS, CDF, CDF II, KLOE, HERA-B, STAR, ALICE, ATLAS, CMS and LHCb. Annotations: high Level-1 trigger rate (1 MHz); high number of channels; high bandwidth (500 Gbit/s); "1 billion people surfing the Web"; high data archive (5 PetaBytes/year); 10 Gbit/s into the database.]
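The annotations on this plot are linked by throughput = event rate × event size. As a minimal sketch (the function names are hypothetical, and pairing the 500 Gbit/s bandwidth with the full 1 MHz Level-1 rate is an assumption, not a detector specification), the arithmetic looks like this:

```cpp
#include <cassert>

// Rough relations between the numbers quoted in the figure.
// Units: rates in Hz, link speeds in bits/s, sizes in bytes, times in seconds.

// Average event size (bytes) that fits through a link of `bitsPerSecond`
// when events arrive at `eventRateHz`.
double eventSizeBytes(double bitsPerSecond, double eventRateHz) {
    assert(eventRateHz > 0.0);
    return bitsPerSecond / eventRateHz / 8.0;
}

// Time (seconds) needed to write `archiveBytes` at a sustained `bitsPerSecond`.
double secondsToWrite(double archiveBytes, double bitsPerSecond) {
    assert(bitsPerSecond > 0.0);
    return archiveBytes / (bitsPerSecond / 8.0);
}
```

With those inputs, eventSizeBytes(500e9, 1e6) gives 62 500 bytes per event, and secondsToWrite(5e15, 10e9) gives 4 × 10^6 s, i.e. about 46 days of continuous writing at 10 Gbit/s to accumulate one year's 5 PetaBytes.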
The LHC collaborations (some parameters)
From Mainframes to the GRID
LHC collaborations (Analysis Steps)
• Raw data (PetaBytes): DAQ -> T0 -> T1
• After reconstruction (100 TeraBytes): T1 -> T2
• Ready for analysis (10 TeraBytes): T2 -> T3
• Analysis per physicist (1 TeraByte)
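Each analysis step shrinks the data by roughly a factor of 10. The sketch below encodes the steps and computes the reduction factor between consecutive ones; the struct and function names are hypothetical, and since the slide gives the raw volume only as "PetaBytes", 1 PB is assumed for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One step of the reduction chain: what the data are, where they flow, how big they are.
struct Stage {
    std::string description;
    std::string flow;   // tier transfer, e.g. "T1 -> T2" (empty if none given)
    double bytes;       // approximate volume
};

// Reduction factor between consecutive stages (previous volume / current volume).
std::vector<double> reductionFactors(const std::vector<Stage>& stages) {
    std::vector<double> factors;
    for (std::size_t i = 1; i < stages.size(); ++i) {
        assert(stages[i].bytes > 0.0);
        factors.push_back(stages[i - 1].bytes / stages[i].bytes);
    }
    return factors;
}

// Volumes from the slide; the 1 PB raw-data volume is an assumption.
std::vector<Stage> lhcAnalysisSteps() {
    return {
        {"Raw data", "DAQ -> T0 -> T1", 1e15},
        {"After reconstruction", "T1 -> T2", 100e12},
        {"Ready for analysis", "T2 -> T3", 10e12},
        {"Analysis per physicist", "", 1e12},
    };
}
```

With the assumed 1 PB of raw data, reductionFactors(lhcAnalysisSteps()) returns {10, 10, 10}.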
Tools for Data Storage
[Figure: timeline from 1975 to 2020 showing hydra, zbook, oracle, Objectivity and ROOT]
Tools for Data Visualization & Analysis
[Figure: timeline from 1975 to 2020 showing PAW, LHC++, JAS and ROOT]
Data Storage and Access Tools
[Diagram: ROOT accessing storage through xrootd: an xrootd master on a VOBOX::SA, with xrootd workers in front of disk (DPM), Castor and dCache (the latter via an xrootd emulation), each storage system also exposing SRM; Castor and dCache are backed by a mass storage system (MSS)]
The ROOT Open Source Project: http://root.cern.ch
[Figure: growth of ROOT functionality from 1995 to 2005, from ROOT 0.5 through ROOT 1.0 (LEP, HERA, SPS), ROOT 2.0 (RHIC, FNAL/Run II, Babar, KEK, SPS, FNAL) and ROOT 3.0 (LHC, Large Hadron Collider) to ROOT 5.12]
From plotters to objects: all items are clickable objects
Can take advantage of graphics accelerators
GUI Examples
ROOT Math Libraries
Multivariate Analysis / Cluster Analysis
Self-describing files
• Dictionary for persistent classes written to the file
• ROOT files can be read by foreign readers
• Support for backward and forward compatibility
• Files created in 2001 must be readable in 2015
• Classes (data objects) for all objects in a file can be regenerated via TFile::MakeProject:
root > TFile f("demo.root");
root > f.MakeProject("dir", "*", "new++");
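The idea behind a self-describing file is that the schema travels with the data, so a reader can rebuild objects by field name even after its own class layout has changed. The following is a toy illustration only, not ROOT's on-disk format or its schema-evolution machinery; writeFile and readFile are hypothetical names, and every field is stored as a double:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Writes the schema (field names) first, then the records, as plain text.
std::string writeFile(const std::vector<std::string>& fields,
                      const std::vector<std::vector<double>>& records) {
    std::ostringstream out;
    out << fields.size();
    for (const auto& f : fields) out << ' ' << f;
    out << '\n' << records.size() << '\n';
    for (const auto& r : records) {
        assert(r.size() == fields.size());
        for (double v : r) out << v << ' ';
        out << '\n';
    }
    return out.str();
}

// Reads records into the reader's own layout `wanted`, matching fields by name:
// a field missing from the file gets `defaultValue` (reading old files), and a
// field in the file that the reader does not know is ignored (reading new files).
std::vector<std::map<std::string, double>> readFile(
        const std::string& contents, const std::vector<std::string>& wanted,
        double defaultValue = 0.0) {
    std::istringstream in(contents);
    std::size_t nFields = 0, nRecords = 0;
    in >> nFields;
    std::vector<std::string> fileFields(nFields);
    for (auto& f : fileFields) in >> f;   // schema stored in the file
    in >> nRecords;
    std::vector<std::map<std::string, double>> result;
    for (std::size_t i = 0; i < nRecords; ++i) {
        std::map<std::string, double> stored;
        for (const auto& f : fileFields) in >> stored[f];
        std::map<std::string, double> obj;
        for (const auto& w : wanted) {
            auto it = stored.find(w);
            obj[w] = (it != stored.end()) ? it->second : defaultValue;
        }
        result.push_back(obj);
    }
    return result;
}
```

A file written with fields px and py can then be read by a newer class with fields px and pz: px is recovered, pz gets the default value, and py is skipped.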
A ROOT file pippa.root with two levels of directories; objects in directory /pippa/DM/CJ, e.g. /pippa/DM/CJ/h15
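Objects in such a file are addressed by a path through its directories. A toy in-memory model of that lookup might look like the following; Directory and findObject are hypothetical names, and paths are taken relative to the top of the file (so DM/CJ/h15):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Toy model of a file with nested directories. Objects are represented
// by a short description string instead of real histograms.
struct Directory {
    std::map<std::string, std::unique_ptr<Directory>> subdirs;
    std::map<std::string, std::string> objects;  // object name -> description

    // Returns subdirectory `name`, creating it on first use.
    Directory& mkdir(const std::string& name) {
        assert(!name.empty());
        auto& slot = subdirs[name];
        if (!slot) slot = std::make_unique<Directory>();
        return *slot;
    }
};

// Walks `path` from `top`; returns nullptr if a directory or the object is missing.
const std::string* findObject(const Directory& top, const std::string& path) {
    std::vector<std::string> steps;
    std::istringstream parts(path);
    for (std::string token; std::getline(parts, token, '/');)
        if (!token.empty()) steps.push_back(token);   // skips leading or doubled '/'
    if (steps.empty()) return nullptr;
    const Directory* dir = &top;
    for (std::size_t i = 0; i + 1 < steps.size(); ++i) {
        auto it = dir->subdirs.find(steps[i]);
        if (it == dir->subdirs.end()) return nullptr;
        dir = it->second.get();
    }
    auto obj = dir->objects.find(steps.back());
    return obj == dir->objects.end() ? nullptr : &obj->second;
}
```

For example, after pippa.mkdir("DM").mkdir("CJ").objects["h15"] = "histogram", findObject(pippa, "DM/CJ/h15") finds the object, while findObject(pippa, "DM/h15") returns nullptr.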
Memory <--> Tree: each node is a branch in the tree
[Figure: tree T with entries 0 to 17; T.Fill() appends the in-memory object as entry 18, and T.GetEntry(6) reads entry 6 back into memory]
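The Memory <--> Tree picture can be mimicked with one column per branch and a single in-memory buffer: Fill() copies the buffer into a new entry, and GetEntry(i) copies entry i back into the buffer. This is a minimal sketch assuming a hypothetical two-branch Event and a hypothetical ToyTree class; it says nothing about how ROOT actually stores or compresses branches:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The user's in-memory object: the single buffer shared with the tree.
struct Event {
    int nTracks = 0;
    double energy = 0.0;
};

// Column-wise toy tree: one vector per branch.
class ToyTree {
public:
    explicit ToyTree(Event* buffer) : buf_(buffer) {}

    // Appends the current contents of the buffer as a new entry.
    void Fill() {
        nTracks_.push_back(buf_->nTracks);
        energy_.push_back(buf_->energy);
    }

    // Copies entry i of every branch back into the buffer.
    void GetEntry(std::size_t i) {
        assert(i < GetEntries());
        buf_->nTracks = nTracks_[i];
        buf_->energy = energy_[i];
    }

    std::size_t GetEntries() const { return energy_.size(); }

private:
    Event* buf_;
    std::vector<int> nTracks_;     // branch "nTracks"
    std::vector<double> energy_;   // branch "energy"
};
```

After 19 calls to Fill() (entries 0 to 18, as in the figure), GetEntry(6) restores the values that were in the buffer when entry 6 was filled.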
Browsing tree T: the 8 branches of T and the 8 leaves of branch Electrons; a double-click on a leaf histograms it
Chains of Trees
• A TChain is a collection of Trees.
• Same semantics for TChains and TTrees.
root > .x h1chain.C
root > chain.Process("h1analysis.C")
with the macro h1chain.C:
{
   // creates a TChain to be used by the h1analysis.C class
   // the symbol H1 must point to a directory where the H1 data sets
   // have been installed
   TChain chain("h42");
   chain.Add("$H1/dstarmb.root");
   chain.Add("$H1/dstarp1a.root");
   chain.Add("$H1/dstarp1b.root");
   chain.Add("$H1/dstarp2.root");
}
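A chain can offer the same entry-based interface as a single tree because a global entry number maps to a file plus a local entry within that file. The sketch below shows only that bookkeeping; ToyChain is hypothetical, and it is told each file's entry count up front, whereas a real chain would have to look it up in the files:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ToyChain {
public:
    void Add(const std::string& file, long long nEntries) {
        assert(nEntries >= 0);
        files_.push_back(file);
        offsets_.push_back(total_);   // first global entry of this file
        total_ += nEntries;
    }

    long long GetEntries() const { return total_; }

    // Returns {file index, entry within that file's tree}.
    std::pair<std::size_t, long long> Locate(long long entry) const {
        assert(entry >= 0 && entry < total_);
        std::size_t i = files_.size() - 1;
        while (offsets_[i] > entry) --i;   // last file whose first entry <= entry
        return {i, entry - offsets_[i]};
    }

    const std::string& File(std::size_t i) const { return files_.at(i); }

private:
    std::vector<std::string> files_;
    std::vector<long long> offsets_;
    long long total_ = 0;
};
```

With hypothetical entry counts of 100, 50 and 200 for the first three H1 files, Locate(120) returns file 1, local entry 20.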
Access Transparency
// the same call opens local files and files served over xrootd, rfio (Castor), dCache, Chirp and HTTP
TFile *f1 = TFile::Open("local.root");
TFile *f2 = TFile::Open("root://cdfsga.fnal.gov/bigfile.root");
TFile *f3 = TFile::Open("rfio:/castor.cern.ch/alice/aap.root");
TFile *f4 = TFile::Open("dcache://main.desy.de/h1/run2001.root");
TFile *f5 = TFile::Open("chirp://hep.wisc.edu/data1.root");
TFile *f6 = TFile::Open("http://root.cern.ch/geom/atlas.root");
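ROOT chooses the concrete file implementation from the URL's protocol through its plugin manager, so user code calls the same TFile::Open everywhere. The toy dispatcher below mimics only that idea; the class name, the handler signature and the rule of treating a name without ':' as a local file are all assumptions of this sketch:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// The text before the first ':' picks a handler (so "root://host/f.root" and
// "rfio:/path" both work); a name with no ':' is treated as a local file.
class ToyOpener {
public:
    using Handler = std::function<std::string(const std::string& url)>;

    void Register(const std::string& scheme, Handler h) {
        assert(h);   // a handler must be callable
        handlers_[scheme] = std::move(h);
    }

    // Returns the handler's result, or an empty string if the scheme is unknown.
    std::string Open(const std::string& url) const {
        std::string scheme = "file";
        const auto colon = url.find(':');
        if (colon != std::string::npos) scheme = url.substr(0, colon);
        const auto it = handlers_.find(scheme);
        return it == handlers_.end() ? std::string() : it->second(url);
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

Adding a new protocol then means registering one more handler; the calling code does not change.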
Data Sets Hierarchy
More than 100 million files per experiment, copied and distributed to many sites around the world!
[Figure: data set sizes from 100 MB to 1 PB, with the number of files per data set growing from 1 to 50000; small data sets are a single TTree, large ones a TChain]
• A TFile typically contains 1 TTree.
• A TChain is a collection of TTrees and/or TChains.
• A TChain is typically the result of a query to the file catalogue.
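Because a TChain may contain TTrees and/or other TChains, a data set forms a composite, tree-shaped structure. This hypothetical sketch, which is not ROOT's class layout, counts entries and files recursively:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A data set is either one tree (one file) or a chain of trees and/or chains.
struct DataSet {
    std::string name;
    bool isChain = false;
    long long entries = 0;                        // used by trees only
    std::vector<std::shared_ptr<DataSet>> parts;  // used by chains only

    long long TotalEntries() const {
        if (!isChain) return entries;
        long long sum = 0;
        for (const auto& p : parts) sum += p->TotalEntries();
        return sum;
    }

    // Number of files reachable from this data set (one tree per file assumed).
    std::size_t FileCount() const {
        if (!isChain) return 1;
        std::size_t n = 0;
        for (const auto& p : parts) n += p->FileCount();
        return n;
    }
};

std::shared_ptr<DataSet> MakeTree(std::string name, long long entries) {
    assert(entries >= 0);
    auto t = std::make_shared<DataSet>();
    t->name = std::move(name);
    t->entries = entries;
    return t;
}

std::shared_ptr<DataSet> MakeChain(std::string name,
                                   std::vector<std::shared_ptr<DataSet>> parts) {
    auto c = std::make_shared<DataSet>();
    c->name = std::move(name);
    c->isChain = true;
    c->parts = std::move(parts);
    return c;
}
```

A chain built from one tree plus a sub-chain of two trees reports three files and the sum of all three trees' entries.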
Interactive and batch tasks
Same interface for batch and interactive systems:
• Medium-term jobs, e.g. analysis design and development, also using non-local resources
• Analysis jobs with well-defined algorithms (e.g. production of personal trees)
• Interactive analysis using local resources, e.g. end-analysis calculations and visualization
Sample of analysis activity (G. Ganis, CHEP06, 15 Feb 2006)
Monday at 10h15, ROOT session on my laptop:
• AQ1: a 1 s query produces a local histogram
• AQ2: a 10 min query submitted to PROOF1
• AQ3->AQ7: short queries
• AQ8: a 10 h query submitted to PROOF2
Monday at 16h25, ROOT session on my laptop:
• BQ1: browse results of AQ2
• BQ2: browse temporary results of AQ8
• BQ3->BQ6: submit four 10 min queries to PROOF1
Wednesday at 8h40, browse from any web browser:
• CQ1: browse results of AQ8 and BQ3->BQ6
From Laptop to the GRID: Parallelism at all levels
[Figure: laptop, local/remote storage, online/offline farms and the GRID]
Data analysis tools must be able to exploit parallelism on multi-core laptops, use remote computers in parallel, and access storage elements and networks in a transparent way.
Batch: Classical Approach
[Diagram: a query to the catalog returns the data file list; the list is split into jobs running myAna.C, which are submitted to the queue manager of the batch farm and read their files from storage; the job outputs are merged for the final analysis]
• "static" use of resources
• jobs frozen, 1 job / worker node
• "manual" splitting, merging
• limited monitoring (end of single job)
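The manual splitting and merging of the classical approach fits in a few lines: the file list is divided among a fixed number of jobs at submission time, and the partial histograms are summed at the end. The function names, the round-robin split and the representation of histograms as plain bin-count vectors are all assumptions of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits file IDs among nJobs jobs; the assignment is fixed at submission.
std::vector<std::vector<int>> splitStatic(const std::vector<int>& fileIds, std::size_t nJobs) {
    assert(nJobs > 0);
    std::vector<std::vector<int>> jobs(nJobs);
    for (std::size_t i = 0; i < fileIds.size(); ++i)
        jobs[i % nJobs].push_back(fileIds[i]);   // round-robin assignment
    return jobs;
}

// Bin-by-bin sum of partial histograms (all with the same binning).
std::vector<long> mergeHistograms(const std::vector<std::vector<long>>& parts) {
    std::vector<long> total;
    for (const auto& h : parts) {
        if (total.empty()) total.assign(h.size(), 0);
        assert(h.size() == total.size());
        for (std::size_t b = 0; b < h.size(); ++b) total[b] += h[b];
    }
    return total;
}
```

Because each job's share is frozen when it is submitted, a single slow worker node delays the whole result.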
Interactive Parallel ROOT/PROOF
[Diagram: a PROOF query (data file list, myAna.C) goes to the MASTER of the PROOF farm, whose scheduler uses the catalog and storage; merged feedback and merged final outputs are returned to the user]
• farm perceived as extension of local PC
• more dynamic use of resources
• real time feedback
• automated splitting and merging
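Dynamic scheduling instead lets workers pull small packets of entries until none are left, with the partial results merged automatically. The sketch below illustrates this with threads on one machine, in the spirit of PROOF's packet-based work distribution but not its implementation; processDynamic, the packet size and the modulo "analysis" are hypothetical:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Workers repeatedly take the next packet from a shared counter, so faster
// workers simply process more packets. Each worker fills its own partial
// histogram; the partial histograms are merged once all workers finish.
std::vector<long> processDynamic(long nEntries, long packetSize, unsigned nWorkers,
                                 std::size_t nBins) {
    assert(packetSize > 0 && nWorkers > 0 && nBins > 0);
    std::atomic<long> next{0};   // first entry of the next packet to hand out
    std::vector<std::vector<long>> partial(nWorkers, std::vector<long>(nBins, 0));
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < nWorkers; ++w) {
        workers.emplace_back([&, w] {
            for (;;) {
                long first = next.fetch_add(packetSize);
                if (first >= nEntries) break;
                long last = std::min(first + packetSize, nEntries);
                for (long e = first; e < last; ++e)
                    ++partial[w][static_cast<std::size_t>(e) % nBins];  // stand-in for real analysis
            }
        });
    }
    for (auto& t : workers) t.join();
    std::vector<long> merged(nBins, 0);   // automatic merge step
    for (const auto& h : partial)
        for (std::size_t b = 0; b < nBins; ++b) merged[b] += h[b];
    return merged;
}
```

For example, processDynamic(1000, 37, 4, 10) returns 10 bins of 100 entries each, whatever order the four workers took their packets in.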