This talk explores the datasets, formats, and access patterns needed in the CMS(UK) infrastructure for physicists to generate, distribute, and analyze data efficiently.
What is the CMS(UK) Data Model?
Assume that CMS software is available at every UK institute, connected by some infrastructure (i.e. the Grid). The problem then reduces to:
• What datasets are required?
• Where are they required?
• Why are they required?
• Who is going to generate and distribute them?
• What are the formats, sizes & access patterns?
Glenn Patrick
Event data hierarchy (diagram), from largest to most compact tier: Raw Data → Reconstructed Data → Physics Objects → Event Tag Data.
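As a rough illustration of how these tiers relate (the class and field names below are hypothetical, not the actual CMS event model), each tier can be thought of as a progressively smaller summary of the one beneath it:

```python
# Illustrative sketch only: hypothetical classes standing in for the event data tiers.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RawData:
    """Full detector readout for one event (largest tier)."""
    detector_readout: bytes

@dataclass
class ReconstructedData:
    """Event Summary Data (ESD): output of reconstruction."""
    tracks: List[dict] = field(default_factory=list)
    clusters: List[dict] = field(default_factory=list)

@dataclass
class PhysicsObjects:
    """Analysis Object Data (AOD): physics objects used in analysis."""
    electrons: List[dict] = field(default_factory=list)
    muons: List[dict] = field(default_factory=list)
    jets: List[dict] = field(default_factory=list)

@dataclass
class EventTag:
    """Event Tag: a handful of compact attributes used for fast event selection."""
    run: int
    event: int
    n_muons: int
    missing_et: float
```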
Regional Centre services (diagram):
• Mass storage & disk servers; database servers
• Data import: network from CERN; network from Tier 2 and simulation centres; tapes
• Data export: to CERN, Tier 2 and local institutes; tapes; desktops
• Production reconstruction (Raw/Sim → ESD): scheduled, predictable; experiment/physics groups
• Production analysis (ESD → AOD, AOD → DPD): scheduled; physics groups
• Individual analysis (AOD → DPD and plots): chaotic; individual physicists
• Support services: physics software development, R&D systems and testbeds, info servers, code servers, web servers, telepresence servers, training, consulting, help desk
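The three processing classes differ mainly in their inputs, outputs and scheduling behaviour. A minimal sketch of how a Regional Centre might tabulate them for planning purposes (the dictionary layout and field names are illustrative, not taken from the slide):

```python
# Hypothetical workload-class table for a Regional Centre; the categories and
# data flows come from the diagram above, the structure is illustrative.
WORKLOAD_CLASSES = {
    "production_reconstruction": {
        "input": "RAW/Sim", "output": "ESD",
        "scheduling": "scheduled, predictable",
        "submitted_by": "experiment/physics groups",
    },
    "production_analysis": {
        "input": "ESD, AOD", "output": "AOD, DPD",
        "scheduling": "scheduled",
        "submitted_by": "physics groups",
    },
    "individual_analysis": {
        "input": "AOD", "output": "DPD and plots",
        "scheduling": "chaotic",
        "submitted_by": "individual physicists",
    },
}
```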
Offline data and computation for physics analysis (diagram): the detector and the event filter (selection & reconstruction) deliver raw data; event reconstruction turns raw data into processed data (event summary data); batch physics analysis extracts analysis objects by physics topic; event simulation feeds raw data into the same chain.
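A minimal sketch of this chain as a sequence of processing steps; the function names and placeholder bodies are illustrative, not real CMS software:

```python
# Illustrative offline processing chain; bodies are placeholders.

def event_filter(detector_readout):
    """Online selection & reconstruction: detector readout -> raw data."""
    return {"raw": detector_readout}

def reconstruct(raw_data):
    """Event reconstruction: raw data -> event summary data (ESD)."""
    return {"tracks": [], "clusters": []}

def batch_analysis(esd, topic):
    """Batch physics analysis: ESD -> analysis objects for one physics topic."""
    return {"topic": topic, "objects": []}

def simulate(generator_config):
    """Event simulation: produces raw-like data that enters the same chain."""
    return {"raw": b"simulated readout"}

# Real and simulated events follow the same path from raw data onwards.
for entry in (event_filter(b"detector readout"), simulate({"process": "example"})):
    objects = batch_analysis(reconstruct(entry["raw"]), topic="example topic")
```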
LHCb data model (diagram):
• Production Centre: generate raw data, reconstruction, production analysis, user analysis; CPU for production; mass storage for RAW, ESD, AOD and TAG
• Regional Centres: user analysis; CPU and data servers; CPU for analysis; mass storage for AOD and TAG; receive AOD and TAG at 80 TB/yr (real) and 120 TB/yr (simulated)
• Institutes: selected user analyses; receive AOD and TAG at 8-12 TB/yr
LHCb real and simulated data flows (diagram):
• Production Centre (x1): real data (LHCb at CERN): data collection, triggering, reconstruction, final-state reconstruction; simulated data (RAL, Lyon, ...): event generation, GEANT tracking, reconstruction, final-state reconstruction
• WAN output to each Regional Centre: real: AOD and TAG datasets, 20 TB x 4 times/yr = 80 TB/yr; simulated: AOD, generator and TAG datasets, 30 TB x 4 times/yr = 120 TB/yr
• Regional Centre (~x5): user analysis
• WAN output to each institute: real: AOD and TAG for samples, 1 TB x 10 times/yr = 10 TB/yr; simulated: AOD and TAG for samples, 3 TB x 10 times/yr = 30 TB/yr
• Institute (~x50): selected user analysis
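The quoted per-site volumes are simple products, and they also set the scale of the network rates involved. A short check of both, assuming (this is an assumption, not something stated on the slide) that the transfers are spread evenly over the year:

```python
# Reproduce the per-Regional-Centre and per-institute WAN volumes quoted above and
# convert them to rough average rates; even spreading over the year is assumed.
TB = 1e12                          # bytes per terabyte
SECONDS_PER_YEAR = 365 * 24 * 3600

flows_tb_per_year = {
    "real AOD+TAG to each Regional Centre": 20 * 4,           # 20 TB x 4 times/yr = 80 TB/yr
    "sim AOD+generator+TAG to each Regional Centre": 30 * 4,  # 30 TB x 4 times/yr = 120 TB/yr
    "real AOD+TAG to each institute": 1 * 10,                 # 1 TB x 10 times/yr = 10 TB/yr
    "sim AOD+TAG to each institute": 3 * 10,                  # 3 TB x 10 times/yr = 30 TB/yr
}

for name, volume in flows_tb_per_year.items():
    rate_mb_s = volume * TB / SECONDS_PER_YEAR / 1e6
    print(f"{name}: {volume} TB/yr, about {rate_mb_s:.1f} MB/s averaged over the year")
```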
Dataflow model (diagram): detector → DAQ system → L2/L3 trigger → reconstruction (first pass: L3 YES events plus a sample of L2/L3 NO), producing RAW data, calibration data, RAW tags, Event Summary Data (ESD) and reconstruction tags; first-pass analysis adds physics tags and Analysis Object Data (AOD); physics analysis at the analysis workstation uses RAW, ESD, AOD and private data to produce physics results.
Need to answer questions like:
• How will a physicist in Bristol/Brunel/IC/RAL:
  • Select events for a given physics channel from a year's worth of data taking? (a selection sketch follows below)
  • Transfer/replicate the selection for further analysis?
  • Generate & process a large sample of simulated events?
  • Run his/her batch job on existing samples of Monte Carlo events (e.g. at Tier 1/Tier 2)?
• Where do you want the data?
• What sort of data do you need: Tag, AOD, ESD, Raw?
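One concrete way to picture the first question is TAG-driven selection: cut on a few compact TAG attributes for the whole year's data, then retrieve only the matching AOD records for further analysis. The sketch below illustrates that pattern with hypothetical attribute names and plain in-memory containers; in practice the TAGs and AOD would live in a database (e.g. Objectivity) at a Tier 1/Tier 2 centre:

```python
# Hypothetical TAG-based event selection: scan compact tags, then fetch AOD
# only for the events that pass the cuts. Attribute names are illustrative.

def select_events(tags, min_muons=2, min_missing_et=20.0):
    """Return (run, event) identifiers whose TAG attributes pass the cuts."""
    return [
        (t["run"], t["event"])
        for t in tags
        if t["n_muons"] >= min_muons and t["missing_et"] >= min_missing_et
    ]

def fetch_aod(aod_store, selected_ids):
    """Retrieve AOD records for the selected events from a local or replicated store."""
    return [aod_store[evt_id] for evt_id in selected_ids if evt_id in aod_store]

# Toy data standing in for a year's worth of TAGs and the corresponding AOD store.
tags = [
    {"run": 1, "event": 1, "n_muons": 2, "missing_et": 35.0},
    {"run": 1, "event": 2, "n_muons": 0, "missing_et": 5.0},
]
aod_store = {(1, 1): {"muons": ["mu+", "mu-"]}, (1, 2): {"muons": []}}

selection = select_events(tags)           # events passing the channel cuts
sample = fetch_aod(aod_store, selection)  # the subset to transfer/replicate for analysis
```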
How to Go Forward?
• Need to identify a critical mass of people, drawn from all of the institutes, who will start to study, develop and exploit CMS(UK) facilities now.
• Require expertise in OO databases, specifically Objectivity (BaBar estimate: 1 FTE).
• Each institute needs to start identifying its data requirements for simulation/physics/trigger studies.
• Need to understand how best to distribute, replicate and centralise databases and associated resources.
• Need good organisation with regular meetings, etc.