First Ideas on Distributed Analysis for LHCb
• LHCb Software Week
• CERN, 28th March 2001
• Glenn Patrick (RAL)
• http://hepwww.rl.ac.uk/lhcb/physics/lhcbcern280301.ppt
Analysis and the Grid?
Monte-Carlo production is readily mapped onto a Grid architecture because:
• It is a well-defined problem using the same executable.
• It already requires distributed resources (mainly CPU) in large centres (e.g. Lyon, RAL, Liverpool ...).
• Few people are involved.
Analysis is much more inventive/chaotic and will involve far more people in a wide range of institutes.
How easily this is perceived to map onto the Grid depends on where we sit on the Hype Cycle ...
Hype Cycle of Emerging Technology (courtesy of Gartner Group)
[Figure: hype plotted against time. Labelled stages: Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity.]
Issues
There are two basic issues:
• What is the data model for the experiment?
  • Most work on this was done BG (before Grid). Is it still relevant?
  • Do we move analysis jobs to the data or the data to the jobs?
  • What is the minimum dataset required for analysis (AOD, ESD)?
  • Are we accessing objects or files?
  • Interactive versus batch computing.
• What services and interfaces have to be provided to grid-enable the LHCb analysis software?
  • Difficult until a working Grid architecture emerges. Have to make a start and gradually evolve?
Data Model: Networking Evolution
• In the UK, WorldCom is providing the national backbone for SuperJanet4 from March 2001.

Date      Capacity             Bandwidth
2000      SuperJanet3          155 Mbit/s
1Q2001    16 x SuperJanet3     2.5 Gbit/s
4Q2001    64 x SuperJanet3     10 Gbit/s
2Q2002    128 x SuperJanet3    20 Gbit/s

• A few years ago: most bulk data was moved by tape.
• Now: almost all data from RAL is moved over the network.
• More scope for moving data to the application?
SuperJanet4 UK Backbone, March 2001
[Figure: network map. WorldCom nodes: Edinburgh, Glasgow, Warrington, Leeds, London, Reading, Portsmouth. Regional networks: Scotland via Edinburgh, Scotland via Glasgow, C&NL MAN, NorMAN, NNW, North Wales MAN, YHMAN, Northern Ireland, EMMAN, EastNet, MidMAN, TVN, South Wales MAN, LMN, Kentish MAN, SWAN & BWEMAN, LeNSE, plus External Links. Legend: 155 Mbit/s single fibre interface; 622 Mbit/s single fibre interface; 2.5 Gbit/s single fibre interface; 2.5 Gbit/s dual fibre interface; 2.5 Gbit/s development network.]
Data Model: Last Mile Problem?
• Having a fast backbone is not much use if local bottlenecks exist (typically 100 Mbit/s). Need to do point-to-point tests using realistic datasets.

Connection               Rate         Tape (750 MB)
RAL CSF - RAL PPD        1600 kB/s    8 minutes
RAL CSF - CERN           360 kB/s     35 minutes
RAL CSF - Liverpool      ~90 kB/s     2.3 hours

• Very crude tests done on a bad day. Need to perform a spectrum of tests with realistic datasets, new tools, etc.
• Parallel Grid-FTP (multiple streams): 1 MB/s RAL-CERN.
• But increasing data flow down the analysis chain...
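The tape transfer times quoted on this slide follow directly from the measured rates. A minimal sketch of the arithmetic, assuming decimal units (1 MB = 1000 kB); the function name is illustrative only and not part of any LHCb tool:

```python
def transfer_time_seconds(size_mb: float, rate_kb_per_s: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_kb_per_s (1 MB = 1000 kB assumed)."""
    return size_mb * 1000.0 / rate_kb_per_s


TAPE_MB = 750.0  # tape size quoted on the slide

# Rates quoted on the slide, in kB/s
measured_rates = {
    "RAL CSF - RAL PPD": 1600.0,
    "RAL CSF - CERN": 360.0,
    "RAL CSF - Liverpool": 90.0,
}

for link, rate in measured_rates.items():
    minutes = transfer_time_seconds(TAPE_MB, rate) / 60.0
    print(f"{link}: {minutes:.1f} minutes")
```

With the same assumption, the 1 MB/s parallel Grid-FTP rate would move the same tape in 750 seconds, i.e. 12.5 minutes.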
[Figure: tiered analysis data flow (Ref: Tony Doyle, WP2/ATLAS). Labels: Raw Data; ESD (data or Monte Carlo); AOD; Event Tags; Event Selection; Calibration Data; Analysis, Skims; Analysis Object Data; Physics Objects; Physics Analysis. Tiers: Tier 0,1 (collaboration wide); Tier 2 (analysis groups); Tier 3, 4 (physicists). Arrow: increasing data flow.]
Which Datasets are really needed for Analysis?
• Likely to be different requirements at startup.
[Figure: Analysis Cycle (for each physicist). Labels: Physics Analysis; Analysis Workstation; AOD; Group Analysis Tags; Calibration Data; ESD; Raw Data; Generator Data; Private Data (e.g. ntuple); Physics results. Annotations: "Some physicists for small sample of events"; "For event with “interesting” Group Analysis Tags"; "Few physicists and for very few events"; "For Monte Carlo events".]
Datasets 2007 - Hoffman

                       ALICE(pp)   ATLAS     CMS       LHCb
RAW per event          1 MB        1 MB      1 MB      0.125 MB
ESD per event          0.1 MB      0.5 MB    0.5 MB    0.1 MB
AOD per event          10 kB       10 kB     10 kB     20 kB
TAG per event          1 kB        0.1 kB    1 kB      1 kB
Real Data Storage      1.2 PB      2 PB      1.7 PB    0.45 PB
Simulation Storage     0.1 PB      1.5 PB    1.2 PB    0.36 PB
Calibration Storage    0.0         0.4 PB    0.01 PB   0.01 PB
Physics Use-Cases
Baseline model assumes:
• Production Centre stores all phases of data (RAW, ESD, AOD and TAG).
• CERN is production centre for real data.
• TAG and AOD datasets shipped to Regional Centres.
• Only 10% of ESD data moved to outside centres.
LHCb has smaller dataset sizes (but perhaps more specialised requirements), so more options available?
Even with 2 x 10^9 events/year, total AOD sample is only 40 TB/year.
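The 40 TB/year figure can be checked against the LHCb per-event sizes in the Datasets 2007 table. A short sketch, assuming decimal units (1 TB = 10^9 kB); the function and dictionary names are illustrative only:

```python
EVENTS_PER_YEAR = 2e9  # event rate quoted on the slide

# LHCb per-event sizes from the Datasets 2007 table, converted to kB
LHCB_EVENT_SIZE_KB = {
    "RAW": 125.0,  # 0.125 MB
    "ESD": 100.0,  # 0.1 MB
    "AOD": 20.0,
    "TAG": 1.0,
}


def yearly_volume_tb(events: float, size_kb: float) -> float:
    """Yearly data volume in TB, taking 1 TB = 1e9 kB (decimal units assumed)."""
    return events * size_kb / 1e9


for phase, size_kb in LHCB_EVENT_SIZE_KB.items():
    print(f"{phase}: {yearly_volume_tb(EVENTS_PER_YEAR, size_kb):.0f} TB/year")
```

Under these assumptions the AOD line reproduces the 40 TB/year quoted above.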
Analysis Interface: Gaudi meets the Grid?
[Figure: Gaudi Services (Application Manager, Job Options Service, Detector Description, EventData Service, Histogram Service, Message Service, Particle Property Service, GaudiLab Service) and Grid Services (Information Services, Scheduling, Security, Monitoring, Data Management, Service Discovery, Database Service?), with Standard Interfaces & Protocols. Logical DataStores: Event, Detector, Histogram, Ntuple. Further labels: Meta Data, Data.]
• Most Grid services are producers or consumers of meta-data.
High Level Interfaces
• Need to define high-level Grid interfaces essential to Gaudi, especially relating to data access.
• For example:
[Figure: layered data-access interfaces. High Level: Data Query, Data Replication. Medium Level: Data Mover, Data Locator. Low Level: CASTOR, HPSS, Other MSS.]
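One way to picture the layering is a high-level replication call built on a medium-level locator and mover, with the mass storage systems hidden behind the mover. The sketch below is purely illustrative: every class, method and storage name is hypothetical and does not correspond to any Gaudi or Grid API.

```python
from typing import Dict, List, Protocol, Tuple


class DataLocator(Protocol):
    """Medium level (hypothetical): map a logical name to physical replicas."""
    def locate(self, logical_name: str) -> List[str]: ...


class DataMover(Protocol):
    """Medium level (hypothetical): copy one physical file to a destination."""
    def copy(self, source: str, destination: str) -> None: ...


class InMemoryLocator:
    """Toy locator backed by a dictionary; stands in for a real replica catalogue."""
    def __init__(self, catalogue: Dict[str, List[str]]) -> None:
        self._catalogue = dict(catalogue)

    def locate(self, logical_name: str) -> List[str]:
        return list(self._catalogue.get(logical_name, []))


class RecordingMover:
    """Toy mover that only records requested transfers instead of touching an MSS."""
    def __init__(self) -> None:
        self.transfers: List[Tuple[str, str]] = []

    def copy(self, source: str, destination: str) -> None:
        self.transfers.append((source, destination))


def replicate(logical_name: str, destination: str,
              locator: DataLocator, mover: DataMover) -> str:
    """High level (hypothetical): replicate a dataset using the first known replica."""
    replicas = locator.locate(logical_name)
    if not replicas:
        raise LookupError(f"no replica known for {logical_name}")
    mover.copy(replicas[0], destination)
    return replicas[0]


# Usage with made-up storage names
locator = InMemoryLocator({"Bpipi/event1.dat": ["mss1:/store/event1.dat"]})
mover = RecordingMover()
source = replicate("Bpipi/event1.dat", "local:/scratch/event1.dat", locator, mover)
print(source, mover.transfers)
```

The point of the split is that analysis code would only ever call the high-level function, so the choice of mass storage system can change without touching it.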
Analysis and the Grid
• In the Grid, analysis appears to be seen as a series of hierarchical queries (cuts) on databases/datasets:
  e.g. (PTRACK < 150.0) AND (RICHpid = pion)
• Architectures based on multi-agent technology.
• An intelligent agent is a software entity with some degree of autonomy that can carry out operations on behalf of a user or program.
• Need to define “globally unique” LHCb namespace(s). ATF proposes using URI syntax…
  e.g. http://lhcb.cern.ch/analy/Bpipi/event1.dat
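The example cut on this slide can be read as a composable predicate applied to a dataset. A minimal sketch, assuming a hypothetical Track record with only the two quantities named in the cut (the field names, and the absence of units, are assumptions):

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Track:
    """Hypothetical record holding only the quantities named in the example cut."""
    ptrack: float   # PTRACK; units are not given on the slide
    rich_pid: str   # RICHpid, e.g. "pion"


Cut = Callable[[Track], bool]


def all_of(*cuts: Cut) -> Cut:
    """Combine cuts with logical AND."""
    return lambda track: all(cut(track) for cut in cuts)


def select(tracks: Iterable[Track], cut: Cut) -> List[Track]:
    """Return the tracks passing the cut, preserving order."""
    return [track for track in tracks if cut(track)]


# (PTRACK < 150.0) AND (RICHpid = pion)
example_cut = all_of(
    lambda t: t.ptrack < 150.0,
    lambda t: t.rich_pid == "pion",
)

sample = [Track(120.0, "pion"), Track(180.0, "pion"), Track(90.0, "kaon")]
selected = select(sample, example_cut)
print(selected)
```

Applying select repeatedly, each time to the output of the previous call, gives the hierarchical sequence of queries described above.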
Agent Architecture (Serafini et al)
[Figure: User 1, User 2, ... User n; INDEX; Agent Based Query Facilitator; MSS 1, MSS 2, ... MSS k, each with Cache/Disk and Tape robotics.]
• Contains a variety of agents: user agents, index agents, MSS agents.
• Query execution strategies.
• Caching strategies.
Evolving LHCb Analysis Testbeds?
[Figure: map of testbed sites. CERN, FRANCE, ITALY, ...; RAL CSF (236 Linux cpu, IBM 3494 tape robot); RAL DataGrid Testbed; LIVERPOOL MAP (300 Linux cpu); GLASGOW/EDINBURGH “Proto-Tier 2”; Institutes: RAL (PPD), Bristol, Imperial College, Oxford, Cambridge.]
Conclusions
1. Need better understanding of how the Data Model will really work for analysis. Objects versus files?
2. Pragmatic study of performance/topology/limitations of national (and international) networks. Feeds back into 1.
3. Require definition of high-level Grid services which can be exploited by Gaudi. Agent technology?
4. Need some realistic “physics” use-cases. Feeds back into 1 and 3.
5. Accumulate experience of running Gaudi in a distributed environment (e.g. CERN-UK).