LHCb Computing and Grid Status
Glenn Patrick
LHCb(UK), Dublin – 23 August 2005
Computing completes TDRs Jan 2000 June 2005
LHCb – June 2005 MF4 MF1-MF3 Mu-filters LHCb Magnet HCAL ECAL 03 June 2005
Online System: 40 MHz → Level-0 (Hardware) → 1 MHz → Level-1 (Software) → 40 kHz → HLT (Software) → 2 kHz → Tier 0
Raw Data: 2 kHz, 50 MB/s
Grid World: Tier 0 and Tier 1 (×6)
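The rates quoted on this slide fix the rejection factor of each trigger stage and the mean raw event size. A minimal sketch of that arithmetic, using only the slide's numbers; the variable names are illustrative and decimal units (1 MB = 10^6 bytes) are an assumption:

```python
# Event rates quoted on the slide, in Hz, in the order the data flows.
rates_hz = {
    "bunch crossing": 40e6,
    "Level-0 output": 1e6,
    "Level-1 output": 40e3,
    "HLT output": 2e3,
}
raw_bandwidth_bytes_per_s = 50e6  # 50 MB/s, assuming 1 MB = 10**6 bytes

# Rejection factor of each stage = input rate / output rate.
stages = list(rates_hz.items())
reduction = {}
for (_, rate_in), (name_out, rate_out) in zip(stages, stages[1:]):
    reduction[name_out] = rate_in / rate_out

# Mean raw event size implied by the bandwidth and the HLT output rate.
event_size_bytes = raw_bandwidth_bytes_per_s / rates_hz["HLT output"]

for name, factor in reduction.items():
    print(f"{name}: rate reduced by a factor of {factor:g}")
print(f"Mean raw event size: {event_size_bytes / 1e3:g} kB")
```

Under these assumptions the three stages reduce the rate by factors of 40, 25 and 20, and the mean raw event is about 25 kB.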
HLT Output
Stream purposes: Understand bias on other B selections. Clean peak allows PID calibration. Calibration for proper-time resolution.
200 Hz Hot Stream: Will be fully reconstructed on online farm in real time. “Hot stream” (RAW + rDST) written to Tier 0.
2 kHz RAW data written to Tier 0 for reconstruction at CERN and Tier 1s.
Data Flow
Framework: Gaudi, with Event Model/Physics Event Model, Detector Description and Conditions Database.
Applications: Simulation (Gauss) → Digitisation (Boole) → Reconstruction (Brunel) → Analysis (DaVinci).
Data: MC Truth, Raw Data, DST, Stripped DST, Analysis Objects.
Grid Architecture Tier 1 centre (RAL) + 4 virtual Tier 2 centres LCG-2/EGEE World’s Largest Grid! ~16,000 CPU and 5PB over 192 sites in ~39 countries GridPP provides ~3,000 CPU at 20 UK sites
Grid Ireland
EGEE is made up of regions. The UKI region consists of 3 federations: GridPP, Grid Ireland and the National Grid Service.
[Map label: We are here]
LHCb Computing Model
CERN Tier 1 essential for accessing “hot stream” for:
• First alignment & calibration.
• First high-level analysis.
14 candidates
LHC Comparison
Experiment: TIER 1 | TIER 2
ALICE: Reconstruction, Chaotic analysis | MC production, Chaotic analysis
ATLAS: Reconstruction, Scheduled analysis/strimming, Calibration | Simulation, Analysis, Calibration
CMS: Reconstruction | Analysis for 20-100 users, All simulation prodn.
LHCb: Reconstruction, Scheduled strimming, Chaotic analysis | MC production, No analysis.
Distributed Data
RAW DATA, 500 TB: CERN = Master Copy. 2nd copy distributed over six Tier 1s.
RECONSTRUCTION, 500 TB/pass:
• Pass 1: During data taking at CERN and Tier 1s (7 months)
• Pass 2: During winter shutdown at CERN, Tier 1s and online farm (2 months)
STRIPPING, 140 TB/pass/copy:
• Pass 1: During data taking at CERN and Tier 1s (7 months)
• Pass 2: After data taking at CERN and Tier 1s (1 month)
• Pass 3: During shutdown at CERN, Tier 1s and online farm
• Pass 4: Before next year data taking at CERN and Tier 1s (1 month)
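As a consistency check, the 500 TB RAW volume can be combined with the 50 MB/s raw data rate quoted on the trigger slide to estimate the implied live time, and the per-pass figures can be summed over the listed passes. This is only a sketch: decimal units are assumed, as is the reading that there are two reconstruction passes and four stripping passes and that every pass is kept.

```python
TB = 1e12  # assuming decimal terabytes
MB = 1e6   # assuming decimal megabytes

raw_volume_bytes = 500 * TB      # RAW data per year (this slide)
raw_rate_bytes_per_s = 50 * MB   # raw data rate (trigger slide)

# Data-taking time implied by volume / rate.
live_seconds = raw_volume_bytes / raw_rate_bytes_per_s
live_days = live_seconds / 86400

# Summed output if every listed pass were retained (an assumption).
reco_passes, reco_tb_per_pass = 2, 500
strip_passes, strip_tb_per_pass_per_copy = 4, 140
reco_total_tb = reco_passes * reco_tb_per_pass
strip_total_tb_per_copy = strip_passes * strip_tb_per_pass_per_copy

print(f"Implied live time: {live_seconds:.1e} s (about {live_days:.0f} days)")
print(f"Reconstruction output over all passes: {reco_total_tb} TB")
print(f"Stripping output over all passes: {strip_total_tb_per_copy} TB per copy")
```

The implied live time comes out at 10^7 seconds, roughly 116 days of continuous data taking.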
Stripping Job - 2005
Stripping runs on reduced DSTs (rDST). Pre-selection algorithms categorise events into streams. Events that pass are fully reconstructed and full DSTs written.
CERN, CNAF, PIC used so far – sites based on CASTOR.
Workflow (Prod DB; group1, group2 ... groupN):
• Read INPUTDATA and stage them in 1 go (usage of SRM).
• Check file status: not yet staged, or staged → send file info.
• Check file integrity: bad file → send bad file info; good file → DaVinci stripping.
• Merging process → DST and ETC.
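The control flow of the stripping job can be sketched as below. This is a hedged illustration, not the production code: `run_stripping_job`, `merge_outputs`, `StrippingReport` and the three callables are hypothetical names, and the real staging, integrity and DaVinci steps are replaced by simple stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StrippingReport:
    stripped: List[List[str]] = field(default_factory=list)  # one output per good file
    bad: List[str] = field(default_factory=list)              # failed integrity check
    pending: List[str] = field(default_factory=list)          # not yet staged


def run_stripping_job(
    input_files: Iterable[str],
    is_staged: Callable[[str], bool],
    is_intact: Callable[[str], bool],
    strip: Callable[[str], List[str]],
) -> StrippingReport:
    """Check status, check integrity, then strip each good file."""
    report = StrippingReport()
    for name in input_files:
        if not is_staged(name):
            report.pending.append(name)  # "Not yet staged": retry later
        elif not is_intact(name):
            report.bad.append(name)      # "Send bad file info"
        else:
            report.stripped.append(strip(name))  # stand-in for DaVinci stripping
    return report


def merge_outputs(outputs: Iterable[List[str]]) -> List[str]:
    """Merging step: concatenate the per-file outputs into one list."""
    return [event for output in outputs for event in output]


# Toy run with stand-in checks (all file names are made up).
staged = {"a.rdst", "b.rdst", "c.rdst"}
corrupt = {"b.rdst"}
report = run_stripping_job(
    ["a.rdst", "b.rdst", "c.rdst", "d.rdst"],
    is_staged=lambda f: f in staged,
    is_intact=lambda f: f not in corrupt,
    strip=lambda f: [f"{f}:evt{i}" for i in range(2)],
)
merged = merge_outputs(report.stripped)
print(merged, report.bad, report.pending)
```

Passing the checks in as callables keeps the sketch independent of any particular storage interface.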
Comparisons - CPU
[Plot: Tier 1 CPU – integrated, 2008-2010; LHCb] (Nick Brook)
Comparisons - Disk
[Plot: CERN, Tier-1, Tier-2; 54% pledged]
LCG TDR – LHCC, 29.6.2005 (Jurgen Knobloch)
UK Tier 1 Status
Total Available (August 2005): CPU = 796 KSI2K (500 dual cpu), Disk = 187 TB (60 servers), Tape = 340 TB
Minimum Required Tier 1 2008: CPU = 4732 KSI2K, Disk = 2678 TB, Tape = 2538 TB
LHCb(UK) 2008 (15% share): CPU = 663 KSI2K, Disk = 365 TB, Tape = 311 TB
LHCb(UK) 2008 (1/6 share): CPU = 737 KSI2K, Disk = 405 TB, Tape = 346 TB
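The gap between the August 2005 capacity and the 2008 minimum requirement can be expressed as a fraction, and the two LHCb(UK) share options can be checked against each other. The sketch below uses only the slide's figures; the dictionary names are illustrative, and the "implied total" is an inference from the shares, not a number stated on the slide.

```python
available_2005 = {"CPU (KSI2K)": 796, "Disk (TB)": 187, "Tape (TB)": 340}
required_2008 = {"CPU (KSI2K)": 4732, "Disk (TB)": 2678, "Tape (TB)": 2538}

# Fraction of the 2008 minimum requirement already installed in August 2005.
fraction_available = {
    key: available_2005[key] / required_2008[key] for key in available_2005
}

# LHCb(UK) figures for the two share options.
share_15_percent = {"CPU (KSI2K)": 663, "Disk (TB)": 365, "Tape (TB)": 311}
share_one_sixth = {"CPU (KSI2K)": 737, "Disk (TB)": 405, "Tape (TB)": 346}

# Total implied by each option; the two should agree if both are shares of the same total.
implied_total_from_15 = {k: v / 0.15 for k, v in share_15_percent.items()}
implied_total_from_sixth = {k: v * 6 for k, v in share_one_sixth.items()}

for key in available_2005:
    print(
        f"{key}: {fraction_available[key]:.1%} of 2008 minimum installed; "
        f"implied total {implied_total_from_15[key]:.0f} "
        f"vs {implied_total_from_sixth[key]}"
    )
```

On these numbers roughly 17% of the CPU, 7% of the disk and 13% of the tape required for 2008 was in place, and both share options point to the same underlying total of about 4420 KSI2K.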
UK Tier 1 Utilisation (Jan-July 2005)
[Chart labels: Capacity, Grid, Non-Grid, 70%, LHCb 69%]
CPU/Walltime < 50% for some ATLAS jobs.
Grid use increasing. CPU “undersubscribed” (but efficiencies of Grid jobs may be a problem).
Hardware purchase scheduled for early 2005 postponed. PPARC discussions ongoing.
UK Tier 1 Exploitation (17.8.05)
[Charts for 2004 and 2005; labelled experiments: LHCb, ATLAS, BaBar]
UK Tier 1 Storage
• Classic SE not sufficient as LCG storage solution.
• SRM now the agreed interface to storage resources.
• Lack of SRM prevented data stripping at UK Tier 1.
• This year, new storage infrastructure deployed for UK Tier 1.
• Storage Resource Manager (SRM) – interface providing a combined view of secondary and tertiary storage to Grid clients.
• dCache – disk pool management system jointly developed by DESY and Fermilab.
• Single namespace to manage 100s of TB of data.
• Access via GridFTP and SRM.
• Interfaced to RAL tapestore.
• CASTOR under evaluation as replacement for home-grown (ADS) tape service. CCLRC to deploy 10,000 tape robot?
LHCb now has a disk allocation of 8.2 TB, with 4 x 1.6 TB under dCache control (cf. BaBar = 95 TB, ATLAS = 19 TB, CMS = 40 TB). The Computing Model says the LHCb Tier 1 should have ~122 TB in 2006…
UK Tier 2 Centres
[Chart: committed resources available to experiment at Tier-2 in 2007 vs. size of an average Tier-2 in experiment's computing model]
Under-delivered, Tier1+Tier2 (March 2005): CPU = 2277 KSI2K out of 5184 KSI2K, Disk = 280 TB out of 968 TB.
Improving as hardware is deployed in the Tier 2 institutes. Hopefully, more resources from future funding bids, e.g. SRIF3 (April 2006 – March 2008).
Tier 2 Exploitation (17 Aug, Grid Operations Centre)
Over 40 sites in UKI federation of EGEE + over 20 Virtual Organisations.
[Chart: ATLAS, LHCb, BaBar, CMS]
800 data points – improved accounting prototype on the way… GridPP only; does not include Grid Ireland. …but you get the idea.
Tier 2 sites are a vital LHCb Grid resource.
DIRAC Architecture (Services Oriented Architecture)
User interfaces: Production manager, GANGA UI, User CLI, Job monitor, BK query webpage, FileCatalog browser.
DIRAC services: DIRAC Job Management Service, JobMonitorSvc, JobAccountingSvc (AccountingDB), InformationSvc, MonitoringSvc, BookkeepingSvc, FileCatalogSvc.
Agents: one Agent per resource.
DIRAC resources: DIRAC CEs, DIRAC Sites, LCG Resource Broker (CE 1, CE 2, CE 3), DIRAC Storage (DiskFile; gridftp, bbftp, rfio).
Data Challenge 2004
187 M Produced Events. Phase 1 Completed.
[Plot annotations: DIRAC alone, 1.8 × 10^6/day; LCG in action, 3-5 × 10^6/day; LCG paused; LCG restarted]
20 DIRAC sites + 43 LCG sites were used. Data written to Tier 1s. UK second largest producer (25%) after CERN.
• Overall, 50% of events produced using LCG.
• At end, 75% produced by LCG.
RTTC - 2005
Real Time Trigger Challenge – May/June 2005
150M Minimum bias events to feed online farm and test software trigger chain.
Completed in 20 days (169M events) on 65 different sites.
95% produced with LCG sites. 5% produced with “native” DIRAC sites.
Average of 10M events/day. Average of 4,000 CPUs.
[Chart label: 37%]
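The RTTC figures allow a rough throughput estimate. The sketch below uses only the slide's numbers and assumes, for illustration, that the 4,000 CPUs were busy around the clock at the quoted average of 10M events/day; the variable names are hypothetical.

```python
total_events = 169e6          # events produced
duration_days = 20            # production period
quoted_events_per_day = 10e6  # average quoted on the slide
average_cpus = 4000           # average number of CPUs quoted on the slide
lcg_fraction = 0.95           # share produced at LCG sites

# Mean daily rate over the full 20 days, for comparison with the quoted average.
mean_events_per_day = total_events / duration_days

# Per-CPU throughput at the quoted average rate.
events_per_cpu_per_day = quoted_events_per_day / average_cpus
seconds_per_event = 86400 / events_per_cpu_per_day

lcg_events = lcg_fraction * total_events

print(f"Mean over 20 days: {mean_events_per_day / 1e6:.2f}M events/day")
print(f"Per CPU: {events_per_cpu_per_day:.0f} events/day, "
      f"about {seconds_per_event:.1f} s of wall time per event")
print(f"Produced at LCG sites: {lcg_events / 1e6:.2f}M events")
```

The 20-day mean (8.45M events/day) is somewhat below the quoted 10M/day, so the quoted average presumably refers to a different averaging period; the slide does not say.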
Looking Forward
• Next Challenge SC3 – Sept. 2005
• Analysis at Tier 1s – Nov. 2005
• Start DC06 Processing phase – May 2006
• Alignment/calibration Challenge – October 2006
• Ready for data taking – April 2007
[Timeline 2005-2008: SC3, SC4, LHC Service Operation; cosmics, first beams, first physics, full physics run]
Excellent support from UK Tier 1 at RAL. 2 application support posts at Tier 1 appointed in June 2005, BUT LHCb(UK) technical co-ordinator still to be appointed.
LHCb and SC3
Phase 1 (Sept. 2005):
• Movement of 8 TB of digitised data from CERN/Tier 0 to LHCb Tier 1 centres in parallel over a 2 week period (~10k files). Demonstrate automatic tools for data movement and bookkeeping.
• Removal of replicas (via LFN) from all Tier 1 centres.
• Redistribution of 4 TB data from each Tier 1 centre to Tier 0 and other Tier 1 centres over a 2 week period. Demonstrate data can be redistributed in real time to meet stripping demands.
• Moving of stripped DST data (~1 TB, 190k files) from CERN to all Tier 1 centres.
Phase 2 (Oct. 2005):
• MC production in Tier 2 centres with DST data collected in Tier 1 centres in real time, followed by stripping in Tier 1 centres (2 months). Data stripped as it becomes available.
• Analysis of stripped data in Tier 1 centres.
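The Phase 1 targets translate into modest sustained rates but very different file sizes. A minimal sketch of the arithmetic, assuming decimal units, a 14-day period and transfers spread evenly over it (the helper name is hypothetical):

```python
TB = 1e12  # assuming decimal terabytes
MB = 1e6
two_weeks_s = 14 * 86400


def sustained_rate_mb_per_s(volume_bytes: float, seconds: float) -> float:
    """Average rate needed to move volume_bytes within the given time."""
    return volume_bytes / seconds / MB


# 8 TB of digitised data, ~10k files, CERN to the Tier 1 centres in 2 weeks.
digitised_rate = sustained_rate_mb_per_s(8 * TB, two_weeks_s)
digitised_file_mb = 8 * TB / 10_000 / MB

# 4 TB redistributed from each Tier 1 centre in 2 weeks.
redistribution_rate_per_site = sustained_rate_mb_per_s(4 * TB, two_weeks_s)

# ~1 TB of stripped DST in 190k files.
stripped_file_mb = 1 * TB / 190_000 / MB

print(f"Digitised data: {digitised_rate:.2f} MB/s aggregate, "
      f"{digitised_file_mb:.0f} MB per file on average")
print(f"Redistribution: {redistribution_rate_per_site:.2f} MB/s out of each Tier 1")
print(f"Stripped DST: {stripped_file_mb:.2f} MB per file on average")
```

On these assumptions the stripped DST files average about 5 MB, against about 800 MB for the digitised data, so the last step stresses file handling and bookkeeping far more than bandwidth.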
GridPP Status
GRIDPP1 Prototype Grid: £17M, complete. September 2001 – August 2004.
GRIDPP2 Production Grid: £16M, ~20% complete. September 2004 – August 2007.
Beyond August 2007? Funding from September 2007 will be incorporated as part of PPARC’s request for planning input for LHC exploitation. To be considered by panel (G. Lafferty, S. Watts & P. Harris) providing input to the Science Committee in the autumn. Input from ALICE, ATLAS, CMS, LHCb and GRIDPP.
LCG Status
LCG has two phases.
Phase 1: 2002 – 2005
• Build a service prototype, based on existing grid middleware
• Gain experience in running a production grid service
• Produce the TDR for the final system
LCG and experiment TDRs submitted.
Phase 2: 2006 – 2008
• Build and commission the initial LHC computing environment
[Timeline 2004-2008: LCG-2 (=EGEE-0), prototyping then product; LCG-3 (=EGEE-x?); SC3 (we are here), SC4, LHC Service Operation; cosmics, first beams, first physics, full physics run]
UK: Workflow Control
Production Desktop – Gennady Kuznetsov (RAL)
Steps: Sim (Gauss B, Gauss MB ×3), Digi (Boole B, Boole MB ×3), Reco (Brunel B, Brunel MB).
Modules: Software installation, Gauss execution, Check logfile, Dir listing, Bookkeeping report.
[Legend: Primary event, Spill-over event]
Used for RTTC and current production/stripping.
UK: LHCb Metadata and ARDA
Carmine Cioffi (Oxford)
[Diagram labels: Web Browser, Tomcat Servlet, GANGA application, ARDA Client (×2), API (×2), TCP/IP Streaming, ARDA Server, Bookkeeping]
Testbed underway to measure performance with ARDA and ORACLE servers.
UK: GANGA Grid Interface
Karl Harrison (Cambridge), Alexander Soroko (Oxford), Alvin Tan (Birmingham), Ulrik Egede (Imperial), Andrew Maier (CERN), Kuba Moscicki (CERN)
Ganga4 job operations: prepare, configure; store & retrieve job definition; submit, kill; get output, update status; + split, merge, monitor, dataset selection.
[Diagram labels: Job, Athena, Gaudi, scripts, AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF]
Ganga 4 beta release 8th July.
UK: Analysis with DIRAC
Software Installation + Analysis via DIRAC WMS – Stuart Paterson (Glasgow)
See later talk!
• PACMAN DIRAC installation tools.
• DIRAC API for analysis job submission.
• Data specified as LFN: check for all SEs which have data. If no data specified, the job goes to DIRAC without this check.
• Job → Task-Queue → Matching → Agent (closest SE) → installs software → job executes on WN.
Example job description:
[
  Requirements = other.Site == "DVtest.in2p3.fr";
  Arguments = "jobDescription.xml";
  JobName = "DaVinci_1";
  OutputData = { "/lhcb/test/DaVinci_user/v1r0/LOG/DaVinci_v12r11.alog" };
  parameters = [ STEPS = "1"; STEP_1_NAME = "0_0_1" ];
  SoftwarePackages = { "DaVinci.v12r11" };
  JobType = "user";
  Executable = "$LHCBPRODROOT/DIRAC/scripts/jobexec";
  StdOutput = "std.out";
  Owner = "paterson";
  OutputSandbox = { "std.out", "std.err", "DVNtuples.root", "DaVinci_v12r11.alog", "DVHistos.root" };
  StdError = "std.err";
  ProductionId = "00000000";
  InputSandbox = { "lib.tar.gz", "jobDescription.xml", "jobOptions.opts" };
  JobId = ID
]
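To make the structure of the job description concrete, the sketch below renders a Python dictionary into the same bracketed attribute syntax. It is not the DIRAC API: `Expr` and `render` are hypothetical helpers written for this illustration, and only a few of the attributes from the slide are reproduced.

```python
class Expr(str):
    """Marks a value to be written verbatim (unquoted), e.g. a Requirements expression."""


def render(value) -> str:
    """Render a value in the bracketed attribute syntax shown on the slide."""
    if isinstance(value, Expr):  # must be tested before plain str
        return str(value)
    if isinstance(value, str):
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{ " + ", ".join(render(item) for item in value) + " }"
    if isinstance(value, dict):
        body = "; ".join(f"{key} = {render(val)}" for key, val in value.items())
        return "[ " + body + " ]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# A subset of the attributes from the slide's example job.
job = {
    "Requirements": Expr('other.Site == "DVtest.in2p3.fr"'),
    "JobName": "DaVinci_1",
    "SoftwarePackages": ["DaVinci.v12r11"],
    "parameters": {"STEPS": "1", "STEP_1_NAME": "0_0_1"},
}

jdl_text = render(job)
print(jdl_text)
```

Strings are quoted, lists become brace-delimited sets, and nested dictionaries become nested bracketed records, which matches the three value shapes visible in the slide's example.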
Conclusion
Half way there! But the climb gets steeper and there may be more mountains beyond 2007.
[Figure milestones: DC03, DC04, 2005 Monte-Carlo Production on the Grid, Data Stripping, Distributed Reconstruction, Distributed Analysis, 2007 Data Taking]