1 / 27

CMS Plans for Use of LCG-1

Detailed plan for utilizing CMS computing resources to cope with data challenge, including milestones, workflows, calibration, analysis, and grid operation strategies.

iwanda
Download Presentation

CMS Plans for Use of LCG-1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CMS Plans for Use of LCG-1 David Stickland CMS Core Software and Computing (Heavily based on recent talks by Ian Fisk, Claudio Grandi and Tony Wildish ) DPS May 03, CMS CPTweek

  2. Computing TDR Strategy Technologies Evaluation and evolution Estimated Available Resources (no cost book for computing) • Physics Model • Data model • Calibration • Reconstruction • Selection streams • Simulation • Analysis • Policy/priorities… • Computing Model • Architecture (grid, OO,…) • Tier 0, 1, 2 centres • Networks, data handling • System/grid software • Applications, tools • Policy/priorities… Iterations / scenarios Required resources Validation of Model DC04 Data challenge Copes with 25Hz at 2x10**33 for 1 month Simulations Model systems & usage patterns • C-TDR • Computing model (& scenarios) • Specific plan for initial systems • (Non-contractual) resource planning DPS May 03, CMS CPTweek

  3. Schedule: PCP, DC04, C-TDR … • 2003 Milestones • June Switch to OSCAR (critical path) • July Start GEANT4 production • Sept Software baseline for DC04 2004 Milestones • April DC04 done (incl. post-mortem) • April First Draft C-TDR • Oct C-TDR Submission PCP DPS May 03, CMS CPTweek

  4. PCP and DC04 • Two quite different stages: • Pre Challenge Production (PCP) • Important thing is to get it done; how its done is not the big issue • But anything that can reduce manpower here is good, this will be a 6-month process • We intend to run a hybrid (non-GRID and GRID) operation (with migration from first to second case as tools mature) • Data Challenge (DC04) • Predicated on GRID operation for moving data and for running jobs (and/or pseudo jobs) at the appropriate locations • Exercise some calibration and analysis scenarios DPS May 03, CMS CPTweek

  5. DC04 Workflow • Process data at 25 Hz at the Tier-0 • Reconstruction produces DST and AOD • AOD replicated to all Tier-1(assume 4 centers) • Sent on to participating Tier 2 (pull) • DST replicated to at least one Tier-1 • Assume Digis are already replicated in at least one Tier-1 • No bandwidth to transfer Digis synchronously • Archive Digis to tape library • Express lines transferred to selected Tier-1: • Calibration streams, Higgs analysis stream, … • Analysis & recalibration • Produce new calibration data at selected Tier-1 and update the Conditions Database • Analysis from the Tier-2 on AOD, DST, occasionally on Digis DPS May 03, CMS CPTweek

  6. T1 Calibration sample T2 Calibration Jobs TAG/AOD (replica) Replica Conditions DB T2 MASTER Conditions DB Fake DAQ (CERN) T0 T1 Replica Conditions DB 25Hz 1.5MB/evt 40MByte/s 3.2 TB/day 1st pass Recon- struction Event streams Higgs DST TAG/AOD (20 kB/evt) T2 TAG/AOD (replica) PCP T2 Event server Higgs background Study (requests New events) 50M events 75 Tbyte 1TByte/day 2 months CERN Tape archive SUSY Background DST Data Flow In DC04 DC04 Calibration challenge DC04 Analysis challenge DC04 T0 challenge CERN disk pool ~40 TByte (~20 days data) 25Hz 1MB/evt raw 25Hz 0.5MB reco DST HLT Filter ? Disk cache Archive storage DPS May 03, CMS CPTweek CERN Tape archive

  7. DC04 Strategy (partial...) • DC04 is focussed on preparations for the first months of data, not the operation in 2010 • Grid enters mainly in the data distribution and analysis • Express Lines pushed from Tier-0 to Tier-1’s • AOD, DST published by Tier-0 and pulled by Tier-1’s • Use Replica Manager services to locate and move the data • Use a Workload Management System to select resources • Use a Grid-wide monitoring system • Conditions DB segmented in read-only Calibration Sets • Versioned • Metadata stored in the RefDB • Temporary solution: • need specific middleware for read-write data management • Client-server analysis: Clarens? • How does it interface to LCG-1 information and data management systems? DPS May 03, CMS CPTweek

  8. Boundary conditions for PCP CMS persistency is changing POOL (by LCG) is replacing Objectivity/DB CMS Compiler is changing gcc 3.2.2 is replacing 2.95.2 Operating system is changing Red Hat 7.3 is replacing 6.1.1 Grid middleware structure is changing EDG on top of VDT  Flexibility to deal with a dynamic environment during the Pre-Challenge Production! DPS May 03, CMS CPTweek

  9. Perspective on Spring 02 Production • Strong control at each site • Complex machinery to install and commission at each site • Steep learning curve • Jobs coupled by Objectivity • Dataset-oriented production • not well suited to most sites capabilities DPS May 03, CMS CPTweek

  10. Perspective for PCP04 • Must be able to run on grid and on non-grid resources • Have less (no!) control at non-dedicated sites • Simplify requirements at each site: • Less tools to install • Less configuration • Fewer site-local services • Simpler recovery procedures • Opportunistic approach • Assignment  dataset • Several short assignments make one large dataset • Allows splitting assignments across sites • Absolute central definition of assignment • Use RefDB instead of COBRA metadata to control run numbers etc • Random numbers & events per run from RefDB • Expect (intend) to allow greater mobility • One assignment != one site DPS May 03, CMS CPTweek

  11. Catalogs, File Systems etc • A reasonably functioning Storage Element needs • A data catalog • Replication management • Transfer Capabilities • On the WAN side at least GridFTP • On the LAN side a POSIX compliant I/O (Some more discussions here I expect) • CMS is installing on “CMS Centers” SRB to solve the immediate data management issues • We aim to integrate with an LCG supported SE as soon as reasonable functionality exists • CMS would like to install dCache on those centers doing specialized tasks such as High Luminosity Pileup DPS May 03, CMS CPTweek

  12. Specifics: operations • CMKIN/CMSIM • Can run now • Would help us commission sites with McRunjob etc • OSCAR (G4) • Need to ‘plug the gaps’, be sure everything is tightly controlled and recorded from assignment request to delivered dataset • ORCA (Reconstruction) • Presumably need the same control-exercise as for OSCAR to keep up with the latest versions • Digitisation • Don’t yet know how to do this with pileup on a grid • Looks like dCache may be able to replace the functionality of AMS/RRP(AMS/RRP is a fabric level tool not middleware or experiment application) DPS May 03, CMS CPTweek

  13. Operations (II) • Data-movement will be our killer problem • Can easily generate > 1 TB per day in the RCs • Especially if we don’t work to reduce the event size • Can we expect to import 2 TB/day to the T0? • Yes, for a day or two, but for several months? • Not without negotiating with Castorand the network “Authorities” DPS May 03, CMS CPTweek

  14. Timelines • Start ramping up standalone production in the next ~2 weeks • Exercise the tools & users • Bring sites up to speed a few at a time • Start deploying SRB and testing it on a larger scale • Production-wide by ~end of June • Start testing LCG-<n> production in June • Use CMS/LCG-0 as available • Expect to be ready for PCP in July • Partitioning of HW resources will depend on state of LCG and the eventual workload DPS May 03, CMS CPTweek

  15. PCP strategy • PCP cannot be allowed to fail† (no DC04) • Hybrid, GRID and non-GRID operation • Minimum baseline strategy is to be able to run on dedicated, fully controllable resources without the need of grid tools (local productions) • We plan to use LCG and other Grids wherever they can operate with reasonable efficiency • Jobs will run in a limited-sandbox • input data local to the job • local XML POOL catalogue (prepared by the prod. tools) • output data/metadata and job monitoring data produced locally and moved to the site manager asynchronously • synchronous components optionally update central catalogs. If they fail the job will continue and the catalogs are updated asynchronously • reduce dependencies on external environment and improve robustness †(Conversely, DC04 can be allowed to fail. It is the Milestone) DPS May 03, CMS CPTweek

  16. Limited-sandbox environment Worker Node User’s Site File transfers, if needed, are managed by external tools (EDG-JSS, additional DAG nodes, etc...) Job input Job input Job Wrapper (job instru- mentation) User Job Job output Job output Journal writer Journal Journal Remote updater Asynchronous updater Metadata DB DPS May 03, CMS CPTweek

  17. Grid vs. Local Productions • OCTOPUS runs only on the User Interface • No CMS know-how is needed at CE/SE sites • UI installation doesn’t require full middleware • Little Grid know-how is needed on UI • CMS programs (ORCA, OSCAR…) pre-installed on CE’s (RPM, DAR+PACMAN) • A site that does “local” productions is the combination of a CE/SE and a UI that submits only to that CE/SE • Switch between grid and non-grid production only reconfiguring the production software on the UI • OCTOPUS: “Overtly Contrived Toolkit Of Previously Unrelated Stuff” • Soon to have a CVS software repository too • Release McRunjob, DAR, BOSS, RefDB, BODE, RMT, CMSprod as a coherent whole DPS May 03, CMS CPTweek

  18. Computer farm DAG job job job job Hybrid production model User’s Site (or grid UI) Resources Production Manager defines assignments RefDB Phys.Group asks for an official dataset shell scripts Local Batch Manager JDL EDG Scheduler Site Manager starts an assignment CMS/ LCG-0 MCRunJob DAGMan (MOP) User starts a private production Chimera VDL DPS May 03, CMS CPTweek Virtual Data Catalogue Planner

  19. A Port in every GRID • MCRunJob is compatible with opportunistic use of almost any GRID environment • We can run purely on top of VDT (USCMS IGT tests, Fall 03) • We can run in an EDG environment (EDG Stress Tests) • We can run in CONDOR pools • Griphyn/iVDGL have used CMS production as a test case • (We think this is a common approach of all experiments) • Within reason if a country/center wants to experiment with different components above the base GRID we can accommodate that • We think it even makes sense for centers to explore the tools available! DPS May 03, CMS CPTweek

  20. Other Grids • The US already deploys a VDT based GRID environment that has been very useful for our productions(probably not just a US concern) • They will want to explore new VDT versions on timescale that may be different to LCG • For good reason, they may have different higher level middleware • This makes sense to allow/encourage • For the PCP the goal is to get the work done any way we can • If a region wants to work with LCG1 subsets or extensions so be it • (Certainly in the US case there will also be pure LCG functionality) • There is not “one and one only GRID” • collaborative exploration makes sense • But for DC04, we expect to validate pure LCG<n> DPS May 03, CMS CPTweek

  21. CMS Development Environment • CMS/LCG-0 is a CMS-wide testbed based on the LCG pilot distribution, owned by CMS • Before LCG-1 (ready in July): • gain experience with existing tools before start of PCP • Feedback to GDA • develop production&analysis tools to be deployed on LCG-1 • test new tools of potential interest to CMS • Common environment for all CMS productions • Use also as a base configuration for non-grid productions DPS May 03, CMS CPTweek

  22. dCache • What is dCache? • dCache is a disk caching system developed at DESY as a front end for Mass Storage Systems • It now has significant developer support from FNAL and is used in several running experiments • We are using it as a way to utilize disk space on the worker nodes and efficiently supply data in intense applications like simulation with pile-up. • Applications access the data in d-cache space over a POSIX compliant interface. The d-cache directory (/pnfs) from the user perspective looks like any other cross mounted file system • Since this was designed as a front-end to MSS, once closed, files cannot be appended • Very promising set of features for load balancing and error recovery • dCache can replicate data between servers if the load is too high • if a server fails, dCache can create a new pool and the application can wait until data is available.

  23. High Luminosity Pile-Up • This has been a bugbear of all our productions. It needs specially configured data serving (Not for every center) • Tried to make a realistic test. In the interest of time we generated a fairly small minimum bias dataset. • Created 10k cmsim minimum bias events writing the fz files directly into d-cache. • Hit Formatted all events (4 times). • Experimented with writing events to d-cache and local disk • Placed all ROOT-IO files for Hits, THits, MCInfo, etc. into d-cache and the META data in local disk. • Used soft links from the local disk to /pnfs d-cache space • Digi Files are written into local disk space • Minimum bias Hit files are distributed across the cluster

  24. writeAllDigis Case • Pile-up is the most complicated case • Pile-up events are stored across the pools • Many applications can be running in parallel each writing to their own metadata but reading the same minimum bias ROOT-IO files Pool Node Pool Node Pool Node Pool Node PNFS Worker Node Pool Node libpdcap.so Pool Node writeAllDigis Pool Node Pool Node Dcache Server Local Disk Pool Node

  25. dCache Request • dCache successfully made use of disk space from the computing elements, efficiently reading and writing events for CMSIM • No drawbacks were observed • Writing into d-Cache with writeHits jobs worked properly • Performance is reduced for writing metadata, reading fz files worked well • The lack of ability to append files makes storing metadata inconvenient • Moving closed files into d-cache and soft linking works fine • High Luminosity pile-up worked very well reading events directly from d-cache • Data Rate is excellent • System stability is good • Tests were quite successful and generally quite promising • We think this could be useful at many centers (But we are not requiring this) • but it looks like it will be required at least at the centers where we do Digitization (Typically at some Tier -1s) • But can probably be setup “in a corner” if it is not part of the center model

  26. Criteria for LCG1 Evaluation • We can achieve a reasonable fraction of the PCP on LCG resources • We continue to see a tendency to reducing the (CMS) expertise required at each site • We see the ability to handle the DC04 scales • 50-100k files • 200TB • 1 MSI2k • 10-20 “users” in PCP, 50-100 in DC04 (From any country, using any resource) • Final criteria for DC04/LCG success will be set in September/October DPS May 03, CMS CPTweek

  27. Summary • Deploying LCG-1 will is clearly a major task • expansion must be clearly controlled; expand at constant efficiency • CMS needs LCG-1 for DC04 itself • and to prepare for DC04 during Q3Q4 of this year. • The pre-challenge production will use LCG-1 as much as feasible • help burn-in LCG-1 • PCP can run 'anywhere', not critically dependent on LCG-1 • CMS-specific extensions to LCG-1 environment will be minimal and non-intrusive to LCG-1 itself • SRB CMS-wide for the PCP • dCache where we want to digitize with pileup DPS May 03, CMS CPTweek

More Related