CMS Data Challenge 2004
Claudio Grandi, CMS Grid Coordinator
EGEE Cork Conference, April 19th 2004
www.eu-egee.org
EGEE is a project funded by the European Union under contract IST-2003-508833
Contents
• Definition of DC04
• Pre-Challenge Production (PCP04)
• PCP on grid
• Description of DC04 setup
• RLS
• Preliminary results
• Appendix: DC04 setup schemas (for the really interested people)
Definition of DC04
Aim of DC04:
• reach a sustained 25 Hz reconstruction rate in the Tier-0 farm (25% of the target conditions for LHC startup)
• register data and metadata in a catalogue
• transfer the reconstructed data to all Tier-1 centers
• analyze the reconstructed data at the Tier-1’s as they arrive
• publicize to the community the data produced at Tier-1’s
• monitor and archive performance criteria of the ensemble of activities for debugging and post-mortem analysis
Not a CPU challenge, but a full chain demonstration!
Pre-challenge production in 2003/04:
• 70M Monte Carlo events (20M with Geant-4) produced
• Classic and grid (CMS/LCG-0, LCG-1, Grid3) productions
• Digitization still ongoing “in background”
Pre-Challenge Production setup
• Phys. Group asks for a new dataset
• Production Manager defines assignments
• Site Manager starts an assignment
[Diagram: RefDB (dataset metadata, data-level query), McRunjob + plug-in CMSProd, BOSS DB (job metadata, job-level query) and RLS. Submission paths shown: shell scripts, Local Batch Manager, computer farm; JDL, Grid (LCG) Scheduler, LCG-0/1; DAG, DAGMan (MOP), Grid3; Chimera VDL, Virtual Data Catalogue, Planner. Arrows distinguish “push data or info” from “pull info”.]
Statistics for PCP
• 750K jobs
• 3500 KSI2000 months
• 700K files
• 80 TB of data
[Plots: Simulation and Digitization, each marked with the start of DC04]
PCP on grid: CMS-LCG
CMS/LCG-0
• Joint project CMS-LCG-EDT
• Based on LCG pilot distribution
• Including GLUE, VOMS, GridICE, RLS
• About 170 CPU’s and 4 TB disk
• Sites: Bari, Bologna, Bristol, Brunel, CERN, CNAF, Ecole Polytechnique, Imperial College, ISLAMABAD-NCP, Legnaro, Milano, NCU-Taiwan, Padova, U.Iowa
CMS-LCG Regional Center: Gen+Sim on LCG
- 0.5 Mevts “heavy” pythia: ~2000 jobs, ~8 hours each, ~10 KSI2000 months
- 2.1 Mevts cmsim+oscar: ~8500 jobs, ~10 hours each, ~130 KSI2000 months
- ~2 TB data
[Diagram: RefDB (dataset metadata), McRunjob + ImpalaLite, BOSS (job metadata), UI, JDL, RB, bdII, RLS, and the CE, WN and SE resources of CMS/LCG-0 and LCG-1. Arrows distinguish “push data or info” from “pull info”.]
PCP on grid: Grid3
Grid3
• US grid projects + US LHC experiments
• Over 2000 CPU’s in 25 sites
MOP
• DAGMan and Condor-G for specification and submission
• Condor-based match-making process selects resources
USMOP Regional Center: Simulation on Grid3
- 7.7 Mevts pythia: ~30000 jobs, ~1.5 min each, ~0.7 KSI2000 months
- 16 Mevts cmsim+oscar: ~65000 jobs, ~10 hours each, ~1000 KSI2000 months
- ~12 TB data
- Still running!!!
[Diagram: MOP System. Master Site with MCRunJob, mop_submitter, DAGMan, Condor-G and GridFTP; Remote Sites 1 to N, each with a Batch Queue and GridFTP.]
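As a rough cross-check of the production figures quoted for CMS-LCG and Grid3, the sketch below derives events per job and total job-hours from the approximate numbers on the slides. The derived values are not stated in the presentation; they are simple arithmetic on rounded inputs, and the function and dictionary names are illustrative.

```python
# Figures quoted on the CMS-LCG and Grid3 slides: (events, jobs, hours per job).
# All inputs are the approximate ("~") values from the slides.
PRODUCTIONS = {
    "LCG pythia": (0.5e6, 2000, 8.0),
    "LCG cmsim+oscar": (2.1e6, 8500, 10.0),
    "Grid3 pythia": (7.7e6, 30000, 1.5 / 60.0),  # ~1.5 minutes per job
    "Grid3 cmsim+oscar": (16e6, 65000, 10.0),
}


def summarize(events, jobs, hours_per_job):
    """Derive events per job and total job-hours (derived, not quoted)."""
    return {
        "events_per_job": events / jobs,
        "job_hours": jobs * hours_per_job,
    }


if __name__ == "__main__":
    for name, figures in PRODUCTIONS.items():
        s = summarize(*figures)
        print(f"{name}: ~{s['events_per_job']:.0f} events/job, "
              f"~{s['job_hours']:.0f} job-hours")
```

Under these inputs every production comes out at roughly 250 events per job, which suggests a common job size across the two grids.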
DC04 layout
[Diagram: Tier-0 with Castor, the fake on-line process, Input Buffer (IB), ORCA RECO Job, General Distribution Buffer (GDB), Export Buffer (EB), Tier-0 data distribution agents, RefDB, TMDB and the POOL RLS catalogue; three Tier-1’s, each with a Tier-1 agent, T1 storage and MSS; three Tier-2’s, each with T2 storage and a Physicist; ORCA Local Jobs, ORCA Analysis Jobs and ORCA Grid Jobs; LCG-2 Services.]
Main aspects of DC04 (1/2)
Maximize reconstruction efficiency
• no interactions of Tier-0 jobs with outside components
Automatic registration and distribution of data
• via a set of loosely coupled agents
Support a (reasonable) variety of data transfer tools
• SRB (RAL, GridKA, Lyon, with Castor, HPSS and Tivoli SE)
• LCG Replica Manager (CNAF, PIC, with SE/Castor)
• SRM (FNAL, with dCache/Enstore)
Use a single file catalogue (accessible from Tier-1’s)
• RLS used for data and metadata (POOL) by all transfer tools
• Test replica at CNAF (via ORACLE multi-master mirroring)
• Transfer Management DB (TMDB) used for assigning data to Tier-1’s and for inter-agent communication
Failover systems and automatic recovery
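To illustrate what "loosely coupled agents" communicating through a shared database can look like, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in; the table layout, state names and function names are hypothetical and are not the actual TMDB schema or agent code.

```python
import sqlite3


def make_tmdb():
    """Create a toy transfer-management table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (lfn TEXT PRIMARY KEY, tier1 TEXT, state TEXT NOT NULL)"
    )
    return db


def register_new_file(db, lfn):
    """A producer announces a new file; no destination is chosen yet."""
    db.execute("INSERT INTO files (lfn, tier1, state) VALUES (?, NULL, 'new')", (lfn,))


def configuration_agent(db, choose_tier1):
    """Assign every new file to a Tier-1; returns the number of files assigned."""
    rows = db.execute("SELECT lfn FROM files WHERE state = 'new' ORDER BY lfn").fetchall()
    for (lfn,) in rows:
        db.execute(
            "UPDATE files SET tier1 = ?, state = 'assigned' WHERE lfn = ?",
            (choose_tier1(lfn), lfn),
        )
    return len(rows)


def transfer_agent(db, tier1, copy_file):
    """Copy files assigned to one Tier-1; failed copies stay 'assigned' for a retry."""
    rows = db.execute(
        "SELECT lfn FROM files WHERE state = 'assigned' AND tier1 = ? ORDER BY lfn",
        (tier1,),
    ).fetchall()
    done = []
    for (lfn,) in rows:
        if copy_file(lfn):
            db.execute("UPDATE files SET state = 'transferred' WHERE lfn = ?", (lfn,))
            done.append(lfn)
    return done
```

The agents never call each other: each one only reads and updates rows, so an agent can be stopped and restarted and will simply pick up whatever rows are still in the state it handles.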
Main aspects of DC04 (2/2)
Monitor and archive resource and process information
• MonaLisa used on almost all resources
• GridICE used on all LCG resources (including WN’s)
• LEMON on all IT resources
• Ad-hoc monitoring of TMDB information
Job submission at Regional Centers left to their choice
• Using LCG-2 in Italy and Spain and at most Tier-2’s
• Copy of the LCG-2 bdII at CERN also includes CMS-only resources
• Submission via a dedicated Resource Broker at CERN
• Using the official RLS at CERN; will use the RLS mirror at CNAF
• Using the official LCG-2 VOMS
• Software installation via the new LCG tools (CMS Software Manager)
User analysis
• Prototyping GROSS: based on BOSS, supports user analysis on LCG
Reconstruction and analysis: using ORCA
• DST have links to raw data but may be processed without raw data
• Event streams operational
• Persistency through POOL
• All jobs use local XML catalogues
• Updates to central RLS catalogue only done for successful jobs:
  - using an external agent for reconstruction jobs at Tier-0
  - in the job wrapper for user jobs
• SCRAM re-creates run-time environment on Worker Nodes
[Diagram labels: CMS software and POOL; Runs, Trigger, Digis; L1 DiMuon Stream; Tracks and Partial Muon Reconstruction; Full DST including Tracks, Muons, Cluster, jets]
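The "local catalogue first, central catalogue only on success" pattern can be sketched as follows. Plain dictionaries stand in for the local XML catalogue and the central RLS catalogue, and the function name is hypothetical; this is not POOL or RLS code.

```python
def run_with_local_catalogue(job, central_catalogue):
    """Run a job against a private catalogue; publish only if it succeeds.

    `job` is any callable that registers its output files in the local
    catalogue (a dict mapping logical file name to physical file name)
    and raises an exception on failure.
    """
    local_catalogue = {}
    try:
        job(local_catalogue)
    except Exception:
        # Failed job: its partial output is never published centrally.
        return False
    central_catalogue.update(local_catalogue)
    return True
```

Because a failed job never touches the central catalogue, no clean-up of half-registered files is needed there.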
Use of POOL-RLS catalogue
RLS used as a POOL catalogue
• Register files with their POOL metadata
• Query metadata to determine where to send files
• Register physical location of files on Tier-0 Export Buffers
• Use catalogue to replicate files to Tier-1’s
• Tools have been developed to synchronize SRB-GMCAT and RLS
• Local POOL catalogues at Tier-1’s are optionally populated
• Analysis jobs on LCG use the catalogue through the Resource Broker to submit jobs close to the data
• Analysis jobs on LCG register their private data
Replication via ORACLE multi-master mirroring
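The catalogue operations listed above (register with metadata, query by metadata, record replica locations, find the sites holding a file) can be sketched with a toy in-memory catalogue. The class, method names and metadata keys are hypothetical and do not reflect the RLS or POOL interfaces.

```python
from collections import defaultdict


class ToyCatalogue:
    """In-memory stand-in for a file catalogue with metadata and replicas."""

    def __init__(self):
        self.metadata = {}                 # logical file name -> metadata dict
        self.replicas = defaultdict(set)   # logical file name -> {(site, pfn)}

    def register(self, lfn, site, pfn, **metadata):
        """Register a physical copy of a file, optionally adding metadata."""
        self.metadata.setdefault(lfn, {}).update(metadata)
        self.replicas[lfn].add((site, pfn))

    def query(self, **wanted):
        """Return logical file names whose metadata match all given keys."""
        return sorted(
            lfn for lfn, md in self.metadata.items()
            if all(md.get(key) == value for key, value in wanted.items())
        )

    def sites(self, lfn):
        """Return the sites holding a copy, e.g. to run a job close to the data."""
        return sorted(site for site, _ in self.replicas.get(lfn, ()))
```

A metadata query selects the files to ship, and the replica list afterwards tells a scheduler where an analysis job could run without moving data.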
Description of RLS usage
1. Register files
2. Find Tier-1 location (based on metadata)
3. Copy files to export buffers
4. Copy files to Tier-1’s
5. Submit analysis job
6. Process DST and register private data
[Diagram components: POOL RLS catalogue, RLS replica (ORACLE mirroring), Local POOL catalogue, TMDB, XML Publication Agent, Configuration agent, RM/SRM/SRB EB agents, Tier-1 Transfer agent, SRB GMCAT, Replica Manager, Resource Broker, ORCA Analysis Job, LCG, Grid Production Job]
RLS performance
[Plot:
• Time to register the output of a single job (16 files), left axis
• Load on client machine at the time of registration, right axis
Annotations: 0.4 files/s; 25 Hz; April 2nd, 18:00]
Statistics for DC04
• 2200 jobs/day (about 500 CPU’s) running at Tier-0
• 4 MB/s produced and distributed to each Tier-1
• 0.4 files/s registered to RLS (with POOL metadata)
[Plot: Reconstruction; annotations: 25 Hz, 15 Mevt/week]
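The quoted DC04 rates can be cross-checked against each other with a few lines of arithmetic. The sketch assumes that the 25 Hz, 2200 jobs/day and 16 files per job (from the RLS performance slide) all describe the same steady state; the events-per-job value is derived here and is not quoted in the slides.

```python
SECONDS_PER_DAY = 86_400

reco_rate_hz = 25      # sustained reconstruction rate (quoted)
jobs_per_day = 2200    # jobs running at Tier-0 per day (quoted)
files_per_job = 16     # output files per job (quoted on the RLS performance slide)

# 25 events/s over one week
events_per_week = reco_rate_hz * SECONDS_PER_DAY * 7

# File registration rate implied by the job rate
files_per_second = jobs_per_day * files_per_job / SECONDS_PER_DAY

# Average job size implied by the two rates (derived, not quoted)
events_per_job = reco_rate_hz * SECONDS_PER_DAY / jobs_per_day

print(f"{events_per_week / 1e6:.2f} Mevt/week")
print(f"{files_per_second:.2f} files/s")
print(f"{events_per_job:.0f} events/job")
```

Both results agree with the slide: about 15 Mevt/week and about 0.4 files/s, with roughly a thousand events per job.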
Preliminary results
• The full chain is demonstrated for a limited amount of time
• Reconstruction, data transfer and analysis may run at 25 Hz
• When too many files are registered in the system, it slows down below the 25 Hz threshold
Identified the main areas for improvement:
Reduce number of files (increase <#events>/<#files>)
• more efficient use of bandwidth
• fixed “start-up” time dominates command execution times
• e.g. Java for EDG commands, or positioning of tape drives
• address scalability of MSS systems
• reduce load on databases indexed by files (e.g. POOL cat.)
Improve handling of file metadata in catalogues
• RLS too slow both inserting and extracting full file records
• introduce the concept of “file-set” to support bulk operations
Need to manage read-write “objects” to store event metadata
• needed to cope with evolving datasets!
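Why fewer, larger files help can be seen from a simple cost model in which every file pays a fixed start-up cost before its bytes move at the link bandwidth. The model and the numbers in the usage example are illustrative assumptions, not DC04 measurements.

```python
def transfer_time(total_bytes, n_files, startup_s, bandwidth_bytes_per_s):
    """Total time when each file pays a fixed start-up cost plus its streaming time."""
    return n_files * startup_s + total_bytes / bandwidth_bytes_per_s


def effective_bandwidth(total_bytes, n_files, startup_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved once start-up overhead is included."""
    return total_bytes / transfer_time(
        total_bytes, n_files, startup_s, bandwidth_bytes_per_s
    )


if __name__ == "__main__":
    # Hypothetical example: 1 GB over a 10 MB/s link with 5 s start-up per file.
    for n_files in (1000, 100, 10):
        rate = effective_bandwidth(1e9, n_files, 5.0, 1e7)
        print(f"{n_files:5d} files: {rate / 1e6:.2f} MB/s effective")
```

With the same data volume, the streaming term is constant while the start-up term scales with the number of files, so raising the events-per-file ratio moves the effective rate toward the raw bandwidth.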
Appendix: DC04 set-up schemas
…only if you’re really interested!
Fake on-line operations
Pre-conditions:
• empty Digi and Hits COBRA metadata available
• RefDB has POOL metadata for Digis
Post-conditions:
• input buffer filled with Digi files and consistent COBRA metadata
• POOL catalogue correctly filled
• entry in RefDB specifies new job to be run
• entry in Transfer Management DB for digi files (if transferring Digi files)
Steps:
1. get dataset file names
2. stage
3. attachRun
4. get POOL fragment
5. register PFN & metadata
6. insert new “request”
7. insert
[Diagram components: 25 Hz fake on-line process, Dataset priority list (PRS), Castor, Input Buffer (Digi files, Digi+Hits COBRA metadata), RefDB, POOL RLS catalogue, TMDB. Arrow legend: reference to; push or create; read.]
Tier-0 job preparation operations
Pre-condition:
• Empty Reco COBRA metadata file is available and registered in POOL
Post-conditions:
• XML catalogue to be used by the job is ready
• execution script and accessory files are ready
• job is submitted to LSF
Steps:
1. discover
2a. POOL cat. read
2b. POOL publish
3. McRunJob create
4. McRunJob run
[Diagram components: Job preparation agent, RefDB, POOL RLS catalogue, Input Buffer (Digi files, Digi+Hits COBRA metadata), General Dist. Buffer (Empty Reco COBRA metadata), XML catalogue, ORCA RECO script & .orcarc, LSF]
Tier-0 reconstruction
Post-conditions:
• Reco files are on the General Distribution Buffer and on tape
• POOL catalogue correctly updated with Reco files
• Reco file entries are inserted in the Transfer Management Database
Steps:
1. execute
2. read catalogue
3. rfcp (download)
4. update XML with local copy of Reco COBRA metadata
5. rfcp (upload)
6. diff
7. write
8. send e-mail
9. Discover XML catalog
10. register files & metadata
11. Discover cksm file
12. insert
13. read e-mail
14. update
[Diagram components: LSF, ORCA RECO script & .orcarc, ORCA RECO Job, Original XML catalogue, XML fragment, Summary e-mail, Checksum file, Input Buffer (Digi files, Digi COBRA metadata), General Dist. Buffer (Reco files, Empty Reco COBRA metadata), Castor, RefDB updater, RefDB, TMDB Agent, TMDB, XML Publ. Agent, POOL RLS catalogue]
Data distribution @ Tier-0
Pre-conditions:
• Digi and reco files are registered in the Transfer Management DB
Post-conditions:
• Input and general distribution buffer are cleared of any files already at Tier-1’s
• All data files assigned (copied) to Tier-1 as decided by Configuration agent logic
• Transfer Management DB and POOL RLS cat. kept up-to-date with file locations
Steps:
1. new file discovery
2. get metadata
3. Assign file to Tier-1
4. discover
5a. copy (read)
5b. copy (write)
6. add PFN
7. update
8. discover
9. update
10. check
11. delete
12. Delete PFN
13. check
14. purge
15. update
[Diagram components: Configuration agent, RM/SRM/SRB EB agent, RM/SRM/SRB clean-up agent, Clean-up agent, Tier-1, Transfer Manag. DB, POOL RLS catalogue, Input Buffer (Digi files), General Dist. Buffer (Reco files), dCache (SRM), SE (RM), SRB Vault (SRB)]
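The post-condition that buffers are cleared only of files already at Tier-1’s amounts to a simple set check before deletion. The sketch below shows one way to express it; the function name and the dictionary-based bookkeeping are hypothetical and stand in for the TMDB queries a clean-up agent would make.

```python
def purgeable(buffer_files, assigned, arrived):
    """Return the buffer files that are safe to delete.

    assigned: logical file name -> set of Tier-1's the file was assigned to
    arrived:  logical file name -> set of Tier-1's that confirmed a copy
    A file is safe only if it was assigned somewhere and every assigned
    Tier-1 has confirmed its copy.
    """
    safe = []
    for lfn in buffer_files:
        targets = assigned.get(lfn, set())
        if targets and targets <= arrived.get(lfn, set()):
            safe.append(lfn)
    return safe
```

Unassigned files are deliberately kept: an empty target set would otherwise make the subset test trivially true and delete data that no Tier-1 holds.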
Tier-1 RM data import/export
Pre-conditions:
• POOL RLS catalogue is either the CERN one or a local mirror
• Transfer Manag. DB at CERN is accessible from Tier-1’s
Post-conditions:
• data copied at Tier-1 on MSS and available to Tier-2
• CERN POOL RLS catalogue and local POOL catalogue updated
• Transfer Management DB updated
Steps:
1. discover
2. lookup
3. replicate
4ac. lookup & update
4b. copy
4d. add SFN
5. update
6. FCpublish (if not an RLS mirror)
7. discover
8. Sget
9. gridftp
10. add SFN
[Diagram components: Tier-1 agent, TMDB, POOL RLS catalogue, local POOL catalogue, RM, SRM, SRB, GMCAT, MSS, LCG SE, SRB2LCG agent]
Tier-1 analysis job preparation
Pre-conditions:
• a local POOL catalogue is populated with at least the local files (may be an RLS)
• the list of files of a given run is provided by the global POOL catalogue (i.e. RLS)
Post-conditions:
• XML catalogue to be used by the job is ready
• execution script and accessory files are ready
• job is submitted to a local or grid resource manager
Steps:
1. discover
2a. POOL cat. read
2b. POOL publish
3. McRunJob create
4. McRunJob run
[Diagram components: Job preparation agent, Global (RLS) POOL catalogue, Local POOL catalogue, Local storage or SE (Reco files, Empty Reco COBRA metadata), XML catalogue, ORCA RECO script & .orcarc, Local or Grid Resource Manager]
Tier-1 analysis
Post-conditions:
• Root or ntuple files are on the local storage or on a storage element
• RLS updated if on the grid
Note: if the Tier-1 uses SRB, the local storage may be an SRB vault and the RLS catalogue is replaced by the GMCAT
Steps:
1. execute
2. read catalogue
3. file download
4. update XML with local copy files (only if downloaded)
5. attachRun on the local copy of the COBRA metadata
6a. file upload
6b. register new files (if on grid)
[Diagram components: Resource Manager, ORCA RECO script & .orcarc, ORCA Analysis Job, Original XML catalogue, Local storage or SE (Reco files, Empty Reco COBRA metadata, root or ntuple files), RLS catalogue]