
Analysis Facility Integration Project - Towards Efficient Data Transfers and Monitoring

This presentation covers data transfers via PhEDEx, the CRAB user tool for analysis jobs, and monitoring services for user jobs, and details resource allocation and software installations, focusing on CMS Tier 2 and Tier 3 requirements.


Presentation Transcript


  1. CMS T2_FR_CCIN2P3 - Towards the Analysis Facility (AF)
  JP CMS-France, May 27-28, 2009, Strasbourg
  Tibor Kurča, Institut de Physique Nucléaire de Lyon
  Outline: Available Resources; Data Transfers - PhEDEx; User Tools - CRAB; User Job Monitoring; Conclusions

  2. CMS Distributed Analysis
  • To be run at T2/T3 or locally → T2/T3 local resources needed
  • CMS software → CMSSW pre-installed on the sites
  • Grid analysis is data driven → physics-group data allocation
  • Data distribution via PhEDEx → specifics for sites hosting both T1 & T2
  • User tools to run analysis jobs → CRAB
  • Monitoring of job-related activities → tracked by Dashboard (central monitoring service)

  3. T2/T3 2009 Pledged Resources
                          T2                       T3
  CPU                     845k SI2k (~500 jobs)    562k SI2k (~340 jobs)
  Disk space (dCache)     171 TB                   114 TB
  Physics groups          4 x 30 TB (EWK 38 TB)
  /sps                    25+8 TB (50% usage)
  xrootd                  25 TB (24% usage)

  4. CMS Data Access
  [Data-flow diagram: data arrive from T0 and from T1/T2/T3 into HPSS and the dCache pools (production pool, data pool, import-T0 pool, analysis pool of 25 TB) plus the semipermanent /sps (GPFS, 38 TB) area; access protocols include rfcp, srmcp, dcap, gsidcap, xrootd and cp, used by production, prod-merging and analysis jobs]

  5. CMSSW Installations
  • Centralized from T0
  - installed by high-priority grid jobs
  - release version published in the site information system
  - deprecated releases removed
  • Locally: possibility of additional installations for the needs of local users
  • Two partitions:
  1. /afs/in2p3.fr/grid/toolkit/cms2 = $VO_CMS_SW_DIR
     ccali38:tcsh[210] fs lq
     Volume Name      Quota      Used       %Used   Partition
     grid.kit.cms2    60000000   39153734   65%     60%
     - in the past: problems with space & removal of old releases
     → additional 30 GB + regular central removal
  2. /afs/in2p3.fr/grid/toolkit/cms
     ccali05:tcsh[214] fs lq
     Volume Name      Quota      Used       %Used   Partition
     grid.kit.cms     46000000   31918855   69%     49%
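  A quick way to inspect these areas from an interactive CC-IN2P3 node is sketched below, assuming an AFS client and the paths quoted above; the final listing is only illustrative, since the exact layout of the shared installation area depends on the installation.

  # AFS quota of the two CMSSW toolkit volumes
  fs lq /afs/in2p3.fr/grid/toolkit/cms2
  fs lq /afs/in2p3.fr/grid/toolkit/cms
  # contents of the shared installation area (releases, CRAB, ...)
  ls $VO_CMS_SW_DIR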

  6. T2 & Physics Groups
  • CCIN2P3: EWK, QCD, Tau/Pflow, Tracker
  • IPHC: Top, b-tag
  • GRIF: Higgs, Exotica, Egamma

  7. CMS Tier 2 vs Tier 1
  • T2_FR_CCIN2P3 is specific
  • Usually different sites host different Tiers
  - exceptions: CERN (T0, T1, T2), FNAL (T1, T3) and CCIN2P3 (T1, T2)
  • CE: ok
  • SE, PhEDEx node: some complications to be solved
  • What can we learn from CERN/FNAL?
  [Diagram: Tier 1 CC-IN2P3 with its Tier 2 CC AF, the Tier 2 sites GRIF and IPHC, and the Tier 3 IPNL]

  8. CERN-FNAL Comparison
                   CERN                        FNAL
  PhEDEx nodes:    different                   different
  SE:              really different            different (only alias)
                   srm-cms.cern.ch (T1)        cmssrm.fnal.gov (T1)
                   caf.cern.ch (T2)            cmsdca2.fnal.gov (T3)
  dCache:          the same for T1 & T2        the same for T1 & T3
  Disk pools:      different                   the same
                   → needed special download agents

  9. Data Transfers
  CERN:
  - T2 subscription: if the data are already at T1 there is no actual PhEDEx transfer again, just staging to the right disk
  - developed dedicated T1→CAF local download agents to ensure replication to the correct service class and to register downloaded data in the local CAF DBS
  - space tokens are used to separate T1→T1_CH_CERN from T1→T2 transfers
  FNAL:
  - a T1 subscription does not automatically mean the data are also at T3
  - T1 data are fully accessible via CRAB to T3 users (no blacklisting)
  - user data are subscribed to T3, with track kept by the T3 manager; as the dCache is the same for T1/T3, the T3 data will be migrated to tape, but PhEDEx doesn't know about it
  - caveat: don't subscribe the same data to both T1 & T3

  10. T2_FR_CCIN2P3 Before
  • Site configuration:
  - CE: different for T1 & T2
  - SE: one for both T1 & T2
  - PhEDEx: only a T1 node
  • Access to T1 data for T2 users:
  - data stored at T1 only
  - non-production jobs to be run at T2
  • Jobs: temporary hack since CRAB_2_4_4 (Jan 23, 2009)
  → user jobs can access T1_CCIN2P3 data without the show-prod = 1 option
  .... all T1 sites are masked in DLS by default except CCIN2P3
  → in the end transparent for the user

  11. T2_FR_CCIN2P3 Now
  • dCache: the same for T1 & T2
  • Disk pools: only for T1 → create a T2-specific pool? ... for the moment one pool
  • PhEDEx nodes: T1_FR_CCIN2P3_Buffer, T1_FR_CCIN2P3_MSS
  → created & installed the disk-only T2 node T2_FR_CCIN2P3; VOBox: cclcgcms06
  • SE: ccsrm.in2p3.fr (T1) - ccsrm.in2p3.fr (T2)
  → created the T2-specific ccsrmt2.in2p3.fr (an alias)
  • Main goals:
  - avoid transferring the same data twice
  - avoid T1→T2 intra-CCIN2P3 transfers
  - avoid hacks at different levels
  → should be solved at the PhEDEx level with a separate T2_FR_CCIN2P3 node & correct configuration

  12. CRAB - CMS Remote Analysis Builder
  • transparent access to distributed data & computing resources
    https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab
  • intended to simplify the creation & submission of CMS analysis jobs to the grid
  • implemented in Python as a batch-like command line application
    crab -c crab.cfg -create  (-submit, -status, -getoutput, -resubmit ...)
  • CRAB standalone: direct submission from the UI via WMS
  - simple, but lacks some important features; suitable for small tasks (~100 jobs); limits the size of the sandbox
  • Client-server architecture: CRABServer
  - automates as much of the analysis workflow as possible: (re)submission, error handling, output retrieval
  - improves the scalability of the system
  - transparent to end users: interface, installation, configuration procedure and usage are the same as in standalone mode
  - possibility of submission to a local batch system! For BQS a BossLite plugin needs to be written
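  As a sketch of the command-line usage quoted above, a typical standalone-mode session could look like the following (the commands are those listed on the slide; consult the SWGuideCrab page for the exact options and job-range syntax):

  # assumes the CRAB/CMSSW environment is set up (cf. slide 15)
  # and a crab.cfg is present in the working directory
  crab -create        # build the task from crab.cfg
  crab -submit        # submit the created jobs to the grid
  crab -status        # check the progress of the task
  crab -getoutput     # retrieve the output of finished jobs
  crab -resubmit ...  # resubmit failed jobs if needed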

  13. CRAB Architecture
  [Architecture diagram, courtesy G. Codispoti]

  14. CRAB Installations
  • CRAB client 2_5_1
    https://twiki.cern.ch/twiki/bin/view/CMS/CrabClientRelNotes251
  - installed on AFS: $VO_CMS_SW_DIR/CRAB → no need for private installations!
  • CRAB server 1_0_6
    https://twiki.cern.ch/twiki/bin/view/CMS/CrabServer#CRABSERV
    https://twiki.cern.ch/twiki/bin/view/CMS/CrabServer_RelNotes_106
  - installed from scratch on the new hardware node ccgridli03.in2p3.fr:
    dual power supplies, Intel Xeon 2.50 GHz (E5420), 16 GB RAM, 250 GB SATA disk in RAID (redundancy)
  - monitoring: http://ccgridli03.in2p3.fr:8888/

  15. CRAB Environment
  Set up your environment:
  1) Grid UI: lcg_env
  2) CMSSW environment:
     cms_def - alias for: source $VO_CMS_SW_DIR/cmsset_default.(c)sh
     cms_sw  - alias for: eval `scramv1 runtime -(c)sh`
  3) CRAB environment:
     crabX   - alias for: source $VO_CMS_SW_DIR/CRAB/crab.(c)sh
  OR, when working in an existing directory, simply do « cms_env », an alias for:
     source $VO_CMS_SW_DIR/cmsenv.(c)sh
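  For users who prefer the underlying commands to the CC-IN2P3 aliases, the setup above expands roughly as in the sketch below (bash variant; the release directory name is hypothetical, and the Grid UI step stays site-specific):

  # 1) Grid UI: provided at CC-IN2P3 by the lcg_env alias (site-specific)
  # 2) CMSSW environment (what cms_def and cms_sw wrap)
  source $VO_CMS_SW_DIR/cmsset_default.sh
  cd CMSSW_X_Y_Z/src             # hypothetical release area of your analysis
  eval `scramv1 runtime -sh`     # set the runtime environment of that release
  # 3) CRAB environment (what the crabX alias wraps)
  source $VO_CMS_SW_DIR/CRAB/crab.sh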

  16. CRAB Data Stageout
  • CRAB server usage, in crab.cfg:
    [CRAB]
    scheduler   = glite
    jobtype     = cmssw
    server_name = in2p3
  • Without the CMS storage name convention:
    [USER]
    copy_data       = 1
    storage_element = ccsrmt2.in2p3.fr
    user_remote_dir = /test
    storage_path    = /srm/managerv2?SFN=/pnfs/in2p3.fr/data/cms/data/store/user/kurca
  • With the CMS storage name convention:
    [USER]
    copy_data       = 1
    storage_element = T2_FR_CCIN2P3
    user_remote_dir = /test
  → data will be written to /pnfs/in2p3.fr/data/cms/data/store/user/kurca/test
  .... the same as in the previous case!
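  Put together, the fragments above fit into one crab.cfg. The sketch below adds a [CMSSW] section purely for illustration (the dataset name, parameter-set file and job splitting are not taken from the slide); with such a file in the working directory, the crab -create / -submit sequence from slide 12 applies.

  [CRAB]
  scheduler   = glite
  jobtype     = cmssw
  server_name = in2p3

  [CMSSW]
  # hypothetical dataset and CMSSW configuration file
  datasetpath            = /SomePrimaryDataset/SomeProcessing/RECO
  pset                   = myAnalysis_cfg.py
  total_number_of_events = 1000
  events_per_job         = 100

  [USER]
  copy_data       = 1
  storage_element = T2_FR_CCIN2P3
  user_remote_dir = /test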

  17. Job Monitoring
  • CRAB server: http://ccgridli03.in2p3.fr:8888/
    Service              Description
    Tasks                Task entities data in this CrabServer
    Jobs                 Job entities data in this CrabServer
    Component Monitor    Component and service status
    User Monitoring      User task and job log information
  • CMS Dashboard: http://arda-dashboard.cern.ch/cms/
  - link to job exit codes
  - task monitoring for the analysis users
  - site availability based on the SAM tests
  - site status board
  • Comments: the status reported by crab can lag behind the Dashboard; inconsistencies are possible → room for improvement

  18. Conclusions
  • T2_FR_CCIN2P3
  - operational for a long time, strong contribution to CMS computing
  - not fully separated from T1 (a few hacks needed)
  → separate PhEDEx node installed, now in the testing/debugging phase
  → « new » SE ccsrmt2.in2p3.fr declared & published (alias only)
  • User tools available:
  - CRAB client 2_5_1 installed
  - CRAB server 1_0_6
  - monitoring via Dashboard & CRAB server
  • 'Base de Connaissance' (knowledge base) of CC-IN2P3: a collection of general and CMS-related information for local users
    http://cc.in2p3.fr/cc_accueil.php3?lang=fr - type a keyword, e.g. 'crab', into the 'Rechercher' (search) field
  - not complete yet; feedback and suggestions welcome
  • Plans:
  - fully transparent tools for local (non-grid) and grid analysis
  → develop a BossLite plugin for CRAB enabling direct submission to BQS, so that the same jobs can be submitted locally, without the additional grid layer
