STEP’09: Last challenge before data taking
Patricia Méndez Lorenzo (CERN, IT/GS)
ALICE Offline Week, Grid Status & Experience session
CERN, 24/06/09
Outline
• CCRC’08: Reminder
• What is STEP’09: Origin
• STEP’09 for ATLAS, CMS and LHCb
• STEP’09 for ALICE
• Goals and Results
• Summary and Conclusions
STEP'09: last challenge before data taking
What is STEP’09: CCRC’08 reminder
WLCG Common Computing Readiness Challenge 2008 (CCRC’08)
• It was the first large WLCG Service Challenge to bring the 4 experiments together
• Proposed by CMS and ATLAS during the pre-CHEP WLCG WS in Victoria (2007)
• Goal: measure the readiness of the Grid services and operations before real data taking
• Complementary to the experiments' Full Dress Rehearsals
• Run in two phases: February and May 2008
ALICE Results during the CCRC’08
Slides taken from the CCRC08 Post-Mortem WS at CERN (June 2008)
What is STEP’09: Origin
WLCG Scale Testing for the Experiment Program at the WLCG 2009: STEP’09
• Proposed by CMS during the WLCG pre-CHEP WS in Prague (2009)
• Scheduled for June 2009
• Similar scope to CCRC’08, with special emphasis on data management (data recording, MSS behaviour and transfers)
• STEP’09 post-mortem in July
• All experiments presented their programs during the WLCG GDB in April 2009
STEP’09 for CMS
In common with ALICE
STEP’09 for ATLAS
In common with ALICE
STEP’09 for LHCb
• Participation in STEP’09 as part of their specific Full Experiment Test (FEST’09)
• LHCb goals
  • Data injection into the HLT farm
    • File size can be tuned
  • Distribution to T1 sites
    • Using standard share
  • Reconstruction at T1 sites
    • Long enough queues at the sites are needed
• Storage requirements
  • 3.5 TB/day for RAW (T1D0) at Tier0
  • < 1 TB/day for RAW at Tier1s
In common with ALICE
STEP’09 for ALICE
• Grid activities
  • Replication T0->T1
    • Planned together with cosmics data taking, or
    • Repeat the CCRC’08 exercise with the same rates (100 MB/s) and the same destinations (all T1 sites)
  • Re-processing with data recalls from tape at T1
    • Highly desirable exercise; data already available at the T1 MSS storage
• Non-Grid activities
  • Transfer rate tests from DAQ@PIT to CASTOR
  • Validation of the new CASTOR and xrootd for RAW
    • Critically dependent on the availability of CASTOR v2.1.8
  • Transfer rate test coupled with the 1st pass reco@T0
ALICE non-Grid activities
• RAW data transfers from PIT to CASTOR
  • Basically validated
  • The goal was 1.25 GB/s for one week (just finished)
  • DAQ managed to fill the entire alicedisk pool (850 TB)
• Validation and feedback of CASTOR v2.1.8 and xrootd
  • Very positive results
  • The xrootd copy P2->disk is basically validated
  • The second part is the disk->tape copy (to a recyclable pool of tapes) at the same speed of 1.25 GB/s (this is the full Pb+Pb rate)
  • Activity still ongoing
• Pass 1 reconstruction of RAW data at the T0
  • Still pending
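To put the two figures on this slide side by side, the following is a small worked calculation, not part of the original talk. It assumes decimal units (1 TB = 1000 GB) and a perfectly constant rate; the function names are illustrative only. Under those assumptions, one week at 1.25 GB/s corresponds to roughly 756 TB, and an 850 TB pool would take a little under 8 days to fill.

```python
# Worked arithmetic for the DAQ -> CASTOR target rate quoted on the slide.
# Assumptions (not stated in the talk): decimal units and a constant rate.
GB_PER_TB = 1000
SECONDS_PER_DAY = 86_400


def volume_tb(rate_gb_per_s: float, days: float) -> float:
    """Data volume in TB written at a constant rate over the given number of days."""
    return rate_gb_per_s * SECONDS_PER_DAY * days / GB_PER_TB


def days_to_fill(pool_tb: float, rate_gb_per_s: float) -> float:
    """Days needed to fill a pool of the given size at a constant rate."""
    return pool_tb * GB_PER_TB / rate_gb_per_s / SECONDS_PER_DAY


week_volume_tb = volume_tb(1.25, 7)   # target rate sustained for one week
fill_days = days_to_fill(850, 1.25)   # size of the alicedisk pool

print(f"One week at 1.25 GB/s: {week_volume_tb:.0f} TB")
print(f"Time to fill 850 TB at 1.25 GB/s: {fill_days:.2f} days")
```

The two numbers are of the same order, which is consistent with the slide reporting both the one-week target and the filling of the pool; the actual achieved rate over time is not given in the talk.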
ALICE Grid activities: results
ALICE began the STEP09 exercise on the 1st of June and finished it on the 18th of June
• Production results
  • New record of 15000 concurrent jobs by the 1st of June
New MC cycle
Problems faced by ALICE: Production
• Instabilities with the CREAM-CE system at CERN
  • The system has faced instabilities for some days
  • Fully affecting the production by the 17th of June
    • Both CREAM-CE services down
  • This morning the system came back into production
• A power cut on the 18th of June
  • voalice03 (CREAM VOBOX) could not be recovered
  • In addition, the VOBOXES will be out of warranty at the end of the year
  • 4 VOBOXES have been requested (2 production, 2 backup)
• New site entered production: CESGA (Santiago de Compostela, Spain)
  • 800 jobs submitted for 29 CPUs
  • The site was reporting 0 jobs running/waiting through VOview
  • ALICE has changed its VOview-based query to the information system
ALICE FTS transfers
• General result: very successful exercise during the whole STEP09 period
• New FTD module in production
• During the whole period the 6 T1 sites were available, with few issues, always solved within the day
• Very good support from the FTS experts during the whole period
ALICE requirement
Problems faced by ALICE: Transfers
• Pre-staging of files: MEETING WITH FIO STILL PENDING
  • The operation takes forever
  • New files have to be created instead of pre-staging those that already exist
  • Asked CMS and LHCb for their own procedures
    • CMS has implemented a PhEDEx utility at the client level, for CASTOR sites, that is able to do the pre-staging
      • Comparisons between methods using SRM APIs, manual pre-staging and also the same PhEDEx utility
      • The staging speed in the 3 cases is comparable, and CMS used the STEP09 exercise to determine the best way to do the pre-staging
    • LHCb is using the GFAL libs to do asynchronous pre-staging of the files
Problems faced by ALICE: Transfers
• File overwriting: SOLVED
  • This procedure allows an already transferred file to be removed before it is transferred again
  • ALICE implemented the corresponding option correctly; however, it was still failing
  • FTS experts involved in the discussion:
    • The 'overwrite' flag is properly passed to the FTS agent; however, the agent selects the SRMv1 endpoint instead of SRMv2.2
    • While the details are being checked, ALICE should choose the qualified SURL to ensure the usage of SRMv2.2
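As an illustration of what a qualified SURL commonly looks like, here is a minimal sketch, not taken from the talk. It rewrites a short SURL of the form `srm://host/path` into the explicit form `srm://host:port/srm/managerv2?SFN=/path`, which names the SRM v2.2 web-service endpoint directly. The host name, the default port 8443 and the `/srm/managerv2` endpoint path are assumptions for the example; real values depend on the storage element, and the helper `qualify_surl` is hypothetical.

```python
from urllib.parse import urlsplit


def qualify_surl(surl: str, port: int = 8443, endpoint: str = "/srm/managerv2") -> str:
    """Return a SURL that names the SRM endpoint explicitly.

    The default port and endpoint path are assumed values for illustration;
    each storage element publishes its own.
    """
    parts = urlsplit(surl)
    if parts.scheme != "srm" or not parts.hostname:
        raise ValueError(f"not an SRM SURL: {surl!r}")
    if "SFN=" in parts.query:
        # Already in the explicit form; leave it untouched.
        return surl
    # Keep a port given in the short form, otherwise use the assumed default.
    return f"srm://{parts.hostname}:{parts.port or port}{endpoint}?SFN={parts.path}"


# Hypothetical storage element and file path, for illustration only.
short = "srm://se.example.org/castor/example.org/alice/file.root"
print(qualify_surl(short))
```

Spelling out the endpoint in the SURL removes the need for the transfer agent to guess which SRM version to contact, which is the failure described on this slide.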
Problems faced by ALICE: Transfers
• Issues per site
  • NDGF: using a wrong SURL while transferring files: SOLVED
  • RAL: permission denied to write in the corresponding SE area (twice): SOLVED
  • SARA: no space available (twice): SOLVED
  • FZK: gridFTP issue. There was a problem of dCache pools being filled up, and also a GPFS problem of not correctly reporting space: SOLVED
  • This week, CERN: transfers stuck for more than 60 h. Still under investigation
    • It seems some sites do not allow concurrent transfers
Summary and conclusions
• STEP’09 has been the 2nd multi-VO exercise before real data taking
  • Proposed by CMS during the pre-CHEP WS in Prague
• ALICE emphasizes the testing of the Data Management elements of the computing model
  • Key elements for the 4 LHC experiments
• ALICE results: very good behaviour in terms of production, MSS@T1 and FTS transfers
• The 4 LHC experiments will present their results during the new STEP’09 post-mortem WS in July at CERN (9-10 July)