Analysis Trains
Costin Grigoras, Jan Fiete Grosse-Oetringhaus
ALICE Offline Week, 04.10.12
LEGO Trains
• 42 trains configured (37 active)
  • 5 CF, 4 GA, 1 PP, 8 JE, 5 DQ, 11 HF, 8 LF
• Trains submitted this year
  • 213 CF, 35 DQ, 24 GA, 124 HF, 173 JE, 114 LF, 3 PP
• 1-5 train operators per train
• Operator mailing list: alice-analysis-train-operators@cern.ch
• TWiki page: https://twiki.cern.ch/twiki/bin/viewauth/ALICE/AnalysisTrains
• Since 01.02.12, on average 2400 jobs at any given time
Analysis Trains - Jan Fiete Grosse-Oetringhaus
Running Statistics
[Plot; series: alidaq, aliprod, alitrain, SUM]
Time until trains finish
• Measured as the time between train submission and submission of the final merging job
• Average below 2 days (good!), but with considerable spread
[Plots: per train; average per month]
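The metric above can be sketched as follows: for each train run, take the interval from train submission to submission of the final merging job, then average it per submission month and report the spread as a standard deviation. The record layout, train names and timestamps are hypothetical; this is not the LEGO train database schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical records: (train name, train submission, final merging job submission).
RUNS = [
    ("CF_pp", datetime(2012, 9, 3, 10, 0), datetime(2012, 9, 4, 22, 0)),
    ("HF_PbPb", datetime(2012, 9, 10, 8, 0), datetime(2012, 9, 13, 8, 0)),
    ("JE_pPb", datetime(2012, 10, 1, 12, 0), datetime(2012, 10, 2, 0, 0)),
]


def finishing_days(submitted, final_merge):
    """Time from train submission to submission of the final merging job, in days."""
    return (final_merge - submitted).total_seconds() / 86400


def monthly_stats(runs):
    """Group durations by submission month; return {(year, month): (average, spread)}."""
    by_month = defaultdict(list)
    for _name, submitted, final_merge in runs:
        by_month[(submitted.year, submitted.month)].append(
            finishing_days(submitted, final_merge))
    # pstdev (population standard deviation) also works for a single run per month.
    return {month: (mean(d), pstdev(d)) for month, d in sorted(by_month.items())}


if __name__ == "__main__":
    for (year, month), (avg, spread) in monthly_stats(RUNS).items():
        print(f"{year}-{month:02d}: average {avg:.2f} d, spread {spread:.2f} d")
```

Reporting the spread next to the average matters here because a low monthly average can hide individual trains that take several days.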
AliEn Upgrade
• This Monday's upgrade of parts of AliEn to v2-20 had a few side effects
  • General interruption from 10:00 to midnight; during this period Costin and Pablo worked continuously on fixing the situation
  • Jobs (in particular merging jobs) submitted during that time failed and had to be retried later
    • Mistake: LPM should have been disabled for the upgrade
  • A new status, FAILED, which was not considered a final state, led to some delay for merging jobs; fixed today (a parallel failure of CERN EOS makes submission very slow)
  • A bug in SE selection sends some jobs to FAILED; Pablo is currently fixing it
• I propose that planned upgrades be evaluated in particular with respect to the analysis trains, and that a plan be made for recovering failures that occur during the upgrade period
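Why a new non-final status delays merging can be illustrated with a minimal sketch, assuming that merging is submitted only once every analysis job of a train has reached a final state. Apart from FAILED, the status names below are hypothetical, and the real AliEn status set and LPM logic are not reproduced.

```python
# Hypothetical final-state set before the fix; only FAILED is taken from the slide.
FINAL_STATES_OLD = frozenset({"DONE", "ERROR", "EXPIRED", "KILLED"})
# After the fix, FAILED is also treated as final.
FINAL_STATES_FIXED = FINAL_STATES_OLD | {"FAILED"}


def ready_for_merging(job_statuses, final_states):
    """Merging may start only once every analysis job has reached a final state."""
    return all(status in final_states for status in job_statuses)


statuses = ["DONE", "DONE", "FAILED"]
print(ready_for_merging(statuses, FINAL_STATES_OLD))    # False: merging keeps waiting
print(ready_for_merging(statuses, FINAL_STATES_FIXED))  # True: merging can proceed
```

Under this assumption, a single job stuck in an unrecognised status blocks the merging of the whole train, which matches the delay described above.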
Planned Improvements
Improve Merging
• Merging
  • A dedicated CE/SE for merging (at CERN) is being investigated
  • Merging job submission to be sped up (at the moment it depends on the number of waiting analysis jobs)
• Job splitting
  • Investigate a new AliEn option to select the input files once the job has started; this increases the number of files per job (less merging, more files for event mixing)
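The link between files per job and the amount of merging can be shown with a small calculation. This sketch assumes that each analysis job writes one output file and that outputs are merged hierarchically, with each merging job combining at most `fan_in` files; the fan-in value and file counts are hypothetical and not taken from the AliEn or LPM configuration.

```python
import math


def n_jobs(n_input_files, files_per_job):
    """Analysis jobs needed when each job processes at most files_per_job input files."""
    return math.ceil(n_input_files / files_per_job)


def merging_stages(n_outputs, fan_in):
    """Merging jobs per stage until one file remains (hypothetical fan-in model)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    stages = []
    while n_outputs > 1:
        n_outputs = math.ceil(n_outputs / fan_in)
        stages.append(n_outputs)
    return stages


for files_per_job in (5, 20):
    jobs = n_jobs(10000, files_per_job)
    stages = merging_stages(jobs, fan_in=50)
    print(f"{files_per_job} files/job: {jobs} jobs, merging jobs per stage {stages}")
```

With 10000 input files and a fan-in of 50, going from 5 to 20 files per job cuts the analysis jobs from 2000 to 500 and the merging jobs from 41 to 11.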
Train Statistics
• Add consumed CPU and wall time, both in total and per job, to the run view
[Example from the run view: 2.2 y CPU total, 3.2 y wall total, 3.2 h CPU/job, 4.2 h wall/job, 4.7 files/job]
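The quantities shown in the run view can be sketched as a simple aggregation over per-job records. The `Job` record and its fields are hypothetical, and converting seconds to years assumes a 365-day year.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR  # assumption: 365-day year


@dataclass
class Job:
    cpu_s: float   # consumed CPU time in seconds (hypothetical field)
    wall_s: float  # wall-clock time in seconds (hypothetical field)
    n_files: int   # number of input files processed


def run_summary(jobs):
    """Totals in years and per-job averages in hours, as shown in the run view."""
    if not jobs:
        raise ValueError("no jobs in run")
    n = len(jobs)
    cpu = sum(j.cpu_s for j in jobs)
    wall = sum(j.wall_s for j in jobs)
    files = sum(j.n_files for j in jobs)
    return {
        "cpu_total_y": cpu / SECONDS_PER_YEAR,
        "wall_total_y": wall / SECONDS_PER_YEAR,
        "cpu_per_job_h": cpu / n / SECONDS_PER_HOUR,
        "wall_per_job_h": wall / n / SECONDS_PER_HOUR,
        "files_per_job": files / n,
    }


summary = run_summary([Job(10800, 14400, 4), Job(12600, 16200, 5)])
print(summary)
```

Showing CPU and wall time side by side also gives a rough CPU efficiency (CPU/wall) per run.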
Dataset Selection
• Allow users to indicate in the interface on which datasets they would like to run
• The operator marks datasets as "active" (similar to wagons)
• The user selects the desired datasets among those
[Screenshot: "Desired datasets" selector listing LHC10h_AOD086, LHC11h_AOD095, …]
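The proposed workflow can be sketched as a small model in which the operator activates datasets and users may only choose among the active ones. The class and method names are hypothetical and do not describe the actual train interface.

```python
class DatasetSelection:
    """Hypothetical model: the operator activates datasets, users pick among active ones."""

    def __init__(self, available):
        self.available = set(available)
        self.active = set()

    def activate(self, name):
        """Operator action: mark a known dataset as active."""
        if name not in self.available:
            raise KeyError(f"unknown dataset: {name}")
        self.active.add(name)

    def select(self, requested):
        """User action: request datasets; only active ones are accepted."""
        not_active = set(requested) - self.active
        if not_active:
            raise ValueError(f"datasets not active: {sorted(not_active)}")
        return sorted(set(requested))


selection = DatasetSelection(["LHC10h_AOD086", "LHC11h_AOD095"])
selection.activate("LHC10h_AOD086")
print(selection.select(["LHC10h_AOD086"]))
```

Rejecting inactive datasets at selection time mirrors how wagons work: the operator keeps control of what the train actually runs on.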
Merging Test
• Also test the merging per wagon
[Screenshot: "Merging test" column with status OK / Failed]
Further Ideas
• Number of wagons
• Enabling/disabling by lists (of wagon numbers / names?)
• Saving / loading of train configurations
• Groups of wagons
• Ordering of wagons
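Enabling or disabling wagons by lists could look like the following sketch. The list syntax (comma-separated wagon numbers, ranges such as "1-3", or wagon names) and the wagon names are hypothetical; the slide only raises the idea.

```python
def parse_wagon_list(spec, wagons):
    """Resolve a hypothetical list spec like "1-3, D2H" to sorted wagon numbers.

    `wagons` maps wagon number -> wagon name.
    """
    by_name = {name: num for num, name in wagons.items()}
    selected = set()
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue
        if item in by_name:  # exact wagon name (checked first, names may contain "-")
            selected.add(by_name[item])
        elif "-" in item:  # inclusive range of wagon numbers
            lo, hi = (int(x) for x in item.split("-", 1))
            selected.update(n for n in wagons if lo <= n <= hi)
        else:  # single wagon number
            n = int(item)
            if n not in wagons:
                raise KeyError(f"no wagon {n}")
            selected.add(n)
    return sorted(selected)


def set_enabled(enabled, spec, wagons, state):
    """Enable (state=True) or disable (state=False) all wagons matching spec."""
    for n in parse_wagon_list(spec, wagons):
        enabled[n] = state


wagons = {1: "PhysicsSelection", 2: "Centrality", 3: "MyTask", 7: "D2H"}
enabled = {n: False for n in wagons}
set_enabled(enabled, "1-3, D2H", wagons, True)
print(enabled)
```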
Demo
…some new features…
Summary
• The LEGO train system has become very popular
• The average time for a train to finish is about 2 days, but with considerable spread
• We have many improvement requests and ideas
• We lack manpower (there are only Costin and me, both with many other tasks, too), which sometimes leads to long response times