<Transient> Datasets Deletion and Tasks Obsoletion Procedures
ADC weekly meeting, 18 March 2014
Alexei Klimentov, Brookhaven National Laboratory
Introduction
• General procedure for tasks obsoletion
    • Task is obsoleted by task submitter
    • Obsoleted tasks are checked twice per week and corresponding datasets are marked for deletion
    • Initial scenario
        • 1 week grace period before dataset is deleted isn’t respected anymore
• HLT tasks
    • 3 months lifetime. Weekly tasks obsoletion with 1 week grace period
• Group production
    • Transient datasets deletion. Every 2 weeks, no grace period (with exception list from GPC)
    • Sporadically. Semi-automatic. Dataset patterns are defined by Group Production Coordination and kept in SVN repository. 1 week grace period
• Reprocessing
    • After Reprocessing Coordinator confirmation that period (or campaign) has finished
    • Final datasets produced and validated
    • Dataset pattern(s) is provided
• MC transient datasets deletion (unmerged HITS and unmerged AOD)
    • 2009 procedure (BPK scripts to update postproduction task field, followed by tasks obsoletion and/or datasets deletion)
    • Fall 2013: automatic transient datasets deletion (replacement of BPK scripts)
        • Use cases: mc%AOD, mc%HITS, AF
        • MC Production team defines project name(s)
        • Several “dry runs” in Jun, Oct, Nov to validate the new procedure
        • In production since Dec 2013. Monthly check.
    • Steady concern and uncertainty: task request table has ‘container like dataset name’ as INPUTDATASET task parameter, though in many cases (probably all) the TID dataset is used.
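The pattern-based selection and grace periods listed above can be sketched as follows. This is a minimal illustration, not the production tooling: it assumes that `%` in patterns such as mc%AOD behaves like a SQL LIKE wildcard, and the policy keys, function names and the dataset names in the usage note are hypothetical. Only the grace-period lengths come from the slide.

```python
import re
from datetime import date, timedelta

# Grace periods as stated on the slide; the dictionary keys are hypothetical labels.
GRACE_PERIODS = {
    "hlt": timedelta(weeks=1),             # weekly obsoletion, 1 week grace period
    "group_transient": timedelta(0),       # every 2 weeks, no grace period
    "group_pattern": timedelta(weeks=1),   # sporadic, pattern-driven, 1 week grace period
}


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE-style pattern ('%' = any run of characters) into a regex.

    Assumption: '%' is the only wildcard used in the dataset patterns.
    """
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def matches_any(dataset: str, patterns) -> bool:
    """True if the dataset name matches at least one pattern."""
    return any(like_to_regex(p).match(dataset) for p in patterns)


def deletable_on(marked: date, policy: str) -> date:
    """Earliest date a dataset marked on `marked` may be deleted under `policy`."""
    return marked + GRACE_PERIODS[policy]
```

For example, with a made-up dataset name, `matches_any("mc12_8TeV.000001.example.AOD.e1_s2", ["mc%AOD%"])` is true, and a dataset marked on 4 March 2014 under the "hlt" policy becomes deletable on 11 March 2014.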
March 4-8 MC Tasks Obsoletion
• Task ID range: 1.350.000 – 1.435.000
    • 12708 tasks
    • 15% of tasks with project mc12_14TeV
        • New project since Feb check
• 2.7 PB deleted (according to TK)
• “Too much deletion”: 48 tasks (identified by RW)
    • The earliest input dataset task on Dec 22
    • The latest input dataset task on Jan 3
• Actions: revisit the procedure
    • Add extra check for AF tasks
    • Add extra check for ALL tasks
        • Parent task ID check
    • Thanks to Rod, Sasha, Wolfgang for suggestions/discussion
• Recovering:
    • Rerun fast simulation tasks
        • Interactive cloning of tasks from the above list
    • Script to reinsert database records
• April deletion: dry run first
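The slide does not spell out how the parent task ID check works. One possible reading, sketched here under that assumption, is to hold back any deletion candidate that is still the parent of an active task, so its output stays available as input. The `Task` fields, the status names and the `ACTIVE` set are hypothetical and do not reflect the real task request table schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical set of states in which a child task may still need its input.
ACTIVE = {"submitted", "running"}


@dataclass
class Task:
    task_id: int
    parent_id: Optional[int]  # ID of the task that produced this task's input, if any
    status: str               # hypothetical status label


def safe_to_delete(candidates: Iterable[int], all_tasks: Iterable[Task]) -> List[int]:
    """Return candidate task IDs that are not the parent of any active task."""
    blocked = {
        t.parent_id
        for t in all_tasks
        if t.parent_id is not None and t.status in ACTIVE
    }
    return [tid for tid in candidates if tid not in blocked]
```

In a dry run, the list returned by `safe_to_delete` would only be reported for review rather than acted on.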