PD2P, Caching etc.
Kaushik De, Univ. of Texas at Arlington
ADC Retreat, Naples, Feb 4, 2011
Introduction
• Caching at T2 using PD2P and Victor works well
  • We have 6 months of experience (>3 months with all clouds)
  • Almost zero complaints from users
• Few operational headaches
  • Some cases of full disks, disappearing datasets…
  • Most issues addressed with incremental improvements such as space checking, rebrokering, storage cleanup and consolidation
  • What I propose today should solve the remaining issues
• Many positives
  • No exponential growth in storage use
  • Better use of Tier 2 sites for analysis
• Next step – PD2P for Tier 1
  • This is not a choice but a necessity (see Kors' slides)
  • We should treat part of Tier 1 storage as a dynamic cache
Life Without ESD
• New plan – see the document and Ueda's slides
  • Reduction in storage requirement from 27 PB to ~10 PB for 2011 data @ 400 Hz (but could be as much as 13 PB)
  • Reduction of 2010 data from 13 PB to ~6 PB
• But we should go further
  • We are still planning to fill almost all T1 disks with pre-placed data
  • 2010 + 2011 + MC = 6 + 10 + 8 = 24 PB = available space
  • Based on past experience, reality will be tougher and disk crises will hit us sooner – we should do things differently this time
  • We must trust the caching model
What can we do?
• Make some room for dynamic caches
  • For the discussion below, do not count the T0 copy
• Use DQ2 tags – custodial/primary/secondary – rigorously
  • Custodial = LHC data = tape only (1 copy)
  • Primary = minimal set kept on disk at T1, so we have room for PD2P caching
    • LHC data primary == RAW (1 copy), AOD, DESD, NTUP (2 copies)
    • MC primary == Evgen, AOD, NTUP (2 copies only)
  • Secondary = copies made by ProdSys (ESD, HITS, RDO), PD2P (all types except RAW, RDO, HITS) and DaTri only
• Lifetimes – strictly required for all secondary copies (i.e. consider secondary == cached == temporary)
• Locations – custodial ≠ primary; primary ≠ secondary
• Deletions – any secondary copy can be deleted by Victor
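A minimal sketch of how this tagging policy could be encoded as a lookup table; the names, structure and helper function are illustrative placeholders, not actual DQ2 or PanDA code:

```python
# Proposed tag/copy policy as a table (sketch only, per the bullets above).
PRIMARY_DISK_COPIES = {
    "data": {"RAW": 1, "AOD": 2, "DESD": 2, "NTUP": 2},   # LHC data kept on T1 disk
    "mc":   {"Evgen": 2, "AOD": 2, "NTUP": 2},            # MC kept on T1 disk
}
CUSTODIAL_TAPE_COPIES = {"data": {"RAW": 1}}              # LHC data, tape only

def proposed_tags(sample, datatype):
    """Return the (tag, n_copies) pairs a dataset type gets under this proposal."""
    tags = []
    if datatype in CUSTODIAL_TAPE_COPIES.get(sample, {}):
        tags.append(("custodial", CUSTODIAL_TAPE_COPIES[sample][datatype]))   # tape
    if datatype in PRIMARY_DISK_COPIES.get(sample, {}):
        tags.append(("primary", PRIMARY_DISK_COPIES[sample][datatype]))       # T1 disk
    if not tags:
        # Everything else (ESD, HITS, RDO, extra PD2P/DaTri copies) is secondary:
        # cached, temporary, must carry a lifetime, deletable by Victor.
        tags.append(("secondary", 1))
    return tags
```

For example, `proposed_tags("data", "RAW")` yields one custodial tape copy plus one primary disk copy, while `proposed_tags("data", "ESD")` falls through to a temporary secondary copy.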
Reality Check
• Primary copies (according to slide 4)
  • 2010 data ~ 4 PB
  • 2011 data ~ 4.5 PB
  • MC ~ 5 PB
  • Total primary = ~14 PB
• Available space for secondaries > ~10 PB at the Tier 1's
  • Can accommodate additional copies, but only if 'hot'
  • Can accommodate some ESDs (expired gracefully after n months)
  • Can accommodate large buffers during reprocessing (new release)
  • Can accommodate better-than-expected LHC running
  • Can accommodate new physics-driven requests
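A quick check of the arithmetic behind these numbers, using the 24 PB of available space quoted on the previous slide (values in PB, illustrative only):

```python
# Rough Tier-1 disk budget under the numbers quoted above.
primary = {"data2010": 4.0, "data2011": 4.5, "mc": 5.0}
total_primary = sum(primary.values())             # 13.5 PB, quoted as ~14 PB
available_t1_disk = 24.0                          # from the "Life Without ESD" slide
cache_space = available_t1_disk - total_primary   # > ~10 PB left for secondaries
print(f"primary ~{total_primary:.1f} PB, dynamic cache ~{cache_space:.1f} PB")
```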
Who Makes Replicas?
• RAW – managed by Santa Claus (no change)
  • 1 copy to TAPE (custodial), 1 copy to DISK (primary) at a different T1
• First-pass processed data – by Santa Claus (no change)
  • Tagged primary/secondary according to slide 4
  • Secondary copies will have a lifetime (n months)
• Reprocessed data – by PanDA
  • Tagged primary/secondary according to slide 4, with the lifetime set
  • Additional copies made to a different T1 disk, according to MoU share, automatically based on slide 4 (no longer by AKTR)
• Additional copies at the Tier 1's – only by PD2P and DaTri
  • Must always set a lifetime
• Note – only PD2P makes copies to Tier 2's
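As a sketch of the common rule in all of these cases, that every secondary copy carries a mandatory lifetime, something like the following could sit at replica-registration time; the function name, metadata fields and the 3-month value ("n months" above) are placeholders, not real DQ2/PanDA calls:

```python
from datetime import datetime, timedelta, timezone

SECONDARY_LIFETIME_MONTHS = 3   # placeholder for "n months"

def register_replica(dataset, site, tag):
    """Attach the proposed metadata when a replica is created (sketch only)."""
    meta = {"dataset": dataset, "site": site, "tag": tag}
    if tag == "secondary":
        # secondary == cached == temporary: it must expire and be Victor-deletable
        meta["expires"] = datetime.now(timezone.utc) + timedelta(days=30 * SECONDARY_LIFETIME_MONTHS)
    return meta
```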
Additional Copies by PD2P
• Additional copies at the Tier 1's – always tagged secondary
  • Made if a dataset is 'hot' (defined on the next slide)
  • Use MoU share to decide which Tier 1 gets the extra copy
• Copies at the Tier 2's – always tagged secondary
  • No change for the first copy – keep the current algorithm (brokerage); use the age requirement if we run into space shortage (see Graeme's talk)
  • If a dataset is 'hot' (see next slide), make an extra copy
• Reminder – additional replicas are secondary = temporary by definition, and may/will be removed by Victor
What is 'Hot'?
• 'Hot' decides when to make a secondary replica
• The algorithm is based on additive weights
  • If w1 + w2 + w3 + wN… > N (tunable threshold), make an extra copy
• w1 – based on the number of waiting jobs
  • nwait / (2 × nrunning), averaged over all sites
  • Currently disabled due to DB issues – needs to be re-enabled
  • Don't base it on the number of reuses – that did not work well
• w2 – inversely based on age
  • Either Graeme's table, or continuous, normalized to 1 (newest data)
• w3 – inversely based on the number of copies
• wN – other factors based on experience
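A minimal sketch of this additive score, assuming a continuous age weight and a simple 1/n copy weight; the exact functional forms, the age normalization and the threshold N are tunable assumptions, not the production PD2P code:

```python
def hot_score(nwait, nrunning, age_days, n_copies, max_age_days=365.0):
    """Additive 'hot' score: w1 + w2 + w3 (sketch of the weights above)."""
    w1 = nwait / (2.0 * max(nrunning, 1))           # waiting-job pressure, site-averaged upstream
    w2 = max(0.0, 1.0 - age_days / max_age_days)    # newer data weighs more, normalized to 1
    w3 = 1.0 / max(n_copies, 1)                     # fewer existing copies weigh more
    return w1 + w2 + w3                             # ... + wN for future factors

def is_hot(nwait, nrunning, age_days, n_copies, threshold=2.0):
    """Make an extra secondary copy if the score exceeds the tunable threshold N."""
    return hot_score(nwait, nrunning, age_days, n_copies) > threshold
```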
Where to Send 'Hot' Data?
• Tier 1 site selection
  • Based on MoU share
  • Exclude a site if the dataset size is > 5% (as proposed by Graeme)
  • Exclude a site if it has too many active subscriptions
  • Other tuning based on experience
• Tier 2 site selection
  • Based on brokerage, as currently
  • Negative weight based on the number of active subscriptions
  • Other tuning based on experience
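A sketch of the Tier 1 choice: MoU-share-weighted selection after applying the two exclusion cuts. The input structure and the cut values are assumptions (in particular, the 5% cut is interpreted here as 5% of the site's disk), not the actual PD2P implementation:

```python
import random

def pick_tier1(sites, dataset_tb, max_frac=0.05, max_subs=50):
    """Choose a Tier 1 for an extra 'hot' replica (illustrative sketch)."""
    candidates = [
        s for s in sites
        if dataset_tb <= max_frac * s["disk_tb"]       # exclude if dataset too large for the site
        and s["active_subscriptions"] < max_subs       # exclude if too many active subscriptions
    ]
    if not candidates:
        return None
    weights = [s["mou_share"] for s in candidates]     # MoU share drives the choice
    return random.choices(candidates, weights=weights, k=1)[0]
```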
What About Broken Subscriptions?
• This is becoming an issue (see Graeme's talk)
  • PD2P already sends datasets within a container to different sites, to reduce the wait time for users
  • But what about datasets which take more than a few hours to transfer?
• Simplest solution
  • ProdSys imposes a maximum limit on dataset size
• Possible alternative
  • A cron/PanDA process breaks up datasets and rebuilds the container
• Difficult but also possible solution
  • Use _dis datasets in PD2P
  • Search DQ2 for _dis datasets in brokerage (there will be a performance penalty if we go this route)
  • But this is perhaps the most robust solution?
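A sketch of the "simplest solution" and its alternative: capping dataset size by splitting a large output into fixed-size child datasets that all go into one container. The cap value and the splitting helper are hypothetical, not an existing ProdSys feature:

```python
MAX_DATASET_TB = 1.0   # placeholder cap on dataset size

def split_into_datasets(files, sizes_tb, max_tb=MAX_DATASET_TB):
    """Group files into chunks no larger than max_tb; each chunk becomes a dataset."""
    datasets, current, current_size = [], [], 0.0
    for f, size in zip(files, sizes_tb):
        if current and current_size + size > max_tb:
            datasets.append(current)
            current, current_size = [], 0.0
        current.append(f)
        current_size += size
    if current:
        datasets.append(current)
    return datasets   # all chunks are then collected back into one container
```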
Data Deletions will be Very Important
• Since we are caching everywhere (T1+T2), Victor plays an equally important role as PD2P
• Asynchronously clean up all caches
  • Trigger based on a disk-fullness threshold
  • Algorithm based on (age + popularity) & secondary
• Also automatic deletion of n-2 – by AKTR/Victor
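A sketch of that cleanup selection: trigger when disk usage crosses a threshold, then delete only secondary (cached) replicas, oldest and least popular first, until usage falls back below a target. The thresholds and replica fields are placeholders, not the actual Victor algorithm:

```python
def select_victims(replicas, used_frac, trigger=0.90, target=0.80):
    """Pick secondary replicas to delete once the disk is too full (sketch only)."""
    if used_frac < trigger:
        return []                                   # nothing to do below the trigger
    eligible = [r for r in replicas if r["tag"] == "secondary"]   # only cached copies
    eligible.sort(key=lambda r: (-r["age_days"], r["n_accesses_90d"]))  # old, unpopular first
    victims, freed = [], 0.0
    for r in eligible:
        if used_frac - freed <= target:
            break
        victims.append(r)
        freed += r["size_frac"]                     # replica size as a fraction of the disk
    return victims
```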
How Soon Can we Implement?
• Before LHC startup!
• Big initial load on ADC operations to clean up 2010 data and to migrate tokens
• Need some testing/tuning of PD2P before LHC starts
• So we need a decision on this proposal quickly