User Board Input. Glenn Patrick, Rutherford Appleton Laboratory. Tier Storage Review, 21 November 2008.
UB: Castor Migration Path
• 21 December 2006: CMS CSA06 worked and a full production Castor service was expected from Jan 2007. Plan to switch off dCache on 30 June 2007.
• 20 June 2007: Original schedule unrealistic. Agreed that dCache would not be terminated until at least end 2007, with a minimum of 6 months' notice to be given.
• 20 June 2007: Separate ATLAS, CMS, LHCb & Gen Castor instances proposed.
• 24 June 2008: Migration to be completed by end of 2008.
• 21 November 2008: Still on track... New building also looms.
[Chart: Castor Data]
UK Tier 1 – Castor2 Mass Storage (UB Total = 2222TB)
[Diagram: shared services (Name Server 1 + vmgr, Name Server 2, Oracle NS + vmgr, Oracle repack, tape servers) alongside separate stager instances for CMS, ATLAS, LHCb and a Repack/Small User instance, each with its own Oracle stager, Oracle DLF, LSF scheduler and disk servers (one disk server for the Repack/Small User instance).]
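The same layout can be written out as a minimal descriptive sketch, read off the diagram above; this only illustrates the shared-services-plus-per-experiment-instances structure and is not a deployment specification.

```python
# Descriptive sketch only: the Castor2 layout at the UK Tier 1 as read off the
# architecture diagram (shared central services, per-experiment stager instances).
shared_services = [
    "Name Server 1 + vmgr",
    "Name Server 2",
    "Oracle NS + vmgr",
    "Oracle repack",
    "Tape servers",
]

# Each stager instance has its own Oracle stager, Oracle DLF and LSF scheduler.
stager_instances = {
    "CMS": "disk servers",
    "ATLAS": "disk servers",
    "LHCb": "disk servers",
    "Repack + small users": "1 disk server",
}

for name, disks in stager_instances.items():
    print(f"{name}: Oracle stager + DLF, LSF, {disks}")
```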
Background of Shrinking Capacity
• Terabyte (10¹²) / Tebibyte (2⁴⁰) amnesty: ~10% inflation for those experiments which applied (unit arithmetic sketched below). From 2007/Q4.
• Disk0Tape1 caches: overhead not included in some experiment requests. Currently ATLAS 67.7TB, LHCb 16.9TB, ALICE 5.6TB (CMS: network buffers?).
• dCache & Castor: duplicate resources in both systems for experiment migration, testing, etc. From 2008/Q1 and before.
• 5% Castor storage inefficiency hit taken (plus capacity audit). Experiments get what their data requires! From 5 Nov. 2008.
... and a background of experiment uncertainties.
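As a quick illustration (not on the original slide) of where the ~10% amnesty figure comes from: a tebibyte is 2⁴⁰ bytes while a terabyte is 10¹² bytes, so the same capacity quoted in TB rather than TiB is a number roughly 10% larger.

```python
# Illustrative unit arithmetic only: the ~10% "amnesty" factor is just the ratio
# between binary (TiB = 2**40 bytes) and decimal (TB = 10**12 bytes) units.
TIB = 2**40   # tebibyte in bytes
TB = 10**12   # terabyte in bytes

inflation = TIB / TB - 1
print(f"1 TiB = {TIB / TB:.4f} TB (~{inflation:.1%} larger)")   # 1 TiB = 1.0995 TB (~10.0% larger)

# e.g. an allocation recorded as 100 TiB corresponds to ~110 TB after the amnesty
print(f"100 TiB = {100 * TIB / TB:.1f} TB")                     # 100 TiB = 110.0 TB
```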
... But Still Made It!
[Chart: LHCb Start-up Allocations 2008/Q4]
LHC pledges ~met. No reserve! No headroom!
Who has what?
Not much left over for other experiments! Reminder: for “Other”, the GridPP3 proposal only had:
T1 Disk 2008 = 18TB; T1 Disk 2009 = 31TB; T1 Tape 2008 = 180TB; T1 Tape 2009 = 310TB; T1 CPU = 0.
[Chart: Castor-only allocations for ATLAS, CMS and LHCb]
“Other” Experiments 1
ALICE: No storage resources deployed in Castor for most of 2008 (requirements revised downwards, deprioritised due to h/w delays, and ATLAS/CMS/LHCb given priority for CCRC08). Batch jobs submitted to the UK Tier 1.
• Required xrootd (which has not been the highest Castor priority). Finally in ALICE production Oct. 2008.
• Low on manpower; suffers from no RAL involvement.
• Getting back on track for 2009 (5.6TB disk0 to grow to ~90TB).
• Communications improved (e.g. Cristina).
MINOS: Still migrating dCache/NFS data. Also, the MC system draws down several hundred flux files from Castor at the start of each job – Castor seems to manage the load even from multiple jobs (except for some benign errors). Double disk allowance for migration – last overdraft left? Limited manpower.
“Other” Experiments 2
Silicon Detector Design Study: Urgent simulation required for physics benchmarks in the Letter of Intent (due April 2009). Enhanced CPU allocation (268 KSI2K) in the absence of LHC work. 4M events reconstructed out of 8M simulated. Castor server + SRM deployed for staging to SLAC, but this took time (PPD Tier 2 helped out in the meantime). Need to be able to be flexible for this sort of sudden activity...
MICE: Currently setting up. RAL is the “Tier 0” for the experiment – pseudo real-time beam tuning, data distribution, etc. Castor server deployed.
BaBar: The end approaches... a long story since ~Sept. 2006! 49.6TB in NFS disk + ADS tape (35TB allocated).
UKQCD: Plan to access Tier 1 via SRM. Large bid for Tier 1 tape submitted to the HPC call. Need to engage on technical deployment (require 30TB disk; 1.8TB NFS so far). VO enabled; memory requirements?
Farewell (sort of)... Others! For lingering legacy experiments still in dCache, “others” and new small VOs, the likely outcome is some minimal deployment in Castor: a shared disk pool.
On the Horizon...
T2K: On the horizon for 2009. Some disk already pre-allocated: 16TB out of 20TB.
NA48/3: On the horizon for 2009. Some disk pre-allocated: 25TB out of 50TB.
SUPERNEMO: Mainly Tier 2 so far. Yet to feature at Tier 1 (except perhaps under “other”). No storage allocation.
SUPER-B?: Depends on UK proposals. No allocation yet; ~5TB disk1tape1 growing from mid-2009.
Castor Experiment Planning
Once allocations and overall strategy are agreed in the UB, it is up to the experiments to engage with the T1 team over storage classes, space tokens, SRM end-points, etc. (the sketch below illustrates the kind of information involved).
[Timeline: Apr 2007 and Apr–Sep 2007 (Castor “bumpy ride”); User Board, Oct 2008]
A series of weekly meetings evolved to deal with Castor technical issues, with monthly meetings for (mainly) other Tier 1 issues. The success of experiments is correlated with how well they engage at these.
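Purely as an illustration of the kind of information an experiment has to agree with the T1 team at this stage, here is a minimal sketch; the VO name, space-token names, sizes and SRM endpoint are hypothetical placeholders, not values from these slides.

```python
# Hypothetical sketch of a per-experiment storage plan as agreed with the T1 team.
# Storage classes follow the diskNtapeM convention used elsewhere in these slides;
# all names, sizes and the endpoint below are illustrative placeholders only.
example_plan = {
    "vo": "somevo",                                            # hypothetical VO name
    "srm_endpoint": "srm://srm-example.gridpp.rl.ac.uk:8443",  # placeholder endpoint
    "space_tokens": [
        {"name": "SOMEVO_RAW",  "storage_class": "disk0tape1", "size_tb": 50},
        {"name": "SOMEVO_USER", "storage_class": "disk1tape0", "size_tb": 10},
    ],
}

for token in example_plan["space_tokens"]:
    print(f'{token["name"]}: {token["storage_class"]}, {token["size_tb"]} TB '
          f'via {example_plan["srm_endpoint"]}')
```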
Looking Forward
• Can't keep everybody happy all of the time. PPRP numbers are assumed for the LHC experiments, and the smaller experiments only get the scraps that are left over.
• What will real LHC data look like? Backgrounds, increased event sizes, trigger rates, etc. can swallow up resources for the smaller experiments. What are the error bars? We won't know until later in 2009...
• Smaller experiments (including ALICE) are limited by manpower. Good communications + technical help (e.g. Janusz, Shaun, Matt, Catalin, Derek...) and documentation (Stephen) are vital.
• The evolution of regular (weekly/monthly) storage/T1 meetings has been very successful and efficient. Need to be more pro-active here (make it part of the process and a condition of deploying allocations).
• Improve disk deployment and make it more flexible (server quantisation). Easy for me to say, of course.
• Need to be realistic about the effort all this takes (T1 + experiments)!