This summary provides an overview of the bookkeeping discussions held at the RAL Workshop. The discussions covered topics such as dataset management, technical decisions, planning, and file size considerations. The recommendations include developing a general framework for dataset management and maintaining reasonably large file sizes for efficient analysis job access and archiving.
Summary of Bookkeeping discussions at RAL Workshop
Tim Adye, Rutherford Appleton Laboratory
Kanga Phone Meeting, 22nd January 2003
RAL Workshop
• Two half-day parallel sessions:
  • Monday afternoon: presentations from Adil, Jean-Yves, Andy, Alessandra, Gregory, Alessio, and myself
  • Tuesday afternoon: discussion, joined at the end by the other parallel session (event store)
• Presentations: http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jan2003/
• Here I summarise the Tuesday discussion session
  • Andy took the minutes, so these notes are just my own memory and interpretation
  • Andy should send out his notes tomorrow
CMWG2 recommendations
• CMWG2 made many recommendations; one was that we develop a general framework for dataset management
  • Persuasively presented by Gregory
  • Generic enough to be of interest to other experiments?
  • We should try to work with others (and recruit effort!), but BaBar should lead, given our shorter timescales
  • Hopefully this can be built “on top of” SkimTools
Technical decisions
• We will start a new SkimTools package, borrowing code from the old one
• Decided to support only Oracle and MySQL, but encourage people to maintain ODBC compliance wherever possible
• Stick to Perl wherever possible
Planning decisions
• Identified 3.5 FTE
  • ~0.5 FTE × 7 people: Alessandra, Douglas, Jacek, Antonio, Martino, Paul Jackson, Tim
• Two-stage plan (the stages can proceed in parallel):
  • (Stage 0: immediately required changes to the existing SkimTools)
  • Stage 1: new SkimTools to handle the immediate requirements of the new model and user requests
    • Come up with use-cases in each area:
      • Alessandra: skimData
      • Tim: data distribution
      • …
  • Stage 2: CMWG2’s dataset management framework
File size considerations (1)
• It would be very useful to try to keep file sizes reasonably large:
  • more efficient for analysis job access
  • simpler for archiving
• Archiving: mass-store systems (HPSS etc.) have problems with
  • too many files: the catalogue becomes a problem
  • files that are too small: the fixed per-file overhead makes the overhead per GB larger
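To make the per-GB argument concrete, here is a small Python sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and a fixed cost per file; the function names and the 10-second per-file overhead are hypothetical illustrations, not measured HPSS figures.

```python
# Sketch: how file size drives catalogue size and per-GB overhead in a
# mass store. Assumes decimal units and a fixed, hypothetical cost per file.

MB_PER_GB = 1000
MB_PER_TB = 1000 * MB_PER_GB


def files_per_tb(file_size_mb: float) -> float:
    """Catalogue entries needed to store 1 TB in files of this size."""
    return MB_PER_TB / file_size_mb


def overhead_per_gb(file_size_mb: float, per_file_overhead_s: float) -> float:
    """Fixed per-file cost (e.g. seconds of handling) spread over each GB."""
    files_per_gb = MB_PER_GB / file_size_mb
    return files_per_gb * per_file_overhead_s


if __name__ == "__main__":
    # 10 s per file is a made-up figure, used only to compare file sizes.
    for size_mb in (10, 200):
        print(f"{size_mb:>4} MB files: "
              f"{files_per_tb(size_mb):>9.0f} files/TB, "
              f"{overhead_per_gb(size_mb, 10.0):>6.0f} s overhead/GB")
```

Under these assumptions, 10 MB files need 100,000 catalogue entries per TB against 5,000 for 200 MB files, and the per-GB overhead is 20 times higher. Only the ratio matters here; the absolute numbers depend entirely on the assumed per-file cost.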
File size considerations (2)
• Figure of merit: ~200 MB
• If many files are smaller than this, we would need to start blobbing files together (e.g. with tar) for HPSS
  • This is not trivial to manage
• We should be able to merge runs for SP and skims
• Most OPR output files should be > 200 MB
  • Teela agreed to make a ballpark estimate to check this
• We hope to hold off implementing mass-store blobbing until it is needed
  • The system must allow for the possibility of introducing it later
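A minimal sketch of how blobbing might be planned if it is ever needed, written in Python purely for illustration. Files already at or above the ~200 MB figure of merit are archived as-is; smaller files are grouped greedily, in the order given, into bundles that are closed once they reach the target, and each bundle is written as one tar archive. The function names, the greedy in-order strategy, and the use of Python's standard tarfile module are all assumptions; the discussion did not settle on an implementation.

```python
# Sketch of a mass-store "blobbing" planner. All names are hypothetical.
import os
import tarfile
from typing import List, Tuple

TARGET_MB = 200  # figure of merit from the discussion


def plan_bundles(files: List[Tuple[str, float]],
                 target_mb: float = TARGET_MB) -> Tuple[List[str], List[List[str]]]:
    """Split (name, size_mb) pairs into files archived as-is and tar bundles.

    Files of at least target_mb are left alone. Smaller files are added, in
    the order given, to the current bundle, which is closed as soon as its
    total size reaches target_mb. The final bundle may be smaller.
    """
    singles: List[str] = []
    bundles: List[List[str]] = []
    current: List[str] = []
    current_mb = 0.0
    for name, size_mb in files:
        if size_mb >= target_mb:
            singles.append(name)
            continue
        current.append(name)
        current_mb += size_mb
        if current_mb >= target_mb:
            bundles.append(current)
            current, current_mb = [], 0.0
    if current:
        bundles.append(current)
    return singles, bundles


def write_bundle(paths: List[str], archive_path: str) -> None:
    """Write one uncompressed tar archive holding the given files."""
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            # Store only the file name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
```

For example, with the hypothetical inputs `[("sp_run1", 250), ("skim_a", 60), ("skim_b", 90), ("skim_c", 70), ("skim_d", 30)]`, `plan_bundles` leaves `sp_run1` alone and returns the bundles `[["skim_a", "skim_b", "skim_c"], ["skim_d"]]`. Greedy in-order packing keeps neighbouring files (such as consecutive runs) together, at the cost of a possibly undersized last bundle; a bin-packing approach could fill bundles more evenly but would scatter related files.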