
Summary of Bookkeeping discussions at RAL Workshop

This summary provides an overview of the bookkeeping discussions held at the RAL Workshop. The discussions covered topics such as dataset management, technical decisions, planning, and file size considerations. The recommendations include developing a general framework for dataset management and maintaining reasonably large file sizes for efficient analysis job access and archiving.


Presentation Transcript


  1. Summary of Bookkeeping discussions at RAL Workshop
     Tim Adye, Rutherford Appleton Laboratory
     Kanga Phone Meeting, 22nd January 2003

  2. RAL Workshop
     • Two half-day parallel sessions
       • Monday afternoon: presentations from Adil, Jean-Yves, Andy, Alessandra, Gregory, Alessio, and myself
       • Tuesday afternoon: discussion
       • Joined by the other parallel (event store) at the end
     • See presentations here: http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jan2003/
     • I summarise the Tuesday discussion session
       • Andy took the minutes, so these notes are just my own memory/interpretation
       • Andy should send out notes tomorrow

  3. CMWG2 recommendations
     • Many CMWG2 recommendations. One was that we develop a general framework for dataset management
       • Persuasively presented by Gregory
       • Generic enough to be of interest to other experiments?
       • We should try to work with others (and recruit effort!), but BaBar should lead (due to our shorter timescales)
     • Hopefully this can be built “on top of” SkimTools.

  4. Technical decisions
     • Will start a new SkimTools package, borrowing code from the old one.
     • Decided to support only Oracle and MySQL, but encourage people to maintain ODBC compliance wherever possible.
     • Stick to Perl wherever possible.

  5. Planning Decisions
     • Identified 3.5 FTE
       • ~0.5x7 FTE: Alessandra, Douglas, Jacek, Antonio, Martino, Paul Jackson, Tim
     • Two-stage plan (can go in parallel):
       • (Stage 0: immediately-required changes to existing SkimTools)
       • Stage 1: new SkimTools to handle immediate requirements of new model and user requests
         • Come up with use-cases in each area:
           • Alessandra: skimData
           • Tim: Data distribution
           • …
       • Stage 2: CMWG2’s dataset management framework

  6. File size considerations (1)
     • It would be very useful to try to maintain reasonably large file sizes
       • More efficient for analysis job access
       • Simpler for archiving
     • Archiving: mass-store systems (HPSS etc.) have problems with
       • too many files: catalogue problems
       • too small files: overhead per GB is larger
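To put a rough number on the "too many files" point, the sketch below counts how many files (and hence catalogue entries) a fixed data volume needs at different mean file sizes. The function name, the example volume, and the decimal convention (1 GB = 1000 MB) are assumptions made for illustration, not figures from the workshop.

```python
import math


def catalogue_entries(total_gb: float, mean_file_mb: float) -> int:
    """Approximate number of files needed to hold total_gb of data.

    Assumes decimal units (1 GB = 1000 MB) and that every file has
    roughly the same size; both are simplifications for illustration.
    """
    if mean_file_mb <= 0:
        raise ValueError("mean_file_mb must be positive")
    return math.ceil(total_gb * 1000 / mean_file_mb)


if __name__ == "__main__":
    # Hypothetical 1000 GB sample at three mean file sizes.
    for size_mb in (2, 20, 200):
        print(f"{size_mb:>4} MB files -> {catalogue_entries(1000, size_mb)} entries")
```

The file count scales inversely with the mean file size, so any fixed per-file cost in the mass store is paid correspondingly more often per GB when files are small.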

  7. File size considerations (2)
     • Figure of merit ~200 MB
       • If many files are smaller than this, we would need to start blobbing files together (e.g. with tar) for HPSS
       • This is not trivial to manage
     • Should be able to merge runs for SP and skims
     • Most OPR output files should be > 200 MB
       • Teela agreed to make a ballpark estimate to check this
     • Hope to hold off implementing mass-store blobbing until needed
       • System must allow for the possibility of introducing it later
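As an illustration of what "blobbing" small files might involve, here is a minimal sketch that leaves files at or above the ~200 MB figure of merit alone and groups smaller ones, in the order given, into bundles that reach that size before being written with tar. The function names, the greedy grouping rule, and the decimal definition of 200 MB are all assumptions; nothing here reflects an actual SkimTools or HPSS interface.

```python
import os
import tarfile
from typing import Iterable, List, Tuple

# Assumed threshold: the ~200 MB figure of merit, in decimal bytes.
TARGET_BYTES = 200 * 1000 * 1000


def plan_bundles(
    files: Iterable[Tuple[str, int]], target: int = TARGET_BYTES
) -> Tuple[List[str], List[List[str]]]:
    """Split (name, size_in_bytes) pairs into large files and bundles.

    Files of at least `target` bytes are returned unchanged in `large`.
    Smaller files are appended to the current bundle, which is closed
    as soon as its total size reaches `target`. The final bundle may
    be smaller than `target`.
    """
    large: List[str] = []
    bundles: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size >= target:
            large.append(name)
            continue
        current.append(name)
        current_size += size
        if current_size >= target:
            bundles.append(current)
            current, current_size = [], 0
    if current:
        bundles.append(current)
    return large, bundles


def write_bundle(archive_path: str, members: List[str]) -> None:
    """Write one uncompressed tar archive containing the given files."""
    with tarfile.open(archive_path, "w") as tar:
        for path in members:
            tar.add(path, arcname=os.path.basename(path))
```

For example, `plan_bundles([("a", 250_000_000), ("b", 120_000_000), ("c", 90_000_000), ("d", 50_000_000)])` leaves `a` as it is, bundles `b` with `c`, and puts `d` in a final undersized bundle. Even this simple version needs a record of which archive holds which file, which is part of why the slide calls blobbing non-trivial to manage.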
