The Data Carousel
• what problem it’s trying to solve
• the data carousel and the grand challenge
• the bits and pieces: how it all works
• what’s ready now; what’s left to do
the problem we’re facing
• PHENIX program heavy in “ensemble” physics
  • typical day (or week) at the office: get lots of events, make foreground and background distributions, compare, improve code, repeat until published
• needs to move lots of data very efficiently
• needs to be comprehensible to PHENIX physicists
  • people are accustomed to “staging files”
• needs to work with the CAS analysis architecture
  • lots of linux boxes with 30 GB disk on each
  • main NFS server with 3 TB disk
• solution: optimized batch file mover
  • similar to Fermilab data “freight train”
  • works with existing tools: HPSS, ssh, pftp, perl (a hypothetical request is sketched after this list)
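To make that last point concrete, here is a rough sketch of what a client-side request could look like using exactly those tools. The file names, the host name "carousel-host", and the server-side command "carousel_submit" are made up for illustration; the real PHENIX scripts live in the repositories listed on the last slide.

    #!/usr/bin/env perl
    # Hypothetical client request: write a filelist, copy it to the carousel
    # server, and trigger a (made-up) submit command over ssh.
    use strict;
    use warnings;

    my @wanted = (
        '/home/phnxreco/run99/dst/DST_run0001.root',   # illustrative HPSS paths
        '/home/phnxreco/run99/dst/DST_run0002.root',
    );

    open my $fh, '>', 'filelist.txt' or die "filelist.txt: $!";
    print {$fh} "$_\n" for @wanted;
    close $fh;

    # ssh + .shosts means no password prompt is needed here.
    system('scp', 'filelist.txt', 'carousel-host:incoming/') == 0
        or die "scp failed: $?";
    system('ssh', 'carousel-host', 'carousel_submit incoming/filelist.txt') == 0
        or die "ssh failed: $?";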
the carousel and the grand challenge
• complementary tools for accessing event data
• works at a lower level of abstraction than the GC
  • files, not objects
  • can work with non-event data files
• important, since it doesn’t take much to clog access to tapes
  • 11 MB/sec/drive in principle; 6 MB/sec/drive in practice
  • even in the best case, Eagles take ~20 seconds to load and seek: read ~100 MB files at random and you’ll see no better than 50% of that bandwidth (worked out below)
  • in MDC1 and MDC2, naive ftp saw only ~1 MB/sec effective bandwidth for reads
• already works with disjoint staging areas
• can, in principle, work over the WAN
• doesn’t reorganize data, doesn’t provide an event iterator, isn’t coupled to analysis code
  • good or bad, depending on what you’re expecting
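The 50% figure follows directly from the drive numbers above; the short perl script below just works through the arithmetic (only the ~6 MB/sec rate, the ~20 second load/seek time, and the ~100 MB file size come from the slide).

    #!/usr/bin/env perl
    # Arithmetic behind the "no better than 50% bandwidth" claim.
    use strict;
    use warnings;

    my $drive_rate = 6;     # MB/sec per drive, realistic streaming rate
    my $mount_seek = 20;    # seconds for an Eagle to load and seek
    my $file_size  = 100;   # MB, typical file read at random

    my $transfer  = $file_size / $drive_rate;               # ~17 s of streaming
    my $effective = $file_size / ($transfer + $mount_seek);
    printf "effective rate: %.1f MB/sec (%.0f%% of streaming)\n",
        $effective, 100 * $effective / $drive_rate;
    # => roughly 2.7 MB/sec, under half the 6 MB/sec streaming rate,
    #    which is why the carousel batches requests per tape mount.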
the bits and pieces
• split-brain server
  • one part knows HPSS, the other knows PHENIX
• HPSS batch queue (Jae Kerr, IBM)
  • optimizes tape mounts for a given set of file requests
  • once a file is staged to cache, used an NFS write to non-cache disk
  • modified to use a pftp call-back (Tom Throwe, J.K.)
• carousel server (J. Lauret, SUNYSB)
  • feeds sets of files to the batch queue at a measured pace
  • knows about groups, does group-level accounting
  • implements the file retrieval policy
  • maintains all state info in an external database
• client side scripts
  • implement the file deletion policy (defaults to an LRU cache; see the sketch after this list)
  • client side requirements are kept ALARA
    • ssh + .shosts, perl + a few modules, pftp
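As an illustration of the client-side deletion policy, here is a minimal sketch of an LRU cleanup pass over a local staging directory. The directory path and the amount of space to free are invented; this is not the actual PHENIX client script, just the "delete the least-recently-used files first" idea.

    #!/usr/bin/env perl
    # Sketch of an LRU cleanup pass for the client's local staging area.
    use strict;
    use warnings;
    use File::stat;

    my $cache_dir    = '/data/carousel';   # hypothetical staging directory
    my $bytes_needed = 2 * 1024**3;        # e.g. make room for 2 GB

    opendir my $dh, $cache_dir or die "$cache_dir: $!";
    my @files = grep { -f $_ } map { "$cache_dir/$_" } readdir $dh;
    closedir $dh;

    # Least recently accessed (oldest atime) first.
    my @lru = sort { stat($a)->atime <=> stat($b)->atime } @files;

    my $freed = 0;
    for my $f (@lru) {
        last if $freed >= $bytes_needed;
        $freed += -s $f;
        unlink $f or warn "could not remove $f: $!";
    }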
carousel architecture
[diagram: HPSS tape data movers (“ORNL” software) stage files into the HPSS cache; the carousel server, backed by a mySQL database, takes filelists from the client (rmine0x) and drives pftp transfers from the HPSS cache to CAS NFS disk and CAS local disk]
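The loop below sketches the server side of that picture: pull pending requests out of the mySQL state database and feed them to the HPSS batch queue at a measured pace. The table and column names, the batch size, and the pacing interval are all assumptions, and the real hand-off to the batch queue (via the pftp call-back) is reduced to a stub.

    #!/usr/bin/env perl
    # Sketch of the carousel server's pacing loop (table/column names invented).
    use strict;
    use warnings;
    use DBI;

    sub feed_batch_queue {
        my ($file) = @_;
        # Stand-in for the real hand-off to the HPSS batch queue, which stages
        # the file and writes it out via the pftp call-back.
        print "queueing $file\n";
    }

    my $dbh = DBI->connect('DBI:mysql:database=carousel;host=localhost',
                           'carousel', 'secret', { RaiseError => 1 });

    my $batch_size = 20;    # files handed to the batch queue per cycle
    my $pace       = 300;   # seconds between cycles (the "measured pace")

    while (1) {
        my $rows = $dbh->selectall_arrayref(
            "SELECT id, hpss_file FROM request WHERE status = ? LIMIT $batch_size",
            undef, 'pending');

        for my $r (@$rows) {
            my ($id, $file) = @$r;
            feed_batch_queue($file);
            $dbh->do('UPDATE request SET status = ? WHERE id = ?',
                     undef, 'staging', $id);
        }
        sleep $pace;
    }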
accounting tables
• group-level accounting information provides the possibility of tailoring access to HPSS resources (one possible table layout is sketched below)
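Purely as a sketch, such a table could be a per-group row tracking what has been staged against a quota that the retrieval policy consults. The schema, the column names, and the quota idea are assumptions used to illustrate the point, not the actual carousel tables.

    #!/usr/bin/env perl
    # One possible group-level accounting table, created through DBI.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=carousel;host=localhost',
                           'carousel', 'secret', { RaiseError => 1 });

    my $ddl = q{
        CREATE TABLE IF NOT EXISTS group_usage (
            grp          VARCHAR(32) NOT NULL,    -- PHENIX working group
            files_staged INT         NOT NULL DEFAULT 0,
            bytes_staged BIGINT      NOT NULL DEFAULT 0,
            bytes_quota  BIGINT      NOT NULL,    -- cap consulted by the retrieval policy
            last_request DATETIME,
            PRIMARY KEY (grp)
        )
    };
    $dbh->do($ddl);

    # The retrieval policy could then throttle a group that is over quota:
    my ($used, $quota) = $dbh->selectrow_array(
        'SELECT bytes_staged, bytes_quota FROM group_usage WHERE grp = ?',
        undef, 'somegroup');                      # hypothetical group name
    print "group over quota, defer its requests\n"
        if defined $used && $used > $quota;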
current state and future directions
• works (has basically worked since MDC2)
• two main sources for the code
  • http://nucwww.chem.sunysb.edu/pad/offline/carousel/
  • the PHENIX CVS repository
• there remains one PHENIX-ism to be exorcised
  • the HPSS batch queue is currently hardwired to suid to “phnxreco”
  • it could instead select uid and gid based on COS
• lots of future improvements are possible
  • we have worked to make the system “good enough” to use
  • could use more sophisticated server/client communication
  • check for available space before staging a file to the HPSS cache (see the sketch after this list)
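For that last item, the check could be as simple as asking df(1) how much room is left under the cache mount point before queueing the next file. The mount point, the file size, and the idea of holding the request are assumptions about how such a check might be wired in.

    #!/usr/bin/env perl
    # Sketch: check free space under the cache before staging another file.
    use strict;
    use warnings;

    sub free_kbytes {
        my ($path) = @_;
        my @df = `df -kP $path`;       # POSIX output: header line, then one data line
        my (undef, undef, undef, $avail) = split ' ', $df[1];
        return $avail;                 # available space in 1K blocks
    }

    my $cache     = '/hpss_cache';     # hypothetical cache mount point
    my $next_file = 100 * 1024;        # size of the next file in KB (~100 MB)

    if (free_kbytes($cache) > $next_file) {
        print "enough room, stage the file\n";
    } else {
        print "cache nearly full, hold this request\n";
    }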