
New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC






  1. New Capabilities in the HENP Grand Challenge Storage Access System and its Application at RHIC David Malon, Argonne, presenting for the HENP-GC collaboration* (http://www-rnc.lbl.gov/GC/) 8 Feb 2000 CHEP *slides thanks to L. Bernardo, D. Olson, A. Shoshani, S. Vanyashin

  2. Outline • Overview of HENP-GC • What’s new since CHEP’98 • STACS • Experiment interface • Scalability testing • Conclusion

  3. High-Energy & Nuclear Physics Grand Challenge • 3 year project, mid-1997 to mid-2000 • Funded by DOE/MICS with contributed effort from DOE/HENP • Participants: • NERSC/Berkeley Lab • L. Bernardo, A. Mueller, H. Nordberg, A. Shoshani, A. Sim, J. Wu • Argonne • D. Malon, E. May, G. Pandola • Brookhaven Lab • B. Gibbard, S. Johnson, J. Porter, T. Wenaus • Nuclear Science/Berkeley Lab • D. Olson, A. Vaniachine, J. Yang, D. Zimmerman

  4. What is the Grand Challenge architecture? • An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment • Queries select events and specific event components based upon tag attribute ranges • query estimates are provided prior to execution • collections as queries are also supported • Because event components are distributed over several files, processing an event requires delivery of a “bundle” of files • Events are delivered in an order that takes advantage of what is already on disk, and multiuser policy-based prefetching of further data from tertiary storage • GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer

  5. System Overview — [Architecture diagram: multiple Clients talk to the GCA's STACS, which consults an Index and a File Catalog; event files are staged from HPSS via pftp into a disk cache of staged event files, alongside Event Tags and (other) disk-resident event data]

  6. STorage Access Coordination System (STACS) — [Component diagram: a Query goes to the Query Estimator, which consults a Bit-Sliced Index and returns an Estimate; the Query Monitor, guided by a Policy Module, turns queries into file bundles and event lists and reports Query Status and a Cache Map; it sends requests for file caching and purging to the Cache Manager, which consults the File Catalog and issues pftp and file-purge commands]
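The bit-sliced index the Query Estimator consults can be pictured as a set of bitmaps, one per attribute bin, with a set bit meaning "this event falls in this bin"; counting set bits yields a query estimate before any file is touched. A minimal sketch of that idea follows (simplified and hypothetical, not the production STACS index):

#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<std::uint64_t>;   // one bit per event

// AND two bitmaps word by word: events satisfying both predicates.
Bitmap intersect(const Bitmap& a, const Bitmap& b) {
    Bitmap out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] & b[i];
    return out;
}

// OR together all bins of one attribute that overlap [lo, hi);
// bin k covers [k*binWidth, (k+1)*binWidth).
Bitmap rangeQuery(const std::vector<Bitmap>& bins,
                  double lo, double hi, double binWidth) {
    Bitmap out(bins.front().size(), 0);
    for (std::size_t k = 0; k < bins.size(); ++k) {
        if ((k + 1) * binWidth <= lo || k * binWidth >= hi) continue;
        for (std::size_t i = 0; i < out.size(); ++i) out[i] |= bins[k][i];
    }
    return out;
}

// Counting set bits gives the estimate without touching any file.
std::size_t countEvents(const Bitmap& b) {
    std::size_t n = 0;
    for (std::uint64_t w : b)
        for (; w; w &= w - 1) ++n;   // Kernighan bit count
    return n;
}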

  7. What is new since CHEP’98 • Multi-component multi-file event model • Event is composed of separate components • Components of a single event are stored in separate files • Removal of Objectivity/DB-specific dependencies • eventID is an experiment-specific typedef • CORBA file catalog interface • User-accessible file bundle information for user-code-dependent file I/O • CORBA interface to tag database • Scalability tests • 10M events, 7 components, 100-250 queries
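The eventID typedef mentioned above is whatever the experiment keys its events by. A hypothetical example for an experiment identifying events by (run, event) numbers; this is illustrative, not the actual STAR definition:

// Hypothetical experiment-specific event identifier; the GC code
// treats eventID opaquely and only needs it to be orderable.
struct RunEvent {
    int run;
    int event;
    bool operator<(const RunEvent& o) const {
        return run < o.run || (run == o.run && event < o.event);
    }
};
typedef RunEvent eventID;   // experiment-specific typedef, per slide 7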

  8. Multiple-Component Events • Event Components • partition each event into 5-10 pieces • tracks, hits, vertices … • Queries can request one or more components • all components of an event must be in disk cache at the same time • Problem: how to manage multiple component files to minimize re-caching of files • Pseudo query language: SELECT tracks, hits FROM Run17 WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3

  9. Example of multiple components — [Diagram: nine events e1–e9, with each event's Component A stored in one pair of files and its Component B in another] File Bundles: (F1,F2: e1,e2,e3,e5), (F3,F2: e4,e7), (F3,F4: e6,e8,e9)

  10. Multicomponent Event Delivery • Grand Challenge software • partitions the collection of qualifying events according to which file bundles must be cached on disk to permit their processing • attempts to optimize the order of bundle delivery in a multiuser environment • supports prefetching of bundles
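A sketch of the partitioning step with hypothetical types: events whose requested components live in the same set of files share a bundle, reproducing the (F1,F2: e1,e2,e3,e5) grouping of the earlier example.

#include <map>
#include <set>
#include <vector>

using EventID = long;
using FileID  = int;

// For each qualifying event, collect the files holding its requested
// components; events sharing the same file set belong to the same bundle.
std::map<std::set<FileID>, std::vector<EventID>>
partitionIntoBundles(const std::vector<EventID>& events,
                     const std::vector<int>& components,
                     FileID (*fileOf)(EventID, int /*component*/)) {
    std::map<std::set<FileID>, std::vector<EventID>> bundles;
    for (EventID e : events) {
        std::set<FileID> bundle;
        for (int c : components) bundle.insert(fileOf(e, c));
        bundles[bundle].push_back(e);
    }
    return bundles;
}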

  11. File Weight Policy for Multi-Component Events • File weight within a bundle = 1 if the file appears in that bundle, 0 otherwise • Initial file weight = sum of these weights over all bundles of all queries • Dynamic file weight: the weight of each file in a bundle is decremented by 1 when that bundle has been processed
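A minimal sketch of this bookkeeping, with hypothetical types (the slide does not show the actual implementation):

#include <map>
#include <set>
#include <vector>

using FileID = int;
using Bundle = std::set<FileID>;

// Initial weight: +1 for every bundle (over all queries) a file appears in.
std::map<FileID, int> initialWeights(const std::vector<Bundle>& allBundles) {
    std::map<FileID, int> w;
    for (const Bundle& b : allBundles)
        for (FileID f : b) ++w[f];
    return w;
}

// Dynamic weight: decrement each file's weight when a bundle is processed.
void onBundleProcessed(std::map<FileID, int>& w, const Bundle& b) {
    for (FileID f : b) --w[f];
}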

  12. Caching Policy (1) • Query service policy • round robin • a query is skipped if no bundle fits the available cache • when a query is skipped, its skip_service counter is incremented • if the counter is above a preset limit, all activity stops until this query is serviced

  13. Caching Policy (2) • Bundle caching policy • select the bundle with the most files in cache • if a tie, select the bundle with the highest weight • if not enough space in cache, select the next bundle that fits in the cache • if none fit, select the next bundle with one less file in cache, etc. • if no bundle is found, skip the query and increase its skip_service counter

  14. Caching Policy (3) • File purging policy • files are in 2 categories: • file currently in use • file not currently in use • purge the file with the lowest dynamic_file_weight • if a tie, purge the largest file • Pre-fetching policy • initially: unlimited • this parameter can be assigned dynamically • (a combined sketch of these three policies follows)
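A combined sketch of the bundle-selection and purging rules above, reusing the weights from the previous sketch; the types and structure are hypothetical, only the policy rules come from the slides:

#include <map>
#include <set>
#include <vector>

using FileID = int;
using Bundle = std::set<FileID>;

struct CacheState {
    std::set<FileID>          cached;    // files currently on disk
    std::set<FileID>          inUse;     // files pinned by running queries
    std::map<FileID, int>     weight;    // dynamic file weights (slide 11)
    std::map<FileID, double>  size;      // file sizes in bytes
    double freeSpace;                    // bytes available in the cache
};

// Bytes that would have to be staged from HPSS to complete this bundle.
double bytesToStage(const Bundle& b, const CacheState& s) {
    double need = 0;
    for (FileID f : b)
        if (!s.cached.count(f)) need += s.size.at(f);
    return need;
}

// Bundle caching policy: prefer the bundle with the most files already in
// cache, break ties by total weight, and require that it fit; scanning for
// the maximum subsumes the "one less file in cache" fallback of slide 13.
const Bundle* selectBundle(const std::vector<Bundle>& pending,
                           const CacheState& s) {
    const Bundle* best = nullptr;
    int bestCached = -1, bestWeight = -1;
    for (const Bundle& b : pending) {
        if (bytesToStage(b, s) > s.freeSpace) continue;   // does not fit
        int nCached = 0, w = 0;
        for (FileID f : b) {
            if (s.cached.count(f)) ++nCached;
            w += s.weight.at(f);
        }
        if (nCached > bestCached ||
            (nCached == bestCached && w > bestWeight)) {
            best = &b; bestCached = nCached; bestWeight = w;
        }
    }
    return best;   // nullptr: skip this query, bump its skip_service counter
}

// File purging policy: among files not in use, evict the lowest dynamic
// weight, breaking ties by purging the largest file first.
FileID selectVictim(const CacheState& s) {
    FileID victim = -1;
    for (FileID f : s.cached) {
        if (s.inUse.count(f)) continue;
        if (victim < 0 || s.weight.at(f) < s.weight.at(victim) ||
            (s.weight.at(f) == s.weight.at(victim) &&
             s.size.at(f) > s.size.at(victim)))
            victim = f;
    }
    return victim;   // -1 if every cached file is in use
}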

  15. File Tracking Log — [Plot of file activity over time, annotated: a bundle (3 files) is formed, then passed to its query; one bundle is shared by two queries; one bundle was found already in cache; markers indicate where Query 1 and Query 2 start]

  16. STAR event model — [Diagram of the STAR event model, T. Ullrich, Jan. 2000]

  17. Interfacing GCA to experiment — [Diagram: the GC System components (QueryEstimator, QueryMonitor, CacheManager, IndexBuilder) connect to the STAR components (StIOMaker, database, fileCatalog, tagDB) through three interfaces: gcaClient, FileCatalog, and IndexFeeder]

  18. Experiment-specific Implementations • IndexFeeder utility • Experiment provides “tag database” that has for each event • attributes used for event selection (“tags”) • fileID for each event component • IndexFeeder reads the experiment’s “tag database” so that GC-provided “index builder” can create index • FileCatalog server • FileCatalog queries the “file catalog” database of the experiment to translate fileID to HPSS & disk path
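A sketch of the per-event record an IndexFeeder might hand to the GC index builder; the names and types here are assumptions, only the content (selection tags plus a fileID per event component) comes from the slide:

#include <map>
#include <string>
#include <vector>

struct TagRecord {
    long eventID;                                 // experiment-specific event id
    std::map<std::string, float> tags;            // selection attributes, e.g.
                                                  // "glb_trk_tot", "n_vert_total"
    std::map<std::string, int> componentFileID;   // e.g. "tracks" -> fileID
};

// The experiment implements this by scanning its own tag database; the
// GC-provided index builder consumes the records to create its index.
std::vector<TagRecord> readTagDatabase(const std::string& runName) {
    std::vector<TagRecord> records;
    // ... experiment-specific scan of the tag database goes here ...
    return records;
}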

  19. Client-side implementation • gcaClient interface • gcaResources interface: initialization, configuration, establishment of contact with remote STACS components • QueryObject: query definition, estimation, execution • Order-Optimized Iterator: delivery of event ids (and optional file information) as bundles are cached
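Pulling these pieces together, a hypothetical client fragment; apart from token(), which appears on the next slide, the method names are assumptions, not the published gcaClient API:

// Hypothetical client fragment; method names other than token() are assumed.
GCA_Resources resources;                  // initialization, configuration,
                                          // contact with remote STACS components
QueryObject* query = new QueryObject(&resources,
    "SELECT tracks, hits FROM Run17 "
    "WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3");
query->estimate();                        // query estimate prior to execution
query->execute();                         // STACS begins caching file bundles
OrderOptIter GCIter(query->token(), &resources);   // iterate, as on the next slide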

  20. Iterator Extensions

// The iterator is initialized with this query's token and a pointer to
// GCA_Resources, for access to remote STACS components and configuration
// parameters:
OrderOptIter GCIter(query->token(), &GCA_Resources);
eventID myEvent;
while (GCIter.next(myEvent)) {
    usercode(myEvent);   // process an event
}

To allow experiment-specific code to handle file I/O, an optional flag that signals a new file bundle is used, along with a method to retrieve the file name for each event component:

bool thisIsANewBundle;
while (GCIter.next(myEvent, thisIsANewBundle)) { ... }

string getComponentFileName(const string componentName);

  21. STAR uses the fileCatalog & instance tables in MySQL to satisfy the fcFileCatalog CORBA interface

// fileCat.idl
// Luis Bernardo <LMBernardo@lbl.gov>, Alex Sim <ASim@lbl.gov>
// Lawrence Berkeley National Laboratory
// May 99
// Purpose: defines interface between File Catalog (server) and Cache
// Manager, Query Estimator and Query Monitor (clients).

#include "smDefs.idl"

struct FileInfo {
    FID_T fid;
    double fileSize;
    string localFileName;
    string remoteFileName;
    string tapeID;
};
typedef FileInfo FILEINFO_T;

interface fcFileCatalog {
    double getFileSize(in FID_T fid);
    double getSumFileSizes(in FIDSET_T fset);
    FSIZE_SET_T getListFileSizes(in FIDSET_T fset);
    FILEINFO_T getFileInfo(in FID_T fid);
    void updateFileCatalog(in string asciifilecat);
};

  22. Schema of fileCatalog table

mysql> desc fileCatalog;
+--------------+-----------------------------------+------+-----+---------------------+----------------+
| Field        | Type                              | Null | Key | Default             | Extra          |
+--------------+-----------------------------------+------+-----+---------------------+----------------+
| prodType     | enum('unknown','daq','sim','job') |      | MUL | unknown             |                |
| prodName     | varchar(80)                       |      |     |                     |                |
| prodSerie    | int(11)                           |      |     | 0                   |                |
| prodInstance | int(11)                           |      |     | 0                   |                |
| fileSequence | int(11)                           |      |     | 0                   |                |
| dbServer     | enum('unknown','bnl','lbl')       |      |     | unknown             |                |
| eventType    | int(11)                           |      |     | 0                   |                |
| path         | varchar(64)                       |      |     |                     |                |
| fileName     | varchar(20)                       |      |     |                     |                |
| dataset      | varchar(64)                       |      |     |                     |                |
| size         | int(11)                           |      |     | 0                   |                |
| createTime   | datetime                          |      |     | 0000-00-00 00:00:00 |                |
| insertTime   | timestamp(10)                     | YES  |     | NULL                |                |
| Nevents      | mediumint(9)                      |      |     | 0                   |                |
| NevLo        | mediumint(9)                      |      |     | 0                   |                |
| NevHi        | mediumint(9)                      |      |     | 0                   |                |
| owner        | varchar(20)                       |      |     |                     |                |
| grp          | varchar(20)                       |      |     | star                |                |
| permit       | varchar(10)                       |      |     | -rw-r-----          |                |
| type         | varchar(20)                       |      |     |                     |                |
| component    | varchar(20)                       |      |     |                     |                |
| format       | varchar(10)                       |      |     |                     |                |
| site         | varchar(10)                       |      |     |                     |                |
| hpss         | enum('Y','N')                     |      |     | Y                   |                |
| status       | smallint(6)                       |      |     | 0                   |                |
| comment      | blob                              |      |     | NULL                |                |
| tape         | smallint(6)                       |      |     | 0                   |                |
| generation   | smallint(6)                       | YES  |     | 0                   |                |
| ID           | mediumint(9)                      |      | PRI | 0                   | auto_increment |
+--------------+-----------------------------------+------+-----+---------------------+----------------+
29 rows in set (0.08 sec)

[Slide annotation: the path column gives the path to the primary instance in HPSS]

  23. Schema of instance table

mysql> desc instances;
+--------------+---------------+------+-----+---------------------+-------+
| Field        | Type          | Null | Key | Default             | Extra |
+--------------+---------------+------+-----+---------------------+-------+
| fID          | int(9)        |      | PRI | 0                   |       |
| instance     | tinyint(4)    |      | PRI | 0                   |       |
| created      | datetime      |      |     | 0000-00-00 00:00:00 |       |
| volume       | varchar(30)   |      |     | n/a                 |       |
| path         | varchar(128)  |      |     | n/a                 |       |
| permit       | varchar(10)   | YES  |     | NULL                |       |
| owner        | varchar(20)   | YES  |     | NULL                |       |
| grp          | varchar(20)   | YES  |     | NULL                |       |
| hpss         | enum('Y','N') |      |     | Y                   |       |
| lastAccessed | datetime      |      |     | 0000-00-00 00:00:00 |       |
| site         | varchar(10)   |      |     | BNL                 |       |
| location     | varchar(10)   |      |     | rcf                 |       |
| tape         | smallint(6)   |      |     | 0                   |       |
| comment      | varchar(255)  | YES  |     | NULL                |       |
| enteredDB    | timestamp(10) | YES  |     | NULL                |       |
+--------------+---------------+------+-----+---------------------+-------+
15 rows in set (0.10 sec)

[Slide annotation: the path and location columns give the disk path and location where the cache manager puts the file]
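As a sketch of how these two tables can back the IDL's getFileInfo(), a hypothetical lookup using the MySQL C API; the join and column choices mirror the schemas above, but the actual STAR server may differ:

#include <cstdio>
#include <mysql.h>

// Resolve one fileID to the information FILEINFO_T carries: size, the HPSS
// (remote) path from fileCatalog, and the disk (local) path from instances.
void printFileInfo(MYSQL* conn, long fid) {
    char sql[512];
    std::snprintf(sql, sizeof(sql),
        "SELECT f.size, f.path, f.fileName, i.path, f.tape "
        "FROM fileCatalog f LEFT JOIN instances i ON i.fID = f.ID "
        "WHERE f.ID = %ld", fid);
    if (mysql_query(conn, sql) != 0) return;      // query failed
    MYSQL_RES* res = mysql_store_result(conn);
    if (!res) return;
    if (MYSQL_ROW row = mysql_fetch_row(res))
        std::printf("size=%s hpss=%s/%s disk=%s tape=%s\n",
                    row[0], row[1], row[2],
                    row[3] ? row[3] : "(no disk instance)", row[4]);
    mysql_free_result(res);
}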

  24. Scalability testing • Test Dataset • 10M events • 7 event components • 1.6 TB • 4700 files • QE tested up to 100 concurrent queries • QM tested up to 250 concurrent queries • 24 hour runs • Bugs were found & fixed, system ran OK

  25. File processing by 100 queries — [plot]

  26. File stage requests — [plot]

  27. QE estimation times — [plot]

  28. Conclusion • HENP-GC has developed a system for optimized access to multi-component event data files stored in HPSS. • General CORBA interfaces are defined for interfacing with the experiment. • A client component encapsulates interaction with the servers and provides an ODMG-style iterator. • The system has been tested with up to 10M events, 7 event components, and 250 concurrent queries. • It is currently being integrated with the STAR experiment's ROOT-based I/O and analysis system.
