CHEP 2000 Data Handling in KLOE I.Sfiligoi INFN LNF, Frascati, Italy
The KLOE experiment • at the DAΦNE φ-factory • main goal: CP violation study • other interesting fields: kaon form factors, kaon rare decays, radiative φ decays • decay modes shown: K_S → π⁺π⁻, K_L → π⁺π⁻ (CP-violating); K_S → π⁺π⁻, K_L → 3π⁰ → 6γ
KLOE Requirements • Data acquisition (at full DAΦNE luminosity) • 10¹¹ events per year acquired • 50 MB/s sustained throughput • Computing power • ALL the events need to be reconstructed • Storage requirements • one petabyte of raw and reconstructed events • hundreds of megabytes of related data (configurations, slow control data, calibration parameters, etc.)
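A quick back-of-envelope check relates the three figures quoted above. It is only a sketch: it assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and that the petabyte corresponds to one year of events, neither of which is stated on the slide.

```python
# Back-of-envelope check of the requirements quoted on the slide.
# Assumptions (not from the talk): decimal units and one petabyte per year.
EVENTS_PER_YEAR = 10**11          # from the slide
THROUGHPUT_BPS = 50 * 10**6       # 50 MB/s sustained, from the slide
STORAGE_BYTES = 10**15            # one petabyte, from the slide

# Average space available per event (raw plus reconstructed).
bytes_per_event = STORAGE_BYTES / EVENTS_PER_YEAR

# Time needed to move one petabyte through a single 50 MB/s channel.
days_for_one_pb = STORAGE_BYTES / THROUGHPUT_BPS / 86400

print(f"{bytes_per_event / 1e3:.0f} kB per event")
print(f"{days_for_one_pb:.0f} days of sustained transfer per petabyte")
```

Under these assumptions the budget is about 10 kB per event, and a petabyte is more than 200 days of uninterrupted transfer at the quoted rate.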
KLOE computing environment • Based on a set of medium-sized servers • Connected using commercial switched networks (Fast Ethernet and Gigabit Ethernet) • Heterogeneous environment, several platforms: • IBM AIX on PowerPC • Sun Solaris on Sparc • Compaq Tru64 Unix on Alpha • HP-UX on PA-RISC
KLOE storage pool • Different policies for different types of data: • raw and reconstructed events on tape libraries, with big disk pools for data caching • related data managed by a disk based database system • analysis output on disk pools
Disk pools • Four categories of disk pools are present: • each data acquisition node in the farm has its own small disk pool • computing nodes write their output to centralized, NFS mounted disk pools • separate disk pools are used as a cache for the events on tape • analysis output is written to its own, central AFS mounted disk pool
Tape library • Several automated tape libraries supported (at the moment, the 5500-slot tape library is partitioned between two tape servers) • Accessed using commercial software • IBM ADSM with the current tape library
KLOE software • Three distinct categories • DAQ (or online): ANSI C • reconstruction and analysis (or offline): FORTRAN inside A_C • Monte Carlo: FORTRAN • The interface to the Data Handling System must be compatible with all of them
KLOE Data Handling System • Composed of four elements: • Database System • Archiving System • Spy System • KLOE Integrated Dataflow (KID)
KLOE Data Handling System • A mix of commercial and custom software • commercial software carries out all the vital functions • custom software mostly extends and coordinates the functionality of the commercial software • the dependency on commercial software is minimized by the layers of custom software
KLOE Data Handling System • Based on a set of multi-threaded non-privileged daemons and related libraries • Distributed across several nodes • Communication by means of TCP/IP sockets on high ports • flexible, programming language and operating system independent • bypasses TCP/IP filtering • no configuration needed on the client side
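The daemon pattern described on this slide can be illustrated with a minimal sketch. It is not the KLOE code: the one-line text protocol, the `OK` reply and all names are assumptions made for the example. The server runs one thread per connection and binds to an unprivileged port, so it needs no special rights, and the client needs only a host name and a port number.

```python
import socket
import socketserver
import threading

class CommandHandler(socketserver.StreamRequestHandler):
    """Handle one client: read a text line, answer with a text line."""

    def handle(self):
        line = self.rfile.readline().decode("ascii").strip()
        # Hypothetical protocol: acknowledge every command.
        self.wfile.write(f"OK {line}\n".encode("ascii"))

class Daemon(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True          # one worker thread per connection

def start_daemon(host="127.0.0.1", port=0):
    """Start the daemon; port 0 lets the OS pick a free unprivileged port."""
    server = Daemon((host, port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(host, port, command):
    """Plain-socket client: only the host and the port are needed."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((command + "\n").encode("ascii"))
        return conn.makefile("r", encoding="ascii").readline().strip()

server = start_daemon()
host, port = server.server_address
reply = ask(host, port, "ping")
server.shutdown()
server.server_close()
print(port, reply)
```

Because the exchange is plain text over a socket, a client written in C or FORTRAN (through a small C wrapper) could speak the same protocol.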
Database System • Two distinct database systems are used • offline database system: based on HepDB, data stored as ZEBRA banks • online database system: based on a Relational DBMS, data are structured in fields, extended for distributed environments
Online Database System • data stored in a Relational DBMS • IBM DB2 Universal Database at the moment • communication between the clients (user applications) and the RDBMS through a database daemon • [Diagram: several applications (app) connected to the RDBMS through the database daemon (DD)]
Database Daemon • The database daemon is the only link between the applications and the RDBMS • if the RDBMS is changed in the future, only the database daemon will need to be changed • Different kinds of commands are managed by the daemon • general SQL commands • KLOE specific commands
Database Daemon • Different kinds of commands are managed by the daemon • general SQL commands • passed directly to the RDBMS • example: select run_nr from run_logger where status = 'OK' • KLOE specific commands • managed by the daemon itself • the RDBMS is used to retrieve and store data needed by the daemon itself • example: "log that I am starting processing file relative to run 3"
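The two kinds of commands can be sketched as a single dispatch point. This is a sketch under stated assumptions: `sqlite3` stands in for IBM DB2, and the command name `log_start_processing` and the `processing_log` table are hypothetical. Only the `run_logger` table with its `run_nr` and `status` fields comes from the slide.

```python
import sqlite3

class DatabaseDaemonCore:
    """Single entry point in front of the RDBMS (sqlite3 stands in for DB2)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE run_logger (run_nr INTEGER, status TEXT)")
        self.db.execute("CREATE TABLE processing_log (run_nr INTEGER, note TEXT)")
        # Hypothetical registry of KLOE specific commands.
        self.specific = {"log_start_processing": self._log_start_processing}

    def execute(self, command, *args):
        handler = self.specific.get(command)
        if handler is not None:
            return handler(*args)                         # managed by the daemon
        return self.db.execute(command, args).fetchall()  # passed on as SQL

    def _log_start_processing(self, run_nr):
        # Extra check that plain SQL access would not enforce.
        known = self.db.execute(
            "SELECT 1 FROM run_logger WHERE run_nr = ?", (run_nr,)).fetchone()
        if known is None:
            raise ValueError(f"unknown run {run_nr}")
        self.db.execute("INSERT INTO processing_log VALUES (?, ?)",
                        (run_nr, "processing started"))
        return "logged"

daemon = DatabaseDaemonCore()
daemon.execute("INSERT INTO run_logger VALUES (?, ?)", 3, "OK")
good_runs = daemon.execute("select run_nr from run_logger where status = 'OK'")
status = daemon.execute("log_start_processing", 3)
print(good_runs, status)
```

Since applications only ever call `execute`, replacing the RDBMS would touch this class and nothing else.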
Database Daemon • The use of KLOE specific commands has several advantages • additional checks and restrictions are possible • data consistency management is centralized • fast central caches can be implemented • for example, the DAQ configuration cache reduces the typical access time from 4 to 0.1 s
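A central cache of the kind mentioned above can be sketched in a few lines. The class and function names are hypothetical, and the slow lookup is simulated by a plain function rather than a real RDBMS query.

```python
class ConfigCache:
    """Daemon-side cache in front of a slow configuration lookup."""

    def __init__(self, load):
        self._load = load          # slow function: key -> value
        self._cache = {}

    def get(self, key):
        # Only the first request for a key reaches the slow lookup.
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]

    def invalidate(self, key):
        """Drop an entry, for example after the configuration is updated."""
        self._cache.pop(key, None)

lookups = []

def slow_lookup(key):
    """Stand-in for an expensive query against the RDBMS."""
    lookups.append(key)
    return {"name": key}

cache = ConfigCache(slow_lookup)
cache.get("daq_config")
cache.get("daq_config")            # served from the cache
lookups_before_invalidate = len(lookups)
cache.invalidate("daq_config")
cache.get("daq_config")            # reloaded after invalidation
print(lookups_before_invalidate, len(lookups))
```

Keeping the cache inside the daemon means every client benefits from it and a single invalidation keeps them all consistent.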
A light version • The RDBMS is used to ensure flexibility, reliability and performance • but it is demanding in terms of computing resources and management effort • stand-alone environments often cannot afford it • An RDBMS-independent version of the database daemon is under development
A light version • An RDBMS-independent version of the database daemon is under development • limited to KLOE specific commands and the most frequently used SQL commands • based on the use of flat files containing a small portion of the data • not suitable for a production environment, but enough for home use
KLOE Archiving System • Expected event data managed by KLOE • 1 PB • Tape libraries needed • data storage and retrieval non trivial • random access to data very inefficient • Disk-based intermediate buffers used
KLOE Archiving System • Two types of intermediate buffers • DAQ, offline and Monte Carlo output are structured as YBOS files and written on their disk output areas • event data needed by offline as input are read from the archiving system disk-cache
KLOE Archiving System • Data needs to be migrated • from output areas to the tape library as soon as possible (also taking efficiency concerns into account) • from the tape library to the disk cache when an application needs it (or even better, a bit earlier) • Migration is totally automated and transparent to the applications
KLOE Archiving System • The Archiving System is made of four components • storage managers (archADSM) • disk space managers • output areas (spacekeeper) • cache areas (filekeeper) • archival director (archiver) • cache manager (retrieve) • Communication by means of TCP/IP sockets • Coordinated by the online database
Storage Managers • One for each logical tape library • Allows • queries about tape library content • file archival • file retrieval • Transaction oriented (if the underlying tape library software supports it)
Storage Managers • The only link between the tape library and the rest of the system • interface independent of the underlying archiving software • IBM ADSM is used with the current tape library • if another product is used in the future, only a specific storage manager will need to be developed
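The idea of a product-independent storage manager can be sketched as an abstract interface with one backend per archiving product. The method names are hypothetical, and the in-memory backend below is only a stand-in for illustration; it does not talk to ADSM or to any tape library.

```python
from abc import ABC, abstractmethod

class StorageManager(ABC):
    """Interface seen by the rest of the system, whatever the tape software is."""

    @abstractmethod
    def list_files(self):
        """Return the names of the archived files."""

    @abstractmethod
    def archive(self, files):
        """Archive a dict name -> bytes as one transaction."""

    @abstractmethod
    def retrieve(self, name):
        """Return the content of one archived file."""

class InMemoryStorageManager(StorageManager):
    """Stand-in backend; a real one would wrap the archiving product."""

    def __init__(self):
        self._tape = {}

    def list_files(self):
        return sorted(self._tape)

    def archive(self, files):
        # Validate first, so the group is stored entirely or not at all.
        duplicates = [name for name in files if name in self._tape]
        if duplicates:
            raise ValueError(f"already archived: {duplicates}")
        self._tape.update(files)

    def retrieve(self, name):
        return self._tape[name]

manager = InMemoryStorageManager()
manager.archive({"run3_a.dat": b"aaa", "run3_b.dat": b"bbb"})
try:
    manager.archive({"run4_a.dat": b"ccc", "run3_a.dat": b"again"})
except ValueError:
    pass                           # the whole group was rejected
print(manager.list_files())
```

Supporting a different product would then mean writing one more subclass, with no change to the callers.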
Disk Space Managers • One for each disk pool • Create and delete files • unused files get deleted to make space for new ones
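The space-reclaiming behaviour can be sketched as bookkeeping over one disk pool. This is an assumption-laden illustration: it evicts the least recently used files first, which the slide does not specify, and it only tracks sizes in memory instead of touching a real file system.

```python
from collections import OrderedDict

class DiskSpaceManager:
    """Bookkeeping for one disk pool (sizes in arbitrary units)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()   # name -> size, least recently used first
        self.in_use = set()          # files that must not be deleted

    def used(self):
        return sum(self.files.values())

    def touch(self, name):
        """Record an access, making the file the most recently used."""
        self.files.move_to_end(name)

    def create(self, name, size):
        """Reserve space for a new file, deleting unused files if needed."""
        if size > self.capacity:
            raise ValueError("file larger than the pool")
        for victim in list(self.files):
            if self.used() + size <= self.capacity:
                break
            if victim not in self.in_use:
                del self.files[victim]
        if self.used() + size > self.capacity:
            raise RuntimeError("not enough unused files to delete")
        self.files[name] = size

pool = DiskSpaceManager(capacity=100)
pool.create("a.dat", 40)
pool.create("b.dat", 40)
pool.touch("a.dat")                  # b.dat is now the least recently used
pool.create("c.dat", 40)             # evicts b.dat
after_first_eviction = list(pool.files)
pool.in_use.add("a.dat")
pool.create("d.dat", 40)             # a.dat is protected, so c.dat goes
print(after_first_eviction, list(pool.files))
```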
Archival Director • Fully automated • Works in polling mode • from time to time looks for files ready to be archived • starts archiving only when enough data is available • Files are ordered and grouped to minimize the expected retrieve time • Several groups of files can be archived in parallel
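One polling step of such a director can be sketched as a pure function. The grouping rule used here (order by run number and keep the files of a run together) is an assumption chosen for the example; the slide only says that files are ordered and grouped to minimize the expected retrieve time.

```python
from itertools import groupby

def plan_archival(ready, min_bytes):
    """Decide what to archive in one polling step.

    ready is a list of (run_nr, file_name, size) tuples for the files that
    are ready to be archived. Returns a list of groups of file names, or an
    empty list when there is not yet enough data to start archiving.
    """
    if sum(size for _, _, size in ready) < min_bytes:
        return []                        # wait for the next poll
    ordered = sorted(ready)              # by run number, then file name
    return [[name for _, name, _ in files]
            for _, files in groupby(ordered, key=lambda entry: entry[0])]

ready_files = [(2, "run2_a.dat", 60), (1, "run1_b.dat", 30), (1, "run1_a.dat", 30)]
too_early = plan_archival(ready_files, min_bytes=200)
groups = plan_archival(ready_files, min_bytes=100)
print(too_early, groups)
```

Each returned group is independent of the others, so the groups could be handed to separate archiving threads and written in parallel.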
Cache Manager • User driven • when a file is needed, the application asks the cache manager where it is located • a retrieve is performed by the manager if needed • Several requests can be issued at the same time • the manager reorders them internally to minimize the tape mounts • Communication by means of TCP/IP sockets
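The reordering of pending requests can be sketched as grouping by tape, so that each tape is mounted once. The catalog mapping files to tapes, and all names below, are hypothetical; in the system described above that information would presumably come from the storage managers or the database.

```python
def order_retrieves(requests, tape_of):
    """Group requested files by tape, keeping the order of first request.

    Returns a list of (tape, [files]) pairs; serving them in this order
    mounts each tape only once.
    """
    by_tape = {}
    for name in requests:
        by_tape.setdefault(tape_of[name], []).append(name)
    return list(by_tape.items())

def count_mounts(file_sequence, tape_of):
    """Number of tape mounts needed to serve the files in the given order."""
    mounts, current = 0, None
    for name in file_sequence:
        if tape_of[name] != current:
            mounts, current = mounts + 1, tape_of[name]
    return mounts

catalog = {"f1": "T1", "f2": "T2", "f3": "T1", "f4": "T2"}   # hypothetical
pending = ["f1", "f2", "f3", "f4"]
plan = order_retrieves(pending, catalog)
reordered = [name for _, files in plan for name in files]
print(count_mounts(pending, catalog), count_mounts(reordered, catalog))
```

In this example the requests as issued would need four mounts, while the reordered sequence needs two.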
KLOE Archival System • [Diagram: the archiver and retrieve daemons, the archADSM storage managers (one per tape library), the spacekeeper and filekeeper disk space managers (one per disk pool) and the DB; connections are labeled TCP/IP socket, NFS mount or local file system]
Spy System • KLOE data acquisition software allows the event data to be read-out before they get written to disk • The mechanism that reads those data is called Spy • Based on use of shared memory buffers • DAQ processes are piped using this mechanism • the spy system reads data from the buffers without interfering with the DAQ
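One way to read a buffer without interfering with the writer is to let the writer overwrite old slots unconditionally and let the reader skip whatever it missed. The sketch below illustrates that idea inside a single process, with a Python list in place of shared memory; it is an assumption about the general technique, not a description of the KLOE implementation.

```python
class CircularBuffer:
    """Fixed-size event buffer; the writer never waits for readers."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.written = 0               # total number of events written so far

    def write(self, event):
        """DAQ side: always succeeds, overwriting the oldest slot."""
        self.slots[self.written % len(self.slots)] = event
        self.written += 1

class Spy:
    """Reader with its own cursor; it loses events instead of slowing the DAQ."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.next = buffer.written     # start from the next event to be written

    def read(self):
        """Return the events not yet seen, skipping those already overwritten."""
        size = len(self.buffer.slots)
        oldest = max(0, self.buffer.written - size)
        self.next = max(self.next, oldest)
        events = [self.buffer.slots[i % size]
                  for i in range(self.next, self.buffer.written)]
        self.next = self.buffer.written
        return events

buffer = CircularBuffer(slots=3)
spy = Spy(buffer)
buffer.write(1)
buffer.write(2)
first = spy.read()
for event in (3, 4, 5, 6):             # the spy falls behind; event 3 is lost
    buffer.write(event)
second = spy.read()
print(first, second)
```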
KLOE Integrated Dataflow (KID) • Integration library • database accesses and retrieve operations hidden • Offers a single point of access to all the services • URI-based selection • spy:/buffer : open a spy channel and pass the events to the application • datarec:(run_nr=5000) and (stream='ksl') : read the list from DB, ask the cache manager for the files, pass the events from the files to the application
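The URI-based selection can be sketched as a small parser that dispatches on the scheme. Only the two example URIs come from the slide; the function name, the returned dictionary layout and the assumption that selections are `(field=value)` terms joined by `and` are illustrative.

```python
import re

def parse_kid_uri(uri):
    """Split a KID-style URI into its scheme and a description of the source."""
    scheme, separator, rest = uri.partition(":")
    if not separator:
        raise ValueError(f"not a KID-style URI: {uri!r}")
    if scheme == "spy":
        # Live events: the rest names the buffer to spy on.
        return {"source": "spy", "buffer": rest.lstrip("/")}
    if scheme == "datarec":
        # Archived events: collect the (field=value) terms of the selection.
        terms = re.findall(r"\((\w+)\s*=\s*([^)]+)\)", rest)
        return {"source": "datarec",
                "selection": {field: _value(text) for field, text in terms}}
    raise ValueError(f"unknown scheme: {scheme!r}")

def _value(text):
    """Convert 'ksl' (quoted) to a string and 5000 to an integer."""
    text = text.strip()
    return text[1:-1] if text.startswith("'") else int(text)

live = parse_kid_uri("spy:/buffer")
archived = parse_kid_uri("datarec:(run_nr=5000) and (stream='ksl')")
print(live, archived)
```

An application would pass the URI string unchanged and let the library decide whether to open a spy channel or to query the database and the cache manager.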
Management effort • The entire system is managed by only a few people: • 3 people (2 full time) are engaged in KLOE computing system management (including storage) • 1 person is engaged in the development and management of the online database and the archiving system • 2 people spend a few percent of their time on the maintenance of the offline database