POOL Project Status
GridPP 10th Collaboration Meeting
Radovan Chytracek
CERN IT/DB, GridPP, LCG AA
What is POOL?
• Pool Of persistent Objects for LHC
• Develops a common object I/O for High Energy Physics applications in the LHC era
• Started in April 2002
• In the context of the LHC Computing Grid (LCG) Application Area (AA)
• Joint project of the LHC experiments and the CERN IT/DB group
• Several GridPP-funded people actively involved
• Successfully used in production
  • LHC data challenges in 2004
POOL project purpose
• Allow storage and retrieval of the multi-PB experiment data and associated metadata in a distributed, Grid-enabled fashion
• Data comes in different volumes
  • Event data, physics and detector simulation
  • Detector data and bookkeeping data
• Data comes in various forms
  • Bulk data
  • Time-dependent data
  • Metadata
• This challenge is met by a hybrid technology approach
  • C++ object streaming technology for bulk data
    • Using the ROOT framework
  • Transactionally safe services for catalogs, collections and metadata
    • Using RDBMS systems such as Oracle, MySQL, …
POOL architecture
• POOL is a storage-technology-neutral API
• It is a component-based system following the LCG Architecture Blueprint recommendations
• POOL is built from software components that:
  • Implement pure abstract C++ interfaces
    • Experiment framework user code is insulated from concrete implementation details and technologies
  • Expose minimal dependencies
    • Weak coupling is ensured because components interact only via their abstract interfaces
  • Are loaded on demand
    • Using the SEAL plug-in management and component model
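The component model described on this slide can be sketched in a few lines of C++. This is only an illustration of the pattern (pure abstract interface, concrete components hidden behind it, construction on first request), not the SEAL plug-in manager or any real POOL interface. All names here (`IStorageBackend`, `RootBackend`, `RelationalBackend`, `PluginRegistry`, `makeRegistry`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical pure abstract interface: client code sees only this type.
class IStorageBackend {
public:
    virtual ~IStorageBackend() = default;
    virtual std::string technology() const = 0;
};

// Hypothetical concrete components, hidden behind the interface.
class RootBackend : public IStorageBackend {
public:
    std::string technology() const override { return "ROOT"; }
};

class RelationalBackend : public IStorageBackend {
public:
    std::string technology() const override { return "RDBMS"; }
};

// Toy registry standing in for a plug-in manager (NOT the SEAL API).
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<IStorageBackend>()>;

    // Declaring a plug-in stores only its factory; nothing is constructed yet.
    void declare(const std::string& name, Factory factory) {
        assert(factory && "a plug-in needs a factory");
        factories_[name] = std::move(factory);
    }

    // The component is constructed the first time it is requested
    // and reused on later requests.
    IStorageBackend& load(const std::string& name) {
        auto loaded = loaded_.find(name);
        if (loaded != loaded_.end()) return *loaded->second;
        auto factory = factories_.find(name);
        if (factory == factories_.end())
            throw std::runtime_error("unknown plug-in: " + name);
        auto inserted = loaded_.emplace(name, factory->second());
        return *inserted.first->second;
    }

    std::size_t loadedCount() const { return loaded_.size(); }

private:
    std::map<std::string, Factory> factories_;
    std::map<std::string, std::unique_ptr<IStorageBackend>> loaded_;
};

// Helper that declares the two hypothetical back-ends.
inline PluginRegistry makeRegistry() {
    PluginRegistry registry;
    registry.declare("root", [] {
        return std::unique_ptr<IStorageBackend>(new RootBackend);
    });
    registry.declare("relational", [] {
        return std::unique_ptr<IStorageBackend>(new RelationalBackend);
    });
    return registry;
}
```

Client code would call `makeRegistry().load("root")` and work with the returned `IStorageBackend&` only, so swapping the storage technology does not touch the caller.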
POOL Work Package breakdown
• Storage Manager
  • Streams transient C++ objects into/from a storage
  • Resolves a logical object reference into a physical object
• File Catalog
  • Maintains the information about POOL-accessible data files
  • Helps the Storage Manager to resolve the physical location of the data
  • Resolves a logical reference into a physical data source
  • For more details see the talk of Maria Girone later this morning in the Grid Data Management track
• Collections
  • Provides the tools to manage (potentially large) ensembles of objects stored via POOL persistence services
  • Explicit: server-side selection of objects from queryable collections
  • Implicit: defined by physical containment of the objects
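The division of labour between the File Catalog and the Storage Manager can be illustrated with a small C++ sketch. It assumes, purely for illustration, that a logical object reference carries a file identifier, a container name and an entry index, and that the catalog is a simple in-memory map. The types `ObjectRef`, `PhysicalLocation`, `FileCatalog` and the function `resolve` are hypothetical and do not reproduce the POOL API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical logical reference to a stored object.
struct ObjectRef {
    std::string fileId;     // logical file identifier
    std::string container;  // container inside the file
    std::size_t entry;      // position inside the container
};

// Hypothetical physical location derived from a reference.
struct PhysicalLocation {
    std::string physicalFile;  // where the file can actually be opened
    std::string container;
    std::size_t entry;
};

// Toy file catalog: maps logical file identifiers to physical file names.
class FileCatalog {
public:
    void registerFile(const std::string& fileId, const std::string& physicalFile) {
        assert(!physicalFile.empty() && "a physical file name is required");
        files_[fileId] = physicalFile;
    }

    const std::string& physicalFile(const std::string& fileId) const {
        auto it = files_.find(fileId);
        if (it == files_.end())
            throw std::out_of_range("file not in catalog: " + fileId);
        return it->second;
    }

private:
    std::map<std::string, std::string> files_;
};

// Storage-manager side: turn a logical reference into a physical location
// by asking the catalog where the file lives.
inline PhysicalLocation resolve(const FileCatalog& catalog, const ObjectRef& ref) {
    return PhysicalLocation{catalog.physicalFile(ref.fileId), ref.container, ref.entry};
}
```

Keeping the lookup in the catalog means a file can be moved or replicated by updating one catalog entry, while stored references stay unchanged.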
Interaction between POOL components
POOL and the Grid
• POOL is Grid aware via the File Catalog component based on the LCG Replica Location Service (RLS)
  • File resolution and metadata queries are forwarded as requests to the Grid middleware
  • See talks in the Grid Data Management session
• The POOL Storage Manager allows access to a remote file via the ROOT framework remote I/O facilities
  • Such as RFIO or dCache
• POOL Grid access facilities might evolve
  • The new Grid File Access Library (GFAL) introduces uniform access to file catalog and mass storage services
  • GFAL integration into POOL is being discussed by all involved parties
POOL New Developments
• Changes triggered by the evolution of foundation libraries
  • Integration of the latest features in the SEAL software
  • Parallel development of a ROOT 4 based storage service
• New developments due to the new set of use cases and requirements
  • Prototyping Relational Access back-ends
  • Implementation of some existing components using the new Relational Access layer
SEAL & ROOT 4 Related Development
• Adapt to the interface changes in the SEAL PluginManager
  • Simplification of plug-in management code
• Pick up new interfaces of the SEAL component model
  • Improves internal component organization and run-time configuration
  • Performed in close collaboration with the experiments to ensure minimal impact on the client code
• Integration with ROOT 4
  • Evaluation work has already started
  • Improves support for STL data types
  • Faster execution thanks to direct calls to the ROOT API
• Will prepare a migration plan with the experiments
  • Until agreement is reached with the experiments on migration, ROOT 3.x will be used in the production releases
  • POOL 1.7.0 is still built with ROOT 3.x (we are maintaining a parallel development branch for bug fixing)
  • But we will offer a development version with ROOT 4 as well
POOL Relational Abstraction (I)
• Motivation: independence from DB vendors
• For the most part, the activity started only in March
  • Requirements collection
  • Domain decomposition
  • Draft project plan
• Addresses the needs of the existing POOL relational components (FileCatalog, Collection), the POOL object storage mechanism (StorageSvc) and eventually also the ConditionsDB (if requested by the experiments)
• The use cases and requirements are defined in close cooperation with the experiments
POOL Relational Abstraction (II)
[Diagram of the relational abstraction components. Box labels: POOL::FileCatalog; POOL::Collection; MySQL, Root Collection; XML, MySQL, EDG Catalog; RelationalCollection; RelationalFileCatalog; RelationalAccess; POOL::StorageService; ObjectRelationalAccess; RelationalStorageSvc; Root Storage Service; OracleAccess; SQLiteAccess; MySQLAccess; ODBCAccess]
POOL Relational Abstraction (III)
• Relational Abstraction Layer status:
  • Base interfaces defined
  • AuthenticationService implementation provided
  • Oracle plug-in implemented and unit-tested using OCI 9
  • SQLite plug-in implemented; testing in progress
  • ODBC plug-in implementation in progress
  • Proof-of-concept RelationalFileCatalog implemented and tested using the Oracle plug-in and the FileCatalog component
• MySQL access is via ODBC
  • A direct implementation now would run into maintenance problems, as the MySQL API will change with MySQL 5
  • Until then POOL will access MySQL via the more generic ODBC plug-in
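The routing decision described on this slide (a dedicated plug-in for Oracle and SQLite, MySQL served by the generic ODBC plug-in) can be sketched as a small selection function. The plug-in names come from the slides; everything else is an assumption made for illustration, in particular the `technology://...` connection-string format and the functions `technologyOf` and `selectPlugin`, which are not POOL interfaces.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Extracts the technology prefix from a connection string of the assumed
// form "technology://rest", e.g. "mysql://host/db" gives "mysql".
inline std::string technologyOf(const std::string& connection) {
    const std::string::size_type pos = connection.find("://");
    if (pos == std::string::npos || pos == 0)
        throw std::invalid_argument("malformed connection string: " + connection);
    return connection.substr(0, pos);
}

// Chooses the relational access plug-in for a connection string.
// MySQL has no dedicated entry here: it is routed to the ODBC plug-in.
inline std::string selectPlugin(const std::string& connection) {
    const std::string technology = technologyOf(connection);
    assert(!technology.empty());
    if (technology == "oracle") return "OracleAccess";
    if (technology == "sqlite") return "SQLiteAccess";
    if (technology == "mysql" || technology == "odbc") return "ODBCAccess";
    throw std::invalid_argument("no plug-in for technology: " + technology);
}
```

With this layout, a dedicated MySQL plug-in could later be introduced by changing a single branch, without touching the components that sit on top of the abstraction layer.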
POOL Object Relational Storage
• Current status
  • Object/relational mapping mechanism defined
    • User-driven mapping with default rules
  • Command-line tools which generate and store the mapping given a set of header files
  • Implementation of the mapping I/O almost complete
• Next steps
  • Basic object I/O within the next few weeks
  • Functional POOL Relational Storage Service soon afterwards
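A "user-driven mapping with default rules" can be pictured as follows. This is a minimal sketch under invented rules, not the POOL mapping mechanism: the default table name is `T_` plus the upper-cased class name, the default column name is the upper-cased field name, a small fixed table maps C++ types to SQL types, and the user may override individual column names. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a C++ class, as a tool might extract it
// from header files.
struct Field { std::string name; std::string cppType; };
struct ClassDescription { std::string name; std::vector<Field> fields; };

// Result of the mapping: one table, one column per field.
struct ColumnMapping { std::string field; std::string column; std::string sqlType; };
struct TableMapping { std::string table; std::vector<ColumnMapping> columns; };

// User overrides: field name -> column name.
using Overrides = std::map<std::string, std::string>;

inline std::string upper(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}

// Invented default type rule; unknown types fall back to BLOB.
inline std::string defaultSqlType(const std::string& cppType) {
    if (cppType == "int") return "INTEGER";
    if (cppType == "float") return "REAL";
    if (cppType == "double") return "DOUBLE PRECISION";
    if (cppType == "std::string") return "VARCHAR(255)";
    return "BLOB";
}

// Applies the default rules, then any user override for the column name.
inline TableMapping mapClass(const ClassDescription& cls,
                             const Overrides& overrides = Overrides()) {
    assert(!cls.name.empty() && "a class needs a name");
    TableMapping mapping;
    mapping.table = "T_" + upper(cls.name);
    for (const Field& field : cls.fields) {
        std::string column = upper(field.name);
        auto user = overrides.find(field.name);
        if (user != overrides.end()) column = user->second;
        mapping.columns.push_back({field.name, column, defaultSqlType(field.cppType)});
    }
    return mapping;
}
```

For a class `Track` with fields `momentum` (double) and `charge` (int), the defaults give table `T_TRACK` with columns `MOMENTUM` and `CHARGE`; an override entry for `charge` replaces only that column name.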
Summary
• POOL's highest priority remains the support of this year's data challenge and test beam activities
  • Therefore only a few FTEs are dedicated to new major developments
• Main POOL development this year:
  • Following the developments in SEAL
  • Integrating with ROOT 4
  • The relational abstraction layer
  • The relational storage manager
• Development progress is close to the proposed POOL work plan
• GridPP contribution has made a significant impact on POOL