GFAL and LCG data management
Jean-Philippe Baud, CERN/IT/GD
HEPiX, 2004-05-23
Agenda
• LCG Data Management goals
• Common interface
• Current status
• Current developments
• Medium term developments
• Conclusion
LCG Data Management goals
• Meet requirements of the Data Challenges
• Common interface
• Reliability
• Performance
Common interfaces
• Why?
  • Different grids: LCG, Grid3, NorduGrid
  • Different Storage Elements
  • Possibly different File Catalogs
• Solutions
  • Storage Resource Manager (SRM)
  • Grid File Access Library (GFAL)
  • Replication and Registration Service (RRS)
Storage Resource Manager
• Goal: agree on a single API for multiple storage systems
• Collaboration between CERN, FNAL, JLAB, LBNL and EDG
• SRM is a Web Service
  • Offers storage resource allocation and scheduling
  • SRMs DO NOT perform file transfer
  • SRMs DO invoke a file transfer service if needed (GridFTP)
• Types of storage resource managers
  • Disk Resource Manager (DRM)
  • Hierarchical Resource Manager (HRM)
• SRM is being discussed at GGF and proposed as a standard
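To make the point that an SRM brokers access but never moves data itself, here is a hedged sketch of the client-side read flow: the client asks the SRM to stage a file, polls until a transfer URL (TURL) is ready, then hands the TURL to a transfer service such as GridFTP. Every function and type name below is a placeholder and the "services" are simulated locally so the sketch is self-contained; this is not the SRM 1.1 API.

```c
/*
 * Hypothetical sketch of the client-side SRM read flow: the SRM stages the
 * file and returns a TURL, and the data movement itself is delegated to a
 * transfer service such as GridFTP. None of the names below belong to the
 * real SRM 1.1 client API; the "services" are simulated locally.
 */
#include <stdio.h>

struct srm_request {
    int  polls_left;          /* simulated staging delay */
    char turl[256];
};

/* Simulated SRM call: ask the storage system to make the file available. */
static int srm_prepare_to_get(const char *surl, struct srm_request *req)
{
    printf("asking SRM to stage %s\n", surl);
    req->polls_left = 2;
    snprintf(req->turl, sizeof(req->turl),
             "gsiftp://gridftp.example.org/castor/grid/myvo/file001");
    return 0;
}

/* Simulated status poll: returns 1 once the TURL is ready. */
static int srm_request_ready(struct srm_request *req)
{
    return (req->polls_left-- <= 0);
}

/* Simulated transfer step: in reality this would be GridFTP, not the SRM. */
static int gridftp_fetch(const char *turl, const char *local_path)
{
    printf("transferring %s -> %s via GridFTP\n", turl, local_path);
    return 0;
}

int main(void)
{
    struct srm_request req;
    const char *surl = "srm://se.example.org/castor/grid/myvo/file001";

    if (srm_prepare_to_get(surl, &req) != 0)      /* SRM allocates/stages */
        return 1;
    while (!srm_request_ready(&req))              /* poll until TURL ready */
        printf("waiting for SRM to stage the file...\n");

    return gridftp_fetch(req.turl, "/tmp/file001");  /* SRM never moves data */
}
```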
Grid File Access Library (1)
• Goals
  • Provide a POSIX I/O interface to heterogeneous Mass Storage Systems in a Grid environment
  • A job using GFAL should be able to run anywhere on the Grid without knowing about the services accessed or the data access protocols supported
Grid File Access Library (2)
• Services contacted
  • Replica Catalogs
  • Storage Resource Managers
  • Mass Storage Systems, through diverse file access protocols like FILE, RFIO, DCAP, (ROOT I/O)
  • Information Services: MDS
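As a sketch of the POSIX-style interface described on these slides, the following shows how a job might open and read a file through GFAL without knowing which protocol or storage system sits behind it. The header name gfal_api.h, the example LFN and the exact prototypes are assumptions for illustration; only the open/read/close shape of the API comes from the slides.

```c
/* Minimal sketch of GFAL's POSIX-like API. Header name, prototypes and the
   example LFN are assumptions; only the open/read/close shape is taken
   from the slides. */
#include <stdio.h>
#include <fcntl.h>
#include "gfal_api.h"   /* assumed header name */

int main(void)
{
    /* GFAL resolves the logical name through the replica catalog and SRM,
       then uses rfio, dCap, root or plain file I/O as appropriate. */
    int fd = gfal_open("lfn:/grid/myvo/run1234/file001", O_RDONLY, 0);
    if (fd < 0) {
        perror("gfal_open");
        return 1;
    }

    char buf[4096];
    int n = gfal_read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("gfal_read");
    else
        printf("read %d bytes\n", n);

    gfal_close(fd);
    return 0;
}
```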
Grid File Access Library (3)
[Architecture diagram: POOL and other physics applications reach GFAL through POSIX I/O or the VFS interface; GFAL combines a Replica Catalog client, an Information Services client, an SRM client and rfio, dCap, root and local file I/O clients to provide local and wide-area access to the MDS, RC, SRM, rfio, dCap and Root I/O services.]
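The dispatch implied by the diagram can be pictured as a simple check on the name handed to GFAL: physical URLs go straight to the matching I/O client, while logical names are first resolved through the replica catalog and SRM. The function below is purely illustrative and is not GFAL's actual internal code.

```c
/* Purely illustrative sketch of the protocol dispatch implied by the
   diagram: GFAL picks a local or wide-area access method per file.
   This is not GFAL's actual internal code. */
#include <stdio.h>
#include <string.h>

static const char *pick_backend(const char *url)
{
    if (strncmp(url, "rfio:", 5) == 0)  return "rfio I/O";
    if (strncmp(url, "dcap:", 5) == 0)  return "dCap I/O";
    if (strncmp(url, "root:", 5) == 0)  return "root I/O";
    if (strncmp(url, "file:", 5) == 0)  return "local file I/O";
    return "resolve via replica catalog / SRM first";
}

int main(void)
{
    const char *examples[] = { "rfio:/castor/cern.ch/user/d/data",
                               "dcap://door.example.org/pnfs/data",
                               "lfn:/grid/myvo/run1234" };
    for (int i = 0; i < 3; i++)
        printf("%-40s -> %s\n", examples[i], pick_backend(examples[i]));
    return 0;
}
```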
GFAL File System
• GFALFS is now based on FUSE (Filesystem in USErspace), developed by Miklos Szeredi
• Uses:
  • the VFS interface
  • communication with a daemon in user space (via a character device)
• The metadata operations are handled by the daemon, while the I/O (read/write/seek) is done directly in the kernel to avoid context switches and buffer copies
• Requires installation of a kernel module (fuse.o) and of the daemon (gfalfs)
• The file system mount can be done by the user
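Once gfalfs is mounted, an unmodified application reaches Grid files with ordinary POSIX calls against the mount point: metadata operations go through the user-space daemon, while reads take the kernel I/O path described above. The mount point /gfal and the path below are hypothetical.

```c
/* Plain POSIX access through a (hypothetical) gfalfs mount point /gfal;
   once the file system is mounted, no GFAL-specific code is needed in
   the application. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gfal/myvo/run1234/file001", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[8192];
    ssize_t n = read(fd, buf, sizeof(buf));   /* data path served in the kernel */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}
```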
GFAL support
• The GFAL library is very modular and small (~2500 lines of C): the support effort should be minimal unless new protocols or new catalogs have to be supported
• Test suite available
• GFAL file system:
  • Kernel module: 2000 lines (FUSE original) + 800 lines (GFAL-specific, for I/O optimization)
  • Daemon: 1600 lines (FUSE, unmodified) + 350 lines GFAL-specific (separate file)
  • Utilities like mount: 600 lines (FUSE + 5 lines modified)
Replication and Registration Service
• Copy and register files
  • Multiple SEs and multiple Catalogs
  • Different types of SE
  • Different types of RC
  • Different transfer protocols
  • Optimization, handling of failures
• Meeting at LBNL in September 2003 with participants from CERN, FNAL, Globus, JLAB and LBNL
• Refined proposal by LBNL being discussed
Current status (1)
• SRM
  • SRM 1.1 interfaced to CASTOR (CERN), dCache (DESY/FNAL), HPSS (HRM at LBNL)
  • SRM 1.1 interface to the EDG-SE being developed (RAL)
  • SRM 2.1 being implemented at LBNL, FNAL, JLAB
  • SRM "basic" being discussed at GGF
• SRM is currently seen by LCG as the best way to do load balancing between GridFTP servers; this is used at FNAL
Current status (2)
• EDG Replica Catalog
  • 2.2.7 (improvements for POOL) being tested
  • Server works with Oracle (being tested with MySQL)
• EDG Replica Manager
  • 1.6.2 in production (works with classic SE and SRM)
  • 1.7.2 on the LCG certification testbed (support for EDG-SE)
  • Stability and error reporting being improved
Current status (3)
• Disk Pool Manager
  • CASTOR, dCache and HRM were considered for deployment at sites without an MSS
  • dCache is the product that we are going to ship with LCG-2, but this does not prevent sites that have another DPM or MSS from using their own
  • dCache is still being tested on the LCG certification testbed
CASTOR
• This solution was tried first because of local expertise
• Functionality OK
• Solution dropped by CERN IT management for lack of manpower to do the support worldwide
HRM/DRM (Berkeley)
• This system has been used in production for more than a year to transfer data between Berkeley and Brookhaven for the STAR experiment
• The licensing and support were unclear
• However, VDT will probably distribute this software
• IN2P3 (Lyon) is investigating whether it could use this solution to provide an SRM interface to its HPSS system
dCache (DESY/FNAL)
• Joint project between DESY and FNAL
• DESY developed the core part of dCache, while FNAL developed the Grid interfaces (GridFTP and SRM) and monitoring tools
• dCache is used in production at DESY and FNAL, but also at some Tier centers for CMS
• IN2P3 is also investigating whether dCache could be used as a frontend to its HPSS system
Current status (4)
• Grid File Access Library
  • Offers a POSIX I/O API and generic routines to interface to the EDG RC, SRM 1.1 and MDS
  • A library, lcg_util, built on top of GFAL offers a C API and a CLI for Replica Management functions; they are callable from C++ physics programs and are faster than the current Java implementation
  • A file system based on FUSE and GFAL is being tested (both at CERN and FNAL)
LCG-2 SE (April release)
• Mass Storage access – to tape
  • SRM interfaces exist for CASTOR, Enstore/dCache and HPSS
  • SRM SEs available at CERN, FNAL, INFN, PIC
  • Classic SEs (GridFTP, no SRM) deployed everywhere else
• GFAL is included in LCG-2 – it has been tested against the CASTOR SRM and rfio as well as against the Enstore/dCache SRM and Classic SEs
Test suites
• Test suites have been written and run against the classic SE, CASTOR and dCache for:
  • SRM
  • the GFAL library and lcg_util
• The new version (with better performance) of the GFAL File System is being extensively tested against CASTOR, and the tests against dCache have started
• The latest versions (> 1.6.2) of the Replica Manager support both the classic SEs and the SRM SEs
File Catalogs in LCG-2
• Problems were seen during the Data Challenges
  • performance of the Java CLI tools
  • performance problems due to lack of bulk operations
  • no major stability problems
• JOINs between the Replica Catalog and the Metadata Catalog are expensive
  • worked with users and other middleware to reduce these joins (often unnecessary)
Proposal for next Catalogs
• Build on the current catalogs, and satisfy medium-term needs from the DCs
• Replica Catalog
  • like the current LRC, but not "local"
  • we never had "local" ones anyway, since the RLI was not deployed
  • no user-defined attributes in the catalog -> no JOINs
• File Catalog
  • store Logical File Names
  • impose a hierarchical structure, and provide "directory-level" operations
  • user-defined metadata on the GUID (as in the current RMC)
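The proposed split can be pictured as two simple record types, sketched below in C: the File Catalog holds the hierarchical LFN, the GUID and any user metadata keyed on the GUID, while the Replica Catalog only maps GUIDs to physical replicas, so no cross-catalog JOIN is ever needed. The field names and sizes are illustrative assumptions, not the actual schema.

```c
/* Illustrative sketch of the proposed catalog split (field names and sizes
   are assumptions, not the real schema). */
#include <stdio.h>

/* File Catalog entry: hierarchical LFN -> GUID, user metadata kept on the GUID. */
struct fc_entry {
    char lfn[256];        /* e.g. /grid/myvo/run1234/file001 (hierarchical) */
    char guid[40];        /* globally unique file identifier */
    char user_meta[256];  /* experiment-defined metadata, attached to the GUID */
};

/* Replica Catalog entry: GUID -> one physical replica (SURL); no user
   attributes here, so queries never need a JOIN across catalogs. */
struct rc_entry {
    char guid[40];
    char surl[256];       /* e.g. srm://se.example.org/castor/... */
};

int main(void)
{
    struct fc_entry f = { "/grid/myvo/run1234/file001",
                          "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
                          "run=1234 type=raw" };
    struct rc_entry r = { "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
                          "srm://se.example.org/castor/grid/myvo/file001" };
    printf("%s -> %s -> %s\n", f.lfn, f.guid, r.surl);
    return 0;
}
```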
Replication of Catalogs
• Need to remove the single point of failure and load
  • during one Saturday of the CMS DC, the Catalogs accounted for 9% of all external traffic at CERN
• The RLI (distributed indexes) was never tested or deployed
  • the RLI does not solve the distributed metadata query problem (it only indexes GUIDs)
• IT/DB tested Oracle-based replication with CMS during the Data Challenge
• Proposal: build on this work, and use replicated, not distributed, catalogs
  • small number of sites (~4-10)
  • the new design (Replica Catalog and File Catalog) should reduce replication conflicts
  • need to design the conflict resolution policy - "last updated" might be good enough
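The "last updated might be good enough" policy mentioned above amounts to last-writer-wins reconciliation when two sites have modified the same catalog entry between replication cycles; a minimal sketch, with an assumed entry layout and timestamp field, is shown below.

```c
/* Minimal last-writer-wins sketch for reconciling two copies of the same
   catalog entry after replication; the entry layout is an assumption. */
#include <stdio.h>
#include <time.h>

struct catalog_entry {
    char   guid[40];
    char   value[128];     /* whatever the replicated row holds */
    time_t last_updated;   /* set by the site that last modified the row */
};

/* Keep whichever copy was modified most recently. */
static const struct catalog_entry *
resolve_conflict(const struct catalog_entry *a, const struct catalog_entry *b)
{
    return (a->last_updated >= b->last_updated) ? a : b;
}

int main(void)
{
    struct catalog_entry site_a = { "guid-1", "replica list from site A", 1000 };
    struct catalog_entry site_b = { "guid-1", "replica list from site B", 2000 };
    printf("winner: %s\n", resolve_conflict(&site_a, &site_b)->value);
    return 0;
}
```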
Questions (1)
• Is this a good time to introduce security?
  • authenticated transactions would help with problem analysis
• How many sites should have replicated catalogs?
  • sites require Oracle (not a large problem: most Tier-1s have it, and the licence is not a problem)
  • replication conflicts rise with more sites
  • it depends on outbound TCP issues from worker nodes (but a proxy could be used)
Questions (2)
• What about MySQL as a backend?
  • Oracle/MySQL interaction is being investigated by IT/DB and others under a "Distributed Database Architecture" proposal
  • replication between the two is possible
  • likely to use MySQL at Tier-2s and at Tier-1s without Oracle
• Need to investigate which minimum version of MySQL we require
  • probably MySQL 5.x, when it is stable
Current developments
• Bulk operations in the EDG RC (LCG certification testbed)
• Integration of GFAL with ROOT
  • Classes TGfal and TGfalFile
  • Support of ROOT I/O in GFAL
• Interface GFAL and lcg_util to the EDG-SE
Medium term developments
• Reshuffling of the Replica Catalogs for performance
  • Replicated Catalogs instead of Distributed Catalogs
  • File Collections?
• SRM 2.1
• Replication/Registration Service (Arie Shoshani)
• Integration of POOL with GFAL to reduce dependencies (using the TGfal class)
Important features of SRM 2.1 for LCG (compared to SRM 1.1)
• Global space reservation
• Directory operations
• Better definition of statuses and error codes
Conclusion
• In the past 12 months:
  • Common interfaces have been designed, implemented and deployed (SRM and GFAL)
  • The reliability of the Data Management tools has been improved quite considerably
• We are still improving the performance of those tools