Data Management & Information Systems
Markus Schulz – SA3 – CERN
OGF – EGEE-II User Forum, Manchester – 9 May 2007
Disclaimer
• Material that went into this presentation has been provided by many developers from inside JRA1, SA3 and NA4, and by external contributors
• Many thanks!
• Ask questions!
EGEE Data Management
• VO frameworks and user tools sit on top of the data management layer
• Data management: lcg_utils and FTS
• Cataloging: LFC (and the older RLS)
• Storage: SRM, plus vendor-specific APIs (Classic SE)
• Data transfer: gridftp; file access: GFAL and RFIO
LFC
• LCG File Catalog
• = LHC Computing Grid File Catalog
• = Large Hadron Collider Computing Grid File Catalog
LCG "File" Catalog
• The LFC stores mappings between users' file names and file locations on the Grid: several LFC file names (1…n) map to one GUID, which maps to the physical replicas (1…m)
• Accessible via CLI, C API, Python interface and Perl interface
• Supports sessions and bulk operations
• Data Location Interface (DLI): a Web Service used for match making; given a GUID, it returns physical file locations
• Oracle backend for high-performance applications
• Read-only replication support
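To make the file name → GUID → replica mapping concrete, here is a minimal sketch using the LFC Python bindings mentioned above. The module and call names follow the LFC 1.6-era bindings but may differ between releases; the host and path are hypothetical.

```python
# Minimal sketch: resolve an LFC file name to its physical replicas.
# Assumes the "lfc" Python bindings shipped with the LFC client and an
# LFC_HOST pointing at the catalogue; host and path are hypothetical.
import os
import lfc

os.environ["LFC_HOST"] = "lfc.example.org"

path = "/grid/vo/data/run1234/file.dat"
# lfc_getreplica(path, guid, se): empty guid/se means "match any"
rc, replicas = lfc.lfc_getreplica(path, "", "")
if rc == 0:
    for rep in replicas:
        print(rep.sfn)   # the SURL of one physical replica
else:
    print("lookup failed")
```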
LFC features
• Hierarchical namespace (e.g. /grid/vo/data)
• GSI security
  • Permissions and ownership
  • ACLs (based on VOMS)
• Virtual ids: each user is mapped to a (uid, gid) pair
• VOMS support: each VOMS group/role corresponds to a virtual gid
• CLI examples: lfc-ls -l /grid/vo/file, lfc-getacl /grid/vo/data
What's new?
• LFC bulk operations: the new method lfc_getreplicas greatly improves replica-listing performance (see the sketch below)
• Secondary groups support
• Both since LFC version 1.6.3 (in production)
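A hedged sketch of the bulk call: lfc_getreplicas takes a whole list of GUIDs and returns all replicas in one round trip, which is where the gain over per-file calls comes from. The Python signature shown is an assumption based on the C API, and the GUIDs are hypothetical.

```python
# Minimal sketch: bulk replica listing with lfc_getreplicas.
# One round trip for many GUIDs instead of one call per file.
import lfc

guids = ["guid1-hypothetical", "guid2-hypothetical"]
rc, replicas = lfc.lfc_getreplicas(guids, "")   # "" = any storage element
if rc == 0:
    for rep in replicas:
        print(rep.guid, rep.sfn)
```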
Secondary groups (since LFC version 1.6.3)
• User 1 from VO 1 is mapped to (uid1, gid1) and creates directory dir1 with mode 775
• User 2 from VO 1, holding a VOMS role, is mapped to (uid2, gid2); because (s)he also belongs to VO 1, gid1 is added as a secondary group
• With secondary groups, User 2 can register a file in dir1, as (s)he belongs to gid1 as well as gid2
• But: User 1 cannot register a file in a directory created by User 2 if (s)he does not have the same VOMS role!
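The same scenario sketched with the LFC Python bindings (call names mirror the C API, lfc_mkdir/lfc_creat; an assumption). The paths are hypothetical; which call succeeds depends on the virtual (uid, gids) each user is mapped to.

```python
# Minimal sketch of the secondary-groups scenario; paths hypothetical.
import lfc

# Run as User 1, mapped to (uid1, gid1):
lfc.lfc_mkdir("/grid/vo/dir1", 0o775)           # dir1 owned by (uid1, gid1)

# Run as User 2, mapped to (uid2, gid2) with secondary group gid1:
# succeeds, because gid1 grants group write permission on dir1.
lfc.lfc_creat("/grid/vo/dir1/new_file", 0o664)
```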
Storage Element
• Storage Resource Manager (SRM)
  • Hides the storage system implementation (disk or active tape)
  • Handles authorization
  • Translates SURLs (Storage URLs) to TURLs (Transfer URLs)
  • Disk-based: DPM, dCache, …; tape-based: CASTOR, dCache
• File I/O: POSIX-like access from local nodes or the grid via GFAL (the Grid File Access Layer)
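To make the SURL → TURL translation concrete, a minimal sketch that asks the SRM for a transfer URL with the lcg-gt command-line tool; the SURL, host names and the choice of rfio as protocol are hypothetical, and the output layout can differ between lcg_util versions.

```python
# Minimal sketch: SURL -> TURL via the SRM, using lcg-gt.
# lcg-gt prints the TURL on its first output line.
import subprocess

surl = "srm://se.example.org/dpm/example.org/home/vo/data/file.dat"
result = subprocess.run(["lcg-gt", surl, "rfio"],
                        capture_output=True, text=True, check=True)
turl = result.stdout.splitlines()[0]
print(turl)   # e.g. rfio://disk01.example.org//storage/vo/file.dat
```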
What is a DPM?
• Disk Pool Manager: manages storage on disk servers
• SRM support
  • 1.1
  • 2.1 (for backward compatibility)
  • 2.2 (released in DPM version 1.6.3)
• GSI security
  • ACLs
  • VOMS support
  • Secondary groups support (see LFC)
DPM strengths
• Easy to use: hierarchical namespace, e.g. $ dpns-ls /dpm/cern.ch/home/vo/data
• Easy to administrate: easy to install and configure, low maintenance effort, easy to add/drain/remove disk servers
• Target: small to medium sites, from single disks to several disk servers
DPM: user's point of view
• Users reach the DPM head node through the CLI, the C API, SRM-enabled clients, etc., with credentials mapped to (uid, gid1, …)
• The DPM name server handles the namespace (/dpm/domain/home/vo), authorization and the physical file locations
• The disk servers hold the physical files
• Data is transferred directly from/to the disk servers (no bottleneck at the head node)
• External transfers go via gridFTP
GFAL & lcg_util
• Data management access libraries that shield users from complexity
• Interact with the information system, the catalogue and SRM SEs
• GFAL
  • POSIX-like C API for file access
  • SRM v2.2 support
  • User-space tokens correspond to a retention policy (custodial/replica) and an access latency (online/nearline)
• lcg_util (command line + C API): replication, catalogue interaction, etc.
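As a usage example, the sketch below copies a catalogued file to local disk with lcg-cp; the same task can be done programmatically through GFAL's POSIX-like C calls (gfal_open/gfal_read/gfal_close). The LFN, VO name and destination path are hypothetical.

```python
# Minimal sketch: fetch a catalogued file with lcg-cp from lcg_util.
# The tool resolves the LFN via the LFC, picks a replica and copies it.
import subprocess

lfn = "lfn:/grid/vo/data/run1234/file.dat"
dest = "file:///tmp/file.dat"
subprocess.run(["lcg-cp", "--vo", "vo", lfn, dest], check=True)
```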
LFC & DPM deployment status
• EGEE catalogue: 110 LFCs in production (37 central, 73 local)
• EGEE SRM Storage Elements: CASTOR, dCache and DPM instances published in EGEE's top-level BDII
  • 96 DPMs in production, supporting 135 VOs
• LFC and DPM are stable, reliable, well-established production-quality services that require low support effort from administrators and developers
FTS overview
• The gLite File Transfer Service is a reliable data movement fabric service (a batch system for file transfers)
• FTS performs bulk file transfers between multiple sites
• Transfers are made between any SRM-compliant storage elements (both SRM 1.1 and 2.2 are supported)
• It is a multi-VO service, used to balance usage of site resources according to the SLAs agreed between a site and the VOs it supports
• VOMS aware
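A minimal sketch of the user-facing workflow: submit one source → destination pair with glite-transfer-submit and poll it with glite-transfer-status. The endpoint and SURLs are hypothetical, and the exact set of terminal states varies by FTS release.

```python
# Minimal sketch: submit a transfer job to FTS and poll its state.
import subprocess
import time

fts = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"
src = "srm://se1.example.org/dpm/example.org/home/vo/file.dat"
dst = "srm://se2.example.org/dpm/example.org/home/vo/file.dat"

job_id = subprocess.run(["glite-transfer-submit", "-s", fts, src, dst],
                        capture_output=True, text=True, check=True).stdout.strip()

while True:
    state = subprocess.run(["glite-transfer-status", "-s", fts, job_id],
                           capture_output=True, text=True, check=True).stdout.strip()
    print(job_id, state)
    if state in ("Done", "Failed", "FinishedDirty", "Canceled"):
        break
    time.sleep(30)
```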
FTS: why is it needed?
• For the user, it provides reliable point-to-point movement of Storage URLs (SURLs) and ensures you get your share of the sites' resources
• For the site manager, it provides a reliable and manageable way of serving file movement requests from the site's VOs, and an easy way to discover problems with the overall service delivered to the users
• For the VO production manager, it provides the ability to control requests coming from his users: re-ordering, prioritization, …
• The focus is on the "service" delivered to the user; FTS makes it easy to do these things well with minimal manpower
FTS: key points
• Reliability
  • FTS handles retries in case of storage/network failures, with VO-customizable retry logic
  • The service is designed for high-availability deployment
• Security
  • All data is transferred securely using delegated credentials with SRM/gridFTP
  • The service audits all user/admin operations
• Service and performance
  • Service stability: designed to use the available storage and network resources efficiently without overloading them
  • Service recovery: monitoring is integrated to detect service-level degradation
Service scale
• Designed to scale up to the transfer needs of very data-intensive applications
• Currently deployed in production at CERN, running the WLCG tier-0 data export
  • Target rate is ~1 GByte/s, 24/7
  • Over 9 petabytes (>10 million files) transferred in the last 6 months
• Also deployed at ~10 tier-1 sites, running a mesh of transfers across WLCG
  • Inter-tier-1 and tier-1 to tier-2 transfers
  • Each tier-1 has transferred around 0.2–0.5 petabytes of data
Metadata in EGEE
• Metadata is information about data stored in files; it usually lives in relational databases
• AMGA is a joint JRA1–NA4 development, used by several application domains (BioMed, HEP, Earth Observation, …)
• Implementation:
  • SOAP and text front-ends
  • Streamed bulk operations, for performance
  • Supports single calls, sessions and connections
  • SSL security with grid certificates (X.509); passwords and Kerberos also supported
  • Own user and group management, plus VOMS
  • PostgreSQL, Oracle, MySQL and SQLite backends
  • The query parser supports a good fraction of SQL
  • Access permissions per directory/entry via ACLs
• AMGA integrates support for replication of metadata; asynchronous replication is ideal for the WAN
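A minimal sketch of AMGA use through its Python client (mdclient.py, distributed with AMGA); the host, port, directory and attribute names are hypothetical, and the method names follow the AMGA client documentation but may differ between releases.

```python
# Minimal sketch: add a metadata entry and query it back via AMGA.
import mdclient

client = mdclient.MDClient("amga.example.org", 8822, "someuser")

# Attach two attributes to one entry (schema assumed to exist already).
client.addEntry("/vo/images/img001", ["patient", "modality"],
                ["anon-42", "MR"])

# Query: all MR images, streaming the result rows back.
client.selectAttr(["/vo/images:FILE", "/vo/images:patient"],
                  "/vo/images:modality = 'MR'")
while not client.eot():
    print(client.fetchRow())
```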
AMGA clients & APIs
• AMGA clients (for setup and administration): a shell-like client and a graphical browser (Python)
• Many programming APIs, requested/provided by a diverse user community: C/C++, Java, Python, Perl, PHP
• SOAP interface: works with gSOAP, Axis and PySOAP
Performance
• Performance comparable to direct DB access
• C++, TCP streaming protocol, very fast SSL sessions
• [Figure: throughput comparison between AMGA and direct access via JDBC reading the same table on a LAN; throughput (entries/s, logarithmic scale) vs. number of clients (1–100), for 1-row and 1000-row reads]
Scale
• LHCb (HEP VO) use case
  • 100 million entries successfully tested (150 GB of data)
  • Expected insert rate: 100 000 entries/day; read rate: 10 entries/second
• Uses an Oracle RAC backend for the most demanding use cases
Motivation
• Medical community as the principal user
  • Large numbers of images
  • Privacy concerns vs. processing needs
  • Ease of use (image production and application)
• Strong security requirements
  • Anonymity (patient data is kept separate)
  • Fine-grained access control (only selected individuals)
  • Privacy (even the storage administrator cannot read the data)
• A legacy service based on gLite 1.5 is in use; the components described here are under development
Building blocks
• Hospitals: DICOM = Digital Imaging and Communications in Medicine
• Grid: SE = SRM + gridftp + I/O, plus a client (an application processing an image)
• Goal: data access at any location
Exporting images
• An image is retrieved from DICOM and processed to be "exported" to the grid, "wrapping" DICOM:
  • Anonymity: patient data is separated and stored in AMGA
  • Access control: ACL information on individual files in the SE (DPM)
  • Privacy: per-file keys, distributed among several Hydra key servers, giving fine-grained access control
• The DICOM-SE exposes the image through SRMv2, gridftp and I/O; a trigger file starts the export, patient data goes to AMGA and the keys to the Hydra key stores
Accessing images
• 1. The image ID is located via AMGA (patient look-up)
• 2. The keys are retrieved from the Hydra key servers
• 3. The file is accessed through SRM (access control in DPM), which returns a TURL
• 4. The data is read and decrypted block-by-block, in memory only (GFAL and hydra-cli); this is useful for all applications
• Still to be solved: ACL synchronization among SEs
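The read path above is what the gLite Encrypted Data Storage tooling automates; a minimal sketch, assuming the glite-eds-* CLI from the Hydra/EDS client and a hypothetical GUID and output path.

```python
# Minimal sketch: fetch and decrypt a medical image in one step.
# glite-eds-get retrieves the key from Hydra, reads the file via
# GFAL/SRM and decrypts block-by-block in memory before writing.
import subprocess

guid = "guid:hypothetical-image-id"
subprocess.run(["glite-eds-get", guid, "/tmp/image.dcm"], check=True)
```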
Information Systems
• R-GMA
• BDII (LDAP-based information system)
Relational Grid Monitoring Architecture (R-GMA)
• To users, R-GMA appears similar to a single relational database
• Implementation of GGF's Grid Monitoring Architecture (GMA)
• Rich set of APIs: web browsers, Java, C/C++, Python
• Backbone of EGEE monitoring (almost every activity leaves traces); see Dashboard, Realtime Monitor and about 20 further tools
• Used by EGEE accounting as transport
• Architecture: producer applications publish tuples through the Producer Service API (SQL "INSERT") and register in the Registry Service; consumer applications send queries (SQL "SELECT") through the Consumer Service API, which locates matching producers via the Registry and receives their tuples; table definitions are held by the Schema Service (SQL "CREATE TABLE")
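A minimal sketch of the produce/consume cycle using the rgma command-line client in single-command mode; the table and column names are hypothetical (real tables must first be declared in the Schema), and the flag syntax may vary by release.

```python
# Minimal sketch: publish a tuple and query it back through R-GMA's CLI.
import subprocess

subprocess.run(
    ["rgma", "-c",
     "INSERT INTO userTable (siteName, aValue) VALUES ('ExampleSite', '42')"],
    check=True)

rows = subprocess.run(["rgma", "-c", "SELECT siteName, aValue FROM userTable"],
                      capture_output=True, text=True, check=True)
print(rows.stdout)
```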
Service discovery
• SD provides simple methods for locating services
• Hides the underlying information system (simplified use)
• Plug-ins for R-GMA, BDII and XML files
• APIs available for Java and C/C++, plus command-line tools
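For example, a minimal sketch that locates FTS endpoints with the Service Discovery command-line tool; the glite-sd-query name and the service-type string are assumptions taken from the gLite 3.0-era user guide.

```python
# Minimal sketch: discover services of a given type via gLite SD.
import subprocess

subprocess.run(["glite-sd-query", "-t", "org.glite.FileTransfer"], check=True)
```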
The Information System
• BDII = Berkeley Database Information Index, based on LDAP
• Hierarchy: resource-level BDII (MDS GRIS with information providers) feeds the site-level BDII, which is read by the top-level BDII every 2 minutes
• Top-level BDIIs are queried by the WMS, UI, FTS and WNs; FCR applies a VO-specific filter based on live status
• Standardized information provider (GIP), GLUE 1.3 schema
• Used with 230+ sites; roughly 60 instances in EGEE
• The top-level BDII at CERN serves a 15 Hz query rate on >20 MBytes of data
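Clients query the BDII with plain LDAP; a minimal sketch that lists the storage elements a top-level BDII knows about (the host name is hypothetical; port 2170, the o=grid base and the GLUE 1.3 object class are standard).

```python
# Minimal sketch: an anonymous LDAP query against a top-level BDII.
import subprocess

result = subprocess.run(
    ["ldapsearch", "-x", "-LLL",
     "-H", "ldap://bdii.example.org:2170",
     "-b", "o=grid",
     "(objectClass=GlueSE)", "GlueSEUniqueID"],
    capture_output=True, text=True, check=True)
print(result.stdout)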
Inside a BDII
• [Figure: a BDII runs several slapd instances (e.g. on ports 2171, 2172, 2173); incoming information, including the FCR feed, is written to a cache, the database is updated and modified, and the freshly built database is swapped in; a port forwarder exposes the currently active instance on the public port 2170, where ldapsearch queries arrive]
Load-balanced BDII
• [Figure: queries arrive at a DNS round-robin alias, which distributes them across several BDII instances, each listening on port 2170]
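The load balancing is plain DNS round robin: the alias resolves to several A records and clients simply connect to the alias. A minimal sketch (alias hypothetical):

```python
# Minimal sketch: list the BDII hosts behind a DNS round-robin alias.
import socket

for info in socket.getaddrinfo("bdii.example.org", 2170,
                               proto=socket.IPPROTO_TCP):
    print(info[4][0])   # one IP per load-balanced BDII instance
```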
GIN BDII
• Used by the GIN (Grid Interoperation Now) group
• A generic information provider wraps per-grid providers (NAREGI, EGEE, TeraGrid, PRAGMA, NDGF, OSG) and publishes their sites into GIN BDIIs (alongside an ARC BDII)
Information Systems: current problems
• slapd daemons on loaded systems can starve
  • CEs drop out of the system: move the info provider off the CE
  • Site BDIIs co-hosted on busy systems time out, losing an entire site from the info system: move them to low-load nodes at large sites
• Improve reliability by failing back to other top-level BDIIs (needs work in the clients)
• Scalability tests indicate limits (in 1–2 years' time)
  • Cache static data more aggressively
  • Smarter schema (OGF GLUE)
  • Change the underlying technology
• A simple insulation API is needed: standardization (OGF SAGA?)