File Catalog Development in Japan e-Science Project
GFS-WG, OGF24 Singapore
Hideo Matsuda, Osaka University
Japan e-Science Project
• 3.5-year project, starting from September 2008
• Sponsored by MEXT (the Ministry of Education, Culture, Sports, Science and Technology), Japan
• Two major sub-projects:
  • System Software (Leader: Yutaka Ishikawa, Univ. Tokyo)
  • Grid Software (Leader: Ken-ichi Miura, NII)
Overview of e-Science Grid Software Project
[Figure: architecture diagram. Labels: End users; Computation; Workflow; Job Submission; Application mgmt; DB Federation; DB access control; AuthN info mgmt; Application I/F; App control script API; App monitoring; DB; Grid Middleware; Info. Tech. Center; Laboratory; Middleware Evaluation; Grid Operation; Infrastructure / Application Evaluation; Data Sharing (Nation-wide distributed FS, File catalog)]
Nation-wide Distributed File System
• Goal: Develop distributed file system technology that spans the whole nation with performance comparable to that of a local file server
• Research Topics:
  • Optimal automatic placement of file replicas based on Gfarm 2.0
  • Fault tolerance with file replicas
[Figure: three clients access a file through a virtual distributed file system with optimal replica placement; File Servers 1, 2 and 3 each hold a file replica on their own storage]
File Catalog Service
Goal: Develop a file catalog service that is interoperable between heterogeneous Grid environments.
• Current file catalog systems (LFC (EGEE gLite), MCAT (SRB), etc.) do not interoperate with each other.
• Develop a standardized file catalog based on the RNS (Resource Namespace Service) specification.
[Figure: (1) the client sends a logical file name to the file catalog system; (2) the catalog returns the physical file location (EPR); (3) the client accesses the file with GridFTP. Targets shown: EGEE gLite file server, SRB or iRODS file server, Japan e-Science distributed file system]
File Catalog in e-Science
• A file catalog can be used not only for file-location management but also for metadata in e-Science, since metadata in many sciences is often described with a hierarchical representation.
[Figure: two example hierarchies. High Energy Physics: ATLAS, CMS; 20071003, 20080110; run1, run2; track1, track2. Molecular Biology: Genome, Proteome; Bacterial Genome, Human Genome, Plant Genome; Functional Analysis, Structure Analysis; sp|P37231, pdb|1FM6, gb|AY157024]
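As a rough illustration of metadata carried by a hierarchical namespace, the sketch below treats each catalog entry as a path and lists the names directly under a given directory. The path layout is an assumption: the slide's labels are reused, but the exact parent/child relations in the figure are not recoverable from the text, and `children` is a hypothetical helper, not part of any catalog API.

```python
from pathlib import PurePosixPath

# Hypothetical catalog entries; the nesting is assumed for illustration only.
ENTRIES = [
    "/HighEnergyPhysics/ATLAS/20071003/run1",
    "/HighEnergyPhysics/ATLAS/20071003/run2",
    "/HighEnergyPhysics/CMS/20080110/run1",
    "/MolecularBiology/Genome/HumanGenome/gb|AY157024",
    "/MolecularBiology/Proteome/StructureAnalysis/pdb|1FM6",
]


def children(entries, directory):
    """Return the sorted names found directly under `directory`."""
    base = PurePosixPath(directory)
    depth = len(base.parts)
    found = set()
    for entry in entries:
        parts = PurePosixPath(entry).parts
        # Keep the next path component of every entry below `directory`.
        if parts[:depth] == base.parts and len(parts) > depth:
            found.add(parts[depth])
    return sorted(found)


if __name__ == "__main__":
    print(children(ENTRIES, "/HighEnergyPhysics"))
    print(children(ENTRIES, "/MolecularBiology/Genome"))
```

Browsing the namespace level by level in this way is what lets the directory names themselves act as metadata (experiment, date, run), without a separate metadata store.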
Metadata Management using File Catalog
• Currently, metadata are mainly stored in file catalogs using their hierarchical namespace functionality:
  • gLite: LFC, Fireman
  • iRODS (SRB): ICAT
  • Globus: RLS
  • NAREGI: Gfarm
• It is not easy to exchange metadata between different Grid middleware systems.
Resource Namespace Service (1)
http://www.ogf.org/documents/GFD.101.pdf
• RNS lets you map any resource into a single, hierarchical namespace.
• Resources are referred to in the form of an EndpointReference (WS-Addressing).
• The RNS specification is published as GFD-R-P.101.
• RNS implementations are available from U. Virginia and U. Tsukuba.
Resource Namespace Service (2)
• Hierarchical namespace management that provides name-to-resource mapping
• Basic namespace components:
  • Virtual Directory: a non-leaf node in the hierarchical namespace tree
  • Junction: a name-to-resource mapping that interconnects a reference to any existing resource into the hierarchical namespace
[Figure: example namespace tree rooted at /grid, with directories ogf, jp, data and gfs, file entries file1 to file4, and junctions pointing to EPR1 and EPR2. EPR: Endpoint Reference]
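The two components can be pictured with a small in-memory model. This is only a sketch under stated assumptions: a real RNS service exposes web-service operations and stores WS-Addressing EPR documents, whereas here an EPR is a plain placeholder string, and the class names, the helper functions, and the path `/grid/ogf/gfs/file1` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Junction:
    """Leaf entry: maps a name to an existing resource."""
    epr: str  # placeholder string standing in for an endpoint reference


@dataclass
class VirtualDirectory:
    """Non-leaf node: holds named child entries."""
    entries: Dict[str, Union["VirtualDirectory", Junction]] = field(default_factory=dict)


def add_junction(root, path, epr):
    """Create intermediate virtual directories and attach a junction at `path`."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name an entry")
    node = root
    for name in parts[:-1]:
        child = node.entries.setdefault(name, VirtualDirectory())
        if not isinstance(child, VirtualDirectory):
            raise ValueError(f"{name!r} is a junction, not a directory")
        node = child
    node.entries[parts[-1]] = Junction(epr)


def resolve(root, path):
    """Walk the tree and return the entry at `path` (directory or junction)."""
    node = root
    for name in [p for p in path.split("/") if p]:
        if not isinstance(node, VirtualDirectory) or name not in node.entries:
            raise KeyError(path)
        node = node.entries[name]
    return node


if __name__ == "__main__":
    root = VirtualDirectory()
    add_junction(root, "/grid/ogf/gfs/file1", "EPR1")
    add_junction(root, "/grid/jp/data/file3", "EPR2")
    print(resolve(root, "/grid/ogf/gfs/file1").epr)
```

Resolving a path that ends at a junction yields the reference to the resource; resolving a shorter path yields a virtual directory whose entries can be listed.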
Development of File Catalog System (Plan)
• RNS can interconnect a reference to any existing resource into a hierarchical namespace.
• Most Grid middleware supports GridFTP for data transfer.
• Therefore:
  • Use RNS as a standardized file catalog.
  • Use a GridFTP URL “gsiftp://.../” as the address of the Endpoint Reference.
[Figure: (1) the client queries RNS; (2) RNS returns an EPR list (including addresses); (3) the client accesses the files with the GridFTP protocol. Servers shown: Japan e-Science file server, Globus GridFTP server, gLite file server (SRM), iRODS file server]
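The three steps of the plan can be sketched as follows. Everything concrete here is assumed for illustration: the catalog is a plain dictionary rather than an RNS service, the logical name and the `example.org` hosts are made up, and step (3) stops at computing the transfer target instead of calling a GridFTP client. Port 2811 is the registered GridFTP control port, used as the default when a URL gives none.

```python
from urllib.parse import urlsplit

DEFAULT_GRIDFTP_PORT = 2811

# Hypothetical catalog content: logical file name -> replica addresses.
CATALOG = {
    "/grid/jp/data/file1": [
        "gsiftp://fs1.example.org:2811/export/data/file1",
        "gsiftp://fs2.example.org/export/data/file1",
    ],
}


def query(logical_name):
    """Steps (1) and (2): return a list of EPR-like records for a logical name."""
    return [{"address": url} for url in CATALOG.get(logical_name, [])]


def transfer_target(epr):
    """Step (3) preparation: split an EPR address into host, port and remote path."""
    parts = urlsplit(epr["address"])
    if parts.scheme != "gsiftp":
        raise ValueError(f"not a GridFTP address: {epr['address']}")
    return parts.hostname, parts.port or DEFAULT_GRIDFTP_PORT, parts.path


if __name__ == "__main__":
    for epr in query("/grid/jp/data/file1"):
        print(transfer_target(epr))
```

Because the client only needs to understand the `gsiftp://` address inside each record, the same lookup works regardless of which middleware runs the file server behind it.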
Comparison with gLite LFC
Comments from Erwin Laure (OGF22 GFS-WG)
• add EPR: RNS is missing the detailed attributes of the replicas.
• query EPR: The attributes of a namespace entry should be defined, allowing specialized queries and lookups.
• RNS lacks bulk operations, sessions and transactions. Adopting them may improve performance.
• Access control and VO management have not been introduced yet either.
Comparison with iRODS
Comments from Reagan Moore (OGF23 GFS-WG)
• Applications now manipulate structured information. iRODS can generate and manipulate structured information with micro-services.
• There are multiple standards for describing structured information.
Summary
• A standardized file catalog is useful for federating heterogeneous Data Grids.
• A File Catalog Profile needs to be established for interoperation between different file catalogs (and for its standardization).