Data Management Challenge - The View from OGF
OGF22 – February 28, 2008, Cambridge, MA, USA
Erwin Laure <Erwin.Laure@cern.ch>
David E. Martin <martinde@us.ibm.com>
Data Area Directors
Early Grid View of Grids
• Early Grid systems had a quite simplistic view:
  • Dispatch a job to a machine
  • GridFTP files to the machine from “Somewhere”
  • Run the job
  • GridFTP results to “Somewhere”
• Grids defined “Computing Elements (CE)”
  • Data and storage were considered to be “there”
  • The Storage Elements (SE) concept came much later
• Barely OK for Initial Data Analysis
  • Physics, Geosciences, etc.
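The four-step pattern above can be sketched in a few lines. This is only an illustration: local file copies stand in for the GridFTP transfers, and the names `run_early_grid_job` and `count_lines` are hypothetical, not part of any Grid middleware.

```python
import shutil
import tempfile
from pathlib import Path


def run_early_grid_job(input_file: Path, output_dest: Path, job) -> Path:
    """Stage in, run, stage out. Local copies stand in for GridFTP (assumption)."""
    with tempfile.TemporaryDirectory() as scratch:   # 1. the job lands on a machine
        work = Path(scratch)
        staged_in = work / input_file.name
        shutil.copy(input_file, staged_in)           # 2. fetch input from "somewhere"
        result = work / "result.out"
        job(staged_in, result)                       # 3. run the job
        shutil.copy(result, output_dest)             # 4. ship results to "somewhere"
    return output_dest


def count_lines(src: Path, dst: Path) -> None:
    """Toy job: write the number of input lines to the output file."""
    dst.write_text(str(len(src.read_text().splitlines())))
```

Note what the sketch leaves out: nothing tracks where the input lives, who manages it, or what happens to the scratch data, which is the gap the following slides address.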
Then Data kicked in …
• Compute jobs have to deal with input/output data, transient data
• Data is
  • Heterogeneous (storage, data formats)
  • Distributed
  • Independently managed
The Grid Grows Up
• Database Access
  • DAIS
• Storage/File Management
  • SRM
• File/Data Transfer
  • GridFTP, RFT, FTS
• Data Location
  • RLS, LFC
• Metadata
• Data Management Systems
  • SRB
• …
SRM Interactions
[Diagram: Client, SRM and Storage, with interactions numbered 1 to 5]
1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the SRM that the file is available and reports its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from which the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
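The five interactions above can be mimicked with two toy classes. This is a sketch under stated assumptions, not the SRM interface itself: the class and method names (`StorageSystem`, `SRM.get_turl`, `fetch`), the `example.org` host names and the `gsiftp` default are all illustrative choices.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class StorageSystem:
    """Toy storage back end (hypothetical): site path -> (pool path, content)."""
    host: str
    files: dict = field(default_factory=dict)

    def make_available(self, site_path: str) -> str:
        # Steps 2-3: bring the file online and report where it now lives.
        pool_path, _content = self.files[site_path]
        return pool_path

    def read(self, pool_path: str) -> bytes:
        for stored_path, content in self.files.values():
            if stored_path == pool_path:
                return content
        raise FileNotFoundError(pool_path)


@dataclass
class SRM:
    """Toy SRM front end; method names are illustrative only."""
    storage: StorageSystem
    protocol: str = "gsiftp"

    def get_turl(self, surl: str) -> str:
        site_path = urlparse(surl).path                      # step 1: client supplies an SURL
        pool_path = self.storage.make_available(site_path)   # steps 2-3
        return f"{self.protocol}://{self.storage.host}{pool_path}"  # step 4: TURL


def fetch(srm: SRM, surl: str) -> bytes:
    turl = urlparse(srm.get_turl(surl))
    if turl.scheme != srm.protocol:
        raise ValueError(f"unsupported protocol: {turl.scheme}")
    # Step 5: the client talks to the storage directly, bypassing the SRM.
    return srm.storage.read(turl.path)
```

The point of the indirection is that the SURL stays stable for the client while the storage system remains free to choose the physical location and access protocol.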
OGSA-DAI
[Diagram labels: Application, Client Toolkit, OGSA-DAI service, Engine, Activities, Data Resources, Databases]
• Activities: XPath, SQLQuery, readFile, GZip, XSLT, GridFTP
• Data Resources: JDBC, XMLDB, File
• Databases: DB2, SQL Server, MySQL, XIndice, SWISS PROT
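One way to read the diagram is that an engine chains activities, feeding the output of each into the next. The sketch below illustrates that idea only; it does not use the OGSA-DAI API. An in-memory SQLite database stands in for the data resource, a plain-text serialiser stands in for a transformation such as XSLT, and all function names are hypothetical.

```python
import gzip
import sqlite3


def sql_query(conn: sqlite3.Connection, statement: str) -> list:
    """Activity: run a query against a relational data resource."""
    return conn.execute(statement).fetchall()


def to_text(rows: list) -> bytes:
    """Activity: serialise rows as comma-separated lines (stand-in for a transform)."""
    return "\n".join(",".join(str(v) for v in row) for row in rows).encode()


def gzip_bytes(data: bytes) -> bytes:
    """Activity: compress the payload before delivery."""
    return gzip.compress(data)


def run_workflow(source, activities):
    """Feed the output of each activity into the next one."""
    data = source
    for activity in activities:
        data = activity(data)
    return data
```

A caller would pass the query string as `source` and a list such as `[lambda s: sql_query(conn, s), to_text, gzip_bytes]` as `activities`; a delivery step (GridFTP in the diagram) would then ship the compressed bytes.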
GridFTP and RFT
[Diagram labels: RFT Client, SOAP Messages, Notifications (Optional), RFT Service, globus-url-copy, Control, Data]
gLite FTS
• Channel: the logical unit of management
  • Represents a directed network pipe between two sites
  • Mono-directional, dedicated link
• Independently manageable
  • State
  • Number of streams
  • Number of concurrent transfers
• Inter-VO scheduling
  • VO share
• No routing involved
• Non-dedicated channels
  • E.g. star channel
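The channel properties listed above map naturally onto a small data structure. The following is a minimal sketch, not the FTS data model: the field names, the `"Active"` state value, the `"*"` wildcard notation and the percentage-based VO shares are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    source: str                 # "*" stands for any site (a star channel)
    dest: str
    state: str = "Active"       # illustrative state value
    streams: int = 4            # parallel streams per transfer
    max_transfers: int = 10     # concurrent transfers on this channel
    vo_shares: dict = field(default_factory=dict)  # VO name -> percent of slots

    def matches(self, src: str, dst: str) -> bool:
        # Mono-directional: CERN -> RAL does not serve RAL -> CERN.
        return self.source in (src, "*") and self.dest in (dst, "*")

    def slots_for(self, vo: str) -> int:
        """Number of concurrent transfer slots a VO gets from its share."""
        return self.max_transfers * self.vo_shares.get(vo, 0) // 100


def select_channel(channels, src: str, dst: str):
    """Pick an active channel for src -> dst; dedicated beats star. No routing."""
    candidates = [c for c in channels if c.state == "Active" and c.matches(src, dst)]
    # Fewer wildcards means a more specific (dedicated) channel.
    candidates.sort(key=lambda c: (c.source == "*") + (c.dest == "*"))
    return candidates[0] if candidates else None
```

Because no routing is involved, a transfer with no matching channel is simply not served; the sketch returns `None` rather than looking for a path through a third site.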
SRB as a Data Grid
[Diagram labels: DB, MCAT, six SRB servers]
• Data Grid has an arbitrary number of servers
• Complexity is hidden from users
Data Management in Production Grids
Need for Grid Data Architecture and Standards
• OGF OGSA Data Architecture WG
  • Started in October 2005
  • Data Architecture document published as GFD.121
OGSA-Data Architecture
[Diagram labels: Service interface, Resource interface, Client APIs (non-OGSA) / Other services, Sink/Source, Access, Description, Storage, Data Service, Storage Management, Stored Data Resources, Other Data Resources, Managed Storage]
OGSA-Data: Data Replication/Transfer
[Diagram labels: Service interface, Resource interface, Client APIs (non-OGSA) / Other services, Replication, Transfer, Sink/Source, Access, Description, Data Service, Data Resources, Transfer Protocols]
OGF Data Area WGs I
• Data Format Description Language WG (dfdl-wg)
  • Describe the structure of binary and character encoded files and data streams
• Database Access and Integration Services WG (dais-wg)
  • Provide consistent access to existing, autonomously managed databases from web services
• Grid File System Working Group (gfs-wg)
  • Service interface(s) and architecture of a logical file system
• Grid Storage Management WG (gsm-wg)
  • Provide dynamic space allocation and file management of shared storage components on the Grid (Storage Resource Manager – SRM)
• GridFTP WG (gridftp-wg)
  • Improvements of FTP suitable for grid applications
OGF Data Area WGs II
• Info Dissemination WG (infod-wg)
  • Develop a model for Information Dissemination
• OGSA ByteIO Working Group (byteio-wg)
  • Define a minimal Web Service interface for providing "POSIX-like" file functionality
• OGSA Data Movement Interface WG (ogsa-dmi-wg)
  • Managed data movement
• OGSA-Data Working Group (ogsa-d-wg)
  • Data Architecture
Activities related to file system and data movement
• GFS:
  • Resource Namespace Service Specification (GFD.101)
• Byte-IO:
  • Byte-IO OGSA WSRF Basic Profile Rendering (GFD.88)
• GSM:
  • The Storage Resource Manager Interface Specification Version 2.2 (in public comment)
• DMI:
  • OGSA-DMI Specification (in public comment)
Data Architecture: Gaps
• Standardized metadata
  • Identify query languages, data formats, transport protocols, …
  • Needed in DAIS, DMI, ByteIO, …
• Data catalogs & Registries
  • Discovery is an important part of Grids
• Replication/Caching
• Data Federation
Standards Gaps
• Caching and Replication
• Integrated Data Management
• Transactions in a Grid
• Storage Provisioning
• Virtualization
• Provenance, Integrity, Policy
• File Metadata
• Streaming
• Versioning
Standards Gaps
• Dependencies
  • Security: IETF, OGF
  • Management: DMTF, SNIA
  • WS-*: OASIS and W3C
Main Focus for Future Work
Where can we exploit synergies with SNIA?
• File systems
  • NFSv4, pNFS
• Interface to Metadata stores
• Policies (not only Data)
• Name your favorite