Data Area Overview OGF24 15 September 2008 Erwin Laure <Erwin.Laure@cern.ch> David E. Martin <martinde@us.ibm.com> Data Area Directors
Data Area Goals
• The Data Area groups explore different aspects of data handling on grids
  • Access
  • Transport
  • Management
• Overall Data Architecture developed by OGSA Data Architecture group:
  • http://www.ogf.org/documents/GFD.121.pdf
Data Access
• Goals: locate and provide seamless access to data stored on Grids
• Data Access and Integration Services (DAIS-WG)
  • Base Specs Published for Database Access (GFD 74, 75, 76)
  • Implementation in OMII-UK
  • Now Working on Data Access Services for RDF Data Resources
• Grid File Systems (GFS-WG)
  • Naming Spec Published – Resource Namespace Service (GFD 101)
  • Working on Resource Catalog
  • Prototypes from SDSC, UVA, Univ. of Tsukuba
• Data Format Description Language (DFDL-WG)
  • XML-based language for describing the structure of binary and textual files and data streams
  • Simplifying the Concepts and Trying to Remove Complexity to Shorten Draft Spec
  • Prototypes from LANL and IBM
• Byte IO (ByteIO-WG)
  • Web Service interface for providing "POSIX-like" file functionality (GFD 87, 88)
  • Spec Finished Public Comment, Need to Make Small Changes
  • Production Version from UVA, Will Be in OMII
Data Transport
• OGSA Data Movement Interface (OGSA-DMI-WG)
  • Discover and negotiate proper data transport protocols and manage data transport (GFD 134)
  • Working on interoperability
• GridFTP WG (GridFTP-WG)
  • Grid-enabled FTP protocol
  • Spec Published 3 Years Ago (GFD 20)
  • Many Production Implementations
  • Need Experience Report for Full Standard
Data Management
• Grid Storage Management (GSM-WG)
  • Storage Resource Manager (SRM) to provide common interface to storage resources (GFD 129)
  • Several interoperating implementations in production use
  • Working on 3.0 Spec
• Information Dissemination (INFOD-WG)
  • Model for Information Dissemination; focus on query-like operations
  • Base specs published (GFD 110)
  • Looking at candidates for follow-on work
• Storage Networking Community Group (SN-CG)
  • Led by Vincent Franceschini, Chair of SNIA Board
  • Portal to SNIA Work
  • Follow-on to EGA Data Provisioning WG
Data Grid Specifications and Use Cases
Material provided by Andrew Grimshaw (grimshaw@virginia.edu)
Outline
• Background – The Rule of 3s
• Specifications
• Implementations
Classic three layer view
• Access Layer: interfaces, e.g. FUSE, SAGA, NFS, CIFS
• Grid Services Layer: standard port types (RNS, ByteIO, WS-DAI, SRM)
• Resource Provisioning Layer: files, databases, instruments
Classic 3-layer name scheme
[Figure: three columns. Human names (RNS file name 1 … RNS file name n) map to an abstract name (WS-name EPR; EPI, rebinding), which maps to addresses (File replica 1, File replica 2, … File replica m).]
• This is essentially a table
• WS-Names are WS-Addresses with optional EPI and resolver EPR
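The "essentially a table" view can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the WS-Naming or RNS interface: the class `WSName`, the function `resolve`, the EPI string, and the example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WSName:
    """Hypothetical stand-in for a WS-Name: an abstract identity (EPI)
    plus the addresses it is currently bound to."""
    epi: str
    replicas: list = field(default_factory=list)


# Level 1 -> 2: several human names may point at one abstract name.
human_to_abstract = {
    "/home/alice/results.dat": "urn:example:epi:1234",
    "/projects/shared/results.dat": "urn:example:epi:1234",
}

# Level 2 -> 3: one abstract name is bound to one or more replica addresses.
abstract_to_addresses = {
    "urn:example:epi:1234": WSName(
        epi="urn:example:epi:1234",
        replicas=[
            "https://site-a.example.org/replica1",
            "https://site-b.example.org/replica2",
        ],
    ),
}


def resolve(path, is_reachable):
    """Map a human name to the first reachable replica address.

    Skipping unreachable replicas models rebinding: the human name and
    the abstract name stay fixed while the address changes.
    """
    ws_name = abstract_to_addresses[human_to_abstract[path]]
    for address in ws_name.replicas:
        if is_reachable(address):
            return address
    raise LookupError(f"no reachable replica for {path}")
```

Calling `resolve("/home/alice/results.dat", lambda address: "site-b" in address)` returns the second replica, which is the failure and replication transparency the scheme is after.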
Six specs
• RNS – directory service that maps human names (strings) to abstract names or addresses (EPRs)
  • Insert, delete, list
  • Can build directed graphs, including trees
  • Leaves can be almost anything: web pages, ByteIO endpoints, DMI endpoints, BES resources
  • RNS 1.1 under development
• WS-Naming – a profile on WS-Addressing that supports identity, abstract name to address mapping, and rebinding of addresses, giving migration, failure, and replication transparency
• ByteIO – think POSIX file/stream: read, write, stat
• WS-DAI – query interface onto structured data, e.g., relational databases or XML databases
• SRM – management of data stores
• BES – accepts JSDL documents and executes them
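To make the RNS bullet concrete, here is a minimal in-memory sketch of a directory that supports insert, delete, and list, and whose entries can be other directories or arbitrary endpoint references. The class `RNSDirectory`, its method names, and the `epr:` strings are assumptions made for illustration; the real RNS specification defines web service operations, not this Python API.

```python
class RNSDirectory:
    """Hypothetical in-memory model of an RNS directory."""

    def __init__(self):
        self._entries = {}  # human-readable name -> sub-directory or EPR string

    def insert(self, name, target):
        """Bind a name to a sub-directory or to an endpoint reference."""
        if name in self._entries:
            raise KeyError(f"entry already exists: {name}")
        self._entries[name] = target

    def delete(self, name):
        """Remove a binding; the target itself is not touched."""
        del self._entries[name]

    def list_entries(self):
        """Return the names bound in this directory, sorted."""
        return sorted(self._entries)

    def lookup(self, path):
        """Walk a slash-separated path and return whatever the leaf is."""
        node = self
        for part in (p for p in path.split("/") if p):
            node = node._entries[part]
        return node


root = RNSDirectory()
home = RNSDirectory()
root.insert("home", home)
home.insert("data.bin", "epr:byteio:42")      # leaf: a ByteIO endpoint
root.insert("cluster", "epr:bes:queue-1")     # leaf: a BES resource
```

Because a sub-directory object may be inserted under more than one name, the same structure also covers directed graphs that are not trees.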
There are several implementations (not a complete list!)
• There are over a dozen OGSA-BES/HPC-BP implementations.
Let’s see what you can do with these specifications
• Imagine
  • an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one – G-ICING).
  • a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints and relational databases as WS-DAI endpoints.
  • OGSA-BES endpoints that also support the RNS specification – allowing jobs to be started simply by copying a JSDL file “into” the directory.
  • a WS-Trust STS endpoint that also supports RNS
Users can access Grid resources simply by copying files, dragging and dropping, etc.
• Applications don’t need to be rewritten to access the Grid
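The point about unmodified applications can be illustrated with ordinary file I/O. The sketch below uses only standard-library calls; the function `publish` is a hypothetical helper, and the demo uses a temporary local directory as a stand-in for a FUSE or IFS mount point, since the code cannot tell the difference.

```python
import shutil
import tempfile
from pathlib import Path


def publish(local_file: Path, grid_dir: Path) -> Path:
    """Copy a file into a directory using plain file-system calls.

    Nothing here is grid-specific: if grid_dir lives under a Grid-aware
    mount, the file-system driver does the grid work behind the scenes.
    """
    grid_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(local_file, grid_dir / local_file.name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "input.dat"
        source.write_bytes(b"sample data")
        # Assumption: in a real deployment this would be the mount point.
        copied = publish(source, Path(tmp) / "grid" / "home" / "alice")
        print(copied.read_bytes())
```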
Using RNS to name non-file-system components
• BES resources are also RNS directories
• We can schedule a job on a resource simply by “dropping” it into the directory
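A rough sketch of the "drop a job into a directory" idea follows. It models a BES resource as a directory whose insert operation parses the dropped JSDL document and records a pending job. The class `BESDirectory` and its behavior are assumptions for illustration only; no real BES implementation is being described, and nothing is actually executed.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespaces for the core and POSIX application schemas.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

JSDL_EXAMPLE = """
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>/bin/hostname</posix:Executable>
      </posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


class BESDirectory:
    """Hypothetical BES resource that is also a directory of jobs."""

    def __init__(self):
        self.jobs = {}  # entry name -> job record

    def insert(self, name, jsdl_text):
        """'Dropping' a JSDL file here queues a job instead of storing bytes."""
        root = ET.fromstring(jsdl_text.strip())
        executable = root.find(".//posix:Executable", NS)
        if executable is None or not (executable.text or "").strip():
            raise ValueError("JSDL document names no executable")
        self.jobs[name] = {
            "executable": executable.text.strip(),
            "state": "Pending",
        }

    def list_entries(self):
        """Listing the directory shows the submitted jobs."""
        return sorted(self.jobs)
```

With this model, `BESDirectory().insert("hello.jsdl", JSDL_EXAMPLE)` plays the role of copying `hello.jsdl` into the resource's directory, and listing the directory afterwards shows the job.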
Use SRM to abstract from Storage implementations
[Figure: Client, SRM, and Storage connected by arrows numbered 1 to 5; annotations: "could use RNS", "give back byte-I/O endpoint"]
1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system reports that the file is available and where it is located
4. The SRM returns a TURL (Transfer URL), i.e. the location from which the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
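The five-step exchange can be walked through with a toy model. Everything below is a hypothetical sketch: the classes `Storage` and `SRM`, the function `fetch`, the `/disk-cache` staging prefix, and the example.org hosts are invented for illustration and do not correspond to the SRM web service operations.

```python
from urllib.parse import urlparse

STAGE_PREFIX = "/disk-cache"  # assumed location where staged files appear


class Storage:
    """Toy storage system holding files by site path."""

    def __init__(self, files):
        self._files = files  # site path -> bytes

    def stage(self, path):
        """Steps 2 and 3: make the file available and report its location."""
        if path not in self._files:
            raise FileNotFoundError(path)
        return STAGE_PREFIX + path

    def read(self, location):
        """Serve a staged file from its transfer location."""
        return self._files[location[len(STAGE_PREFIX):]]


class SRM:
    """Toy storage resource manager in front of one storage system."""

    def __init__(self, storage, transfer_host):
        self._storage = storage
        self._transfer_host = transfer_host

    def get_turl(self, surl):
        path = urlparse(surl).path             # step 1: client supplies an SURL
        location = self._storage.stage(path)   # steps 2 and 3
        # Step 4: hand back a transfer URL naming protocol, host, and location.
        return f"gsiftp://{self._transfer_host}{location}"


def fetch(storage, turl):
    """Step 5: the client talks to storage as the TURL dictates.

    A real client would pick a transfer protocol from the TURL scheme;
    this sketch simply reads from the in-memory storage object.
    """
    return storage.read(urlparse(turl).path)


storage = Storage({"/data/run1.dat": b"events"})
srm = SRM(storage, "gridftp.example.org")
turl = srm.get_turl("srm://srm.example.org/data/run1.dat")
```

The client only ever names the file by its SURL; where and how the bytes are served stays a decision of the SRM and the storage system.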
WS-DAI endpoints that support RNS
• To execute a query, copy a text file containing the SQL into the directory that represents the database.
• The results of the query are accessible as a file (which can be read, “cat’d”, or loaded into Excel as a CSV), or can themselves be queried.
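As a sketch of this behavior, the model below treats a database as a directory: inserting a file of SQL runs the query and leaves a CSV result file next to it. It uses SQLite from the Python standard library purely as a stand-in backend; the class `DatabaseDirectory`, the `.csv` naming rule, and the sample table are assumptions, not part of WS-DAI.

```python
import csv
import io
import sqlite3


class DatabaseDirectory:
    """Hypothetical directory view of a relational database."""

    def __init__(self, connection):
        self._connection = connection
        self.files = {}  # result file name -> CSV text

    def insert(self, name, sql_text):
        """'Copying' a SQL file in runs it and stores the rows as CSV."""
        cursor = self._connection.execute(sql_text)
        if cursor.description is None:
            raise ValueError("statement returned no result set")
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor.fetchall())
        result_name = name + ".csv"
        self.files[result_name] = buffer.getvalue()
        return result_name


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE runs (id INTEGER, site TEXT)")
connection.executemany(
    "INSERT INTO runs VALUES (?, ?)", [(1, "CERN"), (2, "SDSC")]
)
database = DatabaseDirectory(connection)
result = database.insert("q1.sql", "SELECT id, site FROM runs ORDER BY id")
```

The stored result is plain CSV text with a header row, so it can be read with `cat` or opened in a spreadsheet as the slide describes.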
Mapping data into the Grid
• Links directories and files from source location to data grid directory and user-specified name
• Presents unified view of the data across platforms, locations, domains, etc.
• Data publisher controls authorization policy.
[Figure: data clients and data publishers on Windows and Linux hosts]
Moral of the story
• RNS allows us to place arbitrary resources into a traditional directed graph/tree structure
• FUSE/IFS map RNS namespaces into the local file system
• Users can interact with the grid without knowing anything about grids
Data Area Future
• From Data Area Gaps Analysis
  • High-level Data Movement
  • Caching and Replication
  • Integrated Data Management
  • Transactions in a Grid
• Recent Interest
  • Storage Provisioning
  • Virtualization
  • Provenance, Integrity, Policy
  • Link to Digital Libraries
• Dependencies
  • OGSA
  • Security: IETF, OASIS
  • Management: DMTF, WSDM/WS-Man Convergence
  • WS-*: OASIS and W3C, WS-RF/WS-T Convergence