
Data Area Overview

Data Area Overview. OGF24, 15 September 2008. Erwin Laure <Erwin.Laure@cern.ch> and David E. Martin <martinde@us.ibm.com>, Data Area Directors. Data Area Goals. The Data Area groups explore different aspects of data handling on grids: Access, Transport, Management.


Presentation Transcript


  1. Data Area Overview • OGF24, 15 September 2008 • Erwin Laure <Erwin.Laure@cern.ch> and David E. Martin <martinde@us.ibm.com>, Data Area Directors

  2. Data Area Goals • The Data Area groups explore different aspects of data handling on grids: Access, Transport, Management • Overall Data Architecture developed by the OGSA Data Architecture group: http://www.ogf.org/documents/GFD.121.pdf

  3. Data Access • Goals: locate and provide seamless access to data stored on Grids • Data Access and Integration Services (DAIS-WG) • Base Specs Published for Database Access (GFD 74, 75, 76) • Implementation in OMII-UK • Now Working on Data Access Services for RDF Data Resources • Grid File Systems (GFS-WG) • Naming Spec Published – Resource Namespace Service (GFD101) • Working on Resource Catalog • Prototypes from SDSC, UVA, Univ. of Tsukuba • Data Format Description Language (DFDL-WG) • XML-based language for describing the structure of binary and textual files and data streams • Simplifying the Concepts and Trying to Remove Complexity to Shorten Draft Spec • Prototypes from LANL and IBM • Byte IO (ByteIO-WG) • Web Service interface for providing "POSIX-like" file functionality (GFD 87, 88) • Spec Finished Comment, Need to Make Small Changes • Production Version from UVA, Will Be in OMII

  4. Data Transport • OGSA Data Movement Interface (OGSA-DMI-WG) • Discover and negotiate proper data transport protocols and manage data transport (GFD134) • Working on interoperability • GridFTP WG (GridFTP-WG) • Grid-enabled FTP protocol • Spec Published 3 Years Ago (GFD20) • Many Production Implementations • Need Experience Report for Full Standard

  5. Data Management • Grid Storage Management (GSM-WG) • Storage Resource Manager (SRM) to provide a common interface to storage resources (GFD129) • Several interoperating implementations in production use • Working on 3.0 Spec • Information Dissemination (INFOD-WG) • Model for Information Dissemination; focus on query-like operations • Base specs published (GFD110) • Looking at candidates for follow-on work • Storage Networking Community Group (SN-CG) • Led by Vincent Franceschini, Chair of SNIA Board • Portal to SNIA Work • Follow-on to EGA Data Provisioning WG

  6. Data Grid Specifications and Use Cases • Material provided by Andrew Grimshaw (grimshaw@virginia.edu)

  7. Outline • Background – The Rule of 3s • Specifications • Implementations

  8. Classic three layer view • Access Layer: interfaces, e.g. FUSE, SAGA, NFS, CIFS • Grid Services Layer: standard port types (RNS, ByteIO, WS-DAI, SRM) • Resource Provisioning Layer: files, databases, instruments

  9. Classic 3-layer name scheme • Human names: RNS file name 1 … RNS file name n • Abstract name: WS-Name (EPI, rebinding) • Addresses: EPRs for file replica 1, file replica 2 … file replica m • This is essentially a table • WS-Names are WS-Addresses with optional EPI and resolver EPR
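The "essentially a table" remark can be made concrete with a minimal sketch. It assumes plain strings for EPIs and addresses; the names `AbstractName` and `NameTable` are hypothetical and are not part of WS-Naming or RNS. The point it illustrates is that rebinding changes only the bottom layer, so every human name keeps resolving.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractName:
    """Hypothetical stand-in for a WS-Name: a stable identity plus its current addresses."""
    epi: str                                            # abstract, location-independent identifier
    addresses: list[str] = field(default_factory=list)  # one EPR-like string per replica


class NameTable:
    """Two chained lookups: human name -> abstract name -> addresses."""

    def __init__(self) -> None:
        self._by_human: dict[str, str] = {}
        self._by_epi: dict[str, AbstractName] = {}

    def register(self, epi: str, addresses: list[str]) -> None:
        """Create an abstract name with its initial replica addresses."""
        self._by_epi[epi] = AbstractName(epi, list(addresses))

    def bind(self, human_name: str, epi: str) -> None:
        """Point a human-readable name at an abstract name."""
        self._by_human[human_name] = epi

    def rebind(self, epi: str, addresses: list[str]) -> None:
        """Replace the addresses (e.g. after migration or failure); human names are untouched."""
        self._by_epi[epi].addresses = list(addresses)

    def resolve(self, human_name: str) -> list[str]:
        """Follow both layers and return the current replica addresses."""
        return list(self._by_epi[self._by_human[human_name]].addresses)
```

Several human names may be bound to the same abstract name, and a single `rebind` call then updates what all of them resolve to.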

  10. Outline • Background – The Rule of 3s • Specifications • Implementations

  11. Six specs • RNS – directory service that maps human names (strings) to abstract names or addresses (EPRs) • Insert, delete, list • Can build directed graphs, including trees • Leaves can be almost anything: web pages, ByteIO endpoints, DMI endpoints, BES resources • RNS 1.1 under development • WS-Naming – a profile on WS-Addressing that supports identity, abstract name to address mapping, and rebinding of addresses, giving migration, failure, and replication transparency • ByteIO – think POSIX file/stream: read, write, stat • WS-DAI – query interface onto structured data, e.g., relational databases or XML databases • SRM – management of data stores • BES – accepts JSDL documents and executes them
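The RNS operations listed on this slide (insert, delete, list) can be illustrated with a toy in-memory directory. This is a sketch under stated assumptions, not the RNS port type: the class `RNSDirectory`, its method names, and the `epr:...` strings are all hypothetical. It shows why the structure is a directed graph rather than only a tree: the same directory can be inserted under two names.

```python
class RNSDirectory:
    """Toy in-memory directory that maps human-readable names to targets."""

    def __init__(self) -> None:
        # A target is either another RNSDirectory or an opaque string
        # standing in for an EPR (hypothetical format).
        self._entries: dict[str, object] = {}

    def insert(self, name: str, target: object) -> None:
        """Bind a name to a sub-directory or an address."""
        if name in self._entries:
            raise KeyError(f"name already bound: {name}")
        self._entries[name] = target

    def delete(self, name: str) -> None:
        """Remove a binding; the target itself is not destroyed."""
        del self._entries[name]

    def list_names(self) -> list[str]:
        """Return the names bound in this directory, sorted."""
        return sorted(self._entries)

    def lookup(self, path: str) -> object:
        """Walk a slash-separated path and return the target it names."""
        node: object = self
        for part in (p for p in path.split("/") if p):
            if not isinstance(node, RNSDirectory):
                raise NotADirectoryError(part)
            node = node._entries[part]
        return node
```

Because leaves are opaque addresses, the same directory can hold ByteIO endpoints, BES resources, or anything else that has an EPR.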

  12. Outline • Background – The Rule of 3s • Specifications • Implementations

  13. There are several implementations (not a complete list!) • There are over a dozen OGSA-BES/HPC-BP implementations

  14. Let’s see what you can do with these specifications • Imagine • an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one – G-ICING). • a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints and relational databases as WS-DAI endpoints. • OGSA-BES endpoints that also support the RNS specification – allowing jobs to be started simply by copying a JSDL file “into” the directory. • a WS-Trust STS endpoint that also supports RNS

  15. Users can access Grid resources simply by copying files, dragging and dropping, etc. • Applications don’t need to be re-written to access the Grid

  16. You don’t have to imagine

  17. Windows Grid-aware IFS

  18. Linux Grid-aware FUSE

  19. Using RNS to name non-file-system components • BES resources are also RNS directories • We can schedule a job on a resource simply by “dropping” it into the directory
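From the client's point of view, "dropping" a job into a BES directory is an ordinary file copy. The sketch below assumes the BES resource is visible as a local directory (for example through a Grid-aware FUSE or IFS mount); the function name `submit_by_copy` is hypothetical. On a plain local directory it only copies the file; any job submission would be performed by the Grid-aware file system behind the mount, not by this code.

```python
import shutil
from pathlib import Path


def submit_by_copy(jsdl_file: Path, bes_directory: Path) -> Path:
    """Copy a JSDL document into a directory that represents a BES resource.

    Assumption: on a Grid-aware mount the copy itself triggers submission.
    On an ordinary directory this is just a file copy.
    """
    if not bes_directory.is_dir():
        raise NotADirectoryError(str(bes_directory))
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(jsdl_file, bes_directory))
```

Nothing in the function is Grid-specific, which is the point of the slide: existing tools and scripts that copy files need no changes.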

  20. Use SRM to abstract from Storage implementations • [Diagram: Client, SRM, and Storage, connected by arrows numbered 1 to 5; annotations: could use RNS, give back ByteIO endpoint] • 1. The client asks the SRM for the file, providing an SURL (Site URL) • 2. The SRM asks the storage system to provide the file • 3. The storage system notifies the SRM that the file is available and reports its location • 4. The SRM returns a TURL (Transfer URL), i.e. the location from which the file can be accessed • 5. The client interacts with the storage using the protocol specified in the TURL
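The five-step exchange on this slide can be sketched as follows. This is a toy model, not the SRM interface: the classes `Storage` and `SRM`, the function `choose_protocol`, and all host names and paths are hypothetical, and real SRM requests are asynchronous web-service calls rather than direct method calls.

```python
from urllib.parse import urlsplit


class Storage:
    """Hypothetical storage back end that knows where each file physically lives."""

    def __init__(self, locations: dict[str, str]) -> None:
        self._locations = locations  # SURL path -> physical path on a transfer host

    def stage(self, surl_path: str) -> str:
        # Steps 2 and 3: make the file available and report its location.
        return self._locations[surl_path]


class SRM:
    """Hypothetical front end that turns a SURL into a TURL."""

    def __init__(self, storage: Storage, transfer_scheme: str, transfer_host: str) -> None:
        self._storage = storage
        self._scheme = transfer_scheme
        self._host = transfer_host

    def request(self, surl: str) -> str:
        # Step 1: the client hands over a SURL; only its path identifies the file here.
        location = self._storage.stage(urlsplit(surl).path)
        # Step 4: return a TURL naming the protocol and the physical location.
        return f"{self._scheme}://{self._host}{location}"


def choose_protocol(turl: str) -> str:
    # Step 5: the client uses whichever transfer protocol the TURL names.
    return urlsplit(turl).scheme
```

The client never learns how the storage system lays out its files; it sees only the SURL it asked for and the TURL it is told to use.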

  21. WS-DAI endpoints that support RNS • To execute a query, copy a text file with the SQL into the directory that represents the database • The results of the query are accessible as a file (they can be read, “cat’d”, or loaded into Excel as CSV) and can also be queried in turn
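The "SQL file in, CSV file out" idea can be imitated locally. The sketch below uses Python's built-in `sqlite3` and `csv` modules as a stand-in for a WS-DAI endpoint; the function name `run_dropped_query` is hypothetical, and nothing here talks to a real Grid service. It only shows the transformation that would happen behind the directory.

```python
import csv
import io
import sqlite3


def run_dropped_query(conn: sqlite3.Connection, sql_text: str) -> str:
    """Run the SQL from a 'dropped' text file and return the result as CSV text."""
    cursor = conn.execute(sql_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: column names as reported by the database cursor.
    writer.writerow([column[0] for column in cursor.description])
    # One CSV row per result row.
    writer.writerows(cursor)
    return out.getvalue()
```

The returned text is what a user would see when reading the result file, and it can be opened directly in a spreadsheet.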

  22. Mapping data into the Grid • Links directories and files from source location to data grid directory and user-specified name • Presents unified view of the data across platforms, locations, domains, etc. • Data publisher controls authorization policy • [Figure: data clients and data publishers on Windows and Linux hosts]

  23. Moral of the story • RNS allows us to place arbitrary resources into a traditional directed graph/tree structure • FUSE/IFS map RNS namespaces into the local file system • Users can interact with the grid without knowing anything about grids

  24. Data Area Future • From Data Area Gaps Analysis • High-level Data Movement • Caching and Replication • Integrated Data Management • Transactions in a Grid • Recent Interest • Storage Provisioning • Virtualization • Provenance, Integrity, Policy • Link to Digital Libraries • Dependencies • OGSA • Security: IETF, OASIS • Management: DMTF, WSDM/WS-Man Convergence • WS-*: OASIS and W3C, WS-RF/WS-T Convergence
