
Self-organizing Smart Namespaces : Next Generation Data Grid Systems

Self-organizing Smart Namespaces : Next Generation Data Grid Systems. Arun Jagatheesan iRODS.org. Content Outline. State of the art Where we stand Concepts What is next, new, hot and exciting? Yesterday’s research - now Today’s research - future?


Presentation Transcript


  1. Self-organizing Smart Namespaces : Next Generation Data Grid Systems Arun Jagatheesan iRODS.org

  2. Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards

  3. State of the art - where we are now (Shameless self-promotion or fact!) • Estimated 2 petabytes of data brokerage • Multiple agencies - DoD, NARA, NSF, NIH, … • Multiple countries - US, UK, Japan, France, …, Antarctica • Spun off a private company … • We don’t live in the past anyway…

  4. Concepts and Lessons (Current understanding - looking back) • Don’t hide distributed computing • Allow users to “enjoy” the distributed namespace rather than cheat them with a “location opaque” namespace (unlike traditional file systems) • Human-readable or enjoyable names (no URLs, UUIDs, etc.) • Logical mappings to physical heterogeneities • Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids • Hide everything behind logical, human-friendly names • Keep it simple and scalable (it’s the data model & design) • Don’t stack one layer on top of another; deliver a finished product, not Lego blocks • Hybrid approach - neither too much P2P nor too much centralization; just the right level of distributed computing, with some TLC for users
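The "logical mappings to physical heterogeneities" idea on this slide can be illustrated with a minimal sketch. It assumes a catalog that maps one human-readable logical path to several physical replicas; the class names, paths, and resource names below are hypothetical and are not taken from SRB or iRODS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical data object (hypothetical model)."""
    resource: str       # logical resource name, e.g. "disk-site-a"
    physical_path: str  # where the bytes live on that resource


@dataclass
class LogicalNamespace:
    """Maps human-readable logical paths to one or more physical replicas."""
    _entries: dict = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        self._entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> list:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self._entries.get(logical_path, []))


# Illustrative data only: one logical name, two physical copies.
ns = LogicalNamespace()
ns.register("/grid/home/alice/survey.fits", Replica("disk-site-a", "/vol3/a91/f0042"))
ns.register("/grid/home/alice/survey.fits", Replica("tape-site-b", "/hpss/x/7781"))
replicas = ns.resolve("/grid/home/alice/survey.fits")
```

The user only ever handles the logical path; which resources hold the bytes stays visible through `resolve`, in keeping with the "don’t hide distributed computing" point.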

  5. Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • A use case - LSST • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards

  6. Motivational Use Case • LSST = Large Synoptic Survey Telescope • 150+ Petabytes • Multiple countries, multiple data centers • Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …) • Multiple heterogeneous hardware

  7. Yesterday’s research • Data Grid Workflows and policies • Some concepts prototyped in SRB Matrix • Event, Condition, Action (ECA) based “data grid flows” • If, for, for-each, if-else, switch-case • Server-side workflows on data grids • Use a separate language to capture the recipe of workflow and execute it as action - Data Grid Language • Let the flow be with you (Flow data type was introduced)
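A server-side "data grid flow" with for-each and if-else constructs might look roughly like the sketch below. The flow is written as nested Python dicts and run by a small interpreter; this format is an assumption made for illustration and is not the syntax of the Data Grid Language or SRB Matrix. The action names are hypothetical as well.

```python
def run_flow(node, context, actions, log):
    """Execute a flow description given as nested dicts (hypothetical format)."""
    kind = node["kind"]
    if kind == "sequence":
        for child in node["steps"]:
            run_flow(child, context, actions, log)
    elif kind == "for_each":
        for item in context[node["over"]]:
            # Bind the loop variable in a child scope for the body.
            child_context = dict(context, **{node["var"]: item})
            run_flow(node["body"], child_context, actions, log)
    elif kind == "if_else":
        branch = node["then"] if node["test"](context) else node.get("else")
        if branch is not None:
            run_flow(branch, context, actions, log)
    elif kind == "action":
        log.append(actions[node["name"]](context))
    else:
        raise ValueError(f"unknown flow node: {kind}")


# Hypothetical actions; a real system would call storage services here.
ACTIONS = {
    "replicate": lambda ctx: f"replicate {ctx['f']['name']}",
    "checksum": lambda ctx: f"checksum {ctx['f']['name']}",
}

# The "recipe": for each file, replicate large ones and checksum the rest.
FLOW = {
    "kind": "for_each", "over": "files", "var": "f",
    "body": {
        "kind": "if_else",
        "test": lambda ctx: ctx["f"]["size"] > 1000,
        "then": {"kind": "action", "name": "replicate"},
        "else": {"kind": "action", "name": "checksum"},
    },
}

log = []
files = [{"name": "a.dat", "size": 5000}, {"name": "b.dat", "size": 10}]
run_flow(FLOW, {"files": files}, ACTIONS, log)
```

Keeping the recipe as data, separate from the interpreter, is what lets the server execute the whole flow as a single action.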

  8. Today’s research = future • Now = Lessons learnt + yesterday’s research • Allow logical namespace to reflect local namespace (local file system logically mounted on global namespace) • Allow users to define their own policies and workflows (Services, rules) • iRODS.org - Open source platform - world’s first open source Data Grid Management System (DGMS).
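One way to picture a local file system being logically mounted on the global namespace is a mount table resolved by longest matching prefix. This is only a sketch under that assumption; the class, host names, and paths are hypothetical and do not describe how iRODS implements mounting.

```python
import posixpath


class MountTable:
    """Maps logical prefixes in a global namespace to local file system roots."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> (host, local root)

    def mount(self, logical_prefix, host, local_root):
        self._mounts[logical_prefix.rstrip("/")] = (host, local_root.rstrip("/"))

    def resolve(self, logical_path):
        """Return (host, local path) using the longest matching mount prefix."""
        path = posixpath.normpath(logical_path)
        best = None
        for prefix in self._mounts:
            # Match whole path components only, so "/grid/a" never covers "/grid/ab".
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount covers {logical_path}")
        host, root = self._mounts[best]
        return host, root + path[len(best):]


# Illustrative mounts: a general project area and a more specific subtree.
table = MountTable()
table.mount("/grid/projects", "nas.site-a.example", "/export/projects")
table.mount("/grid/projects/lsst", "hpc.site-b.example", "/scratch/lsst")
lsst_location = table.resolve("/grid/projects/lsst/night1/img.fits")
other_location = table.resolve("/grid/projects/other/readme.txt")
```

With nested mounts, the more specific prefix wins, so a subtree can live on a different file system than its parent while both appear under one logical tree.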

  9. iRODS.org • It’s all about the namespace and how users or applications interact with it • What if we made this namespace “smart”? • ECA Rules + Machine Learning or bootstrapped learning • Event: anything (as simple as a file upload) • Condition: based on system or user metadata • Action: any system-defined or user-defined service
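The Event, Condition, Action structure on this slide can be sketched as a tiny rule engine. This is an illustrative model only, not the iRODS rule language: the `Rule` and `RuleEngine` names, the metadata keys, and the action strings are assumptions. The machine-learning part of the slide is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                         # e.g. "upload"
    condition: Callable[[dict], bool]  # predicate over system or user metadata
    action: Callable[[dict], str]      # service to run when the rule fires


class RuleEngine:
    def __init__(self):
        self._rules = []

    def add(self, rule):
        self._rules.append(rule)

    def fire(self, event, metadata):
        """Run every action whose event matches and whose condition holds."""
        return [r.action(metadata) for r in self._rules
                if r.event == event and r.condition(metadata)]


engine = RuleEngine()
# Hypothetical rules: act on FITS uploads, and archive very large uploads.
engine.add(Rule("upload", lambda m: m.get("type") == "fits",
                lambda m: f"extract-headers:{m['name']}"))
engine.add(Rule("upload", lambda m: m.get("size", 0) > 10**9,
                lambda m: f"replicate-to-tape:{m['name']}"))

results = engine.fire("upload", {"name": "img.fits", "type": "fits", "size": 42})
```

Because conditions read only metadata, users can add rules without touching the code that handles the upload itself.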

  10. iRODS • Namespace #1 (data) • Human-readable data names to data (or virtual data) • Namespace #2 (resource) • Human-readable resource names to storage resources (allows distributed computing) • Namespace #3 (policies) • Human-readable policy namespace describing how data needs to be managed • Again, everything can be accessed and controlled by end users (not just system administrators)
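To make the three namespaces concrete, the sketch below models each as a plain dictionary keyed by a human-readable name and resolves a data name through all three. The entries, field names, and the `plan` helper are hypothetical and only illustrate how the namespaces could reference one another.

```python
# Namespace #1: logical data names -> where the data lives and how to manage it.
DATA = {
    "/grid/home/alice/survey.fits": {
        "resources": ["fast-disk"],
        "policy": "keep-two-copies",
    },
}

# Namespace #2: logical resource names -> physical storage (illustrative hosts).
RESOURCES = {
    "fast-disk": {"host": "store1.example", "kind": "disk"},
    "deep-archive": {"host": "tape1.example", "kind": "tape"},
}

# Namespace #3: logical policy names -> management requirements.
POLICIES = {
    "keep-two-copies": {"min_replicas": 2},
}


def plan(logical_name):
    """Resolve a logical data name through all three namespaces."""
    entry = DATA[logical_name]
    policy = POLICIES[entry["policy"]]
    current = [RESOURCES[name] for name in entry["resources"]]
    missing = max(0, policy["min_replicas"] - len(current))
    return {"hosts": [r["host"] for r in current], "replicas_to_add": missing}


summary = plan("/grid/home/alice/survey.fits")
```

Since every entry is addressed by name, an end user could in principle change the policy on a data object without knowing which host stores it.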

  11. Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • A use case - LSST • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards

  12. OGF, SNIA and iRODS.org • Collaborative data management • FAN / Data grid??? - but still distributed data management • Still needs a simple, standard API • Data grid namespace on XAM resources • Standardize a simple API (Java, C/C++) to provide data grid concepts on top of existing SNIA XAM or products • Open source data grid software • Involve engineers from different participating member organizations • Multi-institutional participation • Multiple countries, multiple companies, academic and commercial participants

  13. Enthusiasm is contagious http://www.iRODS.org
