Self-organizing Smart Namespaces: Next Generation Data Grid Systems
Arun Jagatheesan, iRODS.org
Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF? • Standard for distributed data management • Risks, rewards
State of the art: where we are now (shameless self-promotion or fact!) • An estimated 2 petabytes of data brokered • Multiple agencies: DoD, NARA, NSF, NIH, … • Multiple countries: US, UK, Japan, France…, Antarctica • Spun off a private company … • We don’t live in the past anyway…
Concepts and Lessons (current understanding, looking back) • Don’t hide distributed computing • Let users “enjoy” a distributed namespace rather than cheat them with a “location-opaque” namespace (unlike traditional file systems) • Names should be human-readable, even enjoyable (no URLs, UUIDs, etc.) • Logical mappings to physical heterogeneities • Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids • Hide every heterogeneity behind logical, human-friendly names • Keep it simple and scalable (it’s the data model and design) • Don’t layer on top of yet another layer: a finished product, not Lego blocks • Hybrid approach: neither too much P2P nor too much centralization, just the right level of distributed computing with some TLC for users
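The “logical mappings to physical heterogeneities” idea can be illustrated with a minimal Python sketch. It assumes a simple in-memory table; the class names, resource names, and paths below are hypothetical and not taken from SRB or iRODS. One human-readable logical name stays fixed while any number of physical replicas on different storage resources are registered behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # logical storage-resource name, e.g. "sdsc-disk" (hypothetical)
    physical_path: str  # where the bytes actually live on that resource


@dataclass
class LogicalNamespace:
    # Human-readable logical path -> every physical replica registered behind it.
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        """Attach one more physical copy to a logical name; the name itself never changes."""
        self.entries.setdefault(logical_path, []).append(Replica(resource, physical_path))

    def resolve(self, logical_path: str) -> list[Replica]:
        """Return all replicas behind a logical name, or an empty list if it is unknown."""
        return list(self.entries.get(logical_path, []))


ns = LogicalNamespace()
ns.register("/lsst/raw/night-001.fits", "sdsc-disk", "/vault/a1/night-001.fits")
ns.register("/lsst/raw/night-001.fits", "archive-tape", "/tape/vol7/0001.fits")
for replica in ns.resolve("/lsst/raw/night-001.fits"):
    print(replica.resource, replica.physical_path)
```

The user only ever sees the logical name; whether the bytes sit on disk, tape, or both is a property of the mapping, not of the name.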
Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • A use case: LSST • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF? • Standard for distributed data management • Risks, rewards
Motivational Use Case • LSST = Large Synoptic Survey Telescope • 150+ petabytes • Multiple countries, multiple data centers • Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …) • Multiple heterogeneous hardware platforms
Yesterday’s research • Data grid workflows and policies • Some concepts prototyped in SRB Matrix • Event-Condition-Action (ECA) based “data grid flows” • Control constructs: if, for, for-each, if-else, switch-case • Server-side workflows on data grids • Use a separate language, the Data Grid Language, to capture the workflow recipe and execute it as an action • Let the flow be with you (a Flow data type was introduced)
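To make the control constructs concrete, here is a minimal Python sketch of a server-side flow built from if-else and for-each steps. It is not the Data Grid Language or SRB Matrix syntax; the combinator names (`sequence`, `if_else`, `for_each`), the `move_to` stand-in service, and the size-based routing rule are all hypothetical, chosen only to show how a workflow recipe can be captured as data and then executed as an action.

```python
from typing import Callable

# A step reads and updates a shared context dictionary.
Action = Callable[[dict], None]


def sequence(*steps: Action) -> Action:
    """Run steps in order (a plain block of statements)."""
    def run(ctx: dict) -> None:
        for step in steps:
            step(ctx)
    return run


def if_else(cond: Callable[[dict], bool], then: Action, otherwise: Action) -> Action:
    """Run `then` when cond(ctx) is true, otherwise run `otherwise`."""
    def run(ctx: dict) -> None:
        (then if cond(ctx) else otherwise)(ctx)
    return run


def for_each(items_key: str, var: str, body: Action) -> Action:
    """Bind each element of ctx[items_key] to ctx[var] and run body once per element."""
    def run(ctx: dict) -> None:
        for item in ctx[items_key]:
            ctx[var] = item
            body(ctx)
    return run


def move_to(bucket: str) -> Action:
    # Hypothetical stand-in for a real server-side service such as "replicate to tape".
    return lambda ctx: ctx[bucket].append(ctx["f"]["name"])


def count(ctx: dict) -> None:
    ctx["seen"] += 1


# The recipe: for each file, route large files to tape and small ones to disk, then count it.
flow = for_each("files", "f", sequence(
    if_else(lambda c: c["f"]["size"] > 1_000, move_to("tape"), move_to("disk")),
    count,
))

ctx = {"files": [{"name": "a.fits", "size": 5_000}, {"name": "b.txt", "size": 10}],
       "tape": [], "disk": [], "seen": 0}
flow(ctx)
print(ctx["tape"], ctx["disk"], ctx["seen"])  # ['a.fits'] ['b.txt'] 2
```

Because `flow` is an ordinary value built before it runs, the same recipe could in principle be stored, shipped to a server, and triggered later, which is the point of keeping the workflow separate from the code that executes it.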
Today’s research = future • Now = lessons learnt + yesterday’s research • Allow the logical namespace to reflect a local namespace (a local file system logically mounted on the global namespace) • Allow users to define their own policies and workflows (services, rules) • iRODS.org: an open-source platform, the world’s first open-source Data Grid Management System (DGMS)
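Logically mounting a local file system on the global namespace amounts to a prefix translation. The Python sketch below assumes a hypothetical mount table with longest-prefix matching; the `MountTable` class and the example paths are illustrative only and are not how iRODS implements mounted collections.

```python
from pathlib import PurePosixPath
from typing import Optional


class MountTable:
    """Maps logical-namespace prefixes onto local directories (hypothetical design)."""

    def __init__(self) -> None:
        self.mounts: dict[PurePosixPath, PurePosixPath] = {}

    def mount(self, logical_prefix: str, local_dir: str) -> None:
        self.mounts[PurePosixPath(logical_prefix)] = PurePosixPath(local_dir)

    def to_local(self, logical_path: str) -> Optional[PurePosixPath]:
        """Translate a logical path to a local path using the longest matching mount."""
        path = PurePosixPath(logical_path)
        best: Optional[PurePosixPath] = None
        for prefix in self.mounts:
            if path == prefix or prefix in path.parents:
                if best is None or len(prefix.parts) > len(best.parts):
                    best = prefix
        if best is None:
            return None  # not under any mounted local file system
        return self.mounts[best] / path.relative_to(best)


table = MountTable()
table.mount("/lsst/home/alice", "/Users/alice/data")
print(table.to_local("/lsst/home/alice/run1/img.fits"))  # /Users/alice/data/run1/img.fits
print(table.to_local("/lsst/shared/catalog.db"))         # None
```

Longest-prefix matching lets a nested mount override its parent, so a user can graft a second local directory deeper into the same logical tree.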
iRODS.org • It’s all about the namespace and how users or applications interact with it • What if we made this namespace “smart”? • ECA rules + machine learning or bootstrapped learning • Event: anything, even something as simple as a file upload • Condition: based on system or user metadata • Action: any system-defined or user-defined service
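The Event, Condition, Action triple can be sketched as a small rule engine in Python. This is an illustration under stated assumptions, not the iRODS rule language: the `Rule` and `RuleEngine` names, the event string `"file_upload"`, and the metadata keys are hypothetical, and the machine-learning part of the slide is not modeled.

```python
from dataclasses import dataclass
from typing import Callable

Metadata = dict[str, object]


@dataclass
class Rule:
    event: str                              # e.g. "file_upload" (hypothetical event name)
    condition: Callable[[Metadata], bool]   # tested against system or user metadata
    action: Callable[[Metadata], None]      # any system- or user-defined service


class RuleEngine:
    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, metadata: Metadata) -> int:
        """Run every rule registered for `event` whose condition holds; return how many ran."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(metadata):
                rule.action(metadata)
                fired += 1
        return fired


audit_log: list[tuple[str, object]] = []
engine = RuleEngine()
engine.add(Rule(
    event="file_upload",
    condition=lambda md: md.get("collection") == "/lsst/raw",
    action=lambda md: audit_log.append(("replicate", md["path"])),
))

print(engine.fire("file_upload", {"collection": "/lsst/raw", "path": "/lsst/raw/n1.fits"}))  # 1
print(engine.fire("file_upload", {"collection": "/lsst/tmp", "path": "/lsst/tmp/x.dat"}))    # 0
print(audit_log)  # [('replicate', '/lsst/raw/n1.fits')]
```

Here the “action” only records what a real replication service would do; in a data grid it would be a server-side service invoked with the same metadata.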
iRODS • Namespace #1 (data) • Human-readable data names mapped to data (or virtual data) • Namespace #2 (resource) • Human-readable resource names mapped to storage resources (enables distributed computing) • Namespace #3 (policies) • A human-readable namespace of policies describing how data must be managed • Again, everything can be accessed and controlled by end users (not just system administrators)
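The three namespaces can be modeled as three name-to-target tables that refer to each other by name. In this hypothetical Python sketch a policy is reduced to “which resource to place data on”, a deliberate simplification; the `DataGrid` class, `put` method, and example names are illustrative, not the iRODS API.

```python
from dataclasses import dataclass, field


@dataclass
class DataGrid:
    # Namespace #1 (data): logical data name -> name of the resource holding it.
    data: dict[str, str] = field(default_factory=dict)
    # Namespace #2 (resource): resource name -> physical storage location.
    resources: dict[str, str] = field(default_factory=dict)
    # Namespace #3 (policies): policy name -> resource the policy places data on.
    # (Reducing a policy to a placement choice is a simplifying assumption.)
    policies: dict[str, str] = field(default_factory=dict)

    def put(self, logical_name: str, policy: str) -> str:
        """Place a data object according to a named policy; return the storage location."""
        resource = self.policies[policy]
        if resource not in self.resources:
            raise KeyError(f"policy {policy!r} names unknown resource {resource!r}")
        self.data[logical_name] = resource
        return self.resources[resource]


grid = DataGrid()
grid.resources["fast-disk"] = "/scratch/pool1"
grid.policies["keep-hot"] = "fast-disk"
print(grid.put("/lsst/calib/flat.fits", "keep-hot"))  # /scratch/pool1
print(grid.data)  # {'/lsst/calib/flat.fits': 'fast-disk'}
```

Because every entry is addressed by a human-readable name, an end user can change a policy or repoint a resource without touching the data names.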
OGF, SNIA and iRODS.org • Collaborative data management • FAN or data grid? Either way, it is still distributed data management • It still needs a simple, standard API • Data grid namespace on XAM resources • Standardize a simple API (Java, C/C++) that provides data grid concepts on top of SNIA XAM or other existing products • Open source data grid software • Involve engineers from the participating member organizations • Multi-institutional participation • Multiple countries, multiple companies, academic and commercial participants
Enthusiasm is contagious http://www.iRODS.org