Towards Data Grid Standard Implementations
Arun Jagatheesan, San Diego Supercomputer Center
Open Grid Forum 19, Jan 31, 2007 – session II

Outline
• Community introduction: OGF-GFS
• User perspective
• Developer/vendor perspective
• Need for standard community implementation
• Community implementation process
• GFS-WG community architecture sketch
• Follow-up actions

Motivation
• Global namespace for unstructured data storage
• Collaboration amongst multiple partners / teams
• Long-term management of unstructured data
  • Files, collection-based digital entities

Used or Required by
• Large-scale academic projects
• Federal agencies (NARA, LoC, …)
• Fortune 500, Forbes Global 2000, …

DGMS Concept-wise
• Large-scale logical file system
• File System
• Database System
• Grid Computing
= Data Grid Management System (DGMS)
• Core Concepts
  • Logical shared collections
  • Logical shared resources
  • Collaborative communities

Problem solved / Requirements – 1
• Collaborative logical namespace
  • Global collaborations of multiple teams
  • Collaborations of multiple organizations
  • Avoid multiple mount points, as they restrict the scalability of the collaboration
  • Coordinated data sharing at any level of granularity (data, metadata, annotations, …)

Problem solved / Requirements – 2
• Data Distribution
  • Multi-site replicas reduce access times
  • Replicas have the same logical name everywhere in the enterprise (a big plus for users)
  • Concept of replica, copy, cache
  • Replicas are controlled by the user, the admin, or the system (automated or policy-based)
  • Reduce WAN latency (chattiness)

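The idea that every replica shares one logical name can be illustrated with a small sketch. This is not the GFS-WG design: the `ReplicaCatalog` class, the site labels, the URLs and the latency figures are all hypothetical, and "pick the lowest-latency replica" is only one assumed selection policy among many.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    site: str           # hypothetical site label
    physical_url: str   # hypothetical physical location of this copy
    latency_ms: float   # assumed latency measured from the client


@dataclass
class ReplicaCatalog:
    # logical name -> all physical replicas registered under that name
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        """Attach another physical copy to an existing or new logical name."""
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> Replica:
        """Return the replica with the lowest latency (assumed policy)."""
        replicas = self.entries.get(logical_name)
        if not replicas:
            raise KeyError(logical_name)
        return min(replicas, key=lambda r: r.latency_ms)


catalog = ReplicaCatalog()
catalog.register("/grid/project/run42.dat",
                 Replica("site-a", "gsiftp://a.example.org/d1/run42.dat", 80.0))
catalog.register("/grid/project/run42.dat",
                 Replica("site-b", "gsiftp://b.example.org/x/run42.dat", 12.0))

# The user only ever refers to the logical name; the catalog chooses the copy.
nearest = catalog.resolve("/grid/project/run42.dat")
```

Because users address only the logical name, replicas can be added, moved or retired by a user, an administrator or a policy without changing how the data is referenced.
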
Problem solved / Requirements – 3
• Data Classification and Discovery
  • Major advantage for Global 2000 companies
  • Tag data with any arbitrary metadata schema
  • Each team can organize its data based on user-defined attributes
  • Multiple teams can have different metadata attributes on the same data
  • Query, discover and access data without knowing the path or protocol to be used

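A minimal sketch of per-team, user-defined metadata on shared data might look like the following. The `MetadataCatalog` class, the team names and the attribute names are assumptions made for illustration; the slides do not specify a metadata model or query interface.

```python
from collections import defaultdict


class MetadataCatalog:
    """Hypothetical catalog: each team keeps its own attributes per logical name."""

    def __init__(self):
        # logical name -> team -> {attribute: value}
        self._tags = defaultdict(lambda: defaultdict(dict))

    def tag(self, logical_name, team, **attributes):
        """Add or update a team's attributes on a logical name."""
        self._tags[logical_name][team].update(attributes)

    def query(self, team, **criteria):
        """Return logical names whose attributes for `team` match all criteria."""
        return sorted(
            name
            for name, by_team in self._tags.items()
            if team in by_team
            and all(by_team[team].get(k) == v for k, v in criteria.items())
        )


meta = MetadataCatalog()
# Two teams describe the same file with different, unrelated schemas.
meta.tag("/grid/project/run42.dat", "physics", instrument="detector-a", run=42)
meta.tag("/grid/project/run42.dat", "archive", retention="10y")
meta.tag("/grid/project/run43.dat", "physics", instrument="detector-b", run=43)

# Discovery by attribute: no physical path or access protocol is involved.
physics_hits = meta.query("physics", instrument="detector-a")
archive_hits = meta.query("archive", retention="10y")
```

Keeping each team's attributes in a separate namespace is one way to let several teams annotate the same data without agreeing on a common schema.
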
User Perspective
• Designed for off-the-shelf use
  • Don’t want to assemble (or DIY)
  • But able to customize the solution
• One point of contact or responsibility
  • If it does not work, I have one mailing list or number to call

Vendor/developer perspective
• “OGF-GFS compatible”
  • OGF-GFS Data Grid Applications
  • OGF-GFS Data Grid Appliance
• Ease of standard evolution
  • Avoid unnecessary dependencies on multiple interfaces for operations at the same level of granularity
• Ability to collaborate, learn and compete
  • An end-to-end solution with a common interface
  • Additional capabilities that add value to the solution

Lessons Learnt
• Software vs. specification
  • A software implementation lets us engage and collaborate as we define standards (unless everyone wants to invest in software development from the start)
• Make both the user and the vendor/developer happy
  • Users become confident enough to share requirements and to demand the standard from vendors/developers
  • Vendors/developers know it is a real thing that can be implemented around their existing products or software

The scope (from GFS Architecture)
• A single interface
• Protocols
  • A hybrid of XML and byte-level protocol
  • XML – command channel of operations
  • Byte-level – data movement
• Possible functionalities
  • File namespace and file operations (read, write, …)
  • Metadata operations (user-defined metadata, search)
  • Data Grid Language for policy, rules, etc.

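The split between an XML command channel and a byte-level data channel can be sketched as below. The element names (`dgms-request`, `operation`, `logical-name`) and the 8-byte length-prefixed framing are purely hypothetical; the slides name the two channels but do not define a schema or a wire format.

```python
import struct
import xml.etree.ElementTree as ET


def build_command(operation: str, logical_name: str) -> bytes:
    """Serialize a command as XML (hypothetical schema, for illustration only)."""
    root = ET.Element("dgms-request")
    op = ET.SubElement(root, "operation", name=operation)
    ET.SubElement(op, "logical-name").text = logical_name
    return ET.tostring(root, encoding="utf-8")


def parse_command(message: bytes):
    """Recover (operation, logical name) from an XML command message."""
    root = ET.fromstring(message)
    op = root.find("operation")
    return op.get("name"), op.findtext("logical-name")


def frame_payload(payload: bytes) -> bytes:
    """Byte-level channel: prefix raw data with its length (assumed framing)."""
    return struct.pack("!Q", len(payload)) + payload


def unframe_payload(frame: bytes) -> bytes:
    """Read the 8-byte big-endian length, then return exactly that many bytes."""
    (length,) = struct.unpack("!Q", frame[:8])
    return frame[8:8 + length]


# Command channel: small, structured, easy to extend and inspect.
command = build_command("read", "/grid/project/run42.dat")
operation, target = parse_command(command)

# Data channel: the file contents travel as raw bytes, never wrapped in XML.
data_frame = frame_payload(b"\x00\x01binary file contents")
received = unframe_payload(data_frame)
```

Keeping bulk data out of the XML messages avoids encoding overhead on large transfers while leaving the operations themselves readable and extensible.
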
What could be the right high-level picture?
[Figure labels: XML-command protocol; byte-level data protocol; DGMS; Facilitate SOA; Object-transfer]

What could be the right high-level picture?
[Figure labels: XML-command protocol; byte-level data protocol; three DGMS servers]

User perspective
[Figure labels: user-defined metadata for data discovery; "Secret Recipe"; logical resources; multiple replicas; users from different organizations]

So what will we be doing? (products)
• Definition
  • Concept (data grid namespace, resource namespace, …)
  • Initial functionalities (DGMS operations to be targeted)
  • Namespace (files, metadata, resource, policy rules)
• XML protocol
  • XML handshake and message transfer between DGMS client and DGMS server
• Most importantly…
  • Software as a common framework for the evolution, adoption and growth of the standard and DGMS concepts

So how will we do it? (process)
• Community-based open design (OPEN FORUM)
  • Design discussions as a community
  • Code written by multiple parties, to keep the vendor/developer community and the user community engaged
• Community-based open standard (OPEN STDS)
  • Specs written using a wiki and other mechanisms
  • Community-based spec for OGF
  • Interoperability workshops, and workshops held with other relevant bodies such as SNIA or DMTF

How can you get started?
• Initial requirements
  • Can you delete email? (Sign up for our mailing list)
  • Got bandwidth and a browser? (Visit our group page)
  • Can you scream, shout or smile? (Join our WG sessions)
• Are you a user, consumer or researcher?
  • Tell us what is needed
  • What should be there for you to put this open source software/standard in production?
• Are you a vendor/developer?
  • Have your engineer or developer talk to us (we will convert them into a DGMS developer or DGMS guru)
  • We are developing an open standard: take advantage of it and develop a value-added solution around it

When do we get started?
• Right now (hmm… we did a long time back)
• Conference calls every other week
  • Mostly Wednesdays
  • Attend by phone, Skype or Polycom video conference (anything you like)
  • Discussions influencing design requirements
• Face-to-face meetings
  • Once every quarter (planned), at OGF sessions

Suggestions, comments, critiques
• TO DO
  • Standard operations based on policies/rules
  • Take advantage of OGF standards where possible
  • Other commercial or magic tools could be used below the standard
• NOT TO DO

Conclusions
• Data Grids
• Data Grid Management Systems (DGMS)
• Strong user need in both academic and non-academic settings
• Need for standards framed by the Grid File System WG
• Software-included spec strategy