Global Resource Naming
• Developers: Reagan Moore (moore@sdsc.edu), Arcot Rajasekar (sekar@sdsc.edu), Mike Wan (mwan@sdsc.edu), Wayne Schroeder (schroede@sdsc.edu), Richard Marciano (marciano@sdsc.edu)
• Goals:
  • Experience from production data grids
  • Evolution of global resource naming requirements
• Projects:
  • Data grids - NOAO, BIRN, UK e-Science data grid
  • Digital libraries - NSDL, SCEC
  • Preservation environments - NARA TPAP, Taiwan
  • Real-time sensor systems - ROADnet, OOI
OGF-22
Intellectual Property Policy
• I acknowledge that participation in OGF22 is subject to the OGF Intellectual Property Policy.
• Intellectual Property Notices Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  • the OGF plenary session,
  • any OGF working group or portion thereof,
  • the GFSG, or any member thereof on behalf of the GFSG,
  • the GFAC, or any member thereof on behalf of the GFAC,
  • any OGF mailing list, including any working group or research group list, or any other list functioning under OGF auspices,
  • the GFD Editor or the GWD process
• Statements made outside of an OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
• Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
• OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
Data Management Applications
• Data grids
  • Share data - organize distributed data as a collection
• Digital libraries
  • Publish data - support browsing and discovery
• Persistent archives
  • Preserve data - manage technology evolution
• Real-time sensor systems
  • Federate sensor data - integrate across sensor streams
• Workflow systems
  • Analyze data - integrate client- & server-side workflows
• Coalescence of requirements into generic infrastructure
Extremely Successful
• Storage Resource Broker (SRB) manages 2 PB of data in internationally shared collections
• Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
  • Astronomy - Data grid
  • Bio-informatics - Digital library
  • Earth sciences - Data grid
  • Ecology - Collection
  • Education - Persistent archive
  • Engineering - Digital library
  • Environmental science - Data grid
  • High energy physics - Data grid
  • Humanities - Data grid
  • Medical community - Digital library
  • Oceanography - Real-time sensor data, persistent archive
  • Seismology - Digital library, real-time sensor data
• Goal has been generic infrastructure for distributed data
Evolution of Naming
• Storage Resource Broker data grid naming
  • Assemble distributed data into a shared collection
  • Global naming for:
    • Users
    • Files
    • Resources
• Observation: assertions must be made about properties of the shared collection
  • Shared collection is assembled for a purpose
  • Need to describe the management policies that define the purpose
  • Need to describe the management processes that enforce assertions
• Observation: the global resource name interacts with the management policies
  • Storage-specific policies
Evolution of Naming
• integrated Rule-Oriented Data System (iRODS)
  • Automate application of management policies
  • Global naming for:
    • Management procedures
      • Micro-services executed at each storage system
    • Management policies
      • Rules controlling execution of procedures
    • Persistent state information
      • Context generated by application of micro-services
• Resource is now described by:
  • Identity
  • Procedures performed on the resource
  • Policies that control execution of procedures
  • State information that can be queried to verify enforcement of policies
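The four-part resource description on this slide can be sketched as a small data model. This is an illustration only: `ManagedResource`, its `apply` method, the `checksum` procedure, and the resource name `demo-disk-1` are hypothetical and are not the iRODS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model, not the iRODS API: a resource carries its identity,
# the procedures (micro-services) it can run, the policies (rules) that
# gate those procedures, and the state recorded by each attempt.
@dataclass
class ManagedResource:
    identity: str
    procedures: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    policies: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    state: List[dict] = field(default_factory=list)

    def apply(self, procedure: str, target: str) -> bool:
        """Run a procedure only if its policy allows it; record the outcome."""
        # A procedure with no registered policy is denied by default.
        allowed = self.policies.get(procedure, lambda _t: False)(target)
        result = self.procedures[procedure](target) if allowed else None
        self.state.append({"procedure": procedure, "target": target,
                           "allowed": allowed, "result": result})
        return allowed


resource = ManagedResource(
    identity="demo-disk-1",
    procedures={"checksum": lambda path: f"sum({path})"},
    policies={"checksum": lambda path: path.startswith("/archive/")},
)
resource.apply("checksum", "/archive/a.dat")   # permitted by the policy
resource.apply("checksum", "/scratch/b.dat")   # denied, but still recorded
```

Because every attempt is appended to `state`, enforcement can be verified afterwards by querying the recorded entries rather than by trusting the caller.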
Simple Example
• Institutional Review Board approval policies for distribution of data associated with human subjects
  • Approval flag for distribution of data by a project
    • May require that data be restricted to storage on a specific resource
  • Approval flag for distribution of individual data sets
    • May require restrictions on the set of resources where data can be moved
  • Approval flag for distribution of data sets to specific individuals
    • May require restrictions on the set of resources used to access the data
• Enforcement requires global naming for the storage resources, users, and data files
  • Also requires specification of which policy is being enforced
  • Also requires specification of the access procedure that is used at the storage resource
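A minimal sketch of how such approval flags might be checked, assuming each approval names a policy, a project, an optional data set, an optional user, and the set of resources it permits. The `Approval` record, the `may_distribute` function, and all example names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Approval:
    policy: str                      # which policy is being enforced
    project: str
    data_set: Optional[str] = None   # None: applies to the whole project
    user: Optional[str] = None       # None: applies to any user
    resources: FrozenSet[str] = frozenset()  # resources the data may use


def may_distribute(approvals, project, data_set, user, resource):
    """Return the name of the first policy permitting the request, else None."""
    for a in approvals:
        if (a.project == project
                and a.data_set in (None, data_set)
                and a.user in (None, user)
                and resource in a.resources):
            return a.policy
    return None


approvals = [
    # Project-level approval, restricted to one storage resource.
    Approval("irb-project", "studyA", resources=frozenset({"secure-disk"})),
    # Approval for one data set and one individual, on two resources.
    Approval("irb-individual", "studyA", "scan42", "alice",
             frozenset({"secure-disk", "lab-cache"})),
]
```

The check only works because the project, data set, user, and resource all carry names that mean the same thing at every site, which is the point the slide makes about global naming.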
Architecture Implication
• Storage resource is a combination of:
  • Physical resource
  • Server for interacting with the storage resource protocol
  • Rule engine for enforcing management policies and executing procedures under policy control
  • Associated persistent state information tracking operations performed at the resource
Using a Data Grid - Details
[Diagram: two iRODS servers, each with a rule engine; a metadata catalog (DB); a rule base]
• User asks for data
• Data request goes to iRODS Server
• Server looks up information in catalog
• Catalog tells which iRODS server has data
• 1st server asks 2nd for data
• The 2nd iRODS server applies rules
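The request flow on this slide can be sketched with two toy classes. This is not iRODS code: `Catalog`, `Server`, the rule callables, and the file name `/zone/survey.dat` are hypothetical stand-ins for the metadata catalog, the servers, and their rule engines.

```python
class Catalog:
    """Hypothetical metadata catalog: logical file name -> server holding it."""
    def __init__(self):
        self._owner = {}

    def register(self, logical_name, server):
        self._owner[logical_name] = server

    def locate(self, logical_name):
        return self._owner[logical_name]


class Server:
    """Hypothetical server with a local rule deciding who may read."""
    def __init__(self, name, catalog, rule):
        self.name = name
        self.catalog = catalog
        self.rule = rule            # callable(user, logical_name) -> bool
        self.files = {}

    def put(self, logical_name, data):
        self.files[logical_name] = data
        self.catalog.register(logical_name, self)

    def get(self, user, logical_name):
        owner = self.catalog.locate(logical_name)   # catalog says who has it
        if owner is not self:
            return owner.get(user, logical_name)    # 1st server asks 2nd
        if not self.rule(user, logical_name):       # holder applies its rules
            raise PermissionError(f"{user} may not read {logical_name}")
        return self.files[logical_name]


catalog = Catalog()
first = Server("first", catalog, rule=lambda user, name: True)
second = Server("second", catalog, rule=lambda user, name: user == "alice")
second.put("/zone/survey.dat", b"payload")
```

A request sent to `first` is forwarded to `second`, and it is the rule held by `second` that decides the outcome, mirroring the last step on the slide.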
Logical Names
• Logical naming serves multiple functions:
  • Support naming independence from the choice of physical resource
    • Can change physical resource without changing name
  • Support relationships between name spaces
    • Access constraints on use of storage
  • Support aggregation within the name space
    • Collective operations on sets of resources
  • Support resource-specific management policies
    • Enforce retention and disposition policies for specific storage type
  • Enable versioning
    • Changes to policies
    • Changes to operations specific to resource
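Naming independence, the first function listed, amounts to a level of indirection. A minimal sketch, assuming a simple name table; `ResourceNamespace` and the resource names are hypothetical.

```python
class ResourceNamespace:
    """Hypothetical table binding logical resource names to physical ones."""
    def __init__(self):
        self._bindings = {}

    def bind(self, logical_name, physical_name):
        self._bindings[logical_name] = physical_name

    def resolve(self, logical_name):
        return self._bindings[logical_name]


ns = ResourceNamespace()
ns.bind("project-archive", "tape-silo-01")
before = ns.resolve("project-archive")

# The hardware is replaced; clients keep using the same logical name.
ns.bind("project-archive", "tape-silo-02")
after = ns.resolve("project-archive")
```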
Global Naming
• We observe at least four types of identifiers:
  • Physical resource name
  • Logical resource name
  • GUIDs (globally unique identifiers associated with the logical resource name)
  • Descriptive metadata associated with the logical resource name
• All four types are required:
  • Physical resource name is storage system dependent
  • Logical resource name supports aggregation, relationships
  • GUIDs provide a unique identifier across management systems (but do not support aggregation and relations, which are dependent on the data management system)
  • Descriptive metadata provides the context for discovery
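One way to picture the four identifier types is as fields of a single record, with discovery driven by the metadata. The sketch below is an assumption for illustration; `ResourceRecord`, `find_by_metadata`, and the sample values are hypothetical.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ResourceRecord:
    physical_name: str    # storage-system dependent
    logical_name: str     # supports aggregation and relationships
    # Unique across management systems; generated here for illustration.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    metadata: dict = field(default_factory=dict)   # context for discovery


def find_by_metadata(records, **wanted):
    """Return records whose metadata matches every requested key/value."""
    return [r for r in records
            if all(r.metadata.get(k) == v for k, v in wanted.items())]


records = [
    ResourceRecord("hpss://silo-3", "archive", metadata={"media": "tape"}),
    ResourceRecord("/mnt/raid-9", "cache", metadata={"media": "disk"}),
]
tape_resources = find_by_metadata(records, media="tape")
```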
Aggregations on Resource Names
• Load leveling
  • Dynamically add storage resource to a list
  • Dynamically distribute data across storage systems
• Automated replication
  • Create multiple copies by writing to a logical resource name, creating a copy at each associated physical resource
• Automated caching
  • Associate disk cache with a tape archive
  • SRM-style functionality for resource interactions; automate caching of data onto your local resource
• Automated file aggregation management
  • Migrate containers to disk cache for extraction of file from container
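Replication and load leveling both follow from letting one logical resource name stand for several physical resources. A minimal in-memory sketch, assuming file counts as the load measure; `LogicalResource` and the disk names are hypothetical.

```python
class LogicalResource:
    """Hypothetical logical resource aggregating several physical stores."""
    def __init__(self, name, physical_names):
        self.name = name
        # physical resource name -> {file name: data}
        self.stores = {p: {} for p in physical_names}

    def add_physical(self, physical_name):
        """Dynamically add a storage resource to the list."""
        self.stores.setdefault(physical_name, {})

    def write_replicated(self, file_name, data):
        """Automated replication: one write, a copy at every physical resource."""
        for store in self.stores.values():
            store[file_name] = data

    def write_balanced(self, file_name, data):
        """Load leveling: place the file on the store holding the fewest files."""
        target = min(self.stores, key=lambda p: len(self.stores[p]))
        self.stores[target][file_name] = data
        return target


pool = LogicalResource("pool", ["disk-a", "disk-b"])
pool.write_replicated("r.dat", b"x")
pool.add_physical("disk-c")
placed_on = pool.write_balanced("s.dat", b"y")
```

In both cases the client writes to the name `pool`; which physical resources receive the data is decided behind the logical name.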
Federation
• Name management between independent data grids
  • Can my data grid write onto your storage resource?
  • Resource defined by {data grid, logical name, physical name}
• Automation of data movement between data grids
  • Control policies for
    • Chained data grids
    • Central repositories for archiving
    • Central sources for authoritative copy
    • Peer-to-peer data grids
• Automation of sharing of resource-specific policies and procedures
  • Simple example is redaction rule and associated redaction process
  • When data is moved onto your storage resource, the associated redaction rule and process are also migrated
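The resource triple and the redaction example can be sketched together. The code assumes a rule travels as a callable attached to the data object; `ResourceId`, `DataObject`, `DataGrid`, `redact`, and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, NamedTuple

class ResourceId(NamedTuple):
    """A resource is named by {data grid, logical name, physical name}."""
    data_grid: str
    logical_name: str
    physical_name: str


@dataclass
class DataObject:
    name: str
    content: str
    # rule name -> procedure applied on every read; travels with the data
    rules: Dict[str, Callable[[str], str]] = field(default_factory=dict)


class DataGrid:
    def __init__(self, name):
        self.name = name
        self.holdings = {}   # (ResourceId, object name) -> DataObject

    def store(self, resource, obj):
        if resource.data_grid != self.name:
            raise ValueError("resource belongs to another data grid")
        self.holdings[(resource, obj.name)] = obj

    def read(self, resource, name):
        obj = self.holdings[(resource, name)]
        text = obj.content
        for procedure in obj.rules.values():
            text = procedure(text)
        return text


def redact(text):
    return text.replace("555-0100", "[REDACTED]")


home, partner = DataGrid("home"), DataGrid("partner")
src = ResourceId("home", "archive", "disk-7")
dst = ResourceId("partner", "archive", "tape-2")
record = DataObject("contact.txt", "phone: 555-0100", {"redaction": redact})
home.store(src, record)
# Moving the object to the partner grid moves its rule and process with it.
partner.store(dst, home.holdings[(src, "contact.txt")])
```

Including the data grid in the resource name is what lets `store` refuse a write addressed to a resource in another grid.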
For More Information
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.sdsc.edu/srb/
http://irods.sdsc.edu/