Policy Based Data Management
Data-Intensive Computing
Distributed Collections
Grid-Enabled Storage
iRODS

Reagan W. Moore
moore@diceresearch.org

Policy-based Data Environments
• Purpose - reason a collection is assembled
• Properties - attributes needed to ensure the purpose
• Policies - controls for enforcing desired properties, mapped to computer actionable rules
• Procedures - functions that implement the policies, mapped to computer actionable workflows
• Persistent state information - results of applying the procedures, mapped to system metadata
• Assessment criteria - validation that state information conforms to the desired purpose, mapped to periodically executed policies
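
The chain from policy to procedure to state information to assessment can be illustrated with a small sketch. This is not iRODS code or its rule language. Every name here (`DataObject`, `Policy`, `Collection`, `replicate`, the `on_ingest` event, the `replica_count` attribute) is hypothetical, and the "at least two replicas" property is an assumed example of a desired property.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    name: str
    replicas: List[str]  # storage locations that hold a copy
    # Persistent state information, recorded as metadata on the object.
    metadata: Dict[str, str] = field(default_factory=dict)


def replicate(obj: DataObject, target: str) -> None:
    """Procedure: the function that implements a replication policy."""
    if target not in obj.replicas:
        obj.replicas.append(target)
    # Record the result of applying the procedure as state information.
    obj.metadata["replica_count"] = str(len(obj.replicas))


@dataclass
class Policy:
    """Policy: a control bound to an event and mapped to a procedure."""
    event: str
    procedure: Callable[[DataObject], None]


class Collection:
    def __init__(self, purpose: str, policies: List[Policy]) -> None:
        self.purpose = purpose  # reason the collection is assembled
        self.policies = policies
        self.objects: List[DataObject] = []

    def add(self, obj: DataObject) -> None:
        """Ingest an object and apply every policy bound to ingestion."""
        self.objects.append(obj)
        for policy in self.policies:
            if policy.event == "on_ingest":
                policy.procedure(obj)

    def assess(self, min_replicas: int) -> List[str]:
        """Assessment: names of objects whose recorded state violates
        the desired property (at least `min_replicas` copies)."""
        return [
            o.name
            for o in self.objects
            if int(o.metadata.get("replica_count", "0")) < min_replicas
        ]


# Usage: a preservation collection that replicates each object on ingest.
coll = Collection(
    purpose="preservation",
    policies=[Policy("on_ingest", lambda o: replicate(o, "archive-site"))],
)
coll.add(DataObject("survey.dat", ["primary-site"]))
violations = coll.assess(min_replicas=2)
```

In this sketch the assessment reads only the recorded metadata, not the replica list itself, which mirrors the slide's point that validation is performed against persistent state information. Running `assess` on a schedule would correspond to a periodically executed policy.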

Data-Intensive Computing
• Support computation at the remote storage location
  • Low complexity operations (small number of operations per byte)
  • Manage workflows through distributed rule engine
• Integrate with computation at supercomputer
  • High complexity operations (large number of operations per byte)
• Virtualize the workflow
  • Manage completion of the workflow tasks independently of the choice of platform
• Manage provenance information
  • Derived data products can include generation of advanced indices to support discovery and browsing
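
The split between low-complexity and high-complexity operations can be sketched as a placement decision based on operations per byte. The slides give no numeric cutoff, so the threshold below is an invented placeholder, and `Task`, `ops_per_byte`, and `choose_platform` are hypothetical names rather than part of any iRODS interface.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state a value.
OPS_PER_BYTE_THRESHOLD = 10.0


@dataclass
class Task:
    name: str
    operations: float  # estimated total operations
    input_bytes: int   # size of the data the task reads


def ops_per_byte(task: Task) -> float:
    """Complexity measure used on the slide: operations per byte."""
    if task.input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return task.operations / task.input_bytes


def choose_platform(task: Task,
                    threshold: float = OPS_PER_BYTE_THRESHOLD) -> str:
    """Run low-complexity tasks at the storage location and send
    high-complexity tasks to the supercomputer."""
    if ops_per_byte(task) < threshold:
        return "storage"
    return "supercomputer"


# Usage: a checksum touches each byte a few times; a simulation does
# far more work per byte read.
checksum = Task("checksum", operations=2e9, input_bytes=10**9)
simulation = Task("simulation", operations=5e12, input_bytes=10**9)
placements = {t.name: choose_platform(t) for t in (checksum, simulation)}
```

The reasoning behind such a rule is that moving data costs more than computing on it when little work is done per byte, so light operations are better run where the data already sits.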

Overview of iRODS Architecture
[Figure: architecture diagram with the following labeled components]
• User w/ Client - can search, access, add and manage data & metadata
• iRODS Middleware
  • iRODS Metadata Catalog - tracks information
  • iRODS Rule Engine - tracks policies
  • iRODS Data Server - disk, tape, etc.
Access distributed data with Web-based Browser or iRODS GUI or Command Line clients.

Grid-Enabled Storage
• Integrate data processing within storage controller
  • Very high-speed access to disk
  • Application of rules that control execution of procedures within the storage controller
• Native data grid software runs within controller
  • Connect disk to any data grid
  • Next generation of connectivity beyond SAN/NAS technology
  • Data grid manages the properties of the collection

iRODS Extensible Infrastructure
• Clients – specific to discipline and life cycle state
• Policies – specific to discipline
• Procedures – specific to discipline
• Remaining infrastructure is generic
  • Network transport
  • Authentication / Authorization
  • Distributed storage access
  • Remote execution
  • Metadata management
  • Message passing
  • Rule engine

iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2011 Budget Supplement in the area of Human and Computer Interaction Information Management technology research.

Reagan W. Moore
rwmoore@renci.org
http://irods.diceresearch.org

NSF OCI-0848296 “NARA Transcontinental Persistent Archives Prototype”
NSF SDCI-0721400 “Data Grids for Community Driven Applications”