Rule-Based Data Management Systems • Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar • {moore, schroede, mwan, sekar}@sdsc.edu • http://www.sdsc.edu/srb • http://irods.sdsc.edu/
Topics • Managing distributed shared collections • Data grids • Control of name spaces - SRB • Production system • Data and trust virtualization • Infrastructure independence • Control of management policies - iRODS • Next generation technology • Management virtualization • Rules controlling remote operations • Constraints on the rules and remote operations
Data Management Applications • Data grids • Share data • Digital libraries • Publish data • Persistent archives • Preserve data • Real-time sensor streams • Data federation • Data analysis • Automate access to distributed data
Concepts • Distributed Data Management Concepts • Data virtualization • Manage the properties of a shared collection independently of the storage systems • Trust virtualization • Administrative domain independence • Federation • Managing interactions between data grids • Rule-based Data Management • Policy virtualization • Automating execution of management policies • Applying management policies to remote operations
Using a Data Grid - in Abstract • User asks for data from the data grid • The data is found and returned • Where & how details are hidden [Diagram: Ask for data, Data Grid, Data delivered]
Using a Data Grid - Details • User asks for data • Data request goes to SRB Server • Server looks up information in catalog • Catalog tells which SRB server has data • 1st server asks 2nd for data • The data is found and returned [Diagram: Storage Resource Broker Server, Metadata Catalog (DB), Storage Resource Broker Server]
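The request path on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SRB code: `MetadataCatalog`, `BrokerServer`, the server names, and the logical path are all hypothetical, and the "network" is just a dictionary of peer objects.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: maps a logical file name to the server holding it."""
    locations: dict[str, str] = field(default_factory=dict)

    def locate(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class BrokerServer:
    """Hypothetical stand-in for a broker server with some local storage."""
    name: str
    catalog: MetadataCatalog
    peers: dict[str, "BrokerServer"] = field(default_factory=dict)
    local_store: dict[str, bytes] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        # Step 1: ask the catalog which server holds the data.
        owner = self.catalog.locate(logical_name)
        # Step 2: serve it locally, or ask the owning server on the user's behalf.
        if owner == self.name:
            return self.local_store[logical_name]
        return self.peers[owner].local_store[logical_name]


# The user only ever talks to server_a and never learns where the data lives.
catalog = MetadataCatalog({"/zone/home/run42.dat": "server_b"})
server_a = BrokerServer("server_a", catalog)
server_b = BrokerServer(
    "server_b", catalog, local_store={"/zone/home/run42.dat": b"payload"}
)
server_a.peers["server_b"] = server_b
data = server_a.get("/zone/home/run42.dat")
```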
Data Virtualization • Manage properties of each digital entity independently of the remote storage systems • Infrastructure independence • Properties of the shared collection • Name spaces • Persistent state information (location, size,…) • Manage standard operations • Map from client requests to standard operations • Map from standard operations to remote storage system protocol
Data Virtualization • Data is organized as a shared collection • Data Access Methods (C library, Unix, Web Browser) • Data Collection • Storage Repository: storage location, user name, file name, file context (creation date,…), access controls • Data Grid: logical resource name space, logical user name space, logical file name space, logical context (metadata), access constraints
Data Virtualization • Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system [Diagram layers: Access Interface, Standard Access Actions, Data Grid, Standard Micro-services, Storage Protocol, Storage System]
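One way to picture the mapping from standard operations to storage protocols is a small driver interface. The sketch below is an assumption-laden illustration: `StorageDriver`, `MemoryDriver`, `LocalFileDriver`, and `copy_between` are hypothetical names, and the two back ends merely stand in for real storage systems.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical set of standard operations every storage back end implements."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Back end that keeps data in a dictionary."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


class LocalFileDriver(StorageDriver):
    """Back end that maps the same operations onto a local file system."""

    def __init__(self, root: str) -> None:
        self._root = root

    def _full(self, path: str) -> str:
        return os.path.join(self._root, path.lstrip("/"))

    def write(self, path: str, data: bytes) -> None:
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as fh:
            fh.write(data)

    def read(self, path: str) -> bytes:
        with open(self._full(path), "rb") as fh:
            return fh.read()


def copy_between(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """A client-level action written only in terms of the standard operations."""
    dst.write(path, src.read(path))


memory = MemoryDriver()
memory.write("/collection/a.txt", b"hello")
disk = LocalFileDriver(tempfile.mkdtemp())
copy_between(memory, disk, "/collection/a.txt")
roundtrip = disk.read("/collection/a.txt")
```

Because `copy_between` never names a concrete back end, adding a new storage system in this sketch means writing one more driver class and nothing else.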
Standard Operations • File manipulation • POSIX I/O calls - open, close, read, write, seek, … • Register, replicate, checksum, synchronize • Bulk operations • Bulk data transport, metadata load • Parallel I/O streams • Remote procedures • Data filtering, subsetting, metadata extraction • Remote library execution (HDFv5, DataCutter)
BaBar High-Energy Physics • A functioning international Data Grid for high-energy physics • Sites: Stanford Linear Accelerator; IN2P3, Lyon, France; Rome, Italy; San Diego; RAL, UK • Manchester-SDSC mirror • Moved over 300 TB of data • Increasing to 5 TB per day
Next Generation Technology • Every fault that occurs in the distributed environment is the responsibility of the data grid • Network outage / system crash / operator error • Minimize risk through checksums, replicas, synchronization, federation • Management of large collections is labor intensive • Initiation of recovery operations after remote system failure • Need to automate execution of management policies
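As an illustration of minimizing risk through checksums and replicas, the following sketch validates each replica against a registered checksum and overwrites any corrupted copy from a good one. The function names, the resource names, and the choice of SHA-256 are assumptions made for the example, not details taken from SRB or iRODS.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Hex digest used here as the registered checksum (an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Overwrite corrupted replicas from a good copy; return the repaired resources."""
    good = next((d for d in replicas.values() if sha256(d) == expected), None)
    if good is None:
        raise RuntimeError("no replica matches the registered checksum")
    repaired = []
    for resource, data in list(replicas.items()):
        if sha256(data) != expected:
            replicas[resource] = good
            repaired.append(resource)
    return repaired


# Hypothetical resources; one replica has been silently corrupted.
original = b"detector events"
replicas = {
    "sdsc-disk": original,
    "ral-tape": b"detector evenXs",
    "lyon-disk": original,
}
fixed = repair_replicas(replicas, sha256(original))
```

Running a check like this on a schedule is the kind of recovery operation the slide argues should be automated rather than started by hand.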
Controlling Remote Operations • iRODS - integrated Rule-Oriented Data System • Support unique organizational / social management policies for each collection
Rule-based Data Management • Express assessment criteria through sets of required persistent state information • Express management policies as sets of rules controlling the execution of micro-services • Express capabilities as sets of micro-services • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
Management Virtualization • Examples of management policies • Integrity • Validation of checksums • Synchronization of replicas • Data distribution • Data retention • Access controls • Authenticity • Chain of custody - audit trails • Track required preservation metadata - templates • Generation of Archival Information Packages
Rule-based Data Management • Rules required for standard operations • POSIX I/O control • Standard SRB operations • Administrator controlled rules to implement management policies • Administrative - adding / deleting users, resources • Data ingestion - pre-processing, post-processing • Data transport / deletion - parallel I/O streams, disposition • User-defined rules, create your own server-side workflow • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
iRODS Rule • Each rule defines • Event • Condition • Action sets (micro-services and rules) • Recovery sets • Rule types • Atomic, applied immediately • Deferred, support deferred consistency constraints • Periodic, typically used to validate assertions
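The four parts of a rule (event, condition, action set, recovery set) can be sketched as a small data structure plus an executor. This is not the iRODS rule engine or its rule syntax: the `Rule` class, `apply_rule`, and the micro-service functions are hypothetical, and the recovery semantics shown (undo completed actions in reverse order) is one plausible choice assumed for the example.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # persistent state shared by micro-services (hypothetical shape)
MicroService = Callable[[State], None]


@dataclass
class Rule:
    event: str                          # what triggers the rule
    condition: Callable[[State], bool]  # must hold for the rule to fire
    actions: list[MicroService]         # action set, run in order
    recoveries: list[MicroService]      # recoveries[i] undoes actions[i]


def apply_rule(rule: Rule, event: str, state: State) -> bool:
    """Run the action set; on failure, undo the completed actions in reverse."""
    if event != rule.event or not rule.condition(state):
        return False
    done = 0
    try:
        for action in rule.actions:
            action(state)
            done += 1
    except Exception:
        for recovery in reversed(rule.recoveries[:done]):
            recovery(state)
        return False
    return True


def register(state: State) -> None:
    state["registered"] = True


def unregister(state: State) -> None:
    state["registered"] = False


def replicate(state: State) -> None:
    raise IOError("remote resource offline")  # simulated remote failure


def noop(state: State) -> None:
    pass


rule = Rule("put", lambda s: s["size"] > 0, [register, replicate], [unregister, noop])
state = {"size": 10}
ok = apply_rule(rule, "put", state)
```

In this run the replication step fails, so the earlier registration is rolled back and the persistent state ends up as if the rule had never fired.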
Rule-based Access • Associate security policies with each digital entity • Redaction, access controls on structures within a file • Time-dependent access controls (how long to hold data proprietary) • Associate access controls with each rule • Restrict ability to modify, apply rules • Associate access controls with each micro-service • Explicit control of operation execution within a given collection • Much finer control than provided by Unix r:w:x
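A time-dependent access control, such as a proprietary period, can be expressed as a check that takes the current date as input. The sketch below is only an illustration under assumed names: `AccessPolicy`, `may_read`, the user names, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AccessPolicy:
    owners: set[str]          # users who may always read the entity
    proprietary_until: date   # last day on which the data is held proprietary


def may_read(policy: AccessPolicy, user: str, today: date) -> bool:
    """Owners can always read; everyone else only after the proprietary period."""
    return user in policy.owners or today > policy.proprietary_until


policy = AccessPolicy(owners={"moore"}, proprietary_until=date(2008, 12, 31))
owner_early = may_read(policy, "moore", date(2008, 6, 1))
guest_early = may_read(policy, "guest", date(2008, 6, 1))
guest_late = may_read(policy, "guest", date(2009, 1, 1))
```

Passing `today` explicitly instead of reading the clock inside the function keeps the policy easy to test and to replay against an audit trail.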
Federation Between Data Grids • Data Access Methods (Web Browser, DSpace, OAI-PMH) • Data Collection A and Data Collection B each sit in their own Data Grid, and each Data Grid maintains: • Logical resource name space • Logical user name space • Logical file name space • Logical rule name space • Logical micro-service name • Logical persistent state
Rule-based Federation • When registering a digital entity into another data grid, register required management rules along with the digital entity • Move management policies with data • Expectation that each operation on each digital entity can be controlled across federated data grids • Example is end-to-end encryption
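The idea of moving management policies with the data can be sketched as a registration step that refuses an entity unless the receiving grid can enforce every rule attached to it. Everything here is hypothetical: the `DigitalEntity` and `DataGrid` classes, the rule names, and the grid names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    logical_name: str
    rules: tuple[str, ...]  # names of management rules that travel with the data


@dataclass
class DataGrid:
    name: str
    supported_rules: set[str]
    registry: dict[str, DigitalEntity] = field(default_factory=dict)

    def register_remote(self, entity: DigitalEntity) -> None:
        """Accept the entity only if every attached rule can be enforced here."""
        missing = set(entity.rules) - self.supported_rules
        if missing:
            raise ValueError(f"{self.name} cannot enforce: {sorted(missing)}")
        self.registry[entity.logical_name] = entity


entity = DigitalEntity("/gridA/survey.dat", ("encrypt_in_transit", "audit_access"))
secure_grid = DataGrid("gridB", {"encrypt_in_transit", "audit_access", "replicate"})
plain_grid = DataGrid("gridC", {"replicate"})

secure_grid.register_remote(entity)
try:
    plain_grid.register_remote(entity)
    rejected = False
except ValueError:
    rejected = True
```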
Evolution of Rule-based Systems • Logical name spaces enable dynamic addition of new rules, micro-services, and state information • Apply new rules on one collection while applying old rule sets on a legacy collection • Can run old and new rule sets in parallel • Can build a system that manages its evolution • Can create rules that track the evolution of the rule-based system • Can create rules that govern migration to new rule sets
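Running old and new rule sets side by side, and tracking migration between them, can be sketched with two lookup tables and a history log. The rule set names, collection paths, and retention values below are made up for the example and do not come from the slides.

```python
# Hypothetical rule sets, identified by logical names.
rule_sets = {
    "v1": {"retention_days": 365},
    "v2": {"retention_days": 3650},
}

# Each collection is bound to a rule set by name, so both versions run in parallel.
collection_rule_set = {"/legacy": "v1", "/new": "v2"}

# Record of every migration: (collection, old rule set, new rule set).
migration_history: list[tuple[str, str, str]] = []


def policy_for(collection: str) -> dict:
    """Resolve the rule set currently governing a collection."""
    return rule_sets[collection_rule_set[collection]]


def migrate(collection: str, target: str) -> None:
    """Switch a collection to another rule set and record the change."""
    if target not in rule_sets:
        raise KeyError(f"unknown rule set: {target}")
    migration_history.append((collection, collection_rule_set[collection], target))
    collection_rule_set[collection] = target


before = policy_for("/legacy")["retention_days"]
migrate("/legacy", "v2")
after = policy_for("/legacy")["retention_days"]
```

Binding collections to rule sets by name, rather than embedding the rules in each collection, is what lets a new rule set be added without touching legacy collections in this sketch.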
Assessment Rules • Can build a system that monitors its own state information • Parse audit trails to verify accesses by authorized persons • Parse persistent state information for compliance with management rules • Test micro-services for compliance with rules • Audit all accesses to a collection • Compare system properties to desired outcomes
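Parsing an audit trail to verify that every access was made by an authorized person reduces to a filter over access records. The record layout, user names, and collection paths in this sketch are assumptions, since the slides do not specify an audit format.

```python
def unauthorized_accesses(
    audit_trail: list[dict], authorized: dict[str, set[str]]
) -> list[dict]:
    """Return audit records whose user is not authorized for the collection."""
    return [
        record
        for record in audit_trail
        if record["user"] not in authorized.get(record["collection"], set())
    ]


# Hypothetical audit records and authorization table.
audit_trail = [
    {"user": "moore", "collection": "/archive", "action": "read"},
    {"user": "guest", "collection": "/archive", "action": "read"},
    {"user": "mwan", "collection": "/staging", "action": "write"},
]
authorized = {"/archive": {"moore", "sekar"}, "/staging": {"mwan"}}

violations = unauthorized_accesses(audit_trail, authorized)
```

A collection missing from the authorization table is treated as having no authorized users, so accesses to it are reported rather than silently accepted.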
For More Information • Reagan W. Moore • San Diego Supercomputer Center • moore@sdsc.edu • SRB: http://www.sdsc.edu/srb/ • iRODS: http://irods.sdsc.edu/