Production Data Grids SRB - iRODS. Storage Resource Broker. Reagan W. Moore moore@sdsc.edu http://www.sdsc.edu/srb. Topics. Production data grids Architecture and installation challenges Production challenges Interoperability challenges (federation) Applications Data grids - sharing
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore moore@sdsc.edu http://www.sdsc.edu/srb
Topics • Production data grids • Architecture and installation challenges • Production challenges • Interoperability challenges (federation) • Applications • Data grids - sharing • Digital libraries - publication • Persistent archive - preservation • Real-time sensor data - collection • Cyberinfrastructure - analysis
BaBar High-Energy Physics • Stanford Linear Accelerator, Palo Alto, CA • IN2P3, Lyon, France • A functioning international data grid for high-energy physics • Manchester-SDSC mirror • Moved over 300 TB of data • Increasing to 5 TB per day
Architecture Challenges • Infrastructure heterogeneity • Storage in file systems, archives, ORBs • Choice of database for metadata catalog • Network devices • Management of firewalls, private virtual networks, load levelers • Network latency • Geographic distance between storage locations
Installation Choices • Infrastructure heterogeneity • Provision of drivers for each type of storage system or database • Porting of APIs for each preferred access mechanism • Network devices • Establishment of range of ports for access through firewall • Server-initiated parallel I/O and bulk operations • Network latency • Master-slave metadata catalogs • Federation across multiple independent data grids
Using a Data Grid – in Abstract • User asks for data from the data grid • The data is found and returned • Where & how details are hidden [Figure: user sends "Ask for data" to the Data Grid and receives "Data delivered"]
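The abstraction on this slide can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not the SRB or iRODS API: the class `ToyDataGrid`, its methods `register` and `get`, and the example logical path and storage names are all hypothetical.

```python
class ToyDataGrid:
    """Toy model: a catalog maps logical names to physical replicas (hypothetical)."""

    def __init__(self):
        # Stand-in for storage systems: storage name -> {physical path -> bytes}
        self.storage = {}
        # Stand-in for the metadata catalog: logical name -> [(storage, path), ...]
        self.catalog = {}

    def register(self, logical_name, storage_name, physical_path, data):
        """Store one replica and record its location under the logical name."""
        self.storage.setdefault(storage_name, {})[physical_path] = data
        self.catalog.setdefault(logical_name, []).append((storage_name, physical_path))

    def get(self, logical_name):
        """Return the data from the first replica that is still present."""
        for storage_name, physical_path in self.catalog.get(logical_name, []):
            store = self.storage.get(storage_name, {})
            if physical_path in store:
                return store[physical_path]
        raise KeyError(logical_name)


grid = ToyDataGrid()
grid.register("/home/demo/run42.dat", "archive-a", "/tape/0001", b"event data")
grid.register("/home/demo/run42.dat", "disk-b", "/vol1/run42", b"event data")

# The caller names only the logical file; where and how it is stored stays hidden.
data = grid.get("/home/demo/run42.dat")
```

The caller never sees "archive-a" or "/tape/0001"; if one replica disappears, `get` falls through to the next one recorded in the catalog.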
Data Grid Management • Data grids integrate multiple system components • Application level client software • Federation software • Data grid servers • Data grid metadata catalog • Security infrastructure • Storage systems • Database catalog • Network • A failure in any of the systems is viewed as a failure of the data grid
Operation Challenges • Data grids provide mechanisms to analyze all types of infrastructure failure • Integrity checks • Authenticity checks • System logs • Data grids provide mechanisms to manage all types of infrastructure failure • Replication of data and metadata • Synchronization of replicas • Federation of data grids • Server rebooting and server maintenance mode
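As a minimal sketch of how an integrity check and replica synchronization might fit together: a checksum is recorded at ingest, each replica is compared against it, and damaged copies are overwritten from an intact one. The use of SHA-256, the function names, and the site names are assumptions made for illustration; the slides do not say which checksum or procedure SRB uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used for the integrity check (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(replicas: dict, expected: str) -> list:
    """Return the names of replicas whose checksum differs from the recorded one."""
    return sorted(name for name, data in replicas.items() if checksum(data) != expected)


def synchronize(replicas: dict, expected: str) -> list:
    """Overwrite corrupt replicas from an intact copy; return the names repaired."""
    good = next((d for d in replicas.values() if checksum(d) == expected), None)
    if good is None:
        raise RuntimeError("no intact replica left to repair from")
    bad = find_corrupt_replicas(replicas, expected)
    for name in bad:
        replicas[name] = good
    return bad


original = b"detector run 42"
recorded = checksum(original)  # value that would be stored in the metadata catalog
replicas = {"site-a": original, "site-b": b"detector run 4X", "site-c": original}
repaired = synchronize(replicas, recorded)
```

Repair is only possible while at least one replica still matches the recorded checksum, which is one reason replication and integrity checks are listed together.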
Operation Procedures • Periodic system administration • Manage integrity checks on data • Manage audit trails • Manage consistency checks on collections • Manage synchronization of replicas • Manage deletion of files (empty trash can) • Track all errors and reported data losses • Manage upgrades to new versions of the data grid servers • Operational tasks for each data grid • Add servers for new storage systems • Add new users • Respond to user questions • Modify access controls on collections and storage • Restart data grid servers as needed • Identify problems with storage systems • Respond to installation questions • Integrate user interfaces with data grid
Automation of Management Tasks • integrated Rule-Oriented Data System - iRODS • Express management policies as rules that control the execution of micro-services • Micro-service is a standard operation performed on a remote storage system • Manage persistent state information that describes outcome of the micro-service • Persistent Metadata catalog stores state information • Virtualize the management policies • Logical name space for rules • Logical name space for micro-services • Logical name space for state information • First release in December 2006
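The relationship between rules, micro-services, and persistent state can be illustrated with a toy engine. This is a sketch under stated assumptions, not the iRODS rule language or API: `ToyRuleEngine`, the event name `on_ingest`, and the two micro-services are hypothetical, and a Python list stands in for the persistent metadata catalog.

```python
class ToyRuleEngine:
    """Toy model: a rule names the micro-services to run for an event (hypothetical)."""

    def __init__(self):
        self.micro_services = {}  # logical micro-service name -> callable
        self.rules = {}           # logical rule (event) name -> ordered micro-service names
        self.state = []           # stand-in for the persistent state catalog

    def add_micro_service(self, name, func):
        self.micro_services[name] = func

    def add_rule(self, event, micro_service_names):
        self.rules[event] = list(micro_service_names)

    def fire(self, event, context):
        """Run each micro-service named by the rule and record its outcome."""
        for name in self.rules.get(event, []):
            outcome = self.micro_services[name](context)
            self.state.append((event, name, outcome))
        return context


def record_size(ctx):
    """Micro-service: note the size of the ingested object."""
    ctx["size"] = len(ctx["data"])
    return "ok"


def replicate(ctx):
    """Micro-service: pretend to create one more replica."""
    ctx["replicas"] = ctx.get("replicas", 1) + 1
    return "ok"


engine = ToyRuleEngine()
engine.add_micro_service("record_size", record_size)
engine.add_micro_service("replicate", replicate)
engine.add_rule("on_ingest", ["record_size", "replicate"])
result = engine.fire("on_ingest", {"data": b"abc"})
```

Because the rule refers to micro-services only by logical name, the policy (`add_rule`) can change without touching the micro-service code, which is the virtualization the slide describes.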
iRODS - integrated Rule-Oriented Data System [Architecture diagram. Component labels: Client Interface; Admin Interface; Rule Invoker; Service Manager; Resources; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Rule Consistency Check Module; Consistency Check Module (two further instances); Engine; Rule Base; Current State; Confs; Metadata Persistent Repository; Resource-based Micro-services; Metadata-based Micro-services; Micro Service Modules]
Interoperation Virtualization • Management of federation with other data grid technologies • Define micro-service that executes the protocols required by the alternate data grid • Define rule for when this micro-service is executed (link to explicit storage location) • Separately manage state information from application of this micro-service • iRODS enables encapsulation of the rules, access mechanisms, and state information needed for interoperation with other data grids
Federation Between Data Grids • Data access methods (Web Browser, DSpace, OAI-PMH) • Data Grid (Data Collection A) • Logical resource name space • Logical user name space • Logical file name space • Logical persistent state name space • Logical rule name space • Logical micro-service name space • Data Grid (Data Collection B) • Logical resource name space • Logical user name space • Logical file name space • Logical persistent state name space • Logical rule name space • Logical micro-service name space • Access controls and consistency constraints on cross registration of logical name spaces
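Cross registration of one logical name space (users) between two independent grids might look like the following sketch. Everything here is an illustrative assumption rather than SRB or iRODS behavior: the class `ToyZone`, the `user@zone` naming form, and the specific consistency check are hypothetical.

```python
class ToyZone:
    """One independent data grid with its own user and file name spaces (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.local_users = set()
        self.remote_users = set()  # cross-registered users, stored as "user@zone"
        self.files = {}            # logical file name -> bytes
        self.acl = {}              # logical file name -> set of qualified user names

    def add_user(self, user):
        self.local_users.add(user)

    def cross_register_user(self, user, home_zone):
        """Consistency constraint: only users known to their home zone may be registered."""
        if user not in home_zone.local_users:
            raise ValueError(f"{user} is not a user of {home_zone.name}")
        self.remote_users.add(f"{user}@{home_zone.name}")

    def put(self, logical_name, data, readers):
        self.files[logical_name] = data
        self.acl[logical_name] = set(readers)

    def read(self, logical_name, qualified_user):
        """Access control: the user must be known here and listed on the file's ACL."""
        local = {f"{u}@{self.name}" for u in self.local_users}
        known = qualified_user in local or qualified_user in self.remote_users
        if not known or qualified_user not in self.acl.get(logical_name, set()):
            raise PermissionError(f"{qualified_user} may not read {logical_name}")
        return self.files[logical_name]


zone_a = ToyZone("zoneA")
zone_b = ToyZone("zoneB")
zone_a.add_user("alice")
zone_b.add_user("bob")

# zoneB accepts alice from zoneA, then grants her read access to one file.
zone_b.cross_register_user("alice", zone_a)
zone_b.put("/zoneB/survey.dat", b"sky survey", readers={"alice@zoneA"})
shared = zone_b.read("/zoneB/survey.dat", "alice@zoneA")
```

Each zone keeps control of its own name spaces: zoneB decides which remote users to register and which files they may read, and zoneA is never modified.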
For More Information Reagan W. Moore San Diego Supercomputer Center moore@sdsc.edu http://www.sdsc.edu/srb/