Storage Resource Broker: An Introduction
Arcot Rajasekar, SDSC/UCSD/NPACI

Data Architecture to Enable Collaboration
• Objectives
  • Foster collaboration
  • Provide controlled data sharing
  • Facilitate data-intensive computing
  • Provide data persistence
  • Support massive data scales
• Secure storage and access to petabytes of data
  • Containing several million data sets
  • Housing thousands of data collections
• Query metadata to locate files
• Access files regardless of location, operating system, media or hierarchy

Creating a Data Grid
• Provide uniform access to
  • Lots of data
  • Lots of metadata
  • Lots of locations
  • Heterogeneous data organizations
  • Lots of storage gadgetry
• Need a consistent and straightforward interface
  • Rich authentication, access control & auditing facilities
  • Logical collection creation & management
  • Transparent data management – movement, replication, placement
  • Extensible metadata management

SRB – A Data Grid Solution
• Storage Resource Broker
• Federated client-server system that integrates distributed heterogeneous resources using a uniform interface
• Provides a simple tool to integrate data and metadata handling – attribute-based access
• Blends browsing and searching
• Developed at SDSC; operational for 4 years; brokering over 40 terabytes

Case Study: Hayden Planetarium
[Diagram. Sites: NCSA, AMNH (NYC), CalTech, SDSC, UVa, BIRN. Systems: SGI, IBM SP2. Storage: UniTree, GPFS, HPSS. Data volumes shown: 2.5 TB, 7.5 TB, 7.5 TB. Activities: simulation, visualization, rendering.]

Data Involved in Hayden
• ISM (Interstellar Medium): simulation run by Mordecai Mac Low of AMNH at NCSA. 2.5 terabytes sent from NCSA to SDSC. Data stored in SRB (HPSS, GPFS).
• Ionization: simulation run at AMNH. 117 gigabytes sent from AMNH to SDSC. Data stored in SRB.
• Star motion: simulation run at AMNH by Ryan Wyatt. 38 megabytes sent from AMNH to SDSC.
• Rendering movies: intermediate steps produced 7.5 terabytes. Data stored in SRB (SDSC, CalTech).

Case Study: SRB in BIRN
[Diagram: BIRN Toolkit architecture. Labels: Collaboration, Applications, Queries/Results, Data Management, Viewing/Visualization, Mediator, GridPort, Grid Management, Data Model, Database, Scheduler, Data Grid, Computational Grid, NMI, SRB, Globus, MCAT, Data Access, HPSS, File System, Distributed Resources.]

Using a Data Grid – in Abstract
• User asks for data from the data grid
• The data is found and returned
• Where & how details are hidden
[Diagram labels: Ask for data, Data Grid, Data delivered.]

Using a Data Grid - Details
• User asks for data
• Data request goes to SRB server
• Server looks up data in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned
[Diagram labels: two Storage Resource Broker servers, Metadata Catalog, DB.]

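The request flow on this slide can be sketched as a toy model in Python. This is only an illustration of the lookup-and-forward idea, not SRB code: the `Catalog` and `Server` classes, their methods, and the sample logical name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy metadata catalog (hypothetical): logical name -> name of holding server."""
    locations: dict = field(default_factory=dict)

    def lookup(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    """Toy broker server (hypothetical) with local storage and a shared catalog."""
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)
    storage: dict = field(default_factory=dict)

    def store(self, logical_name, data):
        # Keep the bytes locally and record the location in the catalog.
        self.storage[logical_name] = data
        self.catalog.locations[logical_name] = self.name

    def get(self, logical_name):
        # The server asks the catalog which server holds the data.
        holder = self.catalog.lookup(logical_name)
        if holder == self.name:
            return self.storage[logical_name]
        # The first server asks the second one for the data.
        return self.peers[holder].get(logical_name)


catalog = Catalog()
first = Server("first", catalog)
second = Server("second", catalog)
first.peers["second"] = second
second.peers["first"] = first

second.store("/home/demo/results.dat", b"simulation output")
# The user only talks to `first`; the hop to `second` stays hidden.
data = first.get("/home/demo/results.dat")
```

The caller never names a host or a path on disk, which is the point the slide makes: where and how the data is stored is resolved behind the interface.
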
Using a Data Grid - Details
• Data grid has an arbitrary number of servers
• Complexity is hidden from users
[Diagram labels: six SRB servers, MCAT, DB.]

SDSC Storage Resource Broker & Meta-data Catalog
[Diagram labels:
• Application
• Access methods: Resource, User; Java, NT Browsers; Prolog Predicate; C, C++, Linux I/O; Unix Shell; Third-party copy; Web; User Defined
• SRB, Remote Proxies, MCAT
• Databases: DB2, Oracle, Sybase
• Archives: HPSS, ADSM, UniTree, DMF
• File Systems: Unix, NT, Mac OSX
• HRM, Dublin Core, DataCutter, Application Meta-data]

SRB Interface
[Diagram labels: Application (two), SRB Master, SRB Agent, SRB Server (three), MCAT, MCAT Core, Dublin Core, Eco Core.]

Federated SRB Operation
• Peer-to-peer brokering
• Server(s) spawning
• Parallel data access
• MCAT
  1. Logical-to-physical mapping
  2. Identification of replicas
  3. Access & audit control
[Diagram labels: Read; Application in Boston; Logical Name or Attribute Condition; SRB servers and SRB agents at Durham and San Diego; Data Access; replicas R1 and R2; numbered steps 1 to 6.]

SRB Concepts
• Abstraction of Data and Collections – Virtual Data Organization
  • Virtual collections: persistent identifier and global name space
  • Organization independent of physical location
• Virtual Data Management
  • Replication & segmentation
  • Data aggregation: containers
  • Seamless cache management and data placement
• Metadata & Data Discovery – semantic linking
  • System metadata: metadata needed to run a data grid
  • User-defined metadata: structural & descriptive
  • Application, schema-based, domain-centric; extensible and dynamic
  • Attribute-based access (path names become irrelevant)

SRB Concepts
• Abstraction of User Space – Global User Space
  • Single sign-on & seamless authorization
  • Certificates, (secure) passwords, tickets, group permissions, roles
• Abstraction of Methods
  • APIs, command line, GUI browsers, web access (portal, WSDL, CGI)
  • Parallel access with both client-driven and server-driven strategies
  • Fault-tolerant and reliable data management
  • Proxy and remote operations
• Abstraction of Resources – Resource Virtualization
  • Resource location, type & access transparency
  • Logical resource definitions – bundling
[Diagram labels: Virtual Data Grid (SRB), MCAT, Duke, NCMIR, UCLA, CalTech, SDSC.]

SRB Features
• Resource transparency: uniform access interface
• Location transparency: logical naming in collections
• Cross-domain authentication: single account management
• Rich access control: users, groups, resources
• User transparency: uniform user name space
• Replicated data: fault tolerant
• Data discovery: user-defined metadata
• Multiple access methods: GUI, API, command line
• Supported on heterogeneous platforms: Unix, Windows, Linux
[Diagram labels: Client (six), SRB, and nodes labeled DR, DL and MC.]

Authentication Management
• GSI
• Encrypted password
• GSS-API for Kerberos or DCE
• Collection-owned data
  • Collection ID installed at each storage system
  • Users authenticate themselves to the SRB
  • SRB authenticates to local server

Attributes
• SRB metadata
• Location, protocol
• Unix semantics
• Authorization, authentication
• Latency management
• Container aggregation
• Administrative
• Dublin Core, provenance
• Annotations, comments
• Discipline-specific attributes
• Collection
• User defined

Logical File Name
One of the major functions of SRB is the mapping between a logical file name and its physical file. The mapped information for a logical file name includes:
• Location of name in collection hierarchy
• Physical file location: host name and path
• Protocol: for fetching the 'local' file
• Unix semantics for file manipulation
• Location in container
• Audit trail
• Access control list
• Locking status

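As a rough illustration of the mapping described on this slide, the record below carries the listed fields for one logical file name. It is a sketch under assumptions, not the MCAT schema: the class name, field names, `register` and `resolve` helpers, and the sample host and paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogicalFileEntry:
    """Hypothetical catalog record for one logical file name."""
    collection: str                          # location in the collection hierarchy
    name: str
    host: str                                # physical file location: host name
    physical_path: str                       # physical file location: path
    protocol: str                            # how the 'local' file is fetched
    container_offset: Optional[int] = None   # location in container, if any
    acl: dict = field(default_factory=dict)  # access control list
    locked: bool = False                     # locking status
    audit_trail: list = field(default_factory=list)

    @property
    def logical_name(self):
        return f"{self.collection.rstrip('/')}/{self.name}"


namespace = {}  # logical name -> LogicalFileEntry


def register(entry):
    namespace[entry.logical_name] = entry


def resolve(logical_name, user):
    """Map a logical name to (protocol, host, path) and record the access."""
    entry = namespace[logical_name]
    entry.audit_trail.append((user, "resolve"))
    return entry.protocol, entry.host, entry.physical_path


register(LogicalFileEntry(
    collection="/demo/hayden",
    name="ism-0001.dat",
    host="archive.example.org",
    physical_path="/hpss/vol7/a91f.dat",
    protocol="hpss",
))
location = resolve("/demo/hayden/ism-0001.dat", user="alice")
```

The physical path can change (for example after a migration) without the logical name changing, because callers only ever hold the logical name.
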
Digital Entities
Digital entities which can be registered into a logical name space are:
• Files/directories
• “Outside” files & directories (shadow links)
• URLs
• SQL commands
• Relational databases

Data Access Control
• Access control for data & collections
  • User-level access control
  • Domain-level access control
  • Group-level access control
• Ticket-based data access
• Multi-level access: Read, Annotate, Write, Curate, Own
• Audit access

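One way to picture multi-level access combined with user, domain and group grants is sketched below. Two assumptions are made that the slide does not state: the five levels are treated as strictly ordered (each level implies the ones before it), and a user's effective level is the highest grant found among their user, domain and group entries. The `Access` enum, the ACL layout, and both functions are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access levels from the slide; the strict ordering is an assumption."""
    NONE = 0
    READ = 1
    ANNOTATE = 2
    WRITE = 3
    CURATE = 4
    OWN = 5


def effective_access(user, domain, groups, acl):
    """Highest level granted through the user, domain or any group entry."""
    keys = [("user", user), ("domain", domain)]
    keys += [("group", group) for group in groups]
    return max(acl.get(key, Access.NONE) for key in keys)


def is_allowed(required, user, domain, groups, acl):
    return effective_access(user, domain, groups, acl) >= required


# Hypothetical ACL for one data object: (kind, name) -> granted level.
acl = {
    ("user", "alice"): Access.OWN,
    ("domain", "sdsc"): Access.READ,
    ("group", "curators"): Access.CURATE,
}

bob_can_write = is_allowed(Access.WRITE, "bob", "sdsc", ["curators"], acl)
carol_can_write = is_allowed(Access.WRITE, "carol", "sdsc", [], acl)
carol_can_read = is_allowed(Access.READ, "carol", "sdsc", [], acl)
```

Here `bob` gains write access through the `curators` group, while `carol` only inherits the domain-level read grant.
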
Storage Resource Management
• Plug-and-play model
  • Well-defined framework for developing a new driver for a new storage system
  • Easy registration of a new resource into SRB
• Each physical resource has a logical name which is mapped to host name and resource type
• Resource access control: r/w permission for user, domain, or group

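The plug-and-play idea (a driver per storage system type, plus a logical resource name mapped to host and type) can be sketched with an abstract base class and two registries. This is an illustrative pattern only; the `StorageDriver` interface, the registry names, and the sample resource are hypothetical and do not reflect the actual SRB driver framework.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface that every storage type implements."""

    @abstractmethod
    def read(self, path):
        ...

    @abstractmethod
    def write(self, path, data):
        ...


class InMemoryDriver(StorageDriver):
    """Stand-in driver that keeps files in a dict instead of real storage."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = bytes(data)


DRIVERS = {}    # resource type -> driver class
RESOURCES = {}  # logical resource name -> (host, resource type, driver instance)


def register_driver(resource_type, driver_cls):
    if not issubclass(driver_cls, StorageDriver):
        raise TypeError("driver must implement StorageDriver")
    DRIVERS[resource_type] = driver_cls


def register_resource(logical_name, host, resource_type):
    """Map a logical resource name to a host and a resource type."""
    driver = DRIVERS[resource_type]()
    RESOURCES[logical_name] = (host, resource_type, driver)
    return driver


register_driver("memory", InMemoryDriver)
driver = register_resource("demo-cache", "cache.example.org", "memory")
driver.write("/tmp/a.dat", b"abc")
host, resource_type, same_driver = RESOURCES["demo-cache"]
```

Supporting a new storage system then means writing one new `StorageDriver` subclass and registering it; code that addresses resources by logical name does not change.
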
Replica Management
• Files can be replicated into any valid physical storage resource registered in SRB.
• Each replica is managed by the same logical filename as the original one and a unique replication number.
• Each replica can have unique metadata.
• 1-to-many replication: a logical resource can contain several physical storage resources.
• Multiple replicas can be made to the same storage resource.
• Many modes of replication:
  • Synchronous replication
  • Asynchronous replication (offline)
  • Out-of-band replication (outside SRB)

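The bookkeeping described above (one logical name, a unique replica number per copy, per-replica metadata, and a logical resource that fans out to several physical resources) might look roughly like the sketch below. The `ReplicaCatalog` class and resource names are hypothetical, and numbering replicas from 0 is an assumption made for the example.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical replica bookkeeping keyed by logical file name."""

    def __init__(self):
        # logical name -> {replica number: {"resource": ..., "metadata": ...}}
        self._replicas = defaultdict(dict)

    def add_replica(self, logical_name, resource, metadata=None):
        replicas = self._replicas[logical_name]
        # Next unused number; starting at 0 is an assumption of this sketch.
        number = max(replicas, default=-1) + 1
        replicas[number] = {"resource": resource, "metadata": dict(metadata or {})}
        return number

    def replicate_to_logical_resource(self, logical_name, physical_resources):
        """1-to-many: one replica per physical resource behind a logical resource."""
        return [self.add_replica(logical_name, r) for r in physical_resources]

    def replicas(self, logical_name):
        return dict(self._replicas[logical_name])


catalog = ReplicaCatalog()
name = "/demo/hayden/ism-0001.dat"

original = catalog.add_replica(name, "hpss-sdsc", {"note": "original"})
fanned_out = catalog.replicate_to_logical_resource(name, ["gpfs-sdsc", "unix-caltech"])
# A second copy on a resource that already holds one is allowed.
extra = catalog.add_replica(name, "hpss-sdsc", {"note": "second copy"})
```

Every copy is reached through the same logical name; only the replica number and its own metadata distinguish one from another.
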
Containers
• Physical grouping of objects
  • Similar to tar but has significant differences
• Multiple uses:
  • To take advantage of resource characteristics
  • To aid access patterns
  • Move data sets together
  • Tie together logically different files
• Automatic archiving/caching
• Chaining of containers
• Sharing of metadata
• Containers for collections

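A minimal sketch of physical grouping: several small objects are appended to one blob, and an index records where each one sits so it can be read back individually. The `Container` class is hypothetical and says nothing about the real SRB container format; keeping the offsets in a separate index is an assumption suggested by the "location in container" field listed for logical file names.

```python
import io


class Container:
    """Hypothetical container: many objects stored in one physical blob."""

    def __init__(self):
        self._blob = io.BytesIO()
        self._index = {}  # logical name -> (offset, length)

    def add(self, logical_name, data):
        # Append at the end of the blob and remember where the object starts.
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(data)
        self._index[logical_name] = (offset, len(data))

    def read(self, logical_name):
        # Jump straight to the object without scanning the others.
        offset, length = self._index[logical_name]
        self._blob.seek(offset)
        return self._blob.read(length)

    def size(self):
        return len(self._blob.getvalue())


container = Container()
container.add("/demo/a.txt", b"hello")
container.add("/demo/b.txt", b"world!")

first_object = container.read("/demo/a.txt")
second_object = container.read("/demo/b.txt")
total_bytes = container.size()
```

Moving or archiving the single blob moves all member objects together, which matches the "move data sets together" use listed on the slide.
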
Metadata Management
• Metadata insertion through user interfaces
• Bulk metadata insertion from XML files
• Template-based metadata extraction
• Metadata search
  • System data
  • User-defined metadata
• File content search: keywords are pre-extracted by a template and saved as user-defined metadata.

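Template-based extraction followed by attribute search can be illustrated with regular expressions. This sketch assumes a template is a set of named patterns applied to file content; the real SRB template language is not shown in the slides, and the `TEMPLATE`, `ingest` and `search` names and the sample files are hypothetical.

```python
import re

# Hypothetical template: attribute name -> pattern capturing its value.
TEMPLATE = {
    "instrument": re.compile(r"^Instrument:\s*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
}


def extract(text, template):
    """Pull attribute values out of file content using the template."""
    metadata = {}
    for attribute, pattern in template.items():
        match = pattern.search(text)
        if match:
            metadata[attribute] = match.group(1).strip()
    return metadata


user_metadata = {}  # logical name -> extracted attributes


def ingest(logical_name, text):
    user_metadata[logical_name] = extract(text, TEMPLATE)


def search(**conditions):
    """Return logical names whose metadata matches every condition."""
    return sorted(
        name
        for name, metadata in user_metadata.items()
        if all(metadata.get(key) == value for key, value in conditions.items())
    )


ingest("/demo/sky/tile1.hdr", "Instrument: 2MASS\nDate: 2001-05-01\n")
ingest("/demo/sky/tile2.hdr", "Instrument: 2MASS\nDate: 2001-05-02\n")
ingest("/demo/notes.txt", "No structured header here\n")

hits = search(instrument="2MASS")
one_day = search(instrument="2MASS", date="2001-05-02")
```

Because the keywords are extracted once at ingest time, a later content search only consults the stored attributes and never reopens the files.
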
Software
• SRB Federated Server Architecture
  • Uniform access interface – thread-safe
• Client
  • Programmatic API (C, C++, Java, Perl-through-C, Python)
  • GUI (Java for Unix; Windows browser inQ [NT, Me, 98, 2000])
  • Web support (CGI scripts, portals)
  • Command-line interface (Unix, DOS)
• Metadata Catalog (Oracle, DB2, Sybase, SQLServer)
  • Handles transparencies, authentication, access control, replication, container support, …
  • User-defined metadata
• Multi-platform support
  • Unix, Linux, Windows, MacOSX (from Cray to desktop)
  • HPSS, ADSM, UniTree, …, UnixFS, NTFS, …, Oracle, DB2, …

SRB Projects
• Digital Libraries
  • UCB, Umich, UCSB, Stanford, CDL
  • NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• Astronomy
  • National Virtual Observatory
  • 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
  • Particle Physics Data Grid (DOE)
  • GriPhyN
  • SLAC Synchrotron Data Repository
• Medicine
  • Digital Embryo (NLM)
• Earth Systems Sciences
  • ESIPS
  • LTER
• Persistent Archives
  • NARA
  • LOC
• Neuro Science & Molecular Science
  • TeleScience/NCMIR, BIRN
  • SLAC, AfCS, …
• Over 40 terabytes in 6.6 million files

Contacts
For additional information:
• Web: http://www.npaci.edu/dice/srb
• Mail: srb@sdsc.edu