Storage Resource Broker
Reagan Moore, Arcot Rajasekar*, Mike Wan*, George Kremenek*, Bing Zhu, Charles Cowart*, Sheau-Yen Chen, Roman Olschanowsky

Agenda
- Introduction
- Application Interfaces
- User Interfaces
- Demo of Commands
- Demo of Windows SRB Browser
- Demo of Metadata Access
- Introduction to Admin Tools
- Accounts & Access
- Discussion & Hands On Lab

Introduction
- Background
- Problems with Data Handling Environment
- Problems with Metadata Handling
- SRB and MCAT as a Solution
- Features of the SRB System
- Introduction to Extensible MCAT

Background
- Handle 100s of Millions of Datasets
- Handle Petabytes of Data
- Integrate Data Collections
- Handle Metadata for Collections
- Provide Information Discovery
- Handle Legacy Data and Methods

Large Data Projects
- Astronomy: National Virtual Observatory (ITR prop)
  - Integrate 18 sky surveys
  - 2MASS (2 Micron All Sky Survey): 10TB; 5 million files
  - LSST (Large-scale Synoptic Survey Telescope): 3TB/night
  - Co-locate Images for Spatial Access
- Particle Physics: GriPhyN (NSF ITR proj)
  - CERN LHC (Large Hadron Collider): 1PB/yr (1 billion obj)
  - Multi-Lab integration

Projects (Contd.)
- Molecular Science: (NPACI proj)
  - SLAC: 2MB/sec; 2TB/yr
  - Privacy Issues
- Biology: Functional MRI Integration (NIH prop)
  - Integrate multiple MRI Facilities; 1TB/dataset
- Medicine: Digital Embryo (NIH proj)
  - Collection Management; 500TB
- Education: NSDL
  - Federate over 40 Existing Collections

Projects (Contd.)
- Earth System Sciences
  - ESIPS (Earth System Information Providers)
  - Build and Federate Multiple Digital Libraries
  - EOSDIS: 7PB by 2007; 3-15MB/sec
  - LTER (Long Term Ecological Research)
  - BioComplexity (KDI)
  - Federate Collections from 20+ sites
- Persistent Archives: NARA project
  - Store and Recover Data after 400 years
  - 5 million emails; 33 million web pages
  - 90 million personnel records; …

Data Handling Problems
- Large Datasets; Large Number of Datasets
- Distributed, Heterogeneous Storage
- Collaboration, Access Control, Authentication
- Replication, Coherency
- Caching and Data Placements
- Data Migration over Time and Space
- Fault Tolerance and Load Distribution
- Collection Curation and Management
- Uniform Name Space Management

Metadata Problems
- Large Number of Attributes; Large Size
- Standardized Metadata
- User-defined Metadata
- Federation - integration over space
- Evolution - integration over time
- Presentation
- Extraction and Maintenance

Metadata Problems
- Integration
  - Schema Integration and Crosswalks
  - Ontological Differences
  - Context Dependency
  - Attribute Mappings
  - Type/Semantic Conversion
  - Inter-domain & Intra-domain Integration

Solution SRB
- Resource Transparency
  - Local or Remote, Resource Type & Access Method
- Location Transparency
  - Path Names, Schemas
- Cross-Domain Authentication & Access Control
- Uniform User Name Space
- Uniform Data/Collection Name Space
- Data Discovery – User-defined Metadata
- Scalable System

Solution SRB
- Federated Server Architecture
- Uniform Access Interface – API
- Metadata Catalog
  - Handles transparency and access control
- Proxy Super User – access to remote users
- Integration of Data Handling & Digital Library Functionalities
- Replicated Data Management

[Diagram: SRB architecture. Labels: Application; SRB Interface; SRB Master; SRB Agent; SRB Server; MCAT; MCAT Core; Dublin Core; Eco Core.]

[Diagram: Federated SRB Operation. Labels: Application; SRB Master; SRB agent (two); MCAT; numbered steps 1 to 6.]

[Diagram: SRB Space. Labels: Client; SRB; DR; DL; MC. Legend: DR - Data Repository; DL - Dig Library; MC - Meta Catalog.]

Features
- Access to Heterogeneous Resources
- Concept of Collections
- Concept of Logical Resources
- Replication Support
- Container Support
- Proxy Operation Support
- Support for Methods
- Users and Groups
- Extensive Access Control
- Multiple Authentication Schemes
- User-defined Metadata
- Information-based Access
- Multiple Platform Support
- Rich Interfaces
- Administration

Resources
- Resources Supported
  - HPSS (DCE and DCE-less), UniTree, ADSM, DMF, Unix FS, Mac OSX FS, NTFS, Oracle, DB2, Sybase, DPSS, HTTP, FTP
  - 32 & 64-bit File Sizes
  - Database Tables and LOBs
- Platforms/OS Supported
  - Cray, Sun, SGI, AIX, DEC, Linux, MacOSX, NT, 2000, 98*, Me*
  - * Browser only

Collections
- Logical Abstraction of Directories/Folders
- Not tied to a host or file system
- Independent of Path Names
- Can have datasets in Multiple Resources
- Access Controlled
- Curator of Collections/Sub-collections
- Collection-level metadata

Logical Resources
- Group of Multiple Physical Resources
- Resource Metadata: type, bandwidth, …
- Resource Class: Archival, Permanent, Cache, Volatile, Primary, Secondary
- Various Usage Modes
  - Automatic Replication
  - Choice (m of n resources)
  - Round Robin (load balancing)
  - Distributed Archive-Cache System
  - Near-Far System
  - Container Movement

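The "Choice (m of n resources)" and "Round Robin" usage modes can be pictured with a small toy model. This is only a sketch: the class `LogicalResource`, its `pick` method, and the resource names are hypothetical and are not part of the SRB API.

```python
from typing import List


class LogicalResource:
    """Toy model of a named group of physical resources (hypothetical, not the SRB API)."""

    def __init__(self, name: str, physical: List[str]) -> None:
        if not physical:
            raise ValueError("a logical resource needs at least one physical resource")
        self.name = name
        self.physical = list(physical)
        self._next = 0  # index where the next round-robin selection starts

    def pick(self, m: int = 1) -> List[str]:
        """Choose m of the n physical resources, rotating the start point to spread load."""
        n = len(self.physical)
        if not 1 <= m <= n:
            raise ValueError("m must be between 1 and n")
        chosen = [self.physical[(self._next + i) % n] for i in range(m)]
        self._next = (self._next + m) % n
        return chosen


# Illustrative resource names only.
archive_pool = LogicalResource("archive-pool", ["hpss-a", "unitree-b", "unix-c"])
first_targets = archive_pool.pick(2)   # two of three resources for one write
second_targets = archive_pool.pick(2)  # the next write starts where the last one stopped
```

Each call to `pick` returns the physical resources that one write would replicate to, under the assumption that load balancing simply rotates through the group.
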
Replication
- Core Functionality
- Synchronous Replication
  - Replication via Logical Resource definition
  - integrated into open/create & write function
  - Can choose: k out of n
  - Associate replication with containers/collections
  - Consistency
- Asynchronous Replication - Offline
  - srbObjReplicate API, Sreplicate command, GUI
- Out of Band Replication - outside SRB
  - Registering of Replicas using srbRegisterReplica API

Replication (Contd.)
- Choice at Read
  - any replica
  - specific replica (by copy number)
  - round-robin
  - by resource characteristics
  - by timestamp or other characteristics
- data itself may be identified by meta characteristics
  - user defined metadata & annotations
  - data type, owner, comments, ...

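The read-time choices listed here (any replica, a specific copy number, by resource characteristics, by timestamp) could be sketched as a simple filter over replica records. The `Replica` fields and the `choose_replica` function are assumptions made for illustration; they do not reflect the MCAT schema or any SRB call.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replica:
    # Hypothetical record; field names are illustrative, not MCAT attributes.
    copy_number: int
    resource: str
    resource_class: str
    timestamp: float


def choose_replica(
    replicas: Iterable[Replica],
    copy_number: Optional[int] = None,
    resource_class: Optional[str] = None,
    newest: bool = False,
) -> Replica:
    """Pick one replica; with no criteria, any replica (here the first) is returned."""
    candidates = list(replicas)
    if copy_number is not None:
        candidates = [r for r in candidates if r.copy_number == copy_number]
    if resource_class is not None:
        candidates = [r for r in candidates if r.resource_class == resource_class]
    if not candidates:
        raise LookupError("no replica matches the requested criteria")
    if newest:
        return max(candidates, key=lambda r: r.timestamp)
    return candidates[0]
```

A caller might ask for `choose_replica(replicas, resource_class="Cache")` to prefer a fast copy, or `newest=True` to read the most recently written one.
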
Containers
- Physical Grouping of Objects
- Similar to tar but has significant differences
- Multiple Uses:
  - To take advantage of resource characteristics
  - To aid access patterns
  - Move data sets together
  - Tie together logically different files
- Automatic Archiving/Caching
- Chaining of Containers
- Sharing of metadata
- Containers for Collections

Containers (Contd.)
1. Create Container
2. Write File 1
3. Write File 2
4. Sync Container
5. Read File 1
6. Write File 3
7. Sync & Purge

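The seven steps can be walked through with a toy in-memory model. It assumes that writes land on a cache copy, that "sync" copies the cache to an archival copy, and that "purge" then empties the cache; the `Container` class below is hypothetical and only illustrates that reading of the slide.

```python
from typing import Dict


class Container:
    """Toy container with a cache copy and an archival copy (assumed semantics)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}    # files currently on the cache resource
        self.archive: Dict[str, bytes] = {}  # files stored on the archival resource
        self.dirty = False                   # True when the cache holds unsynced writes

    def write(self, file_name: str, data: bytes) -> None:
        self.cache[file_name] = data
        self.dirty = True

    def sync(self, purge: bool = False) -> None:
        """Copy the cache to the archive; optionally drop the cache copy afterwards."""
        self.archive.update(self.cache)
        self.dirty = False
        if purge:
            self.cache.clear()

    def read(self, file_name: str) -> bytes:
        if file_name not in self.cache:
            # Stage the whole container back from the archive, not just one file.
            self.cache.update(self.archive)
        return self.cache[file_name]


c = Container("demo-container")   # 1. Create Container
c.write("file1", b"first")        # 2. Write File 1
c.write("file2", b"second")       # 3. Write File 2
c.sync()                          # 4. Sync Container
data1 = c.read("file1")           # 5. Read File 1
c.write("file3", b"third")        # 6. Write File 3
c.sync(purge=True)                # 7. Sync & Purge
```

After the last step every file lives only on the archival copy, and a later read stages the full container back into the cache.
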
Access Control
- Access Control
  - Datasets
  - Collections
  - Resources
- Multi-level Access
  - Read, Annotate, Write, Curate, Own
- Access Control for Users and Groups
- Ticket-Based Access Control
- Audit Access

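One way to picture multi-level access for users and groups is an ordered enumeration. The sketch assumes that the five levels are cumulative in the order listed and that a user receives the highest level granted to the user or to any of the user's groups; neither assumption is stated on the slide, and the names below are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Access(IntEnum):
    # Assumed ordering: each level includes the ones before it.
    NONE = 0
    READ = 1
    ANNOTATE = 2
    WRITE = 3
    CURATE = 4
    OWN = 5


def effective_access(user: str, groups: Iterable[str], grants: Dict[str, Access]) -> Access:
    """Highest level granted to the user directly or through any group."""
    levels = [grants.get(user, Access.NONE)]
    levels.extend(grants.get(g, Access.NONE) for g in groups)
    return max(levels)


def allowed(user: str, groups: Iterable[str], grants: Dict[str, Access], needed: Access) -> bool:
    return effective_access(user, groups, grants) >= needed
```

For example, with grants `{"myName": Access.READ, "curators": Access.CURATE}`, a user `myName` in group `curators` would be allowed to write but not to act as owner.
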
Authentication
- Four Types of Authentication
  - Plain Password (useful for web and ssh)
  - Challenge-Response
  - SEA - RSA Public/Private Keys and RC5 Encryption Algorithm
  - GSI Certificate-based Systems

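The general idea behind challenge-response authentication is that the password never crosses the network: the server sends a random challenge and the client returns a keyed digest of it. The sketch below is a generic HMAC-based illustration using the Python standard library. It is not the SRB wire protocol, whose details are not given here, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: a fresh random challenge for each login attempt."""
    return secrets.token_bytes(16)


def respond(password: str, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).digest()


def verify(stored_password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = respond(stored_password, challenge)
    return hmac.compare_digest(expected, response)
```

Because each challenge is random, a captured response cannot be replayed against a later challenge.
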
Proxy Operations
- Operations performed server-side
- Compiled/Preloaded Operations (secure)
- Flexible Interface
- Examples
  - DataCutter (Univ. Maryland)
  - Copy & Replicate (third-party data movement)
  - Methods
  - Metadata

User-defined Metadata
- Annotations
- Metadata for datasets and collections
  - 10 strings and 2 integers
  - Flexibly used for storing arbitrary number of metadata
  - Collection-level metadata can store attribute names for datasets
- Extensible Metadata (next generation of MCAT)

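The "10 strings and 2 integers" layout, combined with collection-level metadata that holds attribute names, can be sketched as a fixed-slot record plus a labeling step. The `MetadataRow` class, the `label_row` helper, and the example attribute names are hypothetical and do not reflect the actual MCAT tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

N_STRINGS = 10  # string slots per metadata row, as stated on the slide
N_INTS = 2      # integer slots per metadata row, as stated on the slide


@dataclass
class MetadataRow:
    """One user-defined metadata row with fixed generic slots (illustrative only)."""
    strings: List[Optional[str]] = field(default_factory=lambda: [None] * N_STRINGS)
    ints: List[Optional[int]] = field(default_factory=lambda: [None] * N_INTS)


def label_row(row: MetadataRow, attribute_names: List[str]) -> Dict[str, str]:
    """Pair the string slots with attribute names kept at the collection level."""
    return {
        name: value
        for name, value in zip(attribute_names, row.strings)
        if value is not None
    }


# The collection stores the attribute names once; each dataset row stores only values.
collection_attribute_names = ["survey", "filter"]
row = MetadataRow()
row.strings[0] = "2MASS"
row.strings[1] = "J-band"
labeled = label_row(row, collection_attribute_names)
```

Storing several such rows per dataset is one way a fixed slot count could still hold an arbitrary number of attributes.
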
Client Registration
- Get software at http://www.npaci.edu/DICE/SRB/tarfiles /???
- To register as a user (at one of SDSC-based SRBspace)
  - Fill form at http://www.npaci.edu/DICE/SRB/install/SRBUserRegister.html
- SRB Admin will respond with your
  - authorization password (should be changed immediately)
  - User and Domain name
  - Host name and port number
  - home collection & default resource details

Setting the Client Environs
- Two environment files
  - .srb/.MdasEnv
      mdasCollectionHome '/home/myName.myDomain'
      mdasDomainHome 'myDomain'
      srbUser 'myName'
      srbHost 'srb.sdsc.edu'
      defaultResource 'unix-sdsc'
      AUTH_SCHEME 'ENCRYPT1'
  - .srb/.MdasAuth
      MYSRBPASSWD

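To check a client environment file before connecting, the key and quoted-value pairs can be read with a few lines of Python. This sketch assumes one setting per line, with the value wrapped in plain single quotes; the function `parse_mdas_env` is hypothetical and is not shipped with the SRB client.

```python
import shlex
from typing import Dict

SAMPLE_MDAS_ENV = """\
mdasCollectionHome '/home/myName.myDomain'
mdasDomainHome 'myDomain'
srbUser 'myName'
srbHost 'srb.sdsc.edu'
defaultResource 'unix-sdsc'
AUTH_SCHEME 'ENCRYPT1'
"""


def parse_mdas_env(text: str) -> Dict[str, str]:
    """Return the settings as a dict; lines without a key and a value are skipped."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        parts = shlex.split(line)  # splits on whitespace and strips the quotes
        if len(parts) >= 2:
            settings[parts[0]] = parts[1]
    return settings


settings = parse_mdas_env(SAMPLE_MDAS_ENV)
required = ["mdasCollectionHome", "mdasDomainHome", "srbUser", "srbHost", "defaultResource", "AUTH_SCHEME"]
missing = [key for key in required if key not in settings]
```

An empty `missing` list means all six settings shown on the slide are present.
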