
Storage Resource Broker Persistent Management of Distributed Data

Reagan W. Moore, General Atomics, Inc., San Diego Supercomputer Center (moore@sdsc.edu, http://www.nirvanastorage.com). Topics: data management systems; data collections and digital libraries; distributed data management; data grids.


Presentation Transcript


  1. Storage Resource Broker: Persistent Management of Distributed Data Reagan W. Moore General Atomics, Inc. San Diego Supercomputer Center moore@sdsc.edu http://www.nirvanastorage.com

  2. Topics • Data management systems • Data collections, digital libraries • Distributed data management • Data grids • Persistent data management • Persistent archives • Common infrastructure for data management

  3. Data Collections • Astronomy • CACR Computing Resource (NPACI) • National Virtual Observatory (NSF) • 2 Micron All Sky Survey (NPACI) • DPOSS Collection (NSF-NVO) • Hayden Planetarium • Ecology and Environmental Sciences • CEED (NPACI) • Bionome • HyperLTER (NPACI) • Land Data Assimilation System • Knowledge Networks for BioComplexity (NSF) • Medical Sciences • Digital Embryo (NLM) • Molecular Sciences • JCSG, Synchrotron Data Repository (NSF) • AFCS, Alliance for Cell Signaling (NIH) • NeuroSciences • Biomedical Information Research Network (NIH) • TeleScience Portal (NPACI) • Brain Databases (NPACI) • Brain Data Archiving (NPACI) • Computer Science • DataStreaming (NPACI) • AppLeS, DataCutter (NPACI)

  4. Data Collections • Physics and Chemistry • PPDG, Particle Physics Data Grid (DOE) • GriPhyN (NSF) • BaBar (DOE) • GAMESS (NPACI) • Digital Libraries and Archives • SIO Digital Libraries (NSF) • California Digital Library • ADEPT (NSF) • Stanford Digital Library Project (NSF) • National Archives and Records Administration (NARA) • Data Grids • ROADNet, Real-time Observatories, Applications, and Data management • E-Science at CLRC, UK Grid Starter Kit (UK) • Library of Congress data grid • DOE ASCI Data Visualization Corridor • NASA Information Power Grid • DOE SciDAC - Portal Web Services • NPACI Portal Projects • Education • Transana (NPACI) • Digital Insight (NPACI) • NSDL National STEM Education Digital Library (NSF)

  5. Data Collections • Define the context for describing a collection of digital entities • Context specified by metadata attributes • Provenance, origin of the digital entities • Administrative, location of the digital entities • Technical, purpose of the digital entities • Support organization of attributes as a hierarchy of sub-collections
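The slide describes a collection as digital entities whose context is carried by provenance, administrative, and technical attributes, organized as a hierarchy of sub-collections. A minimal Python sketch of that structure follows; the class names, method names, and sample attribute values are hypothetical illustrations, not part of SRB or MCAT.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalEntity:
    """Hypothetical entity record; the three attribute groups follow the slide."""
    name: str
    provenance: dict = field(default_factory=dict)      # origin of the entity
    administrative: dict = field(default_factory=dict)  # location of the entity
    technical: dict = field(default_factory=dict)       # purpose / format of the entity


@dataclass
class Collection:
    """Hypothetical collection node holding entities and child sub-collections."""
    name: str
    entities: list = field(default_factory=list)
    subcollections: dict = field(default_factory=dict)

    def subcollection(self, name: str) -> "Collection":
        """Return the named child sub-collection, creating it if needed."""
        return self.subcollections.setdefault(name, Collection(name))

    def walk(self, prefix: str = ""):
        """Yield (path, entity) pairs for every entity in the hierarchy."""
        path = f"{prefix}/{self.name}"
        for entity in self.entities:
            yield f"{path}/{entity.name}", entity
        for child in self.subcollections.values():
            yield from child.walk(path)


# Usage with placeholder attribute values.
survey = Collection("sky_survey")
images = survey.subcollection("images")
images.entities.append(DigitalEntity(
    "img_0001.fits",
    provenance={"source": "telescope-A"},
    administrative={"resource": "archive-a"},
    technical={"format": "FITS"},
))
paths = [p for p, _ in survey.walk()]  # ["/sky_survey/images/img_0001.fits"]
```

The hierarchy gives each entity a path-like name, which is the same idea the later slides build on for the logical name space.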

  6. Digital Libraries • Provide services on the data collection • Ingestion, loading of attribute values • Extensibility, definition of new attributes • Discovery, queries on attributes • Browsing, hierarchical listing • Presentation, formatting specified data models
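The five digital library services on this slide (ingestion, extensibility, discovery, browsing, presentation) can be illustrated with a small in-memory catalog. This is a sketch under the assumption that records are keyed by a path-like name and hold flat attribute/value pairs; the `Catalog` class and its methods are hypothetical, not an SRB interface.

```python
class Catalog:
    """Hypothetical in-memory catalog illustrating the five services."""

    def __init__(self, attributes):
        self.schema = set(attributes)  # attributes the catalog currently accepts
        self.records = {}              # path -> {attribute: value}

    def ingest(self, path, **values):
        """Ingestion: load attribute values, rejecting undefined attributes."""
        unknown = set(values) - self.schema
        if unknown:
            raise KeyError(f"undefined attributes: {sorted(unknown)}")
        self.records[path] = dict(values)

    def extend(self, attribute):
        """Extensibility: define a new attribute."""
        self.schema.add(attribute)

    def discover(self, **criteria):
        """Discovery: paths whose attributes match every criterion."""
        return sorted(p for p, attrs in self.records.items()
                      if all(attrs.get(k) == v for k, v in criteria.items()))

    def browse(self, collection):
        """Browsing: list the immediate children of a collection path."""
        prefix = collection.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self.records if p.startswith(prefix)})

    def present(self, path):
        """Presentation: render one record in a simple key=value format."""
        return "\n".join(f"{k}={v}" for k, v in sorted(self.records[path].items()))


cat = Catalog(["title", "creator"])
cat.ingest("/lib/maps/a.tif", title="Coastline", creator="survey-team")
cat.extend("year")
cat.ingest("/lib/maps/b.tif", title="Bathymetry", year=2001)
```

Here `cat.discover(title="Coastline")` returns `["/lib/maps/a.tif"]` and `cat.browse("/lib")` returns `["maps"]`.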

  7. Data Grids • Manage data in a distributed environment • Logical name space, provide global identifier • Data access, storage system abstraction • Replication, disaster backup • Uniform access, common API across file systems, archives, and databases • Single sign-on, authenticate across administration domains
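A logical name space decouples the global identifier from where the bits live, which is also what makes replication transparent. The sketch below maps one logical name to several replicas and resolves it to the first replica on an available storage resource; the resource names, paths, and the "first registered wins" policy are assumptions for illustration, not SRB's replica-selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    resource: str       # storage system name (hypothetical)
    physical_path: str  # path meaningful only to that storage system


class LogicalNameSpace:
    """Hypothetical map from global logical names to physical replicas."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of Replica, in registration order

    def register(self, logical_name, resource, physical_path):
        """Record another physical copy of the same logical entity."""
        self._replicas.setdefault(logical_name, []).append(
            Replica(resource, physical_path))

    def resolve(self, logical_name, available):
        """Return the first registered replica whose resource is available."""
        for replica in self._replicas.get(logical_name, []):
            if replica.resource in available:
                return replica
        raise LookupError(f"no available replica for {logical_name}")


ns = LogicalNameSpace()
ns.register("/home/collection/run1.dat", "archive-a", "/hpss/a/run1.dat")
ns.register("/home/collection/run1.dat", "cache-b", "/cache/run1.dat")
```

If `archive-a` is down, `ns.resolve("/home/collection/run1.dat", {"cache-b"})` still succeeds and returns the cache copy, so the application keeps using the same global name.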

  8. Persistent Archives • Manage technology evolution • Storage system abstraction, support data migration across storage systems • Information repository abstraction, support catalog migration to new databases • Logical name space, support global persistent identifier
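Data migration across storage systems under a fixed global identifier can be sketched as: copy the bits to the new system, verify them against a checksum recorded at ingest, update the catalog entry, then remove the old copy. The class below is a hypothetical in-memory illustration; the SHA-256 fixity check is an assumed choice, not something the slide specifies.

```python
import hashlib


class Archive:
    """Hypothetical persistent archive: logical names stay fixed while storage changes."""

    def __init__(self):
        self.stores = {}   # storage system name -> {physical path: bytes}
        self.catalog = {}  # logical name -> (storage system, physical path, sha256)

    def put(self, logical, store, path, data):
        self.stores.setdefault(store, {})[path] = data
        self.catalog[logical] = (store, path, hashlib.sha256(data).hexdigest())

    def get(self, logical):
        store, path, _ = self.catalog[logical]
        return self.stores[store][path]

    def migrate(self, logical, new_store, new_path):
        """Move the bits to a new storage system without changing the logical name."""
        old_store, old_path, digest = self.catalog[logical]
        data = self.stores[old_store][old_path]
        self.stores.setdefault(new_store, {})[new_path] = data
        # Verify the copy before the catalog points at it.
        if hashlib.sha256(self.stores[new_store][new_path]).hexdigest() != digest:
            raise OSError("checksum mismatch after copy")
        self.catalog[logical] = (new_store, new_path, digest)
        del self.stores[old_store][old_path]


arch = Archive()
arch.put("/archive/doc1", "tape-2001", "/t/doc1", b"bits")
arch.migrate("/archive/doc1", "disk-2010", "/d/doc1")
```

After the migration, `arch.get("/archive/doc1")` still returns `b"bits"`; only the catalog's storage mapping changed.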

  9. SRB (Storage Resource Broker) • Integration of collection-based management of digital entities with • Remote data access through storage system abstraction • Catalog access through information repository abstraction • Automation through collection-owned data

  10. Capabilities • Support legacy systems • Integrate archives with file systems • Share distributed data • Maintain persistent collection • Control data access

  11. Digital Entities • Digital entities are “images of reality” made of • Data, the bits (zeros and ones) put on a storage system • Information, the attributes used to assign semantic meaning to the data • Knowledge, the structural relationships described by a data model • Every digital entity requires information and knowledge to be correctly interpreted and displayed

  12. Digital Entities • Files • Text documents, images, spreadsheets, binary files • URLs • Database query commands • Databases • Directories

  13. Digital Entities • Register digital entities into a catalog • Assign metadata to describe each digital entity • Separate management of the associated data bits from management of the metadata • Support manipulation of each digital entity data type

  14. Technology Management [Diagram labels: Digital Object; Migrate Encoding Format; Wrap Storage System; Wrap Display System; Old Storage System; Old Display System; New Operating System; New Application]

  15. Preservation of Data • Migration • Preserve the data bits • Preserve the digital entity name • Preserve the information and knowledge content for presentation by new applications

  16. Migration Advantages • By migrating the digital entity encoding format to new standards, more sophisticated technologies can be applied to express the information and knowledge content inherent in collections of digital entities. • Requires the ability to associate a data model with each digital entity

  17. Uniform API • Provide common access semantics • Map from the interface preferred by your application to the interfaces required by legacy storage systems
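A uniform API means application code is written once against common access semantics, and a per-system driver maps those calls onto each storage system's own interface. The sketch below assumes a two-method interface (`read`, `write`); the driver classes are hypothetical, and the in-memory "archive" only stands in for a real archive client.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical common access semantics every storage driver implements."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class FileSystemDriver(StorageDriver):
    """Maps the common calls onto a local Unix-style file system."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path: str) -> str:
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()


class InMemoryArchiveDriver(StorageDriver):
    """Stand-in for an archive; a real driver would call the archive's client library."""

    def __init__(self):
        self._objects = {}

    def write(self, path, data):
        self._objects[path] = bytes(data)

    def read(self, path):
        return self._objects[path]


def copy(src: StorageDriver, dst: StorageDriver, path: str) -> None:
    """Application code written once against the common interface."""
    dst.write(path, src.read(path))


with tempfile.TemporaryDirectory() as root:
    fs = FileSystemDriver(root)
    archive = InMemoryArchiveDriver()
    fs.write("/data/run1.bin", b"\x00\x01")
    copy(fs, archive, "/data/run1.bin")
    print(archive.read("/data/run1.bin") == fs.read("/data/run1.bin"))  # True
```

Supporting a new legacy system then means adding one more `StorageDriver` subclass; `copy` and any other application code stay unchanged.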

  18. SRB and MCAT [Architecture diagram, highlighting Uniform APIs. Access APIs: Application; C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; GridFTP. Prime Server: Consistency Management / Authorization-Authentication; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Catalog Abstraction; Storage Abstraction. Servers: Databases (DB2, Oracle, Sybase); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Postgres); HRM]

  19. Discovery Transparencies • Naming transparency - find a data set without knowing its name • Map from attributes to a global file name • Location transparency - access a data set without knowing where it is • Map from global file name to local file name • Access transparency - access a data set without knowing the type of storage system • Federated client-server architecture
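The three transparencies chain together: attributes map to a global name (naming), the global name maps to a storage resource and local file name (location), and a per-resource reader hides the storage type (access). The tables, resource names, and lambda "readers" below are hypothetical stand-ins chosen only to show the chain of lookups.

```python
from typing import Callable, Dict

# Naming transparency: global (logical) name -> descriptive attributes.
attribute_index = {
    "/sky/plate_001": {"survey": "sky-survey", "band": "J"},
    "/sky/plate_002": {"survey": "sky-survey", "band": "K"},
}

# Location transparency: global name -> (storage resource, local file name).
location_table = {
    "/sky/plate_001": ("archive-a", "/vol7/p001.raw"),
    "/sky/plate_002": ("cache-b", "/scratch/p002.raw"),
}

# Access transparency: one reader per storage system, same call signature.
# These lambdas only simulate reads; real ones would talk to each system.
readers: Dict[str, Callable[[str], bytes]] = {
    "archive-a": lambda local: b"archive:" + local.encode(),
    "cache-b": lambda local: b"cache:" + local.encode(),
}


def find(**attrs):
    """Map attribute values to matching global names."""
    return sorted(g for g, a in attribute_index.items()
                  if all(a.get(k) == v for k, v in attrs.items()))


def open_by_attributes(**attrs):
    """Attributes -> global name -> local name -> bytes, without the caller
    knowing any file name, location, or storage type."""
    results = []
    for global_name in find(**attrs):
        resource, local = location_table[global_name]
        results.append(readers[resource](local))
    return results
```

For example, `find(band="K")` returns `["/sky/plate_002"]`, and `open_by_attributes(survey="sky-survey")` reads both plates even though they sit on different storage systems.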

  20. SRB and MCAT [Architecture diagram, highlighting Transparencies. Access APIs: Application; C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; GridFTP. Prime Server: Consistency Management / Authorization-Authentication; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Catalog Abstraction; Storage Abstraction. Servers: Databases (DB2, Oracle, Sybase); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Postgres); HRM]

  21. Persistent Collection • Maintain authenticity • Authenticate all accesses • Assign roles for access control lists (curation, write, annotate, read) • Manage audit trails of all operations • Collection-owned data • All accesses through the data management system
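Authenticated access, role-based access control lists, and an audit trail of every operation can be sketched together: every request is checked against the caller's role and logged whether or not it is allowed. The slide names the four roles but not their exact permissions, so the `ROLE_PERMISSIONS` mapping below is an assumption, and the class is a hypothetical illustration of collection-owned data.

```python
from datetime import datetime, timezone

# Assumed role -> permitted operations; the slide lists the roles only.
ROLE_PERMISSIONS = {
    "read": {"read"},
    "annotate": {"read", "annotate"},
    "write": {"read", "annotate", "write"},
    "curation": {"read", "annotate", "write", "grant"},
}


class PersistentCollection:
    """Hypothetical collection that owns its data; all access goes through it."""

    def __init__(self, curator):
        self.acl = {curator: "curation"}  # user -> role
        self.audit = []                   # append-only (time, user, op, target, allowed)
        self._data = {}

    def _check(self, user, op, target):
        """Authorize the operation and record it in the audit trail."""
        role = self.acl.get(user)
        allowed = op in ROLE_PERMISSIONS.get(role, set())
        self.audit.append((datetime.now(timezone.utc).isoformat(),
                           user, op, target, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not {op} {target}")

    def grant(self, curator, user, role):
        self._check(curator, "grant", user)
        self.acl[user] = role

    def write(self, user, name, data):
        self._check(user, "write", name)
        self._data[name] = data

    def read(self, user, name):
        self._check(user, "read", name)
        return self._data[name]
```

Logging denied requests as well as granted ones is a design choice here: the audit trail then records attempted misuse, not just successful operations.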

  22. SRB and MCAT [Architecture diagram, highlighting Persistency. Access APIs: Application; C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; GridFTP. Prime Server: Consistency Management / Authorization-Authentication; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Catalog Abstraction; Storage Abstraction. Servers: Databases (DB2, Oracle, Sybase); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Postgres); HRM]

  23. Preservation • Name transparency • Find a file by attributes (map from attributes to global name) • Location transparency • Access a file by a global identifier (map from global to local file name) • Access transparency • Use same API to access data in archive or file cache • Authenticity • Disaster recovery, replicate data across storage systems • Audit and process management (Similar requirements to a data grid)
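Authenticity and disaster recovery combine naturally when reading replicated data: skip storage systems that are down, reject any copy whose checksum no longer matches the value recorded at ingest, and return the first authentic replica. This is a hypothetical in-memory sketch; SHA-256 as the fixity check and the in-order replica policy are assumptions.

```python
import hashlib


class ReplicatedObject:
    """Hypothetical object replicated across several storage systems."""

    def __init__(self, data: bytes, resources):
        self.digest = hashlib.sha256(data).hexdigest()  # recorded once, at ingest
        self.copies = {r: data for r in resources}      # resource -> stored bytes

    def read(self, down=()):
        """Return (resource, data) for the first reachable, authentic replica."""
        for resource, data in self.copies.items():
            if resource in down:
                continue  # storage system unavailable
            if hashlib.sha256(data).hexdigest() == self.digest:
                return resource, data
            # otherwise the copy is corrupted; try the next replica
        raise OSError("no authentic replica available")


obj = ReplicatedObject(b"record", ["site-a", "site-b", "site-c"])
obj.copies["site-b"] = b"corrupted"  # simulate silent corruption at one site
```

With `site-a` down, `obj.read(down={"site-a"})` skips the corrupted `site-b` copy and returns `("site-c", b"record")`.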

  24. SRB & MCAT [Architecture diagram, highlighting Preservation. Access APIs: Application; C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; Web; WSDL; GridFTP. Prime Server: Consistency Management / Authorization-Authentication; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Catalog Abstraction; Storage Abstraction. Servers: Databases (DB2, Oracle, Sybase); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Postgres); HRM]

  25. Technology Convergence • Data grids as basis for distributed data management • Federation of distributed resources • Creation of logical name space to automate discovery • Distributed data collections • Discovery based on attributes • Distributed data storage systems • Digital libraries • Development of services for manipulating, viewing data • Persistent archives • Management of technology evolution

  26. Data Naming • Ontologies: concept space, discipline concepts • Data grid: global identifier • Collection: discipline attributes • Archive / file systems: local file name • Data model: attributes that describe data structure

  27. Knowledge Creation • Knowledge syntax (consensus) • RDF, XMI, Topic Map • Knowledge management (recursive operations) • Oracle parallel database • Knowledge manipulation (spatial/procedural rules) • Generation of inference rules and mapping to data models • Knowledge generation (scalable inference engine) • Application of inference rules in inference engine
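Applying inference rules in an inference engine can be illustrated with naive forward chaining over (subject, relation, object) facts: apply every rule, add the new facts, and repeat until nothing new appears. The transitive `part_of` rule and the sample facts are hypothetical examples, and this toy loop is not meant to represent a scalable engine.

```python
def transitive(relation):
    """Build a rule: (a, rel, b) and (b, rel, d) imply (a, rel, d)."""
    def rule(facts):
        return {(a, relation, d)
                for (a, r1, b) in facts if r1 == relation
                for (c, r2, d) in facts if r2 == relation and c == b}
    return rule


def forward_chain(facts, rules):
    """Apply all rules repeatedly until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new


facts = {
    ("soma", "part_of", "neuron"),
    ("neuron", "part_of", "cortex"),
    ("cortex", "part_of", "brain"),
}
closure = forward_chain(facts, [transitive("part_of")])
```

The three stated facts expand to six, including the derived fact `("soma", "part_of", "brain")`.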

  28. Knowledge-based Data Grid [Diagram: Ingest Services, Management, and Access Services at three levels. Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; XTM DTD; Rules - KQL (Model-based Access); Knowledge or Topic-Based Query / Browse. Information: Attributes, Semantics; Information Repository; SDLIP; XML DTD; Attribute-based Query. Data (Data Handling System - SRB): Fields, Containers, Folders; Storage (Replicas, Persistent IDs); MCAT/HDF; Grids; Feature-based Query]

  29. Reagan W. Moore General Atomics San Diego Supercomputer Center moore@sdsc.edu http://www.nirvanastorage.com
