The Globus Replica Management System
The Problem “Enable a geographically distributed community [of thousands] to perform sophisticated, computationally intensive analyses on Petabytes of data”
Example: CERN Large Hadron Collider • Multiple petabytes of data per year • Copy of everything at CERN (Tier 0) • Subsets at national centers (Tier 1) • Smaller regional centers (Tier 2) • Individual researchers have copies • How to keep track of all copies? • Select among available copies or create a new copy?
Outline • Globus Replica Management • Replica catalog • Cooperation with other Information Services • Replica selection • Dynamic replica creation • Metadata catalogs • Application scenario • Outstanding issues
Our Approach to Replica Management • Identify replica cataloging and reliable replication as two fundamental services • Layer on other Grid services: GSI, transport, MDS Information Service • Use LDAP as catalog format and protocol, for consistency • These services can be used as building blocks for higher-level services
The Replica Catalog: An Information Service • Registers new copies of files and collections • Responds to queries about existing replicas • Maintains a mapping between logical names for files and collections and one or more physical locations • Uses the LDAP protocol • Accessed by higher-level tools that perform: • Selection of replicas based on performance • From Information Services (MDS, NWS) • Dynamic creation of replicas in response to demand
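The logical-to-physical mapping described on this slide can be sketched in a few lines. This is an in-memory illustration only, not the Globus implementation (which stores entries in an LDAP directory); the class name `ReplicaCatalog`, its methods, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: maps a logical name to the set of physical URLs holding a copy."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name exists at physical_url."""
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one copy; unknown entries are ignored."""
        self._replicas[logical_name].discard(physical_url)

    def lookup(self, logical_name):
        """Return all known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
# Hypothetical hosts and paths, used only for illustration.
catalog.register("co2-1998/jan", "gsiftp://host-a.example.org/data/jan")
catalog.register("co2-1998/jan", "ftp://host-b.example.org/pub/jan")
print(catalog.lookup("co2-1998/jan"))
```

A higher-level tool would call `lookup` first and then choose among the returned URLs.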
Replica Catalog Structure: A Climate Modeling Example • [Figure: tree rooted at Replica Catalog] • Logical Collection: CO2 measurements 1998 (Filename: Jan 1998, Feb 1998, …) • Logical Collection: CO2 measurements 1999 • Location jupiter.isi.edu: Filename: Mar 1998, Jun 1998, Oct 1998; Protocol: gsiftp; UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate • Location sprite.llnl.gov: Filename: Jan 1998 … Dec 1998; Protocol: ftp; UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi • Logical File Parent: Logical File Jan 1998, Logical File Feb 1998; Size: 1468762
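A location entry in the figure carries a protocol, a UrlConstructor prefix, and the filenames it holds. The sketch below shows one plausible way to turn such an entry into a physical URL. The joining rule (prefix + "/" + percent-encoded filename) is an assumption made for illustration; the slides do not say how the catalog combines the two. The `Location` class is hypothetical.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    """One location entry: where a (possibly partial) copy of a collection lives."""
    host: str
    protocol: str
    url_constructor: str
    filenames: set = field(default_factory=set)

    def url_for(self, filename):
        """Build a physical URL for a file held at this location."""
        if filename not in self.filenames:
            raise KeyError(f"{filename!r} is not stored at {self.host}")
        # Assumed rule: append the percent-encoded filename to the prefix.
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


# Values taken from the figure on this slide.
jupiter = Location(
    host="jupiter.isi.edu",
    protocol="gsiftp",
    url_constructor="gsiftp://jupiter.isi.edu/nfs/v6/climate",
    filenames={"Mar 1998", "Jun 1998", "Oct 1998"},
)
print(jupiter.url_for("Mar 1998"))
```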
Components of the Globus Replica Manager • Replica catalog definition • LDAP object classes for representing logical-to-physical mappings in an LDAP catalog • Low-level replica catalog API • globus_replica_catalog library • Manipulates replica catalog: add, delete, etc. • High-level reliable replication API • globus_replica_manager library • Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.
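The high-level API pairs a file transfer with a catalog update. A minimal sketch of that pairing follows, assuming a local file copy as a stand-in for the real transfer and a plain dict as a stand-in for the catalog. The function `replicate` and its cleanup-on-failure behaviour are illustrative choices, not the globus_replica_manager interface.

```python
import os
import shutil
import tempfile


def replicate(src_path, dest_path, logical_name, catalog):
    """Copy a file, then register the new copy; undo the copy if any step fails."""
    try:
        shutil.copyfile(src_path, dest_path)  # stand-in for the file transfer
        if os.path.getsize(dest_path) != os.path.getsize(src_path):
            raise IOError("size mismatch after copy")
        catalog.setdefault(logical_name, set()).add(dest_path)  # stand-in for a catalog add
    except Exception:
        # Leave no unregistered partial copy behind.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return dest_path


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.dat")
    with open(src, "wb") as f:
        f.write(b"example payload")
    toy_catalog = {}
    replicate(src, os.path.join(tmp, "copy.dat"), "example-file", toy_catalog)
    print(sorted(toy_catalog))
```

The point of combining the two steps is that the catalog never lists a copy that was not fully transferred.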
Replica Catalog API • globus_replica_catalog_collection_create() • Create a new logical collection • globus_replica_catalog_collection_open() • Open a connection to an existing collection • globus_replica_catalog_location_create() • Create a new location (replica) of a complete or partial logical collection • globus_replica_catalog_collection_list_filenames() • List all logical files in a collection • globus_replica_catalog_location_search_filenames() • Search for the locations (replicas) that contain a copy of all the specified files
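The last call above searches for locations that hold a copy of every requested file. The matching rule amounts to a subset test, sketched here in Python rather than through the C library. The function name `search_locations`, the site names, and the file names are hypothetical.

```python
def search_locations(locations, wanted_files):
    """Return the names of locations whose file set contains every wanted file."""
    wanted = set(wanted_files)
    return [name for name, files in locations.items() if wanted <= set(files)]


# Hypothetical partial and complete replicas of one logical collection.
locations = {
    "site-a": {"f1", "f2"},
    "site-b": {"f1", "f2", "f3"},
    "site-c": {"f3"},
}
print(search_locations(locations, ["f1", "f2"]))  # site-a and site-b qualify
```

A location that holds only some of the requested files is excluded, which matters because locations may store partial collections.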
Replica Selection Relies on Information Services • Replica catalog identifies all existing copies of files or collections • Select among them based on performance • Consult other Information Services • Network Weather Service: network performance between source, destination • Information Service for Storage Systems: file system capacity and performance • Wide variety of selection algorithms
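As one example of the many possible selection algorithms, the sketch below picks the replica with the lowest estimated transfer time, computed as latency plus size divided by predicted bandwidth. The `Candidate` type, the cost model, and the numbers are assumptions; in practice the bandwidth and latency figures would come from a service such as NWS.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A replica plus the network predictions for reaching it."""
    url: str
    bandwidth_bytes_per_s: float
    latency_s: float


def estimated_transfer_time(size_bytes, candidate):
    """Simple cost model: fixed latency plus size over bandwidth."""
    return candidate.latency_s + size_bytes / candidate.bandwidth_bytes_per_s


def select_replica(size_bytes, candidates):
    """Choose the candidate with the smallest estimated transfer time."""
    if not candidates:
        raise ValueError("no replicas to choose from")
    return min(candidates, key=lambda c: estimated_transfer_time(size_bytes, c))


# Hypothetical predictions for two replicas.
fast_far = Candidate("gsiftp://far.example.org/f", 1_000_000.0, 0.05)
slow_near = Candidate("ftp://near.example.org/f", 500_000.0, 0.01)
print(select_replica(1_468_762, [fast_far, slow_near]).url)
```

With this model a large file favours the high-bandwidth replica, while a very small file favours the low-latency one.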
Dynamic Replica Creation and Information Services • Application manager needs to guarantee a certain level of performance • Bandwidth from source to destination • Rate of accesses • Using information services (NWS, MDS): • Determine that existing replicas can’t provide that performance • Identify location to create a new replica with desired capacity and performance • Data distribution services
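The two decisions on this slide (do existing replicas fall short, and where should a new one go) can be sketched as follows. The threshold test, the site records, and the "highest bandwidth among eligible sites" rule are illustrative assumptions; the slides do not prescribe a policy.

```python
def needs_new_replica(required_bandwidth, replica_bandwidths):
    """True when no existing replica meets the required bandwidth."""
    return all(bw < required_bandwidth for bw in replica_bandwidths)


def choose_site(required_bandwidth, file_size, sites):
    """Pick the eligible site with the highest predicted bandwidth, or None."""
    eligible = [
        s for s in sites
        if s["bandwidth"] >= required_bandwidth and s["free_bytes"] >= file_size
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s["bandwidth"])["name"]


# Hypothetical figures, as an information service might report them.
sites = [
    {"name": "site-a", "bandwidth": 2_000_000, "free_bytes": 10_000},
    {"name": "site-b", "bandwidth": 1_500_000, "free_bytes": 5_000_000},
    {"name": "site-c", "bandwidth": 400_000, "free_bytes": 9_000_000},
]
if needs_new_replica(1_000_000, [300_000, 800_000]):
    print(choose_site(1_000_000, 1_468_762, sites))
```

Here site-a has the best bandwidth but too little free space, so the sketch settles on site-b.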
Relationship of Replica Manager and Metadata Catalogs • Metadata Services: • Information Services that describe data contents • Replica Management Service interacts with a variety of metadata catalogs • Globus: simple set of object classes • MCAT • Community-defined metadata catalogs using common set of attributes • Metadata service produces logical names needed by replica catalog: • Logical collections • Logical files
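The hand-off from metadata catalog to replica catalog can be illustrated with a small attribute query that returns logical file names. The record layout, attribute names, and `find_logical_files` function are hypothetical and do not reflect the MCAT or Globus schemas.

```python
def find_logical_files(records, **criteria):
    """Return logical file names whose attributes match every criterion."""
    return [
        r["logical_file"]
        for r in records
        if all(r["attributes"].get(key) == value for key, value in criteria.items())
    ]


# Hypothetical metadata records describing data contents.
records = [
    {"logical_file": "Jan 1998", "attributes": {"variable": "CO2", "year": 1998}},
    {"logical_file": "Feb 1998", "attributes": {"variable": "CO2", "year": 1998}},
    {"logical_file": "Jan 1999", "attributes": {"variable": "CO2", "year": 1999}},
]
print(find_logical_files(records, variable="CO2", year=1998))
```

The names returned here are what an application would then pass to the replica catalog to find physical copies.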
A Model Architecture for Data Grids • [Figure] Components: Application, Metadata Catalog, Replica Catalog, Replica Selection, MDS, NWS • Labeled flows: Attribute Specification; Logical Collection and Logical File Name; Multiple Locations; Performance Information and Predictions; Selected Replica; gsiftp commands • Storage: Replica Location 1, Replica Location 2, Replica Location 3 (Disk Cache, Tape Library, Disk Array, Disk Cache)
Outstanding Issues for Replica Management • Early architecture assumed a read-only workload • What update models should we support? • What high-level operations are needed? • Combine storage and catalog operations • Relationship to databases • Replicating the replica catalog • Alternate catalog views: files belong to more than one logical collection