1 / 15

Globus Replica Management

This article discusses the importance of replica management in data grids, including the use of a replica catalog and reliable replication methods. It also explores the integration of replica management with metadata catalogs.

dalet
Download Presentation

Globus Replica Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Globus Replica Management Bill Allcock, ANL PPDG Meeting at SLAC 20 Sep 2000

  2. Replica Management • Maintain a mapping between logical names for files and collections and one or more physical locations • we define a replica to be a “managed copy of a file”. • The replica management system controls where and when copies are created, and provides information about where copies are located. However, the system does not make any statements about file consistency. In other words, it is possible for copies to get out of date with respect to one another, if a user chooses to modify a copy.

  3. Our Approach to Replica Management • Identify replica cataloging and reliable replication as two fundamental services • Layer on other Grid services: GSI, transport, information service • Use LDAP as catalog format and protocol, for consistency • Use as a building block for other tools • Advantage • These services can be used in a wide variety of situations

  4. Reliable Replication Reliable Transport Disk Cache TapeLibrary Disk Array A Model Architecture for Data Grids Attribute Specification Replica Catalog Metadata Catalog Application Multiple Locations NWS Logical Collection and Logical File Name Selected Replica Replica Selection MDS Performance Information and Predictions Disk Cache Replica Location 1 Replica Location 2 Replica Location 3

  5. Replica Manager Components • Replica catalog definition • LDAP object classes for representing logical-to-physical mappings in an LDAP catalog • Low-level replica catalog API • globus_replica_catalog library • Manipulates replica catalog: add, delete, etc. • High-level reliable replication API • globus_replica_manager library • Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.

  6. Replica Catalog Logical Collection C02 measurements 1998 Logical Collection C02 measurements 1999 Filename: Jan 1998 Filename: Feb 1998 … Logical File Parent Location jupiter.isi.edu Location sprite.llnl.gov Filename: Mar 1998 Filename: Jun 1998 Filename: Oct 1998 Protocol: gsiftp UrlConstructor: gsiftp://jupiter.isi.edu/ nfs/v6/climate Filename: Jan 1998 … Filename: Dec 1998 Protocol: ftp UrlConstructor: ftp://sprite.llnl.gov/ pub/pcmdi Logical File Jan 1998 Logical File Feb 1998 Size: 1468762 Replica Catalog Structure: A Climate Modeling Example

  7. Replica Catalog API • globus_replica_catalog_collection_create() • Create a new logical collection • globus_replica_catalog_collection_open() • Open a connection to an existing collection • globus_replica_catalog_location_create() • Create a new location (replica) of a complete or partial logical collection • globus_replica_catalog_collection_list_filenames() • List all logical files in a collection • globus_replica_catalog_location_search_filenames() • Search for the locations (replicas) that contain a copy of all the specified files

  8. Replica Management API • globus_replica_management_register_files: • Register a set of files at a source location in a replica catalog. • globus_replica_management_copy_files: • Replicate a set of files from a source location to a destination: i.e., copy the files and update the replica catalog. • globus_replica_management_is_current: • Function that returns a boolean vector that indicates if the specified files are out of date, with respect to a user-provided comparison function (Note: Just how to implement this function remains to be seen.) • globus_replica_management_update_files: • Update a set of files from a source location to a destination. • globus_replica_management_delete_files: • Delete a set of replicas from a specified location: i.e., delete the files and update the replica catalog. • globus_replica_management_synchronize_filenames() • Ensure that the location object for a physical storage directory correctly reflects the contents of the directory

  9. Replica Catalog Servicesas Building Blocks: Examples • Combine with information service to build replica selection services • E.g. “find best replica” using performance info from NWS and MDS • Use of LDAP as common protocol for info and replica services makes this easier • Combine with application managers to build data distribution services • E.g., build new replicas in response to frequent accesses

  10. Relationship to Metadata Catalogs • Metadata services describe data contents • Have defined a simple set of object classes • Must support a variety of metadata catalogs • MCAT being one important example • Others include LDAP catalogs, HDF • Community metadata catalogs • Agree on set of attributes • Produce names needed by replica catalog: • Logical collection name • Logical file name

  11. Globus and SRB:Integration Plan • FTP access to SRB-managed collections • SRB access to Grid-enabled storage systems GSI FTP Protocol Misc. FTP Clients Globus Server Transport API SRB Client API SRB Server FTP Transport Interface MCAT Globus Client Transport API GSI Enabled FTP Server GSI FTP Protocol

  12. Outstanding Issues • What write consistency should we support? • Methodology for handling updates • Access Control • Replicating the replica catalog • Replication of partial files • Alternate catalog views: files belong to more than one logical collection • Intermediate feedback required (callbacks) • Timing

  13. Status • Grid FTP and catalog management API and tools in alpha test • Demonstration applications with climate data • SRB/Globus data grid services integration underway • Replica Management API under design • Grid based access control strategy under design

  14. ReplicaPrograms Globus Data-IntensiveServices Architecture CustomServers globus-url-copy globus_replica_manager CustomClients globus_gass globus_gass_copy globus_gass_transfer globus_ftp_client globus_ftp_control globus_replica_catalog Legend globus_io OpenLDAP client Program In Alpha globus_common GSI (security) Library Released

  15. The End

More Related