Globus – Part II Sathish Vadhiyar
MDS • Metacomputing Directory Service, later Monitoring and Discovery Service • For publishing and accessing system and application data • Access to MDS information can be restricted using GSI • Interacts with local information services – hourglass mechanism • Provides caching so that information that is still up to date is not transferred again, which lessens network overhead
MDS • Integrates existing systems while providing a uniform and extensible data model • Uniform API • Adopts its data representation, API, query language and protocol from the LDAP directory service • Uses two protocols • GRIP – for providing information about entities • GRRP – for registering entities • LDAP query language supports: • Search • Enquiry • Subscription
MDS Architecture • GIIS – Grid Index Information Service • GRIS – Grid Resource Information Service
MDS • Support for multiple information service providers – information providers are specified on a per-attribute basis • MDS Data: • System information: architecture, OS • Network information • Load status • Additional information sent to GIIS by GRAM reporter • Job status • Queue information • Information can be viewed through a web browser or web client commands
MDS • Contains entries, where each entry is associated with one or more attribute:value pairs • Each entry is associated with a distinguished name • Object classes are associated with entries to describe their object types
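The entry model above can be illustrated with a small sketch. This is not the MDS or LDAP API: the `Entry` class, the `search` helper, the object class name `ComputeResource`, and the host names are all hypothetical and only show how distinguished names, object classes, and attribute:value pairs fit together.

```python
# Hypothetical, simplified model of an LDAP-style directory entry (not the MDS API).
from dataclasses import dataclass, field


@dataclass
class Entry:
    dn: str                      # distinguished name, unique per entry
    object_classes: list         # object types this entry belongs to
    attributes: dict = field(default_factory=dict)  # attribute name -> list of values


def search(entries, object_class, **required):
    """Return entries of the given object class that hold every required attribute value."""
    results = []
    for entry in entries:
        if object_class not in entry.object_classes:
            continue
        if all(value in entry.attributes.get(name, []) for name, value in required.items()):
            results.append(entry)
    return results


# Illustrative data only; the names and attributes are assumptions.
entries = [
    Entry("hn=node1.example.org, o=Grid", ["ComputeResource"], {"os": ["Linux"], "arch": ["x86"]}),
    Entry("hn=node2.example.org, o=Grid", ["ComputeResource"], {"os": ["Solaris"], "arch": ["sparc"]}),
]
linux_hosts = search(entries, "ComputeResource", os="Linux")
print([e.dn for e in linux_hosts])
```

A real deployment would issue an LDAP search with a filter against a GRIS or GIIS; the loop here only mimics that matching on in-memory data.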
Data Grid • Challenges: • Terabytes to petabytes of data • Query management over this huge volume of data • Cache management • Providing gigabit/sec QoS • Coscheduling data transfers and computation • Selection of dataset replicas • Maximize use of scarce storage, computation and network resources
Data Grid Motivation • Application requirements: • A reliable, secure, high-performance data transfer protocol • Management of multiple copies of files and collections of files
GridFTP • Secure file transfer over the Grid • Multiple data channels for parallel transfers – using multiple TCP streams in parallel to improve aggregate bandwidth • Partial file transfers • Third-party (direct server-to-server) transfers by adding GSSAPI security to the existing third-party data transfers in the FTP standard – transfers between two servers mediated by a third-party client • GSSAPI operations authenticate the third party to the source and destination machines of the data transfer
GridFTP contd… • Authenticated data channels – both GSI and Kerberos security • Reusable data channels • Striped data transfers • Two libraries: • globus_ftp_control_library – implements the control channel API • globus_ftp_client_library – implements the GridFTP API • Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing
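The idea behind parallel and partial transfers can be sketched as follows. This does not use GridFTP or the Globus libraries: `split_ranges` and `parallel_copy` are hypothetical helpers, and threads copying slices of an in-memory buffer stand in for separate TCP data channels, each moving one byte range of the file.

```python
# Minimal sketch, assuming one contiguous byte range per data channel.
from concurrent.futures import ThreadPoolExecutor


def split_ranges(file_size, streams):
    """Divide [0, file_size) into contiguous (offset, length) ranges, one per stream."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source, streams):
    """Copy a bytes object range by range, each range handled by its own worker thread."""
    dest = bytearray(len(source))

    def copy_range(rng):
        offset, length = rng
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, split_ranges(len(source), streams)))
    return bytes(dest)


print(split_ranges(10, 4))
print(parallel_copy(b"0123456789", 4) == b"0123456789")
```

A partial file transfer corresponds to requesting a single `(offset, length)` range instead of all of them.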
Globus Replica Management Architecture • Replica management • For better access performance or availability • Mainly for access to “published” resources – read-only model • Functions: • Architecture: • Lower level replica catalog API • Higher level replica management API
Replica catalog • Provides mapping between logical names of files/locations and physical objects on storage systems • Stores 3 kinds of entries • Logical collection – user defined collections of files – file aggregation • Location entries – physical locations of files • Logical files – globally unique names • Replica catalog API provides operations on the replica catalog • Replica management API provides session management, catalog creation, file maintenance, access control • Implemented with LDAP
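The three kinds of catalog entries can be sketched with plain dictionaries. This is not the Globus Replica Catalog API (which, per the slide, is implemented with LDAP): the `ReplicaCatalog` class, its method names, and the example URLs are hypothetical.

```python
# Hypothetical in-memory stand-in for a replica catalog (not the Globus API).
class ReplicaCatalog:
    def __init__(self):
        self.collections = {}  # logical collection name -> set of logical file names
        self.locations = {}    # (collection, location URL) -> logical files stored there

    def create_collection(self, name, logical_files):
        """Register a list of logical files as a logical collection."""
        self.collections[name] = set(logical_files)

    def register_location(self, collection, url, logical_files):
        """Register a complete or partial replica of a collection at a physical location."""
        files = set(logical_files)
        unknown = files - self.collections[collection]
        if unknown:
            raise ValueError(f"not in collection: {sorted(unknown)}")
        self.locations[(collection, url)] = files

    def find_replicas(self, collection, logical_file):
        """Map a logical file name to the physical URLs of all of its replicas."""
        return sorted(
            f"{url}/{logical_file}"
            for (coll, url), files in self.locations.items()
            if coll == collection and logical_file in files
        )


# Illustrative data only; collection, host, and file names are assumptions.
catalog = ReplicaCatalog()
catalog.create_collection("climate-1998", ["jan.dat", "feb.dat"])
catalog.register_location("climate-1998", "gsiftp://storage-a.example.org/data", ["jan.dat", "feb.dat"])
catalog.register_location("climate-1998", "gsiftp://storage-b.example.org/cache", ["jan.dat"])
print(catalog.find_replicas("climate-1998", "jan.dat"))
```

The second location holds only part of the collection, which is why `feb.dat` resolves to a single replica.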
Replica management • Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) and provides replica management capabilities for data grids. • The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations. • Managing the copying and placement of files in a distributed computing system so as to improve the performance of data analysis
Replica management service - functions • Registration of files with the replica management service • Creation and deletion of replicas of previously registered files • Enquiries concerning the location and performance characteristics of replicas. • Replica selection based on performance characteristics
Replica management • Replica management API – combines storage system operations with calls to low-level catalog API functions • Replica management system controls where and when copies are created and provides information about copies • But does not ensure file consistency
RM API • Session management • Session handles and attributes • Restart • Rollback • Catalog creation and file management • Creating catalog entries • Registering files • Publishing files • Copying, deleting files • Future ideas • Incorporating advance reservation • Automatic replica selection and creation • Data grid projects • http://www.globus.org/datagrid/projects.html
Replica Selection in Globus Data Grid (Vazhkudai et al.) • Replica selection uses MDS for information regarding characteristics of storage systems • LDAP information organized as DIT (Directory Information Tree) • Each storage resource in Data Grid incorporates GRIS • LDAP can execute shell scripts in the background to obtain various dynamic attributes such as availableSpace, mountPoint etc. • Static attributes like seek times can be entered by the system administrator • Attributes like data transfer rates across networks to clients can be obtained from past performance, i.e., historical data • ClassAds can also be used for expressing storage attributes
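One way to use historical data for selection is sketched below. The estimator is an assumption, not the method from the Vazhkudai et al. paper: it predicts the transfer rate as the mean of the most recent observations and adds a static seek time. The functions `predicted_rate` and `pick_replica`, the field names, and the numbers are all hypothetical.

```python
# Minimal sketch, assuming the predicted rate is the mean of recent observed rates.
from statistics import mean


def predicted_rate(history_mb_per_s, window=5):
    """Estimate the transfer rate (MB/s) from the last `window` observations."""
    return mean(history_mb_per_s[-window:])  # raises StatisticsError on empty history


def estimated_time(file_size_mb, info):
    """Static seek time plus file size divided by the predicted transfer rate."""
    return info["seek_time_s"] + file_size_mb / predicted_rate(info["history"])


def pick_replica(file_size_mb, replicas):
    """Return the name of the replica with the smallest estimated access time."""
    return min(replicas, key=lambda name: estimated_time(file_size_mb, replicas[name]))


# Illustrative data only.
replicas = {
    "site-a": {"seek_time_s": 0.01, "history": [8.0, 10.0, 12.0]},
    "site-b": {"seek_time_s": 0.02, "history": [20.0, 30.0, 40.0]},
}
print(pick_replica(100, replicas))
```

Other predictors (a median, or an exponentially weighted average) could replace the mean without changing the selection step.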
Steps in Replica Management • Application queries metadata expressing desired characteristics of logical files • A logical file is returned • Application queries replica catalog for replica instances for the logical file • Storage broker helps to choose a particular replica
Storage Architecture steps • Application presents classAds describing its replica requirements to the storage broker (SB) • SB does search: • Queries replica catalogs for the list of all replicas • Queries individual GRIS of replicas about their characteristics • Collects all information and proceeds to matching • Match: • Converts replica capabilities to replica classAds • Matches application classAds to replica classAds • Accesses file using GridFTP
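The match step can be sketched in a simplified, one-directional form. Real ClassAds are expressions evaluated by the Condor matchmaker on both sides; here an ad is a plain dictionary and the application's requirements and rank are Python functions. The `match` helper and the attribute `transferRate` are hypothetical; `availableSpace` is the attribute named in the replica selection slide.

```python
# Simplified sketch of matching an application request against replica ads.
def match(request, replica_ads):
    """Return the highest-ranked replica ad that satisfies the request, or None."""
    candidates = [ad for ad in replica_ads if request["requirements"](ad)]
    if not candidates:
        return None
    return max(candidates, key=request["rank"])


# Application side: constraints every replica must meet, and a preference among those that do.
request = {
    "requirements": lambda ad: ad["availableSpace"] >= 50,
    "rank": lambda ad: ad["transferRate"],
}

# Replica side: capabilities collected from each GRIS, converted to ads (illustrative values).
replica_ads = [
    {"site": "a", "availableSpace": 100, "transferRate": 10},
    {"site": "b", "availableSpace": 20, "transferRate": 50},
    {"site": "c", "availableSpace": 80, "transferRate": 25},
]

best = match(request, replica_ads)
print(best["site"] if best else "no match")
```

Site b has the highest rate but fails the requirement, so the broker falls back to the best-ranked replica among those that qualify.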
Globus References / sources / credits • Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001. • Usage of LDAP in Globus. I. Foster, G. von Laszewski. This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? and Why is it used in Globus? • A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symposium on High-Performance Distributed Computing, pp. 365-375, 1997. Describes the Metacomputing Directory Service used to maintain information about Globus components.
Globus References / sources / credits • The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23:187-200, 2001 (based on conference publication from Proceedings of NetStore Conference 1999). • Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, 2001. Presents the design and performance characteristics of two fundamental technologies for data management. • Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp. 106-113, IEEE Computer Society Press, May 2001. Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.
RFT (Reliable File Transfer) • Treat movement of multiple files as a single job • Accept transfer requests and reliably manage requests • OGSI compliant • To transfer data reliably between two GridFTP servers • Uses Grid Service Handles (GSH) • Acts as a proxy for the user, acts as client on user’s behalf for third-party transfers
RFT • Client submits a SOAP description of the data transfer job • Maintains checkpoints in databases • Supports both “push” and “pull” mechanisms
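Checkpointing a multi-file transfer job in a database can be sketched as below. This is not the RFT service or its schema: the `TransferJob` class, the `transfers` table, and the `flaky_copy` function are hypothetical, and an in-memory SQLite database stands in for the persistent store so that the example is self-contained.

```python
# Minimal sketch of a restartable multi-file transfer job with database checkpoints.
import sqlite3


class TransferJob:
    def __init__(self, db, files):
        self.db = db
        db.execute("CREATE TABLE IF NOT EXISTS transfers (name TEXT PRIMARY KEY, done INTEGER)")
        for name in files:
            # Files already recorded keep their state, so re-creating a job resumes it.
            db.execute("INSERT OR IGNORE INTO transfers VALUES (?, 0)", (name,))
        db.commit()

    def pending(self):
        rows = self.db.execute("SELECT name FROM transfers WHERE done = 0 ORDER BY name")
        return [row[0] for row in rows]

    def run(self, copy):
        for name in self.pending():
            copy(name)  # may raise; files completed earlier stay checkpointed
            self.db.execute("UPDATE transfers SET done = 1 WHERE name = ?", (name,))
            self.db.commit()


attempts = []


def flaky_copy(name):
    """Stand-in for a real transfer; fails the first time b.dat is attempted."""
    attempts.append(name)
    if name == "b.dat" and attempts.count("b.dat") == 1:
        raise IOError("simulated network failure")


db = sqlite3.connect(":memory:")
files = ["a.dat", "b.dat", "c.dat"]
try:
    TransferJob(db, files).run(flaky_copy)
except IOError:
    pass  # the job stops here; a.dat is already checkpointed as done

resumed = TransferJob(db, files)
resumed.run(flaky_copy)  # resumes with b.dat and does not repeat a.dat
print(attempts, resumed.pending())
```

With a file-backed database instead of `:memory:`, the same checkpoint table would survive a restart of the process that runs the job.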
Data Grid Replica Services • Need for meta-data services • Various kinds: • Application metadata • Replica metadata • System configuration metadata • Replica management • For better access performance or availability • Mainly for access to “published” resources – read-only model
Replica Catalog • Provides mappings between logical names for files or collections and one or more copies of those objects on physical systems • Services provided by replica catalog: • Registering a list of files as a logical collection • Registering the physical location of a complete or partial replica of a logical collection • Registering information about a particular logical file in a logical collection • Modifying the contents of registered entities of the catalog • Responding to queries of the catalog • The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems