Part Four: LSC DataGrid • A: Data Replication • B: What is the LSC DataGrid? • C: The LSCDataFind tool
General Principle Not all pipes are created equal. Neither are all storage locations.
Data Requirements • Catalog 10^8 files and their locations • What files are where (possibly at more than one place) • Across multiple sites within a Grid • No single point of failure • No central catalog/server
Data Replication Services: Concepts • Abstract logical file name (LFN) from physical filename (PFN) • Maintain a local replica catalog (LRC) mapping from LFNs to PFNs only for local files. • Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.
Replica Location Service • Site A (rls://serverA:39281) • LRC: file1 → gsiftp://serverA/file1, file2 → gsiftp://serverA/file2 • RLI: file3 → rls://serverB/file3, file4 → rls://serverB/file4 • Site B (rls://serverB:39281) • LRC: file3 → gsiftp://serverB/file3, file4 → gsiftp://serverB/file4 • RLI: file1 → rls://serverA/file1, file2 → rls://serverA/file2
RLS: Replica Location Service • Globus RLS • Each RLS server usually runs two catalogs: • LRC: Local Replica Catalog • Catalog of the files you have (LFNs) and their mappings to URLs (PFNs) • RLI: Replica Location Index • Catalog of the files (LFNs) that other LRCs in your data grid know about
A Site’s LRC • Each site has an LRC with mappings of LFNs to PFNs • usually contains the “local” mappings • where files are located at the site • Example: UWM might have this mapping in its LRC: H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf
LRCs Inform Each Other • The LRC at each site tells remote RLIs which LFNs it has mappings for. • Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf • So the Caltech RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
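The update from an LRC to a remote RLI can be sketched in a few lines of Python. This is only an in-memory illustration: the `Site` class and its `publish_to` method are hypothetical names invented here, not part of the Globus RLS API. The point it shows is that the remote RLI records which site to ask, not the PFN itself.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical in-memory stand-in for one RLS server (not the Globus API)."""
    name: str
    lrc: dict = field(default_factory=dict)  # LFN -> list of PFNs held at this site
    rli: dict = field(default_factory=dict)  # LFN -> set of site names whose LRC has it

    def publish_to(self, remote: "Site") -> None:
        """Tell a remote site's RLI which LFNs this site's LRC can resolve."""
        for lfn in self.lrc:
            remote.rli.setdefault(lfn, set()).add(self.name)


uwm = Site("UWM")
caltech = Site("Caltech")

# The mapping from the slide: UWM holds this file locally.
uwm.lrc["H-R-792845521-16.gwf"] = [
    "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf"
]

# UWM informs Caltech; Caltech's RLI now points at UWM for that LFN.
uwm.publish_to(caltech)
print(caltech.rli)
```

After the call, Caltech's own LRC is still empty; only its index has changed.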
How it Works (Under the Hood) Ask your local LRC: “Do you know about file X?” • If yes, you can ask your local LRC for the corresponding URL (PFN). • If no, • Ask your local RLI: “Who do I ask about X?” • It will answer, “The RLS server at Site Y.” • Ask the LRC at Site Y, “Do you know about file X?” • It will return the PFN.
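The lookup steps above can be sketched as a short Python function. The dictionary layout and the name `find_pfns` are assumptions made for illustration; a real client would query RLS servers over the network rather than read local dictionaries. The sample data is the two-site file1..file4 example used earlier in this part.

```python
def find_pfns(lfn, local_site, sites):
    """Resolve an LFN to a list of PFNs, following the steps on the slide.

    sites maps a site name to {"lrc": {...}, "rli": {...}} (hypothetical layout).
    """
    local = sites[local_site]
    # Step 1: ask the local LRC, "Do you know about file X?"
    if lfn in local["lrc"]:
        return local["lrc"][lfn]
    # Step 2: ask the local RLI, "Who do I ask about X?"
    remote_name = local["rli"].get(lfn)
    if remote_name is None:
        return []  # no site is known to hold this LFN
    # Step 3: ask the LRC at that site for the PFN.
    return sites[remote_name]["lrc"].get(lfn, [])


sites = {
    "A": {
        "lrc": {"file1": ["gsiftp://serverA/file1"],
                "file2": ["gsiftp://serverA/file2"]},
        "rli": {"file3": "B", "file4": "B"},
    },
    "B": {
        "lrc": {"file3": ["gsiftp://serverB/file3"],
                "file4": ["gsiftp://serverB/file4"]},
        "rli": {"file1": "A", "file2": "A"},
    },
}

print(find_pfns("file1", "A", sites))  # answered by site A's own LRC
print(find_pfns("file3", "A", sites))  # site A's RLI redirects to site B's LRC
print(find_pfns("file9", "A", sites))  # unknown everywhere
```

Returning an empty list for an unknown LFN is a choice made for this sketch; the slides do not say how a failed lookup is reported.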
SRB: Storage Resource Broker • http://www.sdsc.edu/srb/ • Distributed data management solution • Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections • Provides a rich set of APIs available to higher-level applications • Provides a management layer on top of a wide variety of storage systems.
SRB • SRB can be thought of as a: • Distributed file system • Datagrid management system • Digital Library system • Semantic Web
SRB as Data Grid Management • Transparent replication • Archiving, caching, syncs, and backups • Heterogeneous storage • Container and aggregated data movement • Bulk data ingestion • Third-party copy & move
LDR: Lightweight Data Replicator • http://www.lsc-group.phys.uwm.edu/LDR • Replicates datasets within a data grid • High-speed data transfers with Globus GridFTP • Globus RLS for replica location, with a MySQL backend • Metadata stored in a MySQL backend • Uses GSI for security
LDR • Collections of files to be replicated are defined by the LDR administrator as SQL queries • Priority queue for scheduling replication
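As a rough illustration of these two ideas, the sketch below defines collections as SQL queries and feeds the matching files into a priority queue. Everything specific here is an assumption: the `metadata` table, its columns, the extra file names, and the rule that a lower number means higher priority are all invented for the example. LDR's real schema and scheduler are not shown in the slides, and SQLite stands in for MySQL so the example runs on its own.

```python
import heapq
import sqlite3

# Hypothetical metadata table (SQLite in memory instead of LDR's MySQL backend).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata (lfn TEXT, observatory TEXT, frame_type TEXT, gps_start INTEGER)"
)
db.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),  # file name from the slides
        ("H-R-792845537-16.gwf", "H", "R", 792845537),  # made-up example entry
        ("L-R-792845521-16.gwf", "L", "R", 792845521),  # made-up example entry
    ],
)

# Each collection is a (priority, SQL query) pair; lower number = replicate first.
collections = [
    (2, "SELECT lfn FROM metadata WHERE observatory = 'L'"),
    (1, "SELECT lfn FROM metadata WHERE observatory = 'H' ORDER BY gps_start"),
]

# Push every selected file onto a heap keyed by (priority, position in query result).
queue = []
for priority, query in collections:
    for position, (lfn,) in enumerate(db.execute(query)):
        heapq.heappush(queue, (priority, position, lfn))

# Pop files in scheduling order.
transfer_order = []
while queue:
    _, _, lfn = heapq.heappop(queue)
    transfer_order.append(lfn)

print(transfer_order)
```

Keeping the query position in the heap key preserves each collection's own ordering among files of equal priority.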
What is the LSC DataGrid? • A collection of LSC computational and storage resources… • … linked through Grid middleware… • … into a uniform LSC data analysis environment.
LSC DataGrid Sites • Tier 1: Caltech • Tier 2: UWM and PSU • Tier 3: UT-Brownsville and Salish Kootenai College (SKC) • Linux clusters at GEO sites Birmingham, Cardiff, and the Albert Einstein Institute (AEI) • LDAS instances at Caltech, MIT, PSU, and UWM
Monitoring the LSC DataGrid http://watchtower.phys.uwm.edu/ganglia-webfrontend/
Lab 4: LSCDataFind • In this lab, you’ll: • Verify your DataFind configuration • Find observatories • Find data types • Find actual data (wow!) • Refine a search • Retrieve data you’ve found
Credits • NSF disclaimer • Portions of this presentation were adapted from the following sources: • GriPhyN Grid Summer Workshop • NEESgrid Sysadmin Workshop