Part Four: The LSC DataGrid


Presentation Transcript


  1. Part Four: The LSC DataGrid

  2. Part Four: LSC DataGrid • A: Data Replication • B: What is the LSC DataGrid? • C: The LSCDataFind tool

  3. A: Data Replication

  4. General Principle Not all pipes are created equal. Neither are all storage locations.

  5. Data Requirements • Catalog 10^8 files and their locations • What files are where (possibly at more than one place) • Across multiple sites within a Grid • No single point of failure • No central catalog/server

  6. Data Replication Services: Concepts • Abstract logical file name (LFN) from physical filename (PFN) • Maintain a local replica catalog (LRC) mapping from LFNs to PFNs only for local files. • Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.

  7. Replica Location Service • Site A, rls://serverA:39281 • LRC: file1 → gsiftp://serverA/file1, file2 → gsiftp://serverA/file2 • RLI: file3 → rls://serverB/file3, file4 → rls://serverB/file4 • Site B, rls://serverB:39281 • LRC: file3 → gsiftp://serverB/file3, file4 → gsiftp://serverB/file4 • RLI: file1 → rls://serverA/file1, file2 → rls://serverA/file2 • Each LRC sends its LFN list to the other site's RLI: file1, file2 from A to B; file3, file4 from B to A
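
The two-site layout on the slide above can be modeled with a pair of dictionaries per server. This is only a minimal in-memory sketch: the `RLSServer` class and its field names are hypothetical and are not part of the Globus RLS API; the file names and URLs are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class RLSServer:
    """Hypothetical stand-in for one site's RLS server (not the Globus API)."""
    url: str
    # LRC: logical file name -> physical locations of files held at this site
    lrc: dict = field(default_factory=dict)
    # RLI: logical file name -> pointers to other sites that know the file
    rli: dict = field(default_factory=dict)


site_a = RLSServer(
    url="rls://serverA:39281",
    lrc={"file1": ["gsiftp://serverA/file1"],
         "file2": ["gsiftp://serverA/file2"]},
    rli={"file3": ["rls://serverB/file3"],
         "file4": ["rls://serverB/file4"]},
)

site_b = RLSServer(
    url="rls://serverB:39281",
    lrc={"file3": ["gsiftp://serverB/file3"],
         "file4": ["gsiftp://serverB/file4"]},
    rli={"file1": ["rls://serverA/file1"],
         "file2": ["rls://serverA/file2"]},
)
```

In this layout each site's RLI lists exactly the files held in the other site's LRC, so neither server needs a complete catalog.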

  8. RLS: Replica Location Service • Globus RLS • Each RLS server usually runs two catalogs: • LRC: Local Replica Catalog • Catalog of the files you have (LFNs) and their mappings to URL(s) or PFNs • RLI: Replica Location Index • Catalog of the files (LFNs) that other LRCs in your data grid know about

  9. A Site’s LRC • Each site has an LRC with mappings of LFNs to PFNs • usually contains the “local” mappings • where files are located at the site • Example: UWM might have this mapping in its LRC: H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf

  10. LRCs Inform Each Other • The LRC at each site tells remote RLIs which LFNs it has mappings for. • Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf • So the Caltech RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
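
The update step on this slide can be sketched as follows. It assumes plain dictionaries in place of real catalogs; the `publish` function and the server URL `rls://rls.uwm.example:39281` are hypothetical names made up for illustration, not Globus RLS calls or real hosts.

```python
def publish(lrc, lrc_url, remote_rlis):
    """Tell each remote RLI which LFNs this LRC has mappings for.

    Only the logical names travel; the PFNs stay in the local LRC.
    """
    for rli in remote_rlis:
        for lfn in lrc:
            rli.setdefault(lfn, set()).add(lrc_url)


# UWM's LRC holds the LFN -> PFN mapping from the example.
uwm_lrc = {
    "H-R-792845521-16.gwf": [
        "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf"
    ],
}

# Caltech's RLI starts empty and learns where to ask about the file.
caltech_rli = {}
publish(uwm_lrc, "rls://rls.uwm.example:39281", [caltech_rli])  # hypothetical URL
```

After the call, `caltech_rli` maps the LFN to the UWM server rather than to a physical URL, which is why a remote lookup always takes a second step.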

  11. How it Works (Under the Hood) Ask your local LRC: “Do you know about file X?” • If yes, you can ask your local LRC for the corresponding URL (PFN). • If no, • Ask your local RLI: “Who do I ask about X?” • It will answer, “The RLS server at Site Y.” • Ask the LRC at Site Y, “Do you know about file X?” • It will return the PFN.
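
The lookup sequence above can be sketched as a single function. This is an illustration under stated assumptions: catalogs are plain dictionaries, `remote_lrcs` stands in for the network query to another site's LRC, and the names `find_pfns` and `rls://siteY` are hypothetical.

```python
def find_pfns(lfn, local_lrc, local_rli, remote_lrcs):
    """Resolve a logical file name to physical URLs.

    local_lrc:   LFN -> list of PFNs for files held at this site
    local_rli:   LFN -> list of remote RLS server names
    remote_lrcs: server name -> that site's LRC (stand-in for a network call)
    """
    # Step 1: ask the local LRC.
    if lfn in local_lrc:
        return list(local_lrc[lfn])

    # Step 2: ask the local RLI who to ask, then ask each of those LRCs.
    pfns = []
    for server in local_rli.get(lfn, ()):
        # The index may be out of date, so a miss at the remote LRC is tolerated.
        pfns.extend(remote_lrcs[server].get(lfn, []))
    return pfns


local_lrc = {"file1": ["gsiftp://serverA/file1"]}
local_rli = {"fileX": ["rls://siteY"]}
remote_lrcs = {"rls://siteY": {"fileX": ["gsiftp://siteY/fileX"]}}

local_hit = find_pfns("file1", local_lrc, local_rli, remote_lrcs)
remote_hit = find_pfns("fileX", local_lrc, local_rli, remote_lrcs)
unknown = find_pfns("nope", local_lrc, local_rli, remote_lrcs)
```

An LFN that appears in neither the local LRC nor the local RLI yields an empty list here; a real client would report that the file is unknown to the grid.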

  12. SRB: Storage Resource Broker • http://www.sdsc.edu/srb/ • Distributed data management solution • Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections • Provides a rich set of APIs available to higher-level applications • Provides a management layer on top of a wide variety of storage systems.

  13. SRB • SRB can be thought of as a: • Distributed file system • Datagrid management system • Digital Library system • Semantic Web

  14. SRB as Data Grid Management • Transparent replication • Archiving, caching, synchs, and backups • Heterogeneous storage • Container and aggregated data movement • Bulk data ingestion • Third-party copy & move

  15. LDR: Lightweight Data Replicator • http://www.lsc-group.phys.uwm.edu/LDR • Replicates datasets within a data grid • High-speed data transfers with Globus GridFTP • Globus RLS stored using a MySQL backend • Metadata stored in MySQL backend • Uses GSI for security

  16. LDR • Collections of files to be replicated are defined by the LDR administrator as SQL queries • Priority queue for scheduling replication
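
The two ideas on this slide, a collection defined by a SQL query and a priority queue of pending transfers, can be sketched together. This is not LDR's code: the table schema and column names are hypothetical, and SQLite from the Python standard library is swapped in for the MySQL backend so the example runs without a server.

```python
import heapq
import sqlite3

# Hypothetical metadata table; LDR's real schema is not shown in the slides.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata ("
    "lfn TEXT PRIMARY KEY, observatory TEXT, frame_type TEXT, gps_start INTEGER)"
)
db.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),
        ("H-R-792845537-16.gwf", "H", "R", 792845537),
        ("L-R-792845521-16.gwf", "L", "R", 792845521),
    ],
)

# Each collection is a (priority, SQL query) pair; lower numbers go first.
collections = [
    (2, "SELECT lfn FROM metadata WHERE observatory = 'L' ORDER BY gps_start"),
    (1, "SELECT lfn FROM metadata WHERE observatory = 'H' ORDER BY gps_start"),
]

# Fill the priority queue; seq keeps files within a collection in query order.
queue = []
for priority, query in collections:
    for seq, (lfn,) in enumerate(db.execute(query)):
        heapq.heappush(queue, (priority, seq, lfn))

# Drain the queue in the order the transfers would be scheduled.
transfer_order = []
while queue:
    _, _, lfn = heapq.heappop(queue)
    transfer_order.append(lfn)
```

Defining collections as queries means newly cataloged files that match are picked up the next time the query runs, without editing a file list by hand.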

  17. B: What is the LSC DataGrid?

  18. What is the LSC DataGrid? • A collection of LSC computational and storage resources… • … linked through Grid middleware… • … into a uniform LSC data analysis environment.

  19. LSC DataGrid Sites • Tier 1: Caltech • Tier 2: UWM and PSU • Tier 3: UT-Brownsville and Salish Kootenai College (SKC) • Linux clusters at GEO sites Birmingham, Cardiff and the Albert Einstein Institute (AEI) • LDAS instances at Caltech, MIT, PSU, and UWM

  20. Monitoring the LSC DataGrid http://watchtower.phys.uwm.edu/ganglia-webfrontend/

  21. Lab 4: LSCDataFind

  22. Lab 4: LSCDataFind • In this lab, you’ll: • Verify your DataFind configuration • Find observatories • Find data types • Find actual data (wow!) • Refine a search • Retrieve data you’ve found
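
The kind of narrowing the lab walks through (observatory, then data type, then a time range) can be illustrated offline. The sketch below is not LSCDataFind and calls no grid service; it only assumes the frame-file naming pattern visible in the earlier example, `H-R-792845521-16.gwf`, read as observatory, frame type, GPS start time, and duration in seconds. The function names are hypothetical.

```python
from typing import NamedTuple


class FrameName(NamedTuple):
    observatory: str
    frame_type: str
    gps_start: int
    duration: int


def parse_lfn(lfn):
    """Split a name like H-R-792845521-16.gwf into its four fields."""
    stem = lfn.rsplit(".", 1)[0]
    observatory, frame_type, gps_start, duration = stem.split("-")
    return FrameName(observatory, frame_type, int(gps_start), int(duration))


def refine(lfns, observatory, frame_type, gps_start, gps_end):
    """Keep files of one observatory and type that overlap [gps_start, gps_end)."""
    hits = []
    for lfn in lfns:
        f = parse_lfn(lfn)
        overlaps = f.gps_start < gps_end and f.gps_start + f.duration > gps_start
        if f.observatory == observatory and f.frame_type == frame_type and overlaps:
            hits.append(lfn)
    return hits


catalog = [
    "H-R-792845521-16.gwf",
    "H-R-792845537-16.gwf",
    "L-R-792845521-16.gwf",
]
found = refine(catalog, "H", "R", 792845521, 792845537)
```

Here `found` keeps only the Hanford raw frame that starts at GPS time 792845521, because the second Hanford file begins exactly at the end of the requested interval.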

  23. Credits • NSF disclaimer • Portions of this presentation were adapted from the following sources: • GriPhyN Grid Summer Workshop • NEESgrid Sysadmin Workshop
