Lightweight Replication of Heavyweight Data

Presentation Transcript


  1. Lightweight Replication of Heavyweight Data
  Scott Koranda
  University of Wisconsin-Milwaukee & National Center for Supercomputing Applications
  www.griphyn.org

  2. Heavyweight Data from LIGO
  • Sites at Livingston, LA (LLO) and Hanford, WA (LHO)
  • 2 interferometers at LHO, 1 at LLO
  • 1000’s of channels recorded at rates of 16 kHz, 16 Hz, 1 Hz, …
  • Output is binary ‘frame’ files, each holding 16 seconds of data with a GPS timestamp
  • ~100 MB per frame from LHO, ~50 MB from LLO
  • ~1 TB/day in total (see the quick check below)
  • S1 run ~2 weeks; S2 run ~8 weeks
  [Image: 4 km LIGO interferometer at Livingston, LA]
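
  A quick back-of-the-envelope check in Python, using only the figures quoted on this slide, shows how the per-frame sizes add up to the quoted ~1 TB/day:

  # Rough check of the quoted rate: one 16-second frame is ~100 MB from LHO
  # and ~50 MB from LLO, for roughly 1 TB/day combined.
  SECONDS_PER_DAY = 86400
  frames_per_day = SECONDS_PER_DAY / 16            # one frame every 16 seconds
  mb_per_frame = 100 + 50                          # LHO + LLO combined
  tb_per_day = frames_per_day * mb_per_frame / 1e6
  print(f"{tb_per_day:.2f} TB/day")                # ~0.81 TB/day, i.e. roughly 1 TB/day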

  3. Networking to IFOs Limited
  • LIGO IFOs are remote, making bandwidth expensive
  • A couple of T1 lines, used for email/administration only
  • Ship tapes to Caltech (SAM-QFS): the “GridFedEx” protocol
  • Reduced data sets (RDS) generated and stored on disk: ~20% the size of the raw data, ~200 GB/day

  4. Replication to University Sites
  [Map of sites: Cardiff, MIT, AEI, UWM, PSU, CIT, UTB]

  5. Why Bulk Replication to University Sites?
  • Each site has compute resources (Linux clusters)
  • Early plan was to provide one or two analysis centers; now everyone has a cluster
  • Storage is cheap: ~$1/GB for drives, a TB of RAID-5 for under $10K; just throw more drives into your cluster
  • Analysis applications read a lot of data
  • There are different ways to slice some problems, but most want access to large data sets for a particular instance of search parameters

  6. LIGO Data Replication Challenge
  • Replicate 200 GB/day of data to multiple sites securely, efficiently, and robustly (no babysitting…)
  • Support a number of storage models at the sites (sketched below):
  • CIT → SAM-QFS (tape) and large IDE farms
  • UWM → 600 partitions on 300 cluster nodes
  • PSU → multiple 1 TB RAID-5 servers
  • AEI → 150 partitions on 150 nodes, with redundancy
  • Provide a coherent mechanism for data discovery by users and their codes
  • In short: know what data we have, know where it is, and replicate it fast and easy
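
  As a rough illustration of what supporting several storage models implies, here is a hypothetical per-site storage description in Python. The structure, field names, and the pick_target() helper are invented for this sketch; they are not LDR’s actual configuration:

  # Hypothetical per-site storage descriptions, based on the models listed above.
  STORAGE_MODELS = {
      "CIT": {"kind": "archive",      "notes": "SAM-QFS tape plus large IDE farms"},
      "UWM": {"kind": "cluster-disk", "partitions": 600},
      "PSU": {"kind": "raid5",        "notes": "multiple 1 TB RAID-5 servers"},
      "AEI": {"kind": "cluster-disk", "partitions": 150, "redundant": True},
  }

  def pick_target(site, filename):
      """Illustrative placement: hash files across a site's local partitions."""
      model = STORAGE_MODELS[site]
      if model["kind"] == "cluster-disk":
          return f"/data/part{hash(filename) % model['partitions']}/{filename}"
      return f"/archive/{filename}"

  print(pick_target("UWM", "some-frame-file.gwf"))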

  7. Prototyping “Realizations”
  • Need to keep the “pipe” full to achieve the desired transfer rates, and be mindful of the overhead of setting up connections
  • Set up a GridFTP connection with multiple channels, tuned TCP windows, and tuned I/O buffers, and leave it open (see the transfer sketch below)
  • Sustained 10 MB/s between Caltech and UWM, with peaks up to 21 MB/s
  • Need cataloging that scales and performs
  • Globus Replica Catalog (LDAP) handled fewer than 10^5 entries and was not acceptable
  • Need a solution with a relational database backend that scales to 10^7 entries with fast updates and reads
  • No need for “reliable file transfer” (RFT): problem with any single transfer? Forget it, come back later…
  • Need a robust mechanism for selecting collections of files; users and sites demand flexibility in choosing what data to replicate
  • Need to get network people interested: do your homework, then challenge them to make your data flow faster
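
  As a concrete example of the tuned-transfer idea, the sketch below drives the standard globus-url-copy client from Python with parallel streams (-p) and an enlarged TCP buffer (-tcp-bs). The host names and paths are placeholders, and LDR uses its own client, so treat this purely as an illustration:

  import subprocess

  def replicate(src_url, dst_url, streams=8, tcp_buffer=2 * 1024 * 1024):
      """One tuned GridFTP transfer: parallel channels, big TCP buffers."""
      cmd = [
          "globus-url-copy",
          "-p", str(streams),          # multiple parallel data channels keep the pipe full
          "-tcp-bs", str(tcp_buffer),  # tuned TCP buffer/window size in bytes
          src_url, dst_url,
      ]
      # No "reliable file transfer" machinery: if this one fails, just report
      # it and let a later pass come back to the file.
      return subprocess.run(cmd).returncode == 0

  replicate("gsiftp://source.example.org/frames/some-frame-file.gwf",
            "file:///data/frames/some-frame-file.gwf")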

  8. LIGO, err… Lightweight Data Replicator (LDR)
  • What data we have… Globus Metadata Catalog Service (MCS)
  • Where data is… Globus Replica Location Service (RLS)
  • Replicate it fast… Globus GridFTP protocol
  • What client to use? Right now we use our own
  • Replicate it easy… logic we added (how the pieces fit together is sketched below)
  • Is there a better solution?
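
  A minimal sketch of how these three pieces combine in one replication pass, with in-memory dictionaries standing in for MCS and RLS and a stub in place of the GridFTP client (none of this is the real Globus API):

  # In-memory stand-ins: metadata_catalog plays the MCS role ("what data we
  # have"), replica_catalog plays the RLS role ("where data is").
  metadata_catalog = {
      "frame-0001.gwf": {"site": "LHO", "gps_start": 700000000},
  }
  replica_catalog = {
      "frame-0001.gwf": ["gsiftp://source.example.org/frames/frame-0001.gwf"],
  }

  def gridftp_copy(remote_pfn, local_prefix):
      """Stub for the GridFTP transfer step ("replicate it fast")."""
      return local_prefix + remote_pfn.rsplit("/", 1)[-1]

  def replicate_pass(local_prefix="file:///data/frames/"):
      for lfn in metadata_catalog:                         # 1. select logical names
          pfns = replica_catalog[lfn]                      # 2. find existing copies
          if any(p.startswith(local_prefix) for p in pfns):
              continue                                     # already have it locally
          local_pfn = gridftp_copy(pfns[0], local_prefix)  # 3. transfer
          replica_catalog[lfn].append(local_pfn)           # 4. register the new copy

  replicate_pass()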

  9. Lightweight Data Replicator
  • Replicated 20 TB to UWM thus far
  • Just deployed at MIT, PSU, and AEI
  • Deployment in progress at Cardiff
  • LDRdataFindServer running at UWM

  10. Lightweight Data Replicator
  • “Lightweight” because we think it is the minimal collection of code needed to get the job done
  • Logic coded in Python: SWIG is used to wrap the Globus RLS, and pyGlobus from LBL is used elsewhere
  • Each site is any combination of publisher, provider, and subscriber
  • A publisher populates the metadata catalog
  • A provider populates the location catalog (RLS)
  • A subscriber replicates data using information provided by publishers and providers
  • Take the “Condor” approach: small, independent daemons that each do one thing (see the loop sketch below)
  • LDRMaster, LDRMetadata, LDRSchedule, LDRTransfer, …
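
  The “Condor approach” of small single-purpose daemons might look roughly like the loop below. The daemon names echo those on the slide, but the bodies and the shared queue are invented purely for illustration:

  import time

  def schedule_pass(queue):
      """Stand-in for LDRSchedule: decide which files this site still needs."""
      queue.append("frame-0001.gwf")

  def transfer_pass(queue):
      """Stand-in for LDRTransfer: move one queued file, then forget about it."""
      if queue:
          print("transferring", queue.pop(0))

  def run_daemons(passes, interval=60, cycles=3):
      """Each daemon does one thing; here they share a simple in-memory queue."""
      queue = []
      for _ in range(cycles):          # a real daemon would loop forever
          for one_pass in passes:
              one_pass(queue)
          time.sleep(interval)

  run_daemons([schedule_pass, transfer_pass], interval=0)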

  11. Future?
  • LDR is a tool that works now for LIGO
  • Still, we recognize that a number of projects need bulk data replication
  • There has to be common ground: what middleware can be developed and shared?
  • We are looking for “opportunities” (code for “solve our problems for us…”)
  • We want to investigate Stork, DiskRouter, …?
  • Do contact me if you do bulk data replication…
