Managing NOAO Distributed Archive using SRB

Presentation Transcript


  1. Managing NOAO Distributed Archive using SRB • Irene Barg, Ray Plante, Phil Warner • NOAO/NCSA DTS Team

  2. Overview • Introduction • Background • The NOAO Data Flow • NOAO-NCSA Zone Architecture • DTS Components • DTS Point2Point Messaging • Integrating SRB into NCSA Model • NSA Data Service: a replacement for DTS • Conclusion

  3. Introduction • The National Optical Astronomy Observatory (NOAO): • formed in 1982 to consolidate all AURA-managed ground-based astronomical observatories: • Kitt Peak National Observatory, Cerro Tololo Inter-American Observatory, and the National Solar Observatory. • URL: http://www.noao.edu/ • The NOAO Science Archive (NSA): • first released in 2002, • a scientific archive of the optical and infrared data holdings from NOAO’s Survey Program. • URL: http://archive.noao.edu/nsa/ • The NOAO Data Products Program (DPP) data flow system: • combines new data storage, data reduction pipelines, VO portals, and a transport system to link these together. • NOAO’s first step towards establishing a data center that is relevant in the Virtual Observatory (VO) era. • a core piece of this integrated system is the data management and transport system (DTS). • Partnership with National Center for Supercomputing Applications (NCSA): • contributing member of the DTS team providing permanent storage of NOAO data products; • contributing member of the Security Services component of NSA R3. • URL: http://www.ncsa.edu/

  4. Background • April 2004 Prototype - Data Cache Initiative (DCI) • Designed to use existing software • NOAO Save-the-Bits - based on the BSD UNIX line printer daemon, lpd, to provide queued network data transfers from instruments in the telescopes to a central mountain cache. • NCSA BIMA Archive Real Time Transfer - an rsync-based queuing mechanism, used to mirror the mountain cache to downtown data centers. • SDSC SRB for transport and management of replicas from each hemisphere’s data center to NCSA for off-site storage.

  5. NOAO Data Flow

  6. Prototype DCI (cont.) • Pros • Easy to get up and running - all three segments integrated within 3 months. • Mostly automated. • Cons • Single MCAT in Tucson - single point of failure. • Replicas vs. copies. • SRB wasn’t being used to its full potential: • Remote resources were used but distributed data management was not.

  7. Time to get serious…. • December 2004 - initial 4 TB caches were near capacity. • NOAO NEWFIRM instrument was waiting in the wings and would contribute an additional 40 GB/night. • Distributed volume management was imperative.

  8. Enter SRB Zones ….. • February 2005 - began evaluating Federated MCAT SRB zone models. • Quick proof-of-concept tests were promising. • SRB was a mature package with a good track record (BIRN, NARA, NASA IPG). • Momentum with current DCI prototype. • November 2005 - the current Data Transport System (DTS) zone model was deployed at 5 sites.

  9. NOAO Zone Architecture • Each DTS site is a zone with its own MCAT. • The NCSA data transfer code ‘rsyncer’ was modified: • Functionality remained, but SRB replaced ‘rsync’. • The new zone client (‘zclient’) resides at each site: • operates in a ‘pull’ fashion, reducing network overhead; • communicates with the local SRB server; • transfers are executed between zone SRB servers for efficiency. • Each zone has a copy of each other’s data. • All transfers will be threaded for efficiency. • Point2point messaging provides robustness and automation.
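The pull-style, threaded transfer described on this slide can be illustrated with a small sketch. This is not the zclient code: the function names, the list of pending files, and the stand-in transfer function are all assumptions made for illustration, and no real SRB call is shown.

```python
from concurrent.futures import ThreadPoolExecutor


def pull_pending(pending, transfer, max_workers=4):
    """Fetch every pending file using a pool of threads.

    `pending` is a list of file names announced by the upstream site and
    `transfer` is a callable that copies one file. Both are hypothetical.
    Returns (succeeded, failed) so that failures can be retried later.
    """
    succeeded, failed = [], []

    def attempt(name):
        try:
            transfer(name)
            return name, True
        except Exception:
            return name, False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pending`.
        for name, ok in pool.map(attempt, pending):
            (succeeded if ok else failed).append(name)
    return succeeded, failed


def fake_transfer(name):
    # Stand-in for the real server-to-server copy (assumption).
    if name.endswith(".bad"):
        raise IOError("transfer failed: " + name)


done, retry = pull_pending(["a.fits", "b.fits", "c.bad"], fake_transfer)
```

Because the receiving site decides when to pull, a failed file simply stays in the retry list until the next pass.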

  10. NOAO Zone Architecture

  11. DTS Point2Point Messaging • The P2P message system was chosen because: • Wanted to keep it simple: • Each message has only one consumer. • A sender and a receiver of a message have no timing dependencies. The receiver can fetch the message whether or not it was running when the client sent the message. • The receiver acknowledges the successful processing of a message by making a remove request. • Wanted to make it robust: • Upon any successful request, the message queue dumps to file. • Message daemon runs continuously.
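The queue behaviour listed on this slide (one consumer per message, no timing dependency between sender and receiver, acknowledgement by a remove request, dump to file after every successful request) can be sketched as follows. The class name, the JSON file format, and the file names are assumptions; the real DTS daemon serves these requests over the network, which this in-process sketch does not model.

```python
import json
import os
import tempfile
from collections import deque


class MessageQueue:
    """Single-consumer queue saved to disk after every successful request."""

    def __init__(self, path):
        self.path = path
        self.messages = deque()
        # Reload any messages left over from a previous run.
        if os.path.exists(path):
            with open(path) as fh:
                self.messages = deque(json.load(fh))

    def _dump(self):
        # Write to a temporary file, then rename, so that a crash
        # during the write cannot corrupt the saved queue.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(list(self.messages), fh)
        os.replace(tmp, self.path)

    def add(self, msg):
        self.messages.append(msg)
        self._dump()

    def get(self):
        """Return the oldest message without removing it, or None if empty."""
        msg = self.messages[0] if self.messages else None
        self._dump()
        return msg

    def remove(self, msg):
        """Acknowledge successful processing by deleting the message."""
        self.messages.remove(msg)  # raises ValueError if msg is unknown
        self._dump()


path = os.path.join(tempfile.mkdtemp(), "queue.json")
queue = MessageQueue(path)
queue.add("image_0001.fits")      # hypothetical file name

# Simulate the daemon restarting before the receiver asks for work.
restarted = MessageQueue(path)
msg = restarted.get()
restarted.remove(msg)
```

Keeping `get` non-destructive means a message survives a receiver crash: it only disappears once the receiver sends the remove request.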

  12. DTS Point2Point Messaging (diagram) • Each message queue accepts three requests: add, get, and remove. • STB (KPNO) sends to noao-kpno (zclient) via Message Queue (1335) • noao-kpno (zclient) sends to noao-tuc (zclient) via Message Queue (1435) • noao-tuc (zclient) sends to noao-ls (zclient) via Message Queue (1535) • noao-tuc (zclient) sends to uiuc-ncsa (zclient) via Message Queue (4335)

  13. Integrating SRB into the NCSA model for astronomical archives • Model • Size of archive exceeds capacity of spinning disk • Long-term storage: NCSA Unitree system • Note: not SRB-enabled • Archive disk cache contains copies of most preferred and/or recently accessed data • Data access services provide uniform access regardless of physical location • Data is transparently migrated from Unitree to cache as needed by users • Older data are silently removed from the cache to make room • Question: can we easily get SRB requests to trigger external actions, like migrating data? • Data can be distributed across disk caches to optimize for different patterns of access
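The cache policy in this model (migrate data from long-term storage on demand, silently drop the least recently used data to make room) can be sketched in memory. The class name and the `fetch_from_archive` callable are hypothetical; the sketch does not talk to Unitree or SRB and does not answer the open question about triggering external actions from SRB requests.

```python
from collections import OrderedDict


class ArchiveCache:
    """Least-recently-used cache in front of a slower long-term archive."""

    def __init__(self, capacity_bytes, fetch_from_archive):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_archive   # hypothetical: name -> bytes
        self.files = OrderedDict()        # name -> bytes, oldest first
        self.used = 0

    def get(self, name):
        if name in self.files:
            # Cache hit: mark the file as most recently used.
            self.files.move_to_end(name)
            return self.files[name]
        # Cache miss: migrate the file from long-term storage.
        data = self.fetch(name)
        if len(data) > self.capacity:
            return data                   # too large to cache; serve directly
        # Evict the least recently used files until the new one fits.
        while self.files and self.used + len(data) > self.capacity:
            _, old = self.files.popitem(last=False)
            self.used -= len(old)
        self.files[name] = data
        self.used += len(data)
        return data


archive = {"a": b"x" * 4, "b": b"y" * 4, "c": b"z" * 4}
cache = ArchiveCache(8, archive.__getitem__)
for name in ["a", "b", "a", "c"]:
    cache.get(name)
```

After these four requests the cache holds "a" and "c": "b" was the least recently used file when "c" needed room.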

  14. Integrating SRB into the NCSA model for astronomical archives (cont.) • Further developments • Supporting new high-data-rate archives • Dark Energy Survey: 300 TB raw data over 5 years • Large Synoptic Survey Telescope: 15 TB raw data, 130 TB processed per night • Deployable data replication system • For reliable replication between observatory sites and out to tiered partner sites. • Based on SRB • We will be looking for ways to take greater advantage of SRB’s information management • Advanced caching strategies for high-throughput data transfers

  15. NSA Data Service: a replacement for DTS • DS: a "data access layer" for NSA • file incorporation/registration • file replication • file access • SRB usage • volume management • file transfer (replication) • DS extends SRB functionality for DS-specific tasks • provides abstract access to data • contains logic for replication • copies in each physical location • not simply resource registration • DS uses Jargon for SRB access

  16. NSA Data Service: a replacement for DTS • DS Abstraction Layer • Near-term, DS needs: • disk-level access • SRB access • Long-term • access to other resources • Globus, HTTP, FTP, WebDAV, etc.
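One way to picture the abstraction layer is a common storage interface with one implementation per resource type. The sketch below is illustrative only: all class and method names are assumptions, only a disk backend is shown, and it is written in Python although the slides state that DS uses Jargon for SRB access. An SRB, HTTP, or FTP backend would be another subclass of the same interface.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface every storage resource must implement (hypothetical)."""

    @abstractmethod
    def read(self, name):
        """Return the contents of `name` as bytes."""

    @abstractmethod
    def write(self, name, data):
        """Store `data` under `name`."""


class DiskBackend(StorageBackend):
    """Disk-level access rooted at one directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, name):
        return os.path.join(self.root, name)

    def read(self, name):
        with open(self._path(name), "rb") as fh:
            return fh.read()

    def write(self, name, data):
        with open(self._path(name), "wb") as fh:
            fh.write(data)


class DataService:
    """Replicates files to every location and hides where they live."""

    def __init__(self, backends):
        self.backends = backends          # location label -> backend

    def incorporate(self, name, data):
        # Replication: write a copy at each physical location.
        for backend in self.backends.values():
            backend.write(name, data)

    def access(self, name):
        # Return the file from the first location that still has it.
        for backend in self.backends.values():
            try:
                return backend.read(name)
            except OSError:
                continue
        raise FileNotFoundError(name)


# Location labels and the file name are made up for the example.
tucson, ncsa = tempfile.mkdtemp(), tempfile.mkdtemp()
ds = DataService({"tucson": DiskBackend(tucson), "ncsa": DiskBackend(ncsa)})
ds.incorporate("obs.fits", b"data")
os.remove(os.path.join(tucson, "obs.fits"))   # one copy disappears
result = ds.access("obs.fits")                # still served from the other
```

Callers only ever see `incorporate` and `access`, so adding a new resource type does not change client code.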

  17. Conclusions • The new DTS design: • Eliminates: • Single point of failure • Single site dependency • Gives us: • Location independence • Mountain caches and downtown data centers can function as independent components. • Location transparency • A file can be requested from any location, even if the file no longer resides at that location. • Federated MCAT and zone SRB play a key role.

  18. Acknowledgements • NOAO DTS Team - Dr. Chris Smith, Rob Seaman, Nelson Zarate, Nelson Saavedra • NOAO NSA Data Service: Phil Warner • NCSA: Ray Plante, Ramon Williamson, and David Fleming • SDSC: Reagan Moore, Arcot Rajasekar, Wayne Schroeder, Michael Wan, George Kremenek, Roman Olschanowsky, Sheau-Yen Chen, and the entire SRB team! • SRB Perl API author: Michal Wronski • srbChat Community - Jean-Yves Nief, Barz Hsu, Adil Hasan
