Managing NOAO Distributed Archive using SRB. Irene Barg Ray Plante Phil Warner NOAO/NCSA DTS Team. Overview. Introduction Background The NOAO Data Flow NOAO-NCSA zone Architecture DTS Components DTS Point2Point Messaging Integrating SRB into NCSA Model
Managing NOAO Distributed Archive using SRB Irene Barg Ray Plante Phil Warner NOAO/NCSA DTS Team
Overview • Introduction • Background • The NOAO Data Flow • NOAO-NCSA zone Architecture • DTS Components • DTS Point2Point Messaging • Integrating SRB into NCSA Model • NSA Data Service a replacement for DTS • Conclusion
Introduction • The National Optical Astronomy Observatory (NOAO): • formed in 1982 to consolidate all AURA-managed ground-based astronomical observatories: • Kitt Peak National Observatory, Cerro Tololo Inter-American Observatory, and the National Solar Observatory. • URL: http://www.noao.edu/ • The NOAO Science Archive (NSA): • first released in 2002, • a scientific archive of the optical and infrared data holdings from NOAO’s Survey Program. • URL: http://archive.noao.edu/nsa/ • The NOAO Data Products Program (DPP) data flow system: • combines new data storage, data reduction pipelines, VO portals, and a transport system to link these together. • NOAO’s first step towards establishing a data center that is relevant in the Virtual Observatory (VO) era. • a core piece of this integrated system is the data management and transport system (DTS). • Partnership with National Center for Supercomputing Applications (NCSA): • contributing member of the DTS team providing permanent storage of NOAO data products; • contributing member of the Security Services component of NSA R3. • URL: http://www.ncsa.edu/
Background • April 2004 Prototype - Data Cache Initiative (DCI) • Designed to use existing software • NOAO Save-the-Bits - based on the BSD UNIX line printer daemon, lpd, to provide queued network data transfers from instruments at the telescopes to a central mountain cache. • NCSA BIMA Archive Real Time Transfer - an rsync-based queuing mechanism, used to mirror the mountain cache to downtown data centers. • SDSC SRB - for transport and management of replicas from each hemisphere’s data center to NCSA for off-site storage.
Prototype DCI (cont.) • Pros • Easy to get up and running - all three segments integrated within 3 months. • Mostly automated. • Cons • Single MCAT in Tucson - single point of failure. • Replicas vs. copies. • SRB wasn’t being used to its full potential: • Remote resources were used but distributed data management was not.
Time to get serious… • December 2004 - initial 4 TB caches were near capacity. • The NOAO NEWFIRM instrument was waiting in the wings and would contribute an additional 40 GB/night. • Distributed volume management was imperative.
Enter SRB Zones… • February 2005 - began evaluating Federated MCAT SRB zone models. • Quick proof-of-concept tests were promising. • SRB was a mature package with a good track record (BIRN, NARA, NASA IPG). • Momentum with the current DCI prototype. • November 2005 - the current Data Transport System (DTS) zone model was deployed at 5 sites.
NOAO Zone Architecture • Each DTS site is a zone with its own MCAT. • The NCSA data transfer code ‘rsyncer’ was modified: • Functionality remained, but SRB replaced ‘rsync’. • The new zone client (‘zclient’) resides at each site: • operates in a ‘pull’ fashion, reducing network overhead; • communicates with the local SRB server; • transfers are executed between zone SRB servers for efficiency. • Each zone has a copy of each other’s data. • All transfers will be threaded for efficiency. • Point2point messaging provides robustness and automation.
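The pull-style, threaded transfer described on this slide can be sketched in Python. This is a minimal illustration and not the zclient code: `pull_pending` and the `transfer` callable are hypothetical names, and `transfer` stands in for whatever asks the local SRB server to fetch one file from a remote zone.

```python
from concurrent.futures import ThreadPoolExecutor


def pull_pending(pending, transfer, max_workers=4):
    """Run transfer(name) for each pending file name in worker threads.

    `transfer` is a hypothetical callable (an assumption, not an SRB API);
    it should raise an exception if the pull fails.
    Returns (done, failed) lists of file names.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every pull; leaving the 'with' block waits for all of them.
        futures = {name: pool.submit(transfer, name) for name in pending}
    for name, future in futures.items():
        if future.exception() is None:
            done.append(name)
        else:
            failed.append(name)
    return done, failed
```

Because failures are reported rather than raised, a caller could leave the failed names queued and retry them on the next pass.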
DTS Point2Point Messaging • The P2P message system was chosen because: • Wanted to keep it simple: • Each message has only one consumer. • A sender and a receiver of a message have no timing dependencies. The receiver can fetch the message whether or not it was running when the client sent the message. • The receiver acknowledges the successful processing of a message by making a remove request. • Wanted to make it robust: • Upon any successful request, the message queue dumps to file. • Message daemon runs continuously.
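The queue semantics listed above (one consumer per message, no timing dependency, removal as acknowledgement, a dump to file after each successful request) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the DTS daemon: the class name, the JSON dump format, and the file path are all hypothetical.

```python
import json
import os
from collections import deque


class MessageQueue:
    """Point-to-point queue: single consumer, explicit remove as the ack."""

    def __init__(self, path):
        self.path = path  # hypothetical dump-file location
        self.messages = deque()
        # Reload any messages dumped by a previous run, so a receiver can
        # still fetch messages that were added while it was not running.
        if os.path.exists(path):
            with open(path) as fh:
                self.messages = deque(json.load(fh))

    def _dump(self):
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a truncated dump behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(list(self.messages), fh)
        os.replace(tmp, self.path)

    def add(self, msg):
        self.messages.append(msg)
        self._dump()

    def get(self):
        # Return the oldest message without removing it (None if empty).
        msg = self.messages[0] if self.messages else None
        self._dump()
        return msg

    def remove(self, msg):
        # The receiver acknowledges successful processing by removing.
        self.messages.remove(msg)
        self._dump()
```

Because `get` leaves the message in place, a receiver that fails before calling `remove` will see the same message again on its next `get`.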
DTS Point2Point Messaging (diagram) • Each message queue accepts three requests: add, get, and remove. • Message Queue (1335): STB (KPNO) and noao-kpno (zclient) • Message Queue (1435): noao-kpno (zclient) and noao-tuc (zclient) • Message Queue (1535): noao-tuc (zclient) and noao-ls (zclient) • Message Queue (4335): noao-tuc (zclient) and uiuc-ncsa (zclient)
Integrating SRB into the NCSA model for astronomical archives • Model • Size of archive exceeds capacity of spinning disk • Long-term storage: NCSA Unitree system • Note: not SRB-enabled • Archive disk cache contains copies of most preferred and/or recently accessed data • Data access services provide uniform access regardless of physical location • Data is transparently migrated from Unitree to cache as needed by users • Older data are silently removed from the cache to make room • Question: can we easily get SRB requests to trigger external actions, like migrating data? • Data can be distributed across disk caches to optimize for different patterns of access
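The cache behaviour in this model (migrate a file from long-term storage on demand, silently drop older data to make room) can be sketched as a least-recently-used cache. This is a minimal sketch under assumptions: `ArchiveCache` and `fetch_from_archive` are hypothetical names, the callable stands in for a Unitree retrieval, and capacity is counted in files rather than bytes to keep the example short.

```python
from collections import OrderedDict


class ArchiveCache:
    """Disk cache in front of a slower long-term store."""

    def __init__(self, capacity, fetch_from_archive):
        self.capacity = capacity         # assumption: max number of cached files
        self.fetch = fetch_from_archive  # hypothetical stand-in for a Unitree retrieval
        self.files = OrderedDict()       # ordered from least to most recently used

    def get(self, name):
        if name in self.files:
            # Cache hit: mark the file as most recently accessed.
            self.files.move_to_end(name)
            return self.files[name]
        # Cache miss: migrate the file from long-term storage.
        data = self.fetch(name)
        self.files[name] = data
        # Evict the least recently accessed files to make room.
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)
        return data
```

The caller only ever calls `get`, which is the uniform-access property the slide asks for: the physical location of the file is hidden behind the cache.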
Integrating SRB into the NCSA model for astronomical archives (cont.) • Further developments • Supporting new high-data-rate archives • Dark Energy Survey: 300 TB raw data over 5 years • Large Synoptic Survey Telescope: 15 TB raw data, 130 TB processed per night • Deployable data replication system • For reliable replication between observatory sites and out to tiered partner sites. • Based on SRB • We will be looking for ways to take greater advantage of SRB’s information management • Advanced caching strategies for high-throughput data transfers
NSA Data Service A replacement for DTS • DS is a "data access layer" for NSA • file incorporation/registration • file replication • file access • SRB Usage • volume management • file transfer (replication) • DS extends SRB functionality for DS-specific tasks • provides abstract access to data • contains logic for replication • copies in each physical location • not simply resource registration • DS uses Jargon for SRB access
NSA Data Service A replacement for DTS • DS Abstraction Layer • Near-term, DS needs: • disk-level access • SRB access • Long-term • access to other resources • Globus, HTTP, FTP, WebDAV, etc.
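One way to picture the abstraction layer is an interface that each storage backend implements, with replication logic written once against that interface. The sketch below is illustrative only: `Store`, `DiskStore`, and `replicate` are hypothetical names, only the disk-level backend is shown, and an SRB backend (which the slides say would go through Jargon) is omitted rather than guessed at.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class Store(ABC):
    """Abstract access to one physical location (hypothetical interface)."""

    @abstractmethod
    def put(self, local_path, name): ...

    @abstractmethod
    def fetch(self, name, local_path): ...

    @abstractmethod
    def exists(self, name): ...


class DiskStore(Store):
    """Disk-level backend: files live under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, local_path, name):
        shutil.copyfile(local_path, self.root / name)

    def fetch(self, name, local_path):
        shutil.copyfile(self.root / name, local_path)

    def exists(self, name):
        return (self.root / name).is_file()


def replicate(local_path, name, stores):
    """Place a physical copy in every store that lacks one.

    Returns the number of copies written.
    """
    written = 0
    for store in stores:
        if not store.exists(name):
            store.put(local_path, name)
            written += 1
    return written
```

Adding a long-term backend such as HTTP or FTP would then mean writing another `Store` subclass, leaving `replicate` unchanged.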
Conclusions • The new DTS design: • Eliminates: • Single point-of-failure • Single site dependency • Gives us: • Location independence • Mountain caches and downtown data centers can function as independent components. • Location transparency • A file can be requested from any location, even if the file no longer resides at that location. • Federated MCAT and zone SRB play a key role.
Acknowledgements • NOAO DTS Team - Dr. Chris Smith, Rob Seaman, Nelson Zarate, Nelson Saavedra • NOAO NSA Data Service: Phil Warner • NCSA: Ray Plante, Ramon Williamson, and David Fleming • SDSC: Reagan Moore, Arcot Rajasekar, Wayne Schroeder, Michael Wan, George Kremenek, Roman Olschanowsky, Sheau-Yen Chen, and the entire SRB team! • SRB Perl API author: Michal Wronski • srbChat Community - Jean-Yves Nief, Barz Hsu, Adil Hasan