DES/DPP Meeting September 2008 Welcome to Tucson! DES/DPP Sept 2008
Objectives for meeting • Introductions • Key DPP participants: new manager and staff • Key SISPI developers • Key DES DM developers • Brazilian DES participants • Review Requirements • Community Needs • Focus on SISPI-related issues • Community Pipeline
Objectives • Review interfaces • Fundamental data, metadata, aux. data issues • SISPI with users & mountaintop QA • SISPI with DTS/DPP • DTS/DPP with DESDM • E2E with Community Pipeline • Define Questions & Action Items • We may not make decisions, but we should define which decisions need to be made
The NOAO Data Products Program & End-to-End Data Management System R. Chris Smith, for the NOAO Data Products Program team
DPP Program Motivation NOAO Mission (Cooperative Agreement version) “Provide broad community access, based on peer review, to a complete and balanced system of state-of-the-art facilities, including telescopes of all apertures AND THE DATA FROM THEM.” DPP Charge (ReSTAR version) “All facilities participating in the System … should provide data that can be reduced using standard systems and the data should be made publicly available after an appropriate proprietary period. Pipeline Reduction is encouraged if appropriate.” DECam = CERTAINLY appropriate
DPP Program Overview • Two Complementary Pieces • Data Management for the System • Capture, Preserve, Process (some), Provide Access • First for existing NOAO facilities • New NOAO and affiliate instruments w/ pipelines • Later for broader System facilities • Science Support Software • Supporting Science with data from NOAO and System • Linking GB O/IR data to larger (multiwavelength) System through VO standards and technologies • Support community access to new initiatives (e.g., LSST), through user portals and analysis tools
Data Management for NOAO & System • Motivation • Data capture and preservation • Preserving investment in telescope nights • Important for time domain… looking back! • Support Archival Science (more efficient use of telescope time) • New instruments & large data volumes are changing the model of how we do science • Whom are we serving? User groups include… • PI use (proprietary) • Download data, data backup, preliminary processing of large datasets, reference to data for publications • Default proprietary period = 18 months for visiting astronomers • Archival use (non-proprietary) • Images (raw & some reduced) in support of new science • Catalogs, e.g., MOSAIC, NEWFIRM, etc. • Survey use • Archive high-level products for the long term; ensure long-term return on telescope time invested in surveys, e.g., DES!
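The 18-month default proprietary period turns into a public-release date through calendar-month arithmetic. The Python sketch below assumes the clock starts on the observation date (the slide does not say whether it starts at observation or at archive ingestion); the function names are illustrative only.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter (e.g. Aug 31 + 18 months), the day is
    clamped to that month's last day.
    """
    zero_based = start.month - 1 + months
    year = start.year + zero_based // 12
    month = zero_based % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def public_release_date(obs_date: date, proprietary_months: int = 18) -> date:
    # Assumption: the proprietary clock starts on the observation date.
    return add_months(obs_date, proprietary_months)


if __name__ == "__main__":
    print(public_release_date(date(2008, 9, 15)))
```

Clamping the day avoids producing a nonexistent date such as 31 February.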
NOAO Data Management • Management of data from all NOAO and some affiliated facilities = capturing CONTENT • KPNO, including Mayall 4m (MOSAIC, NEWFIRM,+) • CTIO, including Blanco 4m (MOSAIC, ISPI,+) • Plus SOAR & WIYN systems; TOTAL >30 instruments • “Back end” = ACCESS • Provide access to a large volume (TBs to PBs) of archived ground-based optical & infrared data and data products through standard interfaces (VO) • “Front end” = UI and TOOLS • Enable science (data discovery, understanding, access, and analysis) by developing and operating user interfaces, tools, and services (VO, web, and desktop)
What is “E2E”? • The integrated end-to-end (E2E) system for NOAO and System data management, including data capture, transport, ingestion, archiving, pipeline processing, and user access (data discovery, retrieval, and analysis) • A modern infrastructure to support the science of the U.S. ground-based O/IR astronomy community • Components • Data Capture (iSTB) & Transport (DCI, DS) • Pipelines • NOAO Science Archive (NSA) • NOAO VO Portal • Distributed operations at 6 sites (3 mtn, 3 base)
Scope of the E2E system • Data from • >30 instruments (many commissioned >10 years ago) • On 11 telescopes • On 3 mountaintops (KP, CT, CP) • On 2 continents • To storage at • 3 sites: Tucson, La Serena, NCSA • All archived, some processed • Processing wide-field imaging = MOSAIC & NEWFIRM • Delivered to users throughout the world • Through VO interfaces & portal
Data Management • Serving VO Content • UI & Tools
Data Management
Data Capture & Transport I • Entry point into E2E system • Capture currently “iSTB” (Save The Bits) • Evolution from STB system; >10 yrs of experience • Future evolution: move from an lpr-based system to a similar queue-based system • Feed data to Mountain Cache • Currently = 2 TB non-scalable “data brick” • FY09 purchase = 16 TB scalable “data brick” • Operations requests ~2 months of storage • DECam @ 350 GB/night: 1 month = 10.5 TB
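The cache-sizing figures can be checked with simple arithmetic. This Python sketch uses decimal units (1 TB = 1000 GB), as the slide's 10.5 TB figure implies, and assumes DECam is the only instrument writing to the cache; the helper names are hypothetical.

```python
def volume_tb(gb_per_night: float, nights: int) -> float:
    """Data volume in TB (decimal units, 1 TB = 1000 GB)."""
    return gb_per_night * nights / 1000


def nights_that_fit(cache_tb: float, gb_per_night: float) -> int:
    """Whole nights of data that a cache of `cache_tb` TB can hold."""
    return int(cache_tb * 1000 // gb_per_night)


DECAM_GB_PER_NIGHT = 350  # from the slide

if __name__ == "__main__":
    print(volume_tb(DECAM_GB_PER_NIGHT, 30))        # one 30-night month
    print(volume_tb(DECAM_GB_PER_NIGHT, 60))        # ~2 months requested by operations
    print(nights_that_fit(16, DECAM_GB_PER_NIGHT))  # FY09 16 TB brick
    print(nights_that_fit(2, DECAM_GB_PER_NIGHT))   # current 2 TB brick
```

By this arithmetic the FY09 16 TB brick holds about 45 DECam nights, roughly a month and a half, short of the ~2 months (21 TB) that operations requests.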
Data Capture & Transport II • Transport & Storage • DCI v1/v2: operational for >4 years now • Has moved & persisted >25 TB • Replicated full copies stored in 3 locations • Current fundamental underpinning: SDSC’s SRB • SRB no longer supported • We are evaluating alternative storage options • Including SRB “replacement” = iRODS • To be evaluated in collaboration with NCSA and LSST • Next version possibly a Java-based “Data Service” on top of the new storage system
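Keeping full copies at three sites only helps if the copies are known to agree. A minimal sketch of one common approach follows: compare SHA-256 digests of each replica and flag the sites that disagree with the majority. It does not describe how DCI or SRB actually verify data, and the site keys are just labels.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of one replica's bytes.

    For real image files, hash in chunks rather than loading whole files.
    """
    return hashlib.sha256(data).hexdigest()


def mismatched_sites(copies: Dict[str, bytes]) -> List[str]:
    """Return the sites whose copy disagrees with the majority digest.

    With three replicas, a single corrupted copy is outvoted by the other
    two; if all three differ, the choice of reference is arbitrary.
    """
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    reference, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != reference)


if __name__ == "__main__":
    replicas = {"Tucson": b"raw frame", "La Serena": b"raw frame", "NCSA": b"raw frame!"}
    print(mismatched_sites(replicas))
```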
Data Capture & Transport III • Networks • Mountain-La Serena • Currently 300 Mbps; shared with Gemini • The 2×155 Mbps links have been joined into a single larger pipe • La Serena-U.S. • Currently 50 Mbps; shared with Gemini • Contract ends in late 2009 • Discussions of the next contract, for more than 100 Mbps, have already begun
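The link speeds put a lower bound on how long one night of DECam data takes to leave the mountain and reach the U.S. This sketch assumes decimal gigabytes, no competing traffic, and a hypothetical `efficiency` factor for the fraction of nominal bandwidth actually available (both links are shared with Gemini).

```python
def transfer_hours(gigabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `gigabytes` (decimal GB) over a `link_mbps` link.

    `efficiency` is an assumed fraction of the nominal bandwidth actually
    available (link sharing and protocol overhead both reduce it).
    """
    bits = gigabytes * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    links = [("Mountain-La Serena", 300), ("La Serena-U.S.", 50), ("Hypothetical 100 Mbps", 100)]
    for name, mbps in links:
        print(f"{name}: {transfer_hours(350, mbps):.1f} h per DECam night")
```

At the full nominal 50 Mbps, 350 GB needs about 15.6 hours to cross the La Serena-U.S. link, versus roughly 2.6 hours on the 300 Mbps mountain link, before any sharing with Gemini is taken into account.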
Pipelines: MOSAIC + NEWFIRM • Current v1.1 MOSAIC pipeline produces: • Processed science images (“level 2 products”) • Data quality pixel masks • Data quality information (headers & archive database) • Tangent-plane reprojected images • PNG preview images • Combined/processed calibration data (e.g., flatfields, etc.) • Stacked dithers • NEWFIRM v1.0 • 4K×4K Infrared Wide-field Imager • Real-time pipeline already deployed at KPNO 4m • Science pipeline in beta this semester
MOSAIC Pipeline: The MOSAIC pipeline processes data by observing run, removing instrumental signatures, calibrating astrometry and photometry, and measuring data quality. New features (under development for 2008B): stacked dither sequences with cosmic-ray and transient rejection.
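Stacking dithers with cosmic-ray and transient rejection usually relies on some form of per-pixel outlier clipping. The Python sketch below shows one generic variant (median plus a MAD-based robust sigma). It is not the MOSAIC pipeline's actual algorithm, and it assumes the frames are already registered onto a common pixel grid and scaled to a common sky level.

```python
from statistics import mean, median
from typing import List, Sequence


def clipped_stack(frames: Sequence[Sequence[float]], nsigma: float = 3.0) -> List[float]:
    """Combine registered frames pixel by pixel with outlier rejection.

    Each frame is a flat sequence of pixel values on a common grid. For each
    pixel, values more than `nsigma` robust sigmas from the median are
    rejected (robust sigma = 1.4826 * MAD) and the survivors are averaged.
    """
    stacked = []
    for values in zip(*frames):  # every frame's value for one pixel
        med = median(values)
        mad = median(abs(v - med) for v in values)
        limit = nsigma * 1.4826 * mad
        kept = [v for v in values if abs(v - med) <= limit]
        stacked.append(mean(kept or values))  # fall back if everything was clipped
    return stacked


if __name__ == "__main__":
    # Six 2-pixel "frames"; pixel 0 of the last frame is hit by a cosmic ray.
    frames = [[9, 10], [10, 10], [11, 10], [10, 10], [9, 10], [500, 10]]
    print(clipped_stack(frames))
```

The median and MAD are used instead of the mean and standard deviation because a single cosmic-ray hit would otherwise inflate the rejection threshold enough to survive it.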
Figure: MOSAIC r-band single 150 s raw frame; MOSAIC r-band single 150 s calibrated frame; MOSAIC r-band 6×150 s stack
Pipeline Reductions • NEWFIRM • Quick Reduce pipeline operational at KP 4m during 2008A • Provides reduced, stacked image mosaics during the night (using “processing shortcuts”) • Provides data quality information, including seeing, sky brightness, and transparency • Good feedback from users! • Science Pipeline being readied for testing and science verification in 2008B • Partly paced by the instrument stabilizing (no major ongoing modifications) and by developing a full understanding of the instrument (e.g., persistence).
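Two of the quick-look quantities above reduce to short formulas. Seeing is the FWHM of a Gaussian star profile, FWHM = 2√(2 ln 2) σ ≈ 2.355 σ, converted to arcseconds with the pixel scale. Sky brightness follows from the per-pixel sky rate and a photometric zero point. In this sketch the zero point and the 0.4 arcsec/pixel scale in the example are assumed illustrative values, not QRP parameters.

```python
import math

# FWHM of a Gaussian in units of its standard deviation: 2*sqrt(2 ln 2) ~ 2.355
FWHM_PER_SIGMA = 2 * math.sqrt(2 * math.log(2))


def seeing_arcsec(sigma_pix: float, pixel_scale: float) -> float:
    """Seeing FWHM in arcsec from a Gaussian star-profile sigma in pixels."""
    return FWHM_PER_SIGMA * sigma_pix * pixel_scale


def sky_mag_per_arcsec2(sky_rate: float, zero_point: float, pixel_scale: float) -> float:
    """Sky surface brightness in mag/arcsec^2.

    sky_rate: sky counts per second per pixel.
    zero_point: magnitude that yields 1 count/s (assumed known from calibration).
    pixel_scale: arcsec per pixel, so each pixel covers pixel_scale**2 arcsec^2.
    """
    return zero_point - 2.5 * math.log10(sky_rate / pixel_scale ** 2)


if __name__ == "__main__":
    scale = 0.4  # assumed illustrative pixel scale, arcsec/pixel
    print(f"seeing: {seeing_arcsec(2.0, scale):.2f} arcsec")
    print(f"sky: {sky_mag_per_arcsec2(100.0, 25.0, scale):.2f} mag/arcsec^2")
```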
NEWFIRM QRP (Quick Reduce Pipeline)
NEWFIRM QRP review pages show the observer individual calibrated images, data quality measurements, and stacked mosaics from observing sequences, all during the night.
Figure: NEWFIRM K-band single 60 s frame, raw; NEWFIRM K-band single 60 s frame, calibrated; NEWFIRM K-band 1-hour stack
Serving VO Content
The ‘NSA’: NOAO Science Archive • NSA “R2” operational since 2002 • Focused on SURVEY datasets • A community-based data source model • ~1TB of reduced “community data products”; integrated visualization & cutout service • New NSA (aka NSA R3) • Heart of integrated E2E system • Includes archive, VO services, maybe data transport • Service Oriented Architecture (SOA) • Components can be reused & reconfigured • Services distributed throughout system (6 sites)
UI & Tools
NOAO VO Portal • Design Principles • Fully decoupled from NSA, use only VO standards or VO prototype standards (e.g., security) • Access NSA through VOIs, no direct links • Provide portal for VO, not just NSA • Support data discovery (GUIs), retrieval, and analysis (tools) • Originally released in Jan 2006 at AAS • Key components: NOAO Sky and Calendar tools • Monitoring usage (e.g., ~900 monthly visits, ~3500 files retrieved) • Subsequent releases • Support for security in VO model (“Single Sign On”, or SSO) • Additional tools & services available
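Because the portal reaches data only through VO interfaces, an image search comes down to a standard query URL. As an illustration, this sketch builds an IVOA Simple Image Access (SIA 1.0) query with the standard POS, SIZE, and FORMAT parameters. The endpoint URL is hypothetical, and the slide does not say which VO protocols the portal uses.

```python
from urllib.parse import urlencode


def sia_query_url(base_url: str, ra_deg: float, dec_deg: float,
                  size_deg: float, fmt: str = "image/fits") -> str:
    """Build an SIA 1.0 image query URL.

    POS is "RA,Dec" in decimal degrees, SIZE is the region size in degrees,
    and FORMAT restricts the returned image type.
    """
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": fmt}
    if base_url.endswith(("?", "&")):
        separator = ""  # base URL is already ready for parameters
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return base_url + separator + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; real ones would come from a VO registry.
    print(sia_query_url("https://archive.example.org/sia", 150.1, 2.2, 0.25))
```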
Evolving the DM System • Establish flexible infrastructure • Modular: Data Transport, Archive, Pipeline, Portal • Exercise flexibility of E2E system • Incorporate new large-volume instruments • E2E-compliant pipelines delivered with instruments • Incorporate new System facilities • Emphasize Time Domain Support (e.g., VOEvent) • Emphasize scientific tools • Data discovery, exploration, and analysis • Focus on tools which scale to DECam, LSST, … • Supporting both data volume and time domain
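VOEvent is an XML format, so a time-domain client starts by pulling out the event identifier, its role, and the `What/Param` values. The sketch below uses Python's standard ElementTree on a trimmed, hypothetical event. It follows the later VOEvent 2.0 namespace, so check element names against the version of the standard a given event stream actually uses.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Trimmed, hypothetical event for illustration (not a complete, schema-valid VOEvent).
SAMPLE_EVENT = """<voe:VOEvent xmlns:voe="http://www.ivoa.net/xml/VOEvent/v2.0"
    ivorn="ivo://example.org/demo#event-1" role="test" version="2.0">
  <What>
    <Param name="mag" value="18.2"/>
    <Param name="filter" value="r"/>
  </What>
</voe:VOEvent>"""


def summarize_voevent(xml_text: str) -> Dict[str, object]:
    """Extract the identifier, role, and What/Param name-value pairs."""
    root = ET.fromstring(xml_text)
    # In typical VOEvent 2.0 documents only the root element is namespaced;
    # children such as What and Param are unqualified, so plain tags match.
    params = {p.get("name"): p.get("value") for p in root.iter("Param")}
    return {"ivorn": root.get("ivorn"), "role": root.get("role"), "params": params}


if __name__ == "__main__":
    print(summarize_voevent(SAMPLE_EVENT))
```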
The Data Products Program Staff Betty Stobie, Program Manager, DPP
DPP Staff LCOGT Aug 2008
DPP Staffing • Chris Smith: Head of Program (thru mid-Nov) • Betty Stobie: Program Manager • Development Team (Team Lead: Betty Stobie) • Evan Deaubl: Archive Specialist • Alvaro Egana: VO Services Specialist • Exequiel Fuentes: Portal Specialist • Mike Fitzpatrick: Science Software Specialist • Frank Valdes: Pipeline Expert • Rob Seaman: Systems Engineer, Time-Domain • Jerry Schneider: Testing Engineer
DPP Staffing (cont.) • Operations • Irene Barg: Operations Manager • Rob Seaman: Mountain Operations Liaison • Nelson Saavedra: Systems Manager/Operations Specialist • Derec Scott: Operations Specialist • David Walker: Operations Specialist
DPP Staffing (cont.) • Scientific Staff • Chris Miller: Portal Scientist (50%) • Tod Lauer: Archive Scientist (50%) • Dick Shaw: Operations Scientist (80%) • Frank Valdes: Pipeline Scientist (40%) • Dave DeYoung: Science Advisor (20%) • Mark Dickinson: Sabbatical (FY09)
Development Plans • FY09 • Data Quality Assurance/Remediation • Data Transport/Storage Revision • MOSAIC/NEWFIRM Pipeline QA, QRP • Metadata Management (Keywords & Engineering Data) • DES Data Challenge Participation • FY10 • DES Data Challenge Participation • Additional Details TBD
DPP Operations
Operations • Infrastructure/Hardware Layer • WAN Networks: managed by CISS • Mountain Caches • Monitored & maintained by DP Ops • Archives in La Serena, Tucson, NCSA • L.S. & Tucson: Apple XSAN systems • NCSA: tape based storage • Portals in Tucson & La Serena • Current download: staging then FTP • Need multi-threaded download client
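A multi-threaded download client mainly needs to keep several transfers in flight at once. This sketch runs a caller-supplied `fetch` function on a thread pool. `download_all` and the stub fetcher in the example are hypothetical; a real `fetch` might wrap `urllib.request.urlopen`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(urls: Iterable[str], fetch: Callable[[str], bytes],
                 max_workers: int = 4) -> Dict[str, bytes]:
    """Fetch many files concurrently and return {url: content}.

    Threads suit this I/O-bound work: each worker spends most of its time
    waiting on the network. Exceptions raised by `fetch` propagate.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with url_list.
        return dict(zip(url_list, pool.map(fetch, url_list)))


# A real fetcher might wrap the standard library, for example:
#     import urllib.request
#     def fetch(url):
#         with urllib.request.urlopen(url) as resp:
#             return resp.read()

if __name__ == "__main__":
    fake_fetch = lambda url: url.encode()  # stand-in for a network transfer
    print(download_all(["file-a.fits", "file-b.fits"], fake_fetch))
```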
Operations Activities • Monitor infrastructure (NS+DW) • Monitor data capture & transport (RS) • Some basic data quality checks (headers) • Monitor data ingestion (DS+IB) • Additional data quality checks (headers) • Monitor pipeline processing (DS) • Both flow and data quality • Monitor data access & delivery (IB+)
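The header checks at capture and ingestion can be as simple as confirming that required keywords exist and hold sane values. In this sketch the required-keyword list is a hypothetical example, and the header can be any mapping (e.g., a dict, or an astropy `fits.Header`, which supports `in` and `.get`).

```python
from typing import List, Mapping

# Hypothetical minimum keyword set for illustration; a real list would come
# from the archive's metadata requirements.
REQUIRED_KEYWORDS = ("DATE-OBS", "EXPTIME", "FILTER", "OBJECT")


def header_problems(header: Mapping[str, object]) -> List[str]:
    """Return human-readable problems found in one image header."""
    problems = [f"missing {key}" for key in REQUIRED_KEYWORDS if key not in header]
    exptime = header.get("EXPTIME")
    if exptime is not None:
        try:
            if float(exptime) < 0:  # zero is allowed, e.g. for bias frames
                problems.append("negative EXPTIME")
        except (TypeError, ValueError):
            problems.append("non-numeric EXPTIME")
    return problems


if __name__ == "__main__":
    print(header_problems({"DATE-OBS": "2008-09-15", "EXPTIME": 150, "FILTER": "r"}))
```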