Science Data System Architectural Approach for JPL-led Decadal Survey Missions • Chris A. Mattmann, Ph.D. • Senior Computer Scientist, Instrument Software and Science Data Systems Section, NASA Jet Propulsion Laboratory • Adjunct Assistant Professor, Computer Science Department, University of Southern California • Project Management Committee, Lucene Project, Apache Software Foundation
Agenda • NASA earth science missions • JPL Lineage and building software for missions • Decadal System Challenges • JPL’s Science Data System Architecture • Architectural Principles • Components • Experience and Evaluation • Conclusions JPL-ESDSWG-SDS
Context • JPL supports and develops science data processing systems for multiple Earth science missions • These systems convert the instrument telemetry delivered to Earth from space into useful data for scientific research • Typical characteristics • Remote sensing instruments that orbit the Earth multiple times daily • Data are acquired constantly • Complex algorithms convert instrument measurements to geophysical quantities
NASA Earth Science Missions • [Diagram] Decadal Survey (DS) missions #1 and #2 (e.g., SMAP, DESDynI) each feed a Science Processing Center (1 and 2) • Each Science Processing Center feeds an Archive & Distribution center (DAAC 1, e.g., PO.DAAC; DAAC 2) • The DAACs and other data sources (e.g., NOAA) feed Distributed Data Analysis (subsetting, gridding, transformation, modeling) for users • New area: infrastructure to support analysis of distributed data
Characteristics of current approaches • Stove-piped • The architecture is rebuilt each time, as is the implementation • Limited reusability • Some services exist, but there is no incentive to use them • Reuse is basically the people • What are the skills that can be reused? What about the components? • No reusable components or product lines • Product lines are a proven means of reducing costs • No standard paradigm for building systems at all three levels • Science Processing Center/GDS, DAAC, and analysis beyond
The Past: Abridged JPL Earth Science Mission Lineage • AMMOS MGDS • Central Database (CDB) • Ahead of its time • Focused on centralized multi-mission support for file and metadata management, file transfer and user interface • Alaska SAR Facility (ASF) and NSCAT • CDB software used by each project independently, but no shared or common adaptation code • SeaWinds (NSCAT follow-on) • Leveraged the Object Oriented Data Technology (OODT) framework: the CAS component re-unified common software and architecture for missions • SeaWinds still going 10 years later (planned as a 2-year mission) • AMT built on this success
Object Oriented Data Technology (OODT) • Seed funding provided by NASA’s Office of Space Science in 1998 • Designed, implemented, deployed, operationalized, and refined over the past 10 years across multiple scientific domains • Planetary Science, Earth Science, Cancer Research, Modeling and Simulation, Pediatric Intensive Care • Runner-up, NASA Software of the Year, 2003 • Also 2008 Software Reuse Peer Award winner • JPL’s investment in product lines for Earth Science Missions
Product Line Architectures • A set of related products that have substantial commonality • In general, the commonality exists at the architecture level • One potential ‘silver bullet’ of software engineering • Power through reuse of • Engineering knowledge • Existing product architectures, styles, patterns • Pre-existing software components and connectors • JPL has developed a product line for the Earth Science Data System Pipeline based on OODT
Business Case for Product Lines: Traditional Software Engineering • Credit: Medvidovic, Taylor, Dashofy, 2009
Business Case for Product Lines: Product-line-based engineering • Credit: Medvidovic, Taylor, Dashofy, 2009
JPL SDS Architecture Evolution • Credit: Freeborn, Woollard, 2009
The Now: “Decadal-Style” Missions • Orbiting Carbon Observatory Reflight (OCO): carbon dioxide abundance • NPOESS Preparatory Project (NPP): satellite meteorology, atmospheric chemistry • Soil Moisture Active/Passive (SMAP): soil moisture and freeze/thaw • Future missions promise to capture orders of magnitude more data, require tens of thousands of jobs to be processed per day, and store hundreds of terabytes of data
Decadal System Challenges • Performance Issues • Store hundreds of terabytes of data • Schedule and manage tens of thousands of jobs per day • Interoperability Issues • Integration of algorithms and the SDS • Support different underlying computing platforms (grid, cloud, cluster, etc.) • Architectural Issues • Clearly separating ingestion from processing • Designing for technology infusion and evolution
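To give the job-count challenge a sense of scale, the sketch below converts a daily job count into a sustained submission rate and, via Little's law, into the average number of jobs in flight. The figures of 50,000 jobs per day and 10 minutes per job are illustrative assumptions chosen for the arithmetic; they are not taken from any mission described in these slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_job_rate(jobs_per_day: int) -> float:
    """Average number of jobs that must start per second to keep up."""
    return jobs_per_day / SECONDS_PER_DAY


def average_jobs_in_flight(jobs_per_day: int, mean_job_seconds: float) -> float:
    """Little's law: jobs in flight = arrival rate * mean job duration."""
    return sustained_job_rate(jobs_per_day) * mean_job_seconds


# Illustrative assumptions only, not mission requirements.
jobs_per_day = 50_000
mean_job_seconds = 600.0

rate = sustained_job_rate(jobs_per_day)
in_flight = average_jobs_in_flight(jobs_per_day, mean_job_seconds)
print(f"{rate:.2f} jobs/s, about {in_flight:.0f} jobs in flight on average")
```

Under these assumed numbers the scheduler must start a job roughly every 1.7 seconds and keep a few hundred running at once, which is why resource management appears as its own concern in the architecture.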
Representative Mission Examples • [Table not preserved] • Notes: • (1) XOVWM is a turn-key system to be operated by NOAA • (2) PGE: Product Generation Executable • (3) Large number of radar imaging modes to accommodate a variety of science targets; interferometric and higher-level product algorithms (PGEs) are considerably complex and may not be amenable to full automation • (4) 1X is on the order of hundreds of jobs per day • (5) Did not go into operations
JPL’s SDS Product Line: Architectural Principles • Component Distribution • Components can be distributed both inside and outside JPL • Fast responsive queries • Loose Coupling • Decoupling of file management and workflow management • Decoupling of workflow management and resource management • Model Independence – Components can support differing information models • Extension via configuration • Data curation and metadata extraction using standard formats • Policy is stored outside of the database • Common Interfaces – Core components use standard interfaces as connectors • Common Information Packages – Cross-disciplinary Information Objects use common containers for packaging information products • Standards – International and industry standards are used when available for interface and data definitions • Open Source – Open source approaches are used when available • Everything is metadata
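The loose-coupling principle above can be illustrated with a minimal Python sketch in which the workflow side depends only on two small interfaces, one for the catalog and one for resource management. All class and method names here (`Product`, `Catalog`, `ResourceManager`, `WorkflowManager`, `run_for`) are hypothetical and chosen for illustration; they are not the OODT API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Product:
    """A named product plus free-form key/value metadata."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)


class Catalog(Protocol):
    """Hypothetical file-management interface seen by the workflow side."""
    def ingest(self, product: Product) -> None: ...
    def query(self, key: str, value: str) -> list[Product]: ...


class ResourceManager(Protocol):
    """Hypothetical interface for wherever jobs actually run."""
    def submit(self, job_name: str) -> str: ...


class InMemoryCatalog:
    """One possible catalog; a database-backed one could replace it."""
    def __init__(self) -> None:
        self._products: list[Product] = []

    def ingest(self, product: Product) -> None:
        self._products.append(product)

    def query(self, key: str, value: str) -> list[Product]:
        return [p for p in self._products if p.metadata.get(key) == value]


class LocalResourceManager:
    """One possible resource manager; a cluster or cloud one could replace it."""
    def submit(self, job_name: str) -> str:
        return f"local:{job_name}"


class WorkflowManager:
    """Knows only the two interfaces, never their implementations."""
    def __init__(self, catalog: Catalog, resources: ResourceManager) -> None:
        self.catalog = catalog
        self.resources = resources

    def run_for(self, key: str, value: str, task: str) -> list[str]:
        # Select inputs by metadata, then hand each job to the resource manager.
        matches = self.catalog.query(key, value)
        return [self.resources.submit(f"{task}:{p.name}") for p in matches]


catalog = InMemoryCatalog()
catalog.ingest(Product("granule_001", {"level": "L1B"}))
catalog.ingest(Product("granule_002", {"level": "L2"}))
workflow = WorkflowManager(catalog, LocalResourceManager())
job_ids = workflow.run_for("level", "L1B", task="make_L2")
print(job_ids)
```

Because the workflow manager selects inputs purely by metadata and submits jobs through an interface, either backing implementation can be swapped without touching the workflow code.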
JPL’s SDS Architecture: High Level View • Credit: Crichton, Mattmann, 2009
JPL’s SDS Architecture: Processing • Separation of file management from workflow management • Allow for heterogeneous computing resources • Easily integrate PGEs • Leverages the same ingestion crawler
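As a rough sketch of the processing side described above, the Python below wraps a PGE as an opaque executable, runs it in a working directory, and passes whatever files it produced to a crawler-style function. The names `run_pge` and `crawl` and the stand-in PGE command are assumptions made for illustration, not part of the actual SDS software.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def run_pge(command: List[str], workdir: Path) -> List[Path]:
    """Run one PGE in its working directory and return the new files."""
    before = set(workdir.iterdir())
    subprocess.run(command, cwd=workdir, check=True)
    return sorted(set(workdir.iterdir()) - before)


def crawl(paths: List[Path], catalog: Dict[str, Dict[str, str]]) -> None:
    """Stand-in for the ingestion crawler: catalog each file with basic metadata."""
    for path in paths:
        catalog[path.name] = {
            "size_bytes": str(path.stat().st_size),
            "suffix": path.suffix,
        }


# Stand-in PGE: any executable would do; here, a one-line Python script
# that writes a single output file into the working directory.
fake_pge = [
    sys.executable,
    "-c",
    "open('l2_product.dat', 'w').write('retrieval output')",
]

catalog: Dict[str, Dict[str, str]] = {}
with tempfile.TemporaryDirectory() as tmp:
    outputs = run_pge(fake_pge, Path(tmp))
    crawl(outputs, catalog)

print(catalog)
```

Treating the PGE as an opaque command is one way to keep algorithm integration separate from the workflow logic, and routing its outputs through the same crawl step means processed products are cataloged the same way as ingested ones.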
JPL’s SDS Architecture: Ingestion • Allow for push/pull of data over arbitrary protocols • Ingestion builds a standard catalog and archive • Deliver product metadata to search, portal or GIS • Plug in arbitrary metadata extractors
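The ingestion steps above can be sketched as a small Python function that walks a staging area, applies a list of pluggable metadata extractors to each file, moves the file into an archive, and records the result in a catalog. The function names and the `<mission>_<level>_<date>` filename convention are hypothetical, and the sketch leaves out the push/pull transport step entirely.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Extractor = Callable[[Path], Metadata]


def filename_extractor(path: Path) -> Metadata:
    """Parse a hypothetical '<mission>_<level>_<date>' file stem."""
    mission, level, date = path.stem.split("_")
    return {"mission": mission, "level": level, "date": date}


def size_extractor(path: Path) -> Metadata:
    """Record the file size in bytes."""
    return {"size_bytes": str(path.stat().st_size)}


def ingest(staging: Path, archive: Path,
           extractors: List[Extractor]) -> Dict[str, Metadata]:
    """Extract metadata, archive each staged file, and build a catalog."""
    catalog: Dict[str, Metadata] = {}
    for path in sorted(p for p in staging.iterdir() if p.is_file()):
        met: Metadata = {}
        for extract in extractors:
            met.update(extract(path))  # run extractors before the file moves
        target = archive / path.name
        shutil.move(str(path), str(target))
        met["archive_path"] = str(target)
        catalog[path.name] = met
    return catalog


with tempfile.TemporaryDirectory() as tmp:
    staging, archive = Path(tmp, "staging"), Path(tmp, "archive")
    staging.mkdir()
    archive.mkdir()
    (staging / "oco_L1B_20090101.dat").write_text("spectra")
    catalog = ingest(staging, archive, [filename_extractor, size_extractor])
    archived = (archive / "oco_L1B_20090101.dat").exists()

print(catalog["oco_L1B_20090101.dat"]["level"], archived)
```

Passing extractors as plain callables keeps the ingest loop unchanged when a new product type needs a different extractor.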
JPL’s SDS Architecture: Experience • Orbiting Carbon Observatory (OCO) • NPOESS Preparatory Project (NPP) Sounder PEATE • Soil Moisture Active Passive (SMAP) Testbed
The Orbiting Carbon Observatory • OCO will acquire the space-based data needed to identify CO2 sources and sinks and quantify their variability over the seasonal cycle • Approach: • Collect spatially resolved, high resolution spectroscopic observations of CO2 and O2 absorption in reflected sunlight • Use these data to resolve spatial and temporal variations in the column averaged CO2 dry air mole fraction, XCO2, over the sunlit hemisphere • Employ independent calibration and validation approaches to produce XCO2 estimates with random errors and biases no larger than 1 - 2 ppm (0.3 - 0.5%) on regional scales at monthly intervals • Credit: B. Weiss, B. Chafin, OCO SCF Peer Review, 2006
OCO Pipeline Architecture • Credit: B. Chafin, 2008
NPP Sounder PEATE • The National Polar-Orbiting Operational Environmental Satellite System (NPOESS) Preparatory Project (NPP) • Joint mission involving NASA, NOAA and DOD • Managed jointly by the NPOESS Integrated Program Office (IPO) • The NPP mission collects and distributes remotely sensed land, ocean, and atmospheric data to the meteorological and global climate change communities • Responsible for the transition from existing Earth-observing missions to NPOESS • NPP provides risk reduction for NPOESS • Provides an opportunity to demonstrate and validate new instruments and processing algorithms • Demonstrates and validates aspects of the NPOESS command, control, communications and ground processing capabilities prior to the launch of the first NPOESS spacecraft • PEATE: Product Evaluation and Analysis Tool Element • The NPP mission is to measure atmospheric and sea surface temperatures, humidity sounding, land and ocean biological productivity, and cloud and aerosol properties (Source: http://jointmission.gsfc.nasa.gov/) • Sounder PEATE Responsibilities: • Assess Climate Quality of EDRs (support role) • Assess and Validate CrIS and ATMS Calibration and Pre-Launch & Post-Launch xDRs (support role) • Provide Data and Analysis to the Science Team (analysis role) • Develop Tools for Data Comparisons (validation role) • Develop and Demonstrate Algorithm Enhancements • Credit: NPP Sounder CDR, 23-Sept-2008
NPP Pipeline Architecture • Credit: B. Foster, 2009
OCO and NPP Sounder PEATE: Experience • Processed 100% of all Thermal Vacuum Data • Provided critical instrument testing analysis • Processed over 2 years of Fourier Transform Spectrometry (FTS) data • Ground-based OCO data (validation) • Were positioned to process space-based data • Easily able to handle mission changes in workflow and deployment architectures • Fully integrated with the PGE wrapper component for rapid adaptation of PGEs in the SCF environment • Push/pull and ingest of IASI data • Plug into the AIRS DOM catalog and development of a CAS catalog prototype • Ingestion and processing of level 3 products for NPP, including granule maps • Pull of large amounts of data from NOAA • Ready to support OPS
Soil Moisture Active Passive • Objectives • Provide global measurements of soil moisture with the accuracy, resolution and coverage to improve our understanding of the hydrologic cycle • Orbit: • Sun-synchronous, 6 am/pm orbit • 670 km altitude • Instruments: • L-band (1.26 GHz) radar • High resolution, moderate accuracy soil moisture • Freeze/thaw state detection • SAR mode: 3 km resolution • Real-aperture mode: 30 x 6 km resolution • L-band (1.4 GHz) radiometer • Moderate resolution, high accuracy soil moisture • 40 km resolution • Shared instrument antenna • 6-m diameter deployable mesh antenna • Conical scan at 14.6 rpm • Constant incidence angle: 40 degrees • 1000 km-wide swath • Swath and orbit enable 2-3 day revisit • Mission Operations • 3-year baseline mission, 2014-2017 • Credit: SMAP SDS/Impl Peer Review
SMAP Testbed • [Diagram labels] Processing control and local storage • Science Data System (SDS) algorithm testbed • Project/Science Team users • Ground truth, other data • Credit: SMAP SDS/Impl Peer Review
SMAP Leveraging • Supporting the SMAP Testbed • [Table not preserved] • * Generated using David Wheeler’s SLOCCount tool
Lessons Learned • Deployment on real-world Decadal-Scale Missions • OCO/2, ACOS, NPP Sounder PEATE and SMAP Testbed • Architecture first, product line second, implementation third • Data analysis and data access are intrinsically tied • Architecture components are usable as-is or in tandem • Must be able to support production pipelines as easily as science computing facilities
Future Challenges • Continued movement towards distributed processing and operations • Challenge for DESDynI and other decadal scale missions • Understanding how and when to infuse technology into missions • Insulation • Continued understanding and acknowledgement given to product lines
Acknowledgements • Tom Bicknell, Brian Chafin, Dan Crichton, David Cuddy, Steve Friedman, Brian Foster, Dana Freeborn, Oh-Ig Kwon, Emily Law, Kon Leung, Rob Toaz, Barry Weiss, David Woollard
Questions? • Contact: • chris.a.mattmann@jpl.nasa.gov • Further reference: • C. Mattmann, D. Freeborn, D. Crichton, B. Foster, A. Hart, D. Woollard, S. Hardman, P. Ramirez, S. Kelly, A. Y. Chang, C. E. Miller. A Reusable Process Control System Framework for the Orbiting Carbon Observatory and NPP Sounder PEATE missions. In Proceedings of the 3rd IEEE Intl Conference on Space Mission Challenges for Information Technology (SMC-IT 2009), pp. 165-172, July 19-23, 2009. • Extended paper in the works for IEEE Transactions on Software Engineering (TSE)
NASA Missions: the Ground Data System perspective • Credit: D. Woollard
Related Efforts • Grid Computing • Workflow Technologies • Cloud Technologies • Distributed Analysis Infrastructure • GO-ESSP
Typical Science Processing and DAAC Functions • File Management • Management of files and metadata • Heavy coupling with DBMS • Metadata is a flat structure • Workflow Management • Execution of processing programs based on cataloged files • Limited ability to initiate processing pipelines • No explicit control flow and data flow model • Tight coupling with file management