Overview
Ocean Data Processing System (ODPS)
Introduction
Missions
Evolution
Philosophy
Software
Components and Subsystems:
> RDBMS
> Ingest
> VDC/Scheduler
> Distribution
Scientific Support
Browser
Introduction
The ODPS is an automated data system that provides ingest, processing, archive, and distribution functions for legacy, operational, and future remote-sensing satellite missions.
Legacy Missions:
> CZCS: Oct 1978 – Jun 1986
> OCTS: Nov 1996 – Jun 1997
Operational Missions:
> Aqua-MODIS: Jul 2002 – present
> MERIS: Mar 2002 – present
> SeaWiFS: Sep 1997 – present
> Terra-MODIS: Feb 2000 – present
Future Missions:
> Aquarius
> Glory
> NPP VIIRS
Evolution
Originally developed between 1991 and 1996 to support SeaWiFS
Support for OCTS added in 1996
Delivered to the MODIS project in 1997 to serve as the MODIS Emergency Backup System (MEBS)
Complete system redesign and rewrite in 2003-2004
Delivered to GISS in 2008 to support the Glory mission
Multiple evolutionary cycles in response to changes in hardware infrastructure and support-function requirements:
> Began on early multi-processor SGI IRIX systems
> Ported to Linux in 2000
> Processing concurrency increased from 30 to over 500
> Distribution functions added in 2004
> Storage evolution
> Validation targets
Philosophy
Adaptive framework that allows any standalone program to be incorporated as a system job
Loosely coupled, modular subsystems:
> Ease of maintenance
> Development and testing alongside production
> Subsystem swapping
Standardized coding practices minimize the impact of operating-system upgrades:
> SGI IRIX to Linux
> 32-bit to 64-bit
> Strict GSFC IT requirements necessitate more frequent OS updates
A software lifecycle of requirements analysis, rapid-prototype development, and refinement allows new concepts to be developed quickly and adopted for operational use:
> Data subscriptions and orders
Ingest and Distribution Statistics
ODPS currently manages over 20 million files in its archive, about 1.06 petabytes
Daily ingests:
> 576 MODIS-L0 granules, 120 GB (60 GB each for Aqua and Terra)
> 2 SeaWiFS recorder dumps, 200 MB each
> 2-3 SeaWiFS HRPT (direct broadcast) passes, 50 MB each
> 5-6 MERIS-L1 granules, 1 GB each
Distribution (Oct 2010):
> 978 orders; 650,786 files; 5.2 TB
> 473 active subscriptions; 576,346 files staged
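The daily ingest figures can be totaled as a quick sanity check. This is back-of-the-envelope arithmetic over the nominal sizes listed, not measured throughput, and it assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB); where the slide gives a range, both ends are computed.

```python
# Back-of-the-envelope totals from the daily-ingest and distribution figures.
# Assumption: decimal units (1 GB = 1000 MB, 1 TB = 1000 GB).
MB_PER_GB = 1000


def daily_ingest_mb(hrpt_passes: int, meris_granules: int) -> int:
    """Sum the nominal daily ingest volume in MB."""
    modis_l0 = 120 * MB_PER_GB              # 576 granules, Aqua + Terra combined
    seawifs_dumps = 2 * 200                 # 2 recorder dumps at 200 MB each
    seawifs_hrpt = hrpt_passes * 50         # HRPT passes at 50 MB each
    meris_l1 = meris_granules * MB_PER_GB   # MERIS-L1 granules at 1 GB each
    return modis_l0 + seawifs_dumps + seawifs_hrpt + meris_l1


low = daily_ingest_mb(hrpt_passes=2, meris_granules=5)
high = daily_ingest_mb(hrpt_passes=3, meris_granules=6)
avg_modis_granule_mb = 120 * MB_PER_GB / 576
avg_distributed_file_mb = 5.2 * MB_PER_GB * MB_PER_GB / 650_786  # Oct 2010 orders

print(f"daily ingest: {low / MB_PER_GB:.2f} to {high / MB_PER_GB:.2f} GB")
print(f"mean MODIS-L0 granule: {avg_modis_granule_mb:.0f} MB")
print(f"mean ordered file: {avg_distributed_file_mb:.1f} MB")
```

Under these assumptions the daily ingest comes to roughly 125.5 to 126.6 GB, about 95% of it MODIS-L0, with an average MODIS-L0 granule of about 208 MB and an average ordered file of about 8 MB.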
Software
Proprietary Software
> RDBMS: Sybase Adaptive Server Enterprise 15.0.3, Sybase Open Client CT-Library, Sybase Transact-SQL
> Processing: IDL (limited use)
Open Source Software
> Framework: GCC 4.x, Perl 5, Perl DBI module with Sybase driver, OpenMotif 2.x, Bash
> Image Generation: GMT, ImageMagick, Netpbm, Octave
> Version Control: Subversion
Subsystems
> RDBMS
> Data Acquisition and Ingest
> VDC/Scheduler
> Level-3 Scheduler
> Archive Device Manager
> Data Distribution
> File Management and Migration
Components and Subsystems: RDBMS
Primary element that manages all system activity
Core databases support the generic system framework, data ingest, processing, file management, and distribution functions
Mission databases house mission-specific data and procedures
A high level of reuse is possible for similar missions; e.g., MODIS Aqua/Terra, SeaWiFS, and OCTS are all ocean-color missions with similar product suites, data flows, and processing requirements
Database and transaction-log dumps are performed regularly and stored in three different locations
A clone of the database-server hardware and OS is maintained as a warm backup
Components and Subsystems: RDBMS
Generic Core Databases:
> Admin
> Catalog
> Dataflow
> Processing
Mission-Specific Databases:
> MODIS Aqua
> MODIS Terra
> OCTS
> SeaWiFS
> CZCS
> New missions: Aquarius, VIIRS
Components and Subsystems: RDBMS
Goal: Isolate the RDBMS from system software
Layers, from application code down to the database:
> Perl Scripts and C Programs
> Perl DBI Module and C Interface Functions
> Database Services Layer: Vendor Library Module and Vendor Client Library
> RDBMS
To use a different RDBMS vendor, swap in a new Database Services Layer
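The isolation goal can be illustrated in Python. This is a minimal sketch, not the ODPS code: the slide describes Perl DBI and C interface functions over a Sybase client library, whereas here an abstract class plays the vendor-neutral interface and Python's built-in sqlite3 module stands in for the swappable vendor layer. The class names, table, and file names are hypothetical.

```python
import sqlite3
from abc import ABC, abstractmethod


class DatabaseServices(ABC):
    """Vendor-neutral interface: application code calls only these methods."""

    @abstractmethod
    def register_file(self, mission: str, filename: str) -> None: ...

    @abstractmethod
    def files_for_mission(self, mission: str) -> list[str]: ...


class SqliteServices(DatabaseServices):
    """Stand-in vendor layer; a Sybase-backed class would expose the same methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files (mission TEXT, name TEXT)")

    def register_file(self, mission: str, filename: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO files VALUES (?, ?)", (mission, filename))

    def files_for_mission(self, mission: str) -> list[str]:
        rows = self.conn.execute(
            "SELECT name FROM files WHERE mission = ? ORDER BY name", (mission,))
        return [name for (name,) in rows]


def ingest_report(db: DatabaseServices, mission: str) -> str:
    # Depends only on the interface, so swapping vendors leaves this untouched.
    return f"{mission}: {len(db.files_for_mission(mission))} files"


db = SqliteServices()
db.register_file("SeaWiFS", "granule_001")
db.register_file("SeaWiFS", "granule_002")
print(ingest_report(db, "SeaWiFS"))
```

Vendor-specific details such as parameter placeholder syntax and connection setup stay inside the vendor class, which is the part that would be replaced.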
Subsystems: Ingest
Data types and sources are described in the database
Active, passive, and periodic notification methods:
> Active method scans remote systems for new files
> Passive method handles messages announcing new files
> Periodic method schedules file transfers at specified intervals
File transfers are performed by ingest daemons and scheduler tasks
FTP, RCP, SCP, SFTP, and HTTP transfer protocols are supported
A generic file-transfer process hands off to data-specific post-transfer scripts
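A minimal sketch of the active method and the hand-off to data-specific post-transfer processing, assuming data types are recognized by filename pattern. The patterns, type names, and handler functions are hypothetical; in ODPS the data types and sources come from the database, and the actual transfer step is omitted here.

```python
import fnmatch
from typing import Callable, Optional

# Hypothetical data-type table; in ODPS, data types and sources live in the database.
DATA_TYPES = {
    "MODIS_L0": "A*.L0_LAC",
    "MERIS_L1": "MER_*.N1",
}


def post_modis_l0(name: str) -> str:
    return f"queued {name} for L0->L1A processing"


def post_meris_l1(name: str) -> str:
    return f"catalogued {name}"


# Data-specific post-transfer handlers (stand-ins for post-transfer scripts).
POST_TRANSFER: dict[str, Callable[[str], str]] = {
    "MODIS_L0": post_modis_l0,
    "MERIS_L1": post_meris_l1,
}


def classify(name: str) -> Optional[str]:
    for data_type, pattern in DATA_TYPES.items():
        if fnmatch.fnmatchcase(name, pattern):
            return data_type
    return None


def active_scan(remote_listing: list[str], ingested: set[str]) -> list[str]:
    """Active method: diff a remote listing against files already ingested."""
    results = []
    for name in sorted(set(remote_listing) - ingested):
        data_type = classify(name)
        if data_type is None:
            continue  # not a registered data type
        # A real ingest daemon would transfer the file here (FTP, SCP, HTTP, ...).
        ingested.add(name)
        results.append(POST_TRANSFER[data_type](name))
    return results


seen = {"A2010001000000.L0_LAC"}
listing = ["A2010001000000.L0_LAC", "A2010001000500.L0_LAC",
           "MER_RR__1P_20101001.N1", "README.txt"]
print(active_scan(listing, seen))
```

The generic part (listing diff, bookkeeping) never changes per mission; only the table entries and handlers do, which mirrors the generic-process/data-specific-script split described above.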
Subsystems: VDC/Scheduler
Visual Database Cookbook (VDC):
> Prototype developed in 1991
> Four separate programs
> Originally a distributed model
Runs in a daemon-like state on each server on which processing or supporting jobs need to run
Two main functions:
> Task Scheduler: runs high-level jobs (tasks) that support a variety of system functions
> Processing Engine: runs processing streams, typically scientific programs, sequenced into steps such as L0->L1, L1->L2, etc.
Greedy client model adopted in 2004
Task scheduler and processing engine unified in 2007
VDC Function: Scheduler
Primary system element responsible for coordinating most system activity
Monitors task records in a to-do list database table and runs tasks according to their defined attributes:
> Manual
> Periodic
> Timed
> Triggered
A standard job-shell interface allows new programs to be quickly adapted for Scheduler control
Tasks may be bound to specific hosts or claimed by any available host in the processing group
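The four task attributes and the host-binding rule can be sketched as a due-check over to-do records. This is an illustrative Python model with hypothetical field and task names; it does not show how ODPS actually stores tasks in its to-do table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Mode(Enum):
    MANUAL = "manual"        # runs only when an operator requests it
    PERIODIC = "periodic"    # runs every `interval`
    TIMED = "timed"          # runs once, at or after `run_at`
    TRIGGERED = "triggered"  # runs when its named event has fired


@dataclass
class Task:
    name: str
    mode: Mode
    host: Optional[str] = None          # None: any host in the group may claim it
    interval: Optional[timedelta] = None
    run_at: Optional[datetime] = None
    trigger: Optional[str] = None
    last_run: Optional[datetime] = None
    requested: bool = False


def is_due(task: Task, now: datetime, events: set[str]) -> bool:
    if task.mode is Mode.MANUAL:
        return task.requested
    if task.mode is Mode.PERIODIC:
        return task.last_run is None or now - task.last_run >= task.interval
    if task.mode is Mode.TIMED:
        return task.last_run is None and now >= task.run_at
    return task.trigger in events        # Mode.TRIGGERED


def claimable(tasks: list[Task], host: str, now: datetime, events: set[str]) -> list[str]:
    """Names of to-do tasks that `host` may run right now."""
    return [t.name for t in tasks
            if (t.host is None or t.host == host) and is_due(t, now, events)]


now = datetime(2010, 10, 1, 12, 0)
todo = [
    Task("purge_scratch", Mode.PERIODIC, interval=timedelta(hours=1),
         last_run=now - timedelta(hours=2)),
    Task("nightly_report", Mode.TIMED, run_at=datetime(2010, 10, 1, 6, 0)),
    Task("make_vdc", Mode.TRIGGERED, host="proc01", trigger="rules_done"),
    Task("reprocess", Mode.MANUAL),
]
print(claimable(todo, "proc02", now, {"rules_done"}))
```

In a shared to-do table, claiming an unbound task would also need an atomic step (for example, a conditional UPDATE that succeeds for only one host) so that two hosts cannot run the same task.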
VDC Function: Scheduler
> Daily Tasks -> Daily Task Scheduler -> To-do List (tasks for the current day)
> User input via SCHEDMON GUI -> To-do List
> To-do List -> VDC/Scheduler -> Task Shell
VDC Function: Processing Engine
Scalable infrastructure for concurrent processing of serial streams (e.g., L0 -> L1A -> L1B -> L2)
Each instance of the VDC Engine actively competes for the jobs it is allowed to run, based on priority, length of time in the queue, and processing weight
Uses recipes to encapsulate data-specific processing schemes, parameters, and pre-processing rules
Virtual Processing Units (VPUs) serve as distinct processing resources and are allocated based on available time, current OS load, and processing weight
Comprehensive processing priorities allow high-priority real-time data to be handled ahead of lower-priority processing
A standard job-shell interface allows new scientific programs to be quickly adapted as recipe steps
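One way to model job competition is a greedy client that repeatedly takes the most urgent job fitting its free VPUs. The ordering below (priority first, then longest wait) and the VPU headroom rule are assumptions chosen for illustration; the slides list the factors but not how ODPS weighs them. Job names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    priority: int       # assumption: 1 = most urgent (e.g. real-time data)
    queued_min: float   # minutes the job has waited in the queue
    weight: int         # processing weight, in VPUs


def free_vpus(total_vpus: int, busy_vpus: int, load_avg: float) -> int:
    """Hypothetical headroom rule: treat the OS load average as a floor on busy VPUs,
    so load from work outside the engine also reduces what this host claims."""
    return max(0, total_vpus - max(busy_vpus, round(load_avg)))


def pick_job(queue: list[Job], vpus: int) -> Optional[Job]:
    """Greedy choice: most urgent job that fits; longest-waiting job breaks ties."""
    eligible = [j for j in queue if j.weight <= vpus]
    if not eligible:
        return None
    return min(eligible, key=lambda j: (j.priority, -j.queued_min))


queue = [
    Job("seawifs_l2_reproc", priority=5, queued_min=300, weight=1),
    Job("modis_nrt_l1a", priority=1, queued_min=2, weight=2),
    Job("meris_l3", priority=5, queued_min=900, weight=4),
]
vpus = free_vpus(total_vpus=8, busy_vpus=5, load_avg=5.6)
chosen = pick_job(queue, vpus)
print(vpus, chosen.name if chosen else None)
```

With two free VPUs the near-real-time MODIS job wins despite its short wait; with that job gone and four VPUs free, the longer-waiting MERIS job beats the SeaWiFS job of equal priority.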
VDC Function: Processing Engine
Captures system boot time and monitors OS load
Invokes recipe steps and monitors step-execution time
Handles operator-requested stream actions
Performs flushing operations on completed tasks and streams
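Invoking recipe steps while timing them maps naturally onto running each step as a child process with a time limit. This sketch uses Python's subprocess module; the step names mirror the L0 -> L1A -> L1B -> L2 sequence, but the commands are stand-ins (small Python one-liners) rather than the real science programs, and the stop-on-first-failure policy is an assumption.

```python
import subprocess
import sys
import time


def run_stream(steps: list[tuple[str, list[str]]],
               timeout_s: float = 3600.0) -> list[tuple[str, str, float]]:
    """Run recipe steps in order, timing each; stop at the first failure or timeout.

    Returns (step_name, status, elapsed_seconds) for every step that was started.
    """
    log = []
    for name, argv in steps:
        start = time.monotonic()
        try:
            # subprocess.run kills the child if it exceeds timeout_s.
            proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
            status = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            status = "timeout"
        log.append((name, status, time.monotonic() - start))
        if status != "ok":
            break  # later steps would depend on this step's output
    return log


# Stand-in commands; a real recipe would invoke the science programs.
stream = [
    ("L0->L1A", [sys.executable, "-c", "pass"]),
    ("L1A->L1B", [sys.executable, "-c", "import sys; sys.exit(3)"]),
    ("L1B->L2", [sys.executable, "-c", "pass"]),
]
for name, status, secs in run_stream(stream, timeout_s=60):
    print(f"{name}: {status} ({secs:.2f} s)")
```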
VDC: Rule Manager
Runs in a daemon-like state
Polls jobs in the processing queue and runs their pre-processing rule procedures
Promotes a job's status when all of its rule procedures complete successfully
Governed by the currently configured processing priorities
Primarily used to match the proper ancillary data with granules in the processing queue
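A single Rule Manager polling pass might look like the following, assuming rules are predicates over a queued job and that promotion means moving from "pending" to "ready". The ancillary rule (a record within six hours on each side of the granule start time) is a hypothetical example, not the actual ODPS matching criterion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class QueuedJob:
    granule: str
    start: datetime
    priority: int            # assumption: lower number = more urgent
    status: str = "pending"


Rule = Callable[[QueuedJob], bool]


def bracketing_ancillary(available: list[datetime],
                         window: timedelta = timedelta(hours=6)) -> Rule:
    """Hypothetical rule: need an ancillary record at or before the granule start
    and one at or after it, each within `window`."""
    def rule(job: QueuedJob) -> bool:
        before = any(job.start - window <= t <= job.start for t in available)
        after = any(job.start <= t <= job.start + window for t in available)
        return before and after
    return rule


def rule_pass(queue: list[QueuedJob], rules: list[Rule]) -> list[str]:
    """One polling pass: promote pending jobs whose rules all succeed, by priority."""
    promoted = []
    for job in sorted(queue, key=lambda j: j.priority):
        if job.status == "pending" and all(rule(job) for rule in rules):
            job.status = "ready"
            promoted.append(job.granule)
    return promoted


met = [datetime(2010, 10, 1, 0), datetime(2010, 10, 1, 6)]
queue = [
    QueuedJob("granule_0800", datetime(2010, 10, 1, 8), priority=1),
    QueuedJob("granule_0300", datetime(2010, 10, 1, 3), priority=2),
]
print(rule_pass(queue, [bracketing_ancillary(met)]))
```

The 08:00 granule stays pending until an ancillary record after 08:00 arrives; a later pass would then promote it.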
VDC: MakeVDC
Polls the processing queue for jobs that have met their pre-processing requirements
Generates VDC job files from recipe templates according to configured priorities and populates the VDC queue
Runs as a Scheduler task, so it can easily be configured to run as often as needed to keep the VDC queue full
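Turning ready jobs into job files from a template can be sketched with Python's string.Template. The job-file layout, field names, and queue-size cap are hypothetical, since the slides do not show the real VDC job-file format; the sketch also assumes a lower priority number means more urgent.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical recipe template; the real VDC job-file format is not described.
RECIPE = Template(
    "recipe=$recipe\n"
    "priority=$priority\n"
    "granule=$granule\n"
)


def make_vdc(ready_jobs: list[dict], queue_dir: Path, max_queue: int) -> list[str]:
    """Write job files for ready jobs, most urgent first, until the queue holds max_queue."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    room = max_queue - len(list(queue_dir.glob("*.job")))
    pending = [j for j in sorted(ready_jobs, key=lambda j: j["priority"])
               if not (queue_dir / f"{j['granule']}.job").exists()]
    written = []
    for job in pending[:max(room, 0)]:
        path = queue_dir / f"{job['granule']}.job"
        path.write_text(RECIPE.substitute(job))
        written.append(path.name)
    return written


jobs = [
    {"granule": "g1", "priority": 3, "recipe": "modis_l2"},
    {"granule": "g2", "priority": 1, "recipe": "modis_l2"},
    {"granule": "g3", "priority": 2, "recipe": "seawifs_l2"},
]
with tempfile.TemporaryDirectory() as tmp:
    queue = Path(tmp) / "vdc_queue"
    print(make_vdc(jobs, queue, max_queue=2))   # first pass fills the queue
    print(make_vdc(jobs, queue, max_queue=2))   # queue already full
```

Because each call only tops the queue up to its cap, running it as a frequent Scheduler task keeps the queue full without flooding it.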
Subsystems: Distribution
Interactive, web-based Data Ordering System, currently supporting Aqua and Terra MODIS, CZCS, OCTS, and SeaWiFS
Data Subscription System, currently supporting Aqua and Terra MODIS and SeaWiFS, allows users to define regions and products of interest
Order and Subscription Manager daemons poll the order and subscription queues and stage files on FTP servers (stage rate ~12 GB/hr)
Near-real-time data extraction and image support
Web-CGI applications allow users to view and update their orders and subscriptions
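Matching a new granule against subscriptions that define regions and products of interest reduces to a product check plus a bounding-box overlap test. This sketch assumes latitude/longitude boxes that do not cross the antimeridian; the user names, product names, and approximate region coordinates are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Latitude/longitude bounding box in degrees (assumes no antimeridian crossing)."""
    south: float
    north: float
    west: float
    east: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap when their latitude and longitude ranges both overlap.
        return (self.south <= other.north and other.south <= self.north
                and self.west <= other.east and other.west <= self.east)


@dataclass
class Subscription:
    user: str
    products: set[str]
    region: Box


def match(subs: list[Subscription], product: str, extent: Box) -> list[str]:
    """Users whose subscription includes `product` and overlaps the granule extent."""
    return [s.user for s in subs
            if product in s.products and s.region.intersects(extent)]


subs = [
    Subscription("reef_group", {"chlor_a"}, Box(-25.0, -10.0, 142.0, 155.0)),
    Subscription("shark_group", {"chlor_a", "sst"}, Box(-27.0, -10.0, 32.0, 45.0)),
]
granule = Box(-20.0, -5.0, 140.0, 160.0)
print(match(subs, "chlor_a", granule))
```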
Distribution: Flowchart
> Users -> Data Orders -> Order Manager -> Local Distribution Servers -> Users
> Users -> Data Subscriptions -> Subscription Manager -> Local Distribution Servers -> Users
> Regional Extraction and Map Requests -> Extraction and Mapping Recipe
> Data and images optionally pushed to users
Scientific Support
24/7 operational support for forward-stream processing:
> 9-to-5 staffing
> Extended lights-out periods
> No unscheduled downtime in the past year due to system-software faults
Support for algorithm/calibration testing alongside production:
> Product suites
> Test recipes
> Alternate tags in the science-software repository
> Processing priorities
Non-standard processing requests:
> Regional L3 processing
> Great Barrier Reef research
> Mozambique whale shark research
> GMT Intermediate Coastline
> Aquarius Simulation
OceanColor Web (oceancolor.gsfc.nasa.gov)
Consolidated data access, information, services, and community feedback