Describe workflows used to maintain and provide the RDA to users Both are 24x7 operations 

NWSC Planning for RDA – 21 Dec. 2011. Describe workflows used to maintain and provide the RDA to users; both are 24x7 operations. Transition to the NWSC with zero downtime. NWSC is a new environment: processing adjustments and testing. Today: starting point for an actionable plan.


  1. NWSC Planning for RDA – 21 Dec. 2011
     • Describe workflows used to maintain and provide the RDA to users
     • Both are 24x7 operations
     • Transition to the NWSC with zero downtime
     • NWSC is a new environment
     • Processing adjustments and testing
     • Today: starting point for an actionable plan
     • Focus on NWSC DAV, HPC, & CFDS
     • Baseline metrics
     • 7000 unique users annually
     • 1.4 PB of primary data – HPSS (2x in total)
     • 450 TB GLADE, permanent data for users, areas for data preparation
     • Web servers and DB servers – DSG
     • Use 6 DAV servers, mirage 0-5

  2. Requirements for RDA Data Processing Services @ NWSC
     • Homogeneous architecture and OS
     • Common file system for RDA product development, NCAR access, and connection to DSS web servers
     • CFDS usage metrics for NCAR users at NWSC?
     • Read/write connectivity to DB servers from Caldera, Geyser, and Yellowstone
     • Dedicated and shared compute resources for user-driven workload and burst DSS needs to prepare data
     • For example: a DSS dedicated system or queues, minimum restrictions?

  3. NWSC RDA Systems Structure

  4. RDA data processing examples and tools
     Run Research Data Archive Management System (RDAMS) tools and daemons, executed as user “rdadata”
     • dsarch: archive files from work disk spaces to HPSS and to CFDS
     • gather-metadata: read all incoming files to verify content, and create metadata records for DBs
     • dsrqst: manage delayed-mode user requests
     • subsetting: process data extraction and re-dimensioning
     • format conversion: e.g. GRIB2 to netCDF
     • file staging: bulk data moves, HPSS file to CFDS /transfer
     • dsupdt: complex DB-governed scripting to regularly download new data, routine growth for 150+ datasets

  5. Daemon-managed data processing workflow
     • A system-initialized daemon named “dsstart” checks on dsrqst daemon status
     • A cron job checks on the status of the “dsstart” daemon on each server
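
The two-level supervision on this slide (cron watches “dsstart”, “dsstart” watches dsrqst) can be illustrated with a minimal Python sketch. The slides do not say how status is actually checked; the PID-file convention, the function names `is_running` and `ensure_running`, and the injected `start` callable are all assumptions made for illustration, not the RDAMS implementation. The check relies on POSIX signal semantics.

```python
import os
from pathlib import Path
from typing import Callable


def is_running(pid_file: Path) -> bool:
    """Return True if pid_file names a process that currently exists.

    Assumption: each daemon writes its PID to a file when it starts.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or its content is not a number
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False  # stale PID file, the process is gone
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def ensure_running(pid_file: Path, start: Callable[[], None]) -> bool:
    """Call start() if the daemon is down.

    Returns True if a restart was attempted, False if the daemon was alive.
    """
    if is_running(pid_file):
        return False
    start()
    return True
```

In this arrangement the cron job on each server would call `ensure_running` for the “dsstart” PID file, and “dsstart” would periodically make the same call for dsrqst, so that neither layer depends on a single long-lived supervisor.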

  6. Current Scale of Activity
     System works well and demand is accelerating upward
     • Subsetting, format conversion, file staging
     • 166 user requests/week
     • 1-2 hours, average execution time/request
     • 65 TB/week, input data volume processed
     • 3 TB/week, output data volume for users
     • 385 TB data added to RDA in FY 2011
     • In one case the data processing was too large for the mirage servers. Used Lynx, 3-4 weeks, 5-7 concurrent streams
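
The weekly figures on this slide can be turned into rough capacity numbers. The following Python sketch is a back-of-the-envelope calculation only: it assumes the input volume is in terabytes, that the 1-2 hour range bounds the average execution time, and that load is spread evenly over the week. The function name `weekly_load` is hypothetical.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_load(requests, hours_low, hours_high, input_tb, output_tb):
    """Derive simple capacity figures from weekly request statistics."""
    return {
        # total processing time consumed per week, low and high bound
        "process_hours": (requests * hours_low, requests * hours_high),
        # average number of requests executing at any moment
        "avg_concurrent": (
            requests * hours_low / HOURS_PER_WEEK,
            requests * hours_high / HOURS_PER_WEEK,
        ),
        "input_tb_per_request": input_tb / requests,
        "output_tb_per_request": output_tb / requests,
        # how much the processing shrinks the data handed to users
        "reduction_ratio": input_tb / output_tb,
    }


# Figures taken from the slide: 166 requests, 1-2 h each, 65 TB in, 3 TB out.
load = weekly_load(166, 1, 2, 65, 3)
for name, value in load.items():
    print(name, value)
```

Under these assumptions the slide's numbers correspond to 166 to 332 process-hours per week, about one to two requests running at any moment on average, and an input-to-output ratio of roughly 22:1. Peak concurrency is not given in the slides and may be considerably higher than this average.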
