Collecting terabytes of data from FDSN data centers is possible but challenging. ROVER is a command-line client that runs long-term, verifies requested data retrieval, and builds a data index for easy integration into workflows.
ROVER: Hardening Data Delivery over the Internet
The challenge and motivation
Collecting X terabytes of arbitrary data from the FDSN data centers is possible, but:
• Usually only possible by partitioning the request, orchestrated by the user
• Orchestration is non-trivial: errors and re-tries must be handled
• Is it complete? Downloads may be quietly truncated, a weakness of HTTP + streaming
• Local data management, summarization, indexing and sub-setting are all left to users to (re)invent
Enter ROVER: Retrieval of Various Experiment data Robustly
• A command-line client
• Designed to run long-term, until the request is complete (restartable)
• Designed to verify that all requested data that can be retrieved has been retrieved, using the DMC’s availability service
• Designed to check for additions and, in the future, updates to the requested data set
• Builds a data index for summarization, lookup and extraction
• The index is stored in SQLite, for which support is ubiquitous; simple text summaries are trivially generated
• The index is the key to integrating such a data set into a workflow, and a bridge to other systems
ROVER workflow
1. Create the desired data request (a subscription containing Request 1..N)
2. Launch retrieval per request, looping until there is nothing left to retrieve:
2a. Check availability
2b. Compare to local holdings (the data index)
2c. Fetch needed data in parallel (into the miniSEED data set)
2d. Index the new data
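The loop in step 2 can be pictured in a few lines of Python. This is a conceptual sketch only, not ROVER’s actual implementation; the stub functions are hypothetical stand-ins for the availability service, the parallel downloader and the indexer.

# Conceptual sketch of ROVER's retrieval loop (step 2 above).
# NOT the real implementation: the stubs stand in for the
# availability service, parallel downloader and indexer.

HOLDINGS = set()  # stands in for the local SQLite data index

def check_availability(request):
    """2a: ask the data center what it can serve (stubbed)."""
    return set(request)

def fetch_parallel(needed):
    """2c + 2d: download missing data and index it (stubbed)."""
    HOLDINGS.update(needed)

def retrieve(request):
    while True:
        available = check_availability(request)   # 2a
        needed = available - HOLDINGS             # 2b: diff vs. local holdings
        if not needed:
            break                                 # nothing left to retrieve
        fetch_parallel(needed)                    # 2c/2d

retrieve({"IU.ANMO.00.LHZ", "TA.MSTX..BHZ"})
print(sorted(HOLDINGS))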
How to install
Two-part installation:
1) Install mseedindex from source code: https://github.com/iris-edu/mseedindex
Requirements: a C compiler and the make program
2) Install rover using pip:
> pip install rover
Requirements: Python >= 2.7 (and pip)
ROVER: Quick start, an example request
1. Initialize a data repository (and change into that directory):
$ rover init-repository datarepo
$ cd datarepo
2. Create a request file named request.txt containing:
IU ANMO * LHZ 2012-01-01T00:00:00 2012-02-01T00:00:00
TA MSTX -- BH? 2012-01-01T00:00:00 2012-02-01T00:00:00
3. Run rover retrieve to fetch these data:
$ rover retrieve request.txt
• HTTP status page & email notification when done
Data are saved, in miniSEED format, to files with this organization:
<datarepo>/data/<network>/<year>/<day>/<station>.<network>.<year>.<day>
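Once retrieval finishes, any of the saved miniSEED files can be read directly with ObsPy. A minimal sketch; the path below is only an example following the layout shown above, so substitute a file that actually exists in your repository:

from obspy import read

# Read one file from the rover repository; the path is a hypothetical
# example following the <datarepo>/data/... layout described above.
st = read("datarepo/data/IU/2012/001/ANMO.IU.2012.001")
print(st)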
Once you have data: report what is in the repository
List a summary (the extents) of data in the repository:
$ rover list-summary
IU_ANMO_00_LHZ 2012-01-01T00:00:00.069500 2012-01-31T23:59:59.069500
IU_ANMO_10_LHZ 2012-01-01T00:00:00.069500 2012-01-31T23:59:59.069500
TA_MSTX__BHE 2012-01-01T00:00:00.000000 2012-01-31T23:59:59.975000
TA_MSTX__BHN 2012-01-01T00:00:00.000000 2012-01-31T23:59:59.975000
TA_MSTX__BHZ 2012-01-01T00:00:00.000000 2012-01-31T23:59:59.975000
• Limit the summary to specific networks, stations, locations, channels & time ranges
• Alternatively, use list-index for full details: the actual contiguous traces
Once you have data: run your own fdsnws-dataselect service
Run an FDSN web service on your local repository:
https://iris-edu.github.io/portable-fdsnws-dataselect/
• A Python-based web service that returns data based on a time series index
• Most tools that use FDSN web services (FetchData, ObsPy, etc.) can be redirected to alternate services
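For example, ObsPy’s standard FDSN client can be pointed at such a local service. A sketch, assuming the service is listening on localhost port 8080 (an assumption; use whatever host and port you configured):

from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Point ObsPy at the local portable-fdsnws-dataselect instance
# (localhost:8080 is an assumption; match your service's config).
client = Client(base_url="http://localhost:8080")
st = client.get_waveforms("TA", "MSTX", "--", "BHZ",
                          UTCDateTime("2012-01-01"),
                          UTCDateTime("2012-01-02"))
print(st)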
Once you have data: direct use with ObsPy (next release)
The DMC has contributed a new sub-module to ObsPy, to be included in the next release, that allows direct discovery and reading of data in a rover-created repository:
obspy.clients.filesystem.tsindex.Client
Very similar to other ObsPy interfaces, this module provides get_waveforms(), get_availability_extent(), get_availability() and a few more.
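A sketch of how this client might be used, assuming the repository’s index database is a file named timeseries.sqlite (check your repository for the actual index file name):

from obspy import UTCDateTime
from obspy.clients.filesystem.tsindex import Client

# Open the rover-built index (the file name is an assumption).
client = Client("timeseries.sqlite")

# Read waveforms straight from the indexed miniSEED files.
st = client.get_waveforms("IU", "ANMO", "00", "LHZ",
                          UTCDateTime("2012-01-01"),
                          UTCDateTime("2012-01-08"))
print(st)

# Summarize what the repository holds for one station.
for extent in client.get_availability_extent(network="IU", station="ANMO"):
    print(extent)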
Once you have data: use the data index directly
The data index supports data discovery and summary with no need to crawl through files:
• Filenames, data identifiers (net, sta, loc, chan), earliest and latest times, exact segments, sample rates, low-level details and more...
The index is stored in SQLite, a very powerful single-file database that is easy to use:
$ sqlite3 index.sql 'SELECT filename,network,station,location,channel,starttime,endtime FROM tsindex;'
/path/cola.mseed|IU|COLA|00|LH1|2010-02-27T06:50:00.069539|2010-02-27T07:59:59.069538
/path/cola.mseed|IU|COLA|00|LH2|2010-02-27T06:50:00.069539|2010-02-27T07:59:59.069538
/path/cola.mseed|IU|COLA|00|LHZ|2010-02-27T06:50:00.069539|2010-02-27T07:59:59.069538
...
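Because the index is plain SQLite, it is equally easy to query from a script. A minimal Python sketch using only the standard library; the database file name follows the example above, so adjust it to your repository:

import sqlite3

# Query the rover-built index directly (file name as in the example above).
conn = sqlite3.connect("index.sql")
rows = conn.execute(
    "SELECT filename, network, station, location, channel, "
    "starttime, endtime FROM tsindex WHERE network = 'IU'")
for row in rows:
    print("|".join(str(v) for v in row))
conn.close()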
Main take-away points
• Addresses robust collection of small to large data sets
• Provides an indexed data repository
• Cost: learning a new tool
• Expected release: Spring 2019
• Ask if you would like to be an early tester!
• See a demo at the IRIS booth (808)