MSDC


  1. MSDC MiniSeed Data Completeness S. Pintore

  2. Scenario • A network of remote SeisComP servers archiving data on mass storage, each creating a Peripheral Archive (PA) on its own storage • A server creating a Central Archive (CA) • Telecommunication network availability < 100% • Limited bandwidth

  3. Incomplete data • Data in the CA can be incomplete if the network becomes and stays unreachable for a long time • Some files missing from the CA • Data gaps in the files of the CA

  4. The Mednet SeisComP servers network • Data are stored in 24-hour files • The segsize parameter is set to 5000 (in 512-byte blocks) • If the link stays down longer than about an hour, the data will have gaps • If the link stays down longer than 1-2 days, some files will be missing • Telecommunication network availability is generally good • Network faults lasting more than 1 day are more frequent than faults lasting longer than an hour but less than 1 day
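As a rough check on the figures above, the size of one segment follows from segsize multiplied by the 512-byte block size. The bash sketch below computes it and estimates how long one segment lasts at a given data rate. The rate of 700 bytes/s is an assumed value chosen only for illustration, the function names are hypothetical, and the total buffer on a real server also depends on how many segments are configured.

```shell
#!/usr/bin/env bash
# Values from the slide: segsize = 5000 blocks of 512 bytes each.
SEGSIZE_BLOCKS=5000
BLOCK_BYTES=512

# Size of one segment in bytes.
segment_bytes() {
    echo $(( SEGSIZE_BLOCKS * BLOCK_BYTES ))
}

# Whole seconds of data one segment can hold at a given rate (bytes/s).
# The rate is an assumption supplied by the caller, not a figure from the slides.
segment_seconds() {
    local rate_bytes_per_s=$1
    echo $(( $(segment_bytes) / rate_bytes_per_s ))
}

segment_bytes          # prints 2560000
segment_seconds 700    # prints 3657, i.e. about an hour at the assumed rate
```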

  5. Retransmit or Integrate? • To ensure data quality, an integrity check is necessary • Because of the bandwidth limit you must choose between: • retransmitting the whole file containing a gap • integrating your file by transmitting only the data needed to fill the gap • These two execution steps aren’t necessarily distinct

  6. Respect the environment • The procedure to rebuild the correct data should have a low impact on the systems. It should: • run on Linux using few resources • offer link security • allow control of bandwidth use • not need specific firewall rules

  7. MSDC solution • MSDC uses the rsync tool, which is already available, optimised for similar problems, and well tested • The data check is made by rsync, which compares the files in the CA with those in the PA • It uses rsync over ssh to: • secure the connection • avoid using the rsync port (873)
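An rsync-over-ssh invocation of the kind described above might look like the following sketch. It is not taken from msdc.sh: the host, paths, key file name, and bandwidth limit are hypothetical placeholders, pulling from the PA into the CA is an assumed direction, and the flag choice (-a, -z, --bwlimit, -e) is one reasonable combination of standard rsync options. The function only prints the command so that it can be inspected before being run.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders, not values from the MSDC package.
PA_HOST="sysop@remote.example.org"     # Peripheral Archive host (assumed)
PA_DIR="/home/sysop/archive/"          # PA directory (assumed)
CA_DIR="/data/central_archive/"        # CA directory (assumed)
SSH_KEY="$HOME/msdc/ssh/id_msdc"       # dedicated key file (assumed name)
BWLIMIT_KIB=64                         # rsync --bwlimit, in KiB per second (assumed)

# Print the rsync command that would bring the CA copy in line with the PA copy.
# Running rsync through ssh (-e) means the rsync daemon port 873 is never used.
build_rsync_cmd() {
    local cmd=(rsync -a -z "--bwlimit=${BWLIMIT_KIB}"
               -e "ssh -i ${SSH_KEY}"
               "${PA_HOST}:${PA_DIR}" "${CA_DIR}")
    printf '%q ' "${cmd[@]}"
    echo
}

build_rsync_cmd
```

Because rsync transfers only the parts of a file that differ, the same command covers both cases from the scenario: a file missing from the CA is copied whole, while a file with gaps receives only the missing data.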

  8. What does rsync offer? • The features of the rsync algorithm: • it works on arbitrary data • the total data transferred is about the size of a compressed diff file • it is fast for large files and large collections of files • it doesn’t assume any prior knowledge of the two files, but takes advantage of similarities • it is computationally inexpensive

  9. MSDC main features • msdc.sh can be run from the command line or from a crontab entry • It is a bash script • It avoids conflicts between concurrent runs by using a simple locking mechanism • It logs events and the names of the files that were corrected or definitively lost • Installation is done by the sysop user in that user's home directory
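The slides do not say how the lock is implemented. One common way to get a simple lock in bash is to rely on mkdir, which succeeds for exactly one caller. The sketch below illustrates that technique under stated assumptions: the lock path, the function names, and the cron schedule are all hypothetical, and msdc.sh may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical lock location; msdc.sh may use a different mechanism.
LOCK_DIR="${TMPDIR:-/tmp}/msdc_example.lock"

# mkdir is atomic: it succeeds for one caller and fails if the directory exists.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

run_once() {
    if ! acquire_lock; then
        echo "another instance is running, skipping this run"
        return 1
    fi
    echo "lock acquired"
    # ... the check and transfer would go here ...
    release_lock
}

# Example crontab entry (assumed schedule and install path), every night at 02:30:
# 30 2 * * * $HOME/msdc/bin/msdc.sh
```

A lock of this kind stays behind if the script is killed before release_lock runs, so a real script would also need some way to detect and clear a stale lock.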

  10. Security • MSDC uses an ssh key pair to automate the ssh connection • this key pair is dedicated to MSDC; no other connections are possible using it • MSDC doesn’t interfere with other keys used to automate ssh connections • it doesn’t need an rsync server running
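A key can be limited to a single purpose with the standard OpenSSH forced-command option in authorized_keys, combined with a small script that inspects the command the client asked for. The sketch below shows that general pattern only. It is not the content of the package's validate_rsync script, and the install path, key comment, and function name are hypothetical.

```shell
#!/usr/bin/env bash
# On the remote side, the dedicated public key would be restricted in
# ~/.ssh/authorized_keys with standard OpenSSH options, for example
# (path and key are placeholders):
#   command="/home/sysop/msdc/bin/validate_rsync",no-pty,no-port-forwarding ssh-rsa AAAA... msdc
# sshd then passes the command requested by the client in SSH_ORIGINAL_COMMAND.

# Return 0 only for plain "rsync --server ..." requests without shell metacharacters.
is_allowed_command() {
    local requested=$1
    case "$requested" in
        # Reject command chaining, redirection, and substitution.
        *';'*|*'&'*|*'|'*|*'<'*|*'>'*|*'`'*|*'$'*|*'('*|*')'*) return 1 ;;
        # This prefix is what an rsync client asks the remote side to run.
        "rsync --server"*) return 0 ;;
        *) return 1 ;;
    esac
}

is_allowed_command "rsync --server --sender -logDtprz . /home/sysop/archive/" && echo allowed
is_allowed_command "rsync --server . /tmp; rm -rf /" || echo rejected
is_allowed_command "/bin/sh" || echo rejected
```

In a real forced-command script, an accepted request would then be executed and a rejected one refused, so the key is useless for opening an interactive shell.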

  11. The MSDC package • The MSDC package msdc.tgz contains the files listed here: • msdc/bin/msdc.sh • msdc/bin/validate_rsync • msdc/bin/rsync • msdc/doc/README.msdc (documentation) • msdc/doc/COPYING (GPL license) • msdc/ssh

  12. TODO • Option to use a different date

  13. Alternative solutions: after the check • The data check could be done using SeedStuff utilities (check_file, extr_file, etc.) or qlib ones (qmerge, etc.) • For the incomplete files you can either: • retransmit the whole file • or use qmerge to extract the data needed to fill the gaps, then transmit these “patches”, possibly using qmerge again to fill the gaps • Transmission: you should use a tool that offers security, such as scp or sftp • You should then automate this procedure
