
Experience of a low-maintenance distributed data management system

Experience of a low-maintenance distributed data management system. W. Takase¹, Y. Matsumoto¹, A. Hasan², F. Di Lodovico³, Y. Watase¹, T. Sasaki¹. 1. High Energy Accelerator Research Organization (KEK), Japan; 2. University of Liverpool, UK; 3. Queen Mary, University of London, UK.



  1. Experience of a low-maintenance distributed data management system. W. Takase¹, Y. Matsumoto¹, A. Hasan², F. Di Lodovico³, Y. Watase¹, T. Sasaki¹. 1. High Energy Accelerator Research Organization (KEK), Japan; 2. University of Liverpool, UK; 3. Queen Mary, University of London, UK

  2. Contents • KEK iRODS system • Running in production for over 2 years • Rules that store files efficiently • Federation with QMUL • iRODS applications • SCALA: Visualization tool for iRODS • iRODS XOR-based backup • Summary

  3. iRODS overview • Distributed data management system • Client-server architecture • Allows data management policies to be enforced on the server side • Provides an interface to many different types of storage • Clients can access iRODS via • i-commands: command-line utilities • iRODS Browser: web interface

  4. KEK iRODS Systems • 4 iRODS servers • RHEL 5.6 • iRODS 2.5 ⇒ 3.2 • PostgreSQL 9.1.1 • In operation for over 2 years • iRODS Zones • KEK-T2K • KEK-MLF • KEKZone • demoKEKZone • Storage resources: HPSS (High Performance Storage System) and a disk system

  5. Data Management for T2K • Tokai to Kamioka (T2K) neutrino experimental group • The experimental data are stored in KEK storage • The group needed an easy way to quickly access the collected data from outside KEK in order to evaluate its quality • iRODS provided the solution http://t2k-experiment.org/wp-content/uploads/t2kmap.gif

  6. Data Management for T2K • The KEK-T2K Zone for the experimental group started operation in October 2010 • Detector data are processed and then transferred to KEK iRODS • Members of the group became able to access the stored data easily and quickly • i-commands • iRODS Browser

  7. iRODS Rules for KEK-T2K Zone • Bundle and replicate the data • Each experimental data file is small (~several MB), but HPSS prefers large files, so files are bundled into tar archives [Diagram: client sends files via rodsweb to the iRODS server disk; the server bundles them into a tar file and replicates it to HPSS and the T2K data server, with metadata in the DB]
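The bundle-and-replicate idea on this slide can be sketched in plain Python. This is a conceptual stand-in, not the actual iRODS rule; all paths and names are illustrative. Many small files are packed into one tar archive, and the archive is then copied to each replica location (e.g. the disk cache and HPSS).

```python
import shutil
import tarfile
from pathlib import Path

def bundle_and_replicate(small_files, bundle_path, replica_dirs):
    """Pack many small files into one tar bundle, then copy the
    bundle to each replica location (disk system, HPSS, ...)."""
    with tarfile.open(bundle_path, "w") as tar:
        for f in small_files:
            tar.add(f, arcname=Path(f).name)
    replicas = []
    for d in replica_dirs:
        Path(d).mkdir(parents=True, exist_ok=True)
        shutil.copy2(bundle_path, d)
        replicas.append(str(Path(d) / Path(bundle_path).name))
    return replicas
```

One large tar file instead of thousands of small files keeps HPSS happy, since tape-backed storage handles large sequential objects far better than many small ones.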

  8. iRODS Rules for KEK-T2K Zone • Response to a request [Diagram: the client requests a file via rodsweb; the iRODS server retrieves the tar file from HPSS or the disk system and returns the requested file, consulting the DB and T2K data server]
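The response flow can be sketched in the same illustrative spirit (again plain Python, not the actual iRODS rule): on a request, the server locates the bundle and extracts just the requested member rather than unpacking the whole archive.

```python
import tarfile

def serve_from_bundle(bundle_path, member_name):
    """Return the bytes of one requested file from a tar bundle,
    without unpacking the whole archive (illustrative stand-in for
    the server-side response rule)."""
    with tarfile.open(bundle_path, "r") as tar:
        member = tar.extractfile(member_name)
        if member is None:
            raise FileNotFoundError(member_name)
        return member.read()
```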

  9. Federation with QMUL • Data replication between the 2 sites • Each site's data is shared: KEK-T2K Zone (experimental data) ⇔ QMULZone (analytical data)

  10. Amount of data in KEK-T2K • The T2K group started data taking on 22nd Dec, 2011

  11. SCALA: Visualization tool for iRODS • Statistical Charts And Log Analyzer • iRODS lacked an interface for usage statistics and for debugging problems • We developed a web interface for visualizing an iRODS status overview • Statistical Charts page • Log Analyzer page • SCALA has been installed on KEK iRODS

  12. SCALA Overview • Input: iRODS outputs (log files, resource usage) • Output: visualized daily system status as charts [Diagram: SCALA parses the iRODS log files into a parsed table, summarizes resource usage into a summarized table in a database, and displays the results]
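The parse-then-summarize pipeline could look roughly like this in Python. The log line format shown is hypothetical, for illustration only; real iRODS logs use their own layout.

```python
import re
from collections import Counter

# Hypothetical log line format for illustration; real iRODS logs differ.
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \S+ (?P<level>NOTICE|ERROR): (?P<msg>.*)$"
)

def summarize(log_lines):
    """Parse raw log lines and count messages per (day, level) --
    the kind of summarized table SCALA's daily charts are drawn from."""
    daily = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m:
            daily[(m.group("date"), m.group("level"))] += 1
    return dict(daily)
```

A charting front end then only needs to query this summarized table per day, instead of rescanning the raw logs.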

  13. Statistical Charts • Visualizes iRODS daily operational data

  14. Log Analyzer • Provides an error-debugging tool 1. User clicks a bar 2. Error details are displayed 3. User clicks an error message 4. The related log is displayed

  15. Download SCALA • http://tgwww.kek.jp/scala/

  16. iRODS XOR-based backup • Full file replication • The current method for reliable data storage is to replicate data • If a disk or server fails, a copy still remains • Requires much storage space • If a portion of the file becomes corrupt, the full file must be replaced • XOR-based backup • Reduces the space needed with the same robustness • Splits a file into blocks and creates parity blocks • If a block becomes corrupt, only the corrupted block must be recreated
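The split-plus-parity idea can be illustrated with a minimal Python sketch. This shows the single-parity-block case only, a simplification of the KEK scheme, which spreads blocks and parity across multiple servers.

```python
def xor_encode(data: bytes, n_blocks: int):
    """Split data into n_blocks equal-size blocks (zero-padded at the
    end) and compute one parity block as their bytewise XOR.  The
    parity lets any single lost block be rebuilt, at a fraction of
    the extra space that full replication would need."""
    size = -(-len(data) // n_blocks)  # ceiling division
    padded = data.ljust(size * n_blocks, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(n_blocks)]
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return blocks, bytes(parity)
```

With 3 data blocks plus 1 parity block, the overhead is one third of the file size, versus 100% for a full replica.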

  17. XOR-based backup: 100% recovery even if any 2 servers fail • XOR-based backup uses 4 servers but needs only 200 GB • Full-file replication uses 3 servers and needs 300 GB • An iRODS rule enables automatic processing

  18. XOR-based backup:Decoding flow
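The decoding flow can be sketched for the same simplified single-parity case: XOR-ing the parity block with every surviving block cancels each surviving block out and leaves exactly the lost block.

```python
def xor_decode(blocks, parity: bytes, lost_index: int) -> bytes:
    """Rebuild the block at lost_index from the parity block and the
    surviving blocks: since XOR is its own inverse, XOR-ing parity
    with all blocks except the lost one yields the lost block."""
    rebuilt = bytearray(parity)
    for j, block in enumerate(blocks):
        if j == lost_index:
            continue
        for i, byte in enumerate(block):
            rebuilt[i] ^= byte
    return bytes(rebuilt)
```

Only the corrupted block is recomputed and rewritten; the intact blocks are never touched, which is the maintenance win over replacing a whole replicated file.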

  19. Summary • The KEK iRODS system has been running in production for over 2 years • iRODS gives a quick and easy way to access data from outside KEK • The bundle-and-replicate rule stores files efficiently • Federation with QMUL enables each site's data to be shared and backed up • SCALA is a visualization tool and has been installed on KEK iRODS • It leads to better management of the overall iRODS service • XOR-based backup provides data reliability at less storage cost than replication • An iRODS rule enables automatic processing

  20. Thank you for your attention! Wataru Takase wataru.takase@kek.jp
