Experience of a low-maintenance distributed data management system
W.Takase1, Y.Matsumoto1, A.Hasan2, F.Di Lodovico3, Y.Watase1, T.Sasaki1
1. High Energy Accelerator Research Organization (KEK), Japan
2. University of Liverpool, UK
3. Queen Mary, University of London, UK
Contents
• KEK iRODS system
  • Running in production for over 2 years
  • Rules enable efficient file storage
  • Federation with QMUL
• iRODS applications
  • SCALA : Visualization tool for iRODS
  • iRODS XOR-based backup
• Summary
iRODS overview
• Distributed data management system
• Client-server architecture
• Allows data management policies to be enforced on the server side
• Provides an interface to many different types of storage
• Clients can access iRODS via
  • i-commands : Command-line utilities
  • iRODS Browser : Web interface
KEK iRODS Systems
• 4 iRODS servers
  • RHEL 5.6
  • iRODS 2.5 ⇒ 3.2
  • PostgreSQL 9.1.1
  • In operation for over 2 years
• iRODS Zones
  • KEK-T2K
  • KEK-MLF
  • KEKZone
  • demoKEKZone
• Storage resources
  • HPSS (High Performance Storage System)
  • Disk system
Data Management for T2K
• Tokai to Kamioka (T2K) neutrino experimental group
• The experimental data is stored on KEK storage
• The group needed an easy way to quickly access the collected data from outside KEK in order to evaluate its quality
• iRODS provided the solution
Image: http://t2k-experiment.org/wp-content/uploads/t2kmap.gif
Data Management for T2K
• The KEK-T2K Zone for the experimental group started operation in October 2010
• Detected data are processed and then transferred to KEK iRODS
• Group members became able to access the stored data easily and quickly via
  • i-commands
  • iRODS Browser
iRODS Rules for KEK-T2K Zone
• Bundle and replicate the data
• Each experimental data file is small (around several MB)
• HPSS prefers large files
[Diagram labels: Client, T2K data server, rodsweb, iRODS server, disk, DB, Disk system, HPSS, file, tar file]
iRODS Rules for KEK-T2K Zone
• Respond to a request
[Diagram labels: Client, T2K data server, rodsweb, iRODS server, disk, DB, Disk system, HPSS, request, file, tar file]
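The slides do not show the rule itself. As a rough illustration of the idea behind the two rule slides (pack many small files into one large tar file for tape-friendly storage, then hand back a single file when it is requested), here is a minimal Python sketch. It is not iRODS rule code, and the function names `bundle_files` and `extract_file` are hypothetical; everything is kept in memory so the example is self-contained.

```python
import io
import tarfile


def bundle_files(files: dict[str, bytes]) -> bytes:
    """Pack many small files into one in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def extract_file(bundle: bytes, name: str) -> bytes:
    """Return one member of the archive without unpacking the others."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as archive:
        member = archive.extractfile(name)  # raises KeyError if absent
        if member is None:  # not a regular file
            raise KeyError(name)
        return member.read()
```

In a real deployment the bundle would be written to the archival resource and the single file served from a disk cache; that placement is an assumption here, not something the slides spell out.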
Federation with QMUL
• Data replication between the 2 sites
• Each site shares its data with the other
[Diagram: KEK-T2K (experimental data) federated with QMULZone (analytical data)]
Amount of data in KEK-T2K
• The T2K group started data taking on 22 Dec 2011
[Chart: amount of data in KEK-T2K]
SCALA : Visualization tool for iRODS
• Statistical Charts And Log Analyzer
• iRODS lacked an interface for usage statistics and for debugging problems
• We developed a web interface for visualizing an overview of iRODS status
  • Statistical Charts page
  • Log Analyzer page
• SCALA has been installed on the KEK iRODS system
SCALA Overview
• Input : iRODS outputs
• Output : Daily system status visualized as charts
[Diagram: iRODS outputs (log files, resource usage); SCALA stages (Parse, Summarize, Display); database tables (parsed table, summarized table)]
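To make the Parse and Summarize stages concrete, here is a minimal Python sketch. It is not SCALA's code: the log line layout (`YYYY-MM-DD HH:MM:SS LEVEL: message`) is a hypothetical stand-in for the real iRODS log format, and in-memory lists and counters stand in for the parsed and summarized database tables.

```python
import re
from collections import Counter

# Hypothetical log line layout: "YYYY-MM-DD HH:MM:SS LEVEL: message".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} (\w+): (.*)$")


def parse(lines):
    """Parse stage: turn raw log lines into (date, level, message) rows."""
    rows = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the layout are skipped
            rows.append(match.groups())
    return rows


def summarize(rows):
    """Summarize stage: count rows per (date, level) pair."""
    return Counter((date, level) for date, level, _ in rows)
```

A display stage would then read the per-day counts and draw one bar per day, which matches the kind of daily chart the overview describes.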
Statistical Charts • Visualizes iRODS daily operational data
Log Analyzer
• Provides an error debugging tool
1. User clicks a bar
2. Error detail is displayed
3. User clicks an error message
4. Related log is displayed
Download SCALA • http://tgwww.kek.jp/scala/
iRODS XOR-based backup
• Full file replication
  • The current method for storing data reliably is to replicate it
  • If a disk or server fails, a copy still exists
  • Requires a lot of storage space
  • If a portion of a file becomes corrupt, the full file has to be replaced
• XOR-based backup
  • Reduces the storage space while keeping the same robustness
  • Splits a file into blocks and creates parity blocks
  • If a block becomes corrupt, only the corrupted block has to be recreated
XOR-based backup: 100% recovery when any 2 servers fail
• XOR-based backup uses 4 servers but needs only 200GB
• Full-file replication uses 3 servers and needs 300GB
• An iRODS rule enables automatic processing
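The slides do not give the block layout of the scheme. The Python sketch below shows only the underlying XOR idea in its simplest form: a file is split into blocks, one parity block is computed, and a single lost block is rebuilt from the survivors. This simplified version tolerates one lost block, not the two server failures quoted above, which would need a more elaborate parity layout. The function names are hypothetical.

```python
def split_blocks(data: bytes, count: int) -> list[bytes]:
    """Split data into `count` equal-size blocks, zero-padding the tail."""
    size = -(-len(data) // count)  # ceiling division
    padded = data.ljust(size * count, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def recover(blocks: list, parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from the rest."""
    survivors = [block for block in blocks if block is not None]
    return xor_blocks(survivors + [parity])
```

Because XOR is its own inverse, XOR-ing the parity block with all surviving data blocks cancels them out and leaves exactly the missing block, so only that block has to be recreated.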
Summary
• The KEK iRODS system has been running in production for over 2 years
  • iRODS gives a way to quickly and easily access data from outside KEK
  • A rule that bundles and replicates the data stores files efficiently
  • Federation with QMUL enables the two sites to share and back up their data
• SCALA is a visualization tool and has been installed on the KEK iRODS system
  • It leads to better management of the overall iRODS service
• XOR-based backup provides data reliability at a lower storage cost than replication
  • An iRODS rule enables automatic processing
Thank you for your attention! Wataru Takase wataru.takase@kek.jp