230 likes | 412 Views
Development of the distributed computing system for the MPD at the NICA collider, analytical estimations. Gertsenberger K . V . Joint Institute for Nuclear Research , Dubna. Mathematical Modeling and Computational Physics 2013. NICA scheme. MMCP’2013. Multipurpose Detector (MPD).
E N D
Development of the distributed computing system for the MPD at the NICA collider, analytical estimations Gertsenberger K. V. Joint Institute for Nuclear Research, Dubna Mathematical Modeling and Computational Physics 2013
NICA scheme GertsenbergerK.V. MMCP’2013
MultipurposeDetector (MPD) The softwareMPDRootis developed for the event simulation, reconstruction and physical analysisof the heavy ions’ collision registered by MPD at the NICA collider. GertsenbergerK.V. MMCP’2013
Prerequisites of the NICA cluster • high interaction rate (to 6 KHz) • high particle multiplicity, about 1000 charged particles for the central collision at the NICA energy • one event reconstruction takes tens of seconds in MPDRoot now, 1M events – months • large data stream fromthe MPD: 100k events~ 5 TB 100 000k events ~ 5 PB/year • unified interface for parallel processing and storing of the event data GertsenbergerK.V. MMCP’2013
Development of the NICA cluster 2 main lines of the development: • data storage developmentfor the experiment • organization of parallel processing of the MPD events development and expansiondistributed cluster for the MPD experimentbased on LHEP farm GertsenbergerK.V. MMCP’2013
Current NICA clusterin LHEPforMPD GertsenbergerK.V. MMCP’2013
Distributed file systemGlusterFS • aggregatesthe existing file systems in common distributed file system • automatic replication works as background process • background self-checking service restores corrupted files in case of hardware or software failure • implemented on application layer and working in user space GertsenbergerK.V. MMCP’2013
Data storageon the NICA cluster GertsenbergerK.V. MMCP’2013
Development of the distributed computing system NICA cluster concurrent data processing on cluster nodes PROOF server parallel data processing in a ROOT macro on the parallel architectures MPD-scheduler scheduling system for the task distribution to parallelize data processing on cluster nodes GertsenbergerK.V. MMCP’2013
Parallel data processing with PROOF • PROOF (Parallel ROOT Facility) – the part ofthe ROOT software, no additional installations • PROOF uses data independent parallelism based on the lack of correlation for MPD events good scalability • Parallelization for three parallel architectures: • PROOF-Lite parallelizesthe data processing on one multiprocessor/multicores machine • PROOF parallelizes processing on heterogeneous computing cluster • Parallel data processing inGRID • Transparency: the same program code can execute both sequentially and concurrently GertsenbergerK.V. MMCP’2013
Using PROOF inMPDRoot • The last parameter of the reconstruction: run_type (default,“local”). Speedup on the user multicore machine: $ root reco.C(“evetest.root”, “mpddst.root”, 0, 1000, “proof”) parallel processing of 1000 events with thread count being equal logical processor count $ root reco.C (“evetest.root”, “mpddst.root”, 0, 500, “proof:workers=3”) parallel processing of 500 events with 3 concurrent threads Speedup on the NICA cluster: $ root reco.C(“evetest.root”, “mpddst.root”, 0, 1000, “proof:mpd@nc8.jinr.ru:21001”) parallel processing of 1000 events on all cluster nodes of PoD farm $ root reco.C (“eve”, “mpddst”, 0, 500,“proof:mpd@nc8.jinr.ru:21001:workers=10”) parallel processing of 500 events on PoD cluster with 10 workers GertsenbergerK.V. MMCP’2013
Speedup of the reconstruction on 4-cores machine GertsenbergerK.V. MMCP’2013
PROOF on the NICA cluster event count $ root reco.C(“evetest.root”,”mpddst.root”, 0, 3, “proof:mpd@nc8.jinr.ru:21001”) GlusterFS mpddst.root *.root evetest.root event №2 event №0 event №1 proof proof proof proof proof proof proof proof (8) (8) (24) (16) (16) (24) (32) proof = master server Proof On Demand Cluster proof = slave node GertsenbergerK.V. MMCP’2013
Speedup of the reconstruction on the NICA cluster GertsenbergerK.V. MMCP’2013
MPD-scheduler • Developed on C++ language with ROOT classes support. • Usesscheduling systemSun Grid Engine (qsub command) for execution in cluster mode. • SGE combinescluster machines on LHEP farminto the pool of worker nodes with 78 logical processors. • The job for distributed execution on the NICA cluster is described and passed to MPD-scheduler as XML file: $ mpd-scheduler my_job.xml GertsenbergerK.V. MMCP’2013
Job description <job> <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event=”0” count_event=”1000”add_args=“local”/> <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/> <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/> <file db_input="mpd.jinr.ru*,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/> <run mode="local" count="5" config=“~/build/config.sh" logs="processing.log"/> </job> The description starts and ends with tag <job>. Tag<macro> sets information about macro being executed by MPDRoot Tag <file> defines files to process by macro above Tag <run> describes run parameters and allocated resources * mpd.jinr.ru – server name with production database GertsenbergerK.V. MMCP’2013
Job execution on the NICA cluster job_reco.xml <job> <macroname="~/mpdroot/macro/mpd/reco.C"/> <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/> <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/> <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/> <runmode=“global" count=“3"config=“~/mpdroot/build/config.sh"/> </job> job_command.xml <job> <command line="get_mpd_productionenergy=5-9 "/> <run mode="global" config="~/mpdroot/build/config.sh"/> </job> GlusterFS *.root evetest3.root evetest1.root MPD-scheduler evetest2.root qsub SGE mpddst3.root mpddst1.root mpddst2.root job_command.xml SGE SGE SGE SGE SGE SGE SGE free free busy free busy busy busy (8) (8) (24) (16) (16) (24) (32) SGE= Sun Grid Engine server SGE batch system SGE = Sun Grid Engine worker GertsenbergerK.V. GertsenbergerK.V. 17 MMCP’2013 MMCP’2013
Speedup of the one reconstruction on NICA cluster GertsenbergerK.V. MMCP’2013
NICA cluster section on mpd.jinr.ru GertsenbergerK.V. MMCP’2013
Conclusions • The distributed NICA cluster was deployed based on LHEP farm for the NICA/MPD experiment (Fairsoft, ROOT/PROOF, MPDRoot, Gluster, Torque, Maui). 128cores • The data storage was organized with distributed file systemGlusterFS: /nica/mpd[1-8]. 10 TB • PROOF On Demand cluster was implemented to parallelize event data processing for the MPD experiment, PROOF support was added to the reconstruction macro. • The system for the distributed job execution MPD-scheduler was developed to run MPDRoot macros concurrently on the cluster. • The web sitempd.jinr.ru insectionComputing –NICA cluster presents the manuals for the systems described above. GertsenbergerK.V. MMCP’2013
Analytical model for parallel processing on cluster speedup for point (data independent) algorithm of image processing Pnode– count of logical processors, n – data to process (byte), ВD – speed of the data access (MB/s), T1 – “pure”time of the sequential processing (s) GertsenbergerK.V. MMCP’2013
Prediction of the NICA computing power How many are logical processors required to process NTASKphysical analysis tasks and onereconstruction within Tdaydays in parallel? Ifn1 = 2 MB, NEVENT = 10000000 events, TPA = 5 s/event, TREC = 10 s/event., BD = 100 MB/s, Tday = 30 days GertsenbergerK.V. MMCP’2013