Scheduling system for distributed MPD data processing Gertsenberger K. V. Joint Institute for Nuclear Research, Dubna
NICA scheme
Multipurpose Detector (MPD)
The MPDRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and the subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider.
Development of the NICA cluster
Two main directions of development:
• data storage development for the experiment
• organization of parallel processing of the MPD events
Development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm
Current NICA cluster in LHEP
Data storage on the NICA cluster
Distributed file system GlusterFS:
• it aggregates existing file systems into a common distributed file system
• automatic replication works as a background process
• a background self-checking service restores corrupted files in case of hardware or software failure
Parallel MPD data processing
Concurrent data processing:
• PROOF server: parallel data processing in ROOT macros on parallel architectures
• MPD-scheduler: scheduling system for task distribution to parallelize data processing on the cluster nodes
MPD-scheduler
• Developed in C++ with support for ROOT classes. SVN: mpdroot/macro/mpd_scheduler
• Uses the Sun Grid Engine (SGE) scheduling system (qsub command) for execution in cluster mode.
• SGE combines the cluster machines at the LHEP farm (nc10, nc11 and nc13) into a pool of worker nodes with 34 logical processors.
• Jobs for distributed execution on the NICA cluster are described in an XML file that is passed to MPD-scheduler:
$ mpd-scheduler my_job.xml
Job description. Tag <macro>.
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
• The description starts and ends with the tag <job>.
• The tag <macro> describes the macro to be executed by MPDRoot:
  • name – file path of the ROOT macro to execute, required
  • start_event – number of the first event to process for all input files, optional
  • count_event – number of events to process for all input files, optional
  • add_args – additional arguments of the ROOT macro, if required
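The attribute rules for <macro> can be illustrated with a short Python sketch. It is not the MPD-scheduler code (which is written in C++); the function name read_macro and the choice to return a dictionary are assumptions made for illustration only. It reads the <macro> tag with the standard library parser, insists on the required name attribute and leaves the optional attributes as None when they are absent.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the job description from the slide.
JOB_XML = """
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
"""


def read_macro(job_xml):
    """Return the <macro> attributes (hypothetical helper, not the real parser)."""
    root = ET.fromstring(job_xml)
    if root.tag != "job":
        raise ValueError("the description must start and end with <job>")
    macro = root.find("macro")
    if macro is None or not macro.get("name"):
        raise ValueError("<macro> requires the 'name' attribute")

    def as_int(value):
        # Optional numeric attributes stay None when they are not given.
        return int(value) if value is not None else None

    return {
        "name": macro.get("name"),
        "start_event": as_int(macro.get("start_event")),
        "count_event": as_int(macro.get("count_event")),
        "add_args": macro.get("add_args"),
    }


macro = read_macro(JOB_XML)
print(macro)
```

Environment variables such as $VMCWORKDIR are left unexpanded here; how and when the scheduler expands them is not stated on the slide.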
Job description. Tag <file>.
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
• The tag <file> defines the files to be processed by the macro above:
  • input – input file path
  • output – result file path
  • start_event – number of the first event in the input file, optional
  • count_event – number of events to process in the input file, optional
  • parallel_mode – number of processors used to process the events of the input file in parallel, optional
  • merge – whether to merge the partial result files in parallel_mode, default: "true"
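The slide does not say how parallel_mode divides the events of one input file among the processors. One plausible scheme, shown here purely as an assumption, is to cut the event range into contiguous parts of nearly equal size; the function split_events is hypothetical.

```python
def split_events(start_event, count_event, parallel_mode):
    """Split the event range into contiguous (first_event, event_count) parts.

    Assumed scheme: parts differ in size by at most one event, and empty
    parts are dropped when there are fewer events than processors.
    """
    if parallel_mode < 1:
        raise ValueError("parallel_mode must be at least 1")
    if count_event < 0:
        raise ValueError("count_event must not be negative")
    base, extra = divmod(count_event, parallel_mode)
    parts = []
    first = start_event
    for index in range(parallel_mode):
        size = base + (1 if index < extra else 0)
        if size == 0:
            break
        parts.append((first, size))
        first += size
    return parts


# 1000 events on 5 processors, as in start_event="0" count_event="1000" parallel_mode="5".
parts = split_events(0, 1000, 5)
print(parts)
```

With merge="true", the partial result files produced for such parts would then be combined into the single output file.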
Processing event files from the MPD simulation database.
<job>
  …
  <file db_input="mpd.jinr.ru, energy=3, gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  …
</job>
• db_input – string that defines a list of files from the MPD simulation database
• mpd.jinr.ru – network address of the server with the simulation database, followed by selection parameters: range of the collision energy, type of the particle generator, particles of the collision, description and others.
• Special variables available in the "output" attribute:
  • ${counter} = file counter whose start value and step are both equal to 1
  • ${input} = input file path
  • ${file_name} = name of the input file without extension
  • ${file_name_with_ext} = name of the input file with extension
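The special variables of the "output" attribute can be illustrated with a small Python sketch. The function expand_output and the two input paths are hypothetical; only the variable names and the counter rule (start value 1, step 1) come from the slide.

```python
import posixpath


def expand_output(template, input_path, counter):
    """Substitute the special variables in an output path template."""
    file_name_with_ext = posixpath.basename(input_path)
    file_name = posixpath.splitext(file_name_with_ext)[0]
    values = {
        "${counter}": str(counter),
        "${input}": input_path,
        "${file_name}": file_name,
        "${file_name_with_ext}": file_name_with_ext,
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


# Hypothetical list of files returned by a db_input query.
inputs = ["/data/urqmd_a.root", "/data/urqmd_b.root"]
template = "~/mpdroot/macro/mpd/evetest_${counter}.root"

# The counter starts at 1 and grows by 1 for each file.
outputs = [expand_output(template, path, n) for n, path in enumerate(inputs, start=1)]
print(outputs)
```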
Job description. Tag <run>.
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
• The tag <run> describes the run parameters and the resources allocated for the job:
  • mode – execution mode: 'global' – distributed processing on the NICA cluster, 'local' – multithreaded execution on a multicore computer
  • count – maximum number of processors allocated for this job
  • config – path of a bash file that sets environment variables (including the ROOT environment variables) and is executed before the macro
  • logs – log file path for multithreaded mode
Job description. Non-ROOT command.
The tag <command> with the argument line is used to run a non-ROOT command.
<job>
  <command line="get_mpd_prod energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
Running a non-ROOT command on the NICA cluster
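A rough Python sketch of how such a job could be turned into a single shell line is given below. It assumes that the config file is sourced before the command, by analogy with how config is described for macros, and it reads the command as get_mpd_prod energy=5-9, which is an assumption about the spacing on the slide. The function shell_line is hypothetical, and the actual submission through qsub is not shown.

```python
import xml.etree.ElementTree as ET

JOB_XML = """
<job>
  <command line="get_mpd_prod energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
"""


def shell_line(job_xml):
    """Build the shell line for a non-ROOT job (assumed behaviour)."""
    root = ET.fromstring(job_xml)
    command = root.find("command")
    if command is None or not command.get("line", "").strip():
        raise ValueError("<command> requires a non-empty 'line' attribute")
    line = command.get("line").strip()
    run = root.find("run")
    config = run.get("config") if run is not None else None
    if config:
        # The path is left unquoted so that the shell can expand "~".
        return f"source {config} && {line}"
    return line


line = shell_line(JOB_XML)
print(line)
```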
Local use
MPD-scheduler can be used in local mode to process events in parallel on a user's multicore machine.
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root" start_event="0" count_event="0"/>
  <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root" start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
  <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>
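In the local example, the first file is one task and the second is split into 5 parts, which gives 6 tasks, the same number as count="6". The Python sketch below shows one way to keep the number of simultaneously running tasks within count. It uses a thread pool as a stand-in, the tasks are placeholders that only sleep, and run_local and make_task are hypothetical names; it is not the scheduler's implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_local(tasks, count):
    """Run the callables with at most `count` of them active at once.

    Results are returned in the order of the submitted tasks.
    """
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(lambda task: task(), tasks))


# Bookkeeping that records how many tasks ran at the same time.
state = {"active": 0, "peak": 0}
lock = threading.Lock()


def make_task(label):
    def task():
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # placeholder for running the macro on one part
        with lock:
            state["active"] -= 1
        return label
    return task


# One task for evetest1.root and five parts for evetest2.root.
labels = ["evetest1"] + [f"evetest2 part {i}" for i in range(1, 6)]
results = run_local([make_task(label) for label in labels], count=6)
print(results, "peak concurrency:", state["peak"])
```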
MPD-scheduler on the NICA cluster
job_reco.xml
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>
job_command.xml
<job>
  <command line="get_mpd_production energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
[Diagram labels: MPD-scheduler, qsub, SGE batch system (Sun Grid Engine server and Sun Grid Engine workers with 10, 10 and 14 processors, some marked free), GlusterFS storage (*.root) with input files evetest1.root, evetest2.root, evetest3.root and output files mpddst1.root, mpddst2.root, mpddst3.root]
The speedup of one reconstruction on the NICA cluster
The description of the scheduling system on mpd.jinr.ru
Conclusions
• The distributed NICA cluster was deployed on the basis of the LHEP farm for the NICA/MPD experiment (Fairsoft, ROOT/PROOF, MPDRoot, Gluster, Sun Grid Engine). 128 cores
• The data storage was organized with the GlusterFS distributed file system: /nica/mpd[1-8]. 10 TB
• MPD-scheduler, a system for distributed job execution, was developed to run MPDRoot macros concurrently on the cluster. It is based on the Sun Grid Engine scheduling system.
• The website mpd.jinr.ru presents the manual for the developed MPD scheduling system in the section Computing – NICA cluster – Batch processing.