This paper introduces the distributed BELLE Analysis Framework (dBASF), a new computing system designed to handle the increasing data size and computational demands of the BELLE experiment. The dBASF utilizes a network of PC and SPARC servers to provide event-by-event parallel processing, resource usage optimization, and maximum I/O rate from tape servers. This new framework aims to support not only DST production but also user analysis, Monte Carlo simulation, and other parallel processing applications.
Distributed BELLE Analysis Framework BELLE/CHEP2000
Introduction to B Factory at KEK • KEK-B • Accelerator is an e+e- asymmetric-energy collider: • 3.5GeV/c for positrons • 8.0GeV/c for electrons • Designed luminosity is 1.0 x 10^34 cm^-2 s^-1 • KEK-B is now operated at ~5.0 x 10^32 cm^-2 s^-1 • BELLE Experiment • Goal of the BELLE experiment is to study CP violation in B meson decays • Experiment is in progress at KEK
BELLE Detector • SVD: Precise vertex detection • CDC: Track momentum reconstruction, particle ID with dE/dx • ACC: Aerogel Cherenkov counter for particle ID • TOF: Particle ID and trigger • ECL: Electromagnetic calorimeter for e- and γ reconstruction • KLM: Muon and K_L detection • EFC: Electromagnetic calorimeter for luminosity measurement
Current Event Reconstruction • Computing Environments • Event reconstruction is performed on 8 SMP machines • UltraEnterprise x 7 servers equipped with 28 CPUs • Total CPU power is 1,200 SPECint95 • CPUs are shared with user analysis jobs • MC production is done on PC farm (P3 500MHz x 4 x 16) • Reconstruction Speed • 15Hz/server • 70Hz/server with L4 (5.0 x 10^32 cm^-2 s^-1)
Necessity for System Upgrade • In Future • We will have more luminosity • 200Hz after L4 (1.0 x 10^34 cm^-2 s^-1) • Data size may increase further • Possibly due to background • This will cause a shortage of computing power • We need 10 times the current computing power when DST reproduction and user analysis activities are considered
Next Computing System • Low Cost Solution • We will build new computing farm with sufficient computing power • Computing servers will consist of • ~50 units of 4-CPU PC servers with Linux • ~50 units of 4-CPU SPARC servers with Solaris • Total CPU power will be 12,000 SPECint95
Configuration of Next System • [Diagram labels: tape library (tape I/O: 24MB/s), file server (FS), switch, Sun I/O servers, Gigabit switch, hubs, PC servers on 100Base-T]
Current Analysis Framework • BELLE AnalysiS Framework (B.A.S.F.) • B.A.S.F. supports event-by-event parallel processing on SMP machines, hiding the parallel processing from users • B.A.S.F. is currently used widely in BELLE, from DST production to user analysis • We are developing an extension to B.A.S.F. that uses many PC servers connected via network, for use in the next computing system
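The idea of event-by-event parallel processing hidden from the user can be sketched as follows. This is a minimal illustration, not B.A.S.F. code: `run_event_parallel`, `count_tracks`, and the event layout are hypothetical names, and a thread pool stands in for the processes an SMP machine would actually run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_event_parallel(events, analyze, n_workers=4):
    """Apply the user's per-event function to every event with a worker pool.

    The user writes only `analyze`; the pool itself stays out of sight.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order, so they line up with `events`.
        return list(pool.map(analyze, events))


def count_tracks(event):
    # Hypothetical user analysis: count the tracks in one event.
    return len(event["tracks"])


# Five toy events holding 0, 1, 2, 3 and 4 tracks.
events = [{"tracks": [0.5] * n} for n in range(5)]
results = run_event_parallel(events, count_tracks)
```

The user code never mentions workers, which is the property the slide attributes to B.A.S.F.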
New Analysis Framework • New Framework Should Provide: • Event-by-event parallel processing capability over network • Resource usage optimization • Maximize total CPU usage • Draw maximum I/O rate from tape servers • Capability to handle purposes other than DST production • User analysis, Monte Carlo simulation or anything else • Application for parallel processing at university sites • dBASF – Distributed B.A.S.F. • Super-framework for B.A.S.F.
Link of dBASF Servers • [Diagram labels: Job Client, B.A.S.F., I/O, Resource, PC server, SPARC; arrows labeled init/term, report of resource usages, dynamic change of node allocation]
Communication among Servers • Functionality • Call function on a remote node by sending a message • Shared memory expanded over network space • Implementation • NSM – Network Shared Memory • Home-grown product • Originally used for BELLE DAQ • Based on TCP and UDP
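Calling a function on a remote node by sending a message can be sketched as below. This is not the NSM API, which the slides do not show: `Node`, `register`, `receive`, and `send` are hypothetical names, JSON is an assumed message encoding, and a direct method call stands in for the TCP/UDP transport.

```python
import json


class Node:
    """Hypothetical node that runs a registered function when a message arrives."""

    def __init__(self, name):
        self.name = name
        self.handlers = {}

    def register(self, request, func):
        # Associate a request name with the function to call for it.
        self.handlers[request] = func

    def receive(self, raw):
        # Decode the message and dispatch on its request name.
        msg = json.loads(raw.decode("utf-8"))
        return self.handlers[msg["request"]](*msg["args"])


def send(node, request, *args):
    # Stand-in for the network: encode the call as bytes and hand it over.
    raw = json.dumps({"request": request, "args": list(args)}).encode("utf-8")
    return node.receive(raw)


worker = Node("pc01")
worker.register("add_events", lambda a, b: a + b)
total = send(worker, "add_events", 120, 80)
```

The caller names a request and its arguments; which function runs is decided on the receiving node.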
Components of dBASF • dBASF Client • User interface • Accepts from user: • B.A.S.F. execution script • Number of CPUs to be allocated for analysis • Asks Resource manager to allocate B.A.S.F. daemons • Resource manager returns allocated nodes • Initiates B.A.S.F. execution on allocated nodes • Waits for completion • Is notified by B.A.S.F. daemons when the job ends
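The client steps listed above (ask for daemons, start execution, wait for the end-of-job notification) can be sketched as follows. All class and function names and the script name are hypothetical, one daemon per requested CPU is a simplifying assumption, and the daemon only returns a string where a real one would fork B.A.S.F. processes.

```python
class Daemon:
    """Hypothetical stand-in for a B.A.S.F. daemon on one node."""

    def __init__(self, node):
        self.node = node
        self.busy = False

    def run(self, script):
        # A real daemon would fork B.A.S.F. processes; here we only report back.
        self.busy = True
        report = f"{self.node}: finished {script}"
        self.busy = False
        return report


class ResourceManager:
    """Hypothetical manager that hands out idle daemons."""

    def __init__(self, daemons):
        self.daemons = daemons

    def allocate(self, n_cpus):
        idle = [d for d in self.daemons if not d.busy]
        if len(idle) < n_cpus:
            raise RuntimeError("not enough idle daemons")
        return idle[:n_cpus]


def submit(manager, script, n_cpus):
    nodes = manager.allocate(n_cpus)       # ask Resource manager for daemons
    return [d.run(script) for d in nodes]  # start the job, gather notifications


manager = ResourceManager([Daemon(f"pc{i:02d}") for i in range(4)])
reports = submit(manager, "dst_production.bsf", 2)
```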
Components of dBASF • Resource Manager • Collects resource usage from B.A.S.F. daemons through NSM shared memory • CPU load • Network traffic rate • Monitors idling B.A.S.F. daemons of each dBASF session • Increases/decreases the number of allocated B.A.S.F. daemons dynamically when a better assignment is discovered
Components of dBASF • B.A.S.F. Daemon • Runs on each computing server • Accepts ‘initiation request’ from dBASF client and forks B.A.S.F. processes • Reports resource usage to Resource manager through NSM shared memory
Components of dBASF • I/O Daemon • Reads tapes or disk files and distributes events to B.A.S.F. running on each node through network • Collects processed data from B.A.S.F. through network and writes them to tapes or disk files • In case of Monte Carlo event generation, event generator output is distributed to B.A.S.F. where detector simulation is running
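The distribute-then-collect role of the I/O daemon can be sketched as below. The slides do not state how events are assigned to nodes, so round-robin dealing is an assumption, and `distribute` and `collect` are hypothetical names; in-memory queues stand in for the network and for tape or disk files.

```python
from collections import deque


def distribute(events, n_nodes):
    """Deal events round-robin into one input queue per node (assumed policy)."""
    queues = [deque() for _ in range(n_nodes)]
    for i, event in enumerate(events):
        queues[i % n_nodes].append(event)
    return queues


def collect(queues, process):
    """Run `process` on every queued event and gather (node, output) records."""
    output = []
    for node_id, queue in enumerate(queues):
        while queue:
            output.append((node_id, process(queue.popleft())))
    return output


raw_events = list(range(6))
queues = distribute(raw_events, n_nodes=3)
# Toy "reconstruction": multiply each event value by 10.
processed = collect(queues, process=lambda e: e * 10)
```

For Monte Carlo production the same shape applies, with event generator output taking the place of `raw_events`.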
Components of dBASF • Miscellaneous Servers • Histogram server • Merges histogram data accumulated on each node • Output server • Collects standard output from each node and saves it to a file
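Merging histogram data accumulated on each node amounts to a bin-by-bin sum, sketched here under the assumption that every node fills histograms with identical binning. `merge_histograms` is a hypothetical name, and plain lists of bin contents stand in for the real histogram format.

```python
def merge_histograms(per_node):
    """Sum bin contents across nodes; all histograms must share the same binning."""
    n_bins = len(per_node[0])
    if any(len(h) != n_bins for h in per_node):
        raise ValueError("histograms have different binning")
    # zip(*per_node) groups the i-th bin of every node together.
    return [sum(bins) for bins in zip(*per_node)]


# Bin contents of the same 3-bin histogram filled on three nodes.
node_hists = [[1, 4, 2], [0, 3, 5], [2, 2, 2]]
merged = merge_histograms(node_hists)
```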
Resource Management • Best Performance • Achieved when total I/O rate becomes maximum with minimum number of CPUs • Dynamic Load Balancing • CPU bound: • Increase number of Computing servers so that I/O speed becomes maximum • I/O bound: • Decrease number of Computing servers so as not to change I/O speed
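The CPU-bound and I/O-bound rules above can be sketched as a simple step controller. This is an illustration only, not the allocation formula used by dBASF: `next_allocation`, the one-node step, and the `busy` and `tol` thresholds are all assumptions.

```python
def next_allocation(n_now, io_rate, io_max, cpu_load, step=1, busy=0.9, tol=0.05):
    """Return the number of computing servers to try next.

    io_rate  : measured total I/O rate of the job (same unit as io_max)
    io_max   : maximum I/O rate the tape or file servers can deliver
    cpu_load : average CPU utilization of the allocated servers, 0..1
    """
    io_saturated = io_rate >= (1.0 - tol) * io_max
    if not io_saturated and cpu_load >= busy:
        # CPU bound: more servers can pull the I/O rate toward its maximum.
        return n_now + step
    if io_saturated and cpu_load < busy:
        # I/O bound: servers are idling, so drop one and keep the I/O rate.
        return max(1, n_now - step)
    return n_now


cpu_bound = next_allocation(8, io_rate=12.0, io_max=24.0, cpu_load=0.98)
io_bound = next_allocation(8, io_rate=24.0, io_max=24.0, cpu_load=0.50)
balanced = next_allocation(8, io_rate=24.0, io_max=24.0, cpu_load=0.95)
```

Applied repeatedly, the rule moves toward the stated optimum: maximum I/O rate with the minimum number of CPUs.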
Resource Management • Load Balancing • When n_now CPUs are assigned to a job, the best number of CPUs to assign, n_new, is given by: [equation not preserved in this transcript]
Resource Management • [Diagram labels: Job Client, Resource, B.A.S.F.; steps labeled initiate B.A.S.F., report of resource usage, best allocation? (no), increase node, decrease node, terminate B.A.S.F.]
Data Flow • [Diagram labels: Raw Data, I/O, B.A.S.F. on PC servers, SPARC, TCP/IP links, Processed Data, Histogram, STDOUT]
Status • System test is in progress on BELLE PC farm consisting of 16 units of P3 550MHz x 4 servers • Node-to-node communication framework has been developed and is being tested • Resource management algorithm is under study • Basic speed test of network data transfer has been finished • Fast Ethernet: point-to-point, 1-to-n • Gigabit Ethernet: point-to-point, 1-to-n • New computing system will be available in March 2001
Summary • We will build a computing farm of 12,000 SPECint95 with PC Linux and Solaris servers to solve the computing power shortage we face • We have begun to develop a management scheme for the computing system by extending the current analysis framework • We have developed a communication framework and are studying a resource management algorithm