Computing for Belle ISGC2005 April 27, 2005 Nobu Katayama KEK
Outline • Belle in general • (Software) • Computing • (Production) • (Networking and Collaboration issues) • Super KEKB/Belle/Belle computing upgrades
Belle detector
Integrated luminosity • May 1999: first collision • July 2001: 30 fb-1 • Oct. 2002: 100 fb-1 • July 2003: 150 fb-1 • July 2004: 287 fb-1 • Apr. 2005: 400 fb-1 • Super KEKB: 2~10 ab-1/year! • Now >1 fb-1/day! >1 TB of raw data/day; more than 20% are hadronic events
Summary of b→sqq CPV - "sin2φ1": something new in the loop?? A 2.7σ effect; we need a lot more data (A: consistent with 0)
Something new?
The Belle Collaboration Seoul National U. Shinshu U. Sungkyunkwan U. U. of Sydney Tata Institute Toho U. Tohoku U. Tohoku Gakuin U. U. of Tokyo Tokyo Inst. of Tech. Tokyo Metropolitan U. Tokyo U. of A and T. Toyama Nat'l College U. of Tsukuba Utkal U. VPI Yonsei U. BINP Chiba U. Chonnam Nat'l U. Chuo U. U. of Cincinnati Ewha Womans U. Frankfurt U. Gyeongsang Nat'l U. U. of Hawaii Hiroshima Tech. IHEP, Beijing IHEP, Moscow IHEP, Vienna ITEP Kanagawa U. KEK Korea U. Krakow Inst. of Nucl. Phys. Kyoto U. Kyungpook Nat'l U. U. of Lausanne Jozef Stefan Inst. U. of Melbourne Nagoya U. Nara Women's U. National Central U. Nat'l Kaohsiung Normal U. Nat'l Lien-Ho Inst. of Tech. Nat'l Taiwan U. Nihon Dental College Niigata U. Osaka U. Osaka City U. Panjab U. Peking U. Princeton U. Riken-BNL Saga U. USTC (13 countries, ~56 institutes, ~400 members)
Collaborating institutions • Collaborators • Major labs/universities from Russia, China, India • Major universities from Japan, Korea, Taiwan, Australia… • Universities from the US and Europe • KEK dominates in one sense • 30~40 staff members work on Belle exclusively • Most of the construction and operating costs are paid by KEK • Universities dominate in another sense • Young students stay at KEK, help with operations, and do physics analysis • Human resource issue • Always short of manpower
Computing equipment budgets • Rental system • Four- and five-year contracts (20% budget reduction!) • 1997-2000 (2.5 B yen; ~18 M euro for 4 years) • 2001-2005 (2.5 B yen; ~18 M euro for 5 years) • A new acquisition process is starting for 2006/1 onward • Belle-purchased systems • KEK Belle operating budget: 2 M euro/year • Of the 2 M euro, 0.4~1 M euro/year goes to computing • Tapes (0.2 M euro), PCs (0.4 M euro), etc. • Sometimes we get a bonus(!): so far about 2 M euro in total over five years • Other institutions • 0~0.3 M euro/year/institution • On average, very little money is allocated
PC farm of several generations (1999-2004) • Dell 36 PCs (Pentium-III ~0.5 GHz) • Compaq 60 PCs (Pentium-III 0.7 GHz) • Fujitsu 127 PCs (Pentium-III 1.26 GHz) • Appro 113 PCs (Athlon 1.67 GHz×2) • NEC 84 PCs (Xeon 2.8 GHz×2) • Fujitsu 120 PCs (Xeon 3.2 GHz×2) • Dell 150 PCs (Xeon 3.4 GHz×2) [chart: integrated luminosity and total CPU capacity vs. year, growing from 168 GHz to 1020 GHz by 2004]
Disk servers @ KEK • 8 TB NFS file servers • Mounted on all UltraSparc servers via GbE • 4.5 TB staging disk for HSM • Mounted on all UltraSparc/Intel PC servers • ~15 TB local data disks on PCs • generic MC files are stored and used remotely • Inexpensive IDE RAID disk servers • 160 GB × (7+1) × 16 = 18 TB @ 100 K euro (12/2002) • 250 GB × (7+1) × 16 = 28 TB @ 110 K euro (3/2003) • 320 GB × (6+1+S) × 32 = 56 TB @ 200 K euro (11/2003) • 400 GB × (6+1+S) × 40 = 96 TB @ 250 K euro (11/2004) • With a tape library in the back end
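As a rough check of the usable capacities (my arithmetic, not from the slide, counting only the data disks in each data+parity[+spare] set):

\[ 160\ \mathrm{GB}\times 7\times 16 \approx 18\ \mathrm{TB},\qquad 400\ \mathrm{GB}\times 6\times 40 = 96\ \mathrm{TB}. \]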
Tape libraries • Direct-access DTF2 tape library • 40 drives (24 MB/s each) on 20 UltraSparc servers • 4 drives for data taking (one is used at a time) • The tape library can hold 2500 tapes (500 TB) • We store raw data and DST using • the Belle tape IO package (allocate, mount, unmount, free), • a perl script (files-to-tape database), and • LSF (for exclusive use of the tape drives) • HSM back end with three DTF2 tape libraries • Three 40 TB tape libraries serve as the back end of 4.5 TB of disk • Files are written/read as if they were all on disk • When the library fills up, we move tapes out of it, and human operators insert tapes when users request them by mail
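A minimal sketch of what the allocate/mount/write/unmount/free cycle described above could look like from a client program. All function names, the volume label, device path and file name below are hypothetical stand-ins (the real Belle tape-IO package is not public); only the sequence of steps and the note that LSF provides exclusive drive use come from the slide.

    // Hypothetical illustration of allocate -> mount -> write -> unmount -> free.
    // These are NOT the real Belle tape-IO functions; stand-ins only.
    #include <cstdio>
    #include <string>

    static int tape_allocate(const std::string& pool, std::string& volser) {
      volser = "BEL0001";                       // stand-in: pick a free volume from 'pool'
      std::printf("allocate %s from %s\n", volser.c_str(), pool.c_str());
      return 0;
    }
    static int tape_mount(const std::string& volser, std::string& device) {
      device = "/dev/rmt/0";                    // stand-in: drive assigned by the library
      std::printf("mount %s on %s\n", volser.c_str(), device.c_str());
      return 0;
    }
    static int tape_write(const std::string& device, const std::string& file) {
      std::printf("write %s to %s\n", file.c_str(), device.c_str());
      return 0;                                 // stand-in: stream the file to tape
    }
    static int tape_unmount(const std::string& volser) {
      std::printf("unmount %s\n", volser.c_str()); return 0;
    }
    static int tape_free(const std::string& volser) {
      std::printf("free %s (update file-to-tape database)\n", volser.c_str()); return 0;
    }

    int main() {
      // In production this sequence would run as an LSF job holding the drive exclusively.
      std::string volser, device;
      if (tape_allocate("belle_raw", volser) != 0) return 1;
      if (tape_mount(volser, device) != 0) { tape_free(volser); return 2; }
      int rc = tape_write(device, "run0042.dst");
      tape_unmount(volser);
      tape_free(volser);
      return rc;
    }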
Data size so far • Raw data • 400 TB written since Jan. 2001 for 230 fb-1 of data, on 2000 tapes • DST data • 700 TB written since Jan. 2001 for 230 fb-1 of data, on 4000 tapes, compressed with zlib • MDST data (four-vectors, vertices and PID) • 15 TB for 287 fb-1 of hadronic events (BBbar and continuum), compressed with zlib • tau, two-photon: add 9 TB for 287 fb-1 • Total: ~1 PB on DTF(2) tapes plus 200+ TB on SAIT [chart: DTF(2) tape usage per year; DTF (40 GB) tapes 1999-2000, DTF2 (200 GB) tapes 2001-2004]
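Per unit of luminosity this corresponds to roughly (my arithmetic from the numbers above):

\[ \frac{400\ \mathrm{TB}}{230\ \mathrm{fb^{-1}}} \approx 1.7\ \mathrm{TB/fb^{-1}}\ \text{(raw)},\qquad \frac{700\ \mathrm{TB}}{230\ \mathrm{fb^{-1}}} \approx 3.0\ \mathrm{TB/fb^{-1}}\ \text{(DST)},\qquad \frac{15\ \mathrm{TB}}{287\ \mathrm{fb^{-1}}} \approx 52\ \mathrm{GB/fb^{-1}}\ \text{(hadronic MDST)}, \]

which is consistent with the earlier ">1 TB of raw data per day at >1 fb-1/day".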
Compact, inexpensive HSM • The front-end disk system consists of • 18 dual-Xeon PC servers with two SCSI channels each; • 8 (10) of them connect 16-disk 320 (400) GB IDE RAID systems • Total capacity is 56 (96) TB • The back-end tape system consists of • a SONY SAIT PetaSite tape library in seven racks of floor space: the main system (12 drives) plus five cassette consoles, with a total capacity of 1.3 PB (2500 tapes) • They are connected by a 16+32-port 2 Gbit FC switch • Installed in Nov. 2003~Feb. 2004 and in fall 2004, and working well • We keep all new data on this system • Lots of disk failures (even power failures), but the system is surviving
Floor space required • DTF2 library systems: Lib (~500 TB) ~141 m², BHSM (~150 TB) ~31 m², Backup (~13 TB) ~16 m² • SAIT library systems [floor plans comparing the DTF2 and SAIT library systems, with rack layouts and dimensions in mm]
Use of LSF • We have been using LSF since 1999 • We started using LSF on PCs in 2003/3 • Of ~1700 CPUs (as of 2005/3), ~1300 CPUs are under LSF and are used for DST production, generic MC generation, calibration, signal MC generation and user physics analyses • DST production uses its own distributed computing framework (dbasf), and its child nodes do not run under LSF • All other jobs share the CPUs and are dispatched by LSF • For users we use fair-share scheduling • We are using new features in LSF 6.0 such as reporting and multi-cluster support; we hope to use it as a collaboration-wide job management tool
Rental system plan for Belle • We have started the lengthy process of the computer acquisition for 2006 (-2012) • 40,000 specCINT2000_rates of compute servers • 70,000 sC2000R in 2009 • 5 (1) PB tape (disk) storage system with extensions • 13 (3) PB tape (disk) in 2011 • fast enough network connections to read/write data at 2-10 GB/s (2 for DST production, 10 for physics analysis) • a user-friendly and efficient batch system that can be used collaboration-wide • In a single six-year lease contract we hope to double the resources in the middle, assuming Moore's law in the IT commodity market
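A rough consistency check of that growth assumption (my arithmetic, not from the slide): going from 40,000 to 70,000 specCINT2000_rate between 2006 and 2009 is a factor 1.75 in three years, i.e. an implied doubling time of

\[ t_{2} \approx 3\ \mathrm{yr}\times\frac{\ln 2}{\ln 1.75} \approx 3.7\ \mathrm{yr}, \]

in the right ballpark for commodity price/performance growth.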
Generic MC production • Mainly used for physics background studies • 400 GHz of Pentium III ~ 2.5 fb-1/day • 80~100 GB/fb-1 of data in the compressed format • No intermediate (GEANT3 hits/raw) hits are kept • When a new release of the library comes out, we try to produce a new generic MC sample • For every real data-taking run, we try to generate 3 times as many events as in the real run, taking the run dependence into account • Detector background is taken from random-trigger events of the run being simulated • 644 M events at remote institutes and 864 M events at KEK have been generated since 2004/04 • However, we need 4725 M events for 3 × 350 fb-1 • We need ~300 TB of disk space for generic MC [chart: M events produced since Apr. 2004]
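For scale (my arithmetic, reading the requirement as three times the MC-equivalent of ~350 fb-1 of data):

\[ \frac{4725\times10^{6}\ \mathrm{events}}{3\times350\ \mathrm{fb^{-1}}} \approx 4.5\times10^{6}\ \mathrm{events/fb^{-1}},\qquad \frac{300\ \mathrm{TB}}{4725\times10^{6}\ \mathrm{events}} \approx 63\ \mathrm{KB/event}. \]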
Grid plan for Belle • Belle has not made a commitment to Grid technology (we hope to learn more here) • Belle@KEK has not, but Belle@remote institutions may have been involved, in particular when they are also involved in one of the LHC experiments • Parallel/distributed computing aspect • Nice, but we have our own solution • Event (trivial) parallelism works • However, as we accumulate more data, we hope to have more parallelism (several tens to hundreds to thousands of CPUs) • Thus we may adopt a standard solution • Parallel/distributed file system • Yes, we need something better than many UNIX files on many nodes • For each "run" we create ~1000 files; we have had O(10,000) runs so far • Collaboration-wide issues • WAN aspect: network connections are quite different from one institution to another • Authentication: we use DCE to log in; we are testing VPN with OTP; a CA-based system would be nice • Resource management: we want to submit jobs to remote institutions if CPU is available
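For scale (my arithmetic from the numbers above), the flat-file approach already implies

\[ \sim 1000\ \mathrm{files/run} \times O(10^{4})\ \mathrm{runs} \approx O(10^{7})\ \mathrm{files}. \]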
Belle's attempts • We have separated the stream IO package so that we can connect it to any (Grid) file management package • We have started working with remote institutions (Australia, Tohoku, Taiwan, Korea) • SRB (Storage Resource Broker, San Diego Supercomputer Center) • We constructed test beds connecting Australian institutes and Tohoku University, using GSI authentication • gfarm (AIST) • We are using the gfarm cluster at AIST to generate generic MC events
Human resources • KEKB computer system + network • Supported by the computer center (1 researcher, 3~4 system engineers + 1 hardware engineer, 2~3 operators) • PC farms and tape handling • 1 KEK/Belle researcher, 2 Belle support staff members (they help with production as well) • DST/MC production management • 2 KEK/Belle researchers, plus 1 post-doc or student at a time from collaborating institutions • Library/constants database • 2 KEK/Belle researchers + the sub-detector groups • We are barely surviving and are proud of what we have been able to do: make the data usable for analyses
New Physics Flavor Problem • The Standard Model of particle physics cannot really explain the origin of the universe: it does not have enough CP violation • Models beyond the SM have many CP-violating phases • There is no principle to explain the flavor structure in models beyond the Standard Model • Why is Vcb >> Vub? • What about the lepton sector? Is there a relationship between the quark and neutrino mixing matrices? • We know that the Standard Model (incl. the Kobayashi-Maskawa mechanism) is the effective low-energy description of Nature; however, New Physics most likely lies in the O(1) TeV region • LHC will start within a few years and (hopefully) discover new elementary particles • Flavor-Changing Neutral Currents (FCNC) are suppressed; New Physics without a suppression mechanism is excluded up to 10^3 TeV • Different mechanisms give different flavor structure in B decays (tau and charm as well) • More questions in flavor physics: the luminosity upgrade will be quite effective and essential
Projected luminosity [chart: projected integrated luminosity reaching 50 ab-1, "Oide scenario", SuperKEKB]
With more and more data • We now have more than 250 M B0's on tape • We can not only observe very rare decays but also measure the time-dependent asymmetry of the rare decay modes and determine the CP phases [event yield plots: 68 events vs. 2716 events, enlarged]
SuperKEKB: schematics • Super B Factory at KEK • L ∝ I ξy / βy* • 8 GeV e+ beam, 4.1 A; 3.5 GeV e- beam, 9.6 A • Super Belle, Super Quad [machine layout schematic]
Computing for super KEKB • For (even) 10^35 luminosity: • DAQ: 5 kHz, 100 KB/event → 500 MB/s • Physics rate: BBbar @ 100 Hz • 10^15 bytes/year: 1 PB/year • 800 4-GHz CPUs just to keep up with data taking • 2000 4-GHz 4-CPU PC servers • 10+ PB storage system (on what media?) • 300 TB-1 PB of MDST/year on online data disk • Costing >50 M euro?
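The quoted DAQ bandwidth follows directly from the trigger rate and event size (my arithmetic):

\[ 5\ \mathrm{kHz} \times 100\ \mathrm{KB/event} = 500\ \mathrm{MB/s}; \]

at that rate a canonical 10^7 s of running would produce several PB of raw data, so the 10^15 bytes/year (1 PB/year) figure presumably refers to the sample retained after event selection and reduction.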
Conclusions • Belle has accumulated 350 fb-1 of BBbar events so far (Mar. 2005); the data are fully processed; everyone is enjoying doing physics (tight competition with BaBar) • A lot of computing resources have been added to the KEKB computer system to handle the flood of data • The management team remains small (fewer than 10 KEK staff members, not full time, two of whom are group leaders of sub-detectors, plus fewer than 5 SE/CEs); in particular, the PC farms and the new HSM are managed by a few people • We are looking for a Grid solution, for us at KEK and for the rest of the Belle collaboration
Continuous injection • No need to stop a run • Always at ~maximum currents and therefore maximum luminosity [CERN Courier, Jan/Feb 2004] • ~30% more ∫L dt • Both KEKB & PEP-II [plot: HER/LER currents and luminosity over 24 h, continuous injection (new) vs. normal injection (old)] • >1 fb-1/day! (>~1×10^6 BB)
Core software • OS/C++ • Solaris 7 on Sparc and RedHat 6/7/9 on PCs • gcc 2.95.3/3.0.4/3.2.2/3.3 (the code also compiles with SunCC) • No commercial software except for the batch queuing system and the hierarchical storage management system • QQ, EvtGen, GEANT3, CERNLIB (2001/2003), CLHEP (~1.5), postgres 7 • Legacy FORTRAN code • (GSIM/GEANT3 and old calibration/reconstruction code) • I/O: home-grown stream IO package + zlib • The only data format for all stages (from DAQ to final user analysis skim files) • Index files (pointers to events in data files) are used for the final physics analysis
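A minimal sketch of the kind of zlib-compressed record I/O such a home-grown stream package could be built on. This is not the actual Belle code: the record layout (uncompressed size, compressed size, payload) is invented for illustration, but it shows why byte-offset index files pointing at individual events are straightforward with this approach.

    #include <zlib.h>
    #include <cstdio>
    #include <cstdint>
    #include <vector>

    // Write one event record: [uncompressed size][compressed size][compressed bytes].
    // Returns the file offset of the record so it can be stored in an index file.
    long write_record(std::FILE* f, const std::vector<unsigned char>& event) {
      long offset = std::ftell(f);
      uLongf clen = compressBound(event.size());
      std::vector<unsigned char> cbuf(clen);
      if (compress2(cbuf.data(), &clen, event.data(), event.size(), 6) != Z_OK) return -1;
      std::uint32_t usize = event.size(), csize = clen;
      std::fwrite(&usize, sizeof usize, 1, f);
      std::fwrite(&csize, sizeof csize, 1, f);
      std::fwrite(cbuf.data(), 1, csize, f);
      return offset;
    }

    // Read back one record starting at a given offset (e.g. taken from an index file).
    bool read_record(std::FILE* f, long offset, std::vector<unsigned char>& event) {
      std::uint32_t usize = 0, csize = 0;
      if (std::fseek(f, offset, SEEK_SET) != 0) return false;
      if (std::fread(&usize, sizeof usize, 1, f) != 1) return false;
      if (std::fread(&csize, sizeof csize, 1, f) != 1) return false;
      std::vector<unsigned char> cbuf(csize);
      if (std::fread(cbuf.data(), 1, csize, f) != csize) return false;
      event.resize(usize);
      uLongf ulen = usize;
      return uncompress(event.data(), &ulen, cbuf.data(), csize) == Z_OK && ulen == usize;
    }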
Reconstruction software • 30~40 people have contributed over the last several years • For many parts of the reconstruction software we have only one package; very little competition • Good and bad • Identify weak points and ask someone to improve them • Mostly organized within the sub-detector groups • Physics motivated, though • Systematic effort to improve the tracking software, but very slow progress • For example, it took one year to bring the tracking systematic error down from 2% to less than 1% • Small z bias for either forward/backward or positive/negative charged tracks • When the problem is solved we will reprocess all the data again
Analysis software • Several to tens of people have contributed • Kinematic and vertex fitters • Flavor tagging • Vertexing • Particle ID (likelihood) • Event shape • Likelihood/Fisher analysis • People tend to use the standard packages, but… • The system is not well organized/documented • We have started a task force (consisting of young Belle members)
PostgreSQL database system • The only database system Belle uses • other than simple UNIX files and directories • A few years ago we were afraid that nobody was using PostgreSQL, but it now seems to be widely used and well maintained • One master, several copies at KEK, many copies at institutions/on personal PCs • ~120,000 records (4.3 GB on disk) • The IP (interaction point) profile is the largest/most popular • It is working quite well, although consistency among the many database copies is a problem
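As an illustration of how constants such as the IP profile can be read from such a database, here is a small libpq sketch. The table and column names (ip_profile, exp, run, x/y/z) and the connection string are hypothetical; only the use of PostgreSQL itself comes from the slide.

    #include <libpq-fe.h>
    #include <cstdio>

    // Fetch the (hypothetical) IP profile for one experiment/run from PostgreSQL.
    int main() {
      PGconn* conn = PQconnectdb("host=localhost dbname=belle_constants");
      if (PQstatus(conn) != CONNECTION_OK) {
        std::fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
      }
      // Hypothetical schema: ip_profile(exp int, run int, x real, y real, z real)
      PGresult* res = PQexec(conn,
          "SELECT x, y, z FROM ip_profile WHERE exp = 31 AND run = 1000");
      if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1) {
        std::printf("IP profile: x=%s y=%s z=%s\n",
                    PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1), PQgetvalue(res, 0, 2));
      }
      PQclear(res);
      PQfinish(conn);
      return 0;
    }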
Rental system (2001-2005): total cost over five years (M euro) [cost breakdown table]
Belle's reference platform • Solaris 2.7; everyone has an account • 9 workgroup servers (500 MHz, 4 CPU) • 38 compute servers (500 MHz, 4 CPU) • LSF batch system • 40 tape drives (2 each on 20 servers) • Fast access to the disk servers • 20 user workstations with DAT, DLT, AIT drives • Maintained by Fujitsu SE/CEs under the big rental contract • Compute servers (@KEK, Linux RH 6.2/7.2/9) • User terminals (@KEK, to log onto the group servers) • 120 PCs (~50 Win2000 + X window software, ~70 Linux) • User analysis PCs (@KEK, unmanaged) • Compute/file servers at universities • A few to a few hundred at each institution • Used for generic MC production as well as physics analyses at each institution • e.g. the tau analysis center @ Nagoya U. • Sparc and Intel CPUs
PetaServe: HSM software • Commercial software from SONY • We have been using it since 1999 on Solaris • It now runs on Linux (RH7.3-based) • It uses the Data Management API of XFS, developed by SGI • When the drive and the file server are connected via the FC switch, data on disk can be written directly to tape • The minimum size for migration, and the size of the first part of a file that remains on disk, can be set by users • The backup system, PetaBack, works intelligently with PetaServe; in particular, in the next version, if a file has been shadowed (one copy on tape, one copy on disk) it will not be backed up, saving space and time • So far it is working extremely well, but • Files must be distributed by ourselves among disk partitions (now 32; in three months many more, as there is a 2 TB limit…) • The disk is a file system, not a cache • An HSM disk cannot make an effective scratch disk • There is a mechanism to do nightly garbage collection if users delete files by themselves
Mass storage strategy • Development of the next-generation DTF drives was canceled • SONY's new mass storage system will use SAIT drive technology (metal tape, helical scan) • We decided to test it • Purchased a 500 TB tape library and installed it as the back end (HSM) of newly acquired inexpensive IDE-based RAID systems and PC file servers • We are moving from direct tape access to a hierarchical storage system • We have learned that automatic file migration is quite convenient • But we need a lot of capacity so that we do not need operators to mount tapes
Online production and KEKB • As we take data, we write the raw data directly to tape (DTF2); at the same time we run the DST production code (event reconstruction) using 85 dual Athlon 2000+ PC servers • See Itoh-san's talk on RFARM on the 29th in the Online Computing session • Using the results of the event reconstruction we send feedback to the KEKB accelerator group, such as the location and size of the collision point, so that they can tune the beams to maximize the instantaneous luminosity while keeping them colliding at the center • We also monitor the BBbar cross section very precisely and can change the machine beam energies by 1 MeV or so to maximize the number of BBbars produced • The resulting DSTs are written to temporary disks and skimmed for detector calibration • Once detector calibration is finished, we run the production again and make DST/mini-DST for the final physics analysis [plots: luminosity and vertex z position; "RF phase" changed, corresponding to 0.25 mm]
Online data processing • Level 3 software trigger • Runs 39-53 (Sept. 18, 2004): 211 pb-1 accumulated • Accepted 9,029,058 events; accept rate 360.30 Hz; run time 25,060 s • Recorded 4,710,231 events (52%); used 186.6 GB • Peak luminosity was 9×10^33; we had been running since Sept. 9 • DTF2 tape library; new SAIT HSM • RFARM: 82 dual Athlon 2000+ servers • More than 2 M BBbar events! • Total CPU time: 1,812,887 s • L4 output: 3,980,994 events; 893,416 hadrons; DST: 381 GB (95 KB/event) • Skims: μ pair 115,569 (5,711 written); Bhabha 475,680 (15,855); tight hadron 687,308 (45,819)
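The quoted numbers are internally consistent (my arithmetic):

\[ \frac{9{,}029{,}058}{25{,}060\ \mathrm{s}} \approx 360.3\ \mathrm{Hz},\qquad \frac{4{,}710{,}231}{9{,}029{,}058} \approx 52\%,\qquad \frac{381\ \mathrm{GB}}{3{,}980{,}994\ \mathrm{events}} \approx 95\ \mathrm{KB/event}. \]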
DST production: reprocessing strategy • Goal: 3 months to reprocess all data, using all KEK compute servers • We often have to wait for constants • We often have to restart due to bad constants • Efficiency: 50~70% • History • 2002: major software updates • 2002/7: reprocessed all data taken until then (78 fb-1 in three months) • 2003: no major software updates • 2003/7: reprocessed the data taken since 2002/10 (~60 fb-1 in three months) • 2004: SVD2 was installed in summer 2003; the software is all new and was being tested until mid-May • 2004/7: reprocessed the data taken since 2003/10 (~130 fb-1 in three months)
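For scale (my arithmetic): the 2004 campaign averaged about

\[ \frac{130\ \mathrm{fb^{-1}}}{\sim 90\ \mathrm{days}} \approx 1.4\ \mathrm{fb^{-1}/day}, \]

so at 50~70% efficiency the farm has to be capable of roughly 2~3 fb-1/day, which the >5 fb-1/day of the new system (later slide) comfortably exceeds.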
[Chart: elapsed time (actual hours) for the DST production chain (database update, dE/dx, ToF setup, RFARM) against the Belle incoming data rate of 1 fb-1/day]
Comparison with the previous system • Using 1.12 THz of total CPU we achieved >5 fb-1/day (see the Adachi/Ronga poster) [chart: processed luminosity per day (pb-1) vs. day, new (2004) vs. old (2003) system]
Networks and data transfer • The KEKB computer system has a fast internal network for NFS • We have added Belle-bought PCs, now more than 500 PCs and file servers • We have connected dedicated 1 Gbps Super-SINET lines to four universities • We have also requested that this network be connected to the outside of KEK for tests of Grid computing • Finally, a new Cisco 6509 has been added to separate the above three networks • A firewall and login servers make data transfer miserable (100 Mbps max.) • DAT tapes are used to copy compressed hadron files and MC generated by outside institutions • Dedicated GbE links to a few institutions are now being added • A total of 10 Gbit to/from KEK is being added • Still slow network connections to most collaborators
Data transfer to universities • We use Super-SINET, APAN and other international academic networks as the backbone of the experiment [diagram: the KEK computing research center linked to Tohoku U, U. of Tokyo, TIT, Nagoya U and Osaka U over 1-10 Gbps Super-SINET/NFS, and to Australia, the U.S., Korea, Taiwan etc.; quoted transfers range from 170 GB/day to ~1 TB/day at ~45-100 Mbps]
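The link speeds and daily volumes in the diagram are consistent with each other (my arithmetic, assuming fully used links):

\[ 45\ \mathrm{Mbps}\times 86{,}400\ \mathrm{s/day} \approx 0.49\ \mathrm{TB/day},\qquad 100\ \mathrm{Mbps}\times 86{,}400\ \mathrm{s/day} \approx 1.1\ \mathrm{TB/day}, \]

matching the ~400 GB/day and ~1 TB/day figures.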
Roadmap of B physics [timeline: 2001, discovery of CPV in B decays at KEKB (10^34); now (280 fb-1), precise tests of the SM and searches for NP (sin2φ1, CPV in B→ππ, φ3, Vub, Vcb, b→sγ, b→sll, new states); SuperKEKB (10^35), study of NP effects in B and tau decays, anomalous CPV in b→sss if NP=SUSY, identification of the SUSY-breaking mechanism; concurrent program with the Tevatron (m~100 GeV) and LHC (m~1 TeV), NP discovered at LHC (2010?)]