Belle computing upgrade Ichiro Adachi 22 April 2005 Super B workshop in Hawaii
Belle's computing goal
• Data processing
  • 3 months to reprocess the entire data sample accumulated so far, using all of KEK's computing resources (a quick throughput sketch follows after this slide)
  • efficient use of resources
  • flexibility
• Successful (I think, at least)
  • 1999-2004: all data processed and used for analyses at the summer conferences (good or bad?)
  • example: DsJ(2317), from David Brown's CHEP04 talk
    • BaBar discovery paper: Feb 2003
    • Belle confirms DsJ(2317): Jun 2003
    • Belle discovers B → DsJ(2317)D: Oct 2003
    • BaBar confirms B → DsJ(2317)D: Aug 2004
  • also validates software reliability
"How can we keep the computing power?"
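As a back-of-the-envelope check of the "3 months" goal, here is a minimal sketch of the required daily throughput. The 400 fb-1 dataset size is an illustrative assumption, not a number from the talk; the 7 fb-1/day figure appears later in these slides.

    # Sketch: daily rate needed to reprocess the full dataset in 3 months.
    # dataset_fb is an assumed, illustrative value (not from the talk).
    dataset_fb = 400.0            # accumulated integrated luminosity, fb^-1
    days = 90                     # "3 months to reprocess entire data"
    print(f"required rate: {dataset_fb / days:.1f} fb^-1/day")
    # For comparison, 90 days at the 2005 rate of 7 fb^-1/day covers:
    print(f"capacity at 7 fb^-1/day: {7 * 90} fb^-1 in 90 days")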
Present Belle computing system
• Two major components
  • rental system, under contract since 2001
  • Belle's own system
[Diagram: compute servers (Sparc 0.5 GHz, Pentium III 1.26 GHz, Athlon 1.67 GHz, Xeon 0.7/2.8/3.2/3.4 GHz), disk (8 TB, 50 TB, 50 TB IDE, 155 TB, 4 TB HSM), tape libraries (500 TB + 120 TB DTF2, 1.29 PB S-AIT)]
Computing resources evolving
• Purchased what we needed as we accumulated integrated luminosity
• Rental system contract
  • expires in Jan 2006
  • has to be replaced with a new one
[Plots: CPU power (GHz), HSM volume (TB), and disk capacity (TB) vs. time]
• Processing power: 7 fb-1/day in 2005 (5 fb-1/day in 2004)
New rental system
• Specifications
  • based on Oide's luminosity scenario (data volume grows ~×6 over the rental period)
  • 6-year contract, to Jan 2012
  • in the middle of the bidding process
  • 40,000 SPECint2000_rate of compute servers in 2006
  • 5 (1) PB tape (disk) storage system, with extensions
  • network connection fast enough to read/write data at 2-10 GB/s (2 for DST production, 10 for physics analysis)
  • user-friendly and efficient batch system that can be used collaboration-wide
• In a single 6-year lease contract we hope to double the resources in the middle, assuming Moore's law in the IT commodity market (see the sketch below)
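A minimal sketch of the "double the resources in the middle" hope. The 18-month doubling time is my assumption; the slide only invokes Moore's law.

    # Sketch: what a fixed budget buys at a mid-lease replacement,
    # assuming performance/cost doubles every 18 months (assumption).
    base_rate = 40_000            # SPECint2000_rate purchased in 2006
    doubling_months = 18
    months_into_lease = 36        # replacement in the middle of 6 years
    factor = 2 ** (months_into_lease / doubling_months)
    print(f"same budget buys ~{factor:.0f}x after {months_into_lease} months,")
    print(f"about {base_rate * factor:,.0f} SPECint2000_rate")

Under this assumption, even replacing only part of the farm at month 36 could roughly double the total capacity.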
Lessons and remarks
• Data size and access
• Mass storage
  • hardware
  • software
• Compute server
Data size & access
• Possible considerations
• rawdata
  • rawdata size ∝ integrated luminosity: 1 PB for 1 ab-1 (at least)
  • read once or twice a year
  • keep in the archive (where to go?)
• compact beam data for analysis ("mini-DST")
  • 60 TB for 1 ab-1
  • accessed frequently and (almost) randomly
  • easy access preferable (on disk)
• MC
  • 180 TB for 1 ab-1: 3× beam data, by Belle's rule of thumb
  • all data files read by most users (on disk?)
• A back-of-the-envelope sketch of these volumes follows below
[Plot: rawdata/yr (TB) vs. integrated luminosity/yr (fb-1) for Belle, 2000-2004; detector & accelerator upgrades can change this slope]
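The per-ab-1 numbers above multiply out as follows; this sketch uses only the figures quoted on this slide (1 PB raw, 60 TB mini-DST, MC = 3× beam data).

    # Sketch: storage volumes per unit luminosity, from the numbers above.
    def storage_tb(integrated_ab):
        raw = 1000 * integrated_ab     # rawdata: ~1 PB per ab^-1 (at least)
        mini_dst = 60 * integrated_ab  # compact beam data for analysis
        mc = 3 * mini_dst              # MC = 3x beam data (Belle rule of thumb)
        return raw, mini_dst, mc

    raw, dst, mc = storage_tb(1.0)
    print(f"per ab^-1: rawdata {raw:.0f} TB, mini-DST {dst:.0f} TB, MC {mc:.0f} TB")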
Mass storage: hardware
• Central system in the coming computing
• Lesson from Belle: watch the vendor's trend; migration costs money and time
  • We have been using SONY DTF drive technology since 1999.
  • SONY DTF2: no roadmap for future development, a dead end. SONY's next technology choice is S-AIT.
  • Testing a tape library of S-AIT since 2004.
  • 5000 DTF2 tapes already recorded. We have to move…
• The front-end disks
  • 18 dual-Xeon PC servers with two SCSI channels
  • 8 (10) of them connect one RAID system of 16 IDE disks of 320 (400) GB each
  • total capacity 56 (96) TB
• The back-end S-AIT system, attached via a 2 Gbit FC switch
  • SONY PetaSite tape library system in 7 racks' width of floor space
  • main system (12 drives) + 5 cassette consoles, with a total capacity of 1.3 PB (2500 tapes)
Mass storage: software
• 2nd lesson
  • We are moving from direct tape access to a hierarchical storage system (HSM).
  • We have learned that automatic file migration is quite convenient.
  • But we need enough capacity that operators are not needed to mount tapes.
• Most users go through all of the (MC) data available in the HSM, and each user access is random, not controlled at all.
  • Each access requires a tape reload to copy data onto disk.
  • The number of reloads per tape is hitting its limit!
• In our usage the HSM is not an archive but a big cache → need optimization in both HSM control and user I/O; a huge disk may help? (see the toy model below)
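A toy model of the reload problem. The tape counts, file counts, and cache size are all illustrative assumptions, not Belle's actual configuration; the point is that random, uncontrolled access defeats a small disk cache, so nearly every read reloads a tape.

    import random

    # Toy model: users scan all (MC) files in random order; a file whose
    # tape is not staged on disk costs one tape (re)load.
    n_tapes, files_per_tape = 100, 50
    disk_cache_tapes = 5             # tapes' worth of data that fit on disk

    files = [(t, f) for t in range(n_tapes) for f in range(files_per_tape)]
    random.shuffle(files)            # random, uncontrolled user access

    staged, reloads = [], 0
    for tape, _ in files:
        if tape not in staged:
            reloads += 1             # mount tape, copy its file to disk
            staged.append(tape)
            if len(staged) > disk_cache_tapes:
                staged.pop(0)        # evict the oldest tape's data
    print(f"{reloads} tape reloads for {len(files)} file reads")

Raising disk_cache_tapes toward n_tapes drives the reload count down, which is the "huge disk may help?" observation.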
Compute server
• 40,000 SPECint2000_rate in 2006
  • assume Moore's law is still valid for the coming years
• A large farm of PCs is difficult for us to manage
  • limited human resources at Belle
  • Belle software distribution
• "Space" problem
  • one floor of the Tsukuba experimental hall B3 (~10 m × 20 m)
  • cleared and floored in 2002 → full in 2005! No more space!
  • air-conditioning system has to be installed
• "Electricity" problem: ~500 W for dual 3.5 GHz CPUs
• Moore's law is not enough to solve this problem (a rough estimate follows)
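A rough estimate of the power/space coupling. The farm size is an assumed, illustrative number; only the ~500 W per dual-CPU server comes from the slide.

    # Sketch: heat load on the B3 floor. n_servers is an assumption;
    # ~500 W per dual 3.5 GHz CPU server is the figure from the slide.
    n_servers = 1000
    watts_per_server = 500
    floor_m2 = 10 * 20               # one floor of Tsukuba hall B3

    total_kw = n_servers * watts_per_server / 1000
    print(f"{total_kw:.0f} kW of heat over {floor_m2} m^2 "
          f"({total_kw * 1000 / floor_m2:.0f} W/m^2)")
    # Faster CPUs per Moore's law do not reduce the per-server power
    # draw, so space and cooling remain binding constraints.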
Software
• Simulation & reconstruction
  • Geant4 framework for the Super Belle detector is underway
  • simulation with beam background is being done
  • for reconstruction, robustness against background can be a key
Grid
• Distributed computing at Belle
  • MC production carried out at 20 sites outside KEK
  • ~45% of MC events produced at remote institutes since 2004
• Infrastructure
  • Super-SINET: 1 Gbps to major universities inside Japan
  • needs improvement for other sites
• Grid
  • should help us
  • effort with the KEK Computing Research Center: SRB (Storage Resource Broker)
  • Gfarm, with the Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
Summary
• Computing for physics output: try to keep the present goal
• Rental system: renewal from Jan 2006
• Mass storage
  • PB scale: not only the size but also the type of access matters
  • technology choice and the vendor's roadmap
• CPU: Moore's law alone does not solve the "space" problem
• Software: Geant4 simulation underway
• Grid: infrastructure getting better in Japan (Super-SINET)