Belle II Data Handling System
Kihyeon Cho, High Energy Physics Team, KISTI (Korea Institute of Science and Technology Information)
4th Belle II Computing Workshop, Ljubljana, Slovenia, May 23-25, 2011
Contents
• Belle II Data Handling Group
• Belle II Data Handling Scenario
• Embedment of metadata system @ grid farms
• Standalone metadata registration tool kit
• Snapshot vs. query
• Products
• To do list
• Summary
Belle II Data Handling Group
• KISTI
  • HEP Team – JungHyun Kim, Kihyeon Cho, YoungJin Kim, Taegil Bae, …
  • AMGA Team – Soonwook Hwang, Sunil Ahn, Taesang Huh, Geunchul Park, …
• Melbourne
  • Tom Fifield, Martin Sevior, …
• Krakow
  • Maciej Sitarz, Miłosz Zdybał, Rafał Grzymkowski, …
• KEK, Karlsruhe, etc.
Belle II Data Handling Group meeting
• First and third Thursday, 5:00 PM (KEK time)
• Latest meeting
  • May 12, 5:00 PM
• Upcoming meeting
  • June 2, 5:00 PM
The 3rd Belle II Computing Workshop
• Date: November 22-24 (Mon.-Wed.), 2010
  • 1st day (Monday): Off-line and HLT software
  • 2nd day (Tuesday): Distributed Computing and Data Handling
  • 3rd day (Wednesday)
    • Morning – Overflow
    • Afternoon – Excursion
• Place: KISTI, Daejeon, Korea
• Participants: More than 30 persons from 9 countries
• Just after the Belle II General Meeting at KEK
News @ Belle II DH group
• Manpower
  • AMGA team – Sunil Ahn, Taesang Huh, Geunchul Park, Soonwook Hwang (P.I.), …
  • Sunil Ahn will be at USC, USA for one year from June. He will work there.
  • Taesang Huh is now mainly in charge of the AMGA work. => He will give a talk about his activities on Wednesday.
Meetings
• KISTI DH meeting
  • Date: April 27, 2011, 11:00-13:00
  • Participants: Kihyeon Cho, Junghyun Kim, Sunil Ahn, Taesang Huh, Soonwook Hwang, etc.
  • Contents: The status and plan
• Korea Belle/Belle II Collaboration Meeting
  • Date: April 30, 2011 (Saturday)
  • Place: Hanyang University, Seoul, Korea
  • Report on Belle II DH Group – Kihyeon Cho
Meetings
• Belle II DH meeting
  • Date: May 12, 2011, 17:00-18:00
  • Talks
    • Status of Belle II Data Handling group – Kihyeon Cho
    • Activities for data handling system – Taesang Huh
• The 4th Belle II Computing Workshop
  • Date: May 23-25, 2011
  • Place: Ljubljana, Slovenia
  ⇒ Three talks by KISTI
    • Kihyeon Cho (Monday) – status report
    • Junghyun Kim (Wednesday) – metadata system
    • Taesang Huh (Wednesday) – user-created metadata set
Upcoming meetings (cont’d)
• Korea Belle/Belle II Collaboration meeting
  • Date: End of August, 2011
  • Place: KISTI, Daejeon, Korea
• BAM/BAS
  • Date: Sep. 26-27, 2011 (BAM), Sep. 28-30, 2011 (BAS)
  • Place: Lotte Hotel, Jeju Island, Korea
  • Host: KISTI, Korea U., Yonsei U., SNU, etc.
Belle II Data Handling scenario
[Figure: Belle II computing model. Labels: raw data storage and processing at KEK (tape, raw data); grid sites (CPU, disk) with mDST data, mDST MC and ntuples, used for MC production and ntuple production; cloud for MC production (optional); local resources for ntuple analysis; user tools: UI, gbasf2, AMGA client, DIRAC, data tools.]
Belle II data handling scenario
[Figure: user, KEK detector with tape storage, AMGA system (metadata), LFC (file location), grid site storage and computing farms; files are replicated from KEK to the grid sites.]
① Metadata query
② List of files & events
③ List of grid sites
④ Request job execution
⑤ LFN to PFN
⑥ File read
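The six numbered steps of the scenario can be read as a short pipeline. The following is only a minimal sketch under stated assumptions: the `AMGA` list and the `pfn_for` function are hypothetical in-memory stand-ins for the metadata catalogue and the LFC, the file names, attributes and `example` storage URLs are invented, and no real AMGA, LFC or gbasf2 call is used.

```python
# Hypothetical in-memory stand-ins; none of this is the real AMGA or LFC API.
AMGA = [  # metadata records: logical file name, selection attribute, events, hosting sites
    {"lfn": "/belle2/mdst/exp07/run0001.root", "experiment": 7,
     "events": 1000, "sites": {"KISTI", "Melbourne"}},
    {"lfn": "/belle2/mdst/exp07/run0002.root", "experiment": 7,
     "events": 1500, "sites": {"KISTI"}},
    {"lfn": "/belle2/mdst/exp09/run0001.root", "experiment": 9,
     "events": 800, "sites": {"Melbourne"}},
]


def pfn_for(lfn, site):
    """Toy LFC lookup: map a logical file name to a physical one at `site`."""
    return f"srm://se.{site.lower()}.example{lfn}"


def query_metadata(experiment):
    """Steps 1-3: metadata query -> matching files, event count, common sites."""
    hits = [r for r in AMGA if r["experiment"] == experiment]
    lfns = [r["lfn"] for r in hits]
    n_events = sum(r["events"] for r in hits)
    # Only sites that hold every selected file can run the job without transfers.
    sites = sorted(set.intersection(*(r["sites"] for r in hits))) if hits else []
    return lfns, n_events, sites


def run_job(site, lfns):
    """Steps 4-6: the job at `site` resolves each LFN to a PFN before reading it."""
    return [pfn_for(lfn, site) for lfn in lfns]


lfns, n_events, sites = query_metadata(experiment=7)
pfns = run_job(sites[0], lfns)
print(sites, n_events, pfns[0])
```

Choosing only sites that hold all selected files is a simplification made for this sketch; the slides do not say how the site list is built.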
Status
• Constructed Belle II Data Handling System => done
  • Data handling and job management for Belle II Grid => done
• Test of large-scale data handling
  • Belle data at GSDC farm at KISTI => done
    • Around 125 TB of Belle data
  • Belle II data (random data)
    • Realistic test (file level) => done
    • Based on TDR: Raw data: 100 M files; Real: 4.3 M files; MC: 12.5 M files
• High-frequency test => done
• Embedment of metadata system @ grid farms
Embedment of metadata system @ grid farms
[Figure: UI, grid farms and AMGA server. Belle VO: @ KISTI, @ Melbourne, @ Beijing. Belle II VO: @ KISTI, @ Melbourne, @ Ljubljana (not yet).]
• Sunil has written documentation on how to submit grid jobs.
• We tested the AMGA server at the HEP team (150.183.246.196).
• The AMGA server at Melbourne is on the grid.
• => To write a draft for JKPS (Journal of the Korean Physical Society)
Scheme of test of metadata system @ grid farms
[Figure: KEK (tape, disk, WN); Melbourne and KISTI, each with a UI, an AMGA system, disk and WN; Beijing with a UI.]
Log files => To write the draft for JKPS (Journal of the Korean Physical Society)
Standalone metadata registration tool kit
Martin’s mail (April 14) => This is requested by the distributed computing group

Now that many Belle skim datasets have been created and distributed around the world, I think it is very important we implement a dataset registration tool for our distributed computing solution. This is defined in redmine feature 196: http://ekpbelle2.physik.uni-karlsruhe.de/redmine/issues/196

The feature would place a dataset on a grid enabled storage element, register it with the LFC and place the appropriate metadata associated with the skim in the AMGA meta-database. With this tool we can begin to use our distributed data analysis system to analyse Belle data. This project is particularly vital given the situation with B-computer at KEK. We have access to large amounts of computing power on the grid but without this feature we can't really use it.

I thought that you might be interested working together in developing this feature since it involves data handling, AMGA and the python interface to AMGA. It would also give you a chance to become familiar with gbasf2.
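The three actions named in the mail (place the dataset on a storage element, register it with the LFC, store the skim metadata in AMGA) could be sketched as follows. This is not the design of redmine feature 196: `ToyGrid`, `register_dataset`, the `se_prefix` URL and the example file are hypothetical, and plain dictionaries stand in for the storage element, the LFC and AMGA.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGrid:
    """Hypothetical in-memory stand-in for a storage element, the LFC and AMGA."""
    storage: dict = field(default_factory=dict)  # PFN -> file content
    lfc: dict = field(default_factory=dict)      # LFN -> PFN
    amga: dict = field(default_factory=dict)     # LFN -> metadata attributes


def register_dataset(grid, lfn, payload, metadata,
                     se_prefix="srm://se.example.org"):
    """Upload, catalogue and describe one file; undo earlier steps on failure."""
    if lfn in grid.lfc:
        raise ValueError(f"{lfn} is already registered")
    pfn = se_prefix + lfn
    grid.storage[pfn] = payload          # 1. place the file on the storage element
    try:
        grid.lfc[lfn] = pfn              # 2. register the replica with the LFC
        grid.amga[lfn] = dict(metadata)  # 3. store the skim metadata in AMGA
    except Exception:
        # Roll back so no file is left without a catalogue entry or metadata.
        grid.lfc.pop(lfn, None)
        grid.storage.pop(pfn, None)
        raise
    return pfn


grid = ToyGrid()
pfn = register_dataset(grid, "/belle/skim/example.mdst", b"toy content",
                       {"skim": "example", "events": 1200})
print(pfn, grid.amga["/belle/skim/example.mdst"])
```

The rollback is one possible design choice: it keeps the three catalogues consistent if a later step fails, which matters more when the steps are remote calls rather than dictionary updates.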
Stand-alone metadata registration tool kit
• Junghyun and Taesang Huh
• C++ module for basf2
1st step. To install basf2 @ KISTI
• The difference between basf2 and gbasf2?
• The concept of basf2 compared to roabasf @ Belle
• To check log file messages and metadata
=> Work in progress
Stand-alone metadata registration tool kit
• Junghyun and Taesang Huh
• C++ module for basf2
2nd step. C++ Module
• Open metadata system
• New metadata system with registration
• Close metadata system
=> To do list
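The open, register, close sequence planned for the module can be illustrated with a short sketch. It is written in Python rather than C++, and it uses an in-memory SQLite database from the standard library as a stand-in for the metadata system; the `files` table, the `register` helper and the file names are assumptions made for illustration only.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def metadata_system(path=":memory:"):
    """Open the stand-in metadata store and always close it afterwards."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (lfn TEXT PRIMARY KEY, events INTEGER)"
    )
    try:
        yield conn
        conn.commit()   # reached only if the body raised no exception
    finally:
        conn.close()    # the close step runs on success and on failure


def register(conn, lfn, events):
    """Registration step: add one file's metadata to the store."""
    conn.execute("INSERT INTO files (lfn, events) VALUES (?, ?)", (lfn, events))


with metadata_system() as conn:
    register(conn, "/belle2/user/example/signal_0001.root", 1000)
    register(conn, "/belle2/user/example/signal_0002.root", 1500)
    n_files, n_events = conn.execute(
        "SELECT COUNT(*), SUM(events) FROM files"
    ).fetchone()

print(n_files, n_events)
```

Tying the close step to a `finally` clause means the connection is released even when a registration fails, which is the property a C++ module would typically get from a destructor.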
Issues (Snapshot vs. Query)
• Methods
  • Snapshot
  • Query
  • Snapshot-like method
• Snapshot method
  • Official data → 1.5 TB
  • User-created data
    • User file (signal) → 70 MB
    • Subset of official data → up to 1.5 TB/user => huge
=> Talk by Junghyun on Wednesday
Snapshot vs. Query
[Figure: 1. Snapshot: B, C, … each hold duplicated metadata copied from metadata A. 2. Query: B, C, … each hold only a query against metadata A.]
3. Snapshot-like Method
[Figure: flowchart for deleting from metadata A, which B, C, … access through queries. Delete OK? Yes: delete. No: keep metadata (change permission).]
• Snapshot style ⇒ snapshot-like style for user-created data
• To store queries, which are not duplicated
• To keep the metadata for old data
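The difference between the three methods can be made concrete with a toy catalogue. The sketch below rests on assumptions: the `catalogue` dictionary, the `experiment` and `status` attributes, the `saved_queries` table and the rule that an entry is kept whenever a saved query still selects it are all hypothetical, chosen only to mirror the "delete OK?" flowchart.

```python
catalogue = {  # toy stand-in for metadata A
    "/belle2/mdst/a.root": {"experiment": 7, "status": "active"},
    "/belle2/mdst/b.root": {"experiment": 7, "status": "active"},
    "/belle2/mdst/c.root": {"experiment": 9, "status": "active"},
}


# 1. Snapshot: every user dataset duplicates the matching metadata.
def snapshot(experiment):
    return {lfn: dict(attrs) for lfn, attrs in catalogue.items()
            if attrs["experiment"] == experiment}


# 2. Query: a user dataset is only the stored selection, re-evaluated on use.
saved_queries = {"user_b": 7}


def resolve(user):
    wanted = saved_queries[user]
    return sorted(lfn for lfn, attrs in catalogue.items()
                  if attrs["experiment"] == wanted)


# 3. Snapshot-like: keep storing queries, but never drop metadata that a
#    saved query still selects; mark it instead (the "change permission" branch).
def delete(lfn):
    needed = catalogue[lfn]["experiment"] in saved_queries.values()
    if needed:
        catalogue[lfn]["status"] = "archived"  # keep metadata for old data
    else:
        del catalogue[lfn]                     # nothing refers to it: delete
    return needed


print(len(snapshot(7)), resolve("user_b"))
```

A snapshot costs storage in proportion to the selected metadata per user, while a stored query costs almost nothing but only stays reproducible if the underlying entries are never removed; the third method keeps the cheap query and adds that guarantee.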
Snapshot vs. Query
[Figure: decision flow]
• Can users create official data sets?
  • Yes → 3. Snapshot-like method (up to 1.5 TB/user) ⇒ We will work on this if necessary.
  • No → Partially allowed?
    • Yes → 2. Query (a few GB)
    • No → 1. Metadata snapshot (only for official data → 1.5 TB only)
※ We need a Belle II policy.
• 1. Can users create official data sets? (e.g. grand reprocessing data)
• 2. Can users create data sets partially?
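The decision flow reduces to two yes/no policy questions. A minimal sketch, assuming the branch order read from the slide (the function and argument names are hypothetical):

```python
def choose_method(users_create_official: bool, partially_allowed: bool) -> str:
    """Map the two open policy questions to one of the three methods."""
    if users_create_official:
        return "snapshot-like"   # up to 1.5 TB/user
    if partially_allowed:
        return "query"           # a few GB
    return "snapshot"            # official data only, 1.5 TB


for official in (True, False):
    for partial in (True, False):
        print(official, partial, choose_method(official, partial))
```

The second question is only consulted when the first answer is no, which is why both `True` rows give the same result.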
2011 Products
• International conference talk
  • Meta-information system for Belle II experiment
  • J.H. Kim, YongPyong 2011 Conference, Korea, Feb. 2011
• Drafts for JKPS (Journal of the Korean Physical Society)
  • The embedment of metadata system at grid farms at the Belle II Experiment
    • Work in progress
    • By this month
  • User-created metadata set
    • This summer
To do list
• Working on
  • Stand-alone metadata registration tool kit
  • Snapshot-like style
  • To restart event level R&D
• Delayed
  • Data transfer from HLT at the experiment hall to the computing center at KEK ⇒ Need a network expert
  • To move master node to KEK for making meta system for Belle II
  • To extract metadata from Belle data at KEK
Summary
• To handle 50 times more data than Belle, the Belle II Data Handling Group works on:
  • Metadata system and data cache system based on grids
  • The scalability of large-scale data handling
    • using Belle and Belle II data
  • A full, user-friendly service
• Work is ongoing
Plan 2014
• To run the metadata system on the grid ⇒ full service
• To realize a facility for production
• To continue support and upgrades