Update on Panda GRID management relocation, services status, job statistics, database changes, and issues encountered. Discussion on Panda-ALICE relationship, GRID users, developers, and future development goals. Addressing database access, middleware development, system bugs, and collaboration improvements.
Panda Grid Status Kilian Schwarz, GSI on behalf of PANDA GRID Group (slides to a large extent from Radoslaw Karabowicz)
Central services, LDAP, DB and ML transfers • Phone meeting on 1st Feb 2012 • By the end of February the GRID management center has to be moved out of Glasgow, including: • Lightweight Directory Access Protocol (LDAP) -> GSI • MySQL DataBases (DB) -> GSI, Torino • AliEn2 Central Services (CS) -> GSI • PANDA GRID MonaLisa (ML) -> Jülich
Panda GRID @ GSI • Central Services installation status after the May Panda GRID meeting: • Lightweight Directory Access Protocol (LDAP) -> GSI • MySQL DataBases (DB) -> GSI, Torino • AliEn2 Central Services (CS) -> GSI • PANDA GRID MonaLisa (ML) -> Jülich / Torino • Recent changes in AliEn required direct intervention by people from CERN in our MySQL and machine settings; we are still working to bring the Panda GRID back
Panda GRID Map ~12 sites ~1400 CPUs CS, LDAP, DB at GSI
Jobs share
+------------+--------+
| status     | jobs   |
+------------+--------+
| DONE       | 204271 |
| DONE_WARN  |   4833 |
| ERROR_E    |  11026 |
| ERROR_IB   |   1931 |
| ERROR_RE   |  14766 |
| ERROR_SV   |  14273 |
| ERROR_V    |     59 |
| EXPIRED    |   6338 |
| INTERRUPTE |     31 |
| OVER_WAITI |   1408 |
| SAVED      |    338 |
+------------+--------+
+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+
| site                      | jobs  | DONE  | ERROR | WAIT | STARTED | RUNNING | SAVE | ZOMBIE | OTHER |
+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+
|                           |  1573 |     0 |     0 |    0 |       0 |       0 |    0 |    165 |  1408 |
| PANDA::Bucharest::panda01 | 31141 | 25978 |  4892 |    0 |       0 |       0 |    0 |    271 |     0 |
| PANDA::Dubna::pbs         |  9570 |  8212 |   251 |    0 |       0 |       0 |   69 |   1038 |     0 |
| PANDA::GSI::lxgrid8       | 88322 | 74471 | 12005 |    0 |       0 |       0 |    0 |   1815 |    31 |
| PANDA::Juelich::ce642     |  1382 |  1201 |   169 |    0 |       0 |       0 |    0 |     12 |     0 |
| PANDA::KVI::PBS           | 36445 | 32052 |  3784 |    0 |       0 |       0 |  242 |    367 |     0 |
| PANDA::Mainz::himster     | 64449 | 47635 | 14444 |    0 |       0 |       0 |    0 |   2370 |     0 |
| PANDA::Torino::CREAM      |  9414 |  8502 |   758 |    0 |       0 |       0 |    0 |    154 |     0 |
| PANDA::Torino::PBS        |  3963 |  2686 |  1276 |    0 |       0 |       0 |    0 |      1 |     0 |
| PANDA::Vienna::smigrid02  |  9123 |  8367 |   584 |    0 |       0 |       0 |   27 |    145 |     0 |
+---------------------------+-------+-------+-------+------+---------+---------+------+--------+-------+
TOTAL NUMBER OF JOBS IN THE LAST 6 MONTH:
+--------+
| 259274 |
+--------+
Because of the database changes, information about old jobs is accessible only from MySQL and is not available from MonaLisa. Also, the job counter started from 0 again.
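The counts in the tables above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the slide, while the variable names and the choice to count DONE plus DONE_WARN as "successful" are assumptions made here, not part of the original report. The unnamed site row is left out of the per-site rates.

```python
# Job counts per status, copied from the status table on the slide.
# The truncated status names (INTERRUPTE, OVER_WAITI) are kept as printed.
status_counts = {
    "DONE": 204271,
    "DONE_WARN": 4833,
    "ERROR_E": 11026,
    "ERROR_IB": 1931,
    "ERROR_RE": 14766,
    "ERROR_SV": 14273,
    "ERROR_V": 59,
    "EXPIRED": 6338,
    "INTERRUPTE": 31,
    "OVER_WAITI": 1408,
    "SAVED": 338,
}

# (jobs, DONE) per named site, copied from the site table.
site_jobs = {
    "PANDA::Bucharest::panda01": (31141, 25978),
    "PANDA::Dubna::pbs": (9570, 8212),
    "PANDA::GSI::lxgrid8": (88322, 74471),
    "PANDA::Juelich::ce642": (1382, 1201),
    "PANDA::KVI::PBS": (36445, 32052),
    "PANDA::Mainz::himster": (64449, 47635),
    "PANDA::Torino::CREAM": (9414, 8502),
    "PANDA::Torino::PBS": (3963, 2686),
    "PANDA::Vienna::smigrid02": (9123, 8367),
}

# Sum over all statuses; this should reproduce the reported 6-month total.
total_jobs = sum(status_counts.values())

# Assumption: a job counts as successful if it ended in DONE or DONE_WARN.
success_fraction = (status_counts["DONE"] + status_counts["DONE_WARN"]) / total_jobs

# Fraction of jobs that reached DONE at each named site.
site_done_rate = {site: done / jobs for site, (jobs, done) in site_jobs.items()}

print(f"total jobs: {total_jobs}")
print(f"DONE + DONE_WARN share: {success_fraction:.1%}")
for site, rate in sorted(site_done_rate.items(), key=lambda item: item[1]):
    print(f"{site:<28} {rate:.1%}")
```

The status counts sum to the reported total of 259274 jobs, so the two figures on the slide are consistent with each other.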
PandaRoot @ GRID Installed: • panda_extern: apr08, jul08, jul09, may11, jan12 • pandaroot: may11, july11, august11, nov11, stable, trunk (updated every Tuesday with results published in pandaroot cdash)
needed • more GRID users • and we have to regain the users' trust after a longer period of only partial functionality • http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2ClientInstall • more sites • http://panda-wiki.gsi.de/cgi-bin/view/Computing/PandaGridAliEn2SiteInstall • GRID developers
ALICE & PANDA • The PANDA-ALICE relationship: • we use middleware written by ALICE • we have our own requirements and requests • we are supposed to give back: • allocate dedicated manpower for middleware development and user support • manpower will also come via LSDMA • develop in-house expertise with this middleware, and not only as users • debug and develop AliEn: Oracle interface, Slurm interface, PoD interface, VO-VO interface • PANDA already uses AliEn v2-20 and is debugging it for ALICE
Issues • masterjob –printsite does not work • fquota does not work properly for many users • “services” command not working • packman install –everywhere does not work • job triggered installation is not sufficient for PANDA since we compile on site • AliEn installer installation works only with manual fixes (Gnu.so ...) • masterSE replicate
Issues #2 • some sites still do not take jobs • deletion of files • inter-site data transfer/mirror • ROOT API • packages list in ML • activation of backup DB
wish list • JAliEn • ability to install a specific revision number via the AliEn installer
conclusion • ALICE/FAIR collaboration also works quite well in the context of Grid computing • Still, there is room for improvement • PANDA cannot be a beta tester within its production environment • common testbed maintained by ALICE and PANDA? • information flow needs to be improved. We cannot keep being taken by surprise when there is a major change in the AliEn DB • how to solve all the existing issues? Currently we put them all in the GSI ticketing system. Who is responsible for what?