A first look on the gLite RB (and more)
Stefano Bagnasco
I.N.F.N. Torino
ALICE Software Week – CERN
June 1, 2005
www.eu-egee.org
EGEE is a project funded by the European Union under contract IST-2003-508833
The test setup
• Set up of a test gLite RB & CE in Torino
  • test job submission, interaction with the ALICE file catalogue and, gradually, other pieces of the framework
  • Thanks to R. Brunetti, F. Nebiolo
• Tests of storage and data management components in Bari
  • dCache+SRM, FTS, DPM (coming soon)
  • To be integrated with the Torino setup to build a full testbed
  • Thanks to G. Donvito, N. Fioretti, F. Minafra
[Diagram labels: Catalog; INFNGRID Prod BDII; LFC Clients; grid007.to.infn.it; gLite 1.1 RB+LB; gLite 1.1 UI]
Collaboration Meeting ALICE-Italia – Cagliari, May 4, 2005
First results
• Jobs sent to the gLite RB (grid007.to.infn.it), for LCG 2.4.0 on INFNGRID CEs: 1000
  • Not yet completed: 37
  • Completed: 661 (68%)
  • Aborted: 72 (8%)
  • Error: 230 (24%)
    • AliROOT crash: 28
    • NFS crash: 143
    • WN disk space < 4GB: 55
    • Other (not investigated): 4
• Funny RB problem: job destination lost for 100 jobs (1 bunch)
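The quoted percentages do not match a base of 1000 submitted jobs; they appear to be relative to the 963 jobs that had reached a final state (1000 minus the 37 not yet completed). The short check below, using only the figures on the slide, is a sketch of that reading rather than a statement of how the numbers were originally computed.

```python
# Job counts as reported on the slide.
submitted = 1000
not_completed = 37
outcomes = {"Completed": 661, "Aborted": 72, "Error": 230}
error_causes = {
    "AliROOT crash": 28,
    "NFS crash": 143,
    "WN disk space < 4GB": 55,
    "Other (not investigated)": 4,
}

# Jobs that reached a final state (assumed base for the percentages).
finished = submitted - not_completed

# The three outcomes account for every finished job.
assert sum(outcomes.values()) == finished
# The listed causes account for every job in the Error category.
assert sum(error_causes.values()) == outcomes["Error"]

percentages = {name: 100 * n / finished for name, n in outcomes.items()}
for name, n in outcomes.items():
    print(f"{name}: {n} ({percentages[name]:.1f}% of {finished} finished jobs)")
```

Under this reading the shares come out at about 68.6%, 7.5% and 23.9%, each within one percentage point of the 68%, 8% and 24% shown on the slide.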
Some comments
• Problems & issues:
  • The gLite UI command does not interact correctly with the VOMS.
    • Known problem, fixed, but the fix did not get through to the release! (not even 1.1)
  • Submission to the gLite RB fails with certificates mapped to an SGM (software manager) account (Savannah bug #8616, fixed on Friday)
  • Some problems with the RB (e.g. missing location from status report), being investigated
  • Not all the “usual” C libraries on the WN – had to ship libg2c.so with AliRoot
  • DGAS (accounting system) is using these jobs to debug its first deployment
• The infrastructure (just after the upgrade to 2.4.0) showed the same teething problems as last year, e.g.:
  • Hanging NFSs make the software area inaccessible (this is a nasty one – remember the “Black Hole Effect”!)
  • Communication problems between WN and RB
  • Problems with environment configuration on WNs
• The support responsiveness definitely improved
  • Problems generally solved within an hour of submitting the ticket
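Two of the failure modes reported in these slides (a missing libg2c.so and too little free disk on the WN) could in principle be caught by a job wrapper before the payload starts. The following is only an illustrative sketch, not part of the tested setup: the function name `preflight`, its parameters and the idea of running it in a wrapper are assumptions, and the 4 GB threshold is borrowed from the failure breakdown in the results slide.

```python
import ctypes.util
import shutil

# Assumed threshold: the "WN disk space < 4GB" failure category.
MIN_FREE_BYTES = 4 * 1024**3


def preflight(scratch_dir=".", libraries=("g2c",), min_free=MIN_FREE_BYTES):
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []

    # Free space on the filesystem holding the scratch directory.
    free = shutil.disk_usage(scratch_dir).free
    if free < min_free:
        problems.append(f"only {free / 1024**3:.1f} GB free in {scratch_dir}")

    # find_library returns None when the system lookup cannot find the library.
    for name in libraries:
        if ctypes.util.find_library(name) is None:
            problems.append(f"shared library lib{name}.so not found")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Note that `ctypes.util.find_library` only reports what the system's own library lookup can see, so a library shipped alongside the job in a private directory may still be reported as missing.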
Next steps: the interface
• Registering to the AliEn Catalogue without AliEn
  • Will probably not be needed at all…
• Multi-thread submission (either direct or from the AliEn Task Queue)
  • Efficient use of dedicated RBs
• Testing the gLite RB's ability to query the AliEn Data Catalogue
  • The main “new” feature in the gLite RB
• Accessing files in a “standard” gLite SE (through DPM/SRM/AliEnSE/xrootd/FiReMan/whatever)
  • This should be much easier than last year
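As a rough illustration of what multi-thread submission could look like, the sketch below fans a list of job descriptions out over a thread pool. It makes no claim about the actual interface: `submit_job` is a hypothetical placeholder standing in for whatever call really hands a job to the RB, and `submit_all`, `max_workers` and the file names are likewise assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def submit_job(jdl_path):
    """Hypothetical placeholder: replace with the real submission call.

    It only returns a fake identifier so that the sketch runs anywhere.
    """
    return f"job-for-{jdl_path}"


def submit_all(jdl_paths, max_workers=8, submit=submit_job):
    """Submit many jobs concurrently; return (job_ids, failures) dicts."""
    job_ids, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending future back to the job description it belongs to.
        futures = {pool.submit(submit, path): path for path in jdl_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                job_ids[path] = future.result()
            except Exception as exc:  # one failed submission must not stop the rest
                failures[path] = exc
    return job_ids, failures


if __name__ == "__main__":
    ids, errors = submit_all([f"job{i}.jdl" for i in range(20)])
    print(len(ids), "submitted,", len(errors), "failed")
```

Threads suit this kind of work because each submission mostly waits on the network; collecting failures per job keeps one bad submission from aborting the whole batch.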