LCG Middleware Certification and Support
Maarten Litmaath, CERN IT/GD
GridPP Workshop, 2-4 June 2004
Where is the code?
• CERN central CVS system autobuild (see next page)
• EDG/LCG code
• http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?cvsroot=lcgware
• LCG configuration:
• http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi
• Everything else is “external”
• Simplifies build
• Complicates debugging
• Need at least the sources
• All RPMs under /afs/cern.ch/project/gd/RpmDir
• LCG code guidelines adapted from EDG
• http://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/index.cgi
• Documentation menu
Maarten Litmaath, GridPP meeting, 2004/06/03
The Builds
• EDG autobuild system has been ported to LCG
• http://lxshare0297.cern.ch/LCG/autobuild/
• Allows nightly build of latest compliant CVS tag per package
• Build-on-demand tag triggers immediate build
• Currently only RH 7.3 supported
• Porting to RH Enterprise Linux underway in GD group
• WN being tested
• Collaboration with CERN OpenLab to port code + build recipes to IA-64
• CE + WN already included in EIS testbed
• Other platforms being considered:
• Fedora
• RH 9
• RH 6.2
• Solaris
• IRIX
• …
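The tag-selection rule on this slide can be sketched as follows. This is a minimal illustration and not the autobuild's actual code: the tag patterns (`v1_2_3` for a compliant tag, a `_build_now` suffix for build-on-demand) and all function names are hypothetical, since the slides do not state the real naming convention.

```python
import re

# Hypothetical conventions: the real autobuild tag formats are not given in the slides.
COMPLIANT_TAG = re.compile(r"^v(\d+)_(\d+)_(\d+)$")
ON_DEMAND_SUFFIX = "_build_now"


def version_key(tag):
    """Return (major, minor, patch) for a compliant tag, or None otherwise."""
    match = COMPLIANT_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def nightly_tag(tags):
    """Pick the latest compliant tag of one package for the nightly build."""
    compliant = [t for t in tags if version_key(t) is not None]
    # Compare numerically so that v1_10_0 sorts after v1_2_9.
    return max(compliant, key=version_key) if compliant else None


def on_demand_tags(tags):
    """Return the tags that should trigger an immediate build."""
    return [
        t for t in tags
        if t.endswith(ON_DEMAND_SUFFIX)
        and version_key(t[:-len(ON_DEMAND_SUFFIX)]) is not None
    ]


tags = ["v1_2_9", "v1_10_0", "test-branch", "v1_10_1_build_now"]
print(nightly_tag(tags))
print(on_demand_tags(tags))
```

Comparing parsed version tuples rather than raw strings avoids picking `v1_2_9` over `v1_10_0`, which a plain string sort would do.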
The Certification
• Resulting middleware must be integrated and then certified on all supported platforms
• Also verify interoperability of all platforms
• Complicates certification exponentially
• Goal is production quality:
• Stability, robustness, performance, scalability
• Easy configuration, operation, maintenance
• A lot of effort has been going into debugging
• Get feedback from production system (e.g. rollout mailing list)
• Send feedback to developers, but apply in-house patches in the meantime
• See next talk by David Smith
• Current “big” certification testbed shown on next page
• Only RH 7.3 for now
• Remote sites to be added (again)
• Madison (VDT), Taipei, Budapest, …
• Simulates multiple realistic configurations
• Can test multiple platforms at the same time
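One possible reading of the "exponentially" remark: if every service role in a test configuration may run on any supported platform, the number of mixed-platform configurations is the number of platforms raised to the number of roles. The sketch below only counts such configurations; the role list is taken from the testbed slide, while the assumption that every role can be placed on every platform is ours, not the slides'.

```python
from itertools import product

# Service roles that appear on the testbed slide.
ROLES = ["UI", "RB", "BDII", "CE", "SE", "WN"]


def mixed_configurations(platforms, roles=ROLES):
    """Enumerate every assignment of a platform to each role (assumption: all combinations are valid)."""
    return list(product(platforms, repeat=len(roles)))


for platforms in (["RH 7.3"], ["RH 7.3", "RHEL"], ["RH 7.3", "RHEL", "IA-64"]):
    count = len(mixed_configurations(platforms))
    print(len(platforms), "platform(s):", count, "configurations")
```

With six roles, going from one to three platforms takes the count from 1 to 64 to 729, which is why a testbed cannot cover every combination and has to pick representative ones.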
Certification & Testing Testbed
[Diagram: five clusters (Cluster_1 to Cluster_5) of hosts with the roles UI, RB, BDII, MyProxy, CE, SE and WN, plus an Oracle-backed RLS (rlscert02). CEs include Condor and LSF batch systems; SEs include dCache and Castor; some WNs share a local /home, others have no home sharing.]
The Tests
• Feature testing
• Workload Management, Data Management, Information System, …
• Job distribution with and w/o data constraints, resource saturation, proxy renewal
• Data access, replica services
• Different architectures/configurations
• Try to simulate the production system to some extent
• Stress tests
• Performance should degrade gracefully, no crashes
• Explicit error injection
• Study system reaction
• Security
• One should not be able to bypass it
• Experiments integration testing done by GD/EIS on their testbed
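Explicit error injection can be illustrated with a small harness. This is a sketch under stated assumptions, not the LCG test suite: `fake_submit`, `submit_with_retry` and the retry policy are hypothetical stand-ins for whatever client logic is being studied. The wrapper forces failures on chosen calls so the reaction of the code under test can be observed.

```python
class InjectedError(Exception):
    """Marks a failure raised on purpose by the test harness."""


def inject_failures(func, fail_on_calls):
    """Wrap func so that the listed call numbers (1-based) raise InjectedError."""
    state = {"calls": 0}

    def wrapper(*args, **kwargs):
        state["calls"] += 1
        if state["calls"] in fail_on_calls:
            raise InjectedError(f"injected failure on call {state['calls']}")
        return func(*args, **kwargs)

    return wrapper


def submit_with_retry(submit, job, max_attempts=3):
    """Hypothetical client logic under test: retry, then report failure without crashing."""
    for attempt in range(1, max_attempts + 1):
        try:
            job_id = submit(job)
        except InjectedError:
            continue  # try again until the attempts are used up
        return {"job": job, "status": "submitted", "attempts": attempt, "id": job_id}
    return {"job": job, "status": "failed", "attempts": max_attempts, "id": None}


def fake_submit(job):
    """Stand-in for a real submission call."""
    return f"id-{job}"


flaky = inject_failures(fake_submit, fail_on_calls={1, 2})
print(submit_with_retry(flaky, "job42"))

broken = inject_failures(fake_submit, fail_on_calls={1, 2, 3})
print(submit_with_retry(broken, "job43"))
```

The first run recovers on the third attempt; the second exhausts its attempts and returns a clean "failed" result, which is the graceful behaviour the stress tests look for.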
Certification, Testing & Release Cycle
[Diagram: release cycle with the stages Certification Testing, Experiments Integration and Deployment. Labels shown: EGEE (fix problems, new releases); VDT (fix problems, new releases); LCG C&T section (add features, fix problems, transmit problems); Integrate; Basic Functionality Tests; Run Certification Matrix; Run Special Tests; Release candidate tagged; Experiments software installation; Testing experiments specific features; Certified release tag; Release; Pre-deployment; General release.]
Typical Certification Matrix
• Errors reflect ongoing development
• Details available through links
• An LCG release candidate must not have any serious errors reported by the test suites
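The release-candidate rule can be expressed as a simple gate over the matrix. The sketch below assumes a matrix keyed by (test suite, configuration) with a status string per cell; the suite names, the status vocabulary and the set of statuses counted as "serious" are hypothetical, as the slides do not define them.

```python
# Hypothetical severity labels: the slides do not say which statuses count as serious.
SERIOUS = {"error", "crash"}


def blocking_results(matrix):
    """Return the (suite, configuration) cells whose status blocks a release."""
    return sorted(cell for cell, status in matrix.items() if status in SERIOUS)


def is_release_candidate(matrix):
    """A build qualifies only if no test suite reports a serious error."""
    return not blocking_results(matrix)


matrix = {
    ("job-submission", "Cluster_1"): "ok",
    ("replica-management", "Cluster_1"): "warning",
    ("job-submission", "Cluster_2"): "error",
}

print(is_release_candidate(matrix))
print(blocking_results(matrix))
```

Returning the blocking cells, and not just a yes/no answer, mirrors the slide's point that details should be reachable from the matrix.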
The Tasks
• Web page to open bugs and tasks:
• https://savannah.cern.ch/projects/lcgoperation/
• Main task: stabilize LCG-2
• Allow serious work to get done efficiently
• Minor remaining inconveniences should be tolerable
• To be addressed by EGEE/ARDA
• Main ingredients
• dCache
• Porting to RH 7.3 successors
• Redo Replica Manager core
• Flexible info providers → corresponding changes in WP1/WP2 code
• Shield CE against overload risk
• …
More Tasks
• Try to follow Globus releases (via VDT)
• Use the VDT more:
• Helps EU-US interoperability
• Try more functionality already provided by VDT
• Condor as default batch system?
• PacMan?
• Try to put more into the VDT
• Try R-GMA for monitoring
• Combine with GridICE
• Get rid of MDS completely
• LCFGng → Quattor
• …