LCG2 development planning Zdenek Sekera/IT-GD-CT zdenek.sekera@cern.ch
Outline (1)
• Release process
  • certification testbed, activities
  • monthly releases, procedures
  • “Aug” release, content, destination, release note
• Different O/S support
  • HW & SW setup
  • RH7.3
  • SLC3 for IA32 & IA64
  • others
• “September” release
  • RH7.3 fixes, features
  • Special software: dCache, Tank&Spark, accounting
  • New: SLC3 release, installation manual, full interoperability with RH7.3
Outline (2)
• Data Management
  • File Catalog
  • DPM (Disk Pool Management)
• LCG and EGEE
  • migration from LCG2 to EGEE?
  • preproduction testbed
Release process – C&T testbed
• The RH7.3 C&T testbed is constantly evolving to reflect our activities
  • new BDIIs
  • dCache
  • added SLC3 cluster of WNs
• We have a mini-SLC3 testbed running
  • this will evolve into a fully featured SLC3 testbed (TB); we are waiting for machines from FIO
• When appropriate we will connect the RH7.3 and SLC3 TBs to certify full interoperability
Certification & Testing Testbed
[Figure: layout of the certification and testing testbed. Labels recoverable from the diagram: Cluster_1 to Cluster_7 plus an SL3 cluster of WNs; node roles UI, RB, BDII, MyProxy, PlainGris, CE (including LSF and Condor variants), SE (including Castor and dCache variants), a dCache pool node, WN, MON and RLS on Oracle; some clusters are marked “NO home sharing”.]
Certification, Testing and Release Cycle
[Figure: flow diagram. Boxes and labels recoverable from the diagram: VDT and EGEE (new releases, fix problems); LCG C&T section (add features, fix problems); Integrate; Basic Functionality Tests; Certification testbed (run C&T test suites, site test suites); Run Certification Matrix; Release Candidate tagged; Experiments integration testbed (candidate not acceptable, fix problems); certified release tagged; Release pre-deployment; Deployment; General release; deployment feedback and transmitted problems. Each “errors?” decision has a “yes” branch to “fix problems” and a “no” branch to the next step.]
Release process – monthly releases
• About 5 months ago we decided to release monthly
  • smaller, more maintainable increments
  • hopefully more predictable
  • easier to manage
  • CT release note
• We (the CT section) release for GD-GIS (Grid Infrastructure), not really for the public
• GIS releases to the public
  • performs the last verifications of the release, independently of CT
  • adds some wrappers if needed
  • may skip a CT release if it is judged “too internal”, without any visible benefit to sites. This may happen when most of the changes in the release were internal, perhaps to prepare for a bigger change in the future.
  • decides the version number etc.
Release process – “Aug” release
• CT distributes an email giving an overview of the release and attaches the Release Note, which describes all changes in detail (examples of both from the August release are attached to the agenda of this meeting)
  • internally to the GD group
  • to LCG management
  • to FIO
Different O/S support
• At the EGEE meeting in Cork we asked which “other” O/Ses sites would be interested in
  • no input from either CICs or ROCs
• We are collaborating closely with the Irish group porting LCG2 to IRIX, AIX and perhaps other Unixes
• Internally we have decided to go ahead with the SLC3 port because it is used by big labs (CERN, FNAL, …)
• We have added SLC3 WNs to the RH7.3 TB to test basic interoperability
• We have installed a full SLC3 mini-TB; it works, but with lots of rough edges, in particular during installation
• We are waiting for a number of machines from FIO to complete the SLC3 testbed (~40 machines) for full certification
• When SLC3 is certified, we’ll connect it to the RH7.3 TB and certify full interoperability
Different O/S support – “other”
• support SLC3 in two flavors
  • IA32 first
  • IA64 asap. The IA64 port is mostly being done by OpenLAB; we’ll integrate their changes into the CVS tree. The build machine is almost ready. When the SLC3 IA64 port is certified, the IA64 TB will be connected to the RH7.3 + SLC3 IA32 TBs for certification.
• the port to other Unixes (IRIX, AIX, ??) is being done by the Irish group. When they are ready we’ll investigate how to certify the port and its interoperability with the other ports (remember, we do not have the necessary HW).
• we would very much like to know what other ports are needed; we are waiting for a summary of requirements from CICs and ROCs
  • do we need Fedora Core?
  • others?
  • who can tell us what is needed?
“Sep” release
• After every release we’ll get together, review all new and outstanding bugs and requirements, and broadly define the expected contents of the next release
• For the September release:
  • full support of RH7.3 will continue, with bug fixes
  • we’ll try again to integrate dCache; a number of reported problems have apparently been fixed, and we need to verify this
  • certify the accounting package we received from GOC
  • certify Tank&Spark, the experiments’ software installation tool
  • site GIIS replaced by BDII (already happening)
  • moving to Torque and the Maui scheduler, replacing PBS
  • full SLC3 support
After “Sep” releases
• In the pipeline for later
  • new info provider (generic info provider with caching, to avoid a site occasionally dropping out of the BDII)
  • we should find a solution to avoid the BDII restarting every 2 minutes
  • WP1 is still chasing some bugs (e.g. error 155)
  • WP1: BDII “travelling” with the job (through an environment variable); this will solve some RM problems, but needs GFAL cooperation
  • DPM (disk pool manager), SRM 1.1 + 2.1 support
  • File Catalogs: many features requested by experiments, big performance improvements (see the CHEP talk by J.-P. Baud and J. Casey)
  • lcgutils: improve error messages, introduce more retries and timeouts
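The caching idea behind the new info provider can be illustrated with a minimal sketch. It assumes that a site drops out when a single query to its provider fails, and that serving the last good answer for a bounded time is acceptable. The class name, the `fetch` callable and the `max_age` parameter are hypothetical and are not part of any LCG component.

```python
import time
from typing import Callable, Optional


class CachingInfoProvider:
    """Serve the last good provider output when a fresh query fails.

    Hypothetical illustration only; not the LCG info provider API.
    """

    def __init__(self, fetch: Callable[[], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # slow or flaky call that produces the site info
        self._max_age = max_age    # seconds a cached answer may still be served
        self._clock = clock        # injectable clock, convenient for testing
        self._cached: Optional[str] = None
        self._stamp = 0.0

    def get(self) -> Optional[str]:
        try:
            fresh = self._fetch()
        except Exception:
            # Fresh query failed: fall back to the cache if it is recent enough.
            age = self._clock() - self._stamp
            if self._cached is not None and age <= self._max_age:
                return self._cached
            return None            # nothing usable; the site would drop out
        # Fresh query succeeded: remember it for later failures.
        self._cached, self._stamp = fresh, self._clock()
        return fresh
```

The `max_age` bound is the design choice that matters here: without it a dead site would keep publishing stale information indefinitely.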
Migration to EGEE software
• Discussing possible scenarios
  • replacing modules with EGEE versions whenever feasible
  • the new RB is supposed to support both push and pull models; could we run the old (push) RB in parallel with the new one? This needs a new CE, …
  • running both sets of software (modules) in parallel
  • or do we need to keep them on separate testbeds and merge them later?
• Installing the EGEE pre-production testbed
  • to test EGEE software
  • to understand the needs for migration
  • later, to run pre-release versions of EGEE software before installing them in the production environment
EGEE Certification, Testing and Release Cycle
[Figure: flow diagram spanning JRA1 and SA1. Labels recoverable from the diagram: Development & Integration; Unit & Functional Testing; Certification Testing (Integrate, Basic Functionality Tests, Run Certification Matrix, Run tests: C&T suites, site suites); Expts Integr (Apps SW installation; LHC expts, medical, other TBD); Deployment Preparation; Release; Pre-production; Production; Deploy; Services. Tags: Dev tag, Release candidate tag, Certified release tag, Deployment release tag, Production tag.]
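The tags named in the diagram suggest a one-way promotion of a build through the cycle. The sketch below assumes the order Dev, Release candidate, Certified, Deployment, Production, and assumes that a build keeps its current tag when a stage fails. Both are readings of the diagram labels, not a documented EGEE procedure, and the names `Tag` and `promote` are hypothetical.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Release tags from the diagram, in the assumed promotion order."""
    DEV = 0
    RELEASE_CANDIDATE = 1
    CERTIFIED = 2
    DEPLOYMENT = 3
    PRODUCTION = 4


def promote(tag: Tag, stage_passed: bool) -> Tag:
    """Return the next tag if the current stage passed, else the same tag."""
    if not stage_passed or tag is Tag.PRODUCTION:
        return tag                 # failed stage, or already at the last tag
    return Tag(tag + 1)            # move exactly one step forward
```

For example, a release candidate that passes certification becomes `Tag.CERTIFIED`, while one that fails stays `Tag.RELEASE_CANDIDATE` until the problems are fixed.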