DM/SMD section meeting, 18/1/08 German Cancio
Outline • Your new SL • The section: Mandate, people, activities, MARS • Assignments of people • Planning and tracking • Discussion • Note: regular section Meetings: • Bi-weekly, Fridays @ 14:00 in 513-1-027, starting next week
$ finger gcancio • Spanish/Swiss, time_t gcancio.tcreate() == 16934400, grew up in Spain->Switzerland->Spain • Computer Science studies in Madrid, 1 semester in Chambery • Focus on Artificial Intelligence, UNIX • 1995-6: Undergraduate fellowship @ UPM Madrid • Work on intelligent client/server information delivery middleware and object-oriented rule-based expert systems • 1997-9: tech student, then fellow @ CERN/IT • MSc on automated software build + distribution/installation system development (ASIS), service management (~4K nodes) • LINUX kernel module programming; AIMS installation server • 2000-present: staff member @ CERN/IT • 2000-2003: EU DataGrid project: Architect, then deputy WP Manager for Fabric Management; participation in the overall DataGrid architecture. Tools: LCFG, Quattor, Lemon • 2004-2005: Deployment of Quattor/Lemon at CERN-CC, replacing legacy solutions (SUE/ASIS). Deployment of Quattor outside CERN (now ~25 sites) • 2006-2007: Section Leader IT/FIO/FD, looking after CASTOR + Fabric Management developments (ELFms: Quattor+Lemon+LEAF), Remedy, SLS, SDB
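The `time_t` value on this slide is a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC). A minimal sketch of how it can be decoded, assuming the value is meant as UTC; the variable names are illustrative and not from the slides:

```python
from datetime import datetime, timezone

# Value quoted on the slide; interpreting it as UTC is an assumption.
T_CREATE = 16934400

# Convert seconds since the Unix epoch into a calendar date.
created = datetime.fromtimestamp(T_CREATE, tz=timezone.utc)
print(created.date().isoformat())
```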
The section: Mandate • No formal mandate yet, but cf. slides from last Group Meeting: • “Enable synergies between the merged development teams” • “Strong focus on ongoing activities: maintenance of Grid software (DPM, LFC, FTS and lcg_utils) and Castor2 as the maintained production environment.” • “Data Management project [led by Jean-Philippe] is attached to the SMD section and will focus in research and development activities in strategic directions”
The section: People • 10 LD staff, 1 IC • 4 fellows (-1 in 03/08) • 1 technical student, 1 visitor • Manpower retention and replacement (contract/people) will be a challenge
The section: Activities 4 main areas: • CASTOR(2) • first LHC production environment! • Address user + operational problems (at CERN and T1’s using CASTOR) with top priority • Prioritise RFE’s and complete agreed developments • Grid Data Management tools • DPM/LFC, FTS, lcg_utils/GFAL • Address user/operational problems and RFE’s • Fulfill EGEE commitments (e.g. biomed requirements, management/accounting overhead) • Build/test environments • For all DM group developments; investigate and maintain • Data Management R&D project • Preparing for the medium/long term future • 2008 investigation areas: Tape interface, remote access protocols, file systems (local/shared) • Activities and plans to be defined by DM PL (Jean-Philippe)
Activities: CASTOR2 • Coordination with operations team at CERN and T1’s • monthly deployment meeting; daily morning meeting; bi-weekly external operations phonecon; bi-yearly f2f workshop • Coordination with users • Stabilize new 2.1.6 version, increase maturity for CCRC08 phase II (2.1.7?) • Synergies with DPM • Complete common RFIO project (Q1 08); RFIO extensions (e.g. complete DIRECT_IO) • Patch exchange with LFC / DPMns (Q1 08) • Pending developments • Security – complete strong authentication; VOMS authorization incl. ‘legacy’ mode (Q2 08); study CUPV replacement • Repack2 - evaluation beginning of March. Understand what problems we have and where they originate • VDQM2 re-engineering and extensions - evaluation beginning of April • Complete xrootd plugin developments following ALICE requirements • Closely follow SRMv2 developments at RAL • Take over unmaintained components and apply agreed bugfixes/RFE’s • VMGR, rtcpd/rtcpclientd (Q3 08) • Other (possible) developments: • Rewrite tape daemon (TBD, priority to be understood) • Oracle Advanced Queueing (message queuing) • 3rd-level support + rota • Client porting to other platforms • MacOSX, priority unclear
Activities: Grid DM tools • DPM • Common RFIO • Developments: SRMCopy; complete DPM->DICOM stager IF (Q1 08); (Pool-based) quotas (Q2 08); bandwidth/stream or user limitations • Improved admin tools (e.g. dpmns/disk server consistency) • DPM performance analysis • LFC • Data encryption (Q1 08) • Patch exchange with Castor ns (Q1 08) • FTS • Developments: Adding space tokens (Q2 08); GridFTP/SRM separation (Q1 08); static clouds+non-shared star channels (Q1 08) • Maintenance/support of web services part (was Gavin) • Cross-site deployment (e.g. schema upgrades) • Documentation/FAQ’s for site admins and users (and for 2nd level support) • Lcg_utils + GFAL • Thread-safe library • Misc developments as requested by LCG/GSSD • All: Support • 1st/2nd level support (was Sophie/Gavin - to be outsourced to GS?) • 3rd level support (via
Activities: Build/test environments • Build/test • Maintain current ETICS based build system (for Grid DM tools) • Maintain current build/test system for CASTOR • Investigate common solutions for the whole group (not only the section)
The section: MARS • MARS timelines: • Interviews to be completed by March 14th • Written appraisals to be signed by GL’s by April 15th • All staff must be interviewed, including those with expiring contracts (“abbreviated MARS”) • Interviews: • Staff coming from FIO: 2007 results and 2008 objectives with me • Staff coming from GD: 2007 results with your previous supervisor, 2008 objectives with me • Proposal for dates to be circulated next week • Fellows: • Appraisal interview after 6 and 18 months of contract • Personally, I prefer every 12 months, just after completing the staff MARS cycle
The section: Allocations of people • Disclaimer: Allocations are provisional and can be discussed over the next days. They may later require adjustments and/or changes, in particular after Q2’08 • Attempting to spread task assignments between ex-FIO and ex-GD team members, share CASTOR / Grid DM tools / DM R&D activities • sharing will gradually increase over time • Matrix structure: SL to work closely with DM project leader to ensure right allocations to activities with appropriate priority
People: staff • Akos: • Castor2 security (finishing/deploying strong authentication, VOMS integration), study CUPV replacement options • Build/test: ETICS; strategy in DM group – task coordinator • EGEE-II JRA1 representative • FTS + LFC web services support • DM R&D: filesystems, remote access protocols • Rosa: • Castor2 security (with Akos) • Castor2 3rd-level support • Lcg_utils + GFAL (TBD) • DM R&D (TBD) • Robert: • LFC/DPM testing – SAM interfacing
People: staff (II) • Giuseppe (staff from 3/08 on) • Castor2 support/maintenance and core framework • SRMv2 maintenance – RAL backup • DM R&D: remote access protocols, Oracle adv queueing (TBD) • Steven: • VDQM2 re-engineering, productization, prioritisation • “intermediate layer”: VMGR, rtcpd/rtcpclientd maintenance • DM R&D: TBD • Krzysztof: • DPM->DICOM stager • CSEC extensions (with Lana) • FTS maintenance / extensions
People: Staff (III) • Andreas: • XROOTD (both DPM and CASTOR) • DPM bugfixes/support (http(s)) • DPM (and CASTOR?) performance testing • DM R&D: Client access protocols, remote fs (e.g. LUSTRE) – task coordinator • Sebastien: • CASTOR2 support/maintenance/development coordination • Grid Data Management responsibilities, starting with FTS (~Q3 08) • David: • Common RFIO merge, RFIO maintenance (both DPM/CASTOR – e.g. DIRECT_IO for XFS) – task coordinator • Patch exchange with CastorNS • DPM development + support • DM R&D: Client access protocols
People: Staff (IV) • Giulia: • Repack2 • RFIO common (with David) • Castor2 support/maintenance • Build/test frameworks (with Akos) • Dennis: • Castor2 support/maintenance • Patch exchange with LFC / DPMns • DM R&D: Tape (taped, TBC), tape/robotic infrastructure
People: Fellows, tech student • Lana: • LFC (support, data encryption) • DPM support, performance testing (with Andreas) • DM R&D test/prototyping e.g. filesystems • Chang: • DM testing • Marisa: • CASTOR2 – until end 02/08 • Remi: • GFAL + lcg_util • CSEC maintenance • DPM client tools maintenance • VOMS service manager (EGEE-II) • Grid security (TBC) • DM R&D • Paolo: • FTS development • FTS support • Build/test frameworks (TBD)
Plans, support, tracking • Bugs/RFE’s, TODO’s, work plans and priorities: • Currently spread across different activities, following different formats, using different tools, discussed in different forums. Room for streamlining? • Twiki, Savannah/CASTOR, Savannah/DM, EGEE… • Sharing support activities • Each developer should be on at least two support lists • Activity tracking: • Light-weight but regular activity tracking, making it possible to understand who has been working on what (development, deployment/support, meetings, etc) and to compare with planning • cf next slide – data collected during the Castor review 2006
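As an illustration of what light-weight activity tracking could look like, here is a minimal sketch that summarises logged hours into planned vs. unplanned shares and per-category shares. The log format, the category names and the sample entries are hypothetical, not an existing tool or real data from the section:

```python
from collections import defaultdict

# Hypothetical log entries: (person, category, was_planned, hours).
LOG = [
    ("dev_a", "development", True, 12.0),
    ("dev_a", "support", False, 16.0),
    ("dev_a", "meetings", False, 4.0),
    ("dev_a", "deployment", True, 8.0),
]


def summarize(entries):
    """Return planned/unplanned shares and the share of time per category."""
    total = sum(hours for _, _, _, hours in entries)
    if total <= 0:
        raise ValueError("no hours logged")
    planned = sum(hours for _, _, was_planned, hours in entries if was_planned)
    by_category = defaultdict(float)
    for _, category, _, hours in entries:
        by_category[category] += hours
    return {
        "planned_share": planned / total,
        "unplanned_share": (total - planned) / total,
        "category_share": {c: h / total for c, h in by_category.items()},
    }


summary = summarize(LOG)
for category, share in sorted(summary["category_share"].items()):
    print(f"{category}: {share:.0%}")
print(f"planned: {summary['planned_share']:.0%}")
```

A summary of this kind, produced at regular intervals, is enough to compare actual effort against the plan without imposing a heavy reporting process.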
R 16: Credible WBS (III) • Examples: • Senior developer (Giuseppe) • Allocation: 100% Castor means in reality ~95% • Planned tasks: only 23% of Castor-dev time (21% of total time) • Not in plan: 73% (68%) • Unplanned tasks: most of them of an ongoing nature (e.g. support, releases) • Development+deployment vs. support: 45% of Castor-dev time spent on support activities
R 16: Credible WBS (IV) • Examples: • Junior developer (Giulia) • Allocation: 88% (training) • Planned tasks: 43% of Castor-dev time • Not planned: 53%, of which 45% development activities (mostly unforeseen) and 8% support • Development+deployment vs. support: only 15% goes to support
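The senior-developer figures are quoted twice: as a share of Castor-dev time (23% planned, 73% not in plan) and as a share of total time (21%, 68%). The two are linked by the fraction of total time actually spent on Castor development. A small sketch that back-computes that fraction from the slide's rounded numbers, assuming the simple relation share_of_total = fraction × share_of_dev; the function name is illustrative:

```python
def implied_fraction(share_of_total, share_of_dev):
    """Fraction of total time spent on Castor-dev implied by a pair of shares."""
    return share_of_total / share_of_dev


# Rounded percentages from the senior-developer slide.
planned = implied_fraction(0.21, 0.23)
unplanned = implied_fraction(0.68, 0.73)

print(f"implied by planned tasks:   {planned:.3f}")
print(f"implied by unplanned tasks: {unplanned:.3f}")
```

Both values land a little above 0.9, which is roughly consistent with the "~95%" effective allocation once the rounding of the slide figures is taken into account.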