FIO Services and Projects • Post-C5 • February 22nd 2002 • Tony.Cass@CERN.ch
Headline Services • Physics Services • P=4FTE; M=800K; I=1,300K • Computing Hardware Supply • P=6.35FTE; M=15K; I=200K (funded by sales uplift) • Computer Centre Operations • P=3.7FTE; M=100K; I=635K • Printing • P=3.5FTE; M=20K; I=175K • Remedy Support • P=1.65FTE; M=215K; I=0 • Mac Support • P=1.25FTE; M=5K; I=50K
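Adding up the per-service figures gives the overall FIO service envelope. The slide does not spell out what M and I stand for, so this minimal sketch only assumes that P is in FTE, that M and I are amounts in thousands (presumably KCHF, the unit used later for the vault), and that each column can simply be summed across services. Hardware Supply's I is included even though it is funded by sales uplift.

```python
# Headline figures per service, as (P in FTE, M in K, I in K).
# Assumption: M and I are in thousands (presumably KCHF) and are additive.
SERVICES = {
    "Physics Services": (4.0, 800, 1300),
    "Computing Hardware Supply": (6.35, 15, 200),  # I funded by sales uplift
    "Computer Centre Operations": (3.7, 100, 635),
    "Printing": (3.5, 20, 175),
    "Remedy Support": (1.65, 215, 0),
    "Mac Support": (1.25, 5, 50),
}


def totals(services):
    """Sum P, M and I over all services; P is rounded to absorb float error."""
    p = round(sum(v[0] for v in services.values()), 2)
    m = sum(v[1] for v in services.values())
    i = sum(v[2] for v in services.values())
    return p, m, i


if __name__ == "__main__":
    p, m, i = totals(SERVICES)
    print(f"P={p}FTE; M={m:,}K; I={i:,}K")  # P=20.45FTE; M=1,155K; I=2,360K
```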
Projects • WP4 • Contribution to WP4 of EU DataGrid • P=0.5FTE • LCG – Implementation of WP4 tools • Active progress towards day-to-day management of large farms. • P=LCG allocation (2FTE?) • Computer Centre Supervision (PVSS) • Test PVSS for CC monitoring. Prototypes in Q1, Q2 and Q4. • P=2.7FTE, M=25K • B513 Refurbishment • Adapt B513 for 2006 needs. Remodel vault in 2002. • P=0.3FTE, M=1,700K
Macintosh Support • Support for MacOS and applications, plus backup services. • We support only MacOS 9, which is now out of date. There is no formal MacOS X support, but software is downloaded centrally for general efficiency. • Staffing for Macintosh support is declining; it is now at 1.25FTE plus 50% service contract. • Plus 0.25FTE for CARA, which is used by some PC users, not just Mac users. • The key work area in 2002 is the removal of AppleTalk access to printers. • Migrate users to the lpr client (already used by Frank and Ludwig). • This streamlines the general printing service and is another move towards an IP-only network. (LocalTalk was removed last year.)
Printing Support • Overall service responsibility lies with FIO, but with much valuable assistance from • PS for OS and software support for the central servers • IS for Print Wizard • The general aim is to have happy and appreciative users. • Install printers, maintain them and replace toner as necessary, … • This seems to be working: there was a spontaneous outburst of public appreciation during January’s Desktop Forum. • Promote and support projector installation in order to reduce (expensive) colour printing. • Working (if slowly) to improve remote monitoring of printers, to enable pre-emptive action. • Or (say it softly) a “Managed Print Service”
Computing Hardware Supply • Aim to supply standardised hardware (desktop PCs, portables, printers, Macs) to users rapidly and efficiently. • Migration to use of CERN’s BAAN package is almost complete. Increases efficiency through • use of a standard stock management application, • end-user purchases are by Material Request not TID, • streamlined ordering procedures. • Could we ever move to a “Managed Desktop” service rather than shifting boxes? • Idea is appreciated outside IT but needs capital. • Service relies on the Desktop Support Contract… • Service also handles CPU and Disk Servers.
Remedy Support • “Remedy” was introduced to meet the workflow management needs (problem and activity tracking) of the Desktop Support and Serco contracts. • FIO supports two Remedy applications • PRMS for general problem tracking (the “3 level” model for support) • Used for the Desktop Contract (including the helpdesk) and within IT • ITCM tracks direct CERN-to-contractor activity requests for the Serco and Network Management Contracts. • Do we need two different applications? Yes and no. • There are two distinct needs, but the applications could be merged. • However, this isn’t a priority and effort is scarce. • And don’t even ask about consolidated Remedy support across CERN!
PRMS and ITCM Developments • PRMS • The continuing focus over the past couple of years has been to consolidate the basic service, integrating the many small changes made to meet one-off needs. • Outstanding requests for additional functionality include • An improved “SLA Trigger mechanism”, defining how and when (and to whom) to raise alarms if tickets are left untreated too long. • A “service logbook” to track interventions on a single system • Various small items, including a Palm interface • ITCM • No firm developments are planned, but many suggestions are floating around. • Overall, we need to migrate to Remedy 5… • … and available effort is limited.
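The requested “SLA Trigger mechanism” boils down to three parameters per ticket class: how long a ticket may stay untreated, and whom to alert once that limit is passed. The sketch below is only an illustration of that idea, not how PRMS or Remedy implement it; the priority names, time limits and recipient roles (`RULES`, `"service-manager"`, `"support-line-2"`) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SlaRule:
    """Hypothetical rule: alarm if a ticket stays untreated longer than max_age."""
    priority: str
    max_age: timedelta
    notify: str  # who receives the alarm (hypothetical role name)


@dataclass
class Ticket:
    ticket_id: int
    priority: str
    opened: datetime
    treated: bool = False


# Illustrative thresholds only; real values would come from the support contracts.
RULES = {
    "high": SlaRule("high", timedelta(hours=4), "service-manager"),
    "normal": SlaRule("normal", timedelta(days=2), "support-line-2"),
}


def sla_alarms(tickets, now):
    """Return (ticket_id, recipient) for each untreated ticket past its limit."""
    alarms = []
    for t in tickets:
        rule = RULES.get(t.priority)
        if rule is None or t.treated:
            continue
        if now - t.opened > rule.max_age:
            alarms.append((t.ticket_id, rule.notify))
    return alarms


if __name__ == "__main__":
    now = datetime(2002, 2, 22, 12, 0)
    tickets = [
        Ticket(1, "high", now - timedelta(hours=5)),
        Ticket(2, "high", now - timedelta(hours=1)),
        Ticket(3, "normal", now - timedelta(days=3), treated=True),
        Ticket(4, "normal", now - timedelta(days=3)),
    ]
    print(sla_alarms(tickets, now))  # [(1, 'service-manager'), (4, 'support-line-2')]
```

Keeping the rules in a table rather than in code means the “how and when (and to whom)” can be changed per contract without touching the checking logic.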
Physics Services • Last year’s reorganisation split “PDP” Services across • ADC: services to “push the envelope” • PS: Solaris and engineering services • FIO: Everything else. • So, what is “Everything else”? • lxplus: main interactive service • lxbatch: the equivalent batch service • lxshare: time-shared lxbatch extension • RISC remnants, mainly for LEP experiments • Much general support infrastructure • First-line interface for physics support
Physics Service Concerns • RISC Reduction • Managing Large Linux Clusters • Fabric Management
Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • Logistics • State Management
Fabric Management Concerns: Software Installation (OS and Applications) • We need rapid and rock-solid system and application installation tools. • Development discussions are part of EDG/WP4, to which we contribute. • Full-scale testing and deployment as part of the LCG project.
Fabric Management Concerns: (Performance and Exception…) Monitoring • The Division is now committed to testing PVSS as a monitoring and control framework for the computer centre. • The overall architecture remains as decided within PEM and WP4. • The new “Computer Centre Supervision” project has 3 key milestones for 2002: • “Brainless” rework of PEM monitoring with PVSS • 900 systems are now being monitored; Post-C5 presentation in March/April. • Intelligent rework for Q2, then a wider system for Q4
Fabric Management Concerns: Configuration Management • How do systems know what they should install? • How does the monitoring system know what a system should be running? • An overall configuration database is required.
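Both questions on this slide can be answered from the same source if each node's desired state lives in one configuration database: the installer asks which desired packages are missing, and the monitoring system asks which expected services are absent (or which unexpected ones are running). This minimal sketch assumes a simple node → profile → desired-state layout; the profile, node and package names are hypothetical and not taken from the EDG/WP4 design.

```python
# Hypothetical configuration database: node -> profile -> desired state.
CONFIG_DB = {
    "profiles": {
        "lxbatch": {"packages": {"kernel", "lsf"}, "services": {"sshd", "lsf"}},
        "lxplus": {"packages": {"kernel", "afs"}, "services": {"sshd", "afs"}},
    },
    "nodes": {"node001": "lxbatch", "node002": "lxplus"},
}


def desired(node):
    """Look up a node's desired configuration through its profile."""
    return CONFIG_DB["profiles"][CONFIG_DB["nodes"][node]]


def packages_to_install(node, installed):
    """Installer's question: which desired packages are missing on this node?"""
    return sorted(desired(node)["packages"] - set(installed))


def service_exceptions(node, running):
    """Monitoring's question: which expected services are missing or unexpected?"""
    expected = desired(node)["services"]
    running = set(running)
    return {
        "missing": sorted(expected - running),
        "unexpected": sorted(running - expected),
    }


if __name__ == "__main__":
    print(packages_to_install("node001", ["kernel"]))         # ['lsf']
    print(service_exceptions("node002", ["sshd", "httpd"]))   # {'missing': ['afs'], 'unexpected': ['httpd']}
```

Grouping nodes into profiles keeps the database small: thousands of identical batch nodes share one entry, and only the node-to-profile mapping grows.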
Fabric Management Concerns: Logistics • How do we keep track of 20,000+ objects? • We can’t manage 5,000 objects today. • Where are they all? (On Feb 9th, some systems couldn’t be found.) • Which are in production? New? Obsolete? • And which are temporarily out of service? • How do physical and logical arrangements relate? • Where is this service located? • What happens if this normabarre/PDU fails?
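The logistics questions become simple queries once every box is recorded with both its physical placement (rack, feeding normabarre/PDU) and its logical role and status. A minimal sketch, assuming a flat in-memory inventory; the field names, rack and PDU labels and the status values are hypothetical illustrations, not an existing CERN schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NEW = "new"
    PRODUCTION = "production"
    OUT_OF_SERVICE = "out of service"  # temporarily unavailable
    OBSOLETE = "obsolete"


@dataclass
class Box:
    name: str
    rack: str      # physical location (hypothetical naming)
    pdu: str       # normabarre/PDU feeding the box
    service: str   # logical role, e.g. "lxbatch"
    status: Status


INVENTORY = [
    Box("node001", "rack-A1", "pdu-1", "lxbatch", Status.PRODUCTION),
    Box("node002", "rack-A1", "pdu-1", "lxplus", Status.PRODUCTION),
    Box("node003", "rack-B2", "pdu-2", "lxbatch", Status.OUT_OF_SERVICE),
    Box("node004", "rack-B2", "pdu-2", "lxbatch", Status.NEW),
]


def where_is(service):
    """Where is this service located? Racks holding its production boxes."""
    return sorted({b.rack for b in INVENTORY
                   if b.service == service and b.status is Status.PRODUCTION})


def impact_of_pdu_failure(pdu):
    """What happens if this PDU fails? Production boxes lost, per service."""
    impact = {}
    for b in INVENTORY:
        if b.pdu == pdu and b.status is Status.PRODUCTION:
            impact[b.service] = impact.get(b.service, 0) + 1
    return impact


if __name__ == "__main__":
    print(where_is("lxbatch"))               # ['rack-A1']
    print(impact_of_pdu_failure("pdu-1"))    # {'lxbatch': 1, 'lxplus': 1}
```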
Fabric Management Concerns: State Management • What needs to be done to move this box from reception to a final location, to be part of a given service? • What procedures should be followed if a box fails (after automatic recovery actions, naturally!)? • This is workflow management, which should integrate with overall workflow management.
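Treating state management as workflow means writing down the allowed lifecycle transitions and refusing anything else, so every box follows the same path from reception to production and through failure handling. This is a minimal sketch assuming a hypothetical set of states (`received`, `installed`, `production`, `failed`, `repair`, `retired`); the real states and procedures would be defined in the LCG planning work.

```python
# Hypothetical lifecycle states and the transitions allowed from each.
TRANSITIONS = {
    "received": {"installed"},            # placed at its final location
    "installed": {"production"},          # configured for a given service
    "production": {"failed", "retired"},
    "failed": {"repair", "production"},   # back to production if auto-recovery worked
    "repair": {"installed", "retired"},
    "retired": set(),
}


class BoxLifecycle:
    """Tracks one box's state and refuses transitions the workflow forbids."""

    def __init__(self, name):
        self.name = name
        self.state = "received"
        self.history = ["received"]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new_state} not allowed")
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    box = BoxLifecycle("node001")
    for step in ["installed", "production", "failed", "repair", "installed", "production"]:
        box.move_to(step)
    print(box.state)  # production
```

The recorded `history` is also a natural place to hook in the wider workflow system, for example by opening a ticket whenever a box enters `failed`.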
Fabric Management Concerns • Work on these items (software installation, monitoring, configuration management, logistics and state management) is the FIO contribution to the Fabric Management part of the LHC Computing Grid Project. • Detailed activities and priorities will be set by LCG. • They are providing the additional manpower! • A planning document is being prepared now, based on input from FIO and ADC.
… And where do the clusters go? • Estimated Space and Power Requirements for LHC Computing • 2,500m2: an increase of ~1,000m2 • 2MW: a nominal increase of 800kW (1.2MW above the current load) • Conversion of the Tape Vault to Machine Room area was agreed at the post-C5 in June 2001. • Best option for space provision • Initial cost estimate of 1,300-1,400KCHF
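The two power figures only make sense together if today's nominal capacity and today's actual load differ; backing them out of the requirement makes that explicit. This small sketch assumes “nominal increase” means relative to the current installed capacity and that the ~1,000m2 space increase is approximate.

```python
# Figures from the slide (units: m2 and kW).
REQUIRED_SPACE_M2 = 2500
SPACE_INCREASE_M2 = 1000       # "~1,000m2", approximate
REQUIRED_POWER_KW = 2000
NOMINAL_INCREASE_KW = 800      # assumed relative to current nominal capacity
INCREASE_OVER_LOAD_KW = 1200   # relative to the current measured load


def implied_current_figures():
    """Back out today's figures implied by the requirement and stated increases."""
    return {
        "space_m2": REQUIRED_SPACE_M2 - SPACE_INCREASE_M2,            # ~1,500
        "nominal_power_kw": REQUIRED_POWER_KW - NOMINAL_INCREASE_KW,  # 1,200
        "current_load_kw": REQUIRED_POWER_KW - INCREASE_OVER_LOAD_KW, # 800
    }


if __name__ == "__main__":
    print(implied_current_figures())
```

Under these assumptions the current machine rooms offer ~1,500m2 and 1.2MW nominal, with roughly 800kW actually drawn.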
Vault Conversion • We are converting the tape vault to a Machine Room area of ~1,200m2 with • A false floor, finished height of 70cm • 6 “in room” air conditioning units • Total cooling capacity: 500kW • Five 130kW electrical cabinets • Double power input • 5 or 6 20kW normabarres/PDUs • 3-4 racks of 44 PCs per normabarre • Two 130kW cabinets supplying the “critical equipment area” • Critical equipment can be connected to each PDU • Two zones, one for network equipment, one for other critical services • Smoke detection, but no fire extinction
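The normabarre figures imply a per-PC power budget, which is a useful sanity check when choosing between 3 and 4 racks per normabarre. A minimal sketch of that arithmetic, assuming each normabarre's full 20kW is shared evenly by its PCs and (an assumption, since the slide's nesting is lost) that the 5 or 6 normabarres hang off a single 130kW cabinet.

```python
RACK_PCS = 44        # PCs per rack
NORMABARRE_KW = 20   # rating of one normabarre/PDU
CABINET_KW = 130     # rating of one electrical cabinet


def watts_per_pc(racks_per_normabarre):
    """Average power per PC if one normabarre feeds this many full racks."""
    return NORMABARRE_KW * 1000 / (racks_per_normabarre * RACK_PCS)


def cabinet_load_kw(normabarres):
    """Load on one cabinet if it feeds this many fully loaded normabarres (assumed)."""
    return normabarres * NORMABARRE_KW


if __name__ == "__main__":
    for racks in (3, 4):
        print(f"{racks} racks: {watts_per_pc(racks):.0f} W per PC")  # 152 W, 114 W
    for n in (5, 6):
        print(f"{n} normabarres: {cabinet_load_kw(n)} kW of {CABINET_KW} kW")
```

On these assumptions, 4 racks per normabarre leaves only about 114W per PC, and even 6 fully loaded normabarres (120kW) stay within one cabinet's 130kW.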
The Next Steps • Create a new Substation for B513 • To power 2MW of computing equipment plus air conditioning and ancillary loads. • Included in the site-wide 18kV loop, giving more redundancy. • Favoured location: • Underground, but with 5 transformers on top. • Refurbish the main Computer Room once enough equipment has moved to the vault.
Summary • Six Services • Physics Services • Computing Hardware Supply • Computer Centre Operations • Remedy Support • Printing • Macintosh Support • Service Developments to • Follow natural developments • Remedy 5, LSF 4.2, RedHat 7.2 • Streamline provision of existing services • To reduce “P” (cf. BAAN for Hardware Supply) • To manage more (cf. developments for Physics Services) • Four Projects • Computer Centre Supervision • B513 Refurbishment • Fabric Management Development (EDG) • Fabric Management Implementation (LCG)