GridPP Tier1 Review: Fabric
20 June 2012
Martin Bly – Fabric Team Leader
Rôle
• Fabric Team ‘runs the hardware’
• Look after and develop fabric of the Tier:
  • Networks and SANs
  • Servers (storage, CPU, virtualisation, core)
  • Configuration Management
  • Tape Robotics
  • Maintenance & Spares
Fabric - Tier1 Review, June 2012
Highlights
• CVMFS
  • Solved the problem of serving software directories
• MagDB
  • Integration of hardware information sources
• Solving disk issues
  • Finding fixes for hardware problems, enabling continued use of older hardware
• Continued rollout of Quattor
Configuration Management
• Configuration Management system is used to control all aspects of the state of a machine, from installation to production and eventually decommissioning
• Using Quattor with Puppet for two years
  • Hybrid system
  • Quattor for bare metal provisioning and, in most cases, full configuration control
  • Puppet for some service configuration files, notably for Castor
  • Moving towards integrating Puppet functions into Quattor
• Some services still raw kickstart:
  • Oracle database servers running RedHat
  • Migration to Quattor planned this year
• Overall works well
Current Tier1 Network
• Star configuration
  • Single Force10 C300 ‘core’ switch with 32 x 10GbE ports
  • Several Force10/Arista/Fujitsu switches providing access at 10Gb/s to storage servers and switch stacks
  • Nortel/Avaya switches in stacks providing 1Gb/s access
    • Mostly dual 10Gb/s uplinks, some direct, some via Force10/Arista/Fujitsu switches
  • Uplinks to ULKR (20Gb/s) and Router A (10Gb/s)
• Issues:
  • Core is a single point of failure, though in a resilient configuration
    • Spare switch components available
  • No path to higher bandwidths
  • Limited expansion capability
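The single-point-of-failure issue in a star can be illustrated with a small reachability check. This is only a sketch: the node names and links below are hypothetical and heavily simplified, not the real Tier1 inventory, and it models a switch as either fully up or fully down.

```python
from collections import deque


def reachable(links, start, failed=frozenset()):
    """Return the set of nodes reachable from `start`, skipping failed nodes."""
    if start in failed:
        return set()
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(links, source, target):
    """List intermediate nodes whose loss alone disconnects source from target."""
    nodes = {n for link in links for n in link} - {source, target}
    return sorted(
        n for n in nodes if target not in reachable(links, source, failed={n})
    )


# Hypothetical star: every access switch hangs off one core switch.
star_links = [
    ("core", "access-10g-a"),
    ("core", "access-10g-b"),
    ("core", "stack-1g-a"),
    ("core", "uplink"),
]

spof = single_points_of_failure(star_links, "access-10g-a", "uplink")
```

In this toy star, the only node whose failure cuts an access switch off from the uplink is the core itself, which is the issue the slide names.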
Tier1 Network - Current
[Diagram of the current Tier1 network]
Network Plan - Requirements
• Resilient configuration
  • Against unit and link/path failures
• Future-proof
  • Higher bandwidth on switch links and in ‘core’
  • Room for bandwidth expansion
  • Higher bandwidth to RAL Site network
• Affordable
Network Plan - Designs
• Core with ‘Star’ topology
  • Same design as now, simple
  • Duplicate ‘core’ switch to add resilience
  • Expensive to upgrade to higher bandwidth links
  • Higher load on individual elements
  • Element failures impair bandwidth
• Mesh topology
  • More complex
  • More but cheaper elements
  • Upgrade paths cheaper
  • Lower load on elements
  • Failures have less impact on overall bandwidth
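The claim that failures have less impact in a mesh can be sketched with simple arithmetic. The model below assumes traffic is spread evenly across a number of equal, parallel elements; the element counts (2 and 4) are illustrative assumptions, not figures from the plan.

```python
def remaining_fraction(elements, failed=1):
    """Fraction of aggregate bandwidth left when `failed` of `elements`
    equal, parallel elements are lost (even load spreading assumed)."""
    if elements <= 0 or not 0 <= failed <= elements:
        raise ValueError("need elements > 0 and 0 <= failed <= elements")
    return (elements - failed) / elements


# Illustrative counts only: a duplicated core versus a four-element mesh layer.
duplicated_core = remaining_fraction(2)   # one of two cores lost
four_way_mesh = remaining_fraction(4)     # one of four mesh elements lost
```

Under these assumptions, losing one of two cores halves the available bandwidth, while losing one of four mesh elements leaves three quarters of it.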
Network Plan - Mesh
• New design will use a mesh
• Routing layer between Tier1 and Site
• Aggregation layer: Force10 Z9000 - 32x40GbE switches
• 10Gb/s access layer: Force10 S4810 - 48x10GbE + 4x40GbE
• 1Gb/s access layer: Force10 S60, Avaya 56xx, Arista 7124
• Aggregation and 10Gb/s access layers linked at 4x40Gb/s
• Access layers linked at 2 or 4x10Gb/s
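The port counts quoted above give a rough feel for capacity. The sketch below assumes all 48 10GbE ports on an S4810 face servers and all four 40GbE ports are used as uplinks to the aggregation layer; it sums nominal port speeds only and ignores real traffic patterns.

```python
def port_capacity_gbps(ports, speed_gbps):
    """Nominal aggregate capacity of `ports` ports at `speed_gbps` each."""
    return ports * speed_gbps


def oversubscription(down_ports, down_speed, up_ports, up_speed):
    """Ratio of nominal downlink capacity to nominal uplink capacity."""
    return port_capacity_gbps(down_ports, down_speed) / port_capacity_gbps(
        up_ports, up_speed
    )


# S4810 access switch: 48x10GbE towards servers, 4x40GbE towards aggregation.
s4810_ratio = oversubscription(48, 10, 4, 40)   # 480 Gb/s over 160 Gb/s

# Z9000 aggregation switch: 32x40GbE ports in total.
z9000_capacity = port_capacity_gbps(32, 40)
```

With these assumptions an S4810 is 3:1 oversubscribed towards the aggregation layer, and a Z9000 offers 1280 Gb/s of nominal port capacity.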
Future Network Topology
[Diagram of the planned Tier1 network topology]