Utilizing circuit switches and shareable backup architecture to overcome network failures and maintain application performance in data center networks. The proposed architecture ensures fast recovery, low cost, and reliable operation.
Masking Failures from Application Performance in Data Center Networks with Shareable Backup • Dingming Wu+, Yiting Xia+*, Xiaoye Steven Sun+, Xin Sunny Huang+, Simbarashe Dzinamarira+, T. S. Eugene Ng+ • +Rice University, *Facebook, Inc.
Network Failures Are Disruptive • Median case of failures: 10% less traffic delivered • Worst 20% of failures: 40% less traffic delivered (Gill et al., SIGCOMM 2011)
Today's Failure Handling: Rerouting • Fast local rerouting: inflated path length • Global optimal rerouting: high latency of route updates • Impacts flows not traveling through the failure location
Impact on Coflow Completion Time (CCT) • Facebook coflow trace • k = 16 fat-tree network • Global optimal rerouting
Do We Have Other Options? • Restore network capacity immediately after failure • Be cost efficient • --Small pool of backup switches • How do we achieve that?
Circuit Switches • Physical-layer device • Circuit controlled by software • Examples • --optical 2D-MEMS switch, 40 µs, $10 per-port cost • --electrical cross-point switch, 70 ns, $3 per-port cost [Figure labels: ports A, B, C, D]
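The idea of a "circuit controlled by software" can be sketched with a toy model. This is an illustration only: the `CircuitSwitch` class and its methods are hypothetical names, not the interface of any real device, and ports A to D from the figure are mapped to the integers 0 to 3.

```python
class CircuitSwitch:
    """Toy model of a software-controlled circuit switch (hypothetical API)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.peer = {}  # port -> port it currently has a circuit to

    def disconnect(self, port):
        """Tear down the circuit on `port`, if any, freeing both ends."""
        other = self.peer.pop(port, None)
        if other is not None:
            self.peer.pop(other, None)

    def connect(self, a, b):
        """Set up a bidirectional circuit between ports a and b."""
        if a == b or not all(0 <= p < self.num_ports for p in (a, b)):
            raise ValueError("need two distinct, valid ports")
        self.disconnect(a)
        self.disconnect(b)
        self.peer[a] = b
        self.peer[b] = a

    def forward(self, in_port):
        """Return the port a signal entering in_port leaves on, or None."""
        return self.peer.get(in_port)


A, B, C, D = range(4)
sw = CircuitSwitch(4)
sw.connect(A, B)
sw.connect(C, D)
sw.connect(A, C)  # reconfigure: A now reaches C; B and D are left unconnected
```

The model carries no packets or headers: forwarding depends only on the current port-to-port mapping, which is what makes a circuit switch a physical-layer device.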
Ideal Architecture • Entire network shares one backup switch • Replaces any failed switch when necessary • Unreasonably high port count of circuit switch • Single point of failure [Figure labels: circuit switch, backup switch, regular switches, servers]
How to Make It Practical • Feasibility • --small port-count circuit switches • Scalability • --partition network into failure groups • --distribute circuit switches across the network • Low cost • --small backup pool • --share backup switches within each failure group
ShareBackup Architecture • An original fat-tree with k = 6 • Partition the switches into failure groups, each with k/2 switches • Add backup switches per failure group [Figure labels: core layer, aggregation layer, edge layer]
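The size of the partition follows from standard fat-tree arithmetic. The sketch below assumes the usual k-ary fat-tree (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)^2 core switches, k^3/4 servers) and takes the slide's "k/2 switches per failure group" literally. The function name and the `backups_per_group` parameter are illustrative, not from the talk.

```python
def sharebackup_counts(k, backups_per_group=1):
    """Count switches, failure groups, and backups for a k-ary fat-tree.

    Assumes every failure group holds exactly k/2 switches, as stated
    on the slide; how switches are assigned to groups is not modeled.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    edge = k * half        # k pods, k/2 edge switches each
    agg = k * half         # k pods, k/2 aggregation switches each
    core = half * half     # (k/2)^2 core switches
    switches = edge + agg + core
    groups = switches // half  # each group has k/2 switches
    return {
        "servers": k ** 3 // 4,
        "switches": switches,
        "failure_groups": groups,
        "backup_switches": groups * backups_per_group,
    }


print(sharebackup_counts(6))
print(sharebackup_counts(48))
```

For k = 6 this gives 45 switches in 15 failure groups; for k = 48 it gives 27648 servers and 2880 switches in 120 groups, so one backup per group adds 120 switches.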
Edge Layer [Figure labels: edge switches 0-2, backup switch, circuit switches, servers]
Aggregation Layer [Figure labels: aggregation switches 0-2, backup switch, circuit switches, edge switches 0-2, backup switch]
Core Layer [Figure labels: core switches 0-8, circuit switches, aggregation switches 0-2, backup switch]
Recover First, Diagnose Later • Failure recovery • --switch failure: replaced by backups via circuit reconfiguration • --link failure: switches on both sides are replaced • Automatic failure diagnosis performed offline • --details in the paper
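The recovery policy above can be sketched as bookkeeping over failure groups. This is a minimal sketch under stated assumptions: `FailureGroup`, `recover_link_failure`, and the switch names are hypothetical, and the actual circuit reconfiguration that swaps a backup into place is not modeled.

```python
class FailureGroup:
    """Bookkeeping for one failure group (hypothetical structure)."""

    def __init__(self, active, backups):
        self.active = set(active)      # switches currently carrying traffic
        self.backups = list(backups)   # idle backup switches
        self.suspects = []             # taken offline, awaiting diagnosis

    def replace(self, failed):
        """Swap a suspected switch for a backup; return the backup used."""
        if failed not in self.active:
            raise KeyError(failed)
        if not self.backups:
            return None  # pool exhausted; the sketch leaves the switch in place
        backup = self.backups.pop()
        self.active.remove(failed)
        self.active.add(backup)
        self.suspects.append(failed)
        return backup

    def recycle(self, switch):
        """After offline diagnosis clears a switch, reuse it as a backup."""
        self.suspects.remove(switch)
        self.backups.append(switch)


def recover_link_failure(group_a, switch_a, group_b, switch_b):
    """Recover first: replace the switches on both sides of the failed link."""
    return group_a.replace(switch_a), group_b.replace(switch_b)


edge = FailureGroup(["e0", "e1", "e2"], ["eb"])
agg = FailureGroup(["a0", "a1", "a2"], ["ab"])
print(recover_link_failure(edge, "e1", agg, "a2"))
```

Replacing both endpoints of a failed link avoids having to decide online which side is at fault; the suspects list is what offline diagnosis would later work through.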
Live Impersonation of Failed Switch • Routing table of every edge switch: Routing Table 0 (VLAN 0), Routing Table 1 (VLAN 1), Routing Table 2 (VLAN 2) [Figure labels: backup switch, edge switches 0-2, servers]
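One way to picture the shared table is a single lookup structure keyed by VLAN ID, sketched below. This is an assumption-laden illustration: the function names, destination labels, and port numbers are made up, matching is exact rather than longest-prefix, and how packets come to carry the right VLAN ID is not modeled.

```python
def build_group_table(per_switch_tables):
    """Merge per-switch tables into one table keyed by (vlan_id, destination).

    per_switch_tables maps a VLAN ID (one per switch in the failure group)
    to that switch's own {destination: output_port} table.
    """
    merged = {}
    for vlan_id, table in per_switch_tables.items():
        for dst, port in table.items():
            merged[(vlan_id, dst)] = port
    return merged


def lookup(group_table, vlan_id, dst):
    """Forward as the switch identified by vlan_id would; None if no entry."""
    return group_table.get((vlan_id, dst))


# Hypothetical tables for edge switches 0, 1, 2 of one failure group.
tables = {
    0: {"rack0-host": 1, "default": 4},
    1: {"rack1-host": 1, "default": 5},
    2: {"rack2-host": 2, "default": 6},
}
group_table = build_group_table(tables)

# A backup holding group_table can stand in for any member without a table update.
print(lookup(group_table, 1, "default"))
```

Because every switch in the group, including the backup, holds the same merged table, a backup can take over for any failed member as soon as the circuits are reconfigured.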
What Does the Control System Do? • Collects keep-alive messages & link status reports from switches • Reconfigures circuit switches under failures • Performs offline failure diagnosis • Implications • --needs to talk to many circuit switches and packet switches • --keeps a large amount of state on circuit/switch/link status
Distributed Control System • One controller for a failure group of k/2 switches • --configures the circuit switches adjacent to switches in the group • Maintains only local circuit configurations in its group • --does not share state with other controllers • Talks to circuit switches using an out-of-band control network
Summary • Fast Failure Recovery • --as fast as the underlying circuit switching technology • Live Impersonation • --Traffic is redirected to the backups in the physical layer • --Switches in a failure group have the same routing tables and use VLAN IDs for differentiation • --Regular switches recovered from failures become backup switches themselves • Fast failure recovery, no path dilation, no routing disturbance
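A per-group controller that watches keep-alive messages could look roughly like the sketch below. The class name, the timeout value, and the rule "no keep-alive within the timeout means suspected failure" are assumptions for illustration; the slides do not specify the detection logic. Timestamps are passed in explicitly so the example is deterministic.

```python
class GroupController:
    """Tracks keep-alives for the switches of one failure group only."""

    def __init__(self, switches, timeout):
        self.timeout = timeout
        # Last time each local switch was heard from (seconds).
        self.last_seen = {s: 0.0 for s in switches}

    def on_keepalive(self, switch, now):
        """Record a keep-alive; messages from other groups are ignored."""
        if switch in self.last_seen:
            self.last_seen[switch] = now

    def suspected_failures(self, now):
        """Switches silent for longer than the timeout, in sorted order."""
        return sorted(
            s for s, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


ctrl = GroupController(["e0", "e1", "e2"], timeout=3.0)
for s in ("e0", "e1", "e2"):
    ctrl.on_keepalive(s, now=1.0)
ctrl.on_keepalive("e0", now=4.0)
ctrl.on_keepalive("e2", now=4.0)
ctrl.on_keepalive("a7", now=4.0)  # not in this group: no state is kept
print(ctrl.suspected_failures(now=5.0))
```

Keeping state only for the local group is what lets each controller stay small and avoid sharing state with other controllers.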
Evaluation • Bandwidth Advantage • --Iperf throughput on testbed • Application performance • --MapReduce job completion time
Bandwidth Advantage • 4 racks, 8 servers, 12 switches • 8 iPerf flows saturate the network core • ShareBackup restores the network to full capacity regardless of failure locations
Application Performance • MapReduce Sort w/ 100GB input data • ShareBackup preserves application performance under failures! [Figure annotations: 1.2X, 4.2X]
Extra Cost • Small port-count circuit switches: very inexpensive • --e.g. $3 per-port cost for cross-point switches • Small backup switch pool • --1 backup per failure group is usually enough • --k = 48 fat-tree with 27648 servers: ~6.7% extra network cost • Partial deployment • --failures more destructive at edge layer • --employ backup only for ToR failures
Conclusion • ShareBackup: an architectural solution for failure recovery in DCNs • --uses circuit switching for fast failover • --is an economical approach to using backups in networks • --preserves application performance under failures • Key takeaways: • --rerouting is not the only approach for failure recovery • --fast, transparent failure recovery is possible through careful backup placement & fast circuit switching
Backup: Control System Failures • Circuit switch software failure / control channel failure • --circuit switches become unresponsive • --keep existing circuit configurations, data plane is not impacted • --fall back to rerouting • Hardware/power failure • --controller will receive lots of failure reports in a short time • --call for human intervention • Controller failure • --state replication on shadow controllers
Backup: Offline Failure Diagnosis • Recycle healthy switch • --Only one switch has failed • --Back to normal after reboot • Chain up circuit switches using side ports [Figure labels: aggregation switch, circuit switches, edge switches]