Data Center (and Network) Lecture 4 Seungwon Shin, CNS Group, EE, KAIST
Data Center A data center is a facility used to house computer systems and associated components, such as telecommunications and storage systems. - from Wikipedia
Why do we need it? • How to manage data • we now have a huge amount of data • How to compute something • we now run really complicated applications • Can we handle these issues by ourselves?
Top of Rack (ToR) ToR switch
Data Center Switch Products - Cisco • Core: Nexus 7000 (100G) • Access or Aggregation: Nexus 6000 (40G) • ToR: Nexus 3000 (10G)
Data Center Cost • Servers: 45% • CPU, memory, disk • Infrastructure: 25% • UPS, cooling, power distribution • Power draw: 15% • Electrical utility costs • Network: 15% • Switches, links, transit
Data Center Challenges • Traffic load balance • Support for VM migration • Achieving bisection bandwidth • Power savings / Cooling • Network management (provisioning) • Security (dealing with multiple tenants)
Problems • Single point of failure • Oversubscription of links higher up in the topology • Tradeoff between cost and provisioning
Background • Current data center architecture • recommended by Cisco
Background • Data center design requirements • Data centers typically run two types of applications • outward facing (e.g., serving web pages to users) • internal computations (e.g., MapReduce for web indexing) • Workloads are often unpredictable: • Multiple services run concurrently within a DC • Demand for new services may spike unexpectedly • A spike of demand for a new service means success! • But this is when success spells trouble (if not prepared)! • Failures of servers are the norm • Recall that GFS, MapReduce, etc., resort to dynamic re-assignment of chunkservers and jobs/tasks (worker servers) to deal with failures; data is often replicated across racks, … • The "traffic matrix" between servers is constantly changing
Background • Data center cost • Total cost varies • upwards of $1/4B for a mega data center • server costs dominate • network costs are significant • Long provisioning timescales: • new servers purchased quarterly at best
Background • Networking issues in data centers • Uniform high capacity • Capacity between servers limited only by their NICs • No need to consider topology when adding servers • In other words, high capacity between any two servers, no matter which racks they are located in! • Performance isolation • Traffic of one service should be unaffected by others • Ease of management: "Plug-&-Play" (layer-2 semantics) • Flat addressing, so any server can have any IP address • Server configuration is the same as in a LAN • Legacy applications depending on broadcast must work
A scalable, commodity data center network architecture UCSD SIGCOMM 2008 some slides from Prof. Amin Vahdat and Prof. Zhi-Li Zhang
Problem Domain • Single point of failure • Core routers are the bottleneck • Require high-end routers • High-end routers are very expensive • Switching hardware cost to interconnect 20,000 hosts with full bandwidth: • $7,000 for each 48-port GigE switch at the edge • $700,000 for each 128-port 10 GigE switch in the aggregation and core layers • total: approximately $37M
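As a sanity check on the ~$37M figure, here is a back-of-the-envelope sketch in Python. The per-switch prices come from the slide above; the switch counts (512 edge, 48 aggregation/core) are illustrative assumptions of ours, not numbers from the paper.

```python
# Back-of-the-envelope check of the ~$37M figure.
# Prices are from the slide; the switch counts are assumptions for illustration only.

EDGE_PRICE = 7_000        # 48-port GigE edge switch
AGG_CORE_PRICE = 700_000  # 128-port 10 GigE aggregation/core switch

edge_switches = 512       # assumed: ~20,000 hosts at ~40 hosts per edge switch
agg_core_switches = 48    # assumed aggregation + core switch count

total = edge_switches * EDGE_PRICE + agg_core_switches * AGG_CORE_PRICE
print(f"${total / 1e6:.1f}M")  # -> $37.2M, in the ballpark of the slide's $37M
```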
Main Goal • Addressing the limitations of today's data center network architecture • single point of failure • oversubscription of links higher up in the topology • trade-offs between cost and provisioning
Considerations • Backwards compatible with existing infrastructure • No changes in applications • Support for layer 2 (Ethernet) and IP • Cost effective • Low power consumption & heat emission • Cheap infrastructure • Scalable interconnection bandwidth • an arbitrary host can communicate with any other host at the full bandwidth of its local network interface
Fat-Tree (figure: a 4-ary fat-tree topology, pods 0-3)
Fat-Tree • Fat-Tree • a special type of Clos network (after C. Clos) • k-ary fat tree: three-layer topology (edge, aggregation, and core) • each pod consists of (k/2)^2 servers & 2 layers of k/2 k-port switches • each edge switch connects to k/2 servers & k/2 aggr. switches • each aggr. switch connects to k/2 edge & k/2 core switches • (k/2)^2 core switches: each connects to k pods
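A few lines of Python make the k-ary arithmetic concrete (a minimal sketch; the helper name is ours, not the paper's). For 48-port switches it yields 27,648 hosts.

```python
# Minimal sketch of the k-ary fat-tree arithmetic (helper name is ours).

def fat_tree_counts(k: int) -> dict:
    """Element counts for a fat-tree built entirely from k-port switches (k even)."""
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),         # k/2 edge switches per pod
        "aggregation_switches": k * (k // 2),  # k/2 aggregation switches per pod
        "core_switches": (k // 2) ** 2,
        "hosts": k ** 3 // 4,                  # (k/2)^2 hosts per pod * k pods
        "bisection_links": k ** 3 // 8,        # full bisection bandwidth at line rate
    }

print(fat_tree_counts(4))   # the k = 4 example shown in the figure
print(fat_tree_counts(48))  # 27,648 hosts from 48-port switches
```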
Fat-Tree • Why Fat-Tree? • A fat tree has identical bandwidth at any bisection • Each layer has the same aggregate bandwidth • Can be built using cheap devices with uniform capacity • Each port supports the same speed as the end hosts • All devices can transmit at line speed if packets are distributed uniformly along available paths • Great scalability
Addressing in Fat-Tree • Use the 10.0.0.0/8 private addressing block • Pod switches have address 10.pod.switch.1 • Core switches have address 10.k.j.i • "i" and "j" denote the core switch's position in the (k/2)^2 core grid • Hosts have address 10.pod.switch.ID • ID is the host's ID in the switch subnet ([2, (k/2) + 1]) • For k < 256, this scheme has no scalability issue
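The addressing rules above translate directly into a few helper functions (a sketch; the function names are of our choosing).

```python
# Sketch of the 10.0.0.0/8 addressing convention (helper names are ours).

def pod_switch_addr(pod: int, switch: int) -> str:
    # Edge and aggregation switches in a pod: 10.pod.switch.1
    return f"10.{pod}.{switch}.1"

def core_switch_addr(k: int, j: int, i: int) -> str:
    # Core switches: 10.k.j.i, where (j, i) is the switch's position
    # in the (k/2) x (k/2) core grid
    return f"10.{k}.{j}.{i}"

def host_addr(pod: int, switch: int, host_id: int) -> str:
    # Hosts: 10.pod.switch.ID, with ID in [2, k/2 + 1]
    return f"10.{pod}.{switch}.{host_id}"

# k = 4 examples
print(host_addr(0, 1, 2))         # 10.0.1.2 (a host used in the routing example later)
print(core_switch_addr(4, 1, 2))  # 10.4.1.2
```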
Lookup in Fat-Tree • First level is prefix lookup • Used to route down the topology to end host • Second level is a suffix lookup • Used to route up towards core • Diffuses and spreads out traffic • Maintains packet ordering by using the same ports for the same end host
Lookup in Fat-Tree • Two-level table lookup
Routing • Pod switches • prefix - /24 matching (but no prefix for the lower level pod switches) • suffix - /8 matching • Core switches • prefix - /16 matching • e.g., • 10.0.0.0/16 - port 0 • 10.1.0.0/16 - port 1 • 10.2.0.0/16 - port 2 • 10.3.0.0/16 - port 3
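The sketch below illustrates the two-level lookup at an aggregation switch in pod 0 for k = 4: a longest-prefix match routes traffic down within the pod, and a suffix match on the host-ID byte spreads upward traffic across core links. The table contents here are illustrative; the paper derives them per switch.

```python
# Illustrative two-level (prefix, then suffix) lookup; tables shown for one
# aggregation switch in pod 0 of a k = 4 fat-tree.

import ipaddress

def two_level_lookup(dst: str, prefix_table, suffix_table):
    """Longest-prefix match first; if nothing matches, fall back to the suffix table."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in prefix_table:
        network = ipaddress.ip_network(net)
        if addr in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, port)
    if best is not None:
        return best[1]
    host_id = int(dst.split(".")[3])   # suffix match on the host-ID byte
    for suffix, port in suffix_table:
        if host_id == suffix:
            return port
    return None

prefix_table = [("10.0.0.0/24", 0), ("10.0.1.0/24", 1)]  # down to the pod's edge subnets
suffix_table = [(2, 2), (3, 3)]                          # up to the core, spread by host ID

print(two_level_lookup("10.0.1.2", prefix_table, suffix_table))  # 1: stays inside pod 0
print(two_level_lookup("10.2.0.3", prefix_table, suffix_table))  # 3: goes up toward a core switch
```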
Routing Example (figure): a packet from 10.0.1.2 to 10.2.0.3 is routed up from pod 0 to a core switch and back down into pod 2
More Functions • Flow Classification • dynamic port assignment • guards against packet reordering • ensures fair distribution • Flow Scheduling • edge switches • central scheduler
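To make the flow-classification idea concrete, here is a hedged sketch: packets of the same flow always leave on the same uplink (so no reordering), new flows are spread across uplinks, and a periodic step shifts a flow off the busiest uplink. The class name and the rebalancing heuristic are simplifications of ours, not the paper's exact algorithm.

```python
# Hedged sketch of flow classification at a pod switch: per-flow port stickiness
# plus a crude periodic rebalance. Not the paper's exact algorithm.

import hashlib

class FlowClassifier:
    def __init__(self, uplink_ports):
        self.uplinks = list(uplink_ports)
        self.assigned = {}                        # (src, dst) -> uplink port
        self.load = {p: 0 for p in self.uplinks}  # bytes sent per uplink

    def port_for(self, src: str, dst: str, size: int) -> int:
        flow = (src, dst)
        if flow not in self.assigned:
            # New flow: hash it onto an uplink; later packets reuse the same port,
            # which preserves packet ordering within the flow.
            digest = hashlib.md5(f"{src}->{dst}".encode()).hexdigest()
            self.assigned[flow] = self.uplinks[int(digest, 16) % len(self.uplinks)]
        port = self.assigned[flow]
        self.load[port] += size
        return port

    def rebalance(self):
        # Periodically move one flow from the busiest uplink to the idlest one.
        busiest = max(self.load, key=self.load.get)
        idlest = min(self.load, key=self.load.get)
        for flow, port in self.assigned.items():
            if port == busiest:
                self.assigned[flow] = idlest
                break

fc = FlowClassifier(uplink_ports=[2, 3])
print(fc.port_for("10.0.1.2", "10.2.0.3", 1500))  # same flow -> same port every time
```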
Evaluation • Cost of maintenance
Evaluation • Cost
Critiques • Scalability issues • what about k > 256? • What kinds of routing protocols are needed?
Jellyfish: Networking Data Centers Randomly UIUC and UC Berkeley NSDI 2012
Problem Domain • Structured DC networks • the structure constrains expansion • how to maintain the structure? fixed topology, fixed connections, …
Solution • Then, how? • Forget about structure • let's have no structure at all! • random graph
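A minimal sketch of what "no structure at all" means in practice: wire the switch-to-switch ports as a random regular graph and hang servers off the remaining ports. The sketch uses networkx's generator for brevity, and the parameter names are ours; Jellyfish itself builds the graph incrementally so the network can be expanded.

```python
# Minimal Jellyfish-style topology sketch: switch-to-switch links form a random
# regular graph; leftover ports attach servers. (networkx generator used for brevity;
# the real Jellyfish construction is incremental so the network can grow.)

import networkx as nx

def jellyfish(num_switches: int, ports_per_switch: int, ports_to_switches: int):
    switch_graph = nx.random_regular_graph(d=ports_to_switches, n=num_switches)
    servers = (ports_per_switch - ports_to_switches) * num_switches
    return switch_graph, servers

g, num_servers = jellyfish(num_switches=20, ports_per_switch=8, ports_to_switches=5)
print(g.number_of_edges(), num_servers)  # 50 switch-to-switch links, 60 servers
```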
Throughput Simulation Can you believe this?
Critiques • OK, it seems to be fine • throughput • easy to build (scalability) • But can we realize this in a real environment? • how can you connect switches like a random graph?