
Measuring a (MapReduce) Data Center



Presentation Transcript


  1. Measuring a (MapReduce) Data Center

  2. Typical Data Center Network
  [Topology diagram: IP routers at the top, aggregation switches (Agg) below, top-of-rack (ToR) switches, servers at the bottom.]
  • Top-of-rack switch: 24- or 48-port, 1G to each server, 10 Gbps up, ~$7K
  • Aggregation switch: modular chassis + up to 10 blades, >140 10G ports, $150K-$200K
  • Less bandwidth up the hierarchy
  • Clunky routing
  • e.g., VL2, BCube, FatTree, Portland, DCell
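As a back-of-the-envelope illustration of "less bandwidth up the hierarchy", the sketch below computes a ToR oversubscription ratio; the two 10G uplinks are an assumption for the example, not a figure from the slide.

```python
# Hypothetical ToR oversubscription calculation (uplink count is assumed, not from the slide).
def oversubscription(server_ports, server_gbps, uplinks, uplink_gbps):
    """Ratio of aggregate downlink capacity to aggregate uplink capacity."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)

# e.g. a 48-port ToR with 1G to each server and two 10G uplinks:
print(oversubscription(48, 1, 2, 10))  # -> 2.4, i.e. 2.4:1 oversubscribed
```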

  3. What does traffic in a datacenter look like?
  Goal:
  • A realistic model of data center traffic
  • Compare proposals
  How to measure a datacenter?
  • (Macro) Who talks to whom? Congestion and its impact
  • (Micro) Flow details: sizes, durations, inter-arrivals, flux

  4. How to measure?
  [Diagram: servers, ToR and aggregation switches, router; instrumentation = MapReduce scripts + distributed FS on the end-hosts.]
  SNMP reports
  • per port: in/out octets
  • sample every few minutes
  • miss server- or flow-level info
  • auto-managed already
  Packet traces
  • not native on most switches
  • hard to set up (port-spans)
  Sampled NetFlow
  • trade-off: CPU overhead on the switch for detailed traces
  Approach: use the end-hosts to share the load; measured 1500 servers for several months
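A minimal sketch of how the SNMP-style data on this slide turns into link utilization: two per-port octet counter readings a few minutes apart give an average rate over the polling interval. The function name, counter width, and sample values are illustrative assumptions.

```python
# Minimal sketch: turn two SNMP per-port octet counter readings into average utilization.
def utilization(octets_prev, octets_now, interval_s, link_bps, counter_bits=64):
    """Average utilization (0..1) of a port over one SNMP polling interval."""
    delta = (octets_now - octets_prev) % (2 ** counter_bits)  # tolerate counter wrap
    return (delta * 8) / (interval_s * link_bps)              # bytes -> bits, then rate / capacity

# e.g. two readings 300 s apart on a 10 Gbps uplink:
print(utilization(1_000_000_000_000, 1_150_000_000_000, 300, 10e9))  # -> 0.4
```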

  5. Who Talks To Whom?
  [Heatmap: server-to-server traffic matrix (To vs. From); color scale from 0 through 0.2 Kbps, 20 Kbps, 3 Mbps, 0.4 Gbps, up to 1 Gbps.]
  • Two patterns dominate: Scatter and Gather
  • Most of the communication happens within racks
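A rough sketch of how a "who talks to whom" matrix can be built from flow records and split into intra-rack versus cross-rack bytes; the record format, rack mapping, and numbers are assumptions for illustration, not the measurement pipeline itself.

```python
# Illustrative sketch (field names assumed): aggregate flow records into a
# server-to-server byte matrix and split traffic into intra-rack vs. cross-rack.
from collections import defaultdict

def traffic_matrix(flows, rack_of):
    """flows: iterable of (src, dst, bytes); rack_of: server -> rack id."""
    matrix = defaultdict(int)
    intra = cross = 0
    for src, dst, nbytes in flows:
        matrix[(src, dst)] += nbytes
        if rack_of[src] == rack_of[dst]:
            intra += nbytes
        else:
            cross += nbytes
    return matrix, intra, cross

flows = [("s1", "s2", 5_000), ("s1", "s9", 1_000), ("s2", "s1", 2_000)]
racks = {"s1": "r1", "s2": "r1", "s9": "r2"}
_, intra, cross = traffic_matrix(flows, racks)
print(intra / (intra + cross))  # fraction of bytes that stay within the rack
```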

  6. Flows…
  • are small: 80% of bytes are in flows < 200 MB
  • are short-lived: 50% of bytes are in flows < 25 s
  • turn over quickly: median inter-arrival at the ToR = 10⁻² s
  …which leads to:
  • Traffic engineering schemes should react faster; few elephants
  • Localized traffic → additional bandwidth alleviates hotspots
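Statistics like those on this slide (byte-weighted flow-size fractions, flow inter-arrivals) could be computed along these lines; the helper names and the sample values are made up for illustration.

```python
# Sketch: byte-weighted fraction of traffic in "small" flows, and median flow inter-arrival.
import statistics

def byte_fraction_below(flow_sizes, threshold):
    """Fraction of total bytes carried by flows smaller than `threshold` bytes."""
    small = sum(s for s in flow_sizes if s < threshold)
    return small / sum(flow_sizes)

def median_interarrival(arrival_times):
    """Median gap (seconds) between consecutive flow arrivals at a switch."""
    times = sorted(arrival_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return statistics.median(gaps)

sizes = [10e6, 50e6, 120e6, 2e9]          # flow sizes in bytes (made up)
print(byte_fraction_below(sizes, 200e6))   # cf. "80% of bytes in flows < 200 MB"
print(median_interarrival([0.00, 0.01, 0.03, 0.04]))  # cf. median inter-arrival ~ 10⁻² s
```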

  7. Congestion, its Impact
  • Are links busy? Often!
  • Who are the culprits?
  • Are apps impacted?
  [CDF plot: contiguous duration of >70% link utilization (seconds).]

  8. Congestion, its Impact
  • Are links busy? Often!
  • Who are the culprits? Apps (Extract, Reduce)
  • Are apps impacted? Marginally
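A small sketch of how the "contiguous duration of >70% link utilization" episodes in the CDF on slide 7 might be extracted from polled utilization samples; the 5-minute interval and the sample series are assumptions.

```python
# Sketch (assumed 5-minute samples): find contiguous episodes where a link
# stays above 70% utilization and report each episode's length in seconds.
def congestion_episodes(samples, threshold=0.7, interval_s=300):
    """samples: per-interval utilizations (0..1). Returns episode lengths in seconds."""
    episodes, run = [], 0
    for u in samples:
        if u > threshold:
            run += 1                       # extend the current busy run
        elif run:
            episodes.append(run * interval_s)
            run = 0
    if run:                                # close out a run that reaches the end
        episodes.append(run * interval_s)
    return episodes

print(congestion_episodes([0.2, 0.8, 0.9, 0.3, 0.75, 0.1]))  # -> [600, 300]
```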

  9. Measurement Alternatives
  Link utilizations (e.g., from SNMP) → server-to-server traffic matrix, via tomography
  + make do with easier-to-measure data
  – under-constrained problem → heuristics: gravity

  10. Measurement Alternatives
  Link utilizations (e.g., from SNMP) → server-to-server traffic matrix, via tomography
  + make do with easier-to-measure data
  – under-constrained problem → heuristics: gravity, max sparse

  11. Measurement Alternatives
  Link utilizations (e.g., from SNMP) → server-to-server traffic matrix, via tomography
  + make do with easier-to-measure data
  – under-constrained problem → heuristics: gravity, max sparse, tomography + job information
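For concreteness, here is a sketch of the simple gravity heuristic named on these slides: with only per-server byte totals (the under-constrained tomography setting), it spreads traffic proportionally to endpoint volumes. The numbers are illustrative.

```python
# Sketch of the gravity heuristic: estimate a server-to-server traffic matrix
# from per-server totals only, assuming T[i][j] = out_i * in_j / total.
def gravity_estimate(bytes_out, bytes_in):
    """Spread each server's outgoing bytes across destinations in proportion to their inflow."""
    total = sum(bytes_out)
    return [[o * i / total for i in bytes_in] for o in bytes_out]

out_totals = [300, 100, 600]   # bytes sent by each server (illustrative)
in_totals = [500, 400, 100]    # bytes received by each server (illustrative)
for row in gravity_estimate(out_totals, in_totals):
    print(row)                 # e.g. first row: [150.0, 120.0, 30.0], which sums to 300
```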

  12. A first look at traffic in a (MapReduce) data center
  Some insights:
  • Traffic stays mostly within high-bandwidth regions
  • Flows are small, short-lived, and turn over quickly
  • Network is often highly utilized, with moderate impact on apps
  • Measuring @ end-hosts is feasible, necessary (?)
  → a model for data center traffic
