Transparent and Flexible Network Management for Big Data Processing in the Cloud. Cristian Lumezanu Yueping Zhang Vishal Singh Guofei Jiang. Anupam Das Curtis Yu. Data processing. Network. Schedule computation. Schedule communication. 33% of average job running time.
Transparent and Flexible Network Management for Big Data Processing in the Cloud. Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang, Anupam Das, Curtis Yu
Data processing Network
Schedule communication 33% of average job running time
FlowComb: network management framework for Big Data processing. 1. What is the traffic demand? 2. Which path to choose? 3. How to change the path?
Demand prediction: Use application semantics information to effectively and transparently infer network transfers (possibly before they start)
Demand prediction: Agents on Hadoop nodes analyze Hadoop logs, query nodes, and predict data transfers. The agent on a Hadoop node parses JobTracker logs to identify finished mappers, and parses TaskTracker logs to identify reducers and the size of map output.
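The log-based prediction step can be illustrated with a minimal sketch. The log format below (`MAP_DONE` / `REDUCE_START` lines with `key=value` fields) is a made-up simplification, not the real Hadoop JobTracker or TaskTracker format, and splitting each mapper's output evenly across reducers is an assumption made only for illustration. The function name `predict_transfers` is hypothetical.

```python
import re

# Hypothetical, simplified log lines (NOT the real Hadoop log format):
#   MAP_DONE task=m1 host=n1 output_bytes=1000
#   REDUCE_START task=r1 host=n2
LINE_RE = re.compile(r"^(MAP_DONE|REDUCE_START)\s+(.*)$")


def parse_fields(text):
    """Turn 'a=1 b=2' into {'a': '1', 'b': '2'}, skipping tokens without '='."""
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def predict_transfers(log_lines):
    """Return predicted (src_host, dst_host, bytes) shuffle transfers."""
    mappers = []   # (host, map output size) for finished mappers
    reducers = []  # hosts running reducers
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore unrelated log lines
        kind, rest = match.groups()
        fields = parse_fields(rest)
        if kind == "MAP_DONE":
            mappers.append((fields["host"], int(fields["output_bytes"])))
        else:
            reducers.append(fields["host"])

    transfers = []
    for src, size in mappers:
        for dst in reducers:
            if src != dst:  # same-host shuffle does not cross the network
                # Assumption: output is split evenly across all reducers.
                transfers.append((src, dst, size // len(reducers)))
    return transfers


logs = [
    "MAP_DONE task=m1 host=n1 output_bytes=1000",
    "REDUCE_START task=r1 host=n2",
    "REDUCE_START task=r2 host=n1",
]
predicted = predict_transfers(logs)
```

Because a transfer is predicted as soon as a mapper is known to be finished and a reducer is known to exist, a prediction of this kind can be available before the transfer itself begins.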
Flow scheduling Reroute flows on paths with sufficient available bandwidth
Flow scheduling. Where? Centralized decision engine. Which flows? FIFO. Reroute? If congestion on default path. Which path? First with available bandwidth.
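The four scheduling choices above can be sketched as a single loop. This is a minimal illustration and not FlowComb's implementation: the `Flow` class, the path names, and the bandwidth bookkeeping are all hypothetical, and "congestion" is approximated as the default path lacking enough spare bandwidth for the flow's demand.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Flow:
    flow_id: str
    demand_mbps: float
    default_path: str
    candidate_paths: list  # alternative path names, in the order they are tried


def schedule(flows, available_mbps):
    """Assign each flow a path, processing flows in FIFO order.

    available_mbps maps path name -> spare bandwidth and is updated in place.
    """
    assignment = {}
    queue = deque(flows)  # FIFO: flows are handled in arrival order
    while queue:
        flow = queue.popleft()
        chosen = flow.default_path
        # Reroute only if the default path cannot carry the flow.
        if available_mbps.get(chosen, 0.0) < flow.demand_mbps:
            for path in flow.candidate_paths:
                # Take the first alternative with enough spare bandwidth.
                if available_mbps.get(path, 0.0) >= flow.demand_mbps:
                    chosen = path
                    break
            # If no alternative fits, the flow stays on its default path.
        remaining = available_mbps.get(chosen, 0.0) - flow.demand_mbps
        available_mbps[chosen] = max(0.0, remaining)
        assignment[flow.flow_id] = chosen
    return assignment


spare = {"A": 100.0, "B": 50.0, "C": 80.0}
flows = [
    Flow("f1", 60.0, "A", ["B", "C"]),
    Flow("f2", 60.0, "A", ["B", "C"]),
]
result = schedule(flows, spare)
```

In this example the first flow fits on its default path, while the second finds the default path too full, skips the first alternative, and lands on the second. A first-fit rule like this is cheap to evaluate centrally, at the cost of not searching for the best available path.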
Flow control: Use OpenFlow to install new forwarding rules in the network and enforce the new paths
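As a rough, controller-agnostic sketch of what "installing forwarding rules to enforce a path" involves, the snippet below turns a chosen path into one rule per switch. It does not use any real OpenFlow controller API: the rule dictionaries, field names, and the `rules_for_path` helper are hypothetical stand-ins for whatever messages a controller would actually send.

```python
def rules_for_path(flow_match, hops):
    """Build one forwarding rule per switch along a path.

    flow_match: dict of header fields identifying the flow (hypothetical names).
    hops: list of (switch_id, out_port) pairs, ordered from source to destination.
    """
    return [
        {"switch": switch_id, "match": dict(flow_match), "action": {"output": out_port}}
        for switch_id, out_port in hops
    ]


match = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 50060}
path = [("s1", 3), ("s4", 1), ("s2", 2)]
rules = rules_for_path(match, path)
```

Each rule pairs the same flow match with a switch-specific output port, so enforcing a new path amounts to pushing one such rule to every switch on it.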
System Architecture. Components: OpenFlow Controller, PFS switches, Hadoop Cluster (Master, Slaves), FlowComb Middleware, FlowComb agent. Steps: 1. Analyze Hadoop logs 2. Extract flow information 3. Schedule upcoming flows 4. Set up flow paths 5. Install routing rules. NEC Confidential
Does the network matter? 4 times slower !!!
Can FlowComb predict transfers? 28% of transfers detected before they start (and 56% before they end)
How quickly can FlowComb change paths? 10% 70% 20% 60% before transfer midpoint
Can FlowComb reduce processing time? 36% faster than Hadoop without FlowComb (and 28% faster than Hadoop with ECMP)
FlowComb: Network management platform for Big Data processing that is transparent to applications and quick and accurate in detecting their demand; uses application semantics to detect data transfers (sometimes before they even start)
OpenFlow network Controller
Hadoop sort performance (figure): average utilization (MBps) over time (s), baseline vs. FlowComb