Thus Far • Locality is important!!! • Need to get processing closer to storage • Need to get tasks close to their data • Rack locality: Hadoop prefers rack-local slots • Kill a running task and reschedule it when a data-local slot becomes available: Quincy • Why? • The network is bad: going over it gives horrible performance • Why? • Over-subscription of the network
What Has Changed? • The network is no longer over-subscribed • Fat-tree, VL2 • The network has fewer congestion points • Helios, c-Through, Hedera, MicroTE • Server uplinks are much faster • Implication: network transfers are much faster • The network is now about as fast as disk I/O • The difference between a local and a rack-local read is only ~8% (rough back-of-the-envelope below) • Storage practices have also changed • Compression is widely used, so less data needs to be transferred • De-replication is practiced • With only one copy, locality is really hard to achieve anyway
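A rough back-of-the-envelope in Python for the "network is now about as fast as disk" claim. The throughput numbers and the 8% overhead are illustrative assumptions, not figures taken from the papers; the point is only that once the NIC outruns the disk, the disk stays the bottleneck either way:

```python
# Back-of-the-envelope sketch (illustrative numbers, not from the papers):
# if the disk, not the network, is the bottleneck, reading a block over a
# non-oversubscribed network costs barely more than reading it locally.

DISK_MBps = 125        # assumed sequential read throughput of one SATA disk
NIC_MBps = 1250        # assumed 10 Gbps server uplink (~1.25 GB/s)
NET_OVERHEAD = 0.08    # assumed extra per-byte cost of a rack-local hop (~8%)

block_mb = 256         # one HDFS-style block

local_time = block_mb / DISK_MBps
remote_time = block_mb / min(DISK_MBps, NIC_MBps) * (1 + NET_OVERHEAD)

print(f"local read : {local_time:.2f} s")
print(f"remote read: {remote_time:.2f} s "
      f"({(remote_time / local_time - 1) * 100:.0f}% slower)")
```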
So What Now? • No need to worry about locality when doing placement • Placement can happen faster • Scheduling algorithms can be smaller/simpler • The network is as fast as a SATA disk, but still a lot slower than an SSD • If SSDs are used, then disk locality is a problem AGAIN! • However, SSDs are too costly to be used for all storage
Caching with Memory/SSD • 94% of all jobs have inputs that fit in memory • So the new problem is memory locality • Want to place a task where its input data is already cached in memory • Interesting challenges: • 46% of tasks read data that is never re-used • So we need to pre-fetch for these tasks • Current caching schemes are ineffective (a simple placement sketch follows)
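A minimal placement sketch, assuming a hypothetical scheduler that can see per-node cache contents (none of these structures come from a real system): prefer a node that already caches the task's input, otherwise fall back to prefetching, which is what the 46% of read-once inputs require:

```python
# Minimal sketch (hypothetical data structures, not a real scheduler API) of
# memory-locality-aware placement: prefer a node that already caches the
# task's input; otherwise fall back to any free node and ask it to prefetch.

def place_task(task_input, free_nodes, cache_contents):
    """cache_contents: node -> set of blocks currently held in memory."""
    # 1. Memory locality: a free node that already has the input cached.
    for node in free_nodes:
        if task_input in cache_contents.get(node, set()):
            return node, "cache hit"
    # 2. No cached copy: pick any free node and start prefetching the input
    #    (needed for the ~46% of inputs that are read exactly once).
    node = free_nodes[0]
    cache_contents.setdefault(node, set()).add(task_input)  # simulate prefetch
    return node, "prefetch"

# toy usage
free = ["n1", "n2", "n3"]
caches = {"n2": {"blockA"}}
print(place_task("blockA", free, caches))  # -> ('n2', 'cache hit')
print(place_task("blockB", free, caches))  # -> ('n1', 'prefetch')
```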
How Do You Build a FS That Ignores Locality? • FDS (Flat Datacenter Storage) from MSR ignores locality • Eliminate the networking problems so locality no longer matters • Eliminate the meta-data server problems to improve the throughput of the whole system
Meta-data Server • The current meta-data server (HDFS name-node) • Stores the mapping of chunks to servers • Central point of failure • Central bottleneck • Processing issue: every read/write must first consult the meta-data server • Storage issue: must store the location and size of EVERY chunk
FDS’s Meta-data Server • Only stores the list of servers • Smaller memory footprint: # servers <<< # chunks • Clients only interact with it at startup, not every time they read/write • # client startups <<<< # reads/writes • To determine where to read/write: consistent hashing over the server list • Read/write the data at the server at index Hash(GUID) mod #servers (sketch below)
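A minimal sketch of the slide's simplified lookup, assuming the client fetched the server list once at startup. FDS itself indexes a tract locator table using the blob GUID plus a tract number, so treat this as an illustration of the idea rather than the real protocol:

```python
# Minimal sketch: hash the blob GUID into the server list handed out at
# client startup, so no per-read/write metadata lookup is needed.
import hashlib

def server_for(guid: str, servers: list[str]) -> str:
    # Stable hash of the GUID -> index into the server list.
    h = int.from_bytes(hashlib.sha1(guid.encode()).digest()[:8], "big")
    return servers[h % len(servers)]

# Server list obtained from the metadata server once, at startup.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(server_for("blob-42", servers))   # every client computes the same answer
print(server_for("blob-43", servers))   # no metadata server on the data path
```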
Network Changes • Uses a VL2-style Clos network • Eliminates over-subscription + congestion • 1 TCP flow doesn’t saturate a server’s 10-gig NIC • Use ~5 TCP connections to saturate the link • With VL2 there is no congestion in the core, but there may be at the receiver • So the receiver controls the senders’ sending rates • The receiver sends rate-limiting messages to the senders (sketch below)
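A toy illustration of receiver-driven rate limiting, using my own simplified classes rather than FDS's actual mechanism: since the receiver's NIC is the only congestion point left in a non-oversubscribed Clos fabric, the receiver divides its capacity among the senders currently talking to it and tells each one how fast to send:

```python
# Illustrative sketch (a simplification, not FDS's actual protocol) of
# receiver-driven rate limiting.

NIC_GBPS = 10.0

class Receiver:
    def __init__(self):
        self.senders = set()

    def register(self, sender):
        self.senders.add(sender)
        self.broadcast_rates()

    def finish(self, sender):
        self.senders.discard(sender)
        self.broadcast_rates()

    def broadcast_rates(self):
        if not self.senders:
            return
        # Equal share of the receiver NIC across active senders.
        rate = NIC_GBPS / len(self.senders)
        for s in self.senders:
            print(f"rate-limit msg -> {s}: send at {rate:.2f} Gbps")

rx = Receiver()
rx.register("senderA")        # senderA gets 10 Gbps
rx.register("senderB")        # both throttled to 5 Gbps
rx.finish("senderA")          # senderB back to 10 Gbps
```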
Disk Locality Is Almost a Problem of the Past • Advances in networking eliminate over-subscription/congestion • FDS is a prototype file system that doesn’t need locality • Uses a VL2-style network • Eliminates the meta-data server bottleneck • New problem, new challenges: memory locality • New cache-replacement techniques • New pre-fetching schemes
Class Wrap-Up • What have we covered and learned? • The big-data stack • How to optimize each layer • What the challenges are in each layer • Whether there are opportunities to optimize across layers
Big-Data Stack: App Paradigms • Commodity devices shape the design of application paradigms • Hadoop: deals with failures • Addresses n/w over-subscription with rack-aware placement • Straggler detection and mitigation: restart (speculatively re-execute) tasks • Dryad: Hadoop for smarter programmers • Can create more expressive task DAGs (acyclic) • Can specify which vertices should run locally on the same machine • Dryad adds its own optimizations, e.g. extra vertices that do intermediate aggregation (toy DAG sketch below)
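A toy, hypothetical sketch of the Dryad idea of an explicit acyclic job graph with extra aggregation vertices. Dryad's real API is C++ and far richer; all names here are invented:

```python
# Toy representation of a Dryad-style job graph: an explicit acyclic DAG with
# per-rack aggregation vertices inserted between map and reduce, so the data
# crossing racks is already pre-aggregated (smaller).
from collections import defaultdict

edges = defaultdict(list)          # vertex -> downstream vertices

def connect(src, dst):
    edges[src].append(dst)

mappers = [f"map{i}" for i in range(6)]
aggregators = {"rack0": "agg0", "rack1": "agg1"}   # one aggregator per rack
for i, m in enumerate(mappers):
    rack = f"rack{i % 2}"
    connect(m, aggregators[rack])
for agg in aggregators.values():
    connect(agg, "reduce")

for v, outs in edges.items():
    print(v, "->", ", ".join(outs))
```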
[Stack figure: the course’s big-data stack (App paradigms, Sharing, Virt drawbacks, N/W paradigm, Tail latency, N/W sharing, SDN, Storage), with Hadoop and Dryad placed in the App layer]
Big-Data Stack: App Paradigms Revisited • User-visible services are complex and composed of multiple M-R jobs • FlumeJava & DryadLINQ • Delay execution until the output is actually required • This allows various optimizations • Writing output to HDFS between M-R jobs adds time, so eliminate HDFS between jobs • Programmers aren’t perfect and often include unnecessary steps • Knowing what the final output needs, you can eliminate the unnecessary work (lazy-evaluation sketch below)
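A minimal sketch of deferred execution in the FlumeJava/DryadLINQ spirit. The LazyCollection class is my own toy, not either system's API; it only shows that building a plan first lets the runtime fuse steps into one pass and skip work whose output is never requested:

```python
# Operations only record a plan; run() fuses the chain and executes it in one
# pass, so nothing is materialized between "jobs".

class LazyCollection:
    def __init__(self, data, ops=()):
        self.data, self.ops = data, list(ops)

    def map(self, fn):
        return LazyCollection(self.data, self.ops + [("map", fn)])

    def filter(self, pred):
        return LazyCollection(self.data, self.ops + [("filter", pred)])

    def run(self):
        out = []
        for x in self.data:               # single fused pass, no HDFS writes
            keep = True
            for kind, fn in self.ops:
                if kind == "map":
                    x = fn(x)
                elif kind == "filter" and not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

words = LazyCollection(["a", "bb", "ccc", "dddd"])
plan = words.map(len).filter(lambda n: n > 1).map(lambda n: n * 10)
unused = plan.map(str)        # never run() -> never executed (dead step costs nothing)
print(plan.run())             # [20, 30, 40]
```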
[Stack figure: FlumeJava and DryadLINQ added to the App layer]
Big-Data Stack: App Paradigms Revisited Yet Again • User-visible services require interactivity, so jobs need to be fast and should return results before processing completes • Hadoop Online: • Pipeline results from map to reduce before the map is done • Pipeline too early and the reducers must do the sorting, which increases processing overhead on the reduce side: BAD!!! • RDDs: Spark • Store data in memory: much faster than disk • Instead of executing immediately, build an abstract graph (lineage) of the processing and run it only when output is required • Allows for optimizations • Failure recovery is the challenge: lost partitions are recomputed from the lineage (sketch below)
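A toy sketch of the RDD idea, not Spark's API: each dataset records the transformation (lineage) that produced it, so a lost in-memory partition can be recomputed from its parent instead of being replicated on disk:

```python
class ToyRDD:
    def __init__(self, parent=None, fn=None, source=None):
        self.parent, self.fn, self.source = parent, fn, source
        self.cache = None                      # in-memory copy, may be lost

    def map(self, fn):
        return ToyRDD(parent=self, fn=fn)

    def compute(self):
        if self.cache is not None:
            return self.cache
        if self.source is not None:            # base data (e.g., from HDFS)
            data = list(self.source)
        else:                                  # recompute from lineage
            data = [self.fn(x) for x in self.parent.compute()]
        self.cache = data
        return data

base = ToyRDD(source=[1, 2, 3, 4])
squares = base.map(lambda x: x * x)
print(squares.compute())      # [1, 4, 9, 16], now cached in memory
squares.cache = None          # simulate losing the cached partition
print(squares.compute())      # recovered by replaying the lineage
```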
[Stack figure: Hadoop Online and Spark added to the App layer]
Big-Data Stack: Sharing Is Caring (How to Share a Non-Virtualized Cluster) • Sharing is good: you have too much data, and it costs too much to build many clusters for the same data • Need dynamic sharing: static partitioning wastes resources • Mesos: • Resource offers: give each application a choice of resources and let it pick; the app knows best • Omega: • Optimistic allocation: each scheduler picks resources independently; if there’s a conflict, Omega detects it and gives the resources to only one scheduler, and the others pick new resources • Even with conflicts this is much better than a single centralized scheduler (conflict-detection sketch below)
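A toy sketch of Omega-style optimistic allocation; the data structures and scheduler names are invented, not Omega's shared-state API. Schedulers plan in parallel against their own view, and a commit that conflicts with the shared state is rejected so only one scheduler wins:

```python
cluster = {"m1": "free", "m2": "free", "m3": "free"}   # shared cell state

def try_commit(scheduler, wanted):
    # Detect conflicts against the current shared state.
    conflicts = [m for m in wanted if cluster[m] != "free"]
    if conflicts:
        return False, conflicts              # loser retries with fresh state
    for m in wanted:
        cluster[m] = scheduler               # winner gets the machines
    return True, []

# Two schedulers planned optimistically against the same (stale) snapshot:
print(try_commit("batch-sched", ["m1", "m2"]))     # (True, [])
print(try_commit("service-sched", ["m2", "m3"]))   # (False, ['m2'])
print(try_commit("service-sched", ["m3"]))         # retry succeeds
print(cluster)
```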
[Stack figure: Mesos and Omega added to the Sharing layer]
Big-Data Stack: Sharing Is Caring (Cloud Sharing) • Clouds give the illusion of equality • H/W differences lead to different performance • Poor isolation: tenants can impact each other • I/O-bound and CPU-bound jobs can conflict
[Stack figure: BobTail, RFA, and Cloud Gaming added to the Virt-drawbacks layer]
Big-Data Stack: Better Networks • Networks give bad performance • Cause: congestion + over-subscription • VL2/PortLand • Eliminate over-subscription + congestion with commodity devices + ECMP load balancing (ECMP sketch below) • Helios/c-Through • Mitigate congestion by carefully adding extra capacity where it is needed
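A minimal sketch of ECMP-style flow spreading, with an invented hash function and uplink names (real switches hash the 5-tuple in hardware): each flow is pinned to one of several equal-cost uplinks, so flows spread across the fabric while packets within a flow stay on one path and avoid reordering:

```python
import hashlib

UPLINKS = ["core1", "core2", "core3", "core4"]   # equal-cost paths upward

def pick_uplink(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    # Hash the flow's 5-tuple to deterministically choose one uplink.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return UPLINKS[h % len(UPLINKS)]

print(pick_uplink("10.0.1.5", "10.0.9.7", 5123, 80))   # same flow -> same path
print(pick_uplink("10.0.1.5", "10.0.9.7", 5124, 80))   # new flow may take another path
```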
[Stack figure: VL2, PortLand, Hedera, c-Through, Helios, and MicroTE added to the N/W-paradigm layer]
Big-Data Stack: Tail Latency • When you need multiple servers to service a request, the tail dominates: if each server is fast 99% of the time, 0.99^100 ≈ 0.37, so roughly 63% of 100-server requests wait on at least one straggler (HORRIBLE; see the sketch below) • Duplicate requests: send the same request to 2 servers • At least one will finish within an acceptable time • Dolly: be smart when selecting the 2 servers • You don’t want I/O contention because that leads to bad performance • Avoid map clones using the same replicas • Avoid reducer clones reading the same intermediate output
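A quick check of that arithmetic, plus the effect of cloning, in Python. This is standard fan-out reasoning and assumes clones are independent, which is exactly the assumption Dolly protects by keeping clones off shared replicas and shared intermediate outputs:

```python
# If one server answers within its 99th-percentile latency 99% of the time,
# what fraction of requests that fan out to N servers see no straggler?

def prob_no_straggler(p_fast=0.99, n_servers=100, clones=1):
    # With k independent clones per sub-request, a sub-request is slow only
    # if *all* of its clones are slow: (1 - p_fast) ** clones.
    p_subreq_fast = 1 - (1 - p_fast) ** clones
    return p_subreq_fast ** n_servers

print(f"no cloning : {prob_no_straggler():.2f}")           # ~0.37 -> ~63% of requests stall
print(f"2 clones   : {prob_no_straggler(clones=2):.2f}")   # ~0.99 -> duplication tames the tail
```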
[Stack figure: Mantri and Dolly (clones) added to the Tail-latency layer]
Big-Data Stack: Network Sharing • How to share the network efficiently while making guarantees • ElasticSwitch • Two-level bandwidth-allocation system • Orchestra • M/R has barriers, and completion depends on a set of flows (a transfer), not individual flows • So make optimizations over a set of flows • HULL: trade bandwidth for latency • Want near-zero buffering, but TCP needs buffering • Limit traffic to ~90% of the link and use the remaining ~10% as headroom (sketch below)
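A toy illustration of HULL's bandwidth-for-latency trade, not its phantom-queue hardware: admit traffic only up to ~90% of line rate and signal senders to slow down beyond that, so the real queue, and therefore queuing latency, stays near zero:

```python
LINE_RATE = 10e9          # 10 Gbps link
HEADROOM = 0.10           # sacrifice ~10% of bandwidth
CAP = LINE_RATE * (1 - HEADROOM)

def simulate(offered_gbps, seconds=1.0):
    # Traffic above the cap generates early congestion signals instead of
    # building a standing queue on the real link.
    offered = offered_gbps * 1e9 * seconds          # bits offered
    admitted = min(offered, CAP * seconds)
    signaled = offered - admitted                   # bits that trigger "slow down"
    return admitted / 1e9, signaled / 1e9

print(simulate(9.5))   # above the cap: senders get told to back off early
print(simulate(8.0))   # below the cap: passes through, queue stays ~empty
```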
[Stack figure: Orchestra, ElasticSwitch, and HULL added to the N/W-sharing layer]
Big-Data Stack: Enter SDN • Remove the control plane from the switches and centralize it • Centralization == scalability challenges • NOX: how does a single controller scale to data centers? • How many controllers do you need? • How should you design these controllers? • Kandoo: a hierarchy (many local controllers and 1 root controller; locals handle local events and escalate the rest to the root) • ONIX: a mesh of controllers (state shared through a DHT or a DB) (sketch below)
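A toy sketch of the Kandoo-style split, with invented event names rather than Kandoo's API: frequent, switch-local events are absorbed by local controllers, and only events needing a network-wide view reach the single root controller:

```python
LOCAL_EVENTS = {"port_stats", "elephant_detection_probe"}   # handled locally

class RootController:
    def handle(self, event, switch):
        print(f"root: handling {event} from {switch} (global view needed)")

class LocalController:
    def __init__(self, name, root):
        self.name, self.root = name, root

    def handle(self, event, switch):
        if event in LOCAL_EVENTS:
            print(f"{self.name}: handled {event} from {switch} locally")
        else:
            self.root.handle(event, switch)    # escalate rare/global events

root = RootController()
local = LocalController("local-0", root)
local.handle("port_stats", "sw1")              # stays local -> root not loaded
local.handle("elephant_flow_reroute", "sw1")   # needs global view -> escalated
```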
[Stack figure: Kandoo and ONIX added to the SDN layer]
Big Data Stack: SDN + Big Data • FlowComb: • Detect application transfer patterns and have the SDN controller assign paths based on its knowledge of traffic patterns and contention • Sinbad: • HDFS writes are important, and their destinations are flexible • Let the SDN controller tell HDFS the best place to write data, based on its knowledge of network congestion (sketch below)
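A toy sketch of the Sinbad idea, with made-up utilization numbers and no real controller API: because an HDFS replica may legally land on many servers, consult the network's view of link utilization to pick the destination behind the least-congested downlink:

```python
link_utilization = {        # controller's view: rack downlink -> load (0..1)
    "rack1": 0.85,
    "rack2": 0.30,
    "rack3": 0.55,
}
candidates = {"rack1": ["s11", "s12"], "rack2": ["s21"], "rack3": ["s31"]}

def pick_write_destination():
    # Choose a server in the rack whose downlink is currently least utilized.
    best_rack = min(candidates, key=lambda r: link_utilization[r])
    return candidates[best_rack][0]

print(pick_write_destination())   # -> 's21' (rack2 is least congested)
```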
[Stack figure: FlowComb and Sinbad added to the SDN layer]
Big Data Stack: Distributed Storage • Ideal: nice API, low latency, scalable • Problem: H/W fails a lot, sits in a limited set of locations, and has limited resources • Partition: gives good performance • Cassandra: uses consistent hashing to partition the key space • Megastore: each partition == a small RDBMS with strong consistency guarantees • Replicate: multiple copies survive failures • Megastore: replicas also allow low-latency access (partition/replication sketch below)
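A minimal sketch of Cassandra-style partitioning plus replication using a generic consistent-hash ring (not Cassandra's actual partitioner): a key is owned by the next node clockwise on the ring and copied to the following nodes, so losing one node loses no data:

```python
import bisect
import hashlib

def ring_pos(name: str) -> int:
    # Map a node name or key to a position on the hash ring.
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")

nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = sorted((ring_pos(n), n) for n in nodes)
positions = [p for p, _ in ring]

def replicas_for(key: str, n_replicas: int = 3) -> list[str]:
    # Owner = first node clockwise from the key; replicas = next nodes on the ring.
    i = bisect.bisect(positions, ring_pos(key)) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(n_replicas)]

print(replicas_for("user:42"))    # owner plus two replicas
print(replicas_for("user:43"))    # different keys spread across different owners
```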
[Stack figure: Megastore and Cassandra added to the Storage layer]
Big Data Stack: Disk Locality Irrelevant • Disk locality is becoming irrelevant • Data is compressed, so transfer times are smaller • Networks are much faster (rack-local reads only ~8% slower than local) • Memory locality is the new challenge • The inputs of 94% of jobs fit in memory • Need new caching + prefetching schemes
[Final stack figure: the full big-data stack covered in the course] • App paradigms: Hadoop, Dryad, FlumeJava, DryadLINQ, Hadoop Online, Spark • Sharing: Mesos, Omega • Virt drawbacks: BobTail, RFA, Cloud Gaming • N/W paradigm: VL2, PortLand, Hedera, c-Through, Helios, MicroTE • Tail latency: Mantri, Dolly (clones) • N/W sharing: Orchestra, ElasticSwitch, HULL • SDN: Kandoo, ONIX, FlowComb, Sinbad • Storage: Megastore, Cassandra, FDS (disk locality irrelevant)