Presenters: Abhishek Verma, Nicolas Zea Cloud Computing - I
Cloud Computing • MapReduce • Clean abstraction • Extremely rigid 2-stage group-by aggregation • Code reuse and maintenance difficult • Google → MapReduce, Sawzall • Yahoo → Hadoop, Pig Latin • Microsoft → Dryad, DryadLINQ • Improving MapReduce in heterogeneous environments
MapReduce: A group-by-aggregate • [Figure: input records are split across map tasks, which emit (key, value) pairs such as (k1, v1) and (k2, v2); a local QSort and the shuffle bring all values for a key together; reduce tasks produce the output records]
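The group-by-aggregate pattern on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not how Hadoop or Google's MapReduce is implemented; the names `map_reduce`, `word_map`, and `count_reduce` are hypothetical, and an in-memory dictionary stands in for the sort and shuffle stages.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, map_fn, reduce_fn):
    """Run a toy, in-memory map -> shuffle -> reduce pipeline."""
    # Map: each input record yields zero or more (key, value) pairs.
    pairs = chain.from_iterable(map_fn(record) for record in records)

    # Shuffle: bring all values for the same key together.
    # (A real system sorts locally and moves data across the network.)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: aggregate the values of each key; keys are visited in sorted order.
    return [(key, reduce_fn(key, groups[key])) for key in sorted(groups)]


def word_map(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def count_reduce(key, values):
    # Sum the counts collected for one word.
    return sum(values)


word_counts = map_reduce(["a b a", "b c"], word_map, count_reduce)
```

Every computation has to be phrased as one `map_fn` and one `reduce_fn`, which is the rigidity the next slide criticizes.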
Shortcomings • Extremely rigid data flow • Other flows hacked in: stages, joins, splits [Figure: chains of M → R steps] • Common operations must be coded by hand • Join, filter, projection, aggregates, sorting, distinct • Semantics hidden inside map-reduce functions • Difficult to maintain, extend, and optimize
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins Yahoo! Research Pig Latin: A Not-So-Foreign Language for Data Processing
Pig Philosophy • Pigs Eat Anything • Can operate on data w/o metadata: relational, nested, or unstructured • Pigs Live Anywhere • Not tied to one particular parallel framework • Pigs Are Domestic Animals • Designed to be easily controlled and modified by its users • UDFs: transformation functions, aggregates, grouping functions, and conditionals • Pigs Fly • Processes data quickly (?)
Features • Dataflow language • Procedural : different from SQL • Quick Start and Interoperability • Nested Data Model • UDFs as First-Class Citizens • Parallelism Required • Debugging Environment
Pig Latin • Data Model • Atom: 'cs' • Tuple: ('cs', 'ece', 'ee') • Bag: { ('cs', 'ece'), ('cs') } • Map: [ 'courses' → { ('523', '525', '599') } ] • Expressions • Fields by position: $0 • Fields by name: f1 • Map lookup: #
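As a rough analogy only, the four data types can be mirrored with Python built-ins. The mapping below is an assumption made for illustration (Pig does not use Python types), and the variable names are hypothetical.

```python
# Atom: a simple value.
atom = "cs"

# Tuple: an ordered sequence of fields.
tup = ("cs", "ece", "ee")

# Bag: a collection of tuples; tuples may differ in length and may repeat.
bag = [("cs", "ece"), ("cs",)]

# Map: keys mapped to arbitrary data items, here a bag holding one tuple.
mapping = {"courses": [("523", "525", "599")]}

# Field by position, the analogue of $0.
first_field = tup[0]

# Map lookup, the analogue of the # operator.
courses = mapping["courses"]
```

The nesting (a bag inside a map) is the point: Pig Latin values do not have to be flattened into first normal form.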
Example Data Analysis Task • Find the top 10 most visited pages in each category • Inputs: Visits, URL Info
URL Info:
URL | Category | PageRank
cnn.com | News | 0.9
bbc.com | News | 0.8
flickr.com | Photos | 0.7
espn.com | Sports | 0.9
Data Flow • Load Visits → Group by url → Foreach url generate count • Load Url Info • Join on url → Group by category → Foreach category generate top10 urls
In Pig Latin
visits = load '/data/visits' as (user, url, time);
gVisits = group visits by url;
visitCounts = foreach gVisits generate url, count(visits);
urlInfo = load '/data/urlInfo' as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;
gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts,10);
store topUrls into '/data/topUrls';
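To make the semantics of the script concrete, here is a minimal in-memory Python sketch of the same data flow. It is an illustration under assumptions, not what Pig executes: the function name `top_urls_per_category` and the sample `visits` rows are hypothetical, and ties are broken alphabetically by URL, which the slides do not specify.

```python
from collections import Counter, defaultdict


def top_urls_per_category(visits, url_info, n=10):
    """Return {category: [(url, visit_count), ...]} with at most n urls each."""
    # group visits by url; foreach url generate count
    visit_counts = Counter(url for _user, url, _time in visits)

    # join on url (inner join), then group by category
    by_category = defaultdict(list)
    for url, category, _p_rank in url_info:
        if url in visit_counts:
            by_category[category].append((url, visit_counts[url]))

    # foreach category generate the n most visited urls
    return {
        category: sorted(rows, key=lambda row: (-row[1], row[0]))[:n]
        for category, rows in by_category.items()
    }


# URL Info rows from the example slide; the visits are made up for this sketch.
url_info = [
    ("cnn.com", "News", 0.9),
    ("bbc.com", "News", 0.8),
    ("flickr.com", "Photos", 0.7),
    ("espn.com", "Sports", 0.9),
]
visits = [
    ("amy", "cnn.com", "8:00"),
    ("amy", "bbc.com", "10:00"),
    ("fred", "cnn.com", "12:00"),
    ("amy", "flickr.com", "10:05"),
]
top = top_urls_per_category(visits, url_info, n=1)
```

Each Python step corresponds to one statement of the script, which is the sense in which Pig Latin is a step-by-step dataflow language rather than a single declarative query.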
Quick Start and Interoperability • Operates directly over files, e.g. load '/data/visits' and store topUrls into '/data/topUrls' in the script above
Optional Schemas • Schemas are optional and can be assigned dynamically, e.g. as (user, url, time) in the script above
UDFs as First-Class Citizens • UDFs can be used in every construct, e.g. count(visits) and top(visitCounts,10) in the script above
Operators • LOAD: specifying input data • FOREACH: per-tuple processing • FLATTEN: eliminate nesting • FILTER: discarding unwanted data • COGROUP: getting related data together • GROUP, JOIN • STORE: asking for output • Other: UNION, CROSS, ORDER, DISTINCT
Compilation into MapReduce • Every group or join operation forms a map-reduce boundary • Other operations pipelined into map and reduce phases • [Figure: the example data flow split into three map-reduce jobs (Map1/Reduce1, Map2/Reduce2, Map3/Reduce3), with the boundaries at Group by url, Join on url, and Group by category]
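The boundary rule can be sketched as a small planner over a linearized operator list. This is a simplification under assumptions, not Pig's compiler: the real plan is a graph rather than a list, and the function `plan_jobs`, the operator strings, and the rule that a `load` always starts the next map phase are all hypothetical.

```python
BOUNDARY_OPS = {"group", "cogroup", "join"}


def plan_jobs(operators):
    """Split a linear list of operator strings into map-reduce jobs."""
    jobs = []
    map_ops = []  # operators waiting for the next boundary
    for op in operators:
        kind = op.split()[0].lower()
        if kind in BOUNDARY_OPS:
            # A group or join closes the current map phase and starts a shuffle.
            jobs.append({"map": map_ops, "shuffle": op, "reduce": []})
            map_ops = []
        elif jobs and kind != "load":
            # Other operators are pipelined into the preceding reduce phase.
            jobs[-1]["reduce"].append(op)
        else:
            # Loads, and anything before the first boundary, run in a map phase.
            map_ops.append(op)
    if map_ops:
        # Trailing operators with no boundary form a map-only job.
        jobs.append({"map": map_ops, "shuffle": None, "reduce": []})
    return jobs


example_flow = [
    "load visits",
    "group by url",
    "foreach url generate count",
    "load urlInfo",
    "join on url",
    "group by category",
    "foreach category generate top10",
]
jobs = plan_jobs(example_flow)
```

For the example flow this yields three jobs, matching the three map-reduce pairs in the figure.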
Debugging Environment • Write-run-debug cycle • Sandbox dataset • Objectives: • Realism • Conciseness • Completeness • Problems: • UDFs
Future Work • Optional “safe” query optimizer • Performs only high-confidence rewrites • User interface • Boxes and arrows UI • Promote collaboration, sharing code fragments and UDFs • Tight integration with a scripting language • Use loops, conditionals of host language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey DryadLINQ: A System for General Purpose Distributed Data-Parallel Computing Using a High-Level Language
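Dryad System Architecture • [Figure: in the control plane, the job manager uses the name server (NS) and the per-machine daemons (PD) to schedule the job's vertices (V) on the cluster; in the data plane, vertices exchange data over files, TCP, FIFO, and the network]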
LINQ
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);
var results = from c in collection
              where IsLegal(c.key)
              select new { Hash = Hash(c.key), c.value };
DryadLINQ Constructs • [Figure: a collection of C# objects divided into partitions] • Partitioning: Hash, Range, RoundRobin • Apply, Fork • Hints
Dryad + LINQ = DryadLINQ • [Figure: the LINQ query over collection (where IsLegal(c.key), select Hash(c.key) and c.value) is compiled into a query plan (a Dryad job) whose vertices run generated C# vertex code over the partitions of the data collection to produce results]
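DryadLINQ Execution Overview • [Figure: on the client machine, a C# program builds a query expression and invokes ToDryadTable; DryadLINQ turns the query into a distributed query plan; in the data center, the job manager (JM) runs the Dryad execution from input tables to output tables; the output DryadTable returns the results to the client as C# objects, consumed with foreach]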
System Implementation • LINQ expressions converted to execution plan graph (EPG) • similar to database query plan • DAG • annotated with metadata properties • EPG is skeleton of Dryad DFG • as long as native operations are used, properties can propagate helping optimization
Static Optimizations • Pipelining • Multiple operations in a single process • Removing redundancy • Eager Aggregation • Move aggregations in front of partitionings • I/O Reduction • Try to use TCP and in-memory FIFO instead of disk space
Dynamic Optimizations • As information from job becomes available, mutate execution graph • Dataset size based decisions • Intelligent partitioning of data
Dynamic Optimizations • Aggregation can be turned into a tree to improve I/O based on locality • Example: part of the computation is done locally, then aggregated before being sent across the network
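The idea of aggregating locally and then combining partial results in a tree can be sketched as follows. This is a minimal single-process illustration, not DryadLINQ's implementation: the functions `local_aggregate`, `merge`, and `tree_aggregate`, the `fan_in` parameter, and the sample partitions are hypothetical, and counting stands in for any associative aggregation.

```python
from collections import Counter


def local_aggregate(partition):
    """Aggregate one partition where it lives, before anything is sent."""
    return Counter(partition)


def merge(partials):
    """Combine several partial aggregates into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def tree_aggregate(partials, fan_in=2):
    """Merge partial aggregates level by level instead of all at one node."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else Counter()


partitions = [["a", "b", "a"], ["b"], ["c", "a"]]
partials = [local_aggregate(p) for p in partitions]
totals = tree_aggregate(partials)

# Records that would cross the network with and without local aggregation.
raw_records = sum(len(p) for p in partitions)
shipped_records = sum(len(p) for p in partials)
```

Because each partition ships at most one record per distinct key, the data moved never exceeds the raw record count, and it shrinks further when keys repeat within a partition.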
Evaluation • TeraSort - scalability • 240-computer cluster of 2.6 GHz dual-core AMD Opterons • Sort 10 billion 100-byte records on 10-byte key • Each computer stores 3.87 GB
Evaluation • DryadLINQ vs Dryad - SkyServer • Dryad version is hand optimized • No dynamic optimization overhead • DryadLINQ version is about 10% of the length of the native code
Main Benefits • High level and data type transparent • Automatic optimization friendly • Manual optimizations using Apply operator • Leverage any system running LINQ framework • Support for interacting with SQL databases • Single computer debugging made easy • Strong typing, narrow interface • Deterministic replay execution
Discussion • Dynamic optimizations appear data intensive • What kind of overhead? • EPG analysis overhead -> high latency • No real comparison with other systems • Progress tracking is difficult • No speculation • Will Solid State Drives diminish advantages of MapReduce? • Why not use Parallel Databases? • MapReduce Vs Dryad • How different from Sawzall and Pig?
Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University of California at Berkeley Improving MapReduce Performance in Heterogeneous Environments
Hadoop Speculative Execution Overview • Speculative tasks are executed only if no failed or waiting tasks are available • Notion of progress • 3 phases of execution • Copy phase • Sort phase • Reduce phase • Each phase weighted by % of data processed • Determines whether a task has failed or is a straggler and available for speculation
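The progress notion for a reduce task can be sketched as below. It assumes each of the three phases contributes an equal third of the score, which is consistent with the 1/3 jump mentioned on a later slide but is still an assumption here; the function name `reduce_progress_score` is hypothetical and this is not Hadoop's actual code.

```python
PHASES = ("copy", "sort", "reduce")


def reduce_progress_score(phase, fraction_done):
    """Progress in [0, 1] for a reduce task currently in the given phase.

    fraction_done is the share of the current phase's data already processed.
    Assumption: every phase is worth an equal 1/3 of the total score.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    completed_phases = PHASES.index(phase)
    return (completed_phases + fraction_done) / len(PHASES)


after_copy = reduce_progress_score("copy", 1.0)
mid_sort = reduce_progress_score("sort", 0.5)
finished = reduce_progress_score("reduce", 1.0)
```

A task that has copied all of its input scores only 1/3 under this scheme, however long the copy took, which is why the equal-phase assumption matters in the slides that follow.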
Hadoop’s Assumptions • Nodes can perform work at exactly the same rate • Tasks progress at a constant rate throughout time • There is no cost to launching a speculative task on an idle node • The three phases of execution take approximately same time • Tasks with a low progress score are stragglers • Maps and Reduces require roughly the same amount of work
Breaking Down the Assumptions • Virtualization breaks down homogeneity • Amazon EC2 - multiple VMs on the same physical host • Compete for memory/network bandwidth • Ex: two map tasks can compete for disk bandwidth, causing one to be a straggler
Breaking Down the Assumptions • Progress threshold in Hadoop is fixed and assumes low progress = faulty node • Too many speculative tasks executed • Speculative execution can harm running tasks
Breaking Down the Assumptions • A task's phases are not equal • Copy phase typically the most expensive due to network communication cost • Many tasks jump rapidly from 1/3 progress to 1, creating fake stragglers • Real stragglers get usurped • Unnecessary copying due to fake stragglers • Fixed progress threshold means any task with progress > 80% is never speculatively executed
LATE Scheduler • Longest Approximate Time to End • Primary assumption: best task to speculatively execute is the one that will finish furthest into the future • Secondary: tasks make progress at approx. constant rate • Progress Rate = ProgressScore / T • T = time the task has run for • Estimated time to completion = (1 - ProgressScore) / ProgressRate
LATE Scheduler • Launch speculative tasks on fast nodes • best chance to overcome straggler vs using first available node • Cap on total number of speculative tasks • 'Slowness' minimum threshold • Does not take into account data locality
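The time-to-end estimate and the choice of which task to speculate on can be sketched as follows. This is a minimal illustration of the heuristic as stated on the slides, not the authors' scheduler: the function names, the task dictionaries, and the `speculative_cap` parameter are hypothetical, and the fast-node and slowness-threshold checks are left out.

```python
def estimated_time_left(progress_score, elapsed):
    """Estimate remaining time, assuming a constant progress rate."""
    if elapsed <= 0 or progress_score <= 0:
        return float("inf")  # no progress observed yet, so no finite estimate
    progress_rate = progress_score / elapsed
    return (1.0 - progress_score) / progress_rate


def pick_speculative_task(tasks, running_speculative, speculative_cap):
    """Return the id of the unfinished task expected to finish last, or None."""
    if running_speculative >= speculative_cap:
        return None  # cap on the total number of speculative tasks reached
    candidates = [t for t in tasks if t["progress"] < 1.0]
    if not candidates:
        return None
    slowest = max(
        candidates,
        key=lambda t: estimated_time_left(t["progress"], t["elapsed"]),
    )
    return slowest["id"]


# Hypothetical running tasks: progress score and seconds elapsed.
tasks = [
    {"id": "a", "progress": 0.9, "elapsed": 90.0},
    {"id": "b", "progress": 0.5, "elapsed": 100.0},
    {"id": "c", "progress": 0.2, "elapsed": 10.0},
]
choice = pick_speculative_task(tasks, running_speculative=0, speculative_cap=2)
```

In this sample, task c has the lowest progress score but started recently and is moving fast, so the estimate picks task b instead. That difference between "lowest progress" and "latest finish" is the core of the heuristic.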
Performance Comparison Without Stragglers • EC2 test cluster • 1.0-1.2 GHz Opteron/Xeon w/ 1.7 GB mem • [Figure: Sort]
Performance Comparison With Stragglers • Manually slowed down 8 VMs with background processes • [Figure: Sort]
Performance Comparison With Stragglers • [Figures: WordCount, Grep]
Takeaways • Make decisions early • Use finishing times • Nodes are not equal • Resources are precious
Further questions • Is focusing the work on small VMs fair? • Would it be better to pay for a large VM and implement a system with more customized control? • Could this be used in other systems? • Progress tracking is key • Is this a fundamental contribution? Or just an optimization? • “Good” research?