CMSC 34702 ML for Cluster Scheduling (1) Junchen Jiang October 3, 2019
Logistics • Sign up on Piazza • https://piazza.com/class/k15fawsrzma6ow • Choose your paper to present • Paper review format: • Paper summary (Three sentences or less about the main idea, approach, or contribution.) • Why should we accept the paper? (Please give 1-3 sentences for the 1-3 strongest things about the paper.) • Why should we not accept the paper? (Please give 1-3 sentences about the 1-3 things about the paper that would most improve it.)
MapReduce: Simplified Data Processing on Large Clusters CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics
Scaling up vs. Scaling out: Origin of Cloud Computing • Scale-up: high-end servers (Sun Starfire, Enterprise, …; ~$1 million apiece), used by eBay, Amazon, … • Scale-out: many "Commercial Off-The-Shelf" (COTS) computers (Google had about 15,000 of them c. 2004)
Price/Performance Comparison (c. 2004) Higher performance and cheaper! Too good to be true?
Disadvantages of a cluster of COTS nodes? (Diagram: a rack of COTS computers vs. a high-end server, comparing CPU, RAM, and disk)
New problems in distributed/cluster computing • Fault tolerance • Network traffic • Data consistency • Programming complexity • …
Cluster Computing Needs a Software Stack A typical software analytics stack (Google / Hadoop / Berkeley): • Processing: MapReduce / Hadoop MapReduce / Spark • Database: Bigtable / HBase / Shark • Data management: The Google File System (GFS) / Hadoop File System (HDFS) / Alluxio • Resource management: Borg / YARN / Mesos
MapReduce: Simplified Data Processing on Large Clusters Cluster computing is popular, but it is hard to write complex, high-performance programs for it. The first system to provide an expressive programming interface that automatically handles and optimizes low-level system details.
Why is parallelization difficult? If the initial state is x = 6, y = 0, what happens when these threads finish running?
Thread 1: void foo() { x++; y = x; }
Thread 2: void bar() { y++; x += 3; }
Multithreading = Unpredictability (from https://www.youtube.com/watch?v=-vD6PUdf3Js)
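A minimal Python version of the same race (my own sketch, not from the slides): with x = 6 and y = 0, the final values depend on how the two threads interleave.

```python
import threading

x, y = 6, 0

def foo():
    global x, y
    x += 1      # read-modify-write on shared state
    y = x

def bar():
    global x, y
    y += 1
    x += 3

t1, t2 = threading.Thread(target=foo), threading.Thread(target=bar)
t1.start(); t2.start()
t1.join(); t2.join()

# Depending on the interleaving, (x, y) can end up as (10, 8), (10, 10), (10, 7), ...
# (CPython's GIL often hides the race on functions this small, but the outcome is
#  still scheduler-dependent; there is no single "correct" answer.)
print(x, y)
```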
Functional Programming (Diagram: imperative updates to shared x and y, i.e., x++, y = x, y++, x += 3, vs. pure functions f: X → A, f: Y → B) • Imperative: states can change (not idempotent); too many interdependent variables • Functional: no mutable variables, no changing state, no side effects
Key Functional Programming ops: map & fold (Diagram: map applies f to each of X, Y, Z independently, yielding X', Y', Z'; fold combines X, Y, Z with f into a single result)
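A quick illustration of each in plain Python (my own sketch, not from the slides):

```python
from functools import reduce

xs = [1, 2, 3, 4]

# map: apply f to every element independently (trivially parallelizable)
squares = list(map(lambda x: x * x, xs))            # [1, 4, 9, 16]

# fold (reduce): combine elements pairwise into a single result
total = reduce(lambda acc, x: acc + x, squares, 0)  # 30
```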
MapReduce: An instantiation of "map" & "fold" • map: (key_1, val_1) → (key_a, val_11), (key_b, val_12); (key_2, val_2) → (key_b, val_21), (key_c, val_22) • reduce (group by key): (key_a, R([val_11])), (key_b, R([val_12, val_21])), (key_c, R([val_22])) "MapReduce: Simplified Data Processing on Large Clusters", Jeff Dean, et al, OSDI'04
Example: Count word occurrences • map: (URL1, "personal computer") → ("personal", 1), ("computer", 1); (URL2, "computer science") → ("computer", 1), ("science", 1) • reduce: ("personal", 1), ("computer", 2), ("science", 1) "MapReduce: Simplified Data Processing on Large Clusters", Jeff Dean, et al, OSDI'04
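A minimal, single-machine word-count sketch in Python (my own illustration; map_fn and reduce_fn are hypothetical names, and a real MapReduce runtime runs them in parallel on many workers and performs the grouping/shuffle for you):

```python
from collections import defaultdict

def map_fn(url, text):
    # emit (word, 1) for every word in the document
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    # sum all partial counts for one word
    return word, sum(counts)

docs = [("URL1", "personal computer"), ("URL2", "computer science")]

# "shuffle": group intermediate pairs by key
groups = defaultdict(list)
for url, text in docs:
    for word, count in map_fn(url, text):
        groups[word].append(count)

print([reduce_fn(w, c) for w, c in groups.items()])
# [('personal', 1), ('computer', 2), ('science', 1)]
```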
Rationale behind the MapReduce Interface: A Minimalist Approach • Applications & data analytics algorithms: Google Search, machine learning, graph mining, grep, sort, word counting, … • Without it, application developers must master all the intricacies of resource & communication management themselves • The narrow Map & Reduce interface sits between the applications and the cluster computing system (the MapReduce system) Can you think of another example of the minimalist approach?
What’s the contribution of the MapReduce System? Make it easier to write parallel programs “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
What’s the contribution of the MapReduce System? Make it easier to write parallel programs An implementation of the interface that achieves high performance • Fault tolerance • Data locality • Load balancing • Straggler mitigation • Consistency • Data integrity “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
System Architecture “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
Performance: Data locality Co-locate workers with the data Co-locate reducers with mappers “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
Performance: Speeding up “Reducer” with “Combiner” When can “Combiner” help? “MapReduce: Simplified Data Processing on Large Clusters”, Jeff Dean, et al, OSDI’04
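A combiner is essentially a local reduce run on each map worker before the shuffle; it helps when the reduce function is associative and commutative (e.g., summing counts), because partial results can be merged early and far less data crosses the network. A rough sketch (my own, with hypothetical helper names):

```python
from collections import Counter

def map_fn(url, text):
    return [(word, 1) for word in text.split()]

def combine(pairs):
    # local, per-worker reduce: ("the", 1) x 1000 -> ("the", 1000)
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

# Each map worker ships combine(map_fn(...)) instead of raw (word, 1) pairs,
# so the reducers receive far fewer intermediate records over the network.
```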
Fault Tolerance What if a map worker fails? Re-execute its in-progress and completed map tasks (completed map output lives on the failed worker's local disk) "MapReduce: Simplified Data Processing on Large Clusters", Jeff Dean, et al, OSDI'04
Fault Tolerance What if a reduce worker fails? Re-execute only its in-progress reduce tasks (completed reduce output is already in the global file system) "MapReduce: Simplified Data Processing on Large Clusters", Jeff Dean, et al, OSDI'04
Fault Tolerance What if the master fails? Expose the failure to the user (master failure is rare; the client can simply retry the job) "MapReduce: Simplified Data Processing on Large Clusters", Jeff Dean, et al, OSDI'04
MapReduce Summary • A minimalist approach • Many problems are easily expressible with the MapReduce primitives • Greatly simplifies fault tolerance & performance optimization • (Almost) completely transparent fault tolerance at large scale • Dramatically eases the burden on programmers • Users still need to step in in some cases…
“Hyperparameters” of a cluster/cloud job • How many physical machines? • How much RAM and how many CPUs per machine? • How much disk space? • How much network bandwidth? • …
CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics Cloud performance is sensitive to configurations, but there is no existing way to pick configurations optimally, quickly, and adaptively for arbitrary cloud jobs. The first systematic technique to meet these requirements, by modeling the performance-configuration relationship with a black-box ML technique.
Large space of cloud configurations: Providers × Machine Types × Cluster Sizes • Amazon AWS: r3.8xlarge, i2.8xlarge, m4.8xlarge, c4.8xlarge, r4.8xlarge, c3.8xlarge, … • Microsoft Azure: A0, A1, A2, A3, A11, A12, D1, D2, D3, … • Google Cloud: n1-standard-4, n1-highmem-2, n1-highcpu-4, … • Cluster sizes: 10s of options "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
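A toy enumeration (hypothetical choices, not the paper's search space) showing how quickly the space multiplies:

```python
from itertools import product

machine_types = ["r4.2xlarge", "c4.2xlarge", "m4.2xlarge", "i2.2xlarge"]
cluster_sizes = range(4, 33, 4)          # 4, 8, ..., 32 nodes
disk_types    = ["hdd", "ssd"]

configs = list(product(machine_types, cluster_sizes, disk_types))
print(len(configs))  # 4 * 8 * 2 = 64 candidate configurations, for one provider alone
```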
Good configuration = high performance & low cost (Figure: performance vs. cost across 66 configurations) "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
Complex performance-configuration relationship “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
How to find the best cloud configuration, i.e., the one that minimizes cost subject to a performance constraint, for a recurring job with a representative workload? "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
Key metrics of success “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
Strawmen • Exhaustive search • High overhead • Coordinate search • Optimize one configuration dimension at a time (CPU, RAM, disk, network, etc.) • Not accurate (performance/cost curves are non-convex across resources) • Ernest [NSDI’16] • Learns a model for each job type • Not adaptive "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI’17
Why CherryPick “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
Basic idea: Black-box modeling (Flowchart: start with any config → run the config → black-box modeling updates the config-performance model → choose the next config, or return the best config when done) "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
Insight: No need to be accurate everywhere How about a model that predicts performance for any given configuration? That would be more than we need. Insight: all we need is the top of the ranking (which configuration is better). "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
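A toy illustration of the ranking insight (hypothetical numbers): a surrogate model whose absolute predictions are far off can still identify the cheapest configuration.

```python
true_cost = {"A": 100, "B": 140, "C": 180}   # measured $ per job (hypothetical)
model_est = {"A": 60,  "B": 95,  "C": 120}   # model is off by ~40% everywhere

# Absolute predictions are poor, but the ranking (and hence the argmin) agrees.
assert min(true_cost, key=true_cost.get) == min(model_est, key=model_est.get)
```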
Bayesian Optimization “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
Bayesian Optimization “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
How to pick the next configuration? “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
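A rough sketch (my own simplification, not the paper's code) of a Bayesian-optimization loop that answers this question: fit a Gaussian-process surrogate to the configurations tried so far, then pick the candidate that maximizes expected improvement. The configuration encoding and run_job are hypothetical placeholders, and the paper's actual acquisition function, cost model, and stopping rule are richer.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt(configs, run_job, n_init=3, n_iters=6):
    """configs: (N, d) array of encoded configurations; run_job(c) returns measured cost."""
    idx = np.random.choice(len(configs), n_init, replace=False)
    X = [configs[i] for i in idx]
    y = [run_job(c) for c in X]                      # bootstrap with a few real cloud runs
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(n_iters):
        gp.fit(np.array(X), np.array(y))
        mu, sigma = gp.predict(configs, return_std=True)
        best = min(y)
        imp = best - mu                              # predicted improvement over best cost so far
        z = imp / (sigma + 1e-9)
        ei = imp * norm.cdf(z) + sigma * norm.pdf(z) # expected improvement: exploit vs. explore
        nxt = configs[int(np.argmax(ei))]
        X.append(nxt)
        y.append(run_job(nxt))                       # one more (expensive) measurement
    return X[int(np.argmin(y))]                      # cheapest configuration found
```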
Why does CherryPick work? "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
Does the "black box" behave reasonably? "CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics", Omid Alipourfard, et al, NSDI'17
Conclusion “CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics”, Omid Alipourfard, et al, NSDI’17
Reminder • Sign up on Piazza (paper summaries need to be posted there) • Project proposal idea due in 12 days