Machine Learning in the Cloud Carlos Guestrin Joe Hellerstein David O’Hallaron Yucheng Low Aapo Kyrola Danny Bickson Joey Gonzalez
Machine Learning in the Real World 13 Million Wikipedia Pages 500 Million Facebook Users 3.6 Billion Flickr Photos 24 Hours of Video Uploaded to YouTube Every Minute
Parallelism is Difficult • Wide array of different parallel architectures: GPUs, Multicore, Clusters, Clouds, Supercomputers • Different challenges for each architecture • High-level abstractions make things easier
MapReduce – Map Phase [Figure: input values such as 12.9, 42.3, 21.3, 25.8 are processed independently by CPUs 1–4 across successive animation frames] Embarrassingly Parallel: independent computation, no communication needed
MapReduce – Reduce Phase [Figure: CPUs 1–2 aggregate the mapped values into two partial results] Fold/Aggregation
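As a concrete illustration, here is a minimal sequential sketch of the map/fold pattern shown in the figures, written in plain C++ rather than any MapReduce framework; the doubling map function and the input values are illustrative only.

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
      // Input records, as in the figure (values are illustrative).
      std::vector<double> inputs = {12.9, 42.3, 21.3, 25.8};

      // Map phase: embarrassingly parallel; each element is transformed
      // independently, so no communication is needed.
      std::vector<double> mapped(inputs.size());
      std::transform(inputs.begin(), inputs.end(), mapped.begin(),
                     [](double x) { return 2.0 * x; });  // illustrative map

      // Reduce phase: fold/aggregation of the mapped values.
      double total = std::accumulate(mapped.begin(), mapped.end(), 0.0);
      std::cout << "total = " << total << "\n";
      return 0;
    }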
MapReduce and ML • Excellent for large data-parallel tasks! [Figure: two columns. Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. Complex Parallel Structure: ?] Is there more to Machine Learning?
Iterative Algorithms? • We can implement iterative algorithms in MapReduce: [Figure: data partitions spread across CPUs 1–3, with a barrier between iterations; a slow processor makes everyone wait at the barrier]
Iterative MapReduce • System is not optimized for iteration: [Figure: every iteration pays a startup penalty and a disk penalty on each CPU]
Iterative MapReduce • Only a subset of data needs computation (multi-phase iteration): [Figure: only a few data blocks change per iteration, yet every barrier-synchronized phase touches all of the data]
Structured Problems Example problem: Will I be successful in research? Success depends on the success of others. Interdependent computation is not map-reducible: we may not be able to safely update neighboring nodes simultaneously [e.g., Gibbs sampling]
Space of Problems • Sparse Computation Dependencies • Can be decomposed into local “computation-kernels” • Asynchronous Iterative Computation • Repeated iterations over local kernel computations
Parallel Computing and ML • Not all algorithms are efficiently data-parallel [Figure: two columns. Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. Structured Iterative Parallel (GraphLab): Tensor Factorization, Lasso, Kernel Methods, Belief Propagation, Learning Graphical Models, SVM, Sampling, Deep Belief Networks, Neural Networks]
GraphLab Goals • Designed for ML needs • Express data dependencies • Iterative • Simplifies the design of parallel programs: • Abstract away hardware issues • Addresses multiple hardware architectures • Multicore • Distributed • GPU and others
GraphLab Goals [Figure: chart of model complexity vs. data size. Today's data-parallel tools handle simple models on large data; GraphLab's goal is to support complex models on large data]
GraphLab A Domain-Specific Abstraction for Machine Learning
Everything on a Graph A graph with data associated with every vertex and edge
Update Functions Update Functions: operations applied to a vertex that transform the data in the scope of that vertex (the vertex itself, its adjacent edges, and its neighbors)
Update Functions An update function can schedule the computation of any other update function: FIFO scheduling, prioritized scheduling, randomized scheduling, etc. Scheduled computation is guaranteed to execute eventually.
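A minimal single-threaded sketch of this "update function + scheduler" model (toy types and names, not the actual GraphLab API): the engine drains a FIFO queue of scheduled vertices, and the update function transforms one vertex's data and may schedule further work.

    #include <iostream>
    #include <queue>
    #include <vector>

    struct Graph {
      std::vector<double> vertex_data;          // data on each vertex
      std::vector<std::vector<int>> neighbors;  // adjacency lists
    };

    // An update function: transforms the data in the scope of vertex v
    // and may schedule any other vertex for later recomputation.
    void decay_update(Graph& g, int v, std::queue<int>& scheduler) {
      g.vertex_data[v] *= 0.5;                  // transform data in scope
      if (g.vertex_data[v] > 0.01)              // not yet converged:
        for (int nbr : g.neighbors[v])          // schedule the neighbors
          scheduler.push(nbr);
    }

    int main() {
      Graph g{{8.0, 4.0}, {{1}, {0}}};          // two mutually adjacent vertices
      std::queue<int> scheduler;                // FIFO scheduling
      scheduler.push(0);
      scheduler.push(1);
      while (!scheduler.empty()) {              // the execution engine:
        int v = scheduler.front();              // scheduled work is
        scheduler.pop();                        // guaranteed to run eventually
        decay_update(g, v, scheduler);
      }
      std::cout << g.vertex_data[0] << " " << g.vertex_data[1] << "\n";
    }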
Example: PageRank Graph = WWW. Update function: multiply adjacent PageRank values with edge weights and combine them to get the current vertex's PageRank. "Prioritized" PageRank computation: skip converged vertices.
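A hedged single-threaded sketch of this update function on a three-vertex toy web graph (the graph, weights, and constants are illustrative, not GraphLab's implementation): each pop recomputes one vertex's rank from its in-neighbors, and neighbors are rescheduled only when the rank changed noticeably, which is exactly how converged vertices get skipped.

    #include <cmath>
    #include <iostream>
    #include <queue>
    #include <vector>

    struct Edge { int src; double weight; };

    int main() {
      const double kDamping = 0.85, kTolerance = 1e-4;
      // in_edges[v] lists the edges pointing into vertex v (toy graph).
      std::vector<std::vector<Edge>> in_edges = {
          {{1, 1.0}}, {{0, 0.5}, {2, 1.0}}, {{0, 0.5}}};
      std::vector<std::vector<int>> out_neighbors = {{1, 2}, {0}, {1}};
      std::vector<double> rank(3, 1.0);

      std::queue<int> scheduler;
      for (int v = 0; v < 3; ++v) scheduler.push(v);

      while (!scheduler.empty()) {
        int v = scheduler.front();
        scheduler.pop();
        // Multiply adjacent PageRank values with edge weights and sum.
        double sum = 0.0;
        for (const Edge& e : in_edges[v]) sum += e.weight * rank[e.src];
        double new_rank = (1.0 - kDamping) + kDamping * sum;
        double change = std::fabs(new_rank - rank[v]);
        rank[v] = new_rank;
        if (change > kTolerance)                // skip converged vertices
          for (int nbr : out_neighbors[v]) scheduler.push(nbr);
      }
      for (double r : rank) std::cout << r << "\n";
    }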
Example: K-Means Clustering Graph = (fully connected) bipartite graph between data vertices and cluster vertices. Update functions: Cluster update: compute the average of the data connected on a "marked" edge. Data update: pick the closest cluster, mark that edge, and unmark the remaining edges.
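A compact sketch of these two update functions (toy code; the "marked edge" of the bipartite graph is represented here by an assignment index per data vertex, and the 1-D data is illustrative).

    #include <cmath>
    #include <iostream>
    #include <vector>

    int main() {
      std::vector<double> data = {1.0, 1.2, 7.9, 8.1};
      std::vector<double> centers = {0.0, 10.0};
      std::vector<int> assignment(data.size(), -1);  // index of the "marked" edge

      for (int iter = 0; iter < 10; ++iter) {
        // Data update: pick the closest cluster and mark that edge.
        for (int i = 0; i < (int)data.size(); ++i) {
          int best = 0;
          for (int c = 1; c < (int)centers.size(); ++c)
            if (std::fabs(data[i] - centers[c]) <
                std::fabs(data[i] - centers[best]))
              best = c;
          assignment[i] = best;
        }
        // Cluster update: average the data connected on marked edges.
        for (int c = 0; c < (int)centers.size(); ++c) {
          double sum = 0.0;
          int count = 0;
          for (int i = 0; i < (int)data.size(); ++i)
            if (assignment[i] == c) { sum += data[i]; ++count; }
          if (count > 0) centers[c] = sum / count;
        }
      }
      std::cout << centers[0] << " " << centers[1] << "\n";  // ~1.1 and ~8.0
    }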
Example: MRF Sampling Graph = MRF. Update function: read the samples on adjacent vertices, read the edge potentials, and compute a new sample for the current vertex.
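A hedged sketch of this update on an Ising-chain MRF (toy model; the coupling constant and chain structure are illustrative): resampling a vertex reads its neighbors' samples and the edge potentials, then draws from the conditional distribution.

    #include <cmath>
    #include <iostream>
    #include <random>
    #include <vector>

    int main() {
      const int n = 8;
      const double coupling = 0.5;             // edge potential strength
      std::vector<int> sample(n, 1);           // spins in {-1, +1}
      std::mt19937 rng(42);
      std::uniform_real_distribution<double> unif(0.0, 1.0);

      for (int sweep = 0; sweep < 100; ++sweep) {
        for (int v = 0; v < n; ++v) {
          // Read samples on adjacent vertices (chain neighbors),
          // weighted by the edge potential.
          double field = 0.0;
          if (v > 0)     field += coupling * sample[v - 1];
          if (v < n - 1) field += coupling * sample[v + 1];
          // Conditional P(spin = +1 | neighbors) for the Ising model.
          double p_plus = 1.0 / (1.0 + std::exp(-2.0 * field));
          sample[v] = (unif(rng) < p_plus) ? +1 : -1;
        }
      }
      for (int s : sample) std::cout << s << " ";
      std::cout << "\n";
    }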
Not Message Passing! The graph is a data structure. Update functions perform parallel modifications to the data structure.
Safety What happens if adjacent update functions execute simultaneously?
Importance of Consistency Is ML resilient to soft optimization? Can we permit races and settle for "best-effort" computation? True for some algorithms, but not for many: a racy implementation may work empirically on some datasets and fail on others.
Importance of Consistency Many algorithms require strict consistency, or perform significantly better under strict consistency. [Figure: Alternating Least Squares]
Importance of Consistency The fast ML algorithm development cycle (Build, Test, Debug, Tweak Model) requires the framework to behave predictably and consistently, avoiding problems caused by non-determinism. Otherwise: is the execution wrong, or is the model wrong?
Sequential Consistency GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces the same result. [Figure: timeline comparing a parallel execution on CPUs 1–2 with an equivalent sequential execution on CPU 1]
Sequential Consistency GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces the same result. This is a formalization of the intuitive concept of a "correct program": computation does not read outdated data from the past, and computation does not read results of computation that occurs in the future. This is the primary property of GraphLab.
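One way to state the guarantee formally (our phrasing in LaTeX, not a formula from the slides):

    % Our phrasing: a parallel execution that applies update functions
    % f_1, ..., f_k to the initial graph state G_0 and ends in state
    % G_final is sequentially consistent if some serial order of the
    % same update functions reaches the same state:
    \exists\, \text{a permutation } \pi \text{ of } \{1,\dots,k\} :\quad
    \bigl(f_{\pi(k)} \circ \cdots \circ f_{\pi(1)}\bigr)(G_0) = G_{\text{final}}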
Global Information What if we need global information? Algorithm Parameters? Sufficient Statistics? Sum of all the vertices?
Shared Variables • Global aggregation through the Sync operation • A global parallel reduction over the graph data • Synced variables are recomputed at defined intervals • Sync computation is sequentially consistent • Permits correct interleaving of Syncs and Updates Examples: Sync: Log-likelihood. Sync: Sum of vertex values.
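A minimal sketch of the Sync idea (toy code, not the GraphLab API): the synced shared variable is a reduction over all vertex data, recomputed at a defined interval of update executions.

    #include <iostream>
    #include <vector>

    int main() {
      std::vector<double> vertex_data = {1.0, 2.0, 3.5};
      double sum_of_vertices = 0.0;            // the synced shared variable
      const int sync_interval = 100;           // recompute every 100 updates

      for (int updates = 0; updates < 1000; ++updates) {
        // ... run one update function here ...
        if (updates % sync_interval == 0) {
          // Sync: a global reduction over the graph data. In the real
          // system this runs in parallel and interleaves consistently
          // with updates; here it is a plain sequential fold.
          sum_of_vertices = 0.0;
          for (double v : vertex_data) sum_of_vertices += v;
        }
      }
      std::cout << "sum of vertex values = " << sum_of_vertices << "\n";
    }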
Sequential Consistency GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions and Syncs that produces the same result. [Figure: timeline comparing parallel and sequential executions]
Moving towards the cloud… • Purchasing and maintaining computers is very expensive • Most computing resources are seldom used • Needed only for deadlines… • The cloud: buy time, access hundreds or thousands of processors, and pay only for the resources you need
Distributed GL Implementation • Mixed multi-threaded / distributed implementation (each machine runs only one instance) • Requires all data to be in memory; move computation to the data • MPI for management + TCP/IP for communication • Asynchronous C++ RPC layer • Ran on 64 EC2 HPC nodes = 512 processors
Underlying Network [Figure: every machine runs the same stack: execution threads drive an execution engine on top of shared data (a cache-coherent distributed key-value store), a distributed graph, and distributed locks, all communicating over the network through an RPC controller]
Write distributed programs easily • Asynchronous communication • Multithreaded support • Fast • Scalable • Easy To Use (Every machine runs the same binary)
I ♥ C++
Features • Easy RPC capabilities:

One-way calls:

    rpc.remote_call([target_machine ID], printf,
                    "%s %d %d %d\n", "hello world", 1, 2, 3);

Requests (calls with a return value):

    std::vector<int>& sort_vector(std::vector<int>& v) {
      std::sort(v.begin(), v.end());
      return v;
    }

    vec = rpc.remote_request([target_machine ID], sort_vector, vec);
Features • Object Instance Context • MPI-like primitives:

    dc.barrier()
    dc.gather(...)
    dc.send_to([target machine], [arbitrary object])
    dc.recv_from([source machine], [arbitrary object ref])

[Figure: each machine pairs a K-V object with its RPC controller] MPI-like safety
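A hedged sketch of how these primitives might be combined, using only the calls named on the slide; the type name distributed_control and the dc.procid() accessor are assumptions, not verified API details.

    #include <vector>

    // Hypothetical usage: machine 0 ships a vector to machine 1, then
    // all machines synchronize. Types and signatures are assumptions.
    void exchange_example(distributed_control& dc) {
      std::vector<int> local_result = {1, 2, 3};

      if (dc.procid() == 0) {
        dc.send_to(1, local_result);      // send an arbitrary object
      } else if (dc.procid() == 1) {
        std::vector<int> received;
        dc.recv_from(0, received);        // receive into an object ref
      }

      dc.barrier();                       // MPI-style barrier across machines
    }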
Request Latency [Plot: request latency; for reference, ping RTT = 90 µs]
One-Way Call Rate [Plot: one-way call rate; 1 Gbps physical peak]
Serialization Performance Benchmark: 100,000 one-way calls, each carrying a vector of 10 × {"hello", 3.14, 100}
Distributed Computing Challenges Q1: How do we efficiently distribute the state? (with a potentially varying number of machines) Q2: How do we ensure sequential consistency? Keeping in mind: limited bandwidth, high latency, and performance.