Big Data Learning Systems Cloud and Information Services Lab, Microsoft
Machine Learning Workflow
• Step I: Example Formation (Feature and Label Extraction)
• Step II: Modeling
• Step III: Evaluation (and eventually Deployment)
[Workflow diagram: Example Formation produces Examples, Modeling produces a Model, which feeds Evaluation.]
Example Formation: Spam Filter
[Diagram: email text passes through Feature Extraction (data-parallel functions) to produce (ID, Bag of Words); the click log passes through Label Extraction to produce (ID, Label); a large-scale join on ID combines them into (ID, Bag of Words, Label) examples.]
Google File System: GFS
• Scalable distributed file system for large, distributed, data-intensive applications
• Key design goals:
• Component failures are the norm rather than the exception
• Requires constant monitoring, error detection, fault tolerance, and automatic recovery
• Files are HUGE by traditional standards
• Multi-GB files are common
• Co-design of applications and file-system API benefits the overall system
• e.g., relaxed consistency, atomic append
• Most files are mutated by appending new data
• Random writes within a file are practically non-existent
• Once written, files are only read, and often only sequentially
“Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.”
HDFS Architecture
[Diagram: a Namenode holds the metadata (file names, replica locations, e.g. /home/foo/data with its replication factor); clients send metadata ops to the Namenode and block ops (reads and writes) directly to the Datanodes; blocks are replicated across Datanodes on multiple racks.]
Google MapReduce
• Functional programming model + runtime for processing large data sets
• Users specify:
• A Map function that processes a key/value pair to generate intermediate key/value pairs
• A Reduce function that merges intermediate values with the same intermediate key
• The MapReduce runtime:
• Automatically parallelizes and executes on a large cluster of commodity machines
• Takes care of partitioning the input data
• Schedules the program execution
• Handles machine failures
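The two user-supplied functions can be illustrated with the canonical word-count job. This is a toy single-process sketch of the programming model (the grouping dictionary stands in for the distributed shuffle, which the real runtime performs across machines):

```python
from collections import defaultdict

# map_fn emits intermediate key/value pairs; reduce_fn merges all
# values that share the same intermediate key.
def map_fn(_key, line):
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    yield (word, sum(counts))

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):   # map phase
            groups[k].append(v)           # "shuffle": group by intermediate key
    out = {}
    for k, vs in groups.items():          # reduce phase
        for rk, rv in reduce_fn(k, vs):
            out[rk] = rv
    return out

result = run_mapreduce([(0, "big data big learning")], map_fn, reduce_fn)
# result == {"big": 2, "data": 1, "learning": 1}
```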
Google MapReduce
[Diagram: map tasks read their input from GFS; reduce tasks write the final output back to GFS.]
MapReduce Example
SELECT Year, MAX(Temperature)
FROM Table1
WHERE AirQuality = 2
GROUP BY Year
MapReduce Example cont.
[Diagram: the map side performs selection + projection, e.g. the input tuple (1998, 87, 2, …) becomes (1998, 87); the reduce side performs the aggregation, e.g. the surviving 1998 temperatures {87, 94, …} yield MAX = (1998, 94); input is read from and output written to GFS.]
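The query above maps cleanly onto the two functions. A minimal sketch (made-up sample rows of the form (Year, Temperature, AirQuality); the grouping dictionary again stands in for the shuffle):

```python
from collections import defaultdict

# Hypothetical rows of Table1: (Year, Temperature, AirQuality)
rows = [(1998, 87, 2), (1998, 94, 2), (1998, 84, 1), (1999, 78, 2)]

def map_fn(row):
    year, temp, air_quality = row
    if air_quality == 2:           # WHERE AirQuality = 2   (selection)
        yield (year, temp)         # SELECT Year, Temperature (projection)

def reduce_fn(year, temps):
    return (year, max(temps))      # MAX(Temperature) ... GROUP BY Year

groups = defaultdict(list)
for row in rows:
    for year, temp in map_fn(row):
        groups[year].append(temp)  # shuffle: group by Year

answer = dict(reduce_fn(y, ts) for y, ts in groups.items())
# answer == {1998: 94, 1999: 78}
```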
MapReduce: Dataflow
• Map task
• Assigned an input block from GFS
• For each record: map(record) -> <key, value>, then hash the key to an output partition
• Sort the final output by <partition, key>
• Commit the final output
• Reduce task
• Assigned a particular partition
• Wait for notification of map output commit records
• Acquire its partition of the map output
• Perform a final merge sort after fetching all map output partitions
• Group records by key and call reduce(key, list<values>) -> <key, value>
• Write final output to GFS
• Fault tolerance is simple: re-run tasks on failure
• No consumers see partial operator output
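The hash-then-sort step above is small enough to sketch directly (a toy illustration; the number of reducers and the sample records are assumptions):

```python
# Each map task hashes the intermediate key to pick a reduce partition,
# then sorts its output by (partition, key) before committing it.
NUM_REDUCERS = 4  # assumed number of reduce tasks

def partition(key, num_reducers=NUM_REDUCERS):
    return hash(key) % num_reducers

map_output = [("1998", 87), ("1999", 78), ("1998", 94)]
committed = sorted(map_output, key=lambda kv: (partition(kv[0]), kv[0]))

# A reduce task assigned partition p fetches only its slice of each
# mapper's committed output:
p = partition("1998")
my_records = [kv for kv in committed if partition(kv[0]) == p]
```

Because the committed file is sorted by (partition, key), each reducer's slice is a contiguous byte range, which keeps the fetch cheap.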
MapReduce Dataflow in Hadoop
[Diagram sequence:
1. The client submits a job to the master, which schedules map and reduce tasks.
2. Each map task reads its assigned input block from HDFS.
3. Map output is written to the local file system; reducers fetch their partitions via HTTP GET.
4. Reduce tasks write the final answer to HDFS.]
Example Formation: Relational Algebra at Scale
[Diagram: the spam-filter example formation as MapReduce: MAP performs feature extraction over email to produce (ID, Bag of Words) and label extraction over the click log to produce (ID, Label); REDUCE performs the large-scale join on ID to produce (ID, Bag of Words, Label) examples.]
Example Formation: MapReduce
[Diagram: the JOIN of email partitions (Email 0, Email 1) with click-log partitions (ClkLog 0) distributed across workers, made easy with MapReduce.]
Machine Learning Workflow
• Step I: Example Formation (Feature and Label Extraction)
• Step II: Modeling
• Step III: Evaluation
[Workflow diagram: Example Formation → Modeling → Evaluation.]
Modeling (30,000 ft)
• Learning is iterative
• There isn’t one computational model (yet):
• Statistical Query Model: the algorithm operates on statistics of the dataset
• Graphical Models: heavy message passing, possibly asynchronous
• Many more: custom solutions
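The statistical query model is worth a concrete sketch: each iteration needs only a sum over the data, so the per-record work parallelizes like a map and the sum like a reduce. A toy single-machine example (least-squares gradient descent on made-up data; learning rate and iteration count are assumptions):

```python
# Each iteration computes one dataset statistic (a summed gradient)
# and updates the model from it: learning is iterative.
def gradient(w, x, y):
    # per-record gradient of the squared error (w*x - y)^2 / 2
    return (w * x - y) * x

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data: y = 2x
w, lr = 0.0, 0.05
for _ in range(100):
    g = sum(gradient(w, x, y) for x, y in data)  # the dataset statistic
    w -= lr * g                                  # model update

# w converges toward 2.0
```

The inner sum is exactly the shape a map/reduce pass computes, which is why MapReduce fits this class of learners; the outer loop is what Hadoop MR handles badly.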
Machine Learning Workflow
• Step I: Example Formation (Feature and Label Extraction)
• Step II: Modeling
• Step III: Evaluation
[Diagram: in practice, a sample of the features is copied out of the cluster for modeling and the model is copied back for evaluation. Cumbersome and not scalable.]
Distributed Learning on Hadoop
Machine learning in MapReduce?
+ The MapReduce model fits statistical-query-model learning
- Hadoop MR does not support iterations (30x slowdown compared to alternatives)
- Hadoop MR does not match other forms of algorithms
“Solution”: map-only jobs (Hadoop abuse)
• Allocate a set of map tasks
• Instantiate the learning algorithm
• Execute the iterative algorithm until convergence
• Release the mappers
Hadoop Abusers 1: (All)Reduce and Friends
• Decision Trees on Hadoop (Jerry Ye et al.)
• Runs OpenMPI on a map-only job
• Uses HDFS for coordination/bootstrap
• Vowpal Wabbit (John Langford et al.)
• AllReduce on a map-only job
• Uses a custom server for coordination/bootstrap
• Constructs a binary aggregation tree
• Optimizes node selection via Hadoop speculative execution
• Worker/Aggregator Abstraction (Markus Weimer and Sriram Rao)
• Iterative Map-Reduce-Update on a map-only job
• Uses ZooKeeper for coordination/bootstrap
• Custom communication between workers and the aggregator
[Diagram: workers P1, P2, …, Pn send statistics to an aggregator and receive model updates.]
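The binary aggregation tree behind the AllReduce pattern can be simulated in a few lines. This is a toy single-process sketch (real AllReduce runs across machines, with the tree shape negotiated through the coordination server):

```python
# Toy AllReduce over a binary aggregation tree: partial sums flow up
# from the leaves to the root, then the global sum is broadcast back
# down so every worker holds the same aggregate.
def allreduce(values):
    n = len(values)

    def reduce_up(i):
        total = values[i]
        for child in (2 * i + 1, 2 * i + 2):  # implicit binary tree
            if child < n:
                total += reduce_up(child)
        return total

    global_sum = reduce_up(0)       # up-pass: aggregate at the root
    return [global_sum] * n         # down-pass: broadcast to all workers

# Four workers each contribute a local gradient; all end up with the sum.
print(allreduce([1.0, 2.0, 3.0, 4.0]))  # [10.0, 10.0, 10.0, 10.0]
```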
Hadoop Abusers 2: The Graphical View
• Apache Giraph (Avery Ching et al.)
• Open-source implementation of Pregel on a map-only job
• Uses ZooKeeper for coordination/bootstrap
• Graph computation using “think like a vertex” UDFs
• Executes BSP message-passing algorithms
• Yahoo! LDA (Y!LDA) (Alex Smola and Shravan Narayanamurthy)
• Instantiates a graphical model on a map-only job
• Uses HDFS for coordination/bootstrap
• Coordinates global parameters via shared memory
• Version 1: memcached. Version 2: ICE
[Diagram: data partitions P1…Pn, each paired with a model partition M1…Mn, exchange statistics and updates.]
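The “think like a vertex” BSP model can be illustrated with a toy superstep loop (a single-process sketch, not the Giraph API; the example graph is made up). Here every vertex converges to the maximum value in the graph:

```python
# Pregel-style BSP: each superstep, every vertex consumes its inbox,
# applies a vertex UDF, and messages its neighbors if its value changed.
def pregel_max(neighbors, values):
    values = dict(values)
    inbox = {v: [] for v in values}
    for v in values:                            # superstep 0: seed messages
        for u in neighbors[v]:
            inbox[u].append(values[v])
    active = True
    while active:                               # one iteration = one superstep
        active = False
        outbox = {v: [] for v in values}
        for v, msgs in inbox.items():
            new_val = max(msgs + [values[v]])   # the vertex UDF
            if new_val > values[v]:
                values[v] = new_val
                active = True
                for u in neighbors[v]:          # message neighbors on change
                    outbox[u].append(new_val)
        inbox = outbox                          # BSP barrier between supersteps
    return values

graph = {0: [1], 1: [0, 2], 2: [1]}
print(pregel_max(graph, {0: 3, 1: 6, 2: 1}))  # {0: 6, 1: 6, 2: 6}
```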
Problems with this Approach
Problems for the abusers:
• Fault-tolerance mismatch: Hadoop’s fault tolerance actually gets in the way
• Resource-model mismatch: Hadoop’s resource selection is often suboptimal for the job at hand (data-local vs. communication)
• Cumbersome integration with M/R: every abuser has to implement networking, cluster membership, and bulk data transfers
Problems for the cluster:
• Abusers violate MapReduce assumptions: network-usage bursts in (All)Reduce, local disk use in VW
• Abusers are disrespectful of other users (e.g. production workflows): hoarding of resources (even worse as Hadoop does not support preemption)
Hadoop v1
[Diagram: clients submit jobs to a master running the MapReduce scheduler; each worker exposes fixed map slots and reduce slots into which Map and Reduce tasks are placed; HDFS is co-located with the workers; status and scheduling messages flow between master and workers.]
YARN: Hadoop v2
[Diagram sequence:
1. Node Managers report resource availability to the Resource Manager, which maintains the cluster-wide resource pools; HDFS runs alongside each Node Manager.
2. A client submits a job; the Resource Manager starts an App Master (e.g., a MapReduce scheduler), which requests a resource allocation: a list of (node type, count, resource) entries, e.g. { (node1, 1, 1GB), (rack-1, 2, 1GB), (*, 1, 2GB) }.
3. The allocation is granted as containers on Node Managers, in which the App Master launches its tasks (e.g., Map and Reduce tasks); the client monitors the job.
4. Containers report status back to the App Master.
5. When the job finishes, the App Master releases its resources.
6. The client receives a job-completion notification.]
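The allocation request shown in the diagrams can be written down as plain data. This is a hypothetical encoding for illustration only (YARN's actual protocol uses typed ResourceRequest records, and "node1"/"rack-1" are made-up names): entries run from most to least specific locality, letting the scheduler trade data-locality against wait time.

```python
# (locality, count, memory): "node1" asks for a specific machine,
# "rack-1" for any machine on that rack, "*" for anywhere in the cluster.
request = [
    ("node1",  1, "1GB"),   # 1 container of 1GB on node1 (data-local)
    ("rack-1", 2, "1GB"),   # 2 containers of 1GB anywhere on rack-1
    ("*",      1, "2GB"),   # 1 container of 2GB anywhere
]

def total_containers(req):
    # total number of containers the App Master is asking for
    return sum(count for _locality, count, _memory in req)

print(total_containers(request))  # 4
```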
YARN: A Step in the Right Direction
• Disentangles resource allocation from the computational model
• YARN manages the cluster resource allocations
• Application Masters manage computation on the allocated resources
• Low-level API:
• Containers are empty processes (with resource limits)
• No assumed higher-level semantics
The Challenge
[Diagram sequence: SQL/Hive, Machine Learning, and other applications all sit directly on YARN/HDFS, with nothing in between (!?!?!?!). Yet they need different infrastructure: SQL/Hive needs fault tolerance, row/column storage formats, and high-throughput networking; Machine Learning instead needs fault awareness, big-vector storage formats, and low-latency networking.]
REEF in the Stack
[Diagram: REEF sits between the applications (SQL/Hive, Machine Learning, …) and YARN/HDFS.]
REEF in the Stack (Future)
[Diagram: a Logical Abstraction layer and an Operator API and Library sit on top of REEF, between the applications and YARN/HDFS.]
REEF: Computation and Data Management
• Extensible control flow
• Data management services:
• Storage abstractions: Map and Spool, local and remote
• Network: message passing, bulk transfers, collective communications
• State management: fault tolerance, checkpointing
• Activity: user code executed within an Evaluator
• Evaluator: execution environment for Activities; one Evaluator is bound to one YARN Container
• Job Driver: control-plane implementation; user code executed on YARN’s Application Master
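How the three roles relate can be caricatured in a few lines. These are hypothetical Python stand-ins for illustration, not the REEF API (REEF itself is Java, and its Driver is event-driven rather than a blocking loop):

```python
# Toy stand-ins for the REEF roles: a Driver runs the control plane,
# Evaluators are per-container execution environments, and Activities
# are the user code that runs inside an Evaluator.
class Activity:
    def run(self, data):
        return sum(data)              # user code (here: a partial sum)

class Evaluator:                      # one Evaluator per container
    def execute(self, activity, data):
        return activity.run(data)

class Driver:                         # control plane (on the App Master)
    def __init__(self, num_evaluators):
        self.evaluators = [Evaluator() for _ in range(num_evaluators)]

    def submit(self, partitions):
        # schedule one Activity per Evaluator over the data partitions
        return [ev.execute(Activity(), part)
                for ev, part in zip(self.evaluators, partitions)]

print(Driver(2).submit([[1, 2], [3, 4]]))  # [3, 7]
```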
Conclusion
• Open source release in April (Apache 2 license)
• MapReduce support (including Hive)
• Machine learning libraries coming in June:
• Iterative Map-Reduce-Update
• MPI (Graphical Models)
• Mahout compatibility
• Contact: Tyson Condie (tcondie@microsoft.com, tcondie@cs.ucla.edu)