410 likes | 541 Views
Big Data Learning Systems. Cloud and Information Services Lab, Microsoft. Machine Learning Workflow. Step I: Example Formation Feature and Label Extraction Step II: Modeling Step III: Evaluation (and eventually Deployment). Examples. Model. Example Formation. Modeling. Evaluation.
E N D
Big Data Learning Systems Cloud and Information Services Lab, Microsoft
Machine Learning Workflow Step I: Example Formation Feature and Label Extraction Step II: Modeling Step III: Evaluation (and eventually Deployment) Examples Model Example Formation Modeling Evaluation
Example Formation: Spam Filter Feature Extraction Large Scale Join Large Scale Join EMail ID Bag of Words Example Data Parallel Functions ID Bag of Words Label Click Log ID Label Label Extraction
Google File System: GFS • Scalable Distributed File System • For large distributed data-intensive applications • Key Design goals • Component failures are the norm rather than the exception • Requires: constant monitoring, error detection, fault-tolerance, and automatic recovery • Files are HUGE by traditional standards • Multi-GB files are common • Co-design of applications and file system API benefits overall system • e.g., relaxed consistency, atomic append • Most files are mutated by appending new data • Random writes within a file a practically non-existent • Once written, files are only read, and often only sequentially “Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.”
HDFS Architecture Namenode Metadata(Name, replicas..) (/home/foo/data,6. .. Metadata ops Client Block ops Read Datanodes Datanodes B replication Blocks Rack2 Rack1 Write Client
Google MapReduce • Functional Programming Model + Runtime for processing large data sets • Users specify • Map function that processes key/value pair to generate intermediate key/value pairs • Reduce function that merges intermediate values with same intermediate key • The MapReduce Runtime • Automatically parallelizes and executes on a large cluster of commodity machines • Takes care of partitioning the input data • Scheduling the program execution • Handling machine failures
Google MapReduce GFS GFS GFS
MapReduce Example Table1 SELECT Year, MAX(Temperature) FROM Table1 WHERE AirQuality = 2 GROUPBY Year
MapReduce Example cont. Selection+ Projection Aggregation (MAX) (1998, 87, 2, …) (1998, 87) 87 94 1998, 84 87 78 (1998, 94) GFS GFS GFS
MapReduce: Dataflow • Map task • Assigned an input block from GFS • For each record: map(record) -> <key, value> then hash the key to an output partition • Sort the final output by <partition, key> • Commit the final output • Reduce task • Assigned a particular partition • Wait for notification of map output commit record • Acquire its partition of the map output • Perform a final merge sort after fetching all map output partitions • Group records by key and call reduce(key, list<values>) -> <key, value> • Write final output to GSF • Fault tolerance is simple: simply re-run tasks on failure • No consumers see partial operator output
MapReduce Dataflow Submit job master map reduce schedule map reduce
MapReduce Dataflow master Read Input File HDFS map reduce Block 1 HDFS map reduce Block 2
Dataflow in Hadoop master map reduce Local FS HTTP GET map reduce Local FS
Dataflow in Hadoop master Write Final Answer HDFS map reduce HDFS map reduce
Example Formation: Relational Algebra at Scale Feature Extraction Large Scale Join REDUCE EMail ID Bag of Words Example MAP ID Bag of Words Label Click Log ID Label Label Extraction
Example Formation: MapReduce Made easy with: I I I W W W Email 0 JOIN Email 1 I L CLkLog 0 L
Machine Learning Workflow Step I: Example Formation Feature and Label Extraction Step II: Modeling Step III: Evaluation Example Formation Modeling Evaluation
Modeling (30,000ft) Learning is Iterative There isn’t one computational model (yet) Statistical Query Model: Algorithm operates on statistics of the dataset Graphical Models: Heavy message passing, possibly asynchronous. Many more: Custom solutions
Machine Learning Workflow Cumbersome & Not scalable Step I: Example Formation Feature and Label Extraction Step II: Modeling Step III: Evaluation Copy Model Sample Features Example Formation Modeling Evaluation
Distributed Learning on Hadoop Machine Learning in MapReduce? + MapReduce model fits statistical query model learning - Hadoop MR does not support iterations (30x slowdown compared to others) - Hadoop MR does not match other forms of algorithm “Solution”: Map only jobs Allocate a set of map tasks Instantiate learning algorithm Execute iterative algorithm until convergence Release mappers Hadoop Abuse
Hadoop Abusers 1: (All)Reduce and friends Decision Trees on HadoopJerry Ye et al. Runs OpenMPI on a map only job Uses HDFS for coordination/bootstrap VowpalWabbitJohn Langford et al. AllReduce on a map-only job Uses custom server for coordination/bootstrap Constructs a binary aggregation tree Optimizes node selection via Hadoop speculative execution Worker/Aggregator AbstractionMarkus Weimer and Sriram Rao Iterative Map-Reduce-Update on a map-only job Uses Zookeeper for coordination/bootstrap Custom communication between workers and aggregator Statistics Model Updates P… Pn P2 P1
Hadoop Abusers 2: The Graphical View Data and Model Partitions Apache GiraphAvery Chen et al. Open source implementation of Pregel on a map-only job Uses Zookeeper for coordination/bootstrap Graph computation using “think like a vertex” UDFs Executes BSP message passing algorithm Yahoo! LDA (YLDA)Alex Smola and Shravan Narayanamurthy Instantiate Graphical Model on a map-only job Uses HDFS for coordination/bootstrap Coordinate global parameters via shared memory Version 1: memcached. Version 2: ICE Statistics / Updates P1 M1 P2 M2 P… M… Pn Mn
Problems with this Approach Problems for the Abusers Fault Tolerance Mismatch Hadoop’s fault tolerance actually gets in the way Resource Model Mismatch Hadoop’s resource selection often suboptimal for the job at hand (data local vs. communication) Cumbersome integration with M/R Every Abuser has to implement … Networking Cluster Membership Bulk data transfers Problems for the Cluster Abusers Violate MapReduce assumptions Network usage bursts in (All)Reduce Local disk use in VW The Abusers are disrespectful of other users e.g. production workflows Hoarding of resources (even worse as Hadoop does not support preemption)
Hadoop v1 Reduce Task Reduce Task Map Task HDFS Reduce Slot Reduce Slot Map Slot Map Slot Map Slot Map Slot Reduce Slot Reduce Slot MapReduce Scheduler Worker Worker Client Map Task MapReduce Status and Scheduling Reduce Slot Map Slot Map Slot Reduce Slot Resource Manager Job Submission HDFS Worker Client Master Map Task HDFS
YARN: Hadoop v2 Resource Pool HDFS Node Manager Resource Manager Status Resource Availability Resource Pool Node Manager HDFS Resource Pool Node Manager HDFS
YARN: Hadoop v2 E.g., MapReduce Scheduler Resource Allocation = list of (node type, count, resource) E.g. { (node1, 1, 1GB), (rack-1, 2, 1GB),(*, 1, 2GB) } App Master Resource Allocation Resource Pool HDFS Node Manager Resource Manager Job Submission Client Resource Pool Node Manager HDFS Resource Pool Node Manager HDFS
YARN: Hadoop v2 E.g., MapReduce Scheduler E.g., Map Task App Master Container Resource Pool HDFS Node Manager E.g., Map Task E.g., Reduce Task Container Container Resource Manager Monitor Job Client Resource Pool Node Manager HDFS E.g., Reduce Task Container Resource Pool Node Manager HDFS
YARN: Hadoop v2 Status App Master Container Resource Pool HDFS Node Manager Container Container Resource Manager Monitor Job Client Resource Pool Node Manager HDFS Container Resource Pool Node Manager HDFS
YARN: Hadoop v2 App Master Container Resource Pool Release Resources HDFS Node Manager Container Container Resource Manager Monitor Job Client Resource Pool Node Manager HDFS Container Resource Pool Node Manager HDFS
YARN: Hadoop v2 Resource Pool HDFS Node Manager Resource Manager Job Completion Notification Client Resource Pool Node Manager HDFS Resource Pool Node Manager HDFS
YARN: A step in the right direction Disentangles resource allocation from the computational model YARN manages the cluster resource allocations Application masters manage computation on allocated resources Low-level API Containers are empty processes (with resource limits) No assumed higher-level semantics
The Challenge SQL / Hive … … Machine Learning !?!?!?! YARN / HDFS
The Challenge SQL / Hive … … Machine Learning • Fault Tolerance • Row/Column Storage Formats • High-throughput Networking YARN / HDFS
The Challenge SQL / Hive … … Machine Learning • Fault Awareness • Big Vector Storage Formats • Low Latency Networking YARN / HDFS
The Challenge SQL / Hive … … Machine Learning YARN / HDFS
REEF in the Stack SQL / Hive … … Machine Learning REEF YARN / HDFS
REEF in the Stack (Future) SQL / Hive … … Machine Learning Logical Abstraction Operator API and Library REEF YARN / HDFS
REEF: Computation and Data Management Extensible Control Flow Data Management Services Storage Abstractions: Map and Spool Local and Remote Network Message passing Bulk Transfers Collective Communications State Management Fault Tolerance Checkpointing Activity User code executed within an Evaluator. Execution Environment for Activities. One Evaluator is bound to one YARN Container. Evaluator Job Driver Control plane implementation. User code executed on YARN’s Application Master
Conclusion • Open source release in April • Apache 2 license • MapReduce support(including Hive) • Machine learning libraries coming in June • Iterative Map-Reduce-Update • MPI (Graphical Models) • Mahout compatibility • Contact: Tyson Condie • tcondie@microsoft.com • tcondie@cs.ucla.edu