
  1. Motivation
  • Contemporary big data tools such as MapReduce and graph-processing frameworks have fixed data abstractions and support only a limited set of communication operations.
  • MPI contains abundant, highly optimized collective communication operations but is limited in its data abstractions.
  • To improve both expressiveness and performance in big data processing, we introduce the Harp library, which provides data abstractions and related communication abstractions and transforms the map-reduce programming model into a map-collective model.

  2. Features
  • Hadoop plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
  • Hierarchical data abstractions over arrays, key-values, and graphs for expressive, easy programming
  • Collective communication model supporting various communication operations on these data abstractions
  • Caching with buffer management for the memory allocation required by computation and communication
  • BSP-style parallelism
  • Fault tolerance with check-pointing

  3. Architecture
  [Layered architecture diagram]
  • Application layer: MapReduce applications and Map-Collective applications
  • Framework layer: MapReduce V2 alongside the Harp framework
  • Resource management layer: YARN Resource Manager

  4. Parallelism Model
  [Diagram comparing the two models]
  • MapReduce model: mappers (M) feed a shuffle phase, which feeds reducers (R)
  • Map-Collective model: mappers (M) exchange data directly through collective communication, with no separate reduce stage
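The distinction on this slide can be sketched in plain Java. This is a sequential, conceptual simulation of the two models, not Harp's actual API: in the MapReduce model the aggregated result lands only at a reducer, while in the map-collective model an allreduce leaves the result on every mapper. All class and method names here are illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual, sequential simulation of the two parallelism models.
// Partition contents and names are illustrative, not Harp API.
public class ParallelismModels {

    // MapReduce model: mapper outputs are shuffled to a reducer,
    // which alone holds the final result.
    static int mapShuffleReduce(List<int[]> mapperOutputs) {
        int reduced = 0;
        for (int[] partition : mapperOutputs) {      // "shuffle" all partitions
            for (int v : partition) reduced += v;    // reducer aggregates
        }
        return reduced;                              // only the reducer has it
    }

    // Map-Collective model: an allreduce leaves the aggregated result
    // on every mapper, so no separate reduce stage is needed.
    static int[] mapAllreduce(List<int[]> mapperValues) {
        int sum = 0;
        for (int[] partition : mapperValues) {
            for (int v : partition) sum += v;
        }
        int[] perMapper = new int[mapperValues.size()];
        Arrays.fill(perMapper, sum);                 // every mapper holds the sum
        return perMapper;
    }

    public static void main(String[] args) {
        List<int[]> partitions = List.of(new int[]{1, 2}, new int[]{3, 4}, new int[]{5});
        System.out.println(mapShuffleReduce(partitions));              // 15
        System.out.println(Arrays.toString(mapAllreduce(partitions))); // [15, 15, 15]
    }
}
```

The practical difference is that after a collective operation every worker can continue computing on the global result immediately, which suits iterative algorithms better than a shuffle-reduce round trip.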

  5. Hierarchical Data Abstraction and Collective Communication
  [Layered abstraction diagram]
  • Table layer: Vertex Table, Key-Value Table, Message Table, Edge Table, Array Table <Array Type> — supporting Broadcast, Allgather, Allreduce, Regroup-(combine/reduce), Message-to-Vertex, Edge-to-Vertex
  • Partition layer: Vertex Partition, Key-Value Partition, Message Partition, Edge Partition, Array Partition <Array Type> — supporting Broadcast, Send
  • Basic types: Byte Array, Int Array, Long Array, Double Array (holding vertices, edges, messages, and key-values), Struct Object — supporting Broadcast, Send, Gather; Commutable
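Of the table-level operations listed above, Regroup-(combine/reduce) is the least standard, so a small sketch may help: key-value pairs scattered across partitions are rehashed to a destination partition by key, and values for equal keys are combined there. This is a self-contained illustration of the idea; the class and method names are hypothetical, not Harp's.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a "Regroup-(combine/reduce)" operation on key-value
// partitions. Names are illustrative, not Harp's actual classes.
public class Regroup {

    // Regroup kv-pairs from all source partitions into numWorkers
    // destination partitions, combining values for equal keys by summing.
    static List<Map<String, Integer>> regroupCombine(
            List<Map<String, Integer>> sources, int numWorkers) {
        List<Map<String, Integer>> dests = new java.util.ArrayList<>();
        for (int i = 0; i < numWorkers; i++) dests.add(new HashMap<>());
        for (Map<String, Integer> partition : sources) {
            for (Map.Entry<String, Integer> kv : partition.entrySet()) {
                // Destination chosen by key hash, so equal keys meet in one partition.
                int dest = Math.floorMod(kv.getKey().hashCode(), numWorkers);
                dests.get(dest).merge(kv.getKey(), kv.getValue(), Integer::sum);
            }
        }
        return dests;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> sources = List.of(
                Map.of("a", 1, "b", 2),
                Map.of("a", 3, "c", 4));
        // After regrouping, "a" is combined to 4 in exactly one partition.
        System.out.println(regroupCombine(sources, 2));
    }
}
```

The combine step matters because only one commutative pass over each destination partition is needed, which is why the slide marks the underlying array types as Commutable.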

  6. Performance on Madrid Cluster (8 nodes)
