Motivation
• Contemporary big data tools such as MapReduce and graph processing tools have fixed data abstractions and support a limited set of communication operations.
• MPI contains abundant and highly optimized collective communication operations but is limited in data abstractions.
• To improve the expressiveness and performance in big data processing…
• We introduce the Harp library, which provides data abstractions and related communication abstractions and transforms the map-reduce programming model into a map-collective model.
Features
• Hadoop plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
• Hierarchical data abstraction on arrays, key-values and graphs for easy programming expressiveness
• Collective communication model to support various communication operations on the data abstractions
• Caching with buffer management for the memory allocation required by computation and communication
• BSP-style parallelism
• Fault tolerance with checkpointing
Architecture
• Application: MapReduce Applications, Map-Collective Applications
• Framework: MapReduce V2, Harp
• Resource Manager: YARN
Parallelism Model
• MapReduce Model: map tasks (M) → Shuffle → reduce tasks (R)
• Map-Collective Model: map tasks (M) exchange data through Collective Communication
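The difference between the two models can be illustrated with a small single-process sketch. It is not Harp's API: the function names `allreduce_sum` and `map_collective_mean` are hypothetical, and the "workers" are plain list entries rather than Hadoop tasks. The point it tries to show is that in the map-collective style each map task keeps its local result and receives the combined result directly, with no separate reduce stage.

```python
# Hypothetical, single-process illustration of the map-collective idea.
# None of these names come from Harp; workers are simulated as list entries.
from typing import List


def allreduce_sum(local_values: List[List[float]]) -> List[List[float]]:
    """Simulate allreduce: every worker ends up with the element-wise sum."""
    total = [sum(column) for column in zip(*local_values)]
    # Each worker receives its own copy of the combined result.
    return [list(total) for _ in local_values]


def map_collective_mean(shards: List[List[float]]) -> List[float]:
    """Each simulated map task computes a local (sum, count) pair,
    then one collective step gives every task the global totals."""
    local = [[sum(shard), float(len(shard))] for shard in shards]
    combined = allreduce_sum(local)
    # Every task can now derive the global mean without a reduce task.
    return [total_sum / total_count for total_sum, total_count in combined]
```

Calling `map_collective_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])` gives each of the three simulated tasks the same global mean. In a MapReduce formulation the partial sums would instead be shuffled to reduce tasks, and the result would have to be redistributed before another iteration could start.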
Hierarchical Data Abstraction and Collective Communication
• Table level: Array Table <Array Type>, Edge Table, Message Table, Vertex Table, Key-Value Table. Operations: Broadcast, Allgather, Allreduce, Regroup-(combine/reduce), Message-to-Vertex, Edge-to-Vertex
• Partition level: Array Partition <Array Type>, Edge Partition, Message Partition, Vertex Partition, Key-Value Partition. Operations: Broadcast, Send
• Basic Types level: Commutable; Array (Byte Array, Int Array, Long Array, Double Array); Struct Object (Vertices, Edges, Messages, Key-Values). Operations: Broadcast, Send, Gather
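A rough sketch of how the three levels could fit together, written in plain Python rather than with Harp's own classes: a table holds partitions keyed by ID, each partition wraps a basic array, and a table-level allreduce merges partitions that share an ID. The class names `Partition` and `ArrayTable`, the `add` method, and the choice of element-wise addition as the combine step are all assumptions made for illustration.

```python
# Hypothetical model of the table / partition / array hierarchy.
# These classes are illustrative only and are not Harp's API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    partition_id: int
    data: List[float]  # stands in for a basic type such as a double array


@dataclass
class ArrayTable:
    partitions: Dict[int, Partition] = field(default_factory=dict)

    def add(self, partition: Partition) -> None:
        """Insert a copy of the partition, or combine it with the existing
        partition of the same ID (assumed here: equal-length element-wise sum)."""
        existing = self.partitions.get(partition.partition_id)
        if existing is None:
            self.partitions[partition.partition_id] = Partition(
                partition.partition_id, list(partition.data)
            )
        else:
            existing.data = [a + b for a, b in zip(existing.data, partition.data)]


def allreduce(tables: List[ArrayTable]) -> List[ArrayTable]:
    """Simulate a table-level allreduce: merge all workers' partitions by ID,
    then hand every worker its own copy of the merged table."""
    merged = ArrayTable()
    for table in tables:
        for partition in table.partitions.values():
            merged.add(partition)
    result = []
    for _ in tables:
        copy = ArrayTable()
        for partition in merged.partitions.values():
            copy.add(partition)
        result.append(copy)
    return result
```

In this sketch, two workers that both hold partition 0 end up with its combined array, while a partition held by only one worker is simply replicated to the other. Keying the collective on partition IDs is what lets one operation apply uniformly to arrays, key-values, or graph data in such a design.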