Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining Wei Jiang and Gagan Agrawal
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
Background (I) • Map-Reduce • Simple API: map and reduce • Easy to write parallel programs • Fault-tolerant for large-scale data centers • Performance? • Always a concern for the HPC community • Generalized Reduction • First proposed in FREERIDE, developed at Ohio State (2001-2003) • Shares a similar processing structure with Map-Reduce • The key difference lies in a programmer-managed reduction object • Better performance?
Comparing Processing Structures • The reduction object represents the intermediate state of the execution • The reduce function is commutative and associative • Sorting, grouping, and shuffling overheads are eliminated with the reduction function/object
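To make the contrast concrete, below is a minimal word-count sketch in the generalized-reduction style; the names (ReductionObject, reduction, combine) are illustrative assumptions, not the actual FREERIDE/MATE API:

```cpp
#include <string>
#include <unordered_map>

// Illustrative reduction object: an accumulating word-count table.
// Each input element updates this shared state directly; no intermediate
// (key, value) pairs are emitted, so the sorting/grouping/shuffling
// phases of Map-Reduce are never needed.
struct ReductionObject {
    std::unordered_map<std::string, long> counts;
};

// Local reduction: fold one input element into the reduction object.
// Must be commutative and associative so elements can be processed
// in any order and in parallel.
void reduction(ReductionObject& ro, const std::string& word) {
    ro.counts[word] += 1;
}

// Global combination: merge per-thread/per-node reduction objects.
void combine(ReductionObject& dst, const ReductionObject& src) {
    for (const auto& kv : src.counts) dst.counts[kv.first] += kv.second;
}
```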
Our Previous Work • A comparative study between FREERIDE and Hadoop: • FREERIDE outperformed Hadoop by factors of 5 to 10 • Possible reasons: • Java vs. C++? HDFS overheads? Inefficiency of Hadoop? • API differences? • Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford • Adopted generalized reduction • Focused on API differences • MATE outperformed Phoenix by an average of 50% • Avoids the large set of intermediate pairs between Map and Reduce • Reduces memory requirements
Extending MATE • Main limitations of the original MATE: • Works only on a single multi-core machine • Datasets must reside in memory • Assumes the reduction object fits in memory • This paper extends MATE to address these limitations • Focus on graph mining: an emerging class of applications • Requires large reduction objects as well as large-scale datasets • E.g., PageRank can have an 8 GB reduction object! • Adds support for managing arbitrary-sized reduction objects • Also reads disk-resident input data • Evaluated Ex-MATE against PEGASUS • PEGASUS: a Hadoop-based graph mining system
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
System Design and Implementation • System design of Ex-MATE • Execution overview • Support for distributed environments • System APIs in Ex-MATE • One set provided by the runtime: operations on reduction objects • Another set defined or customized by the users: reduction, combination, etc. • Runtime in Ex-MATE • Data partitioning • Task scheduling • Other low-level details
Ex-MATE Runtime Overview • [Figure: basic one-stage execution]
Implementation Considerations • Support for processing very large datasets • Partitioning function: partition and distribute the input to a number of nodes • Splitting function: use the multi-core CPU on each node • Management of a large reduction object (R.O.): reduce disk I/O! • Outputs (R.O.) are updated in a demand-driven way • Partition the reduction object into splits • Inputs are re-organized based on data access patterns • Reuse an R.O. split as much as possible while it is in memory • Example: matrix-vector multiplication (next slide)
A MV-Multiplication Example • [Figure: the input matrix is divided into blocks (1,1), (1,2), (2,1), ...; each block is multiplied by the corresponding input-vector split, and the results accumulate into the matching output-vector split; a sketch follows]
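The sketch below illustrates the split-reuse idea with blocked matrix-vector multiplication; the blocked layout and function names are assumptions for illustration, not Ex-MATE's actual code:

```cpp
#include <vector>

// The output vector is the reduction object, partitioned into splits.
// Each split stays pinned in memory while every matrix block that
// contributes to it is streamed through, so a split is loaded and
// flushed to disk only once per iteration.
void blocked_matvec(
    const std::vector<std::vector<std::vector<double>>>& M, // M[I][J]: block (I,J), flattened row-major
    const std::vector<std::vector<double>>& v,              // v[J]: input-vector split J
    std::vector<std::vector<double>>& out,                  // out[I]: output-vector (R.O.) split I
    int bs) {                                               // block size
    out.assign(M.size(), std::vector<double>(bs, 0.0));
    for (size_t I = 0; I < M.size(); ++I) {     // pin output split I in memory
        for (size_t J = 0; J < v.size(); ++J) { // visit every block in block-row I
            const std::vector<double>& blk = M[I][J];
            for (int i = 0; i < bs; ++i)
                for (int j = 0; j < bs; ++j)
                    out[I][i] += blk[i * bs + j] * v[J][j];
        }
        // In Ex-MATE, out[I] would now be flushed to disk before split I+1.
    }
}
```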
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
GIM-V for Graph Mining (I) • Generalized Iterative Matrix-Vector Multiplication (GIM-V) • First proposed at CMU (in PEGASUS) • Similar to the common MV multiplication, v(i) = Σ_j m(i,j) × v(j), which combines three operations: multiplication, sum, and assignment • GIM-V generalizes all three: • combine2(m(i,j), v(j)): does not have to be a multiplication • combineAll of the n partial results for element i: does not have to be the sum • assign(v(i), v(new)): the previous value of v(i) is updated by a new value
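A minimal sketch of one GIM-V iteration, with the three operations passed in as parameters; the dense-matrix representation and names are illustrative (real systems use sparse blocks):

```cpp
#include <functional>
#include <vector>

// One GIM-V iteration, v' = M x_G v, with the three customizable
// operations supplied by the application.
std::vector<double> gimv_iteration(
    const std::vector<std::vector<double>>& M,
    const std::vector<double>& v,
    std::function<double(double, double)> combine2,               // replaces multiply
    std::function<double(const std::vector<double>&)> combineAll, // replaces sum
    std::function<double(double, double)> assign) {               // replaces assignment
    std::vector<double> v_new(v.size());
    for (size_t i = 0; i < v.size(); ++i) {
        std::vector<double> partials;
        for (size_t j = 0; j < v.size(); ++j)
            partials.push_back(combine2(M[i][j], v[j]));
        v_new[i] = assign(v[i], combineAll(partials));
    }
    return v_new;
}
```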
GIM-V for Graph Mining (II) • A set of graph mining applications fit into the GIM-V framework • PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc. • Parallelization of GIM-V: • Using Map-Reduce, as in PEGASUS • A two-stage algorithm: two consecutive map-reduce jobs • Using generalized reduction, as in Ex-MATE • A one-stage algorithm: simpler code
GIM-V Example: PageRank • PageRank is used by Google to calculate the relative importance of web pages • A direct implementation of GIM-V: v(j) is the ranking value • The three customized operations (with damping factor c and n nodes) are: • combine2(m(i,j), v(j)) = c × m(i,j) × v(j) (the multiplication) • combineAll(x(1), ..., x(n)) = (1-c)/n + Σ_j x(j) (the sum) • assign(v(i), v(new)) = v(new) (the assignment)
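These operations plug into the generic sketch above; the values of c and n here are assumptions for illustration:

```cpp
#include <vector>

// PageRank as a GIM-V instantiation (following the PEGASUS formulation);
// c is the damping factor and n the number of nodes.
const double c = 0.85;   // typical damping factor (assumed)
const double n = 256e6;  // number of nodes (graph-dependent)

auto combine2 = [](double m_ij, double v_j) {
    return c * m_ij * v_j;                 // the "multiplication"
};
auto combineAll = [](const std::vector<double>& xs) {
    double s = (1.0 - c) / n;              // teleport term
    for (double x : xs) s += x;            // plus damped contributions: the "sum"
    return s;
};
auto assign = [](double /*v_old*/, double v_new) {
    return v_new;                          // the "assignment"
};
```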
GIM-V: Other Algorithms • Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph • The three customized operations are: multiplication for combine2, bitwise-OR for combineAll, and bitwise-OR (of the old and new values) for assign • Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs • The three customized operations are: multiplication for combine2, minimum for combineAll, and minimum (of the old and new values) for assign
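A sketch of the HCC operations; the dense-matrix adaptation (mapping absent edges to +infinity) is ours, since PEGASUS only visits nonzero entries:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// HCC (connected components) as a GIM-V instantiation: v(i) holds the
// smallest component id reached so far. In this dense sketch a missing
// edge maps to +infinity so it never wins the min. HADI is analogous,
// with bitwise-OR over Flajolet-Martin bitstrings in place of min.
auto combine2 = [](double m_ij, double v_j) {
    return m_ij != 0.0 ? v_j : std::numeric_limits<double>::infinity();
};
auto combineAll = [](const std::vector<double>& xs) {
    return *std::min_element(xs.begin(), xs.end());
};
auto assign = [](double v_old, double v_new) {
    return std::min(v_old, v_new);
};
```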
Parallelization of GIM-V (I) • Using Map-Reduce: Stage I • Map: send M(i,j) and V(j) to reducer j
Parallelization of GIM-V (II) • Using Map-Reduce: Stage I (cont.) • Reduce: send combine2(M(i,j), V(j)) to reducer i
Parallelization of GIM-V (III) • Using Map-Reduce: Stage II • Map: send each partial result x(i) and the old value V(i) to reducer i
Parallelization of GIM-V (IV) • Using Map-Reduce: Stage II (cont.) • Reduce: apply combineAll to the partial results for element i, then assign the result to V(i)
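Putting the two stages together, here is a self-contained, single-machine simulation of the two-stage algorithm (PageRank instantiation); the key routing and record layouts are simplified assumptions, not PEGASUS's exact formats:

```cpp
#include <map>
#include <utility>
#include <vector>

struct Entry { long i, j; double m; };  // one nonzero matrix element

std::vector<double> gimv_two_stage(const std::vector<Entry>& M,
                                   const std::vector<double>& v) {
    const double c = 0.85, n = static_cast<double>(v.size());
    // Stage 1 map: route M(i,j) and v(j) to "reducer" j.
    std::map<long, std::pair<std::vector<Entry>, double>> stage1;
    for (const auto& e : M) stage1[e.j].first.push_back(e);
    for (long j = 0; j < static_cast<long>(v.size()); ++j) stage1[j].second = v[j];
    // Stage 1 reduce: emit (i, combine2(m(i,j), v(j))) to "reducer" i.
    std::map<long, std::vector<double>> stage2;
    for (auto& [j, grp] : stage1)
        for (const auto& e : grp.first)
            stage2[e.i].push_back(c * e.m * grp.second);   // combine2
    // Stage 2 reduce: combineAll the partials, then assign to v(i).
    std::vector<double> v_new(v.size());
    for (long i = 0; i < static_cast<long>(v.size()); ++i) {
        double s = (1.0 - c) / n;                          // combineAll
        for (double x : stage2[i]) s += x;
        v_new[i] = s;                                      // assign
    }
    return v_new;
}
```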
Parallelization of GIM-V (V) • Using Generalized Reduction in Ex-MATE: • Reduction: for each block M(i,j) and split V(j), accumulate combine2(M(i,j), V(j)) into the reduction object via combineAll
Parallelization of GIM-V (VI) • Using Generalized Reduction in Ex-MATE: • Finalize: apply assign to update each V(i) with the value accumulated in the reduction object
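A sketch of how the one-stage formulation might look (PageRank instantiation); the reduction/finalize signatures are assumptions, not Ex-MATE's actual API:

```cpp
#include <vector>

struct Entry { long i, j; double m; };  // one nonzero matrix element

// reduction(): fold one input element directly into the reduction object
// (the new vector). No intermediate pairs are produced and no second
// job is needed.
void reduction(std::vector<double>& ro, const Entry& e,
               const std::vector<double>& v_old, double c) {
    ro[e.i] += c * e.m * v_old[e.j];          // combine2, accumulated by sum (combineAll)
}

// finalize(): produce the new vector for this iteration; for PageRank
// assign is identity, so only the teleport term remains to fold in.
void finalize(std::vector<double>& ro, double c, double n) {
    for (double& x : ro) x += (1.0 - c) / n;
}
```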
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
Experimental Design • Applications: • Three graph mining algorithms: • PageRank, Diameter Estimation, and Finding Connected Components • Evaluation: • Performance comparison with PEGASUS • PEGASUS provides a naïve version and an optimized version • Speedups with an increasing number of nodes • Scalability with increasing dataset sizes • Experimental platform: • A cluster of multi-core machines • Used up to 128 cores (16 nodes)
Results: Graph Mining (I) • PageRank: 16 GB dataset; a graph of 256 million nodes and 1 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedup: 10.0]
Results: Graph Mining (II) • HADI: 16 GB dataset; a graph of 256 million nodes and 1 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedup: 11.0]
Results: Graph Mining (III) • HCC: 16 GB dataset; a graph of 256 million nodes and 1 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedup: 9.0]
Scalability: Graph Mining (IV) • HCC: 8 GB dataset; a graph of 256 million nodes and 0.5 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedups: 1.7 and 1.9]
Scalability: Graph Mining (V) • HCC: 32 GB dataset; a graph of 256 million nodes and 2 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedups: 1.9 and 2.7]
Scalability: Graph Mining (VI) • HCC: 64 GB dataset; a graph of 256 million nodes and 4 billion edges • [Chart: avg. time per iteration (min) vs. # of nodes; annotated speedups: 1.9 and 2.8]
Observations • Performance trends are similar for all three applications • Consistent with the fact that all three are implemented using the GIM-V method • Ex-MATE outperforms PEGASUS significantly on all three graph mining algorithms • Reasonable speedups for different dataset sizes • Better scalability for larger datasets with an increasing number of nodes
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
Related Work: Academia • Evaluation of Map-Reduce-like models in various parallel programming environments: • Phoenix-rebirth for large-scale multi-core machines • Mars for a single GPU • MITHRA for GPGPUs in heterogeneous platforms • Recent IDAV for GPU clusters • Improvements to the Map-Reduce API: • Integrating pre-fetching and pre-shuffling into Hadoop • Supporting online queries • Enforcing less restrictive synchronization semantics between Map and Reduce
Related Work: Industry • Google's Pregel system: • Map-Reduce may not be well suited for graph operations • Proposed to target graph processing • Open-source version: the HAMA project in Apache • Variants of Map-Reduce: • Dryad/DryadLINQ from Microsoft • Sawzall from Google • Pig/Map-Reduce-Merge from Yahoo! • Hive from Facebook
Outline • Background • System Design of Ex-MATE • Parallel Graph Mining with Ex-MATE • Experiments • Related Work • Conclusion
Conclusion • Ex-MATE supports the management of reduction objects of arbitrary sizes • Deals with disk-resident reduction objects • Uses GIM-V for parallelizing graph mining • Outperforms both the naïve and optimized PEGASUS implementations for all three graph mining applications • Has simpler code • Offers a promising alternative for developing efficient data-intensive applications
Thank You, and Acknowledgments • Questions and comments • Wei Jiang - jiangwei@cse.ohio-state.edu • Gagan Agrawal - agrawal@cse.ohio-state.edu • This project was supported by: