Parallel Applications And Tools For Cloud Computing Environments
CloudCom 2010, Indianapolis, Indiana, USA, Nov 30 – Dec 3, 2010
Large Scale PageRank with Iterative MapReduce
Shuohuan, Yuduo, Parag, Hui
Outline
• Motivation of large scale PageRank
• Optimization strategies
• Experimental results
• Visualization with PlotViz3
PageRank
• Large scale PageRank
  • Large graph processing has become popular
  • Efficient processing of large scale graphs challenges current MapReduce runtimes
  • Motivation: common optimization strategies for large scale PageRank
• Current status
  • Twister, Hadoop, and DryadLINQ PageRank with the ClueWeb data set (50 million pages)
  • MPI PageRank
Optimization Strategies
• Cache partitions of the web graph in memory
  • Twister, Pregel, HaLoop, Surfer
  • Static data (am files)
• Partition the web graph
  • DryadLINQ, (Twister, Hadoop) PageRank
  • Task granularity should fit the memory and network bandwidth of the cloud infrastructure
• Hierarchical messaging in the reduce stage
  • Hadoop, (Twister, DryadLINQ) PageRank
  • Local merge
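The three strategies above can be illustrated with a minimal single-process sketch. It is not the Twister, Hadoop, or DryadLINQ implementation: the function names (`partition_graph`, `map_partition`, `pagerank`), the modulo partitioning rule, the damping factor of 0.85, and the dangling-page handling are all assumptions made for illustration. The partitions are built once and reused in every iteration (static data cached in memory), and each map task sums contributions per target page before handing them on (local merge), so the reduce step sees one value per target per partition.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the slides


def partition_graph(adjacency, num_partitions):
    """Split the adjacency list into static partitions (hypothetical modulo rule)."""
    partitions = [dict() for _ in range(num_partitions)]
    for page, out_links in adjacency.items():
        partitions[page % num_partitions][page] = out_links
    return partitions


def map_partition(partition, ranks):
    """Map task: compute rank contributions and merge them locally per target."""
    local = defaultdict(float)
    for page, out_links in partition.items():
        if out_links:
            share = ranks[page] / len(out_links)
            for target in out_links:
                local[target] += share  # local merge before any "network" transfer
    return local


def pagerank(adjacency, num_partitions=2, iterations=20):
    """Iterative PageRank over integer page ids; every link target must be a key."""
    n = len(adjacency)
    partitions = partition_graph(adjacency, num_partitions)  # cached across iterations
    ranks = {page: 1.0 / n for page in adjacency}
    for _ in range(iterations):
        partials = [map_partition(p, ranks) for p in partitions]
        # Reduce stage: combine the already-merged partial sums.
        merged = defaultdict(float)
        for partial in partials:
            for target, value in partial.items():
                merged[target] += value
        # Rank held by pages without out-links is spread evenly (an assumption).
        dangling = sum(ranks[p] for p, links in adjacency.items() if not links)
        ranks = {
            page: (1 - DAMPING) / n + DAMPING * (merged[page] + dangling / n)
            for page in adjacency
        }
    return ranks


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

In a distributed runtime each call to `map_partition` would run on the worker that holds that partition, so only the rank vector and the merged partial sums move between iterations.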
Partition the Web Graph: scalability with various nodes on Madrid
Partition the Web Graph: scalability with various input data sizes on Tempest
Visualization with PlotViz3: 1k vertices; red vertex: wikipedia.org