Analyzing Yellowstone’s Network with a Raspberry Pi Cluster
Lauren Patterson
Objective of the Project
Use a low-cost Raspberry Pi cluster to find the interconnect path between two nodes on Yellowstone, in order to analyze the performance of jobs.
Yellowstone Interconnect
[Figure: diagram of the Yellowstone interconnect. Credit: Siddhartha Ghosh]
Files Used
• job1_nodes.txt: gives the job ID and the nodes the job used
• ibnetdiscover.log (discover file): lists the connections between switches
• LFTS.txt: routing table for each switch
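Together, these files are enough to trace a route hop by hop: start at the source node's leaf switch, look up the destination's LID in that switch's routing table to get an output port, follow the cable on that port to the next device, and repeat until the destination is reached. The Python sketch below illustrates that walk. It assumes the files have already been parsed into simple dictionaries; the `links`, `lfts`, and `lids` structures, the toy fabric, and `trace_path` are all hypothetical (the real ibnetdiscover and LFT dump formats need their own parsers), and this is not the project's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical, pre-parsed forms of the input files:
# (device, port) -> (neighbor device, neighbor port), from ibnetdiscover.log
Links = Dict[Tuple[str, int], Tuple[str, int]]
# switch -> {destination LID: output port}, from LFTS.txt
Lfts = Dict[str, Dict[int, int]]


def trace_path(src: str, dst: str, links: Links, lfts: Lfts,
               lids: Dict[str, int], max_hops: int = 64) -> List[str]:
    """Walk from src to dst by following each switch's routing entry for dst."""
    dst_lid = lids[dst]
    # Assumption: each compute node has one cabled port (port 1) to its leaf switch.
    device, _ = links[(src, 1)]
    path = [src]
    for _ in range(max_hops):
        path.append(device)
        if device == dst:
            return path
        out_port = lfts[device][dst_lid]       # routing-table lookup
        device, _ = links[(device, out_port)]  # follow the cable on that port
    raise RuntimeError("hop limit exceeded: routing loop or incomplete data")


# Toy fabric (hypothetical): node1 - leafA - spine - leafB - node2
links = {
    ("node1", 1): ("leafA", 1), ("leafA", 1): ("node1", 1),
    ("leafA", 10): ("spine", 1), ("spine", 1): ("leafA", 10),
    ("spine", 2): ("leafB", 10), ("leafB", 10): ("spine", 2),
    ("leafB", 3): ("node2", 1), ("node2", 1): ("leafB", 3),
}
lfts = {"leafA": {42: 10}, "spine": {42: 2}, "leafB": {42: 3}}
lids = {"node1": 7, "node2": 42}

print(trace_path("node1", "node2", links, lfts, lids))
# ['node1', 'leafA', 'spine', 'leafB', 'node2']
```

The hop limit guards against a routing loop or a missing table entry silently spinning forever.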
What is Hadoop?
• HDFS (Hadoop Distributed File System)
• MapReduce
HDFS
[Figure: Hadoop cluster layout showing the Name Node and Job Tracker, and Data Nodes each running a Task Tracker for Map/Reduce tasks]
MapReduce
[Figure: MapReduce data flow: Input Data → Map phase → Shuffle phase → Reduce phase → Output Data]
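The three phases in the figure can be mimicked in plain Python to show how data flows between them. This single-process sketch is not Hadoop's API: real MapReduce runs map and reduce tasks on many Task Trackers and moves intermediate data between nodes during the shuffle. The example job (counting how many traced paths pass through each switch) and all function names are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(path: List[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (switch, 1) for every switch on one path (endpoints are nodes)."""
    for switch in path[1:-1]:
        yield switch, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_mapreduce(paths: List[List[str]]) -> Dict[str, int]:
    mapped = (pair for path in paths for pair in map_phase(path))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


paths = [
    ["node1", "leafA", "spine", "leafB", "node2"],
    ["node3", "leafA", "node4"],
]
print(run_mapreduce(paths))
# {'leafA': 2, 'spine': 1, 'leafB': 1}
```

Because each map call looks at only one path and each reduce call sees only one key's values, both phases can be spread across machines; the shuffle is the step that brings matching keys together.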
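Pig
• Apache Pig: a platform for analyzing large data sets on Hadoop
• Pig Latin: Pig's data-flow scripting language
• Grunt: Pig's interactive shell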
Pig Latin Script
• Created a Pig Latin script to find the path between two nodes in Yellowstone (see http://www.edureka.in/blog/pig-programming-create-your-first-apache-pig-script/)
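JOIN Operations in Pig
• The default, an inner join, returns only the records whose keys appear in both A and B (the intersection).
• Left, right, and full outer joins also keep unmatched records from A, from B, or from both, filling the missing side's fields with nulls.
• A full outer join is the union of the left and right outer joins.
[Figure: Venn diagrams of sets A and B shading the records returned by each join type]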
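The join semantics can be checked with a small plain-Python equi-join over (key, value) pairs, where `None` plays the role of a null. This is a sketch of the set logic only, not Pig's implementation; in Pig Latin a left outer join is written along the lines of `C = JOIN A BY key LEFT OUTER, B BY key;`. The function `join` and the sample relations are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, Any]                               # (join key, value)
Joined = Tuple[Any, Optional[Any], Optional[Any]]   # (key, A value, B value)


def join(a: List[Row], b: List[Row], how: str = "inner") -> List[Joined]:
    """Equi-join A and B on the key; None stands in for a null field."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    # Index B by key; null keys never match (SQL-style), so they are not indexed.
    b_index: Dict[Any, List[Any]] = {}
    for key, value in b:
        if key is not None:
            b_index.setdefault(key, []).append(value)
    out: List[Joined] = []
    for key, a_value in a:
        matches = b_index.get(key, [])
        if matches:
            out.extend((key, a_value, b_value) for b_value in matches)
        elif how in ("left", "full"):
            out.append((key, a_value, None))  # unmatched A row, B side null
    if how in ("right", "full"):
        a_keys = {key for key, _ in a if key is not None}
        out.extend((key, None, b_value) for key, b_value in b
                   if key is None or key not in a_keys)  # unmatched B rows
    return out


A = [(1, "a1"), (2, "a2")]
B = [(2, "b2"), (3, "b3")]
for how in ("inner", "left", "right", "full"):
    print(how, join(A, B, how))
# inner [(2, 'a2', 'b2')]
# left [(1, 'a1', None), (2, 'a2', 'b2')]
# right [(2, 'a2', 'b2'), (3, None, 'b3')]
# full [(1, 'a1', None), (2, 'a2', 'b2'), (3, None, 'b3')]
```

The full result is exactly the union of the left and right results, matching the diagram.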
Results
[Chart omitted; only these uncertainty values survived extraction: ±82, ±19, ±15, ±3, ±4, ±3]
Python
• Single Path Python
• Parallel Python
• mpi4py 1.3.1
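One plausible way to parallelize the path search is to give each MPI process a disjoint share of the (source, destination) node pairs. The sketch below shows a round-robin split and simulates four ranks in one process. The mpi4py calls appear only in comments (`MPI.COMM_WORLD`, `Get_rank`, `Get_size`, and `gather` are real mpi4py methods), while `my_share`, `find_path`, and the node names are hypothetical; the slides do not say how the work was actually divided.

```python
from itertools import combinations
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def my_share(items: Sequence[T], rank: int, size: int) -> List[T]:
    """Round-robin split: rank r takes items r, r + size, r + 2*size, ..."""
    return list(items[rank::size])


nodes = ["n1", "n2", "n3", "n4"]       # hypothetical node names
pairs = list(combinations(nodes, 2))   # 6 (source, destination) pairs

# With mpi4py, every process would run something like:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   rank, size = comm.Get_rank(), comm.Get_size()
#   local = [find_path(s, d) for s, d in my_share(pairs, rank, size)]
#   gathered = comm.gather(local, root=0)  # list of lists on rank 0, None elsewhere
# (find_path is a hypothetical per-pair path routine.)
# Below, four ranks are simulated sequentially in one process.
size = 4
for rank in range(size):
    print(rank, my_share(pairs, rank, size))
# 0 [('n1', 'n2'), ('n2', 'n4')]
# 1 [('n1', 'n3'), ('n3', 'n4')]
# 2 [('n1', 'n4')]
# 3 [('n2', 'n3')]
```

A round-robin split needs no communication to decide who does what, and every pair is handled by exactly one rank.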
[Chart omitted; only these uncertainty values survived extraction: ±0.11, ±0.11, ±0.07, ±0.004, ±0.02, ±0.006]
[Chart omitted; only these uncertainty values survived extraction: ±20, ±7, ±4, ±2, ±1, ±18, ±0.5, ±2, ±4]
What Do All of These Have in Common?
• Raspberry Pi
• Hadoop
• Pig
• Python
Acknowledgments
• Richard Loft
• Karina Hauser
• Stephanie Barr
• Bruce Chittenden
• Amogh Simha
• Raghu Raj Prasanna Kumar