Learn how to develop a MapReduce application that uses HDFS for packet dissection. Explore the steps of the process, the central shuffle and sort phases, and the tools involved in putting cloud computing power to work efficiently.
How Does MapReduce Work?
• A MapReduce application is built on top of the Hadoop Distributed File System (HDFS).
• Steps of the MapReduce process:
• The client submits the job.
• The JobTracker coordinates the job and splits it into tasks.
• The TaskTrackers run the tasks; this is where the main map and reduce phases execute (a generic skeleton of the two phases is sketched below).
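As a rough sketch of what the TaskTrackers actually run, here are the two phase signatures in Hadoop's classic `org.apache.hadoop.mapred` API (the API generation that used the JobTracker/TaskTracker architecture). The class names and type parameters are generic placeholders, not code from the original presentation:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map phase: runs once per input record on some TaskTracker.
class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Emit zero or more intermediate (key, value) pairs.
    output.collect(new Text("some-key"), value);
  }
}

// Reduce phase: runs once per distinct intermediate key, after the
// shuffle/sort has grouped and sorted the map output.
class MyReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}
```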
Shuffle and Sort
• These two facilities are the heart of MapReduce and a large part of what makes it such a powerful tool for cloud computing.
• Sort phase: guarantees that the input to every reducer is sorted by key.
• Shuffle phase: transfers the map output to the reducers as input (a sketch of the routing step follows below).
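To make the shuffle concrete: which reducer receives a given key is decided by a Partitioner. The slides don't show one, but this is essentially what Hadoop's default hash partitioner does in the classic API, and it is the hook you would override to change how map output is routed:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Decides which reduce task receives each intermediate key.
// Within each reducer, the sort phase then guarantees that the
// keys arrive in sorted order.
public class KeyHashPartitioner implements Partitioner<Text, Text> {
  public void configure(JobConf job) {}

  public int getPartition(Text key, Text value, int numReduceTasks) {
    // Mask the sign bit so the modulo result is never negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```

Because every occurrence of a key hashes to the same partition, all values for that key end up at the same reducer, which is what makes the grouped reduce call possible.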
A MapReduce Application – packet dissection
• With the Jpcap library, the application captures packets and writes them directly to HDFS.
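A minimal sketch of that capture step, assuming the frames are stored in a SequenceFile of (timestamp, raw frame bytes) records. The device index, HDFS path, capture count, and record layout are my assumptions for illustration, not details from the original slides:

```java
import jpcap.JpcapCaptor;
import jpcap.NetworkInterface;
import jpcap.packet.Packet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class CaptureToHdfs {
  public static void main(String[] args) throws Exception {
    // Open the first network device in promiscuous mode
    // (snaplen 65535 bytes, 20 ms read timeout).
    NetworkInterface[] devices = JpcapCaptor.getDeviceList();
    JpcapCaptor captor = JpcapCaptor.openDevice(devices[0], 65535, true, 20);

    // Write each captured frame straight into HDFS as a
    // (timestamp, raw bytes) SequenceFile record.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/pcap/capture.seq"),
        LongWritable.class, BytesWritable.class);

    for (int i = 0; i < 10000; i++) {
      Packet p = captor.getPacket();
      if (p == null) continue;  // nothing captured within the timeout

      // Jpcap splits a packet into header and payload byte arrays;
      // store the whole frame so the job can re-parse it later.
      byte[] frame = new byte[p.header.length + p.data.length];
      System.arraycopy(p.header, 0, frame, 0, p.header.length);
      System.arraycopy(p.data, 0, frame, p.header.length, p.data.length);

      long micros = p.sec * 1000000L + p.usec;
      writer.append(new LongWritable(micros), new BytesWritable(frame));
    }
    writer.close();
    captor.close();
  }
}
```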
A MapReduce Application – packet dissection
• Set up a job configuration and submit the job.
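A sketch of what that driver might look like with the classic JobConf API; the class names and paths are placeholders for illustration (PacketMapper and PacketReducer are sketched on the next two slides):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class PacketDissect {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(PacketDissect.class);
    conf.setJobName("msn-packet-dissection");

    // Input: the SequenceFile of captured frames written earlier.
    conf.setInputFormat(SequenceFileInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("/pcap/capture.seq"));
    FileOutputFormat.setOutputPath(conf, new Path("/pcap/out"));

    conf.setMapperClass(PacketMapper.class);
    conf.setReducerClass(PacketReducer.class);

    // Map output: (flow id, payload bytes); final output: text.
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    // Submits the job to the JobTracker and waits for completion.
    JobClient.runJob(conf);
  }
}
```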
A MapReduce Application – packet dissection
• The mapper filters packets on port 1863, the port used by the MSN Messenger protocol.
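A hedged sketch of such a mapper, assuming each input value is a raw Ethernet frame carrying IPv4/TCP, as stored by the capture step above; the flow-key format and the header offsets are my assumptions, not code from the slides:

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PacketMapper extends MapReduceBase
    implements Mapper<LongWritable, BytesWritable, Text, BytesWritable> {

  private static final int MSN_PORT = 1863;
  private static final int ETH_HEADER = 14;  // untagged Ethernet II

  public void map(LongWritable timestamp, BytesWritable frame,
                  OutputCollector<Text, BytesWritable> output,
                  Reporter reporter) throws IOException {
    byte[] b = frame.getBytes();
    if (frame.getLength() < ETH_HEADER + 20) return;  // too short for IPv4
    if ((b[ETH_HEADER + 9] & 0xFF) != 6) return;      // not TCP

    // IPv4 header length is in the low nibble of the first IP byte.
    int ipHeader = (b[ETH_HEADER] & 0x0F) * 4;
    int tcp = ETH_HEADER + ipHeader;

    // TCP source and destination ports are the first two 16-bit fields.
    int srcPort = ((b[tcp] & 0xFF) << 8) | (b[tcp + 1] & 0xFF);
    int dstPort = ((b[tcp + 2] & 0xFF) << 8) | (b[tcp + 3] & 0xFF);

    // Keep only MSN traffic (port 1863), keyed by flow.
    if (srcPort == MSN_PORT || dstPort == MSN_PORT) {
      output.collect(new Text(srcPort + "->" + dstPort), frame);
    }
  }
}
```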
A MapReduce Application – packet dissection
• The reducer dissects each packet and writes the extracted message to the output collector.
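A matching sketch of the reducer, relying on the fact that MSN Messenger's wire protocol is text-based, so the TCP payload can be decoded as ASCII; the payload-offset logic mirrors the mapper above and is likewise an assumption:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PacketReducer extends MapReduceBase
    implements Reducer<Text, BytesWritable, Text, Text> {

  private static final int ETH_HEADER = 14;

  public void reduce(Text flow, Iterator<BytesWritable> frames,
                     OutputCollector<Text, Text> output,
                     Reporter reporter) throws IOException {
    while (frames.hasNext()) {
      BytesWritable frame = frames.next();
      if (frame.getLength() < ETH_HEADER + 20) continue;
      byte[] b = frame.getBytes();

      // Skip the Ethernet, IP, and TCP headers to reach the payload.
      int ipHeader = (b[ETH_HEADER] & 0x0F) * 4;
      int tcpHeader = ((b[ETH_HEADER + ipHeader + 12] & 0xF0) >> 4) * 4;
      int payload = ETH_HEADER + ipHeader + tcpHeader;
      int len = frame.getLength() - payload;
      if (len <= 0) continue;  // no application data (e.g. a pure ACK)

      // MSN commands and messages are plain text, so decode as ASCII.
      String msg = new String(b, payload, len, "US-ASCII");
      output.collect(flow, new Text(msg.trim()));
    }
  }
}
```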
A MapReduce Application – packet dissection
• See the result (output screenshot in the original slides).