Nfsen + Hadoop

Vytautas Krakauskas, LITNET CERT, Swedbank SIRT

Problems
• Limited storage capacity
• Processing time for large data sets

Storage capacity
• Steadily increasing network traffic
• Up to six months of history needed for incident handling
• I/O is the major bottleneck

Processing time
• nfdump currently has no SMP support, so a single query runs on one CPU core
• This becomes important once the I/O bottleneck is resolved

Processing with Nfdump

Distributed processing

The idea
• Distribute nfcapd files between multiple nodes
• Process the files using nfdump
• Combine the output and return it to nfsen
• Using nfsen and nfdump should feel the same as before

1. File distribution
• nfcapd stores files on a temporary file system
  • due to the "random" write of the stat header, which HDFS does not support (files there are write-once)
  • files are copied to HDFS at the end of each interval
    • bonus: a limited backup while the system is being tested
• Redundant copies on multiple nodes
  • higher redundancy for faster processing and better reliability
  • lower redundancy for larger storage capacity
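The end-of-interval copy and the redundancy trade-off can be sketched as follows. This is a minimal illustration, not the nfdist code: it assumes the `hdfs dfs` command-line client is available on the collector host, and the helper names and the `/netflow` target directory are hypothetical. The replication factor is passed per upload through Hadoop's generic `-D dfs.replication=N` option; the capacity helper shows why lower redundancy gives more storage (every block is stored N times).

```python
import os
import shlex
import subprocess


def hdfs_put_command(local_file: str, hdfs_dir: str, replication: int) -> list[str]:
    """Build an `hdfs dfs -put` command for one finished nfcapd file (hypothetical helper)."""
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    target = hdfs_dir.rstrip("/") + "/" + os.path.basename(local_file)
    return [
        "hdfs", "dfs",
        "-D", f"dfs.replication={replication}",  # per-upload replication factor
        "-put", local_file, target,
    ]


def copy_to_hdfs(local_file: str, hdfs_dir: str, replication: int = 2) -> None:
    """Upload a file after nfcapd has closed it; the local copy is not deleted here."""
    subprocess.run(hdfs_put_command(local_file, hdfs_dir, replication), check=True)


def usable_capacity_tb(raw_tb: float, replication: int) -> float:
    """Each HDFS block is stored `replication` times, so usable space shrinks accordingly."""
    return raw_tb / replication


if __name__ == "__main__":
    cmd = hdfs_put_command("/tmp/nfcapd/nfcapd.201301011200", "/netflow/", 2)
    print(shlex.join(cmd))
    # hdfs dfs -D dfs.replication=2 -put /tmp/nfcapd/nfcapd.201301011200 /netflow/nfcapd.201301011200
    for r in (1, 2, 3):
        print(f"replication {r}: {usable_capacity_tb(12.0, r)} TB usable of 12 TB raw")
```

Only `hdfs_put_command` is exercised in the example; `copy_to_hdfs` would actually contact the cluster.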

Modified architecture

2. Processing
• Process using nfdump
  • I/O through stdin/stdout
• Each node works only with locally stored files
  • currently decided by the location of the file's first block
• Aggregate when possible, depending on:
  • stats type, aggregation options, filters
• Copy the results back to HDFS for the combiner
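The "locally stored files, based on the first block" rule can be sketched as a small scheduling function. This is a hypothetical illustration rather than the nfdist scheduler: it assumes the block locations are already known (in practice they would come from the HDFS NameNode) as a map from file name to the hosts holding a replica of that file's first block. Each file is assigned to exactly one of those hosts, preferring the least-loaded one, so a file replicated on two nodes is not processed twice.

```python
from collections import defaultdict


def assign_local_files(first_block_hosts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Assign every file to one node that stores a replica of its first block.

    first_block_hosts: file name -> hosts holding a replica of block 0 (hypothetical input).
    Returns node -> files that node should run nfdump on.
    """
    assignment: dict[str, list[str]] = defaultdict(list)
    load: dict[str, int] = defaultdict(int)
    # Iterate in sorted order so the result is deterministic for a given input.
    for name in sorted(first_block_hosts):
        hosts = first_block_hosts[name]
        if not hosts:
            raise ValueError(f"no replica found for {name}")
        # Least-loaded replica holder wins; ties go to the alphabetically first host.
        node = min(hosts, key=lambda h: (load[h], h))
        assignment[node].append(name)
        load[node] += 1
    return dict(assignment)


if __name__ == "__main__":
    locations = {
        "nfcapd.201301011200": ["node1", "node2"],
        "nfcapd.201301011205": ["node1", "node2"],
        "nfcapd.201301011210": ["node2", "node1"],
        "nfcapd.201301011215": ["node1"],
    }
    print(assign_local_files(locations))
```

A file whose first block lives on a single node (the last one above) can only go to that node, which is one reason a higher replication factor helps balance the processing.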

3. Combining
• Combine the results into a single stream
  • a custom tool (nfcat)
  • some information is lost (e.g. ident)
• nfdump does the final processing
  • single instance (a bottleneck)
• Display the results
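Why the final nfdump pass is needed can be shown with a toy model of the combining step. This sketch is not nfcat: it assumes each node returns its partial statistics as (IP address, flow count) pairs, concatenates them into one stream and re-aggregates before taking the top N. Merging complete partial aggregates gives the exact answer; merging only each node's own top N can miss an address that is moderately busy on every node.

```python
from collections import Counter
from itertools import chain
from typing import Iterable


def combine(partials: Iterable[Iterable[tuple[str, int]]], top_n: int) -> list[tuple[str, int]]:
    """Concatenate per-node (ip, flows) records and re-aggregate them.

    The chained iterator mirrors the single combined stream; the Counter stands in
    for the final processing step (re-aggregation, then ordering by flows).
    """
    totals: Counter = Counter()
    for ip, flows in chain.from_iterable(partials):
        totals[ip] += flows
    # Sort by flows descending, then by IP so ties are deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    node_a = [("10.0.0.1", 50), ("10.0.0.2", 40), ("10.0.0.3", 30)]
    node_b = [("10.0.0.4", 55), ("10.0.0.5", 45), ("10.0.0.3", 30)]
    # Exact: each node sends all of its aggregated records.
    print(combine([node_a, node_b], top_n=2))
    # Lossy: each node sends only its own top 2, so 10.0.0.3 (60 flows in total) is missed.
    print(combine([node_a[:2], node_b[:2]], top_n=2))
```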

Modified architecture

Comparison
• Limited to nfdump (nfsen adds further delays)
• Original
  • single nfdump instance
  • files on a local file system
• Distributed
  • two nodes
  • processes per node: 2
  • HDFS replication factor: 2

Comparison
• Top 10 IPs, ordered by flows
• 1 to 18 files (5 to 90 minute period)
• Filter: "proto icmp"
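For reference, the benchmark query on the original setup roughly corresponds to a single nfdump call like the one built below. The helper and the `/data/flows` directory are hypothetical, and 5-minute file rotation is assumed (so 18 files span 90 minutes). `-r` (one file), `-R dir/first:last` (a range of files), `-s ip/flows` (IP statistics ordered by flows), `-n 10` (top 10) and a trailing filter expression are standard nfdump options, but the exact invocation used for the measurements is not given on the slide.

```python
import shlex
from datetime import datetime, timedelta


def nfcapd_name(ts: datetime) -> str:
    """nfcapd names each file after its interval start: nfcapd.YYYYMMDDhhmm."""
    return "nfcapd." + ts.strftime("%Y%m%d%H%M")


def top10_icmp_command(directory: str, start: datetime, n_files: int,
                       interval_min: int = 5) -> list[str]:
    """Build the query (top 10 IPs by flows, ICMP only) over n_files consecutive files."""
    if n_files < 1:
        raise ValueError("need at least one file")
    base = directory.rstrip("/")
    first = nfcapd_name(start)
    if n_files == 1:
        source = ["-r", f"{base}/{first}"]          # a single file
    else:
        last = nfcapd_name(start + timedelta(minutes=interval_min * (n_files - 1)))
        source = ["-R", f"{base}/{first}:{last}"]   # a range of files
    return ["nfdump", *source,
            "-s", "ip/flows",   # statistics per IP address, ordered by flows
            "-n", "10",         # keep the top 10
            "proto icmp"]       # filter expression


if __name__ == "__main__":
    print(shlex.join(top10_icmp_command("/data/flows", datetime(2013, 1, 1, 12, 0), 18)))
    # nfdump -R /data/flows/nfcapd.201301011200:nfcapd.201301011325 -s ip/flows -n 10 'proto icmp'
```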

Comparison

Conclusions
• Overhead has a significant impact for short periods
  • initialization
  • job scheduling
  • combining and re-processing
• Limited speed gains due to aggregation
• Filtering is essential for achieving good speed gains
• Some issues still need to be addressed
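The first conclusions can be made concrete with a simple timing model. This is an illustrative assumption, not a fit to the measurements: a fixed overhead per query (initialization, job scheduling), per-file work spread over the workers, and a serial combining cost proportional to the number of records that reach the single final nfdump (held fixed per query for simplicity). The distributed run only wins once the parallel savings exceed the overhead plus the combining cost, which is why short periods suffer and why a selective filter (fewer output records) matters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Toy timing model; every number passed in is a hypothetical assumption (seconds)."""
    overhead_s: float    # initialization and job scheduling, paid once per query
    per_file_s: float    # time for one nfdump process to read and filter one file
    workers: int         # nodes x processes per node
    per_record_s: float  # combining/re-processing cost per record sent to the final nfdump


def original_time(m: CostModel, n_files: int) -> float:
    """A single nfdump instance processes the files one after another."""
    return m.per_file_s * n_files


def distributed_time(m: CostModel, n_files: int, records_out: int) -> float:
    """Fixed overhead + parallel file processing + serial combining step."""
    rounds = -(-n_files // m.workers)  # ceiling division: a file is never split between workers
    return m.overhead_s + m.per_file_s * rounds + m.per_record_s * records_out


def break_even_files(m: CostModel, records_out: int, max_files: int = 1000) -> Optional[int]:
    """Smallest file count for which the distributed run is faster, or None."""
    for n in range(1, max_files + 1):
        if distributed_time(m, n, records_out) < original_time(m, n):
            return n
    return None


if __name__ == "__main__":
    # 2 nodes x 2 processes = 4 workers, as in the test setup; the timings are invented.
    m = CostModel(overhead_s=20.0, per_file_s=4.0, workers=4, per_record_s=1e-4)
    print("few output records (selective filter):", break_even_files(m, records_out=1_000))
    print("many output records (no filter):", break_even_files(m, records_out=1_000_000))
```

With these made-up constants the break-even point moves from a handful of files to several dozen once the combiner has to re-process many records.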

Thank you!

The code
• https://github.com/vytautas/nfdist
• Patches (nfdist branch)
  • https://github.com/vytautas/nfdump
  • https://github.com/vytautas/nfsen

Comparison: bad case
