Threat Connect: a visualized cyber-threats entity reporting system backed by the Hadoop ecosystem Scott Miao, Trend Micro scott_miao@trend.com.tw @takeshi.miao
Who am I • RD, SPN, Trend Micro • 3+ years with the Hadoop ecosystem • Expertise in HDFS/MR/HBase • @takeshi.miao
Agenda • Threat intelligence problem • Challenges and Solutions • Summary
“I want to quickly get an overview of the incident, including its scope, timeline, and impact.” Threat intelligence problem
Threat Connect • A Web Service for Threat Information Report • RESTful Interface to access • Integrated with TM Deep Discovery products • Relevant and Actionable Intelligence
Processes and correlates different data sources … IP, domain, URL, filename, process, file hash, virus detection, registry key, etc. Product 1 Product 2 Product 3 Most relevant threat report with actionable intelligence on a single portal
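The correlation step on this slide can be illustrated with a minimal sketch: group events from several products by the entity they mention (IP, file hash, domain) and keep the entities that more than one source reports. The event records, field names, and sample values below are hypothetical and are not Threat Connect's actual schema.

```python
from collections import defaultdict

# Hypothetical events from three products; field names are illustrative only.
EVENTS = [
    {"source": "product1", "type": "ip", "value": "203.0.113.7"},
    {"source": "product2", "type": "ip", "value": "203.0.113.7"},
    {"source": "product2", "type": "file_hash", "value": "abc123"},
    {"source": "product3", "type": "file_hash", "value": "abc123"},
    {"source": "product3", "type": "domain", "value": "example.test"},
]


def correlate(events):
    """Group events by (type, value) and keep entities seen by 2+ sources."""
    seen = defaultdict(set)
    for event in events:
        seen[(event["type"], event["value"])].add(event["source"])
    return {
        entity: sorted(sources)
        for entity, sources in seen.items()
        if len(sources) >= 2
    }


correlated = correlate(EVENTS)
```

At scale the same grouping would more likely run as a batch job (for example a Pig or MapReduce group-by on the entity key) than in a single process.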
Graph Problem • Process & Correlate • Moving Big Data • Storing • Real Time Access • Pick the right tool
Accumulate small files • Dear users/services • Event Logs • FBS (Feed Back log Service), multiple instances • Hadoop
Time • Batch • Performance • Store • Pig/MR • HDFS • HBase • Solr • RDB • UDFs • MRs for special cases
Free form search • SolrCloud • Real Time Access • e.g. Sandbox Reports • Random Access • HBase • e.g. Threat Detection DBs
Active community? Massively scalable? Analyzable?
We use HBase as graph storage • Google Bigtable and PageRank • HBaseCon 2012
HGraph https://github.com/tinkerpop/blueprints/wiki
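One common way to keep a graph in a sorted key-value store such as HBase is to encode each edge in the row key, so that all outgoing edges of a vertex sit next to each other and can be read with one prefix scan. The sketch below shows that idea with an in-memory sorted list standing in for the table. The class, key layout, and sample entities are assumptions for illustration; this is not HGraph's actual schema and it does not use the HBase client API.

```python
import bisect


class SortedEdgeTable:
    """In-memory stand-in for a table whose rows are sorted by key, as in HBase."""

    SEP = "\x00"  # separator that sorts below any printable character

    def __init__(self):
        self._keys = []  # sorted row keys of the form: src SEP label SEP dst

    def add_edge(self, src, label, dst):
        """Insert one edge as a row key, keeping the key list sorted."""
        key = self.SEP.join((src, label, dst))
        idx = bisect.bisect_left(self._keys, key)
        if idx == len(self._keys) or self._keys[idx] != key:
            self._keys.insert(idx, key)

    def out_edges(self, src):
        """Prefix scan: return (label, dst) for every row starting with src + SEP."""
        prefix = src + self.SEP
        idx = bisect.bisect_left(self._keys, prefix)
        result = []
        while idx < len(self._keys) and self._keys[idx].startswith(prefix):
            _, label, dst = self._keys[idx].split(self.SEP)
            result.append((label, dst))
            idx += 1
        return result


table = SortedEdgeTable()
table.add_edge("evil.example", "resolves_to", "203.0.113.7")
table.add_edge("evil.example", "hosts", "dropper.exe")
table.add_edge("evil.example.org", "resolves_to", "198.51.100.9")
neighbors = table.out_edges("evil.example")
```

The separator keeps `evil.example` from matching `evil.example.org` during the scan, which is the usual reason for a delimiter or fixed-width fields in composite row keys.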
Pick the right tool for the right use cases • Silver bullet? • No one project fits all • One problem may have several choices http://www.neevtech.com/blog/2013/03/18/hadoop-ecosystem-at-a-glance/
Small files • The NameNode fsimage would explode in memory • Too many map tasks to run for a job FBS FBS FBS
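Both bullets can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object, a 128 MiB block size, and one map task per block-sized split; all three constants are assumptions, not measurements from this system.

```python
BYTES_PER_OBJECT = 150        # assumed rule-of-thumb heap cost per file or block object
BLOCK_SIZE = 128 * 1024 ** 2  # assumed HDFS block size (128 MiB)


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def small_files_cost(total_bytes, file_size):
    """Return (files, map_tasks, heap_bytes) for data stored as equal-sized files."""
    files = ceil_div(total_bytes, file_size)
    blocks = files * max(1, ceil_div(file_size, BLOCK_SIZE))
    heap_bytes = (files + blocks) * BYTES_PER_OBJECT
    # Assuming a plain file-based input format, each block becomes one split,
    # and each split gets its own map task.
    return files, blocks, heap_bytes


TOTAL = 1024 ** 4                             # 1 TiB of event logs
small = small_files_cost(TOTAL, 1024 ** 2)    # stored as 1 MiB files
large = small_files_cost(TOTAL, 1024 ** 3)    # stored as 1 GiB files
```

Under these assumptions, 1 TiB kept as 1 MiB files means 1,048,576 files and map tasks and about 300 MiB of NameNode heap, while the same data as 1 GiB files needs 8,192 map tasks and about 1.3 MiB of heap.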
Store your data anyway • Store all the raw data on HDFS • Break the invisible isolation between different data sources • Archive your data with a deduced, easy-to-use file format • Trevni, RCFile, ORC file
Know MR better • Even if you are a Pig developer • Deal with MR issues • Write better Pig Latin • Sometimes you can only use MR
Know your data & use cases • Real time? Batch? • Access pattern? • Then you can pick the right tool