Big Data Technologies for InfoSec Dive Deeper. See Further. Ram Sripracha (rsriprac@ucla.edu) UCLA / Sift Security
Experiences RR Systems
What are “Big Data” systems? • XXL in Size • Data Volume • TBs - PBs • Computation Scalability • Horizontally Scalable • Multi-host Deployment • Commodity Hardware
Why now? • Rich Ecosystem • Well-Supported Open Source Software • High Adoption Rate • Commercial Backing • “Red Hat” Model • Heavily Invested
Is it a “Big Data” problem? • Many moving parts • May be overwhelming initially • 100s of configuration settings • Requires some level of expertise • Overkill for some problems • Larger resource footprint
NoSQL • Columnar • Sits on HDFS • Million Rows x Million Columns • Cell-level Security
Titan • Graph-based Datastore • Optimized for (E, V) • Key/Value attributes for vertices and edges • 100s of millions of vertices x 100s of billions of edges • Capturing relationships • Sits on top of HBase, Cassandra, …
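The property-graph model on this slide (vertices and edges that each carry key/value attributes) can be illustrated with a small in-memory sketch. This is plain Python and not Titan's API; the class `PropertyGraph`, its methods, and the sample vertex and edge names are all hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (hypothetical; not Titan's API)."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> attribute dict
        self.out_edges = defaultdict(list)  # source id -> [(dst, label, attrs)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = dict(props)

    def add_edge(self, src, dst, label, **props):
        self.out_edges[src].append((dst, label, dict(props)))

    def neighbors(self, vid, label=None):
        # Follow outgoing edges, optionally restricted to one edge label.
        return [dst for dst, lbl, _ in self.out_edges.get(vid, [])
                if label is None or lbl == label]


g = PropertyGraph()
g.add_vertex("alice", kind="user")
g.add_vertex("10.0.0.5", kind="host")
g.add_edge("alice", "10.0.0.5", "logged_in", count=3)
```

A relationship query is then a traversal, for example `g.neighbors("alice", "logged_in")`. A distributed graph store keeps the same model but persists it in a backend such as HBase or Cassandra.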
Resilient Distributed Dataset(RDD) • In-Memory RDD • Iterative Algorithms • Machine Learning
Impala • Near-real-time analysis • Micro-batch processing • Pipelining of micro-batches • Stream annotations
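Micro-batch processing and pipelining, as listed on this slide, can be sketched in a few lines of Python. This is an illustrative assumption rather than any framework's API: the helpers `micro_batches` and `annotate_batch` are hypothetical, and batches are cut by record count here, whereas streaming engines typically cut them by time interval.

```python
from itertools import islice


def micro_batches(stream, size):
    """Split an iterable stream into consecutive lists of at most `size` items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def annotate_batch(batch):
    # Hypothetical annotation stage: tag each record with its batch size.
    return [{"value": record, "batch_size": len(batch)} for record in batch]


# Pipeline: each micro-batch flows through the annotation stage in turn.
annotated = [annotate_batch(b) for b in micro_batches(range(5), 2)]
```

Because both stages are lazy or per-batch, a later stage can start on the first batch while the stream is still producing records.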
Sits on top of • Distributed indexing and search • Indexes • Raw text files from HDFS • HBase content • Titan properties • Other replicated data streams
Application Log Search • Full Text Indexes • Flexible Faceting • Automatic field extraction • Dashboard-able search interface • Low-cost alternative to Splunk and other search solutions
Real-time Blacklist Alerting • Fault tolerance • Netflow annotation • Match alerting • Application access alerting • Authentication alerting • Network metrics
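Netflow annotation and match alerting can be sketched as follows. This is a minimal illustration, not the deployed system: the flow record layout (`src`, `dst` keys), the `BLACKLIST` contents (documentation-reserved address ranges), and the function names are assumptions.

```python
import ipaddress

# Hypothetical blacklist; these are documentation-reserved ranges.
BLACKLIST = [ipaddress.ip_network(n)
             for n in ("203.0.113.0/24", "198.51.100.7/32")]


def annotate(flow):
    """Return a copy of the flow with the fields that hit the blacklist."""
    hits = [field for field in ("src", "dst")
            if any(ipaddress.ip_address(flow[field]) in net
                   for net in BLACKLIST)]
    return {**flow, "blacklist_hits": hits}


def alerts(flows):
    """Yield only the annotated flows that matched at least one entry."""
    for flow in flows:
        annotated = annotate(flow)
        if annotated["blacklist_hits"]:
            yield annotated


sample = [
    {"src": "10.0.0.1", "dst": "203.0.113.9"},
    {"src": "10.0.0.2", "dst": "192.0.2.1"},
]
matched = list(alerts(sample))
```

The linear scan over `BLACKLIST` is fine for a short list; a large blacklist would call for a prefix tree or a hash set of exact addresses.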
Netflow Data Warehouse • 3x Nodes • 2x 8-Core Intel E5-2450 per node • 16 GB RAM per node • 72 TB Storage Total • ~5B Netflow records/day • >1 year retention • Supports complex SQL-like queries
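The figures on this slide imply a tight per-record storage budget. The back-of-the-envelope calculation below assumes decimal terabytes, exactly 365 days of retention, and that all 72 TB is usable; the 3x replication case is an added assumption (3 is the HDFS default) and is not stated on the slide.

```python
STORAGE_BYTES = 72 * 10**12      # 72 TB total, decimal units (assumption)
RECORDS_PER_DAY = 5 * 10**9      # ~5B Netflow records/day
RETENTION_DAYS = 365             # ">1 year" taken as exactly one year

total_records = RECORDS_PER_DAY * RETENTION_DAYS

# Budget if every byte holds unique data (no replication).
bytes_per_record = STORAGE_BYTES / total_records

# Budget if each block is stored three times (assumed replication factor).
bytes_per_record_3x = bytes_per_record / 3

print(f"{total_records:.3e} records, "
      f"{bytes_per_record:.1f} B/record unreplicated, "
      f"{bytes_per_record_3x:.1f} B/record at 3x replication")
```

Under these assumptions the budget is roughly 39 bytes per record unreplicated and roughly 13 bytes at 3x replication, which suggests the records are stored in a compact or compressed form.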
Netflow Data Warehouse • Continuous scanning • Direct querying of delimited files • Perform metrics and diffs • Compute trending • Firewall rule validation • Long retention DFS
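Querying delimited files directly and computing metrics and diffs can be sketched with the standard library. The pipe-delimited layout (`src|dst|dport|bytes`), the sample records, and the function names are assumptions for illustration; the slides do not specify the file format.

```python
import csv
import io
from collections import Counter


def bytes_by_port(text, delimiter="|"):
    """Metric: total bytes per destination port from delimited flow records."""
    totals = Counter()
    for src, dst, dport, nbytes in csv.reader(io.StringIO(text),
                                              delimiter=delimiter):
        totals[dport] += int(nbytes)
    return totals


def diff(today, yesterday):
    """Per-port change in bytes; ports with no change are omitted."""
    ports = set(today) | set(yesterday)
    return {p: today[p] - yesterday[p]
            for p in ports if today[p] != yesterday[p]}


day1 = "10.0.0.1|192.0.2.1|443|1000\n10.0.0.2|192.0.2.1|22|200\n"
day2 = "10.0.0.1|192.0.2.1|443|1500\n10.0.0.3|192.0.2.9|3389|700\n"
change = diff(bytes_by_port(day2), bytes_by_port(day1))
```

Here `change` shows port 443 growing, port 22 disappearing, and port 3389 appearing for the first time, which is the kind of day-over-day difference a trending or firewall-validation job would surface.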
EMR Access Anomalies • Category of insider threat • Relational networks of • Users/Groups • Department • Document Access • Community structure-based anomaly detection
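The idea of flagging accesses that cut across community structure can be illustrated with a deliberately simplified sketch. It is not the method behind this slide: departments stand in for communities here, whereas a community-structure approach would derive the groups from the access graph itself. All names, the `min_peers` threshold, and the sample data are hypothetical.

```python
from collections import defaultdict


def cross_community_accesses(accesses, department, min_peers=2):
    """Flag (user, record) pairs where no other accessor of the record
    shares the user's department (a stand-in for a detected community)."""
    by_record = defaultdict(set)
    for user, record in accesses:
        by_record[record].add(user)

    flagged = []
    for user, record in accesses:
        peers = by_record[record] - {user}
        # Require enough peers so rarely accessed records are not flagged.
        if len(peers) >= min_peers and all(
                department[p] != department[user] for p in peers):
            flagged.append((user, record))
    return flagged


department = {"ann": "cardiology", "bob": "cardiology",
              "cat": "cardiology", "dan": "billing"}
accesses = [("ann", "r1"), ("bob", "r1"), ("cat", "r1"),
            ("dan", "r1"), ("dan", "r2")]
suspicious = cross_community_accesses(accesses, department)
```

In this sample only `dan`'s access to `r1` is flagged, because every other accessor of that record belongs to a different group.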