
Big Data Technologies for InfoSec

Big Data Technologies for InfoSec. Dive Deeper. See Further. Ram Sripracha (rsriprac@ucla.edu), UCLA / Sift Security.


Presentation Transcript


  1. Big Data Technologies for InfoSec Dive Deeper. See Further. Ram Sripracha (rsriprac@ucla.edu) UCLA / Sift Security

  2. Experiences RR Systems

  3. What are “Big Data” systems? • XXL in Size • Data Volume • TBs - PBs • Computation Scalability • Horizontally Scalable • Multi-host Deployment • Commodity Hardware

  4. Why now? • Rich Ecosystem • Well-Supported Open Source Software • High Adoption Rate • Commercial Backing • “Red Hat” Model • Heavily Invested

  5. Platform Providers

  6. Technologies

  7. Is it a “Big Data” problem? • Many moving parts • May be overwhelming initially • 100s of configuration settings • Requires some level of expertise • Overkill for some problems • Larger resource footprint

  8. Big Data Stack

  9. Big Data Stack

  10. DFS

  11. NoSQL • Columnar • Sits on HDFS • Million Rows x Million Columns • Cell-level Security

  12. Titan • Graph-based Datastore • Optimized for (E, V) • Key/Value attributes for vertices and edges • 100s of millions of vertices x 100s of billions of edges • Capturing relationships • Sits on top of HBase, Cassandra, …
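The property-graph model on this slide (vertices and edges that each carry key/value attributes) can be illustrated with a toy in-memory structure. This is only a sketch of the data model, not the Titan API; the class `PropertyGraph`, its methods, and the sample users and hosts are all hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not Titan)."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> attribute dict
        self.out_edges = defaultdict(list)  # source id -> [(label, target id, attrs)]

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs

    def add_edge(self, src, label, dst, **attrs):
        self.out_edges[src].append((label, dst, attrs))

    def neighbors(self, src, label):
        """Return ids reachable from src over edges with the given label."""
        return [dst for (lbl, dst, _) in self.out_edges[src] if lbl == label]


g = PropertyGraph()
g.add_vertex("alice", kind="user", dept="finance")
g.add_vertex("10.0.0.5", kind="host")
g.add_vertex("10.0.0.9", kind="host")
g.add_edge("alice", "logged_in", "10.0.0.5", ts=1400000000)
g.add_edge("10.0.0.5", "connected_to", "10.0.0.9", bytes=5120)

# Two-hop relationship query: hosts reachable from the hosts alice logged in to.
hosts = g.neighbors("alice", "logged_in")
two_hops = [h2 for h in hosts for h2 in g.neighbors(h, "connected_to")]
print(two_hops)
```

Relationship questions of this kind ("what did this user's host talk to?") are what a graph store answers by traversal rather than by joins.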

  13. Map-Reduce
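The slide names only the technique, so here is a minimal single-process sketch of the map, shuffle, and reduce phases, assuming hypothetical flow records of the form `(src, dst, bytes)`. A real framework distributes these phases across hosts; the function names below are illustrative.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs: one (source IP, byte count) per flow record."""
    src, _dst, nbytes = record
    yield (src, nbytes)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


records = [
    ("10.0.0.1", "8.8.8.8", 100),
    ("10.0.0.2", "8.8.4.4", 50),
    ("10.0.0.1", "1.1.1.1", 25),
]

pairs = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(totals)
```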

  14. Resilient Distributed Dataset (RDD) • In-Memory RDD • Iterative Algorithms • Machine Learning

  15. Impala • Near-real-time analysis • Micro-batch processing • Pipelining of micro-batches • Stream annotations
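The micro-batch and stream-annotation bullets can be sketched with plain Python generators. This is an illustration of the processing pattern only, not any particular engine's API; `micro_batches`, `annotate`, `count_external`, the event fields, and the `10.` internal prefix are assumptions made for the example.

```python
from itertools import islice


def micro_batches(stream, size):
    """Cut an event stream into consecutive batches of at most `size` events."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def annotate(batch, internal_prefix="10."):
    """Stage 1: tag each event with whether its source looks internal."""
    for event in batch:
        yield dict(event, internal=event["src"].startswith(internal_prefix))


def count_external(batch):
    """Stage 2: per-batch metric computed on the annotated events."""
    return sum(1 for event in batch if not event["internal"])


events = [
    {"src": "10.0.0.1"},
    {"src": "203.0.113.7"},
    {"src": "10.0.0.2"},
    {"src": "198.51.100.4"},
    {"src": "192.0.2.9"},
]

# Each micro-batch flows through the annotate -> count pipeline in turn.
external_per_batch = [count_external(annotate(b)) for b in micro_batches(events, 2)]
print(external_per_batch)
```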

  16. Sits on top of • Distributed indexing and search • Indexes • Raw text files from HDFS • HBase content • Titan properties • Other replicated data streams

  17. Application Log Search • Full Text Indexes • Flexible Faceting • Automatic field extraction • Dashboard-able search interface • Low-cost alternative to Splunk and other search solutions

  18. Real-time Blacklist Alerting • Fault tolerance • Netflow annotation • Match alerting • Application access alerting • Authentication alerting • Network metrics
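The netflow annotation and match alerting bullets might look like the following in miniature. This is a sketch under assumptions: the blacklist entries are documentation-range placeholders, and `annotate_flow`, `alerts`, and the flow record fields are hypothetical names, not part of the system described.

```python
import ipaddress

# Hypothetical blacklist of networks and single hosts (placeholder addresses).
BLACKLIST = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.17/32")]


def annotate_flow(flow):
    """Return a copy of the flow tagged with the first matching blacklist entry."""
    dst = ipaddress.ip_address(flow["dst"])
    hit = next((str(net) for net in BLACKLIST if dst in net), None)
    return dict(flow, blacklist_match=hit)


def alerts(flows):
    """Yield only the annotated flows that matched the blacklist."""
    for flow in map(annotate_flow, flows):
        if flow["blacklist_match"] is not None:
            yield flow


flows = [
    {"src": "10.0.0.1", "dst": "203.0.113.45"},
    {"src": "10.0.0.2", "dst": "192.0.2.1"},
    {"src": "10.0.0.3", "dst": "198.51.100.17"},
]

matched = list(alerts(flows))
for flow in matched:
    print(flow["src"], "->", flow["dst"], "matched", flow["blacklist_match"])
```

A linear scan over the blacklist is fine for a sketch; a large list would call for a prefix trie or a set of exact addresses.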

  19. Netflow Data Warehouse • 3x Nodes • 2x 8-Core Intel E5-2450 per node • 16 GB RAM per node • 72 TB Storage Total • ~5B Netflow records/day • >1 year retention • Supports complex SQL-like queries

  20. Netflow Data Warehouse • Continuous scanning • Direct querying of delimited files • Compute metrics and diffs • Compute trending • Firewall rule validation • Long-retention DFS
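Querying a delimited file directly and diffing a metric between two periods can be sketched as follows. The pipe-delimited layout, the column names `src`, `dst`, and `bytes`, and the function `bytes_by_source` are assumptions for illustration; the slides do not specify the file format.

```python
import csv
import io
from collections import Counter


def bytes_by_source(text):
    """Aggregate total bytes per source IP from pipe-delimited flow records."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        totals[row["src"]] += int(row["bytes"])
    return totals


# Two hypothetical daily extracts, inlined here instead of read from a DFS.
day1 = "src|dst|bytes\n10.0.0.1|8.8.8.8|100\n10.0.0.2|8.8.8.8|40\n"
day2 = "src|dst|bytes\n10.0.0.1|8.8.8.8|300\n10.0.0.3|1.1.1.1|70\n"

m1, m2 = bytes_by_source(day1), bytes_by_source(day2)

# Day-over-day diff; a Counter returns 0 for sources missing on one side.
diff = {src: m2[src] - m1[src] for src in sorted(set(m1) | set(m2))}
print(diff)
```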

  21. EMR Access Anomalies • Category of insider threat • Relational networks of • Users/Groups • Department • Document Access • Community structure-based anomaly detection
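The slide does not describe the detection algorithm, so the following is a deliberately simplified stand-in rather than the author's method: it treats each user's department as their community and flags accesses to a document whose readers overwhelmingly come from a different department. The sample data, `anomalous_accesses`, and the `min_share` threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical user -> department mapping and (user, document) access log.
dept = {"ann": "finance", "bob": "finance", "cat": "finance", "dan": "engineering"}
accesses = [
    ("ann", "rec1"),
    ("bob", "rec1"),
    ("cat", "rec1"),
    ("dan", "rec1"),
    ("dan", "rec2"),
]


def anomalous_accesses(accesses, dept, min_share=0.7):
    """Flag accesses by users outside a document's dominant department."""
    by_doc = defaultdict(list)
    for user, doc in accesses:
        by_doc[doc].append(user)

    flagged = []
    for doc, users in by_doc.items():
        counts = Counter(dept[u] for u in users)
        top_dept, top_n = counts.most_common(1)[0]
        # Only documents with a clearly dominant community produce flags.
        if top_n / len(users) >= min_share:
            flagged += [(u, doc) for u in users if dept[u] != top_dept]
    return flagged


flagged = anomalous_accesses(accesses, dept)
print(flagged)
```

A fuller treatment would infer communities from the access graph itself instead of relying on the department label.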
