
GPU-Accelerating A Deep Learning Anomaly Detection Platform

Learn from Satish Dandu, Michael Balint, and Joshua Patterson how to accelerate anomaly detection and inferencing with deep learning and GPU data pipelines.

nvidia

Presentation Transcript


  1. AGENDA
     - Data Platform-as-a-Service
     - Multi-Tenancy
     - Anomaly Detection Platform
     - Training & Inferencing Evolution
     - Performance & Learnings
     - GPU Acceleration
     - Dashboarding use case & impetus for GPU acceleration
     - Performance & Learnings
     - Future & GOAI

  2. DATA PAAS OVERVIEW

  3. DATA PLATFORM-AS-A-SERVICE
     - SCALE: handles 1M events/second; auto-scales the cluster
     - HIGH AVAILABILITY: offers HA with no data loss; always-on architecture; data replication
     - SECURITY: data platform security implemented with VPCs in AWS; dashboard access using NVIDIA LDAP
     - SELF SERVICE: log-to-analytics; Kibana and JDBC access; data access using BI tools

  4. ARCHITECTURE V1

  5. DATA PLATFORM STATS

  6. (no text transcribed for this slide)

  7. ANOMALY DETECTION

  8. DATA PAAS: Anomaly Detection (AD) PaaS. *Images created from quickmeme.com

  9. ANOMALY DETECTION USING DEEP LEARNING
     - Diagram labels: Data Platform, Anomaly Detection, AI Framework (Keras + TensorFlow), GPU Cluster, GPU Cloud, NGC/NGN
     - Top features: automated alerts & dashboards; early detection; self service; better accuracy & less noise

  10. ANOMALY DETECTION FRAMEWORK
      - Raw dataset: Time, X1, X2
      - Feature learning algorithm: Recurrent Neural Network (RNN), Autoencoders (AE). Output: Time, X1, X2, X', X''
      - Anomaly detection: supervised learning (logistic regression) and unsupervised learning (multivariate Gaussian). Output: Time, X1, X2, X', X'', Y (1 or 0)
      - Anomaly post-processing: univariate analysis. Output: Time, X1, X2, Y, Anomaly Description
      - Anomalies: email alerts, dashboards
      - Feedback from user
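The unsupervised stage on the framework slide names a multivariate Gaussian. A minimal sketch of that idea, assuming just two features so the covariance inverse can be written by hand, might look like the following. The function names, the sample values, and the `threshold` of 0.0 are all hypothetical and are not taken from the presentation.

```python
import math


def fit_gaussian_2d(rows):
    """Estimate the mean vector and 2x2 covariance from (x1, x2) rows."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / n
    s22 = sum((r[1] - m2) ** 2 for r in rows) / n
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / n
    return (m1, m2), (s11, s12, s22)


def log_density(x, mean, cov):
    """Log of the bivariate Gaussian density at point x."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


def flag_anomalies(rows, mean, cov, threshold):
    """Return Y = 1 for rows whose log density falls below the threshold."""
    return [1 if log_density(r, mean, cov) < threshold else 0 for r in rows]


# Hypothetical "normal" activity used to fit the model.
normal = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2), (1.0, 1.9), (0.8, 2.0)]
mean, cov = fit_gaussian_2d(normal)

# One typical point and one far from the fitted distribution.
flags = flag_anomalies([(1.05, 2.05), (3.0, 0.5)], mean, cov, threshold=0.0)
print(flags)  # [0, 1]
```

In practice the threshold would be tuned on labeled data, which is where the user feedback step on the slide could come in.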

  11. ANOMALY DETECTION BENEFITS WITH DEEP LEARNING
      - Comparison shown: with DL vs. without DL
      - Top features: automated alerts & dashboards; early detection; self service; better accuracy & less noise

  12. ANOMALY DETECTION TRAINING
      - Evolution
        - V0: manual feature creation
        - V1: automatic feature creation using DL (Theano)
        - V2: multi-GPU support + TensorFlow Serving (Keras + TensorFlow)
      - CPU vs GPU
      - Learnings
        - Manual feature extraction does not scale
        - Dataset preparation is the long pole
        - Training on CPU cannot keep pace with the rate at which data is collected

  13. ANOMALY DETECTION INFERENCING
      - Goal: detect anomalies in user activity in near real time using trained models
      - Flow:
        1. Aggregated data output for live data
        2. Deep learning model
        3. AD platform uses the trained model to infer anomalies
        4. Sends automated alerts with dashboards
        5. Users can label alerts
        6. Model gets updated during the next training cycle
      - http://www.sci-news.com/othersciences/linguistics/science-mystery-words-gullivers-travels-03135.html
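The six-step inferencing flow on this slide can be sketched as a small score, alert, and label loop. Everything below is an illustrative assumption: the function names, the `failed_logins` field, the toy scoring function standing in for the deep learning model, and the 0.5 threshold do not come from the presentation.

```python
def run_inference_cycle(aggregates, model, threshold):
    """Steps 1-4: score aggregated live data and build alerts for high scores."""
    alerts = []
    for row in aggregates:
        score = model(row)
        if score > threshold:
            alerts.append({"row": row, "score": score, "label": None})
    return alerts


def apply_user_labels(alerts, labels, training_set):
    """Steps 5-6: record user labels and queue the rows for the next training cycle."""
    for alert, label in zip(alerts, labels):
        alert["label"] = label
        training_set.append((alert["row"], label))
    return training_set


def toy_model(row):
    """Hypothetical stand-in for a trained model: more failed logins, higher score."""
    return row["failed_logins"] / 10.0


aggregates = [
    {"user": "alice", "failed_logins": 1},
    {"user": "bob", "failed_logins": 9},
]
alerts = run_inference_cycle(aggregates, toy_model, threshold=0.5)

# A user confirms the single alert as a true anomaly.
training_set = apply_user_labels(alerts, [True], [])
```

Keeping the labeled rows in a separate queue mirrors the slide's point that the model is only updated at the next training cycle, not at inference time.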

  14. V1: DATA PREP FOR INFERENCING
      - Use case: detecting anomalies in user activity
      - Inferencing flow from 10k feet (diagram labels): Live Streaming Data, Live ETL, streaming data aggregations for inferencing, AD Platform
      - Started with Python scripts for windowed aggregation
      - Chart: Python Script Performance for 10 MINS, 30 MINS, and 60 MINS (data labels as extracted: 154, 103, 73)
      - Learnings: hard to scale for near real time. The AD platform runs inferencing only every 3 minutes because it is limited by the speed of data processing
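For readers unfamiliar with windowed aggregation, a minimal sketch in plain Python follows. It assumes events arrive as `(timestamp_seconds, user)` pairs and counts them per user in fixed, non-overlapping windows. The event shape, the function name, and the sample data are hypothetical; the presentation does not show the actual scripts.

```python
from collections import defaultdict


def windowed_counts(events, window_seconds):
    """Count events per (window_start, user) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for ts, user in events:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, user)] += 1
    return dict(counts)


# Hypothetical events: (seconds since start, user).
events = [(0, "alice"), (30, "alice"), (599, "bob"), (600, "alice"), (1250, "bob")]

# 10-minute windows, matching the smallest window on the slide.
result = windowed_counts(events, 600)
```

A single-process loop like this touches every event once per run, which is consistent with the slide's observation that the approach is hard to scale for near-real-time use.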

  15. V2: IMPROVING DATA PREP PERFORMANCE
      - V2: to improve performance, we started using Presto with data on S3 in JSON format
      - Live data is streamed from Kafka to S3. We use Presto for our data warehousing needs
      - Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data*
      - Diagram labels: Live Streaming Data, AD Platform
      - Chart: PRESTO ON JSON vs. PRESTO ON PARQUET for 10 mins, 30 mins, and 60 mins (data labels as extracted: 30, 25, 20 and 8, 6, 4)
      - Learnings: Presto with Parquet has the best performance, but we need to batch data at 30-second intervals, so it is not completely real time

  16. DASHBOARDING

  17. GPU ACCELERATION: Accelerate the pipeline, not just deep learning!
      - GPUs for deep learning = proven
      - Where else and how else can we use GPU acceleration?
        - Dashboards
        - Accelerating data pipeline
        - Stream processing
        - Building better models faster
      - First: GPU databases
      - Diagram labels: Data Ingestion, Data Processing, Visualization, Inferencing, Model Training

  18. 1+ BILLION TAXI RIDES BENCHMARK
      - Chart: time in milliseconds for Query 1 to Query 4 on MapD DGX-1, MapD 4 x P100, Redshift 6-node, and Spark 11-node
      - Data labels as extracted: 10190, 8134, 19624, 85942 (bars exceeding the 5000 ms axis); 2970, 2250, 1560, 1250, 696, 372, 269, 150, 99, 80, 30, 21
      - Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik's blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS (@marklit82)

  19. MAPD + IMMERSE VS ELASTIC + KIBANA
      - MapD Core
        - Very fast OLAP queries
        - JIT LLVM query compiler
        - GPUs for compute
        - CPUs for parse + ingest
        - Limited join support (for now)
      - Immerse
        - c3/d3 + crossfilter = nice dashboards
        - Backend rendering
      - Elastic + Kibana
        - Fantastic for complex search
        - Scales easily (up to a point)
        - Indexing consumes more storage (~4-6x)
        - Kibana for KPI dashboarding?
        - Concurrency?

  20. ARCHITECTURE V1

  21. ARCHITECTURE V2 (with MapD)

  22. MAPD VS KIBANA: Dashboards Comparison + Performance Test Method

  23. DASHBOARD PERFORMANCE: MapD Immerse vs Elastic Kibana
      - Chart: time to fully load (seconds, 0 to 300) versus days of data (1 to 31) for MapD Immerse (DGX), MapD Immerse (P2), and Elastic Kibana
      - Chart annotations: < 12s, < 9s

  24. V3: DATA PREP USING GPU ACCELERATION
      - Slide label: Kratos Artificial Intelligence
      - V3: explored GPU databases like MapD to improve performance when querying live streaming data
      - MapD offers constant query response times
      - MapD has some SQL limitations. We use Presto as an interface and built a "MapD -> Presto" connector for full ANSI SQL features
      - Diagram labels: Live Streaming Data, AD Platform
      - Chart: GPU Database Performance, execution time (seconds) for PRESTO ON JSON, PRESTO ON PARQUET, MAPD, and PRESTO + MAPD at 10 mins, 30 mins, and 60 mins (data labels as extracted: 30, 25, 20; 8, 6, 4; 1.2, 1.2, 1.2; 0.1, 0.1, 0.1)
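The chart's data labels make it possible to put rough numbers on the "constant query response times" claim. The sketch below only does arithmetic on the values in the transcript. It assumes the 30/25/20 labels belong to Presto on JSON and the 8/6/4 labels to Presto on Parquet, which matches the earlier learning that Parquet performs best. Which GPU-backed category (MAPD or PRESTO + MAPD) owns the 1.2 s labels and which owns the 0.1 s labels cannot be recovered from the transcript, so they are keyed by value rather than by name.

```python
# Execution times in seconds, as read from the chart's data labels.
presto_on_json = [30, 25, 20]
presto_on_parquet = [8, 6, 4]
# Assumption: category names for these two series are left unassigned.
gpu_backed = {
    "1.2 s series": [1.2, 1.2, 1.2],
    "0.1 s series": [0.1, 0.1, 0.1],
}


def spread(times):
    """Difference between the slowest and fastest time across window sizes."""
    return max(times) - min(times)


def speedup_range(baseline, candidate):
    """Smallest and largest baseline/candidate ratio, rounded to one decimal."""
    ratios = [b / c for b, c in zip(baseline, candidate)]
    return round(min(ratios), 1), round(max(ratios), 1)


json_vs_gpu = speedup_range(presto_on_json, gpu_backed["1.2 s series"])
parquet_vs_gpu = speedup_range(presto_on_parquet, gpu_backed["1.2 s series"])

print(spread(presto_on_json), spread(gpu_backed["1.2 s series"]))  # 10 0.0
print(json_vs_gpu)     # (16.7, 25.0)
print(parquet_vs_gpu)  # (3.3, 6.7)
```

Because both GPU-backed series are flat, the pairing order inside `zip` does not affect these ranges.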

  25. FUTURE

  26. EXPAND GPU USAGE: More Data, Less Hardware
      - Scaling up and out with GPU
      - Chart: Peak Double Precision (TFLOPS, 0.0 to 8.0), NVIDIA GPU vs. x86 CPU, 2008 to 2017

  27. EXPAND GPU USAGE: Internal Logs to Cyber Security
      - Cyber Security Analytics Platform

  28. GOAI ECOSYSTEM: End To End Data Science
      - A Python open-source just-in-time optimizing compiler that uses LLVM to produce native machine instructions
      - Dask is a flexible parallel computing library for analytic computing with dynamic task scheduling and big data collections

  29. GOAI ECOSYSTEM: End To End Data Science
      - Graph Analytics & Visualization
      - A CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on vertex or edge operations
      - Examples shown: SIEM attack escalation; Dropbox external sharing logs

  30. BETTER DATA PIPELINES: User Defined Functions at Scale
      - https://github.com/gpuopenanalytics
      - libgdf, pygdf, dask_gdf

  31. BETTER DATA PIPELINES: HIVE to BlazingDB

  32. BETTER DATA PIPELINES: More Models
      - nvGRAPH
      - https://github.com/h2oai/h2o4gpu
      - # edges = E * 2^S ~34M

  33. JOIN THE REVOLUTION: Everyone Can Help!
      - GPU Open Analytics Initiative: http://gpuopenanalytics.com/ (@Gpuoai)
      - APACHE ARROW: https://arrow.apache.org/ (@ApacheArrow)
      - APACHE PARQUET: https://parquet.apache.org/ (@ApacheParquet)
      - Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!

  34. THANK YOU!
      - NVIDIA Team: Ed Clune, Ayush Jaiswal, Keith Kraus, Rohit Kulkarni, Jarod Maupin, Nixs Nataraja, Rohan Somvanshi, Michael Wendt
