
REAL-TIME NETWORK ANALYTICS WITH STORM

This presentation explores real-time data processing with Storm, covering real-time rule execution, distributed analytics, and visualization. It shows how Storm meets requirements such as scalability, fault tolerance, and reliability, using a topology that detects suboptimal network speed as a running example.

dpitts




Presentation Transcript


  1. REAL-TIME NETWORK ANALYTICS WITH STORM. Mauricio Vacas, Fausto Inestroza, Sonali Parthasarathy

  2. The Team: Mauricio Vacas, Big Data Architect; Anita Mehrotra, Data Scientist; Krista Schnell, Visualization; Fausto Inestroza, Big Data Architect; Sonali Parthasarathy, Real-Time Processing; Susie Lu, Visualization; John Akred, Product Lead; Rick Drushal, Engineering Lead

  3. WHY REAL-TIME?

  4. PROCESS: Real-Time Data Ingestion, Exploratory Analytics, Model Prototyping, Real-Time Rule Execution, Distributed Analytics (UNDERSTAND → REACT)

  5. Accenture Cloud Platform: Recommender as a Service, Network Analytics Services, …, Big Data Platform

  6. Drivers: consumer devices, video usage. Issues: operational costs, understanding service-quality degradation, inefficient capacity planning.

  7. VISUALIZE STORE PROCESS INGEST ANALYZE

  8. WHY STORM?

  9. What do we need? Multiple use cases: processing, computation, etc. Data types, size, velocity: scalability. Mission-critical data: fault tolerance, reliability. Time series / pattern analysis.

  10. How do we get this from Storm? Processing, computation, etc.: low-level primitives. Scalability: parallelization. Fault tolerance: robust fail-over strategies. Reliability: processing guarantees.

  11. PRIMITIVES

  12. Primitives, as used in this project. Topology: suboptimal network speed, geospatial analysis. Stream: tuples of request info (IP, user-agent, etc.). Spout: pulls messages from a distributed queue. Bolt: sessionization, speed calculation.
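To make the primitives concrete, here is a minimal single-process Python sketch. It is not the Storm API, which is Java. A generator plays the spout, a second generator plays a bolt, and composing them forms a tiny topology. The comma-separated message format and the field names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# A tuple is modeled as a dict of named fields (hypothetical field names).
Fields = Dict[str, object]


def queue_spout(messages: Iterable[str]) -> Iterator[Fields]:
    """Spout: pull raw messages from a queue and emit one tuple per message.

    A Python list stands in for the distributed queue (Kafka in this deck).
    Each message is assumed to look like 'ip,user_agent,bytes,timestamp'.
    """
    for raw in messages:
        ip, user_agent, nbytes, ts = raw.split(",")
        yield {"ip": ip, "user_agent": user_agent, "bytes": int(nbytes), "ts": float(ts)}


def preprocess_bolt(stream: Iterable[Fields]) -> Iterator[Fields]:
    """Bolt: drop tuples that carry no payload and pass the rest downstream."""
    for t in stream:
        if t["bytes"] > 0:
            yield t


def run_topology(messages: List[str]) -> List[Fields]:
    """Topology: wire the spout into the bolt; the stream is the tuples between them."""
    return list(preprocess_bolt(queue_spout(messages)))
```

For example, `run_topology(["10.0.0.1,curl,500,1.0", "10.0.0.2,curl,0,2.0"])` keeps only the first request, because the second one carries zero bytes.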

  13. PARALLELISM

  14. [Diagram: Nimbus and ZooKeeper coordinating two Supervisors, each running two workers (W) with two tasks (T) apiece]

  15. A topology runs in worker processes; the worker process shown runs two executors, which together run four tasks.
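Storm's Java API sets each of these levels separately. The parallelism hint passed to `TopologyBuilder.setBolt` sets the number of executors, `setNumTasks` sets the number of tasks, and `Config.setNumWorkers` sets the number of worker processes. The Python sketch below only illustrates how the slide's four tasks could be divided between two executors. It assumes a contiguous split and is not Storm's scheduler code.

```python
from typing import List


def split_tasks(num_tasks: int, num_executors: int) -> List[List[int]]:
    """Divide task ids 1..num_tasks into num_executors contiguous groups.

    When the split is uneven, the earlier executors each receive one extra task.
    """
    if num_executors < 1 or num_tasks < num_executors:
        raise ValueError("need at least one task per executor")
    base, extra = divmod(num_tasks, num_executors)
    groups: List[List[int]] = []
    next_id = 1
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)
        groups.append(list(range(next_id, next_id + size)))
        next_id += size
    return groups
```

With the slide's numbers, `split_tasks(4, 2)` gives `[[1, 2], [3, 4]]`: each executor thread runs two task instances of the same component.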

  16. FAULT TOLERANCE

  17. [Diagram: Nimbus and three Supervisors, with workers (W) and tasks (T) spread across them]
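Storm's daemons keep their coordination state in ZooKeeper. A failed worker can therefore be restarted by its Supervisor, and Nimbus can move work off a failed machine. The sketch below is a deliberately simplified, hypothetical model of that second step. Tasks are spread round-robin over the Supervisors, and when one Supervisor fails, its tasks are redistributed over the survivors while every other assignment stays put. Real Storm schedules workers and executors rather than individual tasks, so treat this as an illustration only.

```python
from typing import Dict, List


def assign(tasks: List[int], supervisors: List[str]) -> Dict[str, List[int]]:
    """Round-robin tasks across the live supervisors."""
    if not supervisors:
        raise ValueError("no live supervisors")
    plan: Dict[str, List[int]] = {s: [] for s in supervisors}
    for i, task in enumerate(tasks):
        plan[supervisors[i % len(supervisors)]].append(task)
    return plan


def reassign_after_failure(plan: Dict[str, List[int]], failed: str) -> Dict[str, List[int]]:
    """Keep tasks on surviving supervisors; spread the failed node's tasks over them."""
    survivors = [s for s in plan if s != failed]
    new_plan = {s: list(plan[s]) for s in survivors}
    orphans = plan.get(failed, [])
    extra = assign(orphans, survivors)
    for s in survivors:
        new_plan[s].extend(extra[s])
    return new_plan
```

Leaving surviving assignments untouched keeps the disruption limited to the work that was actually lost.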

  18. RELIABILITY

  19. [Diagram: tuples IP2, IP2, IP3, IP1 and node A]

  20. [Diagram, continued: tuples IP2, IP2, IP3, IP1 and node A]
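The reliability slides appear to show a tuple tree tracked by an acker, which is presumably the node labeled A. Storm's acker tracks each tree with a single 64-bit value. Every tuple id is XORed in once when the tuple is emitted and again when it is acked, so the value returns to zero once the whole tree has been processed. The Python sketch below models only that bookkeeping. The class and method names are hypothetical, and Storm uses random 64-bit ids, which is what `new_id` imitates.

```python
import random
from typing import Dict


class Acker:
    """Track tuple trees with one XOR value per spout (root) tuple.

    Each id is XORed in once on emit and once on ack, so the value is 0 exactly
    when every tuple in the tree has been acked (with overwhelming probability,
    given random 64-bit ids).
    """

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}

    def emit(self, root: int, tuple_id: int) -> None:
        """Record a newly emitted tuple anchored to the given root."""
        self.pending[root] = self.pending.get(root, 0) ^ tuple_id

    def ack(self, root: int, tuple_id: int) -> bool:
        """XOR the acked id in; return True once the whole tree is complete."""
        self.pending[root] ^= tuple_id
        if self.pending[root] == 0:
            del self.pending[root]
            return True
        return False


def new_id(rng: random.Random) -> int:
    """Draw a random 64-bit tuple id."""
    return rng.getrandbits(64)
```

The design needs constant memory per spout tuple, however large the tree grows. That is why an acker can track many in-flight trees cheaply.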

  21. SUBOPTIMAL NETWORK SPEED TOPOLOGY: AN EXAMPLE

  22. Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra → Cassandra. Tuples for ip 1 flow through each stage in turn.
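The middle stages of this topology can be sketched as plain Python functions. This is a batch approximation of what the Sessionize, Calculate N/W Speed per Session, Update Speed per IP, and Identify Suboptimal Speed bolts would do incrementally on a stream. Several details are assumptions rather than values from the deck: the 30-second session gap, the 1 MB/s threshold, averaging session speeds per IP, and the request layout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical parameters, not taken from the deck.
SESSION_GAP_S = 30.0          # inactivity gap that closes a session
MIN_SPEED_BPS = 1_000_000.0   # below this average, an IP counts as suboptimal

Request = Tuple[str, float, int]  # (ip, timestamp_seconds, bytes_transferred)


def sessionize(requests: List[Request]) -> Dict[str, List[List[Request]]]:
    """Group each IP's requests into sessions split by inactivity gaps."""
    sessions: Dict[str, List[List[Request]]] = defaultdict(list)
    for req in sorted(requests, key=lambda r: (r[0], r[1])):
        ip, ts, _ = req
        ip_sessions = sessions[ip]
        if ip_sessions and ts - ip_sessions[-1][-1][1] <= SESSION_GAP_S:
            ip_sessions[-1].append(req)
        else:
            ip_sessions.append([req])
    return dict(sessions)


def session_speed(session: List[Request]) -> float:
    """Bytes per second over the session's span (floored at 1 s to avoid dividing by zero)."""
    duration = max(session[-1][1] - session[0][1], 1.0)
    return sum(r[2] for r in session) / duration


def suboptimal_ips(requests: List[Request]) -> Dict[str, float]:
    """Average session speed per IP, keeping only IPs below the threshold."""
    flagged: Dict[str, float] = {}
    for ip, ip_sessions in sessionize(requests).items():
        speeds = [session_speed(s) for s in ip_sessions]
        avg = sum(speeds) / len(speeds)
        if avg < MIN_SPEED_BPS:
            flagged[ip] = avg
    return flagged
```

For example, two 500,000-byte requests from one IP ten seconds apart form one session at 100,000 bytes/s, so that IP would be flagged.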

  23. Parallelism: the same pipeline (Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra → Cassandra), with tuples for ip 1 and ip 2 interleaved across parallel instances of each stage.
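Sessionizing per IP only works if every tuple for a given IP reaches the same bolt task, even when the bolt runs in parallel. Storm's fields grouping provides that guarantee by hashing the grouping field. In the Java API it is declared with `fieldsGrouping(componentId, new Fields("ip"))`. The deck does not say which grouping it used, so this is an assumption. The Python sketch below shows the idea, with CRC32 standing in for Storm's own hash function.

```python
import zlib
from typing import Dict, List


def fields_grouping(ip: str, num_tasks: int) -> int:
    """Pick a target task index from a hash of the grouping field.

    CRC32 is used only so the result is stable across runs; Storm computes
    its own hash of the grouping values.
    """
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


def route(ips: List[str], num_tasks: int) -> Dict[int, List[str]]:
    """Show which task each incoming tuple (identified by its IP) lands on."""
    routed: Dict[int, List[str]] = {}
    for ip in ips:
        routed.setdefault(fields_grouping(ip, num_tasks), []).append(ip)
    return routed
```

Whatever the hash, the key property is that the same IP always maps to the same task. A shuffle grouping would balance load more evenly but would scatter one IP's requests across tasks.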

  24. Branching and Joins. Stream 1: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP. Stream 2: Kafka Spout → Speed by Location. The streams meet at Join → Compare Speed → Store in Cassandra → Cassandra; a Tuple (ip 1) and a Tuple (NY) are joined into a Tuple (ip 1/NY).
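The join step can be sketched as a lookup from each IP's speed (stream 1) to the speed recorded for its location (stream 2), followed by the Compare Speed check. Everything here is hypothetical. The IP-to-location mapping stands in for whatever geolocation the pre-processing performs, and flagging IPs slower than half their location's speed is only an illustrative rule. A real join bolt would also have to buffer tuples from one stream until the matching tuple arrives on the other.

```python
from typing import Dict, List, Tuple


def join_and_compare(
    speed_by_ip: Dict[str, float],
    location_by_ip: Dict[str, str],
    speed_by_location: Dict[str, float],
    ratio: float = 0.5,
) -> List[Tuple[str, str, float, float]]:
    """Join stream 1 (speed per IP) with stream 2 (speed by location) on location.

    Returns (ip, location, ip_speed, location_speed) for IPs whose speed falls
    below `ratio` times their location's speed. `ratio` is a hypothetical knob.
    """
    slow: List[Tuple[str, str, float, float]] = []
    for ip, ip_speed in speed_by_ip.items():
        loc = location_by_ip.get(ip)
        if loc is None or loc not in speed_by_location:
            continue  # no match on the other stream yet; a real bolt would buffer it
        loc_speed = speed_by_location[loc]
        if ip_speed < ratio * loc_speed:
            slow.append((ip, loc, ip_speed, loc_speed))
    return slow
```

Comparing against a location baseline separates a slow connection from a slow region, which an absolute threshold cannot do.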

  25. RULE EXECUTION

  26. METHOD 1: Storm, Drools. METHOD 2: Storm + Drools.

  27. Storm + Drools: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed (with Drools) → Store in Cassandra → Cassandra
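In the Storm + Drools approach, Drools rules presumably decide which speeds count as suboptimal. The Python sketch below is a stand-in for that step, not the Drools API. Each rule is a named condition, in the spirit of a Drools `when ... then` rule, and every rule is evaluated against a fact emitted by the speed-calculation bolt. The rule names, fact fields, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A named condition, loosely in the spirit of a Drools 'when ... then' rule."""
    name: str
    condition: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[Rule], fact: Dict[str, float]) -> List[str]:
    """Return the names of every rule whose condition matches the fact."""
    return [rule.name for rule in rules if rule.condition(fact)]


# Hypothetical rules; in the deck these would live in Drools, not in Python.
RULES = [
    Rule("suboptimal-speed", lambda f: f["speed_bps"] < 1_000_000),
    Rule("very-long-session", lambda f: f["duration_s"] > 3600),
]
```

Keeping the conditions as data rather than hard-coded branches lets the rule set change without rewriting the bolt that evaluates it.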

  28. Integration with Cassandra. Cassandra: • Optimal for time-series data • Near-linearly scalable • Low read/write latency. Custom bolt: • Uses the Hector API to access Cassandra • Creates dynamic columns per request • Stores relevant network data
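The custom bolt's data model, with one row per key and a new dynamic column per request, is a common Cassandra wide-row pattern for time series. The sketch below mimics it in memory with Python dictionaries rather than Cassandra or Hector. Each row is keyed by IP, which is an assumption. Each column name is a version-1 (time-based) UUID, so a row's columns can be read back in time order. The lessons-learned slide mentions Hector's TimeUUIDUtils, which serves the same purpose of generating time-based UUIDs in Java.

```python
import uuid
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class WideRowStore:
    """In-memory stand-in for a Cassandra column family keyed by IP (hypothetical).

    Each request becomes a new dynamic column whose name is a version-1
    (time-based) UUID, so a row's columns can be ordered by time.
    """

    def __init__(self) -> None:
        self.rows: DefaultDict[str, Dict[uuid.UUID, Dict[str, float]]] = defaultdict(dict)

    def insert(self, ip: str, values: Dict[str, float]) -> uuid.UUID:
        """Add one dynamic column holding this request's network data."""
        column = uuid.uuid1()
        self.rows[ip][column] = values
        return column

    def time_slice(self, ip: str) -> List[Tuple[uuid.UUID, Dict[str, float]]]:
        """Return one row's columns ordered by the timestamp embedded in each UUID."""
        return sorted(self.rows[ip].items(), key=lambda kv: kv[0].time)
```

Time-based column names give each request a unique column even when two arrive in the same millisecond, while still sorting chronologically.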

  29. Lessons Learned • Rebalance Topology • Tweak Parallelism in bolt • Isolation of Topologies • Use TimeUUIDUtils • Log4j level set to INFO by default

  30. DEMO

  31. Next Steps • Trident • Externalizing Rules • Predictive Models • Real-Time Notifications
