This presentation describes real-time network analytics with Storm, covering real-time rule execution, distributed analytics, and visualization. It shows how Storm meets requirements such as scalability, fault tolerance, and reliability, using the detection of suboptimal network speed as a worked example.
REAL-TIME NETWORK ANALYTICS WITH STORM
Mauricio Vacas, Fausto Inestroza, Sonali Parthasarathy
The Team
• Mauricio Vacas, Big Data Architect
• Anita Mehrotra, Data Scientist
• Krista Schnell, Visualization
• Fausto Inestroza, Big Data Architect
• Sonali Parthasarathy, Real-Time Processing
• Susie Lu, Visualization
• John Akred, Product Lead
• Rick Drushal, Engineering Lead
Process (from Understand to React): Real-Time Data Ingestion, Exploratory Analytics, Model Prototyping, Real-Time Rule Execution, Distributed Analytics
Accenture Cloud Platform: Recommender as a Service, Network Analytics Services, …, on top of a Big Data Platform
Drivers: consumer devices, video usage
Issues: operational costs, understanding service-quality degradation, inefficient capacity planning
Pipeline stages: Ingest, Process, Store, Analyze, Visualize
What do we need?
• Multiple use cases: processing, computation, etc.
• Varied data types, size, velocity: scalability
• Mission-critical data: fault tolerance, reliability
• Time-series / pattern analysis
How do we get this from Storm?
• Processing, computation, etc.: low-level primitives
• Scalability: parallelization
• Fault tolerance: robust fail-over strategies
• Reliability: processing guarantees
Storm concepts, mapped to this use case:
• Topology: suboptimal network speed, geospatial analysis
• Stream (a sequence of tuples): request info (IP, user-agent, etc.)
• Spout: pulls messages from a distributed queue
• Bolt: sessionization, speed calculation
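To make the spout and bolt roles above concrete, here is a single-threaded toy model. It is a sketch under stated assumptions, not Storm's API: `ToySpout`, `ToyBolt`, `QueueSpout` and `ToyTopology` are hypothetical names, a tuple is modeled as a field-name-to-value map, and an in-memory queue stands in for the distributed queue (Kafka in this deck).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Storm's spout and bolt roles; these are NOT Storm's API.
// A tuple is modeled as a map from field name to value.
interface ToySpout {
    // Returns the next tuple, or null when no message is available.
    Map<String, Object> nextTuple();
}

interface ToyBolt {
    // Processes one input tuple and emits zero or more output tuples.
    List<Map<String, Object>> execute(Map<String, Object> input);
}

// Spout that pulls messages from an in-memory queue standing in for Kafka.
final class QueueSpout implements ToySpout {
    private final Deque<Map<String, Object>> queue;

    QueueSpout(List<Map<String, Object>> messages) {
        this.queue = new ArrayDeque<>(messages);
    }

    @Override
    public Map<String, Object> nextTuple() {
        return queue.pollFirst(); // null once the queue is drained
    }
}

// Runs a linear chain spout -> bolt 1 -> bolt 2 -> ... on one thread and
// collects whatever the last bolt emits.
final class ToyTopology {
    static List<Map<String, Object>> run(ToySpout spout, List<ToyBolt> bolts) {
        List<Map<String, Object>> results = new ArrayList<>();
        Map<String, Object> tuple;
        while ((tuple = spout.nextTuple()) != null) {
            List<Map<String, Object>> stream = List.of(tuple);
            for (ToyBolt bolt : bolts) {
                List<Map<String, Object>> emitted = new ArrayList<>();
                for (Map<String, Object> in : stream) {
                    emitted.addAll(bolt.execute(in));
                }
                stream = emitted;
            }
            results.addAll(stream);
        }
        return results;
    }
}
```

A pre-process bolt could be as small as `in -> in.containsKey("ip") ? List.of(in) : List.<Map<String, Object>>of()`, dropping requests without an IP. Real Storm runs each component in its own executors across a cluster and adds acking for processing guarantees, none of which this toy models.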
[Figure: Storm cluster. Nimbus coordinates two Supervisors through ZooKeeper; each Supervisor runs worker processes (W) that execute tasks (T).]
[Figure: a topology runs in worker processes; each worker process runs executors, and each executor runs one or more tasks.]
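The worker/executor/task hierarchy can be illustrated with a little arithmetic. The sketch below is a simplified placement model and an assumption, not Storm's scheduler: a component with E executors and T tasks gives each executor a contiguous range of tasks, and executors are dealt round-robin onto W worker processes. `ParallelismModel` and `ExecutorSlot` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified placement model (an assumption, not Storm's scheduler code):
// each of E executors gets T / E contiguous tasks (the first T % E executors
// take one extra), and executors are assigned round-robin to W workers.
final class ParallelismModel {
    static final class ExecutorSlot {
        final int firstTask;
        final int lastTask;
        final int worker;

        ExecutorSlot(int firstTask, int lastTask, int worker) {
            this.firstTask = firstTask;
            this.lastTask = lastTask;
            this.worker = worker;
        }

        int taskCount() {
            return lastTask - firstTask + 1;
        }
    }

    static List<ExecutorSlot> plan(int executors, int tasks, int workers) {
        if (executors <= 0 || workers <= 0 || tasks < executors) {
            throw new IllegalArgumentException("need executors > 0, workers > 0, tasks >= executors");
        }
        List<ExecutorSlot> slots = new ArrayList<>();
        int base = tasks / executors;
        int extra = tasks % executors;
        int nextTask = 0;
        for (int e = 0; e < executors; e++) {
            int count = base + (e < extra ? 1 : 0);
            slots.add(new ExecutorSlot(nextTask, nextTask + count - 1, e % workers));
            nextTask += count;
        }
        return slots;
    }
}
```

For example, `plan(3, 7, 2)` yields executors holding tasks 0-2, 3-4 and 5-6 on workers 0, 1 and 0. In Storm the task count of a component is fixed for the lifetime of a topology while the executor count can change, so configuring more tasks than executors leaves room to scale up later with a rebalance.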
[Figure: Storm cluster with Nimbus and three Supervisors, with workers (W) and tasks (T) distributed across them.]
[Figure: a stream of tuples keyed by IP (IP1, IP2, IP2, IP3) arriving at bolt A]
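Routing tuples by IP, so that every tuple for a given IP reaches the same bolt task, is what Storm calls a fields grouping. The sketch below is a simplified model of that idea; `IpFieldsGrouping` is a hypothetical name, and `List.hashCode` is an illustrative hash choice that may differ from Storm's internal one.

```java
import java.util.List;

// Simplified model of a fields grouping on "ip": every tuple with the same
// grouping values is routed to the same downstream task index.
// The hash used here (List.hashCode) is illustrative, not Storm's internals.
final class IpFieldsGrouping {
    static int chooseTask(List<?> groupingValues, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }
}
```

A shuffle grouping would instead spread tuples evenly with no key affinity, which balances load but would scatter one IP's requests across tasks and break per-IP sessionization.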
SUBOPTIMAL NETWORK SPEED TOPOLOGY: AN EXAMPLE
Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra. Tuples for one IP (ip 1) flow through each stage.
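The Sessionize and Calculate N/W Speed per Session stages can be sketched for the requests of one IP as follows. The request fields (start, end, bytes), the inactivity-gap rule, and the bytes-per-second metric are all assumptions for illustration; the deck does not specify how sessions or speed are defined. `Sessionizer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of "Sessionize" + "Calculate N/W Speed per Session" for ONE IP.
// Field names, the gap rule, and the speed metric are assumptions.
final class Sessionizer {
    static final class Request {
        final long startMs;
        final long endMs;
        final long bytes;

        Request(long startMs, long endMs, long bytes) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.bytes = bytes;
        }
    }

    static final class Session {
        final List<Request> requests = new ArrayList<>();

        // Throughput in bytes/second over the time spent transferring data.
        double bytesPerSecond() {
            long totalBytes = 0;
            long busyMs = 0;
            for (Request r : requests) {
                totalBytes += r.bytes;
                busyMs += r.endMs - r.startMs;
            }
            return busyMs == 0 ? 0.0 : totalBytes * 1000.0 / busyMs;
        }
    }

    // Starts a new session whenever the idle gap since the latest end time
    // seen in the current session exceeds maxGapMs.
    static List<Session> sessionize(List<Request> requests, long maxGapMs) {
        List<Request> sorted = new ArrayList<>(requests);
        sorted.sort(Comparator.comparingLong(r -> r.startMs));
        List<Session> sessions = new ArrayList<>();
        Session current = null;
        long lastEndMs = 0;
        for (Request r : sorted) {
            boolean startNew = current == null || r.startMs - lastEndMs > maxGapMs;
            if (startNew) {
                current = new Session();
                sessions.add(current);
                lastEndMs = r.endMs;
            } else {
                lastEndMs = Math.max(lastEndMs, r.endMs);
            }
            current.requests.add(r);
        }
        return sessions;
    }
}
```

In the topology, this logic only sees a complete history for an IP if all of that IP's requests reach the same task, which is why the upstream routing is keyed by IP.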
Parallelism: the same topology (Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra), with multiple instances of each component processing tuples for ip 1 and ip 2 concurrently.
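Once stages run in parallel, stateful steps such as Update Speed per IP keep state local to each task. The sketch below shows that state for one task instance, assuming tuples are routed by IP so each task owns a disjoint set of IPs. It keeps an incremental mean per IP and flags an IP as suboptimal below a threshold; `SpeedPerIpTask` and the threshold parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "Update Speed per IP" + "Identify Suboptimal Speed" inside ONE task.
// Assumes IP-keyed routing, so this task sees every session for the IPs it owns.
// The threshold is a hypothetical parameter, not a value from the deck.
final class SpeedPerIpTask {
    private static final class RunningMean {
        long count;
        double mean;
    }

    private final Map<String, RunningMean> byIp = new HashMap<>();
    private final double suboptimalBelowBytesPerSec;

    SpeedPerIpTask(double suboptimalBelowBytesPerSec) {
        this.suboptimalBelowBytesPerSec = suboptimalBelowBytesPerSec;
    }

    // Folds one session speed into the IP's incremental mean; returns the new mean.
    double update(String ip, double sessionBytesPerSec) {
        RunningMean m = byIp.computeIfAbsent(ip, k -> new RunningMean());
        m.count++;
        m.mean += (sessionBytesPerSec - m.mean) / m.count;
        return m.mean;
    }

    // True when the IP's mean speed so far is below the threshold.
    boolean isSuboptimal(String ip) {
        RunningMean m = byIp.get(ip);
        return m != null && m.mean < suboptimalBelowBytesPerSec;
    }
}
```

The incremental mean avoids storing every session; this in-memory state would be lost if a worker died, which is one motivation for the "Store in Cassandra" stage and for Trident-style state management listed under next steps.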
Branching and Joins:
• Stream 1: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP, emitting tuples such as (ip 1)
• Stream 2: Kafka Spout → Speed by Location, emitting tuples such as (NY)
• Join merges both streams into tuples such as (ip 1/NY), then Compare Speed → Store in Cassandra
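The join can be sketched as a stage that remembers the latest reference speed per location from Stream 2 and, for each per-IP speed from Stream 1, returns the ratio of the IP's speed to its location's speed. Keying the join on location is an assumption based on the "ip 1/NY" tuples; `LocationJoin` and the ratio comparison are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Sketch of joining two streams on location (an assumption from the "ip 1/NY" tuples).
// Stream 2 supplies a reference speed per location; Stream 1 supplies per-IP speeds
// tagged with a location. The ratio-based comparison is a hypothetical choice.
final class LocationJoin {
    private final Map<String, Double> speedByLocation = new HashMap<>();

    // Stream 2: remember the latest reference speed for a location.
    void onLocationSpeed(String location, double speed) {
        speedByLocation.put(location, speed);
    }

    // Stream 1: join on location and compare. Returns ipSpeed / referenceSpeed,
    // or empty if no positive reference speed for that location has arrived yet.
    OptionalDouble onIpSpeed(String location, double ipSpeed) {
        Double reference = speedByLocation.get(location);
        if (reference == null || reference <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(ipSpeed / reference);
    }
}
```

In a real topology both input streams would have to be grouped on the same key (here, location) so that matching tuples meet in the same join task.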
Method 1: Storm, Drools
Method 2: Storm + Drools
Storm + Drools: Kafka Spout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed (rules evaluated with Drools) → Store in Cassandra
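The idea of evaluating named rules inside a bolt can be sketched in plain Java. This is a stand-in for a rule engine and does NOT use Drools' API; in Drools the conditions would be written as rules outside the Java code. `RuleBoltSketch`, `SpeedFact`, and the example rule names and thresholds are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java stand-in for rule evaluation inside a bolt; NOT the Drools API.
final class RuleBoltSketch {
    // Hypothetical fact passed from "Update Speed per IP" to the rule stage.
    static final class SpeedFact {
        final String ip;
        final double bytesPerSecond;

        SpeedFact(String ip, double bytesPerSecond) {
            this.ip = ip;
            this.bytesPerSecond = bytesPerSecond;
        }
    }

    // LinkedHashMap keeps rules in registration order.
    private final Map<String, Predicate<SpeedFact>> rules = new LinkedHashMap<>();

    void addRule(String name, Predicate<SpeedFact> condition) {
        rules.put(name, condition);
    }

    // Returns the names of all rules whose condition matches, in registration order.
    List<String> fire(SpeedFact fact) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, Predicate<SpeedFact>> e : rules.entrySet()) {
            if (e.getValue().test(fact)) {
                fired.add(e.getKey());
            }
        }
        return fired;
    }
}
```

Hard-coding predicates like this means redeploying the topology to change a rule; moving them into a rule engine is the motivation behind the "Externalizing Rules" next step.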
Integration with Cassandra
Cassandra
• Well suited to time-series data
• Near-linear scalability
• Low read/write latency
Custom Bolt
• Uses the Hector API to access Cassandra
• Creates dynamic columns per request
• Stores relevant network data
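"Dynamic columns per request" refers to Cassandra's wide-row layout, where one row holds many columns sorted by column name. The sketch below models it in memory and is not the Hector API. Using the IP as row key and a millisecond timestamp as column name are assumptions; `WideRowModel` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory model of a Cassandra wide-row time series (NOT the Hector API):
// one row per key (assumed: the IP), one dynamically named column per request,
// columns kept sorted by name. Plain millisecond timestamps are used as column
// names for simplicity; two requests in the same millisecond would overwrite
// each other, which time-based UUID column names avoid.
final class WideRowModel {
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    // Adds (or overwrites) the column named timestampMs in the given row.
    void insert(String rowKey, long timestampMs, String networkData) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(timestampMs, networkData);
    }

    // Column slice: columns with fromMs <= name < toMs, in time order.
    SortedMap<Long, String> slice(String rowKey, long fromMs, long toMs) {
        NavigableMap<Long, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(fromMs, true, toMs, false);
    }
}
```

Because columns are stored sorted, a time-window query for one IP becomes a contiguous slice of a single row, which is the main reason this layout suits time-series data.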
Lessons Learned
• Rebalance the topology
• Tweak parallelism in bolts
• Isolate topologies
• Use TimeUUIDUtils
• The Log4j level is set to INFO by default
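The TimeUUIDUtils lesson presumably concerns column names: a version-1 (time-based) UUID embeds a timestamp plus clock-sequence and node bits, so two requests in the same millisecond still get distinct names. The sketch below builds one with `java.util.UUID` only, following the RFC 4122 bit layout; it is not Hector's TimeUUIDUtils code, and `TimeUuids` is a hypothetical name. Clock sequence and node are caller-supplied here, whereas real generators derive them from randomness or the host.

```java
import java.util.UUID;

// Builds a version-1 (time-based) UUID from a Unix timestamp using java.util.UUID only.
// Not Hector's TimeUUIDUtils; clockSeq and node are supplied by the caller.
final class TimeUuids {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static UUID fromUnixMillis(long unixMillis, int clockSeq, long node) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET; // 100-ns units
        long msb = ((ts & 0xFFFFFFFFL) << 32)         // time_low
                | (((ts >>> 32) & 0xFFFFL) << 16)      // time_mid
                | 0x1000L                              // version 1
                | ((ts >>> 48) & 0x0FFFL);             // time_hi
        long lsb = 0x8000000000000000L                 // IETF variant (binary 10)
                | (((long) clockSeq & 0x3FFFL) << 48)  // 14-bit clock sequence
                | (node & 0xFFFFFFFFFFFFL);            // 48-bit node id
        return new UUID(msb, lsb);
    }

    static long unixMillis(UUID timeUuid) {
        // UUID.timestamp() throws UnsupportedOperationException for non-version-1 UUIDs.
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }
}
```

Note that `java.util.UUID.compareTo` does not order version-1 UUIDs by time; ordering by the embedded timestamp is done by Cassandra's TimeUUID comparator on the column names.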
Next Steps
• Trident
• Externalizing rules
• Predictive models
• Real-time notifications