Distributed Stream Processing Strategies Presented by Ming Jiang: A Detailed Review

Scalable Distributed Stream Processing Presented by Ming Jiang

Centralized stream processing review

When distribution is needed • A distributed federation of participating nodes in different administrative domains • Collaboration between the different domains is required

Two complementary efforts for the situation • Aurora* intra-participant distribution • Medusa inter-participant distribution

Three pieces to be shared • Aurora • An overlay network of communication • Algorithms for high-availability

Three architectural issues • Communications • Load sharing • High availability in the presence of failure

Communications • Naming (participants, entity-name) • Routing 1. A data source or an administrator registers a schema and a stream 2. When a data source (DS) produces an event, labels
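The naming and registration step can be illustrated with a small catalog. This is only a sketch: the `EntityName` and `Catalog` classes, their method names, and the example participant and stream names are all hypothetical, and only the registration step (step 1) is modelled.

```python
from typing import NamedTuple


class EntityName(NamedTuple):
    """A two-part name: (participant, entity-name)."""
    participant: str
    entity: str


class Catalog:
    """Hypothetical in-memory catalog of registered schemas and streams."""

    def __init__(self):
        self.schemas = {}   # EntityName -> tuple of field names
        self.streams = {}   # EntityName -> EntityName of its schema

    def register_schema(self, name, fields):
        self.schemas[name] = tuple(fields)

    def register_stream(self, name, schema_name):
        # A stream can only be registered against a known schema.
        if schema_name not in self.schemas:
            raise KeyError(f"unknown schema: {schema_name}")
        self.streams[name] = schema_name

    def schema_of(self, stream_name):
        return self.schemas[self.streams[stream_name]]


catalog = Catalog()
quote_schema = EntityName("participant-x", "quote-schema")
quote_stream = EntityName("participant-x", "quotes")
catalog.register_schema(quote_schema, ["symbol", "price"])
catalog.register_stream(quote_stream, quote_schema)
```

Because the participant is part of the name, two participants can each register an entity called `quotes` without a clash.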

Communications • Message transport: multiplex all the message streams on a single TCP connection • Remote definition: process migration is too complicated
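One way to picture multiplexing several message streams over one connection is to prefix each message with a stream identifier and a length. The sketch below works on an in-memory byte buffer rather than a real TCP socket; the frame layout (4-byte stream id, 2-byte length) and the function names are assumptions made for illustration, not the wire format of Aurora* or Medusa.

```python
import struct
from collections import defaultdict

# Assumed frame header: stream id (uint32) and payload length (uint16),
# both in network byte order.
HEADER = struct.Struct("!IH")


def frame(stream_id, payload):
    """Wrap one message so it can share a connection with other streams."""
    return HEADER.pack(stream_id, len(payload)) + payload


def demultiplex(buffer):
    """Split a byte buffer of concatenated frames back into per-stream lists."""
    streams = defaultdict(list)
    offset = 0
    while offset < len(buffer):
        stream_id, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        streams[stream_id].append(buffer[offset:offset + length])
        offset += length
    return dict(streams)


# Messages from two streams interleaved on the same "connection".
wire = frame(1, b"a") + frame(2, b"bb") + frame(1, b"c")
received = demultiplex(wire)
```

Per-stream order is preserved because frames are read back in the order they were written.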

Load Management Repartitioning Aurora Networks, based on loads and resources: • Box Sliding • Box Splitting

Box Sliding • Takes a box on the edge of a sub-network on one machine and shifts it to its neighbor • Example: upstream box sliding
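Box sliding can be sketched for the simplest case, a linear chain of boxes cut between two machines. The `Pipeline` class and the box names are hypothetical, and "sliding upstream" is read here as moving the edge box onto the upstream machine, which is an assumption about the slide's terminology.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    boxes: list  # box names in dataflow order
    cut: int     # boxes[:cut] run upstream, boxes[cut:] run downstream

    def upstream_machine(self):
        return self.boxes[:self.cut]

    def downstream_machine(self):
        return self.boxes[self.cut:]

    def slide_upstream(self):
        """Move the edge box of the downstream machine to the upstream one."""
        if self.cut < len(self.boxes):
            self.cut += 1

    def slide_downstream(self):
        """Move the edge box of the upstream machine to the downstream one."""
        if self.cut > 0:
            self.cut -= 1


p = Pipeline(boxes=["filter", "map", "tumble", "union"], cut=2)
p.slide_upstream()  # "tumble" now runs on the upstream machine
```

Only the box adjacent to the cut can move, which matches the idea of shifting a box that sits on the edge of a sub-network.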

Box Splitting • Creates a copy of a box that is intended to run on a second machine, to offload the first • Needs a filter as a router

Box splitting Tumble Merge: Box splitting has to be transparent

Box splitting • If the predicate in the filter is B < 3: A machine: 1, 2, 3, 4, 7 • B machine: 5, 6 • [Figure: output on A machine, output on B machine, and final result after merge]
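The transparency requirement can be sketched with a count-per-group box: a filter routes each tuple to one of two copies, and a merge step combines the partial counts so the result equals that of the unsplit box. The tuples below are made-up illustrative data, the helper names are hypothetical, and the count is taken over the whole input rather than over windows, which simplifies a real Tumble box.

```python
from collections import Counter


def split_by_filter(tuples, predicate):
    """Route tuples to the first copy if the predicate holds, else the second."""
    first, second = [], []
    for t in tuples:
        (first if predicate(t) else second).append(t)
    return first, second


def count_by_key(tuples, key):
    """The box being split: count tuples per value of `key`."""
    return Counter(t[key] for t in tuples)


def merge_counts(*partials):
    """Merge step: add up the partial counts from each copy."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


stream = [
    {"A": 1, "B": 2}, {"A": 1, "B": 3}, {"A": 2, "B": 2}, {"A": 2, "B": 1},
    {"A": 2, "B": 6}, {"A": 4, "B": 5}, {"A": 4, "B": 2},
]
machine_1, machine_2 = split_by_filter(stream, lambda t: t["B"] < 3)
split_result = merge_counts(count_by_key(machine_1, "A"),
                            count_by_key(machine_2, "A"))
unsplit_result = count_by_key(stream, "A")
```

A plain union of the two outputs would not be enough here, since both copies may emit a count for the same group; the merge has to add them.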

Key partitioning Challenges • Choosing what to offload • Choosing what to split • Choosing filters • Others…

High Availability Utilize the push-based nature

Failure detection and recovery • 1. Each server periodically sends heartbeat messages to its upstream neighbors • 2. If a server does not reply within a predefined time, it is assumed to have failed • 3. The recovery phase is initiated, emulating the processing of the failed server (load shedding can be used)
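The detection steps above can be sketched as a timeout check over the last reply seen from each upstream neighbor. The `HeartbeatMonitor` class and the server names are hypothetical, timestamps are passed in explicitly so the example needs no real clock or network, and the recovery phase itself is not modelled.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat reply from each upstream neighbor."""

    def __init__(self, timeout):
        self.timeout = timeout  # predefined silence period, in seconds
        self.last_seen = {}     # server name -> time of last reply

    def record_reply(self, server, now):
        self.last_seen[server] = now

    def failed_servers(self, now):
        """Servers that have been silent for longer than the timeout."""
        return sorted(
            server for server, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_reply("upstream-1", now=0.0)
monitor.record_reply("upstream-2", now=0.0)
monitor.record_reply("upstream-1", now=4.0)  # upstream-2 stays silent
suspects = monitor.failed_servers(now=5.0)
```

Each name returned by `failed_servers` would be the trigger for starting recovery for that server.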

Thank you!
