
Distributed Stream Processing Strategies Presented by Ming Jiang: A Detailed Review

Explore the comprehensive overview of scalable and distributed stream processing techniques with a focus on collaborative efforts across administrative domains. Learn about Aurora and Medusa distribution methods, architectural issues, load management, and high availability strategies. Discover key challenges like key partitioning and high availability implementation through failure detection and recovery. Dive deep into the intricacies of communications, naming, and routing in decentralized stream processing environments.

dgould

Presentation Transcript


  1. Scalable Distributed Stream Processing Presented by Ming Jiang

  2. Centralized stream processing review

  3. The distributed setting • A distributed federation of participating nodes in different administrative domains • Collaboration between the different domains is required

  4. Two complementary efforts for the situation • Aurora* intra-participant distribution • Medusa inter-participant distribution

  5. Three pieces to be shared • Aurora • An overlay network for communication • Algorithms for high availability

  6. Three architectural issues • Communications • Load sharing • High availability in the presence of failure

  7. Communications • Naming (participants, entity-name) • Routing 1. A data source (DS) or an administrator registers a schema and a stream 2. When a DS produces an event, labels

  8. Communications • Message Transport multiplexing all the message streams on a single TCP connection • Remote definition: process migration is too complicated
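
One way to picture multiplexing several message streams over a single connection is to prefix each message with a small header carrying a stream id and a payload length. The sketch below is illustrative only: the frame layout, `encode_frame`, and `demultiplex` are assumptions made for this example, not the wire format used by Aurora* or Medusa, and an in-memory byte string stands in for the TCP connection.

```python
import struct
from collections import defaultdict
from typing import Dict, List

# Hypothetical frame header: (stream id, payload length), network byte order.
HEADER = struct.Struct("!II")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Wrap one message so it can share a byte channel with other streams."""
    return HEADER.pack(stream_id, len(payload)) + payload


def demultiplex(data: bytes) -> Dict[int, List[bytes]]:
    """Split a byte sequence of complete frames back into per-stream messages."""
    streams = defaultdict(list)
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams[stream_id].append(data[offset:offset + length])
        offset += length
    return dict(streams)


# Messages from two logical streams interleaved on one "connection".
wire = b"".join([
    encode_frame(1, b"a1"),
    encode_frame(2, b"b1"),
    encode_frame(1, b"a2"),
])
by_stream = demultiplex(wire)
```

Per-stream ordering is preserved because frames are read back in the order they were written.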

  9. Load Management Repartitioning Aurora Networks, based on loads and resources: • Box Sliding • Box Splitting

  10. Box Sliding • Takes a box on the edge of a sub-network on one machine and shifts it to its neighbor • Example: upstream box sliding
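
Box sliding can be sketched as a change to a box-to-machine assignment. The example below assumes a linear chain of boxes and a hypothetical helper `slide_edge_box`; the box and machine names are made up for illustration, and the real system would also have to move box state and reconnect streams, which this sketch leaves out.

```python
from typing import Dict, List


def slide_edge_box(chain: List[str], assignment: Dict[str, str],
                   src: str, dst: str) -> Dict[str, str]:
    """Move the box on src's edge that borders dst over to dst.

    chain lists the boxes in upstream-to-downstream order; assignment maps
    each box to the machine that currently runs it.
    """
    for upstream, downstream in zip(chain, chain[1:]):
        if assignment[upstream] == src and assignment[downstream] == dst:
            # upstream sits on src's downstream edge: slide it downstream.
            return {**assignment, upstream: dst}
        if assignment[upstream] == dst and assignment[downstream] == src:
            # downstream sits on src's upstream edge: slide it upstream.
            return {**assignment, downstream: dst}
    raise ValueError("src and dst do not share an edge in this chain")


chain = ["filter", "map", "aggregate"]
before = {"filter": "m1", "map": "m1", "aggregate": "m2"}

# Upstream sliding: the edge box on m2 moves to its upstream neighbor m1.
after = slide_edge_box(chain, before, src="m2", dst="m1")
```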

  11. Box Splitting • Create a copy of a box that is intended to run on a second machine, to offload the first • Needs a filter that acts as a router

  12. Box splitting Tumble Merge: Box splitting has to be transparent

  13. Box splitting • If the predicate in the filter is B < 3: A machine: 1, 2, 3, 4, 7; B machine: 5, 6 • [Figure: output on the A machine, output on the B machine, and the final result after merge]
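
The requirement that splitting be transparent can be illustrated with a simplified aggregate. In the sketch below a filter predicate routes each tuple to one of two copies of a counting box, and a merge step combines the partial counts so the result equals what the unsplit box would have produced. The tuple values, `count_by_key`, and `split_and_merge` are assumptions for illustration; this is a plain per-key count, not the windowed Tumble operator, and the data is not taken from the slide's figure.

```python
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical (A, B) tuples


def count_by_key(rows: List[Row]) -> Counter:
    """The box being split: count rows per value of A."""
    return Counter(a for a, _b in rows)


def split_and_merge(rows: List[Row], predicate: Callable[[Row], bool]) -> Counter:
    """Route rows with a filter, run a copy of the box on each side, then merge."""
    first_machine = [r for r in rows if predicate(r)]
    second_machine = [r for r in rows if not predicate(r)]
    # Merge step: partial counts for the same key are summed.
    return count_by_key(first_machine) + count_by_key(second_machine)


stream = [(1, 2), (1, 3), (2, 2), (2, 1), (2, 6), (4, 5), (4, 2)]
unsplit = count_by_key(stream)
merged = split_and_merge(stream, lambda row: row[1] < 3)  # filter predicate B < 3
```

Summation works as the merge here because counting is decomposable; other aggregates would need their own merge logic.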

  14. Key partitioning Challenges • Choosing what to offload • Choosing what to split • Choosing filters • Others…

  15. High Availability Utilize the push-based nature

  16. Failure detection and recovery • 1. Periodically send heartbeat messages to upstream neighbors • 2. If a server does not reply within a predefined time, assume it has failed • 3. Initiate the recovery phase, emulating the processing of the failed server (load shedding can be used)
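
The detection half of these steps can be sketched as a timeout check over the last heartbeat reply from each upstream neighbor. The class name `FailureDetector`, the neighbor names, and the explicit `now` timestamps are assumptions made to keep the example deterministic; the recovery phase itself is not modeled.

```python
from typing import Dict, Iterable, List


class FailureDetector:
    """Tracks heartbeat replies and flags neighbors that have gone silent."""

    def __init__(self, neighbors: Iterable[str], timeout: float) -> None:
        self.timeout = timeout
        # Time of the most recent reply from each upstream neighbor.
        self.last_reply: Dict[str, float] = {n: 0.0 for n in neighbors}

    def record_reply(self, neighbor: str, now: float) -> None:
        """Call when a neighbor answers a heartbeat."""
        self.last_reply[neighbor] = now

    def failed(self, now: float) -> List[str]:
        """Neighbors whose last reply is older than the timeout."""
        return sorted(
            n for n, seen in self.last_reply.items()
            if now - seen > self.timeout
        )


detector = FailureDetector(["upstream-1", "upstream-2"], timeout=3.0)
detector.record_reply("upstream-1", now=4.0)  # upstream-2 never replies
suspects = detector.failed(now=5.0)
```

A caller would start recovery for each name in `suspects`, for example by taking over the failed server's processing.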

  17. Thank you!
