
A Grid-Based Middleware’s Support for Processing Distributed Data Streams

Liang Chen. Advisor: Gagan Agrawal. Computer Science & Engineering.


Presentation Transcript


  1. Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering A Grid-Based Middleware’s Support for Processing Distributed Data Streams

  2. Introduction-Motivation • Data stream processing and analysis • Data stream: data arrive continuously and need to be processed in real-time • Data Stream Applications: • Online network Intrusion Detection • Sensor networks • Network Fault Management System for Telecommunication Network Elements • Computer Vision Based Surveillance • Common features of data streams • Continuous arrival • Enormous volume • Real-time constraints • Data sources could be distributed

  3. Introduction-Motivation • Network Fault Management System analyzing alarm message streams [Figure: a switch network feeding alarm messages to the Network Fault Management System]

  4. Introduction-Motivation Computer Vision Based Surveillance

  5. Introduction-Motivation • Challenges & possible Solutions • Challenge 1: Data and/or Computation intensive

  6. Introduction-Motivation • Challenges & possible Solutions • Challenge 1: Data and/or Computation intensive • Solution: Grid computing technologies

  7. Introduction-Motivation • Challenges & possible Solutions • Challenge 1: Data and/or Computation intensive • Solution: Grid computing technologies • Challenge 2: Real-time analysis is required • Solution: Self-Adaptation functionality is desired

  8. Introduction-Motivation • From the point of view of developers who are interested in data stream applications • They would like to concentrate on the applications themselves • They would rather not spend effort on • Grid computing • Adaptation functions

  9. Introduction-Our Approach • A middleware that is based on Grid standards and tools and provides self-adaptation functionality • The middleware is referred to as GATES (Grid-based AdapTive Execution on Streams) • Applications are automatically distributed to suitable computing nodes • Applications adapt automatically to a varying environment, without the developer implementing specific adaptation algorithms

  10. System Architecture and Design(From Application Perspective) • Break the task down into several sub-tasks so that the sub-tasks form a pipeline • Implement each sub-task in Java • Write an XML configuration file so that the sub-tasks can be deployed automatically, i.e. • specify how many stages (sub-tasks) the pipeline has • specify where the code implementing each sub-task resides • Launch the application by running a Java program (StreamClient.class) provided by GATES

  11. System Architecture and Design(Architecture)

  12. System Architecture and Design(Architecture) [Figure: Stage A, Stage B, and Stage C of an application. Legend: buffers for applications; queues between Grid services; Grid services of the GATES; stages of an application]

  13. System Architecture and Design(Example)
      public class Sampling-Stage implements StreamProcessing {
        …
        void init() {…}
        …
        void work(buffer in, buffer out) {
          …
          while(true) {
            Image img = get-from-buffer-in-GATES(in);
            Image img-sample = Sampling(img, sampling-ratio);
            put-to-buffer-in-GATES(img-sample, out);
          }
          …
        }
      }
      Callouts on the slide: GATES.Information-About-Adjustment-Parameter(min, max, 1) • sampling-ratio = GATES.getSuggestedParameter();
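The slide's pseudocode can be turned into a small runnable sketch. Everything below is an assumption made for illustration: `ParameterAdvisor`, `declareParameter`, and the `StreamProcessing` signature are hypothetical stand-ins for whatever GATES actually exposes, images are modeled as plain `int[]` pixel arrays, and the bounds 0.1 and 1.0 are made up. The meaning of the slide's third argument (`1`) is not stated, so it is passed through as an opaque flag.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

class SamplingStage implements StreamProcessing {
    private final ParameterAdvisor advisor;

    SamplingStage(ParameterAdvisor advisor) {
        this.advisor = advisor;
    }

    @Override
    public void init() {
        // Declare the adjustable range once; 0.1 and 1.0 are illustrative bounds.
        advisor.declareParameter(0.1, 1.0, 1);
    }

    @Override
    public void work(Queue<int[]> in, Queue<int[]> out) {
        int[] img;
        // The slide loops forever on a blocking buffer; this sketch stops when the queue is empty.
        while ((img = in.poll()) != null) {
            // Ask the (hypothetical) middleware for the currently suggested sampling ratio.
            double samplingRatio = advisor.getSuggestedParameter();
            out.add(sample(img, samplingRatio));
        }
    }

    // Keeps roughly ratio * length evenly spaced pixels (at least one for non-empty input).
    static int[] sample(int[] pixels, double ratio) {
        if (pixels.length == 0) {
            return new int[0];
        }
        int kept = (int) Math.round(pixels.length * ratio);
        kept = Math.max(1, Math.min(pixels.length, kept));
        int[] result = new int[kept];
        for (int i = 0; i < kept; i++) {
            result[i] = pixels[(int) ((long) i * pixels.length / kept)];
        }
        return result;
    }

    public static void main(String[] args) {
        // A fixed advisor standing in for the middleware's adaptation logic.
        ParameterAdvisor fixed = new ParameterAdvisor() {
            @Override public void declareParameter(double min, double max, int flag) { }
            @Override public double getSuggestedParameter() { return 0.5; }
        };
        SamplingStage stage = new SamplingStage(fixed);
        stage.init();
        Queue<int[]> in = new ArrayDeque<>();
        Queue<int[]> out = new ArrayDeque<>();
        in.add(new int[] {0, 1, 2, 3, 4, 5, 6, 7});
        stage.work(in, out);
        System.out.println(Arrays.toString(out.peek()));
    }
}

// Hypothetical stand-in for the middleware's parameter service (not the real GATES API).
interface ParameterAdvisor {
    // Mirrors the slide's three-argument declaration; the meaning of `flag` is not given there.
    void declareParameter(double min, double max, int flag);
    double getSuggestedParameter();
}

// Hypothetical stage contract mirroring the slide's init()/work() pair.
interface StreamProcessing {
    void init();
    void work(Queue<int[]> in, Queue<int[]> out);
}
```

With the fixed ratio of 0.5, `main` prints `[0, 2, 4, 6]`. The point of the design is that the stage never decides the ratio itself; it only declares a range and reads the suggested value on each iteration.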

  14. Self-adaptation Algorithm • Given a queue’s long-term factor at each stage, we want to improve the method of adjusting values of an adaptation parameter • Should the adaptation parameter be modified, and if so, in which direction? • How to find a new value (update the value) of the adaptation parameter

  15. Enhanced Self-adaptation Algorithm • Should the adaptation parameter be modified, and if so, in which direction? • The answer is related to load status of queues at two consecutive stages

  16. Enhanced Self-adaptation Algorithm [Figure: load states of the queues at stages A, B, and C, grouped into Convergent States and Non-Convergent States; labeled "Performance Parameter"]

  17. Enhanced Self-adaptation Algorithm Summary of Load States

  18. Enhanced Self-adaptation Algorithm • How to determine the new value for the adaptation parameter • Linear update: increase or decrease by a fixed value • Hard to find a proper fixed value • Previous method • Binary tree search

  19. Enhanced Self-adaptation Algorithm [Figure: two panels; the first is labeled Left Border, Current Value, Right Border; the second is labeled Left Border, Current Value, New Value, Right Border]
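The binary search update named on the preceding slides can be sketched as follows. This is a minimal illustration under assumptions, not the GATES implementation: the class name `ParameterSearch`, its methods, and the exact rule (move one border to the current value, then take the midpoint of the remaining interval) are hypothetical and inferred only from the labels Left Border, Current Value, New Value, and Right Border.

```java
// Hypothetical helper: the class name, method names, and exact update rule are assumptions.
final class ParameterSearch {
    private double left;
    private double right;
    private double current;

    ParameterSearch(double left, double right, double current) {
        if (!(left <= current && current <= right)) {
            throw new IllegalArgumentException("current must lie within [left, right]");
        }
        this.left = left;
        this.right = right;
        this.current = current;
    }

    double current() {
        return current;
    }

    // Raise the parameter: the old value becomes the left border and the
    // new value is the midpoint between it and the right border.
    double increase() {
        left = current;
        current = (left + right) / 2.0;
        return current;
    }

    // Lower the parameter: the old value becomes the right border and the
    // new value is the midpoint between the left border and it.
    double decrease() {
        right = current;
        current = (left + right) / 2.0;
        return current;
    }

    public static void main(String[] args) {
        ParameterSearch search = new ParameterSearch(0.0, 1.0, 0.5);
        System.out.println(search.increase()); // 0.75
        System.out.println(search.decrease()); // 0.625
    }
}
```

Unlike a linear update, no fixed step size has to be chosen: the step halves with every adjustment. In this sketch the borders only ever shrink, so a real system would presumably need some way to widen them again when the environment changes.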

  20. Data Mining Applications & System Evaluation • Two data mining applications • CluStream: clustering data arriving in data streams

  21. Data Mining Applications & System Evaluation • Dist-Freq-Counting: finding frequent itemsets from distributed streams

  22-30. Data Mining Applications & System Evaluation [Figures: evaluation results; no slide text recoverable]

  31. Resource Allocation Schemes • Problem Definition • Grid resource scheduling for pipelined processing and real-time distributed streaming applications • Mapping workflows onto the Grid is an NP-complete problem • Static Part: the resource allocation problem for GATES is to determine a deployment configuration • Dynamic Part

  32. Static Allocation Scheme • Static allocation problem: determining a deployment configuration • Objective: Automatically generate a deployment configuration from information about the available resources • The number of data sources and their locations • The destination • The number of stages that make up the pipeline • The number of instances of each stage • How the instances connect to each other • The node where each instance is placed

  33. Static Allocation Scheme Examples of deployment configurations
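One way to picture a deployment configuration is as a plain data structure that holds the six items listed for the static allocation problem. The sketch below is an assumption made for illustration only: the class, its field names, the instance identifiers (`s0i0` for stage 0, instance 0), and the node names are all hypothetical, and GATES itself describes deployments in an XML file whose schema is not shown in the slides.

```java
import java.util.List;
import java.util.Map;

// Hypothetical container for the items a deployment configuration has to fix.
final class DeploymentConfiguration {
    final List<String> sourceNodes;          // data sources and their locations
    final String destinationNode;            // where the final results are delivered
    final List<Integer> instancesPerStage;   // one entry per pipeline stage
    final Map<String, List<String>> links;   // instance id -> downstream instance ids
    final Map<String, String> placement;     // instance id -> node hosting it

    DeploymentConfiguration(List<String> sourceNodes,
                            String destinationNode,
                            List<Integer> instancesPerStage,
                            Map<String, List<String>> links,
                            Map<String, String> placement) {
        this.sourceNodes = List.copyOf(sourceNodes);
        this.destinationNode = destinationNode;
        this.instancesPerStage = List.copyOf(instancesPerStage);
        this.links = Map.copyOf(links);
        this.placement = Map.copyOf(placement);
    }

    int stageCount() {
        return instancesPerStage.size();
    }

    int instanceCount() {
        int total = 0;
        for (int n : instancesPerStage) {
            total += n;
        }
        return total;
    }

    // True when every instance has a node and every link endpoint is a placed instance.
    boolean isComplete() {
        if (placement.size() != instanceCount()) {
            return false;
        }
        for (Map.Entry<String, List<String>> link : links.entrySet()) {
            if (!placement.containsKey(link.getKey())) {
                return false;
            }
            for (String target : link.getValue()) {
                if (!placement.containsKey(target)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two sources feed two first-stage instances, which both feed one second-stage instance.
        DeploymentConfiguration config = new DeploymentConfiguration(
                List.of("src1", "src2"),
                "sink",
                List.of(2, 1),
                Map.of("s0i0", List.of("s1i0"), "s0i1", List.of("s1i0")),
                Map.of("s0i0", "nodeA", "s0i1", "nodeB", "s1i0", "nodeC"));
        System.out.println(config.stageCount() + " stages, complete: " + config.isComplete());
    }
}
```

A static allocation scheme would then be any procedure that fills in such a structure from the available resource information; the sketch only checks that a given configuration is internally consistent.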

  34. Related work • Grid Resource Allocation • Condor • Realtor • ACDS etc. • Main difference: our work focuses on Grid resource allocation for workflow applications • Adaptation Through a Middleware • Cheng et al.’s adaptation framework • SWiFT • Conductor • DART • ROAM • Main difference: our work focuses on general support for adaptation at run time

  35. Summary • Grid computing could be an effective solution for distributed data stream processing • GATES • Distributed processing • Exploit grid web services • Self-adaptation to meet the real-time constraints • Grid resource allocation schemes
