
Practical Theory Perspectives




Presentation Transcript


  1. Practical Theory Perspectives CS598ig – Fall 04 Presented by: Mayssam Sayyadian

  2. Publish/Subscribe System • Event notification system • Producer publishes messages • Consumer waits for certain types of events by placing subscriptions • Basic components to be defined: • Information space • Subscriptions • Events (event schema) • Notifications • Many applications and examples: stock information delivery, auction systems, air traffic control, news feeds, network monitoring, etc.

  3. Research Issues • System architecture • Matching and dispatching • Routing • Reliable message sending • Security • Special application issues • Mobile environments…

  4. Pub/Sub Systems: Examples • IBM – Gryphon • Stanford – SIFT and more… • CU-Boulder – Siena • France – Le Subscribe • Technische Universität Darmstadt – REBECA • Microsoft – Herald • MIT • Others – XMLBlaster, Elvin4, TIB, Keryx

  5. Earlier Classification • Subject based (channel based) • System contains many channels • Subscriptions and notifications belong to a specific channel • Simple and straightforward matching • Restrictive • Content based • No channels • Notifications are sent to subscribers based on their content • More generic • Matching suffers from a scaling problem (addressed in this paper)

  6. Content-Based Matching Problem • Naïve solution: • Match each incoming event against every subscription • Linear in the number of subscriptions • Not practical • Requisite: • Matching and dispatching should be sub-linear in the number of subscriptions • Intuition: • Combine parts of subscriptions to reduce the number of tests for each event
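To make the baseline concrete, here is a minimal sketch of the naïve solution (the attribute names and the dict-based subscription representation are illustrative, not from the paper):

```python
def matches(event, subscription):
    """A subscription is a dict of attribute -> required value
    (equality tests only); '*' means don't care."""
    return all(v == '*' or event.get(k) == v
               for k, v in subscription.items())

def naive_match(event, subscriptions):
    # Tests every subscription: O(N) per event -- the cost the paper attacks.
    return [i for i, s in enumerate(subscriptions) if matches(event, s)]

subs = [{'city': 'LA', 'temp': '40'},
        {'city': 'LA', 'temp': '*'},
        {'city': 'NY', 'temp': '40'}]
print(naive_match({'city': 'LA', 'temp': '40'}, subs))  # [0, 1]
```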

  7. Event Forwarding Algorithms • Decision trees • Use a tree structure to encode the event-matching information • Forwarding an event means walking it through the tree structure • Example: Gryphon • Hash functions • Use hash functions to index all components of notifications • Use other efficient lookups to find matched notifications • Example: Le Subscribe
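A hedged sketch of the hash-function idea, in the spirit of Le Subscribe (the counting scheme and names here are illustrative, not the system's actual algorithm): index each equality predicate (attribute, value) in a hash table, then count, per subscription, how many of its predicates an event satisfies.

```python
from collections import defaultdict

def build_index(subscriptions):
    index = defaultdict(list)   # (attr, value) -> subscription ids
    sizes = {}                  # subscription id -> number of predicates
    for i, sub in enumerate(subscriptions):
        sizes[i] = len(sub)
        for attr, value in sub.items():
            index[(attr, value)].append(i)
    return index, sizes

def match(event, index, sizes):
    hits = defaultdict(int)
    for attr, value in event.items():
        for i in index.get((attr, value), []):
            hits[i] += 1
    # A subscription matches when all of its predicates were hit.
    return [i for i, h in hits.items() if h == sizes[i]]

index, sizes = build_index([{'city': 'LA'}, {'city': 'LA', 'temp': 40}])
print(match({'city': 'LA', 'temp': 40}, index, sizes))  # [0, 1]
```

The point of the hash lookup is that only subscriptions sharing a predicate with the event are ever touched, rather than all N.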

  8. The Big Picture: The Information Bus Picture from “The Information Bus – An Architecture for Extensible Distributed Systems”, Brian M. Oki, et al., SOSP 1993

  9. A Scalable Matching Algorithm • “Matching Events in a Content-based Subscription System”, M. K. Aguilera et al. – IBM • Addresses the scalability of matching algorithms • Sub-linear in the number of subscriptions • Space complexity: linear • Does preprocessing • Assumes subscriptions are updated (almost) never

  10. Matching Algorithm • Classification? • Consider a decision tree classifier with the subscriptions as the set of possible classes • Analyze subscriptions • sub := pr1 ∧ pr2 ∧ pr3 • Conjunction of elementary predicates: pri = (testi(e) → resi) • e.g. (city = LA) and (temperature < 40) • pr1 = (test1(…) → LA) • pr2 = (test2(…) → “<”) • test1 = “examine attribute city” • test2 = “compare attribute temperature with 40”

  11. Matching Algorithm (Cont’d.) • Preprocess to build the matching tree • Each non-leaf node is a test • Each edge from a test node is a possible result • Each leaf node is a subscription • Pre-process each subscription and combine the information to build the tree • On receiving an event, follow the sequence of test nodes and edges until a leaf node is reached
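The build-and-traverse steps above can be sketched for the equality-test case, assuming a fixed attribute order per level (the nested-dict tree and the `'subs'` leaf key are illustrative choices, not the paper's data structure):

```python
def build_tree(subscriptions, attrs):
    tree = {}
    for sid, sub in enumerate(subscriptions):
        node = tree
        for attr in attrs:
            key = sub.get(attr, '*')        # missing attribute = don't care
            node = node.setdefault(key, {})
        node.setdefault('subs', []).append(sid)
    return tree

def match(tree, event, attrs, level=0):
    if level == len(attrs):
        return tree.get('subs', [])
    matched = []
    for key in (event[attrs[level]], '*'):   # follow the value edge
        if key in tree:                      # and the don't-care edge
            matched += match(tree[key], event, attrs, level + 1)
    return matched

attrs = ['a', 'b']
tree = build_tree([{'a': 1, 'b': 2}, {'a': 1}, {'a': 3, 'b': 2}], attrs)
print(match(tree, {'a': 1, 'b': 2}, attrs))  # [0, 1]
```

Subscriptions that agree on a prefix of tests share tree nodes, which is where the sub-linearity comes from.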

  12. Matching Tree • Don’t-care tests • Related tests: sub3 = (test1 → res1) ∧ (test2 → res2), sub4 = (test3 → res3) ∧ (test4 → res4), where (test3 → res3) ⇒ (test1 → res1)

  13. Matching Tree (Equality Tests) Conjunction of equality tests: sub1 = (attr1=v1) ∧ (attr2=v2) ∧ (attr3=v3), sub2 = (attr1=v1) ∧ (attr2=*) ∧ (attr3=v3’), sub3 = (attr1=v1’) ∧ (attr2=v2) ∧ (attr3=v3)

  14. Complexity • Assumptions: • All attributes have the same value set • Only equality tests • No related tests in the tree • Events come from a uniform distribution • Pre-processing: • Time complexity: O(NK), for K attributes & N subscriptions • Space complexity: O(NK) • Matching complexity: • Expected time to match a random event: O(N^(1−λ)), sub-linear • λ = ln V / (ln V + ln K’), note 0 < λ < 1 • V: number of possible values for each attribute • K’: number of attributes in the schema + 1 • What about the worst case?
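A quick numeric sanity check of the bound, plugging in the paper's experimental setup from the performance slide (V = 3, K = 30); the specific N is illustrative:

```python
import math

V = 3         # possible values per attribute
K = 30        # attributes in the schema
Kp = K + 1    # K' = K + 1
lam = math.log(V) / (math.log(V) + math.log(Kp))

N = 25_000    # subscriptions
expected = N ** (1 - lam)   # expected matching work, far below N
print(round(lam, 3), round(expected))
```

With these values λ ≈ 0.24, so the expected work grows roughly like N^0.76: clearly sub-linear, though the gap to linear narrows as V shrinks relative to K'.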

  15. Optimizations • Collapse a chain of * edges (60% gain) • Example: collapse B to A • Statically pre-compute successor nodes (20% gain) • Separate sub-trees for attributes that rarely have don’t care in subscriptions

  16. Performance • Operations per event • Space per event = edges + successor nodes • Latency: 4 ms for 25,000 subscriptions • Attributes vary in popularity, following a Zipf distribution • Tests used 30 attributes with 3 possible values • The distribution was tuned so each event always got 100 matches • [charts: operations per event; space (thousands of cells)]

  17. Discussion Points • Topology matters! • What about non-equality-based subscriptions? • If content-based subscriptions are used with equality tests only, are there other ways to achieve sub-linear matching times? • Exact vs. approximate results • What if • subscriptions change frequently over time • there is a stream of subscriptions • events are multi-dimensional

  18. “Computation in Networks of Passively Mobile Finite-State Sensors”, Dana Angluin, James Aspnes, Zoe Diamadi, Michael Fischer, Rene Peralta, PODC 2004.

  19. The Problem … A Flock of Birds! • Birds: finite-state agents (sensors with states) • Resources are limited • Passive mobility (no control) • Communication: how much? • Problems • Is there a solution? • What are the possible solutions?

  20. A Wider View • Question: • What computations are possible in a cooperative network of passively mobile finite-state sensors? • Assumptions: • Mobility is passive (not under the sensor’s control) • Sufficiently rapid and unpredictable (no stable routing strategy) • Complete communication • Identical sensors: no identifiers

  21. Formal Model: Population Protocols • Population Protocol (A): • Finite input and output alphabets: X, Y • A finite set of states: Q • An input function I : X → Q • An output function O : Q → Y • A transition function δ : Q × Q → Q × Q • Transitions: (p,q) → (p’,q’) if δ(p,q) = (p’,q’)

  22. Formal Model (Cont’d) • A population protocol runs in a population of any finite size n. • Population P: • A set A of n agents with an irreflexive relation E ⊆ A × A, interpreted as the directed edges of an interaction graph • Population configuration • A mapping C : A → Q • Specifies the state of each member of the population • Computation: • A finite or infinite sequence of population configurations C0, C1, C2, … such that for each i, Ci → Ci+1 via a single interaction

  23. Formal Model: Computation • No halting, but stabilizing! • Stabilizing is a global property of the population • Individual agents do not know if they have stabilized • It is possible to bound the number of interactions before the outputs stabilize, under some stochastic assumptions • To model computation: • What is the input assignment • What should the output assignment be • Definition of an output-stable configuration • Formally define: stably computing an input-output relation R by a population protocol • A stably computes the partial function FA : X → Y, where FA(x) = y iff R(x, y)

  24. Functions • Population protocols compute partial functions from X to Y. • Need suitable input and output encodings for functions on other domains • Functions with multiple arguments • Predicates on X • Integer functions

  25. A Stably Computable Expression Language • Closure properties: • If f and g are stably computable, then so are ¬f, f ∧ g and f ∨ g • Parity (whether there is an odd number of 1’s in the input) • Majority • Arithmetic functions • Stably computable expression language • An upper bound on the set of stably computable predicates: all predicates stably computable in the model with all pairs enabled are in the class NL → an exact characterization is an open problem
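Parity can be made concrete with a small simulation. The sketch below is one standard protocol for parity, not necessarily the paper's exact construction: each agent holds (active, bit); when two active agents meet, one absorbs the other's bit via XOR and the loser turns passive; passive agents copy the bit of any active agent they meet. The outputs stabilize to the XOR of all inputs.

```python
import random

def step(p, q):
    (pa, pb), (qa, qb) = p, q
    if pa and qa:                            # two actives: XOR and absorb
        return (True, pb ^ qb), (False, pb ^ qb)
    if pa:                                   # passive copies the active's bit
        return (True, pb), (False, pb)
    if qa:
        return (False, qb), (True, qb)
    return p, q                              # two passives: nothing happens

def run(bits, interactions=100_000, seed=0):
    rng = random.Random(seed)
    agents = [(True, b) for b in bits]       # input function I
    for _ in range(interactions):
        i, j = rng.sample(range(len(agents)), 2)   # uniform random pair
        agents[i], agents[j] = step(agents[i], agents[j])
    return [b for _, b in agents]            # output function O

print(run([1, 0, 1, 1, 0]))   # every agent stabilizes to the parity, 1
```

Note the "no halting" point from the previous slide: interactions keep happening forever; only the outputs stop changing.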

  26. Other Issues • Restricted interactions • Some interaction graphs permit powerful computations • E.g. a population whose interaction graph is a directed line → can simulate a linear-space Turing machine • The complete graph (discussed so far) is the weakest structure for computing predicates • → any weakly connected graph can simulate it

  27. Randomized Interactions • Measures other than stability • Let’s add probabilistic assumptions on interactions • Consider computations that are correct with high probability • Questions about expected resource use • Benefits of a leader • Simulating counters: the model can simulate O(1) counters of capacity O(n) • How to elect a leader → use ideas from the majority and parity functions • The set of predicates accepted by a randomized population protocol with probability ½ + ε is contained in P RL

  28. Discussion Points • So what?! • Theoretical fundamentals always help • Consider the interaction graph as input → what interesting properties of the underlying interaction graph could be stably computed? → applications in analyzing the structure of sensor nets • Consider one-way communication • Assume sampling models other than uniform; where does this help? • Formal methods + methodology • Remember converting differential equations into distributed protocols • What do you THINK! • Formalizing computation → apply the methodology

  29. “Performance Evaluation of a Communication Round over the Internet”, Omar Bakr, Idit Keidar, PODC’02. Some slides taken from Omar Bakr’s presentation

  30. Communication Round • Exchange of information from all hosts to all hosts • Part of many distributed algorithms and systems • consensus, atomic commit, replication, ... • Evaluation → needs a metric • Number of rounds (or steps) required • How long it is going to take • Local running time of one engaged host • Overall running time • What is the best way to implement it? • Centralized vs. decentralized

  31. Example Implementations • All-to-all (a) • Leader (b) • Secondary leader (c)

  32. Experiment I • 10 hosts: Taiwan, Korea, US academia, ISPs • TCP/IP (connections always up) • Algorithms: • All-to-all • Leader (initiator) • Secondary leader (not initiator) • Periodically initiated at each host • 650 times over 3.5 days

  33. Overall Running Time • Elapsed time from initiation (at the initiator) until all hosts terminate • Requires estimating clock differences • Clocks are not synchronized, and drift • The difference is computed over short intervals • Computed 3 different ways • Accuracy within 20 ms on 90% of runs • [chart: overall running times from MIT] • Ping-measured latencies (IP): • Longest link latency: 240 milliseconds • Longest link to MIT: 150 milliseconds
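The paper computes clock differences three different ways; one standard technique in this family (shown here only as an illustration, not necessarily the paper's exact method) is the NTP-style offset estimate from a request/response timestamp exchange:

```python
def clock_offset(t0, t1, t2, t3):
    """t0: local send, t1: remote receive, t2: remote send,
    t3: local receive -- each in that host's own clock."""
    delay = (t3 - t0) - (t2 - t1)          # round-trip network delay
    offset = ((t1 - t0) + (t2 - t3)) / 2   # remote clock minus local clock
    return offset, delay

# Illustration: remote clock runs 50 ms ahead, 75 ms one-way latency,
# remote spends 1 ms processing.
offset, delay = clock_offset(0.000, 0.125, 0.126, 0.151)
print(round(offset, 3), round(delay, 3))   # 0.05 0.15
```

The estimate is exact only when the two directions have equal latency; asymmetric paths bias the offset, which is one reason to measure several ways and over short intervals, as the paper does.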

  34. Measured Running Times • [charts: runs initiated at MIT / Taiwan]

  35. What’s going on? • Loss rates on two links are very high • 42% and 37% • Taiwan to two ISPs in the US • Loss rates on other links: up to 8% • Upon loss, TCP’s timeout is big • more than the round-trip time • All-to-all sends messages on lossy links • often delayed by loss

  36. Distribution of Running Times • [chart: distributions, up to 1.3 sec.]

  37. Removing Taiwan • Overall running times are much better • For every initiator and algorithm, fewer than 10% of runs exceed 2 seconds (as opposed to 55% previously) • All-to-all overall is still worse than the others! • either Leader or Secondary Leader is best, depending on the initiator • loss rates of 2% – 8% are not negligible • all-to-all sends O(n²) messages; it suffers • But all-to-all has the best local running times

  38. Probability of Delay due to Loss • If all links had the same latency • assume 1% loss on all links; 10 hosts (n = 10) • Leader sends 3(n−1) = 27 messages • probability of at least one loss: 1 − 0.99^27 ≈ 24% • All-to-all sends n(n−1) = 90 messages • probability of at least one loss: 1 − 0.99^90 ≈ 60% • In reality, links don’t have the same latency • only loss on long links matters • Each communication has a cost!
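The slide's back-of-the-envelope numbers follow directly from independence of losses and can be reproduced in a couple of lines:

```python
n = 10
p_ok = 0.99                   # 1% loss per message, assumed independent

leader_msgs = 3 * (n - 1)     # 27 messages total in the leader pattern
all2all_msgs = n * (n - 1)    # 90 messages in all-to-all

p_leader = 1 - p_ok ** leader_msgs     # P(at least one loss)
p_all2all = 1 - p_ok ** all2all_msgs
print(f"{p_leader:.0%} {p_all2all:.0%}")   # 24% 60%
```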

  39. Discussion Points and Lessons Learned • The Internet is A VERY SPECIAL distributed system (not an ideal one!) • Message loss causes high variation in TCP link latencies • latency distribution has high variance, heavy tail • Latency distribution determines the expected time for receiving O(n) concurrent messages • Secondary leader helps • No triangle inequality, especially for loss • Different for overall vs. local running times • Number of rounds/steps is not a sufficient metric • One-to-all and all-to-all have different costs
