Quasar: A Probabilistic Publish-Subscribe System for Social Networks over a P2P Kademlia Network. David Arinzon. Supervisor: Gil Einziger. April 2012
Quasar • Quasar is a “publish-subscribe” mechanism whose routing is based on Bloom filters.
Bloom Filter • A Bloom filter is “a space-efficient probabilistic data structure that is used to test whether an element is a member of a set”. (Wikipedia entry based on Donald Knuth’s “The Art of Computer Programming”, 1970) • In this structure, false positives are possible, but false negatives are not. • When an element is added, its value is passed to k hash functions, which produce k positions in the bit array; those bits are set to 1. • When querying for an element, the same process is applied: if at least one of the k bits is 0, the element is definitely not in the structure; if all of them are 1, the element is probably in it.
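The add and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the class name, the default sizes, and the choice of salted SHA-256 as the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (sizes and hashing scheme are assumptions)."""

    def __init__(self, size_bits=64, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive k positions by salting one hash with the index 0..k-1.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        # Set the k bits that belong to this item.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # A single 0 bit proves absence; all 1 bits only suggest presence.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
bf.add("publisher-42")  # hypothetical subscription ID
```

After `add`, a query for the same ID always succeeds (no false negatives), while a query for an ID that was never added may still succeed if its k bits happen to be set by other entries.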
Bloom Filters in Quasar • An entry in a Bloom filter is the ID of a subscription (in our case, the publisher ID). • Each node holds an “enhanced” routing table. • The radius defines “how much each node knows about the surrounding subscription interests”. • For a radius of k, each node holds a set of k attenuated Bloom filters, one for each level of “closeness” (0 to k-1). • The Bloom filter at level n contains subscription information about nodes n+1 hops away. • The information is stored in the attenuated Bloom filters of the immediate neighbor through which those nodes are reached. • Stored along with the information is the set of nodes reachable through that particular entry.
Bloom Filters in Quasar (example) • For a radius of 2, as seen from node 1: nodes 2 and 3 are immediate neighbors (1 hop), nodes 4 and 5 are level-2 neighbors (2 hops), and nodes 6 and 7 are outside the “recognition radius”. [Figure: nodes arranged in a line, 6 - 4 - 2 - 1 - 3 - 5 - 7]
Bloom Filters in Quasar (example) • Node 4’s subscription information is set in the Bloom filter kept for node 2, but at level 1 (2 hops away). The same goes for node 5, in the Bloom filter kept for node 3. [Figure: nodes arranged in a line, 6 - 4 - 2 - 1 - 3 - 5 - 7]
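The per-neighbor, per-level layout of the example can be sketched as follows, from the point of view of node 1. To keep the sketch short, each level is modeled with a plain Python set as a stand-in for a Bloom filter (so there are no false positives here). The class name, method names, and the topic names such as "pub-B" are hypothetical.

```python
class RoutingTable:
    """Sketch of an 'enhanced' routing table; sets stand in for Bloom filters."""

    def __init__(self, neighbors, radius=2):
        self.radius = radius
        # neighbor -> one filter per level (level n covers nodes n+1 hops away)
        self.filters = {n: [set() for _ in range(radius)] for n in neighbors}
        # neighbor -> per-level set of node IDs reachable through that entry
        self.reachable = {n: [set() for _ in range(radius)] for n in neighbors}

    def record(self, via_neighbor, hops, node_id, subscriptions):
        level = hops - 1
        if not 0 <= level < self.radius:
            return False  # outside the recognition radius: not stored
        self.filters[via_neighbor][level].update(subscriptions)
        self.reachable[via_neighbor][level].add(node_id)
        return True

    def lookup(self, topic):
        # Search level by level so that the closest match wins.
        for level in range(self.radius):
            for neighbor in sorted(self.filters):
                if topic in self.filters[neighbor][level]:
                    return neighbor, level
        return None


table = RoutingTable(neighbors=[2, 3])   # node 1's immediate neighbors
table.record(2, 1, 2, {"pub-A"})         # node 2 itself, 1 hop
table.record(2, 2, 4, {"pub-B"})         # node 4, reached via node 2
table.record(3, 2, 5, {"pub-C"})         # node 5, reached via node 3
stored = table.record(2, 3, 6, {"pub-D"})  # node 6 is 3 hops away
```

Here `table.lookup("pub-B")` yields neighbor 2 at level 1, matching the slide, and node 6's subscription is never stored because it lies beyond the radius.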
Subscription mechanism • Each node periodically (how often depends on whether the network is static or not) sends its subscription list to its neighbors, which propagate it further, depending on the allowed TTL. • A node updates the proper routing table entry (attenuated Bloom filter) according to the information and its direction (which node originally sent it, and from which immediate neighbor it was received).
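The TTL-limited propagation can be illustrated with a small simulation on a static graph. This is a sketch under simplifying assumptions: the real mechanism is message passing between nodes, whereas here a breadth-first walk stands in for it, and the function name and the line topology are chosen for the example only.

```python
from collections import deque


def propagate(graph, origin, ttl):
    """Return {receiver: (immediate neighbor it arrived from, hop count)}."""
    learned = {}
    seen = {origin}
    queue = deque([(origin, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == ttl:
            continue  # TTL exhausted: this node does not forward further
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                # nxt learns about origin, and records the direction (node)
                learned[nxt] = (node, hops + 1)
                queue.append((nxt, hops + 1))
    return learned


# Line topology 6 - 4 - 2 - 1 - 3 - 5 - 7 (assumed for the example)
graph = {6: [4], 4: [6, 2], 2: [4, 1], 1: [2, 3], 3: [1, 5], 5: [3, 7], 7: [5]}
learned = propagate(graph, origin=4, ttl=2)
```

With a TTL of 2, node 1 learns about node 4 from its immediate neighbor 2 at a distance of 2 hops, so it would update level 1 of the filter kept for node 2. Node 3 never hears about node 4.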
Subscription mechanism (Alternative) • During our simulations we found that the mechanism described above is very costly in terms of time and traffic. Therefore, a different mechanism was used to achieve the same goal. • Because the simulation ran over UDP on a Kademlia-based KeyBasedRouting network, each node can reach any other node regardless of the radius defined for Quasar. • As an alternative to the Quasar subscription mechanism, two steps were applied. In the first, each node requests information about its neighbors at each radius level. In the second, once a node has built a picture of its radius neighborhood, it propagates its own subscription information to each of those nodes directly.
Publication mechanism • When a node decides to publish a topic (a.k.a. the publisher node), it replicates the message alpha times and sends the copies to a random set of neighbors. • When a node receives a publication, it can act in one of several ways: • If it is the publisher (the message was routed back), it acts as a “middle” node and routes it randomly. • If the node is subscribed to the topic, it renews the TTL and sends the message to alpha random neighbors (as if it had published it). • Otherwise, the node searches the routing table, level by level, for the first entry whose Bloom filters contain this subscription, and routes the message accordingly.
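The three cases for a received publication can be written as one decision function. This is a hedged sketch: the constants `ALPHA` and `INITIAL_TTL`, the function signature, decrementing the TTL on each hop, and falling back to a random neighbor when no routing table entry matches are all assumptions that the slides do not spell out.

```python
import random

ALPHA = 2        # number of copies per (re)publication, assumed value
INITIAL_TTL = 5  # TTL given to a fresh or renewed message, assumed value


def next_hops(node_id, neighbors, subscriptions, lookup, publisher, ttl, rng):
    """Return a list of (neighbor, ttl) pairs the message is forwarded to.

    lookup(publisher) is expected to return the neighbor of the first
    matching routing table entry, or None when nothing matches.
    """
    if ttl <= 0:
        return []  # expired message is dropped
    if node_id == publisher:
        # Routed back to its publisher: behave like a middle node.
        return [(rng.choice(neighbors), ttl - 1)]
    if publisher in subscriptions:
        # Subscriber: renew the TTL and fan out as if publishing.
        chosen = rng.sample(neighbors, min(ALPHA, len(neighbors)))
        return [(n, INITIAL_TTL) for n in chosen]
    match = lookup(publisher)
    if match is not None:
        return [(match, ttl - 1)]  # directed by the Bloom filters
    return [(rng.choice(neighbors), ttl - 1)]  # assumed fallback


rng = random.Random(0)
directed = next_hops(1, [2, 3], set(), lambda p: 2, 9, 3, rng)
renewed = next_hops(1, [2, 3], {9}, lambda p: None, 9, 1, rng)
```

In the first call node 1 is not subscribed to publisher 9 and the lookup points at neighbor 2, so the message goes there with a reduced TTL. In the second call node 1 is a subscriber, so both neighbors receive a copy with a renewed TTL.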
Publication mechanism (Continued) • This publication method can prevent a message from “leaving a gravity well”: a case in which nodes within a small radius of the publishing node are subscribed to it and keep routing the message among one another. • A set of methods has been applied to counter this (negative information): • Each message carries information about the subscribers that have already received it. To complement that, each node stores information about the publications it has already received. • When routing, if a candidate entry is found (the publication ID exists in the Bloom filter), the entry is not used if one of the “received subscribers” is in its list of reachable nodes. • A subscriber that receives a publication more than once routes it randomly without duplicating it.
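The negative-information checks can be sketched as two small helpers. The names, the tuple layout of a candidate entry, and the way seen publications are tracked are hypothetical and chosen only to illustrate the rules listed above.

```python
import random


def choose_entry(candidates, already_received):
    """Pick the first matching entry that avoids already-served subscribers.

    candidates: (neighbor, level, reachable_nodes) tuples whose Bloom filter
    matched the publication, ordered by level.
    already_received: subscriber IDs carried inside the message.
    """
    for neighbor, _level, reachable in candidates:
        if reachable & already_received:
            continue  # would lead back into the gravity well
        return neighbor
    return None


def subscriber_forward(publication_id, seen, neighbors, alpha, rng):
    """Fan out on first receipt; on a repeat, forward a single copy randomly."""
    if publication_id in seen:
        return [rng.choice(neighbors)]
    seen.add(publication_id)
    return rng.sample(neighbors, min(alpha, len(neighbors)))


rng = random.Random(1)
seen = set()
first = subscriber_forward("msg-1", seen, [2, 3, 4], 2, rng)
second = subscriber_forward("msg-1", seen, [2, 3, 4], 2, rng)
picked = choose_entry([(2, 0, {2}), (3, 1, {5})], already_received={2})
```

In this example the entry through neighbor 2 is skipped because subscriber 2 already holds the message, so the entry through neighbor 3 is chosen instead.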
Simulation, scenarios and comparison • As mentioned before, the simulation was executed over the Kademlia-based KeyBasedRouting network, developed in the CS faculty. • The main focus of the comparison was the behavior of the attenuated Bloom filters when routing publications. • As the baseline for comparison, messages were propagated to a random neighbor instead of being routed with the routing table.
Simulation, scenarios and comparison • Three scenarios were tested: • Scenario 1: Each node is randomly assigned ten subscriptions. Afterwards, each of the nodes in its turn publishes once. • Scenario 2: A subset of publishers (10% of all the nodes) is selected (they also act as subscribers, but not to their own publications). Each node is randomly assigned a set of subscriptions to these publishers (a random number between 1 and half the number of publishers). Afterwards, every 5 seconds, three publishers are chosen at random to publish. • Scenario 3: A publisher node is chosen at random. 10% of all the nodes are chosen to be subscribers of that publisher. Afterwards, the publisher node publishes once.
Scenario 1 conclusions • In this scenario the advantage of the routing table Bloom filters applied in Quasar can be observed. By using the routing table, which contains information about the surrounding subscriptions, the messages were routed properly, which results in a high “hit rate”. It should be noted that the high hit rate came with much higher traffic, because for each “first successful hit” the message is duplicated alpha times.
Scenario 2 conclusions • This scenario is meant to represent a “general” state of the network, in which a set of publishers periodically publishes to the entire network. Even though Quasar appears to reduce network traffic by a relatively large amount, its hit rate is considerably lower (by at least 15%). One possible explanation is a limitation of the Bloom filter: one of the caveats of attenuated Bloom filters is that false positive entries may appear. In our case, this can result in messages being routed in the wrong direction.
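To give a feel for how false positives grow as more subscriptions are packed into one filter, the standard approximation for a Bloom filter with m bits, k hash functions, and n inserted elements is p ≈ (1 - e^(-kn/m))^k. The sketch below evaluates it; the function name and the parameter values are illustrative assumptions and are not the parameters used in the simulation.

```python
import math


def false_positive_rate(m_bits, n_items, k_hashes):
    """Standard approximation of a Bloom filter's false positive probability."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Illustrative values only: a 64-bit filter with 3 hash functions.
light = false_positive_rate(64, 10, 3)
heavy = false_positive_rate(64, 40, 3)
```

For a fixed filter size the rate rises with the number of stored subscription IDs, which is consistent with the explanation above: filters that aggregate many subscriptions are more likely to attract messages they cannot deliver.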
Scenario 3 conclusions • In this scenario, unlike the 1st scenario, the difference is much smaller. Still, it can be observed again, looking at a single publication, that the routing policy of Quasar, which is based on information from the neighbors and the attenuated Bloom filters, provides better routing in the publish-subscribe setting. • Please note that the Random routing showed very high variance: the delivery rate was 1 in some runs and 0 or 0.5 in others. The Quasar execution provided a much more stable rate.