The Variable-Increment Counting Bloom Filter
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel

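Problem Definition
• Support membership queries of the form "is $x \in S$?", where $S$ is a set of special flows.
• Requirements for the data structure:
  • Space efficient
  • Fast (insertion, query)
[Figure: flows x, y, z and u are checked against the set S (special flows); each query is answered Yes or No.]
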
Naïve Solutions
• $O(n)$ – searching in a list
• $O(\log n)$ – searching in a sorted list
• $O(1)$?
• Tradeoff: we allow false positives with low probability.
• Two possible errors:
  • False positive: $x \notin S$, but the answer is $x \in S$.
  • False negative: $x \in S$, but the answer is $x \notin S$.

Bloom Filters (Bloom, 1970)
• Initialization: an array of $m$ zero bits.
• Insertion: each of the $n$ elements is hashed $k$ times, and the corresponding bits are set.
• Query: hash the element and check that all $k$ bits are set.
• False positive rate (probability) of approximately $(1 - e^{-kn/m})^k$.
• No false negatives.
[Figure: a bit array before and after inserting x and y, followed by queries.]

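The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter` and the use of salted SHA-256 to emulate the hash functions are assumptions made here for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: an array of m bits and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of zero bits

    def _positions(self, item):
        # Emulate k hash functions with salted SHA-256 (an assumption,
        # any family of well-spread hash functions would do).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        # True means "possibly in the set"; False is always correct.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=64, k=3)
bf.insert("flow-x")
print(bf.query("flow-x"))  # an inserted element is always reported as present
```

An inserted element can never be missed, because all of its bits stay set. A non-inserted element is reported as present only if other insertions happened to set all of its bits.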
Counting Bloom Filters (CBFs)
• Bloom filters do not support deletion of elements: simply resetting bits might cause false negatives.
• The solution: counting Bloom filters, which store an array of counters instead of bits.
• Insertion: increment the counters by one.
• Deletion: decrement the counters by one.
• Query: check that all the counters are positive.
• The same false positive probability.
• They require too much memory, e.g. 57 bits per element for a given target false positive rate.
[Figure: a bit array next to a counter array; inserting x and y adds +1 to each of their hashed counters.]

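A counting Bloom filter changes only the cell type and adds deletion. The sketch below is an illustration under the same assumptions as before (salted SHA-256 as a stand-in for the hash functions, hypothetical names `positions` and `CountingBloomFilter`), and it ignores counter overflow.

```python
import hashlib


def positions(item, k, m):
    """k hashed positions in [0, m), emulated with salted SHA-256 (assumption)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class CountingBloomFilter:
    """Illustrative CBF: counters instead of bits, so deletions are possible."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def insert(self, item):
        for p in positions(item, self.k, self.m):
            self.counters[p] += 1

    def delete(self, item):
        # Only valid for elements that were inserted earlier.
        for p in positions(item, self.k, self.m):
            self.counters[p] -= 1

    def query(self, item):
        return all(self.counters[p] > 0 for p in positions(item, self.k, self.m))


cbf = CountingBloomFilter(m=64, k=3)
cbf.insert("flow-x")
cbf.insert("flow-y")
cbf.delete("flow-x")
print(cbf.query("flow-y"))  # deleting x does not create a false negative for y
```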
(Counting) Bloom Filters are Widely Used
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in:
  • Google's web browser Chrome
  • Google's database system BigTable
  • Facebook's distributed storage system Cassandra
  • Mellanox's IB Switch System

Outline
• Introduction to Bloom Filters
• The Variable-Increment Counting Bloom Filter
  • Intuition for Variable Increments
  • The Bh-CBF Scheme
  • The VI-CBF Scheme
• Experimental Results
• Summary

Intuition for Variable Increments
• Upon query, we should consider the exact values of the counters and not just whether they are positive.
• Idea: use variable increments to encode the element identity.
[Figure: a counter array in which x and y add different increments to their hashed counters.]

Architecture
• Each hash entry contains a pair of counters:
  • $c_1$, fixed increments → number of elements in the entry (as in the CBF)
  • $c_2$, variable increments → weighted sum of the elements, with weights from a pre-determined set $D = \{v_1, \dots, v_\ell\}$
• We use two sets of hash functions:
  • The first set uses $k$ hash functions $h_1, \dots, h_k$ with range $\{1, \dots, m\}$, i.e. it points to the set of entries.
  • The second set uses $k$ hash functions $g_1, \dots, g_k$ with range $\{1, \dots, \ell\}$, i.e. it points to the set $D$.
[Figure: an array of nine hash entries, each holding a pair of counters (c1, c2).]

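Insertion
• Insertion of an element $x$: at each entry $h_i(x)$ selected by the first set of hash functions, the two counters are updated as follows:
  • $c_1 \leftarrow c_1 + 1$
  • $c_2 \leftarrow c_2 + v_{g_i(x)}$, where $v_{g_i(x)}$ is the increment that the second set of hash functions selects from the set $D$
• Example 1:
[Figure: inserting x and z adds the variable increments +8, +4, +13 and +4 to the c2 counters of their hashed entries, and +1 to the matching c1 counters.]
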
Query
• Query for flow $y$, whose hashed entries hold $(c_1, c_2) = (2, 17)$ and $(3, 30)$ and whose variable increments are 4 and 8.
• We ask whether:
  • 17 can be a sum of 2 elements from the set $D$ including 4
  • 30 can be a sum of 3 elements from the set $D$ including 8
• No: $y \notin S$.
• How should we pick the set $D$ of variable increments?
• We should use $B_h$ sequences!

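The two questions on this slide can be checked by brute force. The sketch below assumes $D = \{1, 4, 8, 13\}$, which is inferred from the increments visible in the slide figures rather than stated in the text; the helper names `decompositions` and `may_include` are hypothetical.

```python
from itertools import combinations_with_replacement


def decompositions(total, count, increments):
    """All multisets of `count` increments (repetition allowed) summing to `total`."""
    return [
        combo
        for combo in combinations_with_replacement(sorted(increments), count)
        if sum(combo) == total
    ]


def may_include(total, count, increments, v):
    """True if some valid decomposition of `total` contains the increment `v`."""
    return any(v in combo for combo in decompositions(total, count, increments))


D = [1, 4, 8, 13]  # assumed set of variable increments

print(may_include(17, 2, D, 4))  # True: 17 = 4 + 13
print(may_include(30, 3, D, 8))  # False: the only option is 30 = 4 + 13 + 13
```

A single failed check is enough to answer "No" for the queried flow.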
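Bh Sequences
• Definition 1:
  • Let $D = \{v_1, v_2, \dots, v_\ell\}$ be a sequence of positive integers.
  • Then, $D$ is a $B_h$ sequence iff all the sums $v_{i_1} + v_{i_2} + \dots + v_{i_h}$ with $1 \le i_1 \le i_2 \le \dots \le i_h \le \ell$ are distinct.
• Example 2:
  • All the sums of 2 elements of $D = \{1, 4, 8, 13\}$ are distinct: 1+1=2, 1+4=5, 1+8=9, 1+13=14, 4+4=8, 4+8=12, 4+13=17, 8+8=16, 8+13=21, 13+13=26.
  • Therefore, $D$ is a $B_2$ sequence.
• $B_h$ sequences are widely used in error-correcting codes.
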
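Definition 1 translates directly into a brute-force test: enumerate every sum of $h$ elements, repetition allowed, and check that no value appears twice. The function name `is_bh_sequence` is chosen here for illustration, and the set $\{1, 4, 8, 13\}$ is the one suggested by the slide figures.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(seq, h):
    """Return True iff all sums of h elements of seq (with repetition) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(seq, h)]
    return len(sums) == len(set(sums))


print(is_bh_sequence([1, 4, 8, 13], 2))  # True: the ten pairwise sums are distinct
print(is_bh_sequence([1, 2, 3], 2))      # False: 1 + 3 == 2 + 2
```

The enumeration grows quickly with $h$ and the size of the sequence, so this check is only practical for small sets like the ones used in the examples.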
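The Bh-CBF Scheme: Query
• Example 3: $D$ is a $B_h$ sequence.
• Query for $x$, whose variable increment is 1: since the counters in the entry of $x$ are not consistent with a sum that includes 1, the Bh-CBF can determine that $x \notin S$.
[Figure: the array of (c1, c2) counter pairs, with the entry of x highlighted.]
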
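The Bh-CBF Scheme: Query
• Example 3: $D$ is a $B_h$ sequence.
• Query for $y$, whose variable increments are 4 and 8: here, $c_1 = 3$ and $c_2 = 30$, and then necessarily $30 = 4 + 13 + 13$.
• Since 8 does not appear in this sum, the Bh-CBF can determine that $y \notin S$.
[Figure: the array of (c1, c2) counter pairs, with the entries of y highlighted.]
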
The Bh-CBF Scheme: Query
• Example 3: $D$ is a $B_h$ sequence.
• Query for $z$, whose variable increments are 13 and 4: since the counters in the entries of $z$ are consistent with these increments, the Bh-CBF cannot exclude that $z \in S$.
[Figure: the array of (c1, c2) counter pairs, with the entries of z highlighted.]

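Putting the architecture, insertion and query slides together, a Bh-CBF can be sketched as follows. This is an illustrative reading of the slides, not the authors' code: the class name `BhCBF`, the salted SHA-256 hashing, the brute-force decomposition search and the parameter choice $D = \{1, 4, 8, 13\}$ with $h = 3$ are all assumptions.

```python
import hashlib
from itertools import combinations_with_replacement


def _hash(item, salt, modulus):
    """Stand-in hash function based on salted SHA-256 (assumption)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BhCBF:
    def __init__(self, m, k, increments, h):
        self.m, self.k, self.h = m, k, h
        self.D = list(increments)  # assumed to be a B_h sequence
        self.c1 = [0] * m          # fixed increments: number of elements
        self.c2 = [0] * m          # variable increments: weighted sum

    def _slots(self, item):
        # For each i: entry h_i(item) and increment v_{g_i(item)}.
        for i in range(self.k):
            entry = _hash(item, f"h{i}", self.m)
            weight = self.D[_hash(item, f"g{i}", len(self.D))]
            yield entry, weight

    def insert(self, item):
        for entry, weight in self._slots(item):
            self.c1[entry] += 1
            self.c2[entry] += weight

    def delete(self, item):
        for entry, weight in self._slots(item):
            self.c1[entry] -= 1
            self.c2[entry] -= weight

    def _may_include(self, count, total, weight):
        # Can `total` be a sum of `count` increments that includes `weight`?
        return any(
            sum(combo) == total and weight in combo
            for combo in combinations_with_replacement(self.D, count)
        )

    def query(self, item):
        for entry, weight in self._slots(item):
            count, total = self.c1[entry], self.c2[entry]
            if count == 0:
                return False
            # With at most h elements in the entry, the sum identifies them.
            if count <= self.h and not self._may_include(count, total, weight):
                return False
        return True  # membership cannot be excluded


f = BhCBF(m=32, k=3, increments=[1, 4, 8, 13], h=3)
f.insert("flow-x")
print(f.query("flow-x"))  # inserted elements are never rejected
```

When an entry holds more than $h$ elements, the sketch falls back to the plain CBF test on that entry, since the sum no longer identifies the increments uniquely.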
The VI-CBF Scheme Principles
• Two counters in each hash entry use more space.
• Can we keep only the variable-increment counter?
• In the VI-CBF (Variable-Increment Counting Bloom Filter), each hash entry contains only the variable-increment counter.
• The counter is updated like the variable-increment counter in the Bh-CBF.
[Figure: the array of hash entries with the c1 and c2 counters.]

The VI-CBF Scheme Principles
• Problem: we do not know the number of elements in each hash entry.
• Example 4 (with the sequence $D$):
  • 30 cannot be a sum of 3 elements from the set $D$ including 8.
  • However, 30 can be a sum of 5 elements from the set $D$ including 8.
[Figure: the array of hash entries, with the entries of y highlighted.]

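The VI-CBF Scheme Principles
• In the VI-CBF, the set $D$ of variable increments is not necessarily a $B_h$ sequence.
• Example 5: based on the value of either of two counters, the VI-CBF can deduce that $z \notin S$.
[Figure: x and y are inserted into an all-zero counter array with the increments +7, +5, +4, +5, +5 and +4; z is then queried.]
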
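A Simple Option for D: $D_L = [L, 2L-1]$
• For a given $L$, we define the set $D_L$ of size $L$ as $D_L = [L, 2L-1] = \{L, L+1, \dots, 2L-1\}$.
• Intuition: let $c$ be the counter value of an entry and $v$ the variable increment of the queried element in that entry. Then $c - v$ is:
  • negative: not possible
  • $0$: a sum of zero elements
  • in $[1, L-1]$: not possible
  • in $[L, 2L-1]$: a sum of one element
  • at least $2L$: a sum of two or more elements
• Lemma 1: Let $x$ be an element whose $i$-th hash function hashes into an entry of the value $c$, with variable increment $v$. If $c - v < 0$ or $0 < c - v < L$, then $x \notin S$.
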
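With $D_L = [L, 2L-1]$ the query needs no decomposition search, only a range check on $c - v$. The sketch below follows that reading of Lemma 1; the names `excluded` and `VICBF` and the salted SHA-256 hashing are assumptions made for illustration.

```python
import hashlib


def _hash(item, salt, modulus):
    """Stand-in hash function based on salted SHA-256 (assumption)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def excluded(c, v, L):
    """True if a counter value c cannot contain the increment v from [L, 2L-1]."""
    rest = c - v  # what the other elements in the entry would have to sum to
    return rest < 0 or 0 < rest < L


class VICBF:
    def __init__(self, m, k, L):
        self.m, self.k, self.L = m, k, L
        self.counters = [0] * m  # a single variable-increment counter per entry

    def _slots(self, item):
        for i in range(self.k):
            entry = _hash(item, f"h{i}", self.m)
            weight = self.L + _hash(item, f"g{i}", self.L)  # value in [L, 2L-1]
            yield entry, weight

    def insert(self, item):
        for entry, weight in self._slots(item):
            self.counters[entry] += weight

    def delete(self, item):
        for entry, weight in self._slots(item):
            self.counters[entry] -= weight

    def query(self, item):
        return not any(
            excluded(self.counters[entry], weight, self.L)
            for entry, weight in self._slots(item)
        )


print(excluded(7, 5, 4))  # True: 7 - 5 = 2 lies in [1, L-1]
print(excluded(9, 5, 4))  # False: 9 - 5 = 4 can be one more element
```

An empty entry is covered by the same test: with $c = 0$ the difference $c - v$ is negative, so the element is rejected exactly as in a plain CBF.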
VI-CBF Outperforms CBF
• Theorem 1: While keeping the same bits-per-element ratio, the VI-CBF satisfies the following properties when compared to the CBF:
  (i) The VI-CBF obtains a lower false positive rate than the CBF.
  (ii) …
  (iii) The VI-CBF obtains a lower counter overflow probability bound than the classical bound of the CBF.
• Cost: limited implementation overhead.

Experimental Results
• Internet trace (equinix-chicago) with real hash functions.
• For the Bh-CBF, … (with …).
• For the VI-CBF, … and ….

Concluding Remarks
• Encoding the element identity using variable increments
• Considering the exact values of the counters upon query
• Can extend many variants of the counting Bloom filter
• First time $B_h$ sequences are presented in networking applications