Redundant Bit Vectors for Fast High-Dim Lookup

Jonathan Goldstein, John Platt, Chris Burges
Database & CCSP Research Groups

Problem Statement
• Find all data spheres that contain the query point
• Low, non-zero error is OK
[Figure: a query point among data spheres S1 to S6, plotted against dim 1 and dim 2]

Our Three Key Ideas
• Use redundancy to combat high dimensionality.
• Use bit vector indices to keep the representation small.
• Use redundancy by partitioning the queries, not the data.

Partitioning the Queries
[Figure: the query axes are divided into bins, each with its own bit vector, BV1 to BV8; the spheres S1 to S6 overlapping a bin set its bits. Example: construct BV6: 101101]

Bit Vector Indices
• Redundancy: point stored as “yes” result for multiple bins per dimension
• When querying, a single bit vector index is picked per dimension; the picked indices are ANDed together.
• The results are post-filtered using the actual data spheres.
• Advantages:
  • Small: 1 bit/point/query part
  • Fast: 1 CPU cycle operates on 32 points in parallel!
[Figure: bit vector indices shown as rows of bits, one bit per point ID, combined with AND]

Applications
• Server for audio fingerprinting: look up matches for audio
• Fingerprinting in general
• Pub/sub filtering

Existing Technology
• Space partitioning with no redundancy
• R-trees, SS-trees, etc.
• Worse than linear scan for truly high dimensional data!

Results for RARE
• 56x faster than linear scan
• Introduces no false negatives
• Post-filtering 1000x faster than full linear scan
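The lookup described above (one bit vector per query bin per dimension, one vector picked per dimension and ANDed, then post-filtering against the actual data spheres) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names build_index and query, the choice of bin edges, and the use of Python integers as bit vectors are all hypothetical. A sphere's bit is set in every bin that its per-dimension extent overlaps, which is where the redundancy comes from.

```python
import bisect
import math


def build_index(spheres, edges):
    """Build one bit vector per bin per dimension (hypothetical helper).

    spheres: list of (center, radius); center is a tuple of floats.
    edges:   per-dimension sorted bin boundaries (an assumed layout).
    Bit i of a bin's vector is set if sphere i overlaps that bin.
    """
    index = []
    for d, dim_edges in enumerate(edges):
        # Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf).
        bounds = [-math.inf] + list(dim_edges) + [math.inf]
        vectors = []
        for lo, hi in zip(bounds, bounds[1:]):
            bits = 0
            for i, (center, radius) in enumerate(spheres):
                # The sphere spans [c - r, c + r] along dimension d.
                if center[d] - radius < hi and center[d] + radius >= lo:
                    bits |= 1 << i
            vectors.append(bits)
        index.append(vectors)
    return index


def query(spheres, edges, index, q):
    """Return the ids of all spheres that contain point q."""
    candidates = (1 << len(spheres)) - 1
    for d, dim_edges in enumerate(edges):
        # Pick the single bit vector for the bin holding q[d], then AND.
        candidates &= index[d][bisect.bisect_right(dim_edges, q[d])]
    hits = []
    i = 0
    while candidates:
        # Post-filter each surviving candidate with the exact sphere test.
        if candidates & 1:
            center, radius = spheres[i]
            if math.dist(center, q) <= radius:
                hits.append(i)
        candidates >>= 1
        i += 1
    return hits


# Toy data: three spheres in two dimensions, three bin edges per dimension.
spheres = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.0), ((0.5, 0.5), 1.0)]
edges = [[-2.0, 2.0, 4.0], [-2.0, 2.0, 4.0]]
index = build_index(spheres, edges)
print(query(spheres, edges, index, (0.2, 0.2)))
```

In this sketch the AND step can only over-report: a sphere that contains the query must overlap the query's bin in every dimension, so it always survives to the post-filter. Coarser bins mean more candidates to post-filter; finer bins mean more bit vectors to store.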

