Explore how Distance Signatures enhance query processing efficiency on road networks by categorizing distances, supporting various query types, and reducing maintenance costs. Learn about distance operations, smart distance categories, and their construction for better results.
Modeling Road Networks • Network -> Undirected weighted graph • Road junction -> Vertex (node) • Road segment -> Edge • Distance -> Edge weight • Data objects and query points -> Located on nodes only
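A minimal sketch of this model in Python, assuming road segments arrive as hypothetical `(junction, junction, length)` tuples; `build_road_network` and the toy network are illustrative, not from the slides. Each segment is stored in both adjacency lists because the graph is undirected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def build_road_network(segments: List[Tuple[str, str, float]]) -> Graph:
    """Turn road segments into an undirected weighted adjacency list."""
    graph: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for u, v, length in segments:
        # Junctions become vertices, segments become edges, lengths become weights.
        graph[u].append((v, length))
        graph[v].append((u, length))  # undirected: store the edge in both directions
    return dict(graph)


# Hypothetical toy network with three junctions.
road = build_road_network([("n1", "n2", 4.0), ("n2", "n3", 2.5), ("n1", "n3", 7.0)])
objects = {"n3"}  # data objects (and query points) sit on nodes only
```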
Query Processing on Road Networks • Queries: • Window query • kNN, continuous kNN • Processing methods: • Network Expansion [Papadias VLDB03] • Uses Euclidean distance for preliminary pruning • Indexes the objects with a spatial index • Precomputed Index [Kolahdouzan VLDB04] • Voronoi Network Nearest Neighbor (VN3) • NN list: precompute and store the kNNs for some large-degree nodes
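The expansion part of Network Expansion can be sketched as an incremental Dijkstra search that stops once k objects have been settled. This is a simplified sketch under our own assumptions: it omits the Euclidean pre-pruning and the spatial index mentioned above, and `network_expansion_knn` and the adjacency-list format are illustrative names.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def network_expansion_knn(graph: Graph, q: str, objects: Set[str], k: int) -> List[Tuple[str, float]]:
    """Expand from q in ascending network distance until k objects are found."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled: Set[str] = set()
    result: List[Tuple[str, float]] = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry: u was already settled with a shorter distance
        settled.add(u)
        if u in objects:
            result.append((u, d))  # objects are reported in ascending distance order
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in settled and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


# Hypothetical toy network.
g: Graph = {
    "q": [("a", 2.0), ("c", 4.0)],
    "a": [("q", 2.0), ("b", 3.0)],
    "b": [("a", 3.0), ("c", 1.0)],
    "c": [("q", 4.0), ("b", 1.0)],
}
print(network_expansion_knn(g, "q", {"a", "b", "c"}, 2))  # [('a', 2.0), ('c', 4.0)]
```

The search never looks past the k-th object, which is why it works well when objects are dense near q and poorly when they are sparse.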
Problems and Disadvantages • Distance computation is still expensive • By Dijkstra's single-source shortest path algorithm: • Maintain the nodes whose distances are not yet finalized • Pick the node with the shortest tentative distance and finalize it • Relax the distances of its not-yet-finalized neighbors • Repeat until all distances are finalized • Limitations: • Must visit nodes in ascending order of distance • Running time O(N lg V) • Precomputed indexes do not suit all queries, e.g.: • Returning the k nearest neighbors • Returning the actual shortest path • Precomputed indexes are costly to store and update
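The Dijkstra steps above map onto a short Python sketch. It uses a binary heap with lazy deletion (stale entries are skipped) instead of a decrease-key operation, a common substitute; the `finalized` list, an illustrative addition, makes the first limitation visible, since nodes are settled strictly in ascending order of distance.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], List[str]]:
    """Single-source shortest paths; also returns the order nodes were finalized."""
    dist = {source: 0.0}        # tentative distances
    done = set()                # nodes whose distance is final
    finalized: List[str] = []
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # pick the closest unfinalized node
        if u in done:
            continue                    # stale heap entry
        done.add(u)
        finalized.append(u)
        for v, w in graph.get(u, []):   # relax edges to unfinalized neighbors
            if v not in done and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist, finalized


# Hypothetical toy network.
g: Graph = {
    "q": [("a", 2.0), ("c", 4.0)],
    "a": [("q", 2.0), ("b", 3.0)],
    "b": [("a", 3.0), ("c", 1.0)],
    "c": [("q", 4.0), ("b", 1.0)],
}
dist, order = dijkstra(g, "q")
print(order)  # ['q', 'a', 'c', 'b']
```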
Our Solution at a Glance • Distance signature: the first general-purpose index on road networks that • Categorizes the distances from a node to all objects • Supports both approximate and exact distance computation • Accelerates the processing of common query types • Reduces storage and maintenance costs • Is orthogonal to other query optimization techniques
Roadmap • Background • Distance Signature Overview • Operations on Signatures • Query Processing on Signatures • Smart Choice of Distance Categories • Construction and Maintenance • Experimental Results • Conclusion
Distance Signature • Basic idea: • Precomputing distances is a good trade-off between no indexing and solution-space indexing • Maintain approximate distances between objects and nodes • How rough is the approximation? • Apply a rough approximation to faraway objects • Queries are usually interested in local objects • Faraway objects outnumber local objects • We use an exponential sequence of categories • In the form $[0, T), [T, cT), [cT, c^2T), [c^2T, c^3T), \ldots$ • T and c are constant parameters • E.g., T = 3, c = 2 gives Cat 0 = [0, 3), Cat 1 = [3, 6), Cat 2 = [6, 12), Cat 3 = [12, 24), ...
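Mapping a distance to its category index needs only T and c. The sketch below is our own illustration; it assumes a hypothetical `num_cats` cap so that the last category is open-ended, as in the five-category encoding example later in the deck, and it multiplies the bound iteratively rather than taking logarithms, to avoid floating-point surprises at exact boundaries.

```python
def category(d: float, t: float, c: float, num_cats: int = 5) -> int:
    """Index of the category [0,T), [T,cT), [cT,c^2T), ... that contains distance d.

    The last category (index num_cats - 1) is open-ended: [c^(num_cats-2) T, inf).
    """
    if d < 0:
        raise ValueError("distances are non-negative")
    if d < t:
        return 0
    index, upper = 1, c * t  # category 1 covers [T, cT)
    while d >= upper and index < num_cats - 1:
        upper *= c
        index += 1
    return index


# Slide example: T = 3, c = 2 gives [0,3), [3,6), [6,12), [12,24), [24,inf)
print([category(d, 3, 2) for d in (0, 2.9, 3, 5, 6, 12, 23, 24, 100)])
# [0, 0, 1, 1, 2, 3, 3, 4, 4]
```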
Distance Signature (Cont'd) • For each node n and each object i, the signature component S(n)[i] denotes the category of dist(n, i) • S(n)[i].link denotes the next node after n on the shortest path from n to i • The signature S(n) is the set of all components S(n)[i]
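One possible in-memory layout for a signature is a map from object id to a (category, link) pair. The names `Component` and `make_signature` and the toy distances are hypothetical; the category lower limits are those of the T = 3, c = 2 example.

```python
from typing import Dict, List, NamedTuple, Optional


class Component(NamedTuple):
    category: int        # category index of dist(n, i)
    link: Optional[str]  # next node after n on the shortest path to i


Signature = Dict[str, Component]  # S(n): object id -> component


def make_signature(exact: Dict[str, float], next_hop: Dict[str, Optional[str]],
                   bounds: List[float]) -> Signature:
    """Build S(n) from exact distances dist(n, i) and next hops toward each object i.

    bounds holds the ascending category lower limits T, cT, c^2T, ...; category 0 is [0, T).
    """
    sig: Signature = {}
    for obj, d in exact.items():
        cat = sum(1 for b in bounds if d >= b)  # number of lower limits at or below d
        sig[obj] = Component(cat, next_hop[obj])
    return sig


# Hypothetical node n with three objects; only categories and links are kept.
S_n = make_signature({"o1": 2.0, "o2": 7.5, "o3": 30.0},
                     {"o1": "m1", "o2": "m1", "o3": "m2"},
                     bounds=[3, 6, 12, 24])
print(S_n["o2"])  # Component(category=2, link='m1')
```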
Distance Operations on Signatures • Principle: trace back the links until the distance range is accurate enough • [Figure: observer n3 with objects n2 and n6; p1 and p2 mark the possible positions of n4]
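The link-tracing principle can be sketched as follows: each hop along S(n)[i].link adds one exactly known edge weight, and the remaining distance is bounded by the category range stored at the new node. The toy path network, the hand-filled signature, the `tol` parameter, and the five-category cap are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Tuple

T, C, NUM_CATS = 3.0, 2.0, 5  # slide example parameters; 5 categories is an assumption


def cat_range(k: int) -> Tuple[float, float]:
    """Half-open distance range [lo, hi) of category k; the last one is open-ended."""
    if k == 0:
        return 0.0, T
    lo = T * C ** (k - 1)
    hi = math.inf if k == NUM_CATS - 1 else T * C ** k
    return lo, hi


# Hypothetical path network a -4- b -5- c -3- d with a single object d.
WEIGHT = {frozenset(("a", "b")): 4.0, frozenset(("b", "c")): 5.0, frozenset(("c", "d")): 3.0}

# SIG[n][i] = (category of dist(n, i), next node from n toward i)
SIG: Dict[str, Dict[str, Tuple[int, Optional[str]]]] = {
    "a": {"d": (3, "b")},  # dist 12 -> [12, 24)
    "b": {"d": (2, "c")},  # dist 8  -> [6, 12)
    "c": {"d": (1, "d")},  # dist 3  -> [3, 6)
    "d": {"d": (0, None)},
}


def trace_distance(n: str, i: str, tol: float) -> Tuple[float, float]:
    """Follow links from n toward i until the range [lo, hi) is at most tol wide.

    tol = 0 forces tracing all the way to i, which yields the exact distance.
    """
    travelled = 0.0
    while n != i:
        k, link = SIG[n][i]
        lo, hi = cat_range(k)
        if hi - lo <= tol:
            return travelled + lo, travelled + hi  # accurate enough: stop early
        travelled += WEIGHT[frozenset((n, link))]   # one more exact edge weight
        n = link
    return travelled, travelled                     # reached i: exact distance


print(trace_distance("a", "d", tol=6))  # (10.0, 16.0)
print(trace_distance("a", "d", tol=0))  # (12.0, 12.0)
```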
Approximate Distance Comparison • What and why? • Compare the distances to two objects using a single signature • Avoid accessing the signatures of other nodes • Used to obtain a rough distance sorting • How? • Example: compare dist(n4, n2) with dist(n4, n6) • Select an observer n3 • Embed objects n2, n3, n6 into Euclidean space • n3 tells whether n2 or n6 is closer to n4 • If n4 were on the perpendicular bisector of n2 and n6, could n3 still find n4 within the distance range s(n4)[n3]? If not, n4 cannot be equidistant, and n3 can tell which of n2 and n6 is closer • Let multiple observers vote
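The slide leaves the geometry implicit, so the sketch below is one possible reading rather than necessarily the authors' procedure. It assumes the pairwise distances among the observer o and the two objects a, b are known and treated as a Euclidean triangle. The signed distance from o to the perpendicular bisector of a and b is $s = \frac{d(o,b)^2 - d(o,a)^2}{2\,d(a,b)}$, positive on a's side; any point strictly within distance hi of o stays on o's side when $|s| \ge hi$, so the observer can vote, and otherwise it abstains. `observer_vote` and `compare_by_vote` are hypothetical names.

```python
from typing import Iterable, Tuple


def observer_vote(d_oa: float, d_ob: float, d_ab: float, hi: float) -> int:
    """+1 if the target is judged closer to a, -1 if closer to b, 0 if undecided.

    d_oa, d_ob, d_ab: distances among observer o and objects a, b (assumed known
    and treated as Euclidean). hi: upper end of the target's category range for o,
    so the target lies strictly within distance hi of o.
    """
    # Signed distance from o to the perpendicular bisector of a-b (positive on a's side).
    s = (d_ob ** 2 - d_oa ** 2) / (2 * d_ab)
    if s >= hi:
        return 1   # the whole disc of radius hi around o lies on a's side
    if s <= -hi:
        return -1  # the whole disc lies on b's side
    return 0       # the disc crosses the bisector: o cannot tell


def compare_by_vote(observers: Iterable[Tuple[float, float, float]], d_ab: float) -> int:
    """Sum the votes of several observers given as (d_oa, d_ob, hi) tuples."""
    total = sum(observer_vote(d_oa, d_ob, d_ab, hi) for d_oa, d_ob, hi in observers)
    return (total > 0) - (total < 0)  # sign of the total: +1, -1, or 0 (tie)


# Hypothetical numbers: d(a, b) = 10 and three observers.
print(observer_vote(3, 9, 10, hi=3))                           # 1  (s = 3.6)
print(compare_by_vote([(3, 9, 3), (9, 3, 5), (2, 8, 2)], 10))  # 1
```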
kNN Search on Signatures • Procedure • Read the signature s(q) of query node q • The categories give the approximate distances between q and the other objects • Get the k closest objects according to their category values • If the distances or their order are not needed, return the objects based on category ranges • To find the ordering: • Sort the objects within each category • To find exact distances: • Perform exact distance retrieval for each of the k nearest neighbors
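The category-level step of this procedure fits in a few lines: whole categories are taken in ascending order while they fit within k, and only the category that straddles the k-th position needs sorting or exact distance retrieval. `knn_by_category` is an illustrative name, and the signature is reduced here to a hypothetical map from object id to category index.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def knn_by_category(signature: Dict[str, int], k: int) -> Tuple[List[str], List[str]]:
    """Return (certain, boundary) using category values only.

    certain:  objects guaranteed to be among the k nearest
    boundary: objects of the category where the k-th neighbor falls; the
              k - len(certain) closest of them must still be found by sorting
              within the category or by exact distance retrieval
    """
    by_cat: Dict[int, List[str]] = defaultdict(list)
    for obj, cat in signature.items():
        by_cat[cat].append(obj)
    certain: List[str] = []
    for cat in sorted(by_cat):          # nearest categories first
        members = sorted(by_cat[cat])   # sorted only for deterministic output
        if len(certain) + len(members) <= k:
            certain.extend(members)     # the whole category fits
            if len(certain) == k:
                return certain, []
        else:
            return certain, members     # this category straddles position k
    return certain, []                  # fewer than k objects in total


s_q = {"o1": 0, "o2": 1, "o3": 1, "o4": 2, "o5": 3}  # hypothetical signature of q
print(knn_by_category(s_q, 2))  # (['o1'], ['o2', 'o3'])
```

Because category ranges do not overlap, every object in a lower category is strictly closer than any object in a higher one, so only the boundary category ever needs refinement.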
Smart Choice of Distance Categories • Exponential categories $[0, T), [T, cT), [cT, c^2T), \ldots$ • How to determine c and T? • Factors: • Dataset density and distribution • Query type and load (metric: spreading) • Storage availability • Simplifications • The road network is a uniform grid • Spreading is uniformly distributed in [0, SP] • Unlimited disk storage • Theorem • The optimal parameters are $c = e$ and $T = \sqrt{SP/e}$
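Plugging a spreading bound into the theorem gives the category boundaries directly. The value SP = 1000 below is an arbitrary example, and `optimal_parameters` and `category_lower_limits` are illustrative helpers, not from the slides.

```python
import math
from typing import List, Tuple


def optimal_parameters(sp: float) -> Tuple[float, float]:
    """Optimal (c, T) from the theorem: c = e, T = sqrt(SP / e)."""
    c = math.e
    t = math.sqrt(sp / math.e)
    return c, t


def category_lower_limits(t: float, c: float, count: int) -> List[float]:
    """Lower limits 0, T, cT, c^2T, ... of the first `count` categories."""
    return [0.0] + [t * c ** j for j in range(count - 1)]


c, t = optimal_parameters(1000.0)
print(round(t, 2))  # 19.18
print([round(x, 1) for x in category_lower_limits(t, c, 4)])
```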
Signature Construction • Basic procedure • Allocate storage for the signatures • Build a shortest-path spanning tree for each object (Dijkstra) • Fill in s(n)[i] when the tree of object i is spanned to node n • Variable-length encoding • Observation • The number of objects in each category is uneven • On a grid, the number of objects 1 unit, 2 units, 3 units, ... away is 4, 8, 12, ... • Use fewer bits for larger categories
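The basic construction procedure can be sketched by running Dijkstra from every object and filling s(n)[obj] at the moment the object's tree reaches n. Because the graph is undirected, n's parent in that tree is the next node from n toward the object, which is exactly the link. Function names, the toy path network, and the five-category cap are assumptions for illustration; the variable-length encoding step is not shown.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Signatures = Dict[str, Dict[str, Tuple[int, Optional[str]]]]


def category(d: float, t: float, c: float, num_cats: int) -> int:
    """Category index of d for [0,T), [T,cT), ...; the last category is open-ended."""
    if d < t:
        return 0
    index, upper = 1, c * t
    while d >= upper and index < num_cats - 1:
        upper *= c
        index += 1
    return index


def build_signatures(graph: Graph, objects: List[str], t: float, c: float,
                     num_cats: int) -> Signatures:
    sig: Signatures = {n: {} for n in graph}  # allocate storage
    for obj in objects:
        # Dijkstra from obj spans its shortest-path tree one node at a time.
        dist = {obj: 0.0}
        parent: Dict[str, Optional[str]] = {obj: None}
        done = set()
        heap = [(0.0, obj)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            # The tree reaches u: fill s(u)[obj]. In an undirected graph, u's
            # parent in obj's tree is the next node from u toward obj.
            sig[u][obj] = (category(d, t, c, num_cats), parent[u])
            for v, w in graph[u]:
                if v not in done and d + w < dist.get(v, math.inf):
                    dist[v] = d + w
                    parent[v] = u
                    heapq.heappush(heap, (d + w, v))
    return sig


# Hypothetical path network a -4- b -5- c -3- d with a single object d.
path: Graph = {"a": [("b", 4.0)], "b": [("a", 4.0), ("c", 5.0)],
               "c": [("b", 5.0), ("d", 3.0)], "d": [("c", 3.0)]}
sigs = build_signatures(path, ["d"], t=3, c=2, num_cats=5)
print(sigs["a"])  # {'d': (3, 'b')}
```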
Variable Length Encoding • Reverse zero coding • Based on the Huffman encoding scheme • Under the assumptions of exponential partition, grid topology, uniform distance range of queries, and c > 1.5, this coding scheme is optimal • Codewords by category (reverse coding vs. fixed coding): • $[0, T)$: 0000 vs. 000 • $[T, cT)$: 0001 vs. 001 • $[cT, c^2T)$: 001 vs. 010 • $[c^2T, c^3T)$: 01 vs. 011 • $[c^3T, \infty)$: 1 vs. 100 • Average code length is approximately:
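A sketch of reverse zero coding for m categories, assuming the codeword pattern in which the largest category gets `1`, the next one `01`, then `001`, and so on, with the two smallest categories sharing the longest length (one ends in `1`, the other is all zeros). The grid counts follow the 4, 8, 12, ... observation from the construction slide; `reverse_zero_codes`, `decode`, `grid_counts`, and the grid size are illustrative assumptions.

```python
from typing import Dict, List


def reverse_zero_codes(m: int) -> Dict[int, str]:
    """Prefix-free codewords for categories 0..m-1 (m >= 2); larger categories get shorter codes."""
    codes = {j: "0" * (m - 1 - j) + "1" for j in range(1, m)}
    codes[0] = "0" * (m - 1)  # smallest category: all zeros, no terminating 1
    return codes


def decode(bits: str, m: int) -> List[int]:
    """Decode a concatenation of reverse zero codewords."""
    cats, zeros = [], 0
    for b in bits:
        if b == "1":
            cats.append(m - 1 - zeros)  # z zeros then a 1 -> category m-1-z
            zeros = 0
        else:
            zeros += 1
            if zeros == m - 1:          # m-1 zeros with no 1 -> category 0
                cats.append(0)
                zeros = 0
    if zeros:
        raise ValueError("truncated codeword")
    return cats


def grid_counts(t: int, c: int, m: int, max_d: int) -> List[int]:
    """Objects per category on a grid where 4*d nodes lie d units away (d = 1..max_d)."""
    bounds = [t * c ** j for j in range(m - 1)]  # lower limits of categories 1..m-1
    counts = [0] * m
    for d in range(1, max_d + 1):
        counts[sum(1 for b in bounds if d >= b)] += 4 * d
    return counts


codes = reverse_zero_codes(5)
print([codes[j] for j in range(5)])  # ['0000', '0001', '001', '01', '1']
counts = grid_counts(t=3, c=2, m=5, max_d=47)
avg = sum(n * len(codes[j]) for j, n in enumerate(counts)) / sum(counts)
print(round(avg, 2))  # 1.32 bits on average, versus 3 bits for the fixed code
```

Because most objects fall into the outer categories, giving those categories the shortest codewords is what pulls the average well below the fixed-length cost.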
Signature Compression • Idea: • Many objects share the same link • If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag • [Figure: node n, its link u, and object v; annotated "not compressed in memory"]
Signature Update • Requirements • The shortest-path spanning trees of all objects • A reverse index that lists, for each edge, the trees containing that edge • Limits the number of trees affected by a change to this edge • How (suppose edge (a, b) is updated): • Find the affected spanning trees • For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller) • Propagate to adjacent nodes until no more updates occur
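A simplified sketch of the reverse index for a weight increase: the index maps each tree edge to the objects whose shortest-path trees contain it, so only those trees are touched. For brevity it rebuilds each affected tree from scratch instead of propagating from s(a)[c] or s(b)[c] as described above, and it handles only increases, because a decrease can pull the edge into trees that did not contain it before. Categories and links can be re-derived from each rebuilt tree (distance and parent). Class and method names are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path_tree(graph: Graph, root: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    dist: Dict[str, float] = {root: 0.0}
    parent: Dict[str, Optional[str]] = {root: None}
    done: Set[str] = set()
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u]:
            if v not in done and d + w < dist.get(v, math.inf):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent


def tree_edges(parent: Dict[str, Optional[str]]) -> Set[FrozenSet[str]]:
    return {frozenset((u, p)) for u, p in parent.items() if p is not None}


class SignatureMaintainer:
    def __init__(self, graph: Graph, objects: List[str]):
        self.graph = graph
        self.trees = {o: shortest_path_tree(graph, o) for o in objects}
        self.edge_index: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
        for o, (_, parent) in self.trees.items():
            for e in tree_edges(parent):
                self.edge_index[e].add(o)  # reverse index: edge -> trees containing it

    def increase_weight(self, a: str, b: str, new_w: float) -> Set[str]:
        old_w = next(w for v, w in self.graph[a] if v == b)
        if new_w < old_w:
            raise ValueError("this sketch only handles weight increases")
        for x, y in ((a, b), (b, a)):  # update both directions of the undirected edge
            self.graph[x] = [(v, new_w if v == y else w) for v, w in self.graph[x]]
        affected = set(self.edge_index.get(frozenset((a, b)), set()))
        for o in affected:
            for e in tree_edges(self.trees[o][1]):
                self.edge_index[e].discard(o)
            self.trees[o] = shortest_path_tree(self.graph, o)  # simplified: full rebuild
            for e in tree_edges(self.trees[o][1]):
                self.edge_index[e].add(o)
        return affected


# Hypothetical triangle x-y (1), y-z (1), x-z (3) with a single object x.
g: Graph = {"x": [("y", 1.0), ("z", 3.0)], "y": [("x", 1.0), ("z", 1.0)],
            "z": [("y", 1.0), ("x", 3.0)]}
m = SignatureMaintainer(g, ["x"])
print(m.increase_weight("x", "z", 10.0))  # set(): edge (x, z) is in no tree
print(m.increase_weight("y", "z", 5.0))   # {'x'}
print(m.trees["x"][0]["z"])               # 6.0
```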
Experiment Settings • Statistics • 183K nodes • 351K edges • Random edge weights from 1 to 10 • Page size: 4K bytes • kNN Competitors • Signature indexing • Full indexing (NN list for all nodes) • Network Voronoi Diagram (NVD) from VN3 • Tuning parameters • p: object density • T, c, k • Comparison metrics: page access (I/O cost), CPU time
Index Construction Cost • Good for medium and sparse datasets
kNN Search Performance • Moderate performance across various k
Robustness • The choice of parameters does not make a large difference
Conclusion • Our Contributions • The first index for distance computation on road networks • Speeds up general query processing • Optimal choice of distance categories and category encoding • Future work • Cross-node signature compression • The signatures of nearby nodes are similar • Derivation of optimal distance categories for a wider range of network topologies and object distributions