Distance Indexing on Road Networks
Modeling Road Networks • Network -> Undirected weighted graph • Road junction -> Vertex (node) • Road segment -> Edge • Distance -> Edge weight • Data objects and query points -> Located on nodes only [Figure: road network showing objects and a query point]
Query Processing on Road Networks • Queries: • Window query • kNN, continuous kNN • Processing methods: • Network Expansion [Papadias VLDB03] • Use Euclidean distance for preliminary pruning • Index the objects with a spatial index • Precomputed Index [Kolahdouzan VLDB04] • Voronoi Network Nearest Neighbor (VN3) • NN list: precompute and store the kNNs for some large-degree nodes
Problems and Disadvantages • Distance computation is still expensive • By Dijkstra's single-source shortest path algorithm: • Maintain the nodes whose distances are not yet finalized • Pick the node with the shortest tentative distance and finalize it • Relax the distances of its not-yet-finalized neighbors • Repeat until all distances are finalized • Limitations: • Must visit nodes in ascending order of distance • Running time O(E lg V) with a binary heap, for E edges and V nodes • Precomputed indexes cannot suit all queries • Return the k nearest neighbors • Return the actual shortest path • Precomputed indexes are costly to store and update
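The four Dijkstra steps listed above can be sketched with a binary heap. This is a minimal illustration, assuming the road network is stored as an adjacency dict; the names `dijkstra` and `graph` are hypothetical and do not come from the slides.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest distances from source.

    graph is assumed to be {node: {neighbor: edge_weight}} (undirected,
    so every edge appears in both directions).
    """
    dist = {source: 0}          # tentative distances
    done = set()                # nodes whose distance is finalized
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # closest not-yet-finalized node
        if u in done:
            continue                    # stale heap entry
        done.add(u)                     # finalize u
        for v, w in graph[u].items():   # relax u's unfinalized neighbors
            nd = d + w
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

example_graph = {
    "a": {"b": 2, "c": 5},
    "b": {"a": 2, "c": 1, "d": 4},
    "c": {"a": 5, "b": 1, "d": 1},
    "d": {"b": 4, "c": 1},
}
example_dist = dijkstra(example_graph, "a")
```

Nodes are finalized strictly in ascending order of distance, which is the limitation the slide points out: a far object cannot be reached without first settling everything closer.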
Our Solution at a Glance • Distance signature: the first general-purpose index on road networks that • Categorizes the distances of a node to all objects • Supports both rough and exact distance computation • Accelerates processing of common query types • Reduces the storage and maintenance cost • Is orthogonal to other query optimization techniques
Roadmap • Background • Distance Signature Overview • Operations on Signatures • Query Processing on Signatures • Smart Choice of Distance Categories • Construction and Maintenance • Experimental Results • Conclusion
Distance Signature • Basic Idea: • Precomputing approximate distances is a good trade-off between having no index and indexing the full solution space • Maintain the approximate distance between objects and nodes • How rough is the approximation? • Apply a rougher approximation to faraway objects • Queries are typically interested in local objects • Faraway objects outnumber local objects • We use an exponential sequence of categories • In the form of [0, T), [T, cT), [cT, c^2 T), [c^2 T, c^3 T), ... • T and c are constant parameters • E.g., T = 3, c = 2 gives [0, 3), [3, 6), [6, 12), [12, 24), ... [Figure: distance axis with boundaries 3, 6, 12, 24 delimiting Cat 0, Cat 1, Cat 2, Cat 3]
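The exponential categories can be computed with a short loop. The sketch below assumes the half-open ranges given on the slide; the function name `category` is hypothetical.

```python
def category(dist, T=3.0, c=2.0):
    """Return the index of the exponential category containing dist.

    Category 0 is [0, T), category 1 is [T, cT), category 2 is
    [cT, c^2 T), and so on. Defaults follow the slide's example.
    """
    if dist < 0:
        raise ValueError("distance must be non-negative")
    cat = 0
    upper = T                 # exclusive upper bound of the current category
    while dist >= upper:
        upper *= c
        cat += 1
    return cat
```

With T = 3 and c = 2, a distance of 5 falls in category 1 and a distance of 12 falls in category 3.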
Distance Signature (Cont'd) • For each node n, signature component S(n)[i] denotes the category of dist(n,i) • S(n)[i].link denotes the next node from n in the shortest path to i • Signature S(n) is the whole set of components S(n)[i]
Distance Operations on Signatures • Principle: trace back the link until the distance range is accurate enough [Figure: nodes n2, n3, n6 and points p1, p2, with labels 11, 4, 11; p1p2: possible positions of n4]
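The trace-back principle can be sketched as follows. This is an illustration under assumptions, not the authors' code: signatures are assumed to be stored as `sig[node][obj] = (category, link)`, edge weights as `graph[u][v]`, and `max_width` is a hypothetical parameter stating how wide the returned range may be. All categories are assumed to be bounded.

```python
def category_bounds(cat, T, c):
    """Half-open range [lo, hi) covered by a category index."""
    if cat == 0:
        return 0.0, T
    return T * c ** (cat - 1), T * c ** cat

def distance_range(graph, sig, node, obj, max_width, T=3.0, c=2.0):
    """Follow links from node toward obj until the range is tight enough.

    Returns (lo, hi) such that lo <= dist(node, obj) < hi, or lo == hi
    when the object itself is reached and the distance is exact.
    """
    walked = 0                               # exact length of the path walked so far
    while True:
        if node == obj:
            return walked, walked            # exact distance
        cat, link = sig[node][obj]
        lo, hi = category_bounds(cat, T, c)
        if hi - lo <= max_width:
            return walked + lo, walked + hi  # accurate enough, stop here
        walked += graph[node][link]          # step to the next node on the path
        node = link

# Path a - b - c - o with weights 5, 4, 2; one object located on node o.
path_graph = {"a": {"b": 5}, "b": {"a": 5, "c": 4},
              "c": {"b": 4, "o": 2}, "o": {"c": 2}}
path_sig = {"a": {"o": (2, "b")}, "b": {"o": (2, "c")},
            "c": {"o": (0, "o")}, "o": {"o": (0, None)}}
```

Asking for a width of 6 answers from the signature of `a` alone; a width of 3 requires two hops; a width of 0 walks the whole path and yields the exact distance 11.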
Approximate Distance Comparison • What and Why? • Compare the distances of two objects based on one signature • Avoid accessing the signatures of other nodes • Used to get a rough result of distance sorting • How? • Example: compare dist(n4,n2) with dist(n4,n6) • Select an observer n3 • Embed objects n2, n3, n6 into Euclidean space • n3 tells whether n2 or n6 is closer to n4 • If n4 is on the perpendicular bisector, is it possible for n3 to find n4 within distance range s(n4)[n3]? • Let multiple observers vote
kNN Search on Signatures • Procedure • Read the signature s(q) of query node q • Categories tell the approximate distances between q and other objects • Get the k closest objects according to their category values • If the distances or the order are not needed, return objects based on category ranges • To find the ordering: • Sort objects within each category • To find exact distances: • Perform exact distance retrieval for each of the k nearest neighbors
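The category-based selection step can be sketched as below. It assumes the query node's signature is available as a dict from object to category index; `knn_by_category` is a hypothetical name. Objects in categories that fit entirely within k are certain answers, while the category that straddles the k-th position needs the sorting or exact-distance step from the slide.

```python
from collections import defaultdict

def knn_by_category(sig_q, k):
    """Split candidate objects into (certain, needs_refinement).

    sig_q is assumed to map each object to its category index at the
    query node. The second list holds the one category that contains
    the k-th neighbor but does not fit entirely within k.
    """
    buckets = defaultdict(list)
    for obj, cat in sig_q.items():
        buckets[cat].append(obj)
    certain, tied = [], []
    for cat in sorted(buckets):              # nearest categories first
        members = buckets[cat]
        if len(certain) + len(members) <= k:
            certain.extend(members)          # whole category is within the kNN
            if len(certain) == k:
                break
        else:
            tied = members                   # must be ordered by finer distances
            break
    return certain, tied

sig_q = {"o1": 0, "o2": 2, "o3": 1, "o4": 2, "o5": 3}
```

For k = 3 in this example, `o1` and `o3` are certain and one of `o2`, `o4` must be chosen by refinement.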
Smart Choice of Distance Categories • Exponential categories [0, T), [T, cT), [cT, c^2 T), ... • How to determine c and T? • Factors: • Dataset density, distribution • Query type, load (metric: spreading) • Storage availability • Simplifications • The road network is a uniform grid • Spreading is uniformly distributed in [0, SP] • Unlimited disk storage • Theorem • The optimal c = e, T = (SP/e)^0.5
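As a worked instance of the theorem, the snippet below evaluates c = e and T = (SP/e)^0.5 for a given SP and lists the resulting category boundaries. The function names are hypothetical, and the result only holds under the simplifications stated on the slide.

```python
import math

def optimal_parameters(sp):
    """Return (c, T) from the slide's theorem: c = e, T = sqrt(SP / e)."""
    return math.e, math.sqrt(sp / math.e)

def boundaries(T, c, count):
    """First `count` category boundaries T, cT, c^2 T, ..."""
    return [T * c ** i for i in range(count)]

c_opt, t_opt = optimal_parameters(100.0)
```

For SP = 100 this gives T a little above 6, so the first category covers roughly the nearest six distance units.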
Signature Construction • Basic procedure • Allocate storage for signatures • Build a shortest-path spanning tree for each object (Dijkstra) • Fill in s(n)[i] when the tree of object i reaches node n • Variable length encoding • Observation • The number of objects in each category is uneven • On a uniform grid, the number of objects 1 unit, 2 units, 3 units, ... away is 4, 8, 12, ... • Use fewer bits for larger categories
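The construction procedure can be sketched by running Dijkstra from every object and recording, for each node reached, the category of its distance and its parent in the tree. Because the graph is undirected, the parent of n in the tree rooted at object i is the next node from n on the shortest path to i, which is the link. The storage layout `sig[node][obj] = (category, link)` and all names below are assumptions for illustration.

```python
import heapq

def category(dist, T, c):
    """Index of the exponential category [0,T), [T,cT), ... containing dist."""
    cat, upper = 0, T
    while dist >= upper:
        upper *= c
        cat += 1
    return cat

def build_signatures(graph, objects, T=3.0, c=2.0):
    """Build sig[node][obj] = (category, link) for every object.

    graph is assumed to be {node: {neighbor: weight}}; objects are
    identified by the node they sit on.
    """
    sig = {n: {} for n in graph}
    for obj in objects:
        dist, link, done = {obj: 0}, {obj: None}, set()
        heap = [(0, obj)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            # The tree of obj has reached u: fill in the signature component.
            sig[u][obj] = (category(d, T, c), link[u])
            for v, w in graph[u].items():
                nd = d + w
                if v not in done and nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    link[v] = u          # u is v's next hop toward obj
                    heapq.heappush(heap, (nd, v))
    return sig

# Path a - b - c - o with weights 5, 4, 2; one object on node o.
path_graph = {"a": {"b": 5}, "b": {"a": 5, "c": 4},
              "c": {"b": 4, "o": 2}, "o": {"c": 2}}
path_sig = build_signatures(path_graph, ["o"])
```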
Variable Length Encoding • Reverse zero coding • Based on the Huffman encoding scheme • Under the assumptions "exponential partition", "grid topology", "uniform distance range of queries", and c > 1.5, this coding scheme is optimal • Average code length is approximately: [formula missing]
Category         [0, T)   [T, cT)   [cT, c^2 T)   [c^2 T, c^3 T)   [c^3 T, ∞)
Reverse coding   0000     0001      001           01               1
Fixed coding     000      001       010           011              100
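The reverse zero code for five categories can be generalized to m categories as sketched below: the largest category gets the single bit 1, each smaller category gets one more leading zero, and category 0 is all zeros. This generalization and the function names are assumptions; the slides only show the five-category case.

```python
def encode(cat, m):
    """Reverse zero code of category cat out of m categories (m >= 2)."""
    if not 0 <= cat < m:
        raise ValueError("category out of range")
    zeros = m - 1 - cat
    return "0" * zeros + ("1" if cat > 0 else "")

def decode(bits, m):
    """Decode a concatenation of reverse zero codes into category indexes."""
    out, zeros = [], 0
    for b in bits:
        if b == "1":
            out.append(m - 1 - zeros)    # a 1 terminates the codeword
            zeros = 0
        else:
            zeros += 1
            if zeros == m - 1:           # m-1 zeros is the code of category 0
                out.append(0)
                zeros = 0
    if zeros:
        raise ValueError("truncated codeword")
    return out

codes = [encode(cat, 5) for cat in range(5)]
```

The code is prefix-free, so a signature can be stored as one bit string and decoded without separators.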
Signature Compression • Idea: • Many objects share the same link • If s(n)[u] + s(u)[v] = s(n)[v], then s(n)[v] can be replaced by a 1-bit flag • Not compressed in memory [Figure: node n with objects u and v]
Signature Update • Requirements • The shortest-path spanning trees of all objects • A reverse index that maps each edge to the trees that contain it • Limits the number of trees affected by a change to this edge • How (suppose edge (a,b) is updated): • Find the affected spanning trees • For each affected tree of object c, check s(a)[c] or s(b)[c] (whichever is smaller) • Propagate to adjacent nodes until no more updates occur
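The reverse index can be sketched as a map from each edge to the objects whose spanning tree uses it. The sketch derives the index from the links alone, assuming they are stored as `links[node][obj] = next_node` (or `None` at the object itself); this layout and the function names are hypothetical. It covers only the lookup of affected trees, not the propagation step.

```python
from collections import defaultdict

def build_reverse_index(links):
    """Map each undirected edge to the objects whose tree contains it."""
    index = defaultdict(set)
    for node, per_object in links.items():
        for obj, nxt in per_object.items():
            if nxt is not None:
                # Edge (node, nxt) lies on node's shortest path to obj.
                index[frozenset((node, nxt))].add(obj)
    return index

def affected_objects(index, a, b):
    """Objects whose spanning tree contains edge (a, b)."""
    return index.get(frozenset((a, b)), set())

# Triangle a, b, c with unit weights; objects sit on nodes a and c.
tri_links = {
    "a": {"a": None, "c": "c"},
    "b": {"a": "a", "c": "c"},
    "c": {"a": "a", "c": None},
}
tri_index = build_reverse_index(tri_links)
```

In this example a change to edge (a, b) touches only the tree of the object on a, so the signatures toward the object on c need no check.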
Experiment Settings • Statistics • 183K nodes • 351K edges • Random edge weights from 1 to 10 • Page size: 4K bytes • kNN Competitors • Signature indexing • Full indexing (NN list for all nodes) • Network Voronoi Diagram (NVD) from VN3 • Tuning parameters • p: object density • T, c, k • Comparison metrics: page access (I/O cost), CPU time
Index Construction Cost • Good for medium and sparse datasets
kNN Search Performance • Moderate performance over various k
Robustness • The choice of parameters does not make a large difference
Conclusion • Our Contributions • The first index for distance computation on road networks • Speeds up general query processing • Optimal choice of distance categories and category encoding • Future work • Cross-node signature compression • The signatures of nearby nodes are similar • Derivation of optimal distance categories for a wider range of network topologies and object distributions