I/O and Space-Efficient Path Traversal in Planar Graphs

Craig Dillabaugh, Carleton University • Meng He, University of Waterloo • Anil Maheshwari, Carleton University • Norbert Zeh, Dalhousie University

Background: Succinct Data Structures • What are succinct data structures (Jacobson 1989) • Representing data in space close to the information-theoretic minimum • Supporting efficient navigational operations • Why succinct data structures • Large data sets in modern applications: textual, genomic, spatial or geometric

Background: External Memory Model (Aggarwal and Vitter 1988) • Parameters • N: number of elements in the problem instance • M: size of the internal memory • B: size of a disk block • Cost: number of I/O’s (block transfers) between internal memory and external memory • [Figure: CPU, internal memory, and external memory, with data transferred in blocks]
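To make the cost measure concrete, here is a minimal sketch that counts block transfers for two access patterns. The function names are hypothetical, and the model is simplified by assuming an empty internal memory at the start and no caching between accesses.

```python
import math

def scan_ios(n_elements: int, block_size: int) -> int:
    """I/Os to read n_elements stored contiguously: one transfer per block."""
    return math.ceil(n_elements / block_size)

def scattered_access_ios(n_elements: int) -> int:
    """Worst case when each accessed element lives in a different block
    and nothing is reused from internal memory: one transfer per element."""
    return n_elements

if __name__ == "__main__":
    n, b = 1000, 64
    print("sequential scan:", scan_ios(n, b), "I/Os")
    print("scattered access:", scattered_access_ios(n), "I/Os")
```

The gap between the two counts is why a path traversal that follows pointers across blocks can be expensive, and why the layout of vertices in blocks matters.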

Our Contributions • Our goal is to design data structures that are both succinct and efficient in the External Memory setting • Our results • A succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal • A succinct representation of triangulated terrains that supports various geometric queries

Notation • N: number of vertices of the given graph G • d: maximum degree of vertices • q: number of bits required to encode the key of each vertex • K: the length of the path • [Figure: example graph with numeric labels]

Two-Level Partition • A tool: graph separator (Frederickson 1987) • Size of each subgraph (region): r • Number of regions: Θ(N/r) • Number of boundary vertices: O(N/√r) • Two-level partition • Subdivide G into regions of fixed maximum size • Subdivide each region into sub-regions of smaller fixed maximum size • Types of vertices for each region / sub-region • Interior vertices • Boundary vertices
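The interior/boundary distinction can be illustrated with a small sketch. It assumes the partition is already given as a list of vertex sets (computing the separator itself is not shown), and it treats a vertex as a boundary vertex when it is shared by more than one region. The function name is hypothetical.

```python
from collections import Counter

def classify_vertices(regions):
    """Split vertices into interior (in exactly one region) and boundary
    (shared by two or more regions). `regions` is a list of vertex sets."""
    membership = Counter(v for region in regions for v in region)
    boundary = {v for v, count in membership.items() if count > 1}
    interior = set(membership) - boundary
    return interior, boundary

if __name__ == "__main__":
    # A cycle 1-2-3-4-5-6-1 cut into three regions that overlap at 1, 3, 5.
    regions = [{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]
    interior, boundary = classify_vertices(regions)
    print("interior:", sorted(interior))
    print("boundary:", sorted(boundary))
```

The same classification applies one level down, with sub-regions of a single region in place of the regions of G.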

α-Neighbourhood • Definition • Beginning with a given vertex v, we perform a breadth-first search in G and select the first α vertices encountered • The α-neighbourhood of v is the subgraph of G induced by these vertices • Internal and terminal vertices • Property: The distance between v and any terminal vertex in its α-neighbourhood is at least log_d α • In our representation, we store the α-neighbourhood of each boundary vertex. If a sub-region boundary vertex is interior to a region, we add an additional constraint that its α-neighbourhood cannot be extended beyond the region
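The definition above can be sketched as a truncated breadth-first search. This is an illustrative sketch, not the authors' implementation: the adjacency-list input and the function name are assumptions, and a selected vertex is taken to be terminal when at least one of its neighbours in G was not selected (internal otherwise). The extra constraint that keeps a neighbourhood inside its region is not modelled.

```python
from collections import deque

def alpha_neighbourhood(adj, v, alpha):
    """Return (selected, terminal): the first `alpha` vertices found by a BFS
    from v, and the selected vertices that have a neighbour outside the set."""
    selected = [v]
    seen = {v}
    queue = deque([v])
    while queue and len(selected) < alpha:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                selected.append(w)
                queue.append(w)
                if len(selected) == alpha:
                    break  # stop as soon as alpha vertices are selected
    chosen = set(selected)
    terminal = {u for u in selected if any(w not in chosen for w in adj[u])}
    return selected, terminal

if __name__ == "__main__":
    # Path graph 0-1-2-3-4-5-6
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
    print(alpha_neighbourhood(adj, 3, 3))
```

On the path graph, the 3-neighbourhood of vertex 3 consists of 3 and its two neighbours, and both neighbours are terminal because each has a neighbour outside the selected set.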

Overview of Labeling Scheme • Labels at three levels for the same vertex • Graph-label (unique) • Region-label (one or more) • Subregion-label (one or more) • Assign the labels bottom-up

Sub-Region Labels • Encode sub-region R_{i,j} using any succinct representation for planar graphs • This induces a permutation of the vertices in R_{i,j} • Subregion-label: the k-th vertex in the above permutation has subregion-label k in R_{i,j}

Region-Labels and Graph-Labels • [Figure: region R_1 and its sub-regions R_{1,1}, R_{1,2}, R_{1,3}, showing subregion-labels restarting at 1 in each sub-region and region-labels numbered consecutively across R_1] • The assignment of graph-labels is similar • Succinct structures of o(N) bits are constructed to support conversion between labels at different levels in O(1) I/O’s
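As a rough illustration of converting between label levels, the sketch below numbers the vertices of a region by concatenating its sub-regions, so a region-label is a sub-region offset plus a subregion-label. This is a deliberately simplified model: it ignores boundary vertices, which can carry more than one label, and it uses a plain prefix-sum array with binary search rather than the o(N)-bit succinct structures of the actual representation. The class and method names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

class RegionLabels:
    """Toy label conversion inside one region (no shared boundary vertices)."""

    def __init__(self, subregion_sizes):
        # offsets[j] = number of vertices in the sub-regions that precede j
        self.offsets = [0] + list(accumulate(subregion_sizes))

    def to_region_label(self, j, k):
        """Sub-region index j (0-based) and subregion-label k (1-based)."""
        return self.offsets[j] + k

    def to_subregion_label(self, region_label):
        """Inverse mapping: returns (sub-region index, subregion-label)."""
        j = bisect_right(self.offsets, region_label - 1) - 1
        return j, region_label - self.offsets[j]

if __name__ == "__main__":
    labels = RegionLabels([6, 7, 5])  # three sub-regions of sizes 6, 7 and 5
    print(labels.to_region_label(1, 1))   # first vertex of the second sub-region
    print(labels.to_subregion_label(13))  # last vertex of the second sub-region
```

The same offset idea applies one level up, where graph-labels are derived from region-labels.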

Data Structures • Denote by A the maximum number of vertices that may be stored in a block; this is our maximum sub-region size • Choose A lg³ N to be the maximum size of each region • We only encode sub-regions and α-neighbourhoods of boundary vertices as components • Encode the graph structure of each component in a succinct fashion • Information is encoded so that we can retrieve the graph labels of the internal vertices in an α-neighbourhood without requiring additional I/O’s

Space Analysis • We assume B = Ω(lg N) • A = (B lg N) / (c + q) • c: number of bits per vertex required to encode the sub-graph structure and boundary bit vector • Choose α = A^(1/3) • Intuitively, our structures are space-efficient because: • Region boundary vertices are few enough that information such as the graph labels of the vertices in their α-neighbourhoods does not occupy too much space • The number of sub-region boundary vertices is larger, but information such as region-labels uses fewer bits (lg(A lg³ N)) • Total space: O(N) + Nq + o(Nq) bits
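A small numeric sketch can make these parameter choices tangible. It assumes that B counts words of lg N bits (so a block holds B lg N bits, as the formula for A suggests), and the sample values of N, B, q and c are illustrative assumptions, not figures from the talk. The function name is hypothetical.

```python
import math

def layout_parameters(N, B, q, c):
    """Evaluate the slide's formulas for one choice of N, B, q and c."""
    lg_n = math.log2(N)
    A = int(B * lg_n / (c + q))       # vertices per block = max sub-region size
    alpha = A ** (1.0 / 3.0)          # alpha = A^(1/3)
    region_size = int(A * lg_n ** 3)  # max region size A lg^3 N
    return {
        "A": A,
        "alpha": alpha,
        "region_size": region_size,
        # bits for a label that is unique only within a region
        "region_label_bits": math.ceil(math.log2(region_size)),
        # bits for a label that is unique within the whole graph
        "graph_label_bits": math.ceil(lg_n),
    }

if __name__ == "__main__":
    # Assumed sample values: 2^40 vertices, 1024-word blocks, q = 40, c = 24.
    for name, value in layout_parameters(2 ** 40, 1024, 40, 24).items():
        print(name, "=", value)
```

With these sample values a region-label needs noticeably fewer bits than a graph-label, which is the effect the second intuition bullet relies on.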

Traversal Algorithm • Load either a sub-region or the α-neighbourhood of a boundary vertex • Traverse the above component until a boundary/terminal vertex is encountered • Load the next component from external memory and continue the traversal
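The loop described above can be simulated to count component loads. This is a sketch under stated assumptions: components are modelled only by the set of vertices that can be visited without leaving them (interior vertices of a sub-region, or internal vertices of an α-neighbourhood), each load is charged as one unit, and the names `interior_of` and `component_for` are hypothetical.

```python
def count_component_loads(path, interior_of, component_for):
    """Walk `path` and count how many components must be loaded.

    interior_of:   component id -> vertices visitable without leaving it
    component_for: vertex -> component to load when the walk reaches a vertex
                   the current component cannot serve (a sub-region for a
                   sub-region interior vertex, an alpha-neighbourhood for a
                   boundary vertex)
    """
    loads = 0
    current = None
    for v in path:
        if current is None or v not in interior_of[current]:
            current = component_for[v]  # fetch the next component
            loads += 1
    return loads

if __name__ == "__main__":
    # Path graph 0..11, sub-regions {0..4}, {4..8}, {8..11}; 4 and 8 are
    # boundary vertices whose 3-neighbourhoods have a single internal vertex.
    interior_of = {
        "S0": {0, 1, 2, 3}, "S1": {5, 6, 7}, "S2": {9, 10, 11},
        "N4": {4}, "N8": {8},
    }
    component_for = {v: "S0" for v in range(0, 4)}
    component_for.update({v: "S1" for v in range(5, 8)})
    component_for.update({v: "S2" for v in range(9, 12)})
    component_for.update({4: "N4", 8: "N8"})
    print(count_component_loads(list(range(12)), interior_of, component_for))
```

In this toy instance the walk alternates between sub-regions and the neighbourhoods of the two boundary vertices, loading five components for a path of twelve vertices.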

I/O Efficiency • Observations • When encountering a terminal/boundary vertex, the next component can be loaded in O(1) I/O’s • Given a component, the graph labels of all interior/internal vertices can be reported without incurring any additional I/O’s • By loading a constant number of components, we can visit Ω(lg B) vertices along the path • I/O complexity: O(K / lg B)
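The Ω(lg B) progress bound can be traced back to the parameter choices with a short calculation. This is a sketch of the reasoning rather than the authors' proof, and it assumes that d and c are constants and that q = O(lg N), which is not stated on the slides.

```latex
\[
  \log_d \alpha \;=\; \log_d A^{1/3} \;=\; \frac{\lg A}{3 \lg d},
  \qquad
  A \;=\; \frac{B \lg N}{c + q}.
\]
% Assumption: q = O(\lg N) and c = O(1), so c + q = O(\lg N) and A = \Omega(B).
\[
  A = \Omega(B)
  \;\Longrightarrow\;
  \log_d \alpha = \Omega(\lg B).
\]
% A traversal that starts at a boundary vertex v stays inside the
% \alpha-neighbourhood of v for at least \log_d \alpha steps, so O(1)
% component loads advance the path by \Omega(\lg B) vertices:
\[
  \#\text{I/O's} \;=\; O\!\left(\frac{K}{\lg B}\right).
\]
```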

Main Result • A succinct representation of bounded-degree planar graphs: • Space: O(N) + Nq + o(Nq) bits • I/O complexity for path traversal: O(K / lg B)

Terrains Modeled as Triangulated Irregular Networks (TINs) • Notation • N: number of points • Φ: number of bits required to store the coordinates of each point • Space: • NΦ + O(N) + o(NΦ) bits • I/O complexity: • Reporting a path crossing K faces: O(K / lg B)

Queries on Triangulated Terrains • Point location: O(log_B N) I/O’s • Terrain profile: O(K / lg B) I/O’s • Trickle path: O(K / lg B) I/O’s • Connected component • O(K / lg B) I/O’s if the component is convex • Can be generalized to components that are not convex, though the result is more complex

Conclusions • We designed a succinct representation of bounded-degree planar graphs that supports I/O-efficient path traversal, and applied it to terrains modeled as TINs to support geometric queries • This provides solutions for modern applications that process very large data sets • Future work: combining succinct data structures and external memory data structures for other problems

Thank you!
