A Survey of Techniques for Designing I/O-Efficient Algorithms

S. Fahimeh Moosavi, Fall 1389

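Basic Techniques
Here N is the input size, M the main memory size, and B the block size.
• Scanning: scan(N) = N/B I/Os for a linear scan of an array of N elements.
• Sorting: sort(N) = O((N/B) log_{M/B}(N/B)) I/Os.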
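The two bounds can be made concrete with a small calculator. This is only a sketch: the function names scan_ios and sort_ios are made up for illustration, constant factors are ignored, and M/B is assumed to be at least 2.

```python
def scan_ios(n, b):
    """Number of block transfers needed to read n items, b items per block."""
    return -(-n // b)  # ceiling division


def sort_ios(n, m, b):
    """Estimate (N/B) * ceil(log_{M/B}(N/B)) without floating-point logarithms."""
    blocks = scan_ios(n, b)
    fanout = m // b          # M/B, assumed to be at least 2
    passes, reach = 0, 1
    while reach < blocks:    # smallest k with (M/B)^k >= N/B
        reach *= fanout
        passes += 1
    return blocks * max(passes, 1)


print(scan_ios(2**20, 2**5))         # 32768
print(sort_ios(2**20, 2**10, 2**5))  # 98304
```

With N = 2^20, M = 2^10 and B = 2^5, sorting costs three times as many I/Os as scanning, because log_{M/B}(N/B) = log_32(32768) = 3.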

Simulation of Parallel Algorithms in External Memory

PRAM [Parallel Random Access Machine]
• p processors, each with local memory.
• Each processor has a unique id in the range 0 to p-1.
• Processors communicate through reads and writes to a shared memory.

• At each unit of time, a processor is either active or idle (depending on its id).
• At each time step, all processors may execute different instructions on different data.
Note: any pair of processors P_i, P_j can communicate in constant time:
• P_i writes the message to cell x at time t.
• P_j reads the message from cell x at time t+1.
Measures of performance:
1. Running time.
2. Amount of work performed.

PRAM algorithm A
Uses:
• N processors
• O(N) space
Running time: O(T(N))
Assumption: every computation step of a processor consists of a constant number of read/write accesses to shared memory.

Simulating one step of algorithm A (read accesses)
• Scan the list of processor contexts to retrieve the read requests.
• Sort the resulting list of read requests by the memory locations they access.
• Scan the sorted list of read requests and the memory representation to attach to each request the content of the cell it addresses.
• Sort the list of read requests again, this time by the issuing processor.
• Scan the sorted list of read requests and the list of processor contexts to transfer the requested operands to each processor.
Write requests are handled analogously.

• O(1) scans of the list of processor contexts.
• O(1) scans of the representation of the shared memory.
• A constant number of scans and sorts of the list of read/write requests.
• All these lists have size O(N).
Consequence: simulating one step of algorithm A takes O(sort(N)) I/Os.
Theorem 3.2. A PRAM algorithm that uses N processors and O(N) space and runs in time T(N) can be simulated in O(T(N) · sort(N)) I/Os.
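The read phase of one simulated step can be sketched with in-memory lists standing in for the disk-resident lists. The representation is an assumption made for illustration: read_addr[p] is the cell that processor p wants to read, memory[a] is the content of cell a, and simulate_read_step is a hypothetical name.

```python
def simulate_read_step(read_addr, memory):
    """Serve one read request per processor using only sorts and scans."""
    # Scan the processor contexts to extract (address, processor) requests.
    requests = [(addr, pid) for pid, addr in enumerate(read_addr)]
    # Sort the requests by the memory location they access.
    requests.sort()
    # Scan requests and memory together, attaching cell contents.
    answered = []
    cell = 0
    for addr, pid in requests:
        while cell < addr:       # the memory scan only ever moves forward
            cell += 1
        answered.append((pid, memory[cell]))
    # Sort again, this time by the issuing processor.
    answered.sort()
    # Scan to hand every processor its operand.
    return [value for _pid, value in answered]


print(simulate_read_step([3, 0, 3, 1], [10, 20, 30, 40]))  # [40, 10, 40, 20]
```

Because both sorted lists are traversed front to back, no step needs random access to the shared memory.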

Time-Forward Processing

Evaluating a DAG G
L: an assignment of labels L(v) to the vertices of G.
Goal: compute another labelling S of the vertices of G so that, for every vertex v ∈ G, S(v) is computed from L(v) and S(u_1), ..., S(u_k), where u_1, ..., u_k are the in-neighbors of v.

Expression-tree evaluation
Input: a binary tree T whose leaves store real numbers and whose internal vertices store binary operations.
• If v is a leaf, then val(v) = the number stored at v.
• If v is an internal vertex with operation ∘, left child x, and right child y, then val(v) = val(x) ∘ val(y).

Evaluating a DAG G I/O-efficiently
Two assumptions:
• The vertices of G are stored in topologically sorted order.
• Label S(v) is computable from labels L(v) and S(u_1), ..., S(u_k) in O(sort(k)) I/Os.

• The priority queue Q holds, for every edge (u,v) whose tail u has already been evaluated, the value S(u) with priority v.
• Insert and DeleteMin operations on Q can be performed in O((1/B) log_{M/B}(|E|/B)) I/Os amortized.
• Total number of priority queue operations: O(|E|) (every edge is inserted into and deleted from Q exactly once).
Consequence: all priority queue updates together take O(sort(|E|)) I/Os.

Note:
• Scanning the vertex set and the adjacency lists takes O(scan(|V| + |E|)) I/Os.
• Computing labels S(v) from L(v) and S(u_1), ..., S(u_k), for all v ∈ G, takes O(sort(|E|)) I/Os.
Theorem 3.3. Given a DAG G = (V, E) whose vertices are stored in topologically sorted order, G can be evaluated in O(sort(|V| + |E|)) I/Os, provided that the label of every vertex v ∈ G can be computed in O(sort(deg^-(v))) I/Os, where deg^-(v) is the in-degree of v.
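Time-forward processing can be sketched as follows, with Python's in-memory heapq standing in for an I/O-efficient priority queue. The name evaluate_dag, the out_edges adjacency format, and the combine callback are assumptions made for this illustration. Vertices are assumed to be numbered 0, ..., n-1 in topological order.

```python
import heapq


def evaluate_dag(labels, out_edges, combine):
    """Compute S(v) = combine(L(v), [S(u) for in-neighbors u]) for every v."""
    pq = []       # entries (target vertex, source vertex, value sent forward)
    result = []
    for v, label in enumerate(labels):
        # Collect everything that was sent "forward in time" to v.
        incoming = []
        while pq and pq[0][0] == v:
            _target, _source, value = heapq.heappop(pq)
            incoming.append(value)
        s = combine(label, incoming)
        result.append(s)
        # Send S(v) to every out-neighbor, keyed by the neighbor's number.
        for w in out_edges[v]:
            assert w > v, "vertices must be in topological order"
            heapq.heappush(pq, (w, v, s))
    return result


# Each vertex adds its own label to the values of its in-neighbors.
values = evaluate_dag([1, 2, 3, 4], [[1, 2], [3], [3], []],
                      lambda label, ins: label + sum(ins))
print(values)  # [1, 3, 4, 11]
```

Every edge causes exactly one insertion and one deletion, which matches the O(|E|) operation count used in the analysis.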

Greedy Graph Algorithm

A vertex labelling algorithm A is called:
• single-pass: if it computes the desired labelling of the vertices of the graph by visiting every vertex exactly once.
• local: if label L(v) can be computed in O(sort(k)) I/Os from labels L(u_1), ..., L(u_k), where u_1, ..., u_k are the neighbors of v whose labels are computed before L(v).
• presortable: if there is an algorithm that takes O(sort(|V| + |E|)) I/Os to compute an order of the vertices of the graph so that A produces a correct result if it visits the vertices in this order.

Main Problems in Making Algorithm A I/O-Efficient
• Determine an order in which algorithm A should visit the vertices of the graph.
• Devise a mechanism that provides every vertex v with the labels of its previously visited neighbors.

Theorem 3.4. Every graph problem P that can be solved by a presortable local single-pass vertex labelling algorithm can be solved in O(sort(|V| + |E|)) I/Os.
Proof:
• A: a presortable local single-pass vertex labelling algorithm.
• L: the labelling of the vertices of a graph G = (V, E) computed by A.
• A': an algorithm that takes O(sort(|V| + |E|)) I/Os to compute an order of the vertices of G (a numbering of the vertices).
• G': the DAG derived from G by directing every edge from the vertex with the smaller number to the vertex with the larger number.
The numbering is a topological order of G', and L(v) depends only on the labels of the in-neighbors of v in G'. Hence, labelling L can be computed using time-forward processing.

Computing a Maximal Independent Set
In internal memory: process the vertices in an arbitrary order. When a vertex v ∈ V is visited, add it to S if none of its neighbors is in S.
Translated into a labelling problem: X_S : V → {0, 1}, where X_S(v) = 1 if v ∈ S and X_S(v) = 0 if v ∉ S.
Theorem 3.5. Given an undirected graph G = (V, E), a maximal independent set of G can be found in O(sort(|V| + |E|)) I/Os and linear space.
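The greedy rule can be combined with time-forward processing roughly as below. This is a sketch under stated assumptions: vertices are numbered 0, ..., n-1 in visiting order, heapq replaces the external priority queue, and maximal_independent_set is a hypothetical name.

```python
import heapq


def maximal_independent_set(n, edges):
    """Return X_S as a list: entry v is 1 if v is in S, else 0."""
    # Direct every edge from the smaller to the larger vertex number.
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[min(u, v)].append(max(u, v))
    pq = []       # entries (target vertex, "sender is in S")
    x_s = []
    for v in range(n):
        blocked = False
        while pq and pq[0][0] == v:
            _target, sender_in_s = heapq.heappop(pq)
            blocked = blocked or sender_in_s
        in_s = not blocked   # join S only if no earlier neighbor is in S
        x_s.append(1 if in_s else 0)
        for w in out[v]:
            heapq.heappush(pq, (w, in_s))
    return x_s


# Path 0 - 1 - 2 - 3 - 4
print(maximal_independent_set(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))  # [1, 0, 1, 0, 1]
```

A vertex only needs to know whether some earlier neighbor joined S, so each message is a single bit.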

Coloring Graphs of Bounded Degree
In internal memory: process the vertices in an arbitrary order. When a vertex v ∈ V is visited, assign to v a color c(v) ∈ {1, …, Δ+1} that has not been assigned to any neighbor of v.
Theorem 3.6. Given an undirected graph G = (V, E) whose vertices have degree at most Δ, a (Δ+1)-coloring of G can be found in O(sort(|V| + |E|)) I/Os and linear space.

Application: coloring graphs of bounded degree
A graph G with maximum degree 3 can be colored with 3+1 = 4 colors:
• Choose an arbitrary order of the vertices.
• Assign to each vertex a color that has not been chosen by any of its neighbors.
To choose a color for a vertex, sort the colors of its already colored neighbors and assign the first unused color to it.
[Figure: example graph with vertices numbered 0 to 6]
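A minimal sketch of the coloring rule, again using time-forward processing with heapq in place of an external priority queue. The name greedy_coloring and the edge-list input format are assumptions; vertices are numbered 0, ..., n-1 in visiting order.

```python
import heapq


def greedy_coloring(n, edges):
    """Return a list of colors (1-based); neighbors always get different colors."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[min(u, v)].append(max(u, v))
    pq = []       # entries (target vertex, color of an earlier neighbor)
    color = []
    for v in range(n):
        used = []
        while pq and pq[0][0] == v:
            used.append(heapq.heappop(pq)[1])
        used.sort()          # sort the colors of the already colored neighbors
        c = 1
        for u in used:       # find the first color missing from the sorted list
            if u == c:
                c += 1
            elif u > c:
                break
        color.append(c)
        for w in out[v]:
            heapq.heappush(pq, (w, c))
    return color


# Cycle 0 - 1 - 2 - 3 - 0
print(greedy_coloring(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [1, 2, 1, 2]
```

A vertex with at most Δ neighbors sees at most Δ used colors, so the first unused color is never larger than Δ+1.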

List Ranking and the Euler Tour Technique

List Ranking
List ranking problem: for every vertex x_i of a linked list L, compute the distance from the head of L to x_i (the number of edges on the path from the head of L to x_i).
Solving in internal memory: starting at the head of the list, follow successor pointers and number the vertices from 0 to N-1 in the order they are visited.

Generalization of List Ranking (prefix product)
I/O complexity: O(sort(N))
Input:
• a labelling λ: {x_1, …, x_N} → X
• a multiplication ⊗: X × X → X
Output: a label φ(x_i) for each vertex x_i of L such that
• φ(x_σ(1)) = λ(x_σ(1))
• φ(x_σ(i)) = φ(x_σ(i-1)) ⊗ λ(x_σ(i)) for 1 < i ≤ N
where σ: [1, N] → [1, N] is a permutation, x_σ(1) is the head of L, and succ(x_σ(i)) = x_σ(i+1) for 1 ≤ i < N.

Example
List labels (from head to tail): 3, 1, 5, 2, 3, 1
• List ranking: 0, 1, 2, 3, 4, 5
• Generalization (prefix sums, ⊗ = +): 3, 4, 9, 11, 14, 15
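The internal memory solution, extended to the prefix-product generalization, can be sketched as a plain pointer traversal. The dictionary representation of succ and the label map, and the name prefix_product, are assumptions made for illustration.

```python
from operator import add


def prefix_product(head, succ, label, op):
    """Follow successor pointers from the head and accumulate labels with op."""
    phi = {}
    x, acc = head, None
    while x is not None:
        acc = label[x] if acc is None else op(acc, label[x])
        phi[x] = acc
        x = succ[x]
    return phi


succ = {"a": "b", "b": "c", "c": "d", "d": "e", "e": "f", "f": None}
label = dict(zip("abcdef", [3, 1, 5, 2, 3, 1]))
print(prefix_product("a", succ, label, add))
# {'a': 3, 'b': 4, 'c': 9, 'd': 11, 'e': 14, 'f': 15}

# Plain list ranking is the special case with label 0 at the head and 1 elsewhere.
unit = {x: (0 if x == "a" else 1) for x in succ}
print(prefix_product("a", succ, unit, add))
# {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5}
```

Each loop iteration dereferences one pointer, which is exactly the access pattern that becomes expensive when the list is scattered across disk blocks.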

The internal memory algorithm is not I/O-efficient
The internal memory algorithm spends Ω(N) I/Os in the worst case.
[Figure: 16 list elements stored on disk in the order 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16]

An Efficient List Ranking Algorithm
• Find an independent set I of L so that |I| = Ω(N).
• Remove the elements of I from L: for every element x ∈ I with predecessor y and successor z in L, set succ(y) = z, multiply the label of x with the label of z, and assign the result to z.
• Apply the algorithm recursively to the compressed list.
• Compute the ranks of the elements in I by multiplying the ranks of their predecessors in L with their labels.

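Example (⊗ = +)
• Input labels: 3, 1, 5, 2, 3, 1
• Independent set I: the elements with labels 5 and 3 (third and fifth element)
• Compressed list: 3, 1, 7, 4
• Ranks in the compressed list: 3, 4, 11, 15
• Final ranks: 3, 4, 9, 11, 14, 15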

I/O Complexity
• Every step, except the recursive invocation, takes O(sort(N)) I/Os.
• Total I/O complexity: I(N) = I(cN) + O(sort(N)) for a constant 0 < c < 1.
• Solution of this recurrence: I(N) = O(sort(N)).
Theorem 3.7. A list of length N can be ranked in O(sort(N)) I/Os.
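The contraction scheme can be sketched in memory as below. Two points are assumptions of this sketch and not taken from the slides: the independent set is chosen by random coin flips (an element joins I if its coin shows heads and its predecessor's coin shows tails), and the head is never removed so that every removed element has a predecessor. The name rank_by_contraction is hypothetical. In external memory every dictionary pass would become a sort or a scan.

```python
import random


def rank_by_contraction(head, succ, label, op, rng=None):
    """Prefix products along a linked list via independent-set removal."""
    rng = rng or random.Random(0)
    if len(succ) <= 2:                       # tiny list: follow pointers directly
        ranks, x, acc = {}, head, None
        while x is not None:
            acc = label[x] if acc is None else op(acc, label[x])
            ranks[x] = acc
            x = succ[x]
        return ranks
    pred = {z: x for x, z in succ.items() if z is not None}
    indep = set()
    while not indep:                         # retry until at least one element is chosen
        coin = {x: rng.random() < 0.5 for x in succ}
        indep = {x for x in succ
                 if x != head and coin[x] and not coin[pred[x]]}
    # Build the compressed list: bypass removed elements and fold their
    # labels into their successors.
    new_succ, new_label = {}, {}
    for x in succ:
        if x in indep:
            continue
        z = succ[x]
        new_succ[x] = succ[z] if z in indep else z
        p = pred.get(x)
        new_label[x] = op(label[p], label[x]) if p in indep else label[x]
    ranks = rank_by_contraction(head, new_succ, new_label, op, rng)
    # Reinsert the removed elements: rank(x) = rank(pred(x)) * label(x).
    for x in indep:
        ranks[x] = op(ranks[pred[x]], label[x])
    return ranks


succ = {"a": "b", "b": "c", "c": "d", "d": "e", "e": "f", "f": None}
label = dict(zip("abcdef", [3, 1, 5, 2, 3, 1]))
print(rank_by_contraction("a", succ, label, lambda p, q: p + q))
```

No two adjacent elements can both be chosen, because a chosen element shows heads while the predecessor of a chosen element must show tails. The operands are always combined in list order, so the sketch also works for non-commutative operations such as string concatenation.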

The Euler Tour Technique
Euler tour of a tree T = (V, E): a traversal of T that traverses every edge exactly twice, once in each direction.
The tour is represented as a linked list L whose elements are the edges in the set {(v,w), (w,v) : {v,w} ∈ E}, so that for any two consecutive edges e_1 and e_2, the target of e_1 is the source of e_2.

Defining an Euler tour
• Choose a circular order of the edges incident to each vertex.
• Let {v,w_1}, …, {v,w_k} be the edges incident to vertex v. Then let succ((w_i,v)) = (v,w_{i+1}) for 1 ≤ i < k and succ((w_k,v)) = (v,w_1).
• This yields a circular list. Break it by choosing an edge (v,r) with succ((v,r)) = (r,w), setting succ((v,r)) = null, and choosing (r,w) as the first edge of the traversal.

Computing List L
Input: a tree T = (V, E)
Output: an Euler tour L of T
• Scan the set E to replace every edge {v,w} with two directed edges (v,w) and (w,v).
• Sort the resulting set of directed edges by their target vertices.
• Scan the sorted edge list to compute the successor of every edge in L.
Lemma 3.8. An Euler tour L of a tree with N vertices can be computed in O(sort(N)) I/Os.
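The successor rule can be sketched in memory as follows. Grouping the edges in a dictionary plays the role of sorting the directed edges by target vertex. The names euler_tour and tour_edges are hypothetical, and the circular order at each vertex is assumed to be the sorted order of its neighbors.

```python
from collections import defaultdict


def euler_tour(edges, root):
    """Return (first edge, successor map) of an Euler tour starting at root."""
    adj = defaultdict(list)
    for v, w in edges:
        adj[v].append(w)
        adj[w].append(v)
    succ = {}
    for v, nbrs in adj.items():
        nbrs.sort()                       # circular order w_1, ..., w_k around v
        k = len(nbrs)
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % k])
    last = (adj[root][-1], root)          # an edge entering the root
    first = succ[last]                    # the edge leaving the root after it
    succ[last] = None                     # break the circular list here
    return first, succ


def tour_edges(first, succ):
    """List the directed edges of the tour in order."""
    tour, e = [], first
    while e is not None:
        tour.append(e)
        e = succ[e]
    return tour


first, succ = euler_tour([(1, 2), (1, 3), (3, 4), (3, 5)], root=1)
print(tour_edges(first, succ))
# [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (3, 5), (5, 3), (3, 1)]
```

The tour has 2(N-1) = 8 edges for this tree with N = 5 vertices, and consecutive edges share an endpoint as required.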

Rooting a Tree
Rooting a tree T means computing, for every edge {v,w}, which endpoint is the parent and which is the child.
Definition: for every pair of opposite edges (u,v), (v,u) in the ranked Euler tour, we call
• forward edge: the edge with the lower rank;
• back edge: the other edge.

Algorithm
Input: an unrooted (and undirected) tree T and a special vertex r.
Output: for each vertex v ≠ r, the parent p(v) of v in the tree rooted at r.
• Construct an Euler tour starting at an edge (r,v).
• Compute the rank of every edge in the list.
• For every pair of adjacent vertices x and p(x), edge (p(x),x) is a forward edge and edge (x,p(x)) is a back edge, so the parents can be read off the forward edges.

I/O Complexity
• Constructing the Euler tour starting at r: O(sort(N)) I/Os.
• Ranking the Euler tour: O(sort(N)) I/Os.
• Extracting the set of forward edges: O(sort(N)) I/Os.
Total I/O complexity: O(sort(N))
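Given a ranked Euler tour, extracting the parents amounts to comparing the ranks of opposite edges. The sketch below assumes the tour is available as a Python list of directed edges in tour order, so the list index serves as the rank; parents_from_tour is a hypothetical name.

```python
def parents_from_tour(tour):
    """Map every non-root vertex to its parent using forward edges."""
    rank = {e: i for i, e in enumerate(tour)}
    parent = {}
    for (u, v) in tour:
        if rank[(u, v)] < rank[(v, u)]:   # (u, v) is a forward edge
            parent[v] = u
    return parent


# Euler tour of the tree with edges {1,2}, {1,3}, {3,4}, {3,5}, rooted at 1.
tour = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (3, 5), (5, 3), (3, 1)]
print(parents_from_tour(tour))  # {2: 1, 3: 1, 4: 3, 5: 3}
```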

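Computing a Preorder Numbering
A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os.
• Assign label 1 to every forward edge and label 0 to every back edge of the Euler tour, and let rank(e) be the sum of the labels up to and including e.
• preorder#(r) = 1
• preorder#(v) = rank((p(v),v)) + 1 for v ≠ r
[Figure: Euler tour of an example tree with 0/1 edge labels and the resulting ranks]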
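The preorder formula can be checked with a short sketch. It assumes that rank is the weighted rank in which forward edges count 1 and back edges count 0, and that the tour is given as a list of directed edges in tour order; preorder_numbers is a hypothetical name.

```python
def preorder_numbers(tour, root):
    """preorder#(v) = weighted rank of the forward edge (p(v), v), plus one."""
    position = {e: i for i, e in enumerate(tour)}
    numbers = {root: 1}
    weighted_rank = 0
    for (u, v) in tour:
        if position[(u, v)] < position[(v, u)]:   # forward edge, label 1
            weighted_rank += 1
            numbers[v] = weighted_rank + 1
        # back edges carry label 0 and leave the rank unchanged
    return numbers


# Tour of the tree with edges {1,3}, {3,4}, {3,5}, {1,2} that visits 3 first.
tour = [(1, 3), (3, 4), (4, 3), (3, 5), (5, 3), (3, 1), (1, 2), (2, 1)]
print(preorder_numbers(tour, root=1))  # {1: 1, 3: 2, 4: 3, 5: 4, 2: 5}
```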

Computing Subtree Sizes
The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.
[Figure: example tree annotated with subtree sizes]
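One way to obtain subtree sizes from a ranked Euler tour is sketched below. The formula is not stated on the slide and is an assumption of this sketch: with label 1 on forward edges and 0 on back edges, the size of the subtree of v ≠ r is rank((v,p(v))) - rank((p(v),v)) + 1, since the forward edges between the two visits are exactly the edges into the subtree of v. The name subtree_sizes is hypothetical.

```python
def subtree_sizes(tour, root):
    """Subtree sizes from weighted ranks (forward edges 1, back edges 0)."""
    position = {e: i for i, e in enumerate(tour)}
    rank = {}
    total = 0
    for e in tour:
        u, v = e
        if position[e] < position[(v, u)]:     # forward edge
            total += 1
        rank[e] = total
    sizes = {root: total + 1}                  # N - 1 forward edges plus the root
    for (u, v) in tour:
        if position[(u, v)] < position[(v, u)]:    # (u, v) = (p(v), v)
            sizes[v] = rank[(v, u)] - rank[(u, v)] + 1
    return sizes


tour = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (3, 5), (5, 3), (3, 1)]
print(subtree_sizes(tour, root=1))  # {1: 5, 2: 1, 3: 3, 4: 1, 5: 1}
```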

Graph Blocking

Blocking Graphs
Goal: lay out graphs on disk so that traversals of paths in these graphs cause as few page faults as possible.
Assumptions:
• The graph to be stored on disk is static.
• The paths are traversed in an online fashion.
Measures of performance:
• Number of page faults incurred by a path traversal in the worst case.
• Amount of space used by the graph representation.

Notes:
• Storing a graph with N vertices requires at least N/B blocks.
• The traversal of a path of length L causes at least L/B page faults.
Definition: the storage blow-up of a graph blocking is β if it uses βN/B blocks of storage to store the graph on disk.

Blocking Lists
Natural approach, β = 1: store the list as an array with B consecutive vertices per block.
• Simple traversal in direction 1, ..., N: traversing a path of length L causes only L/B page faults.
• More complicated traversals that may change direction: if M ≥ 2B, keep the last block in memory as well, so that a page fault occurs only every B-1 steps. Traversing a path of length L causes at most L/B page faults.
[Figure: list 1, 2, 3, 4, ..., N]

The Pathological Situation: M = B
An adversary can force a page fault at every single step by choosing a path p = (v, w, v, w, …), where v and w are adjacent vertices stored in different blocks. Whenever vertex v is visited, the block containing v is brought into main memory, thereby overwriting the block containing w, and vice versa.
[Figure: adjacent vertices v and w on either side of a block boundary]

Thwarting the adversary's strategy
Choose β = 2: a second array stores a copy of the list with an offset of B/2.
Whenever a page fault occurs, the page handler switches to the other array, so the visited vertex v is at least B/2-1 steps away from the next page fault.
Result: traversing a path of length L now incurs at most 2L/B page faults.
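The effect of the second copy can be observed in a small simulation with M = B, that is, with room for a single block in memory. This is a sketch under assumptions: list positions are 0-based integers, B is even, copy 1 is shifted by B/2, and the names block_of and count_page_faults are made up for illustration.

```python
def block_of(v, copy, b):
    """Block index of list position v in copy 0 (no offset) or copy 1 (offset B/2)."""
    return (v + (b // 2 if copy == 1 else 0)) // b


def count_page_faults(path, b, two_copies):
    """Count page faults of a traversal when memory holds exactly one block."""
    faults = 0
    loaded = None                      # (copy, block index) currently in memory
    for v in path:
        if loaded is not None and block_of(v, loaded[0], b) == loaded[1]:
            continue                   # v lies in the block held in memory
        faults += 1
        copy = 0
        if two_copies and loaded is not None:
            copy = 1 - loaded[0]       # alternate between the two arrays
        loaded = (copy, block_of(v, copy, b))
    return faults


adversary = [7, 8] * 50                # bounce across the boundary of block 0
print(count_page_faults(adversary, 8, two_copies=False))  # 100
print(count_page_faults(adversary, 8, two_copies=True))   # 2
```

With one copy, every step of the adversarial path crosses a block boundary. With two copies, the second fault loads a block from the shifted array in which positions 7 and 8 lie together, so the bouncing no longer causes faults.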

Blocking Trees
Blocking trees requires additional restrictions on the tree or on the type of traversal.
Consider a tree whose internal vertices have degree M: for any vertex v, at most M-1 of its neighbors can reside in memory at the same time as v, so an adversary can always choose a missing neighbor to cause a page fault.
Result: for unrestricted traversals, a good blocking of a tree can be achieved if the degree of the vertices of the tree is bounded by some constant d.

Constructing the Layout
Choose one vertex r of T as the root and construct two partitions of T into layers of height log_d B. The i-th layer contains:
• Partition 1: all vertices at distance (i-1) log_d B, ..., i log_d B - 1 from r.
• Partition 2: all vertices at distance (i-1/2) log_d B, ..., (i+1/2) log_d B - 1 from r.
[Figure: the two layer partitions, each with layers of height log_d B]
