240 likes | 321 Views
An Approximation Algorithm for Binary Searching in Trees. Marco Molinaro Carnegie Mellon University joint work with Eduardo Laber (PUC-Rio). Searching in sorted lists . Sorted list of numbers Marked number m Find the marked number using queries ‘ x ≤ m? ’. 10. 6. 14. 3. . 5.
E N D
An Approximation Algorithm for Binary Searching in Trees Marco Molinaro Carnegie Mellon University joint work with Eduardo Laber (PUC-Rio)
Searching in sorted lists • Sorted list of numbers • Marked number m • Find the marked number using queries ‘x ≤ m?’ 10 6 14 3 ... 5
10 6 14 3 ... 5 DT 10 ≤ > 10 5 14 ≤ > > ≤ 5 3 6 14 ≤ > ≤ > 6 3 5 6 6 10 Searching in sorted lists • Search strategy: procedure that indicates which number should be queried next • Can be represented by a decision tree (DT) • # queries to find m = path length
10 6 14 3 ... 5 Searching in sorted lists • We are given the probability of each number being the marked one • Expected number of queries of a strategy = expected path length of the corresponding decision tree • Efficient strategy is one with minimum expected path 0,05 0,1 0,5 0.2 ... 0,1 10 ≤ > 5 14 ≤ > > ≤ 3 6 14 ≤ > ≤ > 0,5 3 5 6 10 0,05 0,1 0,2 0,1
Searching in trees • Tree with exactly one marked node m • We can query an arc and find out which endpoint is closer to the marked node
a DT (c,d) ~c ~d c b (a,b) (f,h) ~h ~f ~a ~b d (d,f) b ~f ~d f h Searching in trees • Search strategy: procedure that indicates which arc should be queried next • Can be represented by a decision tree f
a c b d h Searching in trees • Search strategy: procedure that indicates which arc should be queried next • Can be represented by a decision tree • # queries to find m = path length DT (c,d) (c,d) ~c ~d (a,b) (f,h) (f,h) ~h ~f ~a ~b (d,f) (d,f) b ~f ~d f f f
Searching in trees • We are given the probability of each node being the marked one • Expected number of queries is the expected path length of the corresponding decision tree • The goal is to find a DT with minimum expected path a (c,d) ~c ~d b c (a,b) (f,h) .1 .2 ~h ~f ~a ~b .1 d (d,f) b ~d ~f .2 .3 f f .3 h
Searching in trees • Def: Given a tree T and weights w, compute a decision tree for searching in T with minimum expected path from root to leaves w.r.t. w • Motivation • Generalizes searches in totally ordered structures to (one type of) partially ordered structures • Application to software testing and filesystem synchronization
Related work • Searching in sorted lists • Worst-case • Binary search is optimal • Average-case • Knuth [Acta Informatica 71]: O(n2) • de Prisco, de Santis [IPL 93]: good approximation in linear time
Related work • Searching in trees • Worst-case • Ben-Asher et al. [SIAM J. Comput. 99]: O(n4 log3 n) • Onak, Parys [FOCS 06]: O(n3) • Mozes et al. [SODA 08]: O(n) • Average-case • Kosaraju et al. [WADS 99]: O(logn)-approximation
Related work • Searching in posets • Worst-case • Arkin et al. [Int. J. Comput. Geometry Appl. 98]: O(log n)-approximation • Carmo et al. [TCS 04] • Finding optimal strategy is NP-Hard • Constant-factor approximation for random posets • Average-case • Kosaraju et al. [WADS 99]: O(logn)-approximation
Our results • First constant-factor approximation for searching in trees (average-case metric) • Linear running time
Overview • We know how to search in sorted lists with probabilities • Searching in paths = searching in ordered lists
Overview • Search strategy
Algorithm • Find a (heavy) path • Compute a decision tree for this path • Append decision trees for querying the hanging arcs • Recursively find strategies for the hanging subtrees and append them
subtrees Tij input tree T Analysis • T – input tree • w(u) – likelihood of node u being the marked one • w(T’) = ∑u є T’ w(u) • Tij – Hanging subtrees of T • Cost of a decision tree – expected path length
entropy of {w(u)} Analysis – upper bound ALGO(T) = expected path of the computed DT = cost(■) + cost(■) + cost(■) ≤H + w(T) + ∑i,j j w(Tij) + ∑i,jALGO(Tij) decision tree input tree T
LB1: LB2: only when H is large for all H, ALGO(T) ≤α OPT(T) Analysis – lower bounds UB: • When H >> w(T) • UB andLB1 • When H ≤ w(T) • UB and (LB1 + LB2)
These paths cost ≥ Analysis – entropy lower bound • OPT(T) = from root to (■) + from (■) to (■) + from (■) to leaves • from root to (■): using Shannon’s lossless coding theorem, we can lower bound byH / log 3 – w(T) • from (■) to (■): • There are at most 2 purple nodes per level • from (■) to leaves: • Every query to arcs in the trees Tij are descendants of purple nodes • Costs at least as much as searching inside the trees Tij, namely ∑i,jOPT(Tij) D*
Analysis – alternative lower bound • OPT(T) ≥ from root to (■) + from (■) to leaves • from root to (■): • Costs = ∑i,j distance to i-th purple node . w(Tij) • At most one purple node can have distance 0 • w(Tij) ≤ w(T)/2 • Costs at least w(T)/2 • from (■) to leaves: • Costs at least as much as searching inside the trees Tij, namely ∑i,jOPT(Tij) D*
Efficient implementation • Most steps take linear time • In order to find a good strategy, the algorithm uses sorting of weights • Use linear time approximate sorting • The algorithm can be implemented in linear time
Conclusions • First constant-factor approximation for searching in trees (average-case) • Linear running time • Open questions • Is searching in trees polynomially solvable? • Improved approximations for more general posets