Optimal Website Design: Natural Navigation and Timely Access to Information

This article discusses a sextic (O(n⁶)-time) algorithm for website design, focusing on how to build a good website: one with natural navigation and timely access to information. It covers transitive closure subgraphs, the Constrained Subtree Selection problem, and constraint-free graphs as tools for optimizing website design.

brianlong

Presentation Transcript


  1. A sextic algorithm for website design Brent Heeringa (heeringa@cs.umass.edu) (Joint work with Micah Adler) 21 October 2004 Union College

  2. A website design problem (for example: a new kitchen store) • Given products, their popularity, and their organization: • How do we create a good website? • Navigation is natural • Access to information is timely [Figure: knives organized by Maker (Wüsthof, Henkels) and by Type (paring, chef, bread, steak), with popularities 0.26, 0.33, 0.27 and 0.14.]

  3. Good website: Natural Navigation • Organization is a DAG • The transitive closure (TC) of the DAG enumerates all viable categorical relationships and introduces shortcuts • A subgraph of the TC preserves the logical relationships between categories [Figure: a DAG on categories A, B, C, its transitive closure, and a subgraph of the TC.]
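To make the transitive closure concrete, here is a minimal Python sketch (the function name is hypothetical) that computes, for every category in a DAG given as an adjacency dict, the set of all its descendants. Every descendant that is not already a direct child is a shortcut that the TC introduces. The example assumes the chain A → B → C, which is not necessarily the exact figure on the slide.

```python
def transitive_closure(dag):
    """Map every node of a DAG (dict: node -> list of children) to all its descendants."""
    closure = {}

    def descendants(v):
        # memoized DFS: each node's descendant set is computed once
        if v not in closure:
            reach = set()
            for child in dag.get(v, []):
                reach.add(child)
                reach |= descendants(child)
            closure[v] = reach
        return closure[v]

    for v in dag:
        descendants(v)
    return closure


tc = transitive_closure({"A": ["B"], "B": ["C"], "C": []})
print(tc["A"])  # A now reaches C directly: a shortcut link
```

Any subtree of this closure keeps only links that respect the original category hierarchy.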

  4. Good website: Timely Access to Info • Two obstacles to finding info quickly • Time scanning a page for the correct link • Time descending the DAG • Associate a cost with each obstacle • Page cost (function of the out-degree of a node) • Path cost (sum of page costs on the path) • Good access structure: • Minimize expected path cost • Optimal subgraph is always a full tree • Example: with page cost = # links, a path through pages with 3 and 2 links has path cost 3+2 = 5; for a leaf of weight 1/2, the weighted path cost is 5/2
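These cost definitions can be sketched in a few lines of Python. The representation is an assumption for illustration: a page is a list of its children and a leaf is just its weight. The page cost γ defaults to "number of links", and `expected_path_cost` is a hypothetical name.

```python
from fractions import Fraction


def expected_path_cost(node, gamma=lambda links: links, above=0):
    """Expected path cost of a website tree.

    node:  a leaf weight (a number) or a page (a list of child nodes).
    gamma: page cost as a function of the page's number of links.
    above: total page cost already paid on the way down to `node`.
    """
    if not isinstance(node, list):
        return node * above  # leaf: weight times its path cost
    page_cost = gamma(len(node))
    # every child pays this page's cost on top of what was paid above it
    return sum(expected_path_cost(child, gamma, above + page_cost) for child in node)


# The slide's example: a weight-1/2 leaf below a 3-link page and a 2-link page.
print(expected_path_cost([0, [Fraction(1, 2), 0], 0]))  # 5/2
```

For a full site the weights sum to 1; for instance four leaves of weight 1/4 arranged as `[q, [q, q], q]` give path costs 3, 5, 5, 3 and an expected path cost of 4.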

  5. Constrained Subtree Selection (CSS) • An instance of CSS is a triple: (G,γ,w) • G is a rooted DAG with n leaves (constraint graph) • γ is a function of the out-degree of each internal node (degree cost) • w is a probability distribution over the n leaves (weights) • A solution is any directed subtree of the transitive closure of G which includes the root and all n leaves • An optimal solution is one which minimizes the expected path cost • Example: leaves A, B, C, D, each of weight 1/4, with γ(x)=x

  10. Constrained Subtree Selection (CSS), example continued • γ(x)=x; leaves A, B, C, D each have weight 1/4 • Path costs: 3, 5, 5, 3 • Expected path cost: 1/4(3+5+5+3) = 1/4(16) = 4

  11. Constrained Subtree Selection (CSS), second example • γ(x)=x; leaves A, B, C, D have weights 1/2, 1/6, 1/6, 1/6 • Cost: 3 1/2
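For tiny instances, the optimum of constraint-free CSS can be checked by brute force. This Python sketch (hypothetical names) assumes every full tree is allowed, as in a constraint-free graph. It relies on the fact that a tree's expected path cost equals the sum, over internal pages, of γ(out-degree) times the total weight below that page. It tries every way to split a page's leaves into at least two child groups, so it is exponential and only meant for a handful of leaves.

```python
from fractions import Fraction
from functools import lru_cache


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or `first` joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def brute_force_css(weights, gamma=lambda k: k):
    """Exact optimum of constraint-free CSS by trying every full tree (tiny n only)."""

    @lru_cache(maxsize=None)
    def best(block):
        # block: sorted tuple of the leaf indices hanging below one page
        if len(block) == 1:
            return 0
        mass = sum(weights[i] for i in block)
        costs = []
        for part in set_partitions(list(block)):
            if len(part) < 2:
                continue  # full trees only: every page has at least two links
            # this page's cost is paid by every leaf below it
            costs.append(gamma(len(part)) * mass
                         + sum(best(tuple(sorted(b))) for b in part))
        return min(costs)

    return best(tuple(range(len(weights))))


print(brute_force_css([Fraction(1, 2)] + [Fraction(1, 6)] * 3))  # 7/2
```

With γ(x)=x it returns 4 for four leaves of weight 1/4 and 7/2 for the weights 1/2, 1/6, 1/6, 1/6, which agrees with the costs on the two example slides.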

  12. Constraint-Free Graphs and k-favorability • Constraint-free graph • Every directed, full tree with n leaves is a subtree of the TC • CSS is no longer constrained by the graph • k-favorable degree cost • Fix γ. γ is k-favorable if there exists k>1 such that every constraint-free instance of CSS under γ has an optimal tree with maximum out-degree at most k

  13. Linear Degree Cost: γ(x)=x • A node with 5 children: 5 paths with cost 5 • Replacement tree: 3 paths with cost 5 and 2 paths with cost 4 • No unweighted path cost increases, so no weighted path cost increases either • Generalization to n>6 paths is straightforward

  14. Linear Degree Cost: γ(x)=x • A node with 4 children: 4 paths with cost 4 • Replacement tree: 4 paths with cost 4
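The pattern in the two slides above can be checked for every degree with a short Python sketch. It rests on an assumption of ours, not the slides': the replacement turns a page with d links into a binary page whose two children take ⌈d/2⌉ and ⌊d/2⌋ of the links, recursing until every page has at most 3 links. `replacement_path_costs` is a hypothetical name; it lists the new path costs under γ(x)=x.

```python
def replacement_path_costs(d):
    """Path costs (gamma(x) = x) after replacing a d-link page by a tree of degree <= 3."""
    if d == 1:
        return [0]  # a single link needs no page of its own
    if d <= 3:
        return [d] * d  # small pages are kept as they are
    left, right = (d + 1) // 2, d // 2
    # a binary page costs 2 for every path passing through it
    return [2 + c for c in replacement_path_costs(left) + replacement_path_costs(right)]


print(replacement_path_costs(4))  # [4, 4, 4, 4]
print(replacement_path_costs(5))  # [5, 5, 5, 4, 4]
```

For d=4 and d=5 this reproduces the path counts on the slides. For larger d, no path cost ever exceeds the original cost d, so the expected path cost cannot grow.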

  15. Linear Degree Cost: γ(x)=x • Prefer binary structure when a leaf has at least half the mass • Prefer ternary structure when mass is uniformly distributed • CSS with 2-favorable degree costs and constraint-free (C.F.) graphs is the Huffman coding problem • Examples of 2-favorable degree costs: quadratic, exponential, ceiling of log
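The Huffman reduction can be sketched in Python, assuming γ is 2-favorable so that some optimal tree is binary. Every page then costs γ(2), each leaf's path cost is γ(2) times its depth, and the optimal expected depth is what Huffman's algorithm finds: the sum of all merged weights. The function name is hypothetical. The example uses γ(x)=x², since the slide lists quadratic cost among its examples, so γ(2)=4.

```python
import heapq
from fractions import Fraction


def two_favorable_css_cost(weights, gamma2):
    """Optimal expected path cost, assuming some optimal tree is binary.

    Every page has exactly 2 links and costs gamma2 = gamma(2), so a leaf's
    path cost is gamma2 * depth and CSS reduces to Huffman coding.
    """
    heap = list(weights)
    heapq.heapify(heap)
    total_depth = 0  # sum of weight * depth over all leaves
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # merging a and b pushes every leaf inside them one level deeper
        total_depth += a + b
        heapq.heappush(heap, a + b)
    return gamma2 * total_depth


w = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
print(two_favorable_css_cost(w, gamma2=4))  # 22/3
```

Here the heavy leaf ends at depth 1 and the others at depths 2, 3, 3, giving an expected depth of 11/6 and a cost of 4 · 11/6 = 22/3.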

  16. Results • Complexity: NP-Complete for equal weights and many γ • Sufficient condition on γ • Hardness depends on the constraint graph • Highlighted algorithm: • Theorem: O(n⁶)-time DP algorithm • γ(x)=x and G is constraint free • Other results: • Characterizations of optimal trees for uniform probability distributions • Theorem: poly-time constant-factor approximation • γ ≥ 1 and k-favorable; G has constant out-degree • Approximate Hotlink Assignment [Kranakis et al.]

  17. Related Work • Adaptive Websites [Perkowitz & Etzioni] • Challenge to the AI community • Novel views of websites: Page synthesis problem • Hotlink Assignment [Kranakis, Krizanc, Shende, et al.] • Add 1 hotlink per page to minimize the expected distance from the root to the leaves • Recently: pages have cost proportional to their size • Hotlinks don't change page cost • Optimal Prefix-Free Codes [Golin & Rote] • Minimum-cost code for n words with r symbols, where symbol a_i has cost c_i • Resembles CSS without a constraint graph

  18. Dynamic Programming Review • Problems which exhibit: • Optimal substructure • An optimal sol. may be written in terms of opt. solutions to subproblems • Inductive definition • Overlapping subproblems • Different problem instances share subproblems • Repeated computation

  19. Dynamic Programming: Fib 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, … Problem: What is the ith Fibonacci number? • Optimal substructure (inductive definition): Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2) • Overlapping subproblems • Fib(7) = Fib(6) + Fib(5) (but Fib(6) also calls Fib(5)) • We only need to calculate Fib(5) once • Don't repeat computations • Idea: Store solutions to subproblems in a table
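The cost of overlapping subproblems is easy to measure. This Python sketch (hypothetical helper names) counts the calls made by the naive recursion and compares them with a memoized version that stores each Fib(i) in a dict the first time it is computed.

```python
def fib_naive(i, calls):
    """Naive recursion; calls[0] counts how many times the function runs."""
    calls[0] += 1
    if i < 2:
        return i
    return fib_naive(i - 1, calls) + fib_naive(i - 2, calls)


def fib_memo(i, table=None):
    """Memoized recursion: each Fib(j) is computed once and stored in `table`."""
    if table is None:
        table = {0: 0, 1: 1}
    if i not in table:
        table[i] = fib_memo(i - 1, table) + fib_memo(i - 2, table)
    return table[i]


calls = [0]
print(fib_naive(7, calls), calls[0])  # 13 41
print(fib_memo(7))                    # 13
```

The naive version makes 41 calls for Fib(7), recomputing Fib(5) and smaller values many times. The memoized version fills only the 8 entries Fib(0) through Fib(7).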

  20. Dynamic Programming: Fib • General approach • Write inductive definition • Range of parameters in definition defines table size • Fill in table using definition • Analysis: (table size) * (# of lookups per cell) • Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2) • Computing Fib(14): 0 ≤ i ≤ 14
         i:  0  1  2  3  4  5  6  …   12   13   14
    Fib(i):  0  1  1  2  3  5  8  …  144  233  377
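Following the general approach, this bottom-up Python sketch fills the table for 0 ≤ i ≤ n in increasing order of i. Each cell needs just two lookups, so the total work is O(n). `fib_table` is a hypothetical name.

```python
def fib_table(n):
    """Return the DP table [Fib(0), ..., Fib(n)], filled in increasing order of i."""
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        # both lookups refer to cells that are already filled
        table[i] = table[i - 1] + table[i - 2]
    return table


print(fib_table(14)[12:])  # [144, 233, 377]
```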

  23. Dynamic Programming: Subset Sum Subset Sum (SS): Given a set of n positive integers X = (x1,…,xn) and a positive integer T, is there a subset of X which sums to T? • Example: X = {2, 3, 5, 9, 10, 15, 17} and T = 28 • Yes: {2, 9, 17} and {3, 10, 15} • Inductive definition: Let Xi = (x1,…,xi) be the first i integers of X; SS(t,i) = TRUE if there is a subset of Xi which sums to t, and FALSE otherwise

  24. Dynamic Programming Review • SS(0,i) = TRUE • SS(t,0) = FALSE for t > 0 • SS(t,i) = SS(t-xi, i-1) OR SS(t, i-1): the first term is the case where the ith element is in the subset (possible only when xi ≤ t), the second the case where it is not • Parameter range: 0 ≤ t ≤ T, 0 ≤ i ≤ n • Table size: T*n (precisely (T+1)(n+1) cells) • Each cell (t,i) depends on 2 other cells • O(Tn) time for SS
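Here is a direct Python translation of this recurrence (a sketch; `subset_sum` is a hypothetical name). It fills the (n+1) × (T+1) table row by row, then walks back through the table to recover one subset that reaches T.

```python
def subset_sum(xs, target):
    """Return one subset of xs summing to target, or None if there is none."""
    n = len(xs)
    # ss[i][t] is True iff some subset of the first i integers sums to t
    ss = [[False] * (target + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        ss[i][0] = True  # SS(0, i) = TRUE: the empty subset
    for i in range(1, n + 1):
        x = xs[i - 1]
        for t in range(1, target + 1):
            # ith element left out, or ith element used (only if it fits)
            ss[i][t] = ss[i - 1][t] or (t >= x and ss[i - 1][t - x])
    if not ss[n][target]:
        return None
    # walk back: if t is unreachable without item i, item i must be in the subset
    subset, t = [], target
    for i in range(n, 0, -1):
        if not ss[i - 1][t]:
            subset.append(xs[i - 1])
            t -= xs[i - 1]
    return subset[::-1]


print(subset_sum([2, 3, 5, 9, 10, 15, 17], 28))  # [3, 10, 15]
```

On the slide's example the walk-back returns {3, 10, 15}. For X = {2, 4} and T = 5 it returns None.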

  25. Lopsided Trees • Recall: γ(x)=x (3-favorable) and G is constraint free • Node level = path cost • Adding an edge increases level • Grow lopsided trees level by level

  29. Lopsided Trees • We know the exact cost of the tree up to the current level i: • Exact cost of the m leaves placed so far • The remaining n-m leaves must have path cost at least i

  30. Lopsided Trees: Cost • Exact cost of C: 3 · (1/3) = 1 • Remaining mass up to level 4: (2/3) · 4 = 8/3 • Total: 1 + 8/3 = 11/3

  31. Lopsided Trees: Cost • Tree cost at level 5 in terms of tree cost at level 4: • Add in the mass of the remaining leaves • Cost at level 5 (no new leaves): 11/3 + 2/3 = 13/3 • Cost updates don't depend on the level
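This bookkeeping fits in a short Python sketch. The function is hypothetical, and the weights are illustrative, chosen only so that leaf C has weight 1/3 and the remaining mass is 2/3 as on the slides. Each step down adds the mass of the leaves not yet placed, which is why the update never depends on the level.

```python
from fractions import Fraction


def level_costs(weights, leaves_at_level, max_level):
    """Cost of a partially grown lopsided tree at each level 0..max_level.

    weights: all n leaf weights; leaves are assigned heaviest first.
    leaves_at_level: {level: number of leaves whose path cost is that level}.
    The cost at level i counts placed leaves exactly and charges every
    unplaced leaf path cost i, its smallest possible value.
    """
    w = sorted(weights, reverse=True)
    placed, cost, costs = 0, 0, []
    for level in range(max_level + 1):
        if level > 0:
            # moving the frontier down one level adds the unplaced mass
            cost += sum(w[placed:])
        placed += leaves_at_level.get(level, 0)
        costs.append(cost)
    return costs


third = Fraction(1, 3)
print(level_costs([third] * 3, {3: 1}, max_level=5)[4:])  # [Fraction(11, 3), Fraction(13, 3)]
```

With one leaf of weight 1/3 finalized at level 3, the costs at levels 4 and 5 come out as 11/3 and 13/3.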

  34. Lopsided Trees • Equality on trees: • Equal number of leaves at or above the frontier • Equal number of nodes at each relative level below the frontier • Nodes have out-degree ≤ 3 • So every node below the frontier is at most γ(3) = 3 levels below it • (m; l1, l2, l3) = signature • Example signature: (2; 3, 2, 0) • 2: C and F are leaves • 3: G, H, I are 1 level past the frontier • 2: J and K are 2 levels past the frontier • What is the signature if F is an interior node with 3 children?

  35. Inductive Definition • Let CSS(m,l1,l2,l3) = cost of a minimum-cost tree with signature (m; l1, l2, l3) • Can we define CSS(m,l1,l2,l3) in terms of optimal solutions to subproblems? • Which trees, when grown by one level, have signature (m; l1, l2, l3)? • Which parent signatures (m'; l'1, l'2, l'3) lead to the child signature (m; l1, l2, l3)?

  36. Different Signatures [Figure: two trees, with signatures (2; 2, 0, 0) and (0; 4, 0, 0).]

  37. Same Signature • Different signatures lead to (2; 0, 2, 3) [Figure: trees with different signatures, each grown by one level, both reach signature (2; 0, 2, 3).]

  38. The other direction (which signatures can a tree grow into?) • Growing a tree only affects the frontier • Only l1 affects the next level • Choose the number of leaves • The remaining nodes are internal • Choose how many are degree-2 (d2) • The remaining nodes are degree-3 (d3) • O(n²) choices • Example: Sig (0; 2, 0, 0) can grow into Sig (1; 0, 0, 3)
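This enumeration can be sketched in Python (the function `grow` is hypothetical). It assumes γ(x)=x, so degree-2 children land two levels below the new frontier and degree-3 children land three levels below it. Nodes that were two and three levels below the old frontier move up by one level.

```python
def grow(signature):
    """All signatures reachable by growing a tree one level (gamma(x) = x, degrees 2 or 3).

    signature = (m, l1, l2, l3): m leaves at or above the frontier, and
    l1, l2, l3 nodes one, two and three levels below it.
    """
    m, l1, l2, l3 = signature
    children = []
    for leaves in range(l1 + 1):            # how many of the l1 nodes become leaves
        for d2 in range(l1 - leaves + 1):   # how many become degree-2 pages
            d3 = l1 - leaves - d2           # the rest become degree-3 pages
            children.append((m + leaves, l2, l3 + 2 * d2, 3 * d3))
    return children


print(grow((0, 2, 0, 0)))
```

`grow((0, 2, 0, 0))` returns (l1+1)(l1+2)/2 = 6 signatures, including (1, 0, 0, 3). Both (2, 2, 0, 0) and (0, 4, 0, 0) can grow into (2, 0, 2, 3), consistent with the signature slides.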

  39. The original question (warning: here be symbols) • Which PARENT signatures (m'; l'1, l'2, l'3) grow into the CHILD signature (m; l1, l2, l3)?

  40. The original question (warning: here be symbols) • Which parent signatures (m'; l'1, l'2, l'3) grow into the child signature (m; l1, l2, l3)? • Suppose we know • l'1 (the # of nodes one level below the frontier) • d2 (the # of the l'1 nodes which are degree-2 interior nodes in (m; l1, l2, l3)) • Let's determine the values of the remaining variables [Figure: the l'1 nodes one level below the frontier, d2 of which are degree-2 nodes.]

  41. The original question (warning: here be symbols) • Given l'1 and d2: • New number of leaves = old number of leaves + nodes one level below the frontier - internal nodes of degree 2 - internal nodes of degree 3 • m = m' + l'1 - d2 - d3

  42. The original question (warning: here be symbols) • Given l'1 and d2: • Each degree-3 node puts its 3 children 3 levels below the new frontier, and no other node lands that deep, so l3 = 3·d3 • Substituting d3 = l3/3: m = m' + l'1 - d2 - l3/3

  43. The original question (warning: here be symbols) • Given l'1 and d2: • The old number of nodes 2 levels below the frontier is the new number of nodes 1 level below it: l'2 = l1

  44. The original question (warning: here be symbols) • Given l'1 and d2: • The new number of nodes 2 levels below the frontier is the old number 3 levels below it, plus 2 children for each of the d2 binary nodes: l2 = l'3 + 2·d2

  45. The original question (warning: here be symbols) • Which (m'; l'1, l'2, l'3) grow into (m; l1, l2, l3)? • l'1 and d2 are sufficient: m' = m - l'1 + d2 + l3/3, l'2 = l1, l'3 = l2 - 2·d2 • l'1 and d2 are both O(n) • O(n²) possibilities for (m'; l'1, l'2, l'3) • CSS(m,l1,l2,l3) = cost of a min-cost tree with signature (m; l1, l2, l3) = min over 0 ≤ d2 ≤ l'1 ≤ n of [CSS(m',l'1,l'2,l'3) + c_{m'}], where c_{m'} is the sum of the smallest n-m' weights • CSS(n,0,0,0) = cost of the optimal tree • Analysis: • Table size = O(n⁴) • Each cell takes O(n²) lookups • O(n⁶) algorithm
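The recurrence can be sketched in Python as a minimal, hypothetical implementation. The name `css_linear_cost` and the pruning rule are illustrative, not from the slides. The sketch assumes γ(x)=x, a constraint-free G, and internal out-degree 2 or 3. Instead of filling the table from parent signatures, it runs the same transitions forward from the root with memoization. Each step adds c_m, the mass of the n-m lightest leaves. States with m+l1+l2+l3 > n are discarded because every pending node needs at least one leaf. The states and per-state choices have the same O(n⁴) and O(n²) bounds as in the analysis above.

```python
from fractions import Fraction
from functools import lru_cache

INF = float("inf")


def css_linear_cost(weights):
    """Minimum expected path cost for constraint-free CSS with gamma(x) = x (sketch)."""
    w = sorted(weights, reverse=True)  # heaviest leaves get the shallowest levels
    n = len(w)
    if n == 1:
        return w[0] * 0  # a lone leaf is the root: empty path
    # rest[m] = c_m, the total weight of the n - m lightest leaves
    rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        rest[i] = rest[i + 1] + w[i]

    @lru_cache(maxsize=None)
    def future(m, l1, l2, l3):
        """Cheapest cost still to be paid from signature (m; l1, l2, l3) to (n; 0, 0, 0)."""
        if (m, l1, l2, l3) == (n, 0, 0, 0):
            return 0
        if l1 == l2 == l3 == 0:
            return INF  # nothing left to grow, but some leaves are missing
        best = INF
        # the l1 nodes reach the new frontier: choose leaves and degree-2 nodes,
        # the rest become degree-3 nodes
        for leaves in range(l1 + 1):
            for d2 in range(l1 - leaves + 1):
                d3 = l1 - leaves - d2
                child = (m + leaves, l2, l3 + 2 * d2, 3 * d3)
                if sum(child) > n:  # every pending node needs at least one leaf
                    continue
                # advancing one level adds the unplaced mass, whatever the level
                best = min(best, rest[m] + future(*child))
        return best

    # the root sits at level 0 and is internal, with 2 or 3 children
    starts = [s for s in [(0, 0, 2, 0), (0, 0, 0, 3)] if sum(s) <= n]
    return min(future(*s) for s in starts)


print(css_linear_cost([Fraction(1, 2)] + [Fraction(1, 6)] * 3))  # 7/2
```

On the two four-leaf examples from slides 10 and 11 it returns 4 and 7/2.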

  46. Some Observations • Generalized algorithm: • Theorem: O(n^(γ(k)+k))-time DP algorithm • γ is positive, integer-valued, non-decreasing and k-favorable, and G is constraint free • Signatures are (γ(k)+1)-vectors • Table size = O(n^(γ(k)+1)) • Each cell requires O(n^(k-1)) lookups

  47. (extra slides follow)
